Data Preparation

A clean, suitably structured, and well-documented data set is critical for efficient and accurate statistical analysis. Most commonly, data are imported into statistical analysis programs as a comma-delimited text file. For easy and accurate importation into statistical software, the data must adhere to a regular structure with consistent entries.

While it is not required, using REDCap (Research Electronic Data Capture) can greatly simplify data collection and minimize costly and time-consuming data clean-up activities. REDCap is a secure web-based application for building and managing online databases for research and is supported by the CTSC Biomedical Informatics team.

Regardless of the software used to record data, adhering to the following guidelines will facilitate importation of the data into statistical software. In addition, every data set must include a data dictionary that describes each variable and identifies acceptable values. Additional information on data dictionaries is available on the UC Davis REDCap website.

Additional tips for data management are available in the PDF document, “Guidance for Database Developers for Efficient Import to Statistical Software.” 
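
As an illustration of how a data dictionary supports consistent entries, the following minimal Python sketch checks a comma-delimited file against a small dictionary of acceptable values. The variable names, value ranges, and rule layout are hypothetical and chosen only for this example; they do not reflect the REDCap data dictionary format or any CTSC requirement.

```python
import csv
import io

# Hypothetical data dictionary: each variable name maps to its acceptable values.
DATA_DICTIONARY = {
    "subject_id": {"type": "int", "min": 1, "max": 9999},
    "sex": {"type": "category", "values": {"F", "M"}},
    "age_years": {"type": "float", "min": 0, "max": 120},
}


def check_value(rule, raw):
    """Return None if the raw entry satisfies the rule, else a short description."""
    text = raw.strip()
    if text == "":
        return "missing value"
    if rule["type"] == "category":
        return None if text in rule["values"] else f"unexpected value {text!r}"
    try:
        number = int(text) if rule["type"] == "int" else float(text)
    except ValueError:
        return f"not a valid {rule['type']}: {text!r}"
    if not rule["min"] <= number <= rule["max"]:
        return f"out of range: {text}"
    return None


def validate_csv(handle, dictionary):
    """Compare the header and every entry of a comma-delimited file to the dictionary."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = []
    # Header checks are reported against line 1 of the file.
    for name in dictionary:
        if name not in header:
            problems.append((1, name, "column missing from file"))
    for name in header:
        if name not in dictionary:
            problems.append((1, name, "column not in data dictionary"))
    for row in reader:
        for name, rule in dictionary.items():
            if name not in header:
                continue
            # Short rows yield None for absent fields; treat them as missing.
            problem = check_value(rule, row[name] or "")
            if problem is not None:
                problems.append((reader.line_num, name, problem))
    return problems


# Small in-memory example with one inconsistent entry and one missing entry.
sample = "subject_id,sex,age_years\n1,F,34.5\n2,male,41\n3,M,\n"
problems = validate_csv(io.StringIO(sample), DATA_DICTIONARY)
for line_number, column, message in problems:
    print(f"line {line_number}, {column}: {message}")
```

Run on the sample above, the sketch reports the entry "male" (which does not match the coded values F and M) and the blank age. Checks of this kind are easiest to apply before the file is sent for analysis.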

Statistical Analysis Software

  • Interactive Statistical Calculation Pages
    Comprehensive list of sites for many statistical analyses, including power and sample size calculations. The site has separate pages listing websites for interactive analyses (“Interactive Stats”) and free software packages (“Free Software”) that can be downloaded and run on your local computer. It also links to many technical resources on statistics, including general introductory material.
  • Real Statistics Using Excel
    Detailed information about performing common statistical tests and procedures in Excel, including t-tests, ANOVA, repeated-measures ANOVA, correlation, simple and multiple linear regression, confidence intervals, and other descriptive statistics. A free downloadable resource pack and example workbooks are available.
  • Minitab
    Minitab is a commercial, easy-to-use statistical package with a drop-down menu interface. You can download a 30-day trial version for free.

Note: SPSS, SAS, and JMP can be obtained at a reasonable cost through UC Davis Information and Educational Technology.

Power/Sample Size Calculations


  • ANOVA
  • Correlation
  • Logistic Regression
  • Paired Sample t-test, Method 1
  • Paired Sample t-test, Method 2
  • Two Sample Proportion Test
  • Two Sample Survival Test
  • Two Sample t-test
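
To give a sense of what a two-sample calculation involves, here is a minimal Python sketch of the textbook normal-approximation formula for the number of subjects per group when comparing two means. It is not the method used by the calculators listed above; the function name and the example inputs (a difference of 5 units with a standard deviation of 10) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group_two_sample(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / (delta / sd))^2,
    assuming equal group sizes and a common standard deviation.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    effect_size = delta / sd  # standardized difference between group means
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical inputs: detect a 5-unit difference when the standard deviation is 10.
print(n_per_group_two_sample(delta=5, sd=10))              # 63
print(n_per_group_two_sample(delta=5, sd=10, power=0.90))  # 85
```

Because the formula relies on the normal distribution rather than the t distribution, it tends to give a result one or two subjects per group smaller than an exact t-based calculation, so it is best treated as a rough check rather than a final answer.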

Sample Size Calculators

Educational Resources

  • Introduction to Clinical Research for Residents
    This online course consists of readings compiled by UC Davis and CTSC biostatisticians.
  • The Little Handbook of Statistical Practice
    Provides relevant overviews of common statistical analyses, with applied examples and interesting discussion of topics in applied data analysis.
  • UCLA’s Institute for Digital Research and Education
    A wealth of information on conducting statistical analyses using SAS, R, SPSS, Stata, and Mplus is available from this site. Each example explains a motivating data set, provides code to analyze the data in one of the statistical packages, and reviews and interprets the output.
  • Clinical Research Case Studies
    CTSPedia entries on selected clinical research topics, including step-by-step tutorials on common sample size calculations; handling outliers; dealing with selection bias in observational studies; and others.
  • Biostatistics for Non-statisticians
    Online video series from the University of Colorado CTSI.
  • Ohio State University CCTS
    Papers on topics in study design and planning, power and sample size, and statistical analysis.
  • Columbia University Irving Institute
    List of references for study design and biostatistics for clinical trials.
  • Medical College of Wisconsin
    YouTube seminar series covering longitudinal analysis, survival analysis, propensity scores, Bayesian statistics, linear regression, sample size calculations, ANOVA, multiple comparisons, and logistic regression, among others.