
SPECIAL ARTICLE 

Year : 2008  Volume : 52  Issue : 6  Page : 788


Basic Statistical Concepts for Sample Size Estimation
Vithal K Dhulkhed^{1}, MG Dhorigol^{2}, Rajesh Mane^{3}, Vandana Gogate^{3}, Pavan Dhulkhed^{4}
^{1} Professor and Head, KLE University, J.N. Medical College and KLES Dr. Prabhakar Kore Hospital and Medical Research Centre, Belgaum - 590010, Karnataka State, India
^{2} Professor, KLE University, J.N. Medical College and KLES Dr. Prabhakar Kore Hospital and Medical Research Centre, Belgaum - 590010, Karnataka State, India
^{3} Associate Professor, KLE University, J.N. Medical College and KLES Dr. Prabhakar Kore Hospital and Medical Research Centre, Belgaum - 590010, Karnataka State, India
^{4} P.G. Student, KLE University, J.N. Medical College and KLES Dr. Prabhakar Kore Hospital and Medical Research Centre, Belgaum - 590010, Karnataka State, India
Date of Acceptance: 10-Sep-2008
Date of Web Publication: 19-Mar-2010
Correspondence Address: Vithal K Dhulkhed, Plot No. 7229, Sector No.10, Mal Maruti Extension, Belgaum 590016, Karnataka, India
Source of Support: None, Conflict of Interest: None
For grant proposals the investigator has to include an estimate of the sample size. The sample should be large enough to provide sufficient data to reliably answer the research question addressed by the study. The investigator should involve the statistician at the very planning stage of the study. To have a meaningful dialogue with the statistician, every research worker should be familiar with the basic concepts of statistics. This paper is concerned with the simple principles of sample size calculation. Concepts are explained on the basis of logic rather than rigorous mathematical calculation to help the reader assimilate the fundamentals.
Keywords: Sample size calculation, Power and sample size, Clinical trial
How to cite this article: Dhulkhed VK, Dhorigol M G, Mane R, Gogate V, Dhulkhed P. Basic Statistical Concepts for Sample Size Estimation. Indian J Anaesth 2008;52:788 
How to cite this URL: Dhulkhed VK, Dhorigol M G, Mane R, Gogate V, Dhulkhed P. Basic Statistical Concepts for Sample Size Estimation. Indian J Anaesth [serial online] 2008 [cited 2020 Jan 29];52:788. Available from: http://www.ijaweb.org/text.asp?2008/52/6/788/60689 
Introduction   
While planning a research project the question arises as to how many subjects or patients should be included in the study. This estimate is needed for the grant application. Members of the research committee need evidence that a sufficient number of subjects is included for there to be a reasonable chance of getting a clear answer to the research question addressed by the project.
Studies are planned and designed in different ways depending upon the research question. Why should we know the basics of sample size calculation? To understand the contents of a scientific paper or article we need to be familiar with the fundamental concepts of medical statistics. While designing a study we need to interact with a statistician, and understanding the basic concepts helps the anaesthesiologist to do so in a more meaningful way. This paper is meant to provide a basic understanding of sample size calculation, primarily for those with little experience of the subject.
Errors of judgment in a study   
Consider, for example, a man who is accused of committing a crime and is being tried in a court of law. The judge needs adequate evidence to decide reliably whether the person is a criminal or not. Initially he assumes that the individual being tried has not committed any crime and is innocent unless proved otherwise. This is the null hypothesis. The trial starts. The defence lawyer and the prosecuting lawyer bring forth evidence for and against this assumption. Inadequate evidence may lead to a wrong judgment. An innocent man might be judged a criminal. This error of judgment (false positive) is called type I error or alpha error. Obviously the probability of committing this error should be quite small. One may reason that it should be less than 5%, or perhaps still less. Conversely, a criminal might be judged innocent and wrongly acquitted. The probability of this error (false negative, type II error or beta error) should also be small (you may want it to be less than 20 out of 100, or 0.20). The 80% probability of not committing this error is the power of judgment. In other words, if in reality an individual has committed a crime, he should be judged a criminal (true positive) with a probability, or power, of 80%. Sufficient weight of evidence helps in delivering the correct judgment most of the time (it increases the power of judgment) and reduces the possibility of committing errors of judgment.
A clinical trial is similar in the sense that, to answer a clinical question reliably, it needs adequate evidence in the form of a proper sample size. What is a clinical trial? A clinical trial is a well designed experiment involving human participants. It proposes to answer a predefined set of clinical or research questions regarding an intervention. E.g. what is the mean systolic blood pressure of an adult population? While designing a study to answer this question, it is important to consider the number of participants to be included. Too many participants are a needless waste of resources and time. Too few participants (insufficient evidence) will not produce a precise and definitive answer (the probability of error will be higher).
What parameters are needed to calculate the sample size?   
Imagine a scenario in which a lady goes alone to a readymade garment shop and asks for trousers for her child. To select the proper size the shopkeeper asks for parameters such as the age and height of the child, since the child has not accompanied her. The lady may give him the parameters, or she may point towards a child in the shop who fits the description, and this helps the shopkeeper to select the proper size. In a similar way, various parameters are needed to calculate the sample size, depending upon the design of the study.
Consider the example stated earlier. What is the mean systolic blood pressure of an adult population? While planning the study, we have to define the population to which the results of the study will be applied. This is achieved by defining the inclusion and exclusion criteria. We may like to include patients in the age group of 40 to 60 years with ASA grade I or II status. It is impossible to include all the patients fulfilling these criteria because the number is too large. Hence, we select a sample. We measure the systolic blood pressure in each individual and calculate the average for the sample. Does this average equal the population average? It will differ to some extent. This disparity is the sampling error, and it can be explained by the concept of standard error (SE). We collect the data from the sample. The average or mean of the data is our parameter of interest in this example. The data points are dispersed around this mean. This dispersion, or deviation from the mean, is measured in terms of the standard deviation (SD) of the sample. The SD divided by the square root of the sample size (N) is the standard error of the mean. It is obvious that as the sample size increases this error decreases. The error should be small enough for us to be precise in predicting the population mean. The error is measured by the following formula.
SE = SD / √N
Suppose the calculated parameters for the sample are as follows: mean SBP = 120 mm Hg, standard deviation (SD) = 15 mm Hg and standard error (SE) = 3 mm Hg. What is our prediction for the population mean? Obviously it will not be exactly equal to the sample mean but it will be near to it. It will lie in a range of values around 120 mm Hg depending upon the SE.
We can say that the population mean will fall within the range of 120 ± 1.96 times the SE (3 mm Hg) with about 95% probability. Only by chance can it fall outside this range, with a probability of about 5% (alpha error). In other words we are sure 95 times out of 100 that the population mean lies between 114 and 126 approximately. Here 1.96 is the number of standard errors on each side of the mean. This interval (120 ± 1.96 x 3 mm Hg, i.e. 114 to 126 approx.) is called the 95% confidence interval (CI). The value 1.96, corresponding to a 95% CI or an alpha error of 5% (0.05), is Z alpha. If we want to be more confident that the interval contains the population mean, e.g. a 99% CI or an alpha error of less than 1% (p < 0.01), we use the larger Z alpha value corresponding to it, and the interval becomes wider for the same sample size. The Z values are shown in [Table 1].
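The interval arithmetic in this paragraph can be checked with a short script. The following is a minimal sketch in Python using only the standard library. The function names `z_alpha` and `confidence_interval` are hypothetical and not part of the original article, the normal approximation is assumed as in the text, and the sample size of 25 is inferred from SD = 15 mm Hg and SE = 3 mm Hg.

```python
from math import sqrt
from statistics import NormalDist


def z_alpha(confidence):
    """Two-sided Z value for a confidence level, e.g. 0.95 gives about 1.96."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) limits of the CI for a mean (normal approximation)."""
    se = sd / sqrt(n)                  # standard error of the mean
    margin = z_alpha(confidence) * se  # Z alpha standard errors on each side
    return mean - margin, mean + margin


# Values from the text: mean 120 mm Hg, SD 15 mm Hg.
# n = 25 is an inference: SE = SD / sqrt(n) = 3 mm Hg only when n = 25.
low, high = confidence_interval(120, 15, 25)
print(f"95% CI: {low:.1f} to {high:.1f} mm Hg")

# Z alpha values for some commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z alpha = {z_alpha(level):.3f}")
```

With the values from the text the interval comes out at roughly 114.1 to 125.9 mm Hg, which the article rounds to 114 to 126.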
Another investigator wants to measure the systolic blood pressure in a similar population in his town. How many individuals does he have to recruit for the study? As explained earlier, he needs certain parameters for the calculation. He refers to the study conducted earlier. Before starting the study he assumes that, for similar individuals in his town, the mean systolic blood pressure should be around 120 mm Hg (with SD = 15 mm Hg), not far from this value but within an error of 6 mm Hg (d) on either side of 120 mm Hg (114 to 126 approx.). To restrict the probability of getting a value outside this range just by chance to less than 5%, in other words to be 95% confident of being within this range, he has to choose a Z alpha of 1.96 as detailed above. Substituting these values in the formula below gives the sample size.
The parameters for sample size calculation are: error value d = 6 mm Hg, SD = 15 mm Hg, Z alpha = 1.96.
n = (Z alpha x SD / d)^{2} = (1.96 x 15 / 6)^{2} = 24.01, which is rounded up to 25.
He decides to recruit 25 individuals in his study.
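The calculation for this example can be sketched in a few lines of Python. This is an illustration under the usual normal-approximation formula for estimating a mean, n = (Z alpha x SD / d)^{2}. The function name `sample_size_for_mean` is hypothetical and not taken from the article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sd, d, confidence=0.95):
    """Subjects needed to estimate a mean to within +/- d (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # Z alpha
    n = (z * sd / d) ** 2
    return ceil(n)  # always round up to a whole subject


# Values from the text: SD = 15 mm Hg, error value d = 6 mm Hg, 95% confidence
n = sample_size_for_mean(sd=15, d=6)
print(n)
```

With these inputs the sketch returns 25, the number the investigator decides to recruit. Halving the error value d roughly quadruples the required sample size, because d enters the formula as a square.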
Similarly depending upon the design of the study certain specific parameters are needed to calculate the size of the sample. The best approach towards understanding those concepts is through practical examples or scenarios. Explanations for unfamiliar statistical terms will be given as and when they appear.
Example of an observational study when the parameter of interest is a proportion:   
The research question is "What is the present incidence of postoperative shivering?" The newly appointed chief of anaesthesiology finds out that the previous incidence of postoperative shivering was 60%. How many patients should be recruited for a study to answer this question?
Explanation:   
At the outset he assumes that the new incidence might be within 10% (error value d = 10% of 60% = 6%) on either side of 60% (p_{1}), i.e. 54% to 66%. He wants to be more than 95% sure of the value being within this range if that is the reality. The probability of alpha error (the value falling outside the range) should therefore be less than 5%, i.e. 0.05 (p < 0.05). For this alpha value the corresponding Z alpha value is 1.96 (the number of SE on each side of 60%). The sample size is calculated as n = (Z alpha)^{2} x p x q / d^{2} = (1.96)^{2} x 60 x 40 / 6^{2} = 256.1, which is rounded up to 257.
The parameters for sample size calculation are:
p = 60%; q = 100 − p = 40%; error d = 6%; Z alpha = 1.96.
He decides to recruit 257 individuals in his study.
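The same arithmetic can be sketched in Python for a single proportion. This is a minimal illustration assuming the standard normal-approximation formula n = (Z alpha)^{2} x p x q / d^{2}, with proportions written as fractions rather than percentages. The function name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, d, confidence=0.95):
    """Subjects needed to estimate a proportion p to within +/- d."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # Z alpha
    q = 1.0 - p
    return ceil(z ** 2 * p * q / d ** 2)


# Values from the text: previous incidence 60%, error value 6%, 95% confidence
n = sample_size_for_proportion(p=0.60, d=0.06)
print(n)
```

With these inputs the sketch returns 257, in agreement with the number stated in the text.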
Experimental design:   
When the primary comparison is a mean   
Clinical trial example
The research question is "Does a new drug work in preventing the pressor response to laryngoscopy when compared to placebo?" How many patients would need to be recruited into a trial to study the difference in mean systolic blood pressure between an intervention group who receive a new antihypertensive drug (Group D) and a control group who do not receive the intervention (Group P), with 90% power and a significance level of 5%?
Step 1 The investigator decides to conduct the clinical trial in patients fulfilling the following inclusion criteria.
Age group between 20 to 60 years.
ASA grade I / II status and undergoing upper or lower abdominal surgery under general anaesthesia.
This defines the population to which the results and conclusions are to be applied. The population of patients fulfilling the criteria can be huge. Hence a good representative sample should be selected.
Step 2 He carries out a literature search to study similar work conducted in the past. The search reveals that there is an average rise of 20 mm Hg in systolic blood pressure (with an SD of 15 mm Hg) at the time of laryngoscopy. If the drug were effective it should prevent this rise. This difference of 20 mm Hg is called the clinically important effect size. In this trial the plan is to record systolic blood pressure (SBP) before and after laryngoscopy in each patient in the two groups to study the pressor response (here the change in SBP) and to note the average (or mean) increase in each group, i.e. µ_{1} in group D and µ_{2} in group P.
Step 3 The study will yield one of two results depending upon the efficacy of the drug. If the drug is not effective, the average rise in blood pressure in the two groups (µ_{1} and µ_{2} respectively) is the same, i.e. nearly 20 mm Hg. Hence µ_{1} − µ_{2} = nearly 0. The investigator assumes or hypothesizes at the beginning that the drug is not effective, meaning that there is no difference between the two means. This assumption of no difference between the two groups unless proved otherwise is called the null hypothesis. When the drug is truly not effective, the results of the study should naturally show that the drug is not effective (a true negative result), which is consistent with the null hypothesis. However there is always a chance of getting the wrong result, i.e. the study may wrongly show the ineffective drug as effective just by chance. The investigator wants the probability of getting this wrong result (false positive result, α value or type I error) merely by chance, even though the drug is not effective, to be less than 5%, i.e. 0.05. This can be achieved by collecting adequate evidence in the form of data with a proper sample size.
Further, in this case the study result may not show exactly zero difference between the two groups. Hence the question arises: up to what critical value from zero difference should one say that the drug is not effective? As explained earlier, this is from zero up to 1.96 (the Z alpha value) times the SE. If the result of the study is within this critical value (the cut-off corresponding to the significance level), the null hypothesis that the drug is not effective cannot be rejected. If it exceeds the critical value, the difference is significantly nearer to 20 mm Hg (meaning there is a significantly smaller increase in group D), the difference between the two groups is significant, and the inference is that the drug is effective. When the difference is beyond the critical value, can the investigator be 100% sure that the drug is effective? No. There is a possibility of less than 5% (p < 0.05) that the result is a false positive by mere chance. Despite allowing for this small error, it will be concluded that the drug is effective in preventing the rise in SBP when the study result exceeds the critical value.
Secondly, if the drug is effective, group D does not show a rise in blood pressure, whereas in the placebo group a 20 mm Hg increase in SBP is expected. Here the difference between the mean rises in systolic blood pressure, µ_{2} − µ_{1}, is nearly 20 mm Hg. If the drug is truly effective, the investigator wants to get this result (a true positive result, with the study result beyond the critical value mentioned above) with at least 90% probability. This is called the power of the study. In other words, the probability of getting the wrong result (false negative result, β value or type II error) merely by chance when the drug is truly effective should be less than 10%, i.e. 0.10. The investigator may not get exactly 20 mm Hg difference but a value near to it when the drug is effective. This is from 20 mm Hg down to 1.282 times the SE below 20 (but exceeding the critical value). Here 1.282 is called Z beta, corresponding to a false negative result, β value or type II error, of 0.10. The logic is similar to what has been explained above. Again this can be achieved by collecting adequate evidence in the form of data with a proper sample size. This is illustrated graphically in the diagram.
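The relationship between the critical value, the power and the amount of evidence can be made concrete with a small sketch. The Python below is illustrative only. It assumes equal group sizes, a known common SD, the normal approximation, and a standard error of the difference between two means of SD x √(2/n). The function name `critical_value_and_power` and the group sizes tried are hypothetical choices, not taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def critical_value_and_power(delta, sd, n_per_group, alpha=0.05):
    """Critical difference and power for comparing two means (normal approx.)."""
    std_normal = NormalDist()
    se_diff = sd * sqrt(2.0 / n_per_group)       # SE of the difference in means
    z_a = std_normal.inv_cdf(1.0 - alpha / 2.0)  # Z alpha, 1.96 for alpha = 0.05
    critical = z_a * se_diff                     # differences beyond this are "significant"
    # Probability that the observed difference exceeds the critical value when
    # the true difference is delta (the negligible chance of a significant
    # result in the wrong direction is ignored).
    power = std_normal.cdf(delta / se_diff - z_a)
    return critical, power


# Effect size and SD from the text; the group sizes are illustrative choices.
for n in (6, 12, 24):
    critical, power = critical_value_and_power(delta=20, sd=15, n_per_group=n)
    print(f"n = {n:2d} per group: critical value = {critical:4.1f} mm Hg, power = {power:.2f}")
```

Under these assumptions a group size of 12 gives a critical value of about 12 mm Hg and a power of about 0.90. Smaller groups widen the critical value and lower the power, while larger groups do the opposite.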
Here the parameters for sample size calculation are:
Effect size of clinical interest, i.e. difference in the means, d = 20 mm Hg; SD = 15 mm Hg; Z alpha = 1.96 (corresponding to Type I error of 5%, i.e. 0.05); Z beta = 1.282 (corresponding to power of 90%). Refer to the table for the Z values.
n per group = 2 x (Z alpha + Z beta)^{2} x SD^{2} / d^{2} = 2 x (1.96 + 1.282)^{2} x 15^{2} / 20^{2} = 11.8, which is rounded up to 12.
He decides to recruit 12 individuals in each group in the study.
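The group size in this example can be reproduced with a short Python sketch. It assumes the common normal-approximation formula for comparing two means with equal group sizes, n per group = 2 x (Z alpha + Z beta)^{2} x SD^{2} / d^{2}, with a two-sided Z alpha and a one-sided Z beta. The function name `n_per_group_means` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.90):
    """Subjects per group needed to detect a difference delta between two means."""
    std_normal = NormalDist()
    z_a = std_normal.inv_cdf(1.0 - alpha / 2.0)  # 1.96 for alpha = 0.05
    z_b = std_normal.inv_cdf(power)              # 1.282 for power = 0.90
    n = 2.0 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)


# Values from the text: d = 20 mm Hg, SD = 15 mm Hg, alpha = 0.05, power = 90%
n = n_per_group_means(delta=20, sd=15)
print(n)
```

With the parameters listed in the text the sketch returns 12 per group. Because it relies on the normal approximation, a calculation based on the t distribution would be expected to ask for slightly more patients in groups this small.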
When the primary comparison is a proportion:   
Clinical trial example:
The research question is whether clonidine is effective in the treatment of postoperative shivering. The newly appointed chief of anaesthesiology finds out that the incidence of postoperative shivering is 60%. He wants to find out the effectiveness of the drug clonidine in reducing this incidence. A reduction of 40 percentage points (i.e. from 60% to 20%) in the incidence of shivering will be clinically beneficial. How many patients should be recruited for the study with an alpha value, i.e. Type I error, of 5% (0.05) and a power of 95%?
The parameters for sample size calculation are: p_{1} = 60%, p_{2} = 20%, d = difference between the two proportions or effect size = 40%. Here we have to calculate the average of p_{1} and p_{2}, i.e. p = (p_{1} + p_{2}) / 2 = 40%.
q = 1 − p = 60%, Z alpha = 1.96, Z beta = 1.645 (corresponding to power of 95%, by the same convention as Z beta = 1.282 for 90% power).
The principles involved are similar to those in the earlier example.
He decides to recruit 40 individuals in each group in the study.
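A sketch of this calculation in Python follows. The article's own formula is not reproduced in the text, so this illustration assumes the simple pooled-proportion formula n per group = 2 x (Z alpha + Z beta)^{2} x p x q / d^{2}, where p is the average of p_{1} and p_{2}. It also assumes a one-sided Z beta of about 1.645 for 95% power, the same convention that gives 1.282 for 90% power. The function name `n_per_group_proportions` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.95):
    """Subjects per group to detect p1 versus p2 (pooled normal approximation)."""
    std_normal = NormalDist()
    z_a = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided Z alpha
    z_b = std_normal.inv_cdf(power)              # one-sided Z beta (assumption)
    p_bar = (p1 + p2) / 2.0                      # average of the two proportions
    q_bar = 1.0 - p_bar
    d = abs(p1 - p2)                             # effect size
    return ceil(2.0 * (z_a + z_b) ** 2 * p_bar * q_bar / d ** 2)


# Values from the text: incidence 60% versus 20%, alpha = 0.05, power = 95%
n = n_per_group_proportions(p1=0.60, p2=0.20)
print(n)
```

Under these assumptions the sketch gives 39 per group, close to the 40 per group chosen in the text. Other published formulas for two proportions give somewhat different numbers, so the result is best read as an approximate check rather than an exact reproduction.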
Points to remember:   
For calculating sample size different parameters are needed depending upon the study design
 Prevalence, incidence rate, sensitivity, specificity, means, variance, SD, Odds Ratio (OR), Correlation coefficient (r) etc.
 Effect Size
 Error value
These are determined or estimated by studying in depth the problem at hand, literature search, dialogue with experts in the area of interest or by conducting a pilot study with limited number of subjects
Sample size increases:
 When treatment effect is small e.g. when comparing two drugs rather than a drug versus placebo
 With higher power
 With a lower significance level, e.g. 1% (or p=0.01) rather than the typical 5%; the lower the level, the less likely you are to get a chance (but spurious) treatment effect, i.e. a false-positive result
 When measurements are highly variable - natural variability can be thought of as 'noise' that makes the 'signal' more difficult to hear, e.g. measurements such as peak flow rate are highly variable - a simple way to alleviate this problem is to take a few repeated measurements and use the average (or the maximum).^{ [9] }
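The four influences listed above can be illustrated numerically. The Python sketch below starts from the two-group comparison of means used earlier in the article (d = 20 mm Hg, SD = 15 mm Hg, significance level 5%, power 90%) and changes one parameter at a time. The formula is the normal-approximation formula assumed for that example, and the alternative parameter values and the function name `n_per_group_means` are hypothetical choices made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.90):
    """Subjects per group for comparing two means (normal approximation)."""
    std_normal = NormalDist()
    z_a = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_b = std_normal.inv_cdf(power)
    return ceil(2.0 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)


base = dict(delta=20, sd=15, alpha=0.05, power=0.90)

# Each scenario changes exactly one parameter of the baseline.
scenarios = {
    "baseline": {},
    "smaller treatment effect (d = 10)": {"delta": 10},
    "higher power (95%)": {"power": 0.95},
    "lower significance level (1%)": {"alpha": 0.01},
    "more variable measurements (SD = 30)": {"sd": 30},
}

results = {}
for label, change in scenarios.items():
    results[label] = n_per_group_means(**{**base, **change})
    print(f"{label}: {results[label]} per group")
```

In every scenario the required group size is larger than the baseline, and halving the effect size or doubling the SD increases it about fourfold because both enter the formula as squares.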
References   
1.  Meinert CL. Clinical Trials: Design, Conduct, and Analysis. New York: Oxford University Press; 1986.
2.  Florey CDV. Sample size for beginners. BMJ 1993; 306:1181-1184.
3.  Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994; 272:122-124.
4.  Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine 1994; 121:200-206.
5.  Day SJ, Graham DF. Sample size estimation for comparing two or more treatment groups in clinical trials. Statistics in Medicine 1991; 10:33-43.
6.  Lemeshow S, Hosmer DW, Klar J. Sample size requirements for studies estimating odds ratios or relative risks. Statistics in Medicine 1988; 7:759-764.
7.  Greenland S. On sample size and power calculations for studies using confidence intervals. American Journal of Epidemiology 1988; 128:231-237.
8.  Everitt BS. Medical Statistics from A to Z. Cambridge: Cambridge University Press; 2003.
9.  Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research. 4th ed. Blackwell Publishing; 2001.
[Table 1]
