REVIEW ARTICLE
Year: 2016 | Volume: 60 | Issue: 9 | Page: 652-656
Sample size calculation: Basic principles
Sabyasachi Das^{1}, Koel Mitra^{1}, Mohanchandra Mandal^{2}
^{1} Department of Anaesthesiology and Critical Care, Medical College, Kolkata, West Bengal, India
^{2} Department of Anaesthesiology and Critical Care, North Bengal Medical College, Sushrutanagar, Darjeeling, West Bengal, India
Correspondence Address:
Addressing a sample size is a practical issue that has to be solved during the planning and designing stage of the study. The aim of any clinical research is to detect the actual difference between two groups (power) and to provide an estimate of the difference with reasonable accuracy (precision). Hence, researchers should estimate the sample size a priori, well before conducting the study. Post hoc sample size computation is conventionally not encouraged. An adequate sample size minimizes random error or, in other words, reduces the role of chance in the findings. Too small a sample may fail to answer the research question, can be of questionable validity or may provide an imprecise answer, while too large a sample may answer the question but is resource-intensive and may also be unethical. More transparency in the calculation of sample size is required so that it can be justified and replicated while reporting.
Introduction
Besides scientific justification and validity, the calculation of a sample size that is 'just large enough' helps a medical researcher to assess the cost, time and feasibility of the project. [1] Although sample size calculations are frequently reported in anaesthesia journals, the details or elements of the calculation are not consistently provided by the authors. In many studies, the reported sample size does not match the figure obtained when the calculation is replicated. [2] Most trials with negative results do not have a large enough sample size. Hence, reporting of statistical power and sample size needs to be improved. [3],[4]
There is a belief that studies with a small sample size are unethical if they do not ensure adequate power. However, for a study to be ethical in its design, its predicted value must outweigh the projected risks to its participants. Where the risks and inconvenience borne by the participants outweigh the benefits they receive from participation, the excess is the projected burden. A study may still be valid if the projected benefit to society outweighs the burden to society. If there is no burden, then any sample size may be ideal. [5]
Many different approaches to sample size determination exist, depending on the study design and research question. Moreover, each study design can have multiple subdesigns, resulting in different sample size calculations. [6] Addressing a sample size is a practical issue that has to be solved during the planning and designing stage of the study. It may be an important issue in the approval or rejection of clinical trial results irrespective of the efficacy. [7]
By the end of this article, the reader will be able to enumerate the prerequisites for sample size estimation, describe the common lapses in sample size calculation and explain the importance of a priori sample size estimation. The reader will also be able to define the common terminology related to sample size calculation.
Importance of Pilot Study in Sample Size Estimation
In published literature, relevant data for calculating the sample size can be gleaned from prevalence estimates or event rates, the standard deviation (SD) of the continuous outcome, and the sample sizes of similar studies with similar outcomes. An approximate idea of the 'effect' estimate can be obtained by reviewing meta-analyses and considering what effect is clinically meaningful. A small pilot study, personal experience, expert opinion, an educated guess, hospital registers and unpublished reports can support the researcher when the available literature provides insufficient information.
A pilot study helps in the estimation of sample size, but its primary purpose is to check the feasibility of the study. The pilot study is a small-scale trial run as a pretest, and it serves as a tryout for the proposed major trial. It allows preliminary testing of the hypotheses and may suggest changing them, dropping some part or developing new hypotheses so that they can be tested more precisely. [8] It may address many logistic issues, such as checking that instructions are comprehensive and that the investigators are adequately skilled for the trial. The pilot study almost always provides enough data for the researcher to decide whether to go ahead with the main study or to abandon it. Many research ideas that seem to show great promise are unproductive when actually carried out. From the findings of the pilot study, the researcher may abandon a main study involving large logistic resources, and thus save a lot of time and money. [8]
Methods for Sample Size Calculation
Sample size can be calculated using either the confidence interval method or the hypothesis testing method. In the former, the main objective is to obtain narrow intervals with high reliability. In the latter, the hypothesis is concerned with testing whether the sample estimate is equal to some specific value.
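To illustrate the confidence interval method, the following is a minimal sketch for a single mean, assuming the standard normal-approximation formula n = (z × SD / margin)², where the margin is the half-width of the interval one is willing to accept. The function name and the input values (SD of 20 mmHg, margin of 5 mmHg) are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist


def n_for_mean_ci(sd, margin, confidence=0.95):
    """Sample size so that a confidence interval for a mean has the
    requested half-width (margin), using the normal approximation."""
    # Two-sided quantile of the standard normal, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional subject cannot be recruited
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical inputs: SD of 20 mmHg, interval of +/- 5 mmHg
print(n_for_mean_ci(sd=20, margin=5))  # 62
```

Halving the margin roughly quadruples the required sample size, which is why narrow intervals are costly.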
Null hypothesis
This hypothesis states that there is no difference between the control and the study group in a randomized controlled trial (RCT). Rejecting or disproving the null hypothesis, and thus concluding that there are grounds for believing that there is a difference between the two groups, is a central task in the modern practice of science, and gives a precise criterion for rejecting a hypothesis. [9],[10]
Alternative hypothesis
This hypothesis is contradictory to the null hypothesis, i.e., it assumes that there is a difference among the groups, or that there is some association between the predictor and the outcome [Figure 1]. [9],[10] Sometimes, it is accepted by exclusion if the test of significance rejects the null hypothesis. It may be one-sided (specifies the difference in one direction only) or two-sided (specifies the difference in both directions).{Figure 1}
Type I error (α error) occurs if the null hypothesis is rejected when it is true. It represents the chance that the researcher detects a difference between two groups when in reality no difference exists. In other words, it is the chance of a false-positive conclusion. A value of 0.05 is most commonly used.
Type II error (β error) is the chance of a false-negative result: the researcher does not detect the difference between the two groups when in reality the difference exists. Conventionally, it is set at a level of 0.20, which translates into a 20% chance of a false-negative conclusion. Power is the complement of beta, i.e., (1 − β). In other words, power is 0.80 or 80% when beta is set at 0.20. The power represents the chance of avoiding a false-negative conclusion, or the chance of detecting an effect if it really exists. [11]
Types of Trials
Parallel arm RCTs are most commonly used; that is, all participants are randomized to two or more arms of different interventions that are treated concurrently.
Various types of parallel RCTs are used in accordance with the need:
Superiority trials verify whether a new approach is more effective than the conventional one from a statistical or clinical point of view. Here, the corresponding null hypothesis is that the new approach is not more effective than the conventional approach.
Equivalence trials are designed to ascertain that the new approach and the standard approach are equally effective. The corresponding null hypothesis states that the difference between both approaches is clinically relevant.
Noninferiority trials are designed to ascertain that the new approach is equal, if not superior, to the conventional approach. The corresponding null hypothesis is that the new approach is inferior to the conventional one.
Prerequisites for Sample Size Estimation
At the outset, the primary objectives (descriptive/analytical) and the primary outcome measure (mean/proportion/rates) should be defined. Often there is a primary research question that the researcher wants to investigate. It is important to choose a primary outcome and lock it for the study. The minimum difference that the investigator wants to detect between the groups is the effect size for the sample size calculation. [7] Hence, if the researcher changes the planned outcome after the start of the study, the reported P value and inference become invalid. [11]
The acceptable levels of Type I error (α) and Type II error (β) should also be determined. The Type I error rate (alpha) is customarily set lower than the Type II error rate (beta). The philosophy behind this is that the impact of a false-positive error (Type I) is more detrimental than that of a false-negative error (Type II), so it is guarded against more rigidly. Besides, the researcher needs to know the control arm mean/event rate/proportion, and the smallest clinically important effect that one is trying to detect.
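Once α and β are chosen, they enter the sample size formulae as standard normal quantiles, the 'conventional multipliers' a and b used in the worked examples of this article. The following is a minimal sketch of that conversion using the Python standard library; the function name and the `two_sided` flag are illustrative assumptions and do not appear in the article.

```python
from statistics import NormalDist


def multipliers(alpha=0.05, beta=0.20, two_sided=True):
    """Return (a, b): the standard normal quantiles for the chosen
    Type I error (alpha) and Type II error (beta)."""
    z = NormalDist()
    # A two-sided test splits alpha across both tails
    tail = alpha / 2 if two_sided else alpha
    a = z.inv_cdf(1 - tail)
    # Power is 1 - beta
    b = z.inv_cdf(1 - beta)
    return a, b


a, b = multipliers()
print(round(a, 3), round(b, 3))  # 1.96 0.842
```

With a one-sided alpha of 0.05, the first multiplier drops to about 1.645, which is one reason a one-sided hypothesis needs fewer subjects.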
The Relation Between Primary Objective and the Sample Size
A clearly defined type of primary outcome measure helps in computing the correct sample size, as there are definite ways to reach the sample size for each outcome measure. It needs special attention as it principally influences how convincingly the research question is answered. The type of primary outcome measure is also the basis for estimating the population variance. For a continuous variable (e.g., mean arterial pressure [MAP]), the population SD is incorporated in the formula, whereas for binomial variables (e.g., hypotension: yes/no) the SD needs to be worked out from the proportion of outcomes. In the literature, there can be several outcomes for each study design. It is the responsibility of the researcher to identify the primary outcome of the study. Mostly, the sample size is estimated based on the primary outcome. It is possible to estimate the sample size taking into consideration all outcome measures, both primary and secondary, at the cost of a much larger sample size.
Essential Components of Sample Size Estimation
The sample size for any study depends on certain factors such as the acceptable level of significance (P value), the power (1 − β) of the study, the expected 'clinically relevant' effect size, the underlying event rate in the population, etc. [7] Primarily, three factors govern an appropriate sample size: the P value (depends on α), the power (related to β) and the effect size (a clinically relevant assumption). [12],[13],[14]
The 'effect size' means the magnitude of the clinically relevant effect under the alternative hypothesis. It quantifies the difference in the outcomes between the study and control groups. It refers to the smallest difference that would be of clinical importance. Ideally, the selection of effect size should be based on clinical judgement. It varies with different clinical trials. The researcher has to determine this effect size with scientific knowledge and wisdom.
Available previous publications on related topics might be helpful in this regard. The 'minimal clinically important difference' is the smallest difference that would be worth testing. Sample size varies inversely with effect size.
The ideal study for a researcher is one where the power is high or, in other words, where the study has a high chance of reaching a conclusion with reasonable confidence, be it accepting or rejecting the null hypothesis. [9] A sample size matrix lists the sample sizes obtained for varying values of alpha, power (1 − β) and effect size. It is often more useful for the research team to choose from it the sample size that best fits the needs of the study [Table 1].{Table 2}
Formulae and Software
Once these three factors are fixed, there are many ways (formulae, nomograms, tables and software) of estimating the optimum sample size. At present, a good number of software packages are available on the internet. It is prudent to be familiar with the instructions of any software used to get the sample size of one arm of the study. Perhaps the most important step is to check that the most appropriate formula is used to get a correct sample size. Websites of some commonly used software packages are provided in [Table 2]. [2],[6]
There are no fewer than 100 formulae for calculating sample size and power, each answering a different study design and research question. It is wise to check for the appropriate formula even while using software. For RCTs, however, the number of formulae is limited. The choice essentially depends on the primary outcome measure, such as mean ± SD, rate or proportion. [6] Interested readers may access all relevant sample size estimation formulae using the relevant links.
Calculating the sample size by comparing two means
A study to see the effect of phenylephrine on MAP as a continuous variable after spinal anaesthesia to counteract hypotension.
MAP as continuous variable:
n = Sample size in each of the groups
μ1 = Population mean in treatment Group 1
μ2 = Population mean in treatment Group 2
μ1 − μ2 = The difference the investigator wishes to detect
σ² = Population variance (σ = SD)
a = Conventional multiplier for alpha = 0.05
b = Conventional multiplier for power = 0.80
$$n = \frac{2\,(a + b)^2\,\sigma^2}{(\mu_1 - \mu_2)^2}$$
Value of a = 1.96, b = 0.842 [Table 3]. Suppose a difference of 15 mmHg in MAP between the phenylephrine and the placebo group is considered clinically significant (μ1 − μ2 = 15) and is to be detected with 80% power at a significance level alpha of 0.05. [7] With the SD of 20 mmHg used in the calculation, n = 2 × (1.96 + 0.842)² × 20² / 15² = 27.9. That means 28 subjects per group is the sample size.{Table 3}
Calculating the sample size by comparing two proportions
A study to see the effect of phenylephrine on MAP as a binary variable after spinal anaesthesia to counteract hypotension.
MAP as a binary outcome, below or above 60 mmHg (hypotension: yes/no):
n = The sample size in each of the groups
p1 = Proportion of subjects with hypotension in treatment Group 1
q1 = Proportion of subjects without hypotension in treatment Group 1 (1 − p1)
p2 = Proportion of subjects with hypotension in treatment Group 2
q2 = Proportion of subjects without hypotension in treatment Group 2 (1 − p2)
x = The difference the investigator wishes to detect
a = Conventional multiplier for alpha = 0.05
b = Conventional multiplier for power = 0.80
$$n = \frac{(a + b)^2\,(p_1 q_1 + p_2 q_2)}{x^2}$$
Suppose a difference of 10% is considered clinically relevant and, from a recent publication, the proportion of subjects with hypotension is expected to be 20% in the treated group (p1 = 0.2) and 30% in the control group (p2 = 0.3), so that q1 and q2 are 0.80 and 0.70, respectively. [7] Assuming a power of 80% and an alpha of 0.05, i.e., 1.96 for a and 0.842 for b [Table 3], we get n = (1.96 + 0.842)² × (0.20 × 0.80 + 0.30 × 0.70) / 0.10² = 290.5. Thus, 291 subjects per group is the sample size.
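The two worked examples above can be reproduced with a short script. This is a minimal sketch, not the authors' code: the function names are illustrative, and the multipliers are computed as exact normal quantiles rather than taken from the rounded table values, so the unrounded results differ slightly from 27.9 and 290.5 while the rounded-up sample sizes are the same.

```python
import math
from statistics import NormalDist


def _a_plus_b_squared(alpha, power):
    """(a + b)^2 for a two-sided alpha and the requested power."""
    z = NormalDist()
    a = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    b = z.inv_cdf(power)          # about 0.842 for power = 0.80
    return (a + b) ** 2


def n_two_means(sd, diff, alpha=0.05, power=0.80):
    """Per-group n for comparing two means: 2 (a+b)^2 sd^2 / diff^2."""
    return 2 * _a_plus_b_squared(alpha, power) * sd ** 2 / diff ** 2


def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for two proportions: (a+b)^2 (p1 q1 + p2 q2) / x^2."""
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)
    return _a_plus_b_squared(alpha, power) * variance_term / (p1 - p2) ** 2


# MAP as a continuous outcome: SD 20 mmHg, difference 15 mmHg
print(math.ceil(n_two_means(sd=20, diff=15)))         # 28 per group
# Hypotension as a binary outcome: 20% versus 30%
print(math.ceil(n_two_proportions(p1=0.2, p2=0.3)))   # 291 per group
```

Rounding up with `math.ceil` is deliberate: rounding to the nearest integer could leave the study marginally underpowered.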
To keep the required sample size manageable, the researcher may adopt measures such as using a continuous variable as the primary outcome, measuring the outcome precisely or choosing outcomes that can be measured properly. Using a more common outcome or framing a one-sided hypothesis may also help. Published literature and pilot studies are the basis of sample size calculation. At times, expert opinion, personal experience with event rates and an educated guess become helpful. Variance, effect size or event rates may be misjudged when the sample size is calculated at the design phase. If the investigator realizes that this has led to 'too small a sample size', recalculation can be attempted based on interim data. [15]
Summary
Sample size calculation can be guided by previous literature, pilot studies and past clinical experience. The collaborative effort of the researcher and the statistician is required at this stage. The estimated sample size is not an absolute truth, but our best guess. Issues such as anticipated loss to follow-up, large subgroup analyses and complicated study designs demand a larger sample size to ensure adequate power throughout the trial. The sample size is proportional to the variance (square of the SD) and inversely proportional to the square of the difference to be detected.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References


