What is statistical power?
Power is traditionally defined as the probability of rejecting the null hypothesis when the null hypothesis is false and the alternative hypothesis is true.
Suppose we have two populations whose parametric means are different.
- We take samples from the two populations and obtain sample means and variances.
- We perform a statistical test to determine whether the means are significantly different.
- We repeat sampling and testing many times.
Essentially, the power of the test is the proportion of those tests that correctly indicate that the means of the two populations are significantly different.
Assuming that the statistic being tested has a known distribution (for example, normal), the power of the test can be obtained as follows:
- Imagine dA is the distribution of your test statistic (for example, Z) under the alternative hypothesis, HA.
- Then the power of your test is simply the proportion of dA that falls outside the lower and/or upper 'critical values' of your test - these being quantiles of d0, the distribution of the test statistic under the null hypothesis, H0.
This works perfectly well whatever the effect size of your treatment (which could be zero), but it assumes that your treatment effect is fixed (cannot vary) and that the only difference between d0 and dA is their location (in other words, the treatment effect, δ, is a simple shift).
- Of course, if d0 and dA differ in other ways, or their distributions are unknown (or cannot easily be calculated), the only way to find the power may be empirically - in other words, by simulation. In this case you repeatedly sample a defined population, apply your test to each sample, and find what proportion of the results are 'significant'.
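The simulation approach just described can be sketched in a few lines. This is a minimal illustration, not a prescription: the populations, means, standard deviation, and sample size below are all invented, and the test used is a simple two-sample comparison with known variance.

```python
# Monte Carlo estimate of power: repeatedly sample two normal populations,
# test each pair of samples, and count the proportion of significant results.
# All parameter values here are illustrative, not from the text.
import random
import statistics
from statistics import NormalDist

def simulated_power(mu0, mu1, sigma, n, alpha=0.05, reps=2000, seed=1):
    """Fraction of repeated experiments whose two-tailed test is significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(mu0, sigma) for _ in range(n)]
        b = [rng.gauss(mu1, sigma) for _ in range(n)]
        se = sigma * (2 / n) ** 0.5                # known-variance SE of the difference
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        hits += abs(z) > z_crit
    return hits / reps

# With a real difference between the population means, the proportion of
# significant results (the power) is well above alpha.
print(simulated_power(mu0=0.0, mu1=0.5, sigma=1.0, n=30))
```

When the two population means are equal, the same function returns roughly α, as it should: under the null hypothesis, 'significant' results occur at the significance level.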
Finally, note that where the distribution of the test statistic is not continuous (smooth) but markedly discrete (stepped), using conventional critical values can reduce the achievable power to the point of uselessness. In that situation mid-P values perform better 'on average', provided you accept that any particular test may be conservative or liberal.
Of course, we want the power of our statistical test to be as high as possible, so we need to know which other factors determine the power of a test. Power is greater when:
- the effect size is large,
- the sample size is large,
- the variances of the populations examined are small,
- the significance level (α) is high (for example, 5% compared to 1%),
- a one-tailed test is used instead of a two-tailed one.
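Each of the factors listed above can be checked numerically with the usual one-tailed z-test power expression, power = P(Z > zα − δ/(σ/√n)), which is developed later in this section. The parameter values below are invented for illustration.

```python
# Verify that each listed factor moves power in the stated direction,
# using the one-tailed z-test power formula. Values are illustrative.
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value
    z_delta = delta / (sigma / n ** 0.5)        # standardized effect size
    return 1 - NormalDist().cdf(z_alpha - z_delta)

base = z_test_power(delta=0.5, sigma=2.0, n=25)           # roughly 0.35
assert z_test_power(1.0, 2.0, 25) > base                  # larger effect size
assert z_test_power(0.5, 2.0, 100) > base                 # larger sample size
assert z_test_power(0.5, 1.0, 25) > base                  # smaller variance
assert z_test_power(0.5, 2.0, 25, alpha=0.10) > base      # higher alpha
print(f"baseline power: {base:.3f}")
```

A one-tailed test gains power over a two-tailed test in the same way: its critical value zα is smaller than zα/2, so less of dA falls inside the acceptance region.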
For any given statistical test, there is a mathematical relationship between power, significance level, various population parameters, and sample size. For some of the more important statistical tests, we provide the formulas for this relationship. But before introducing the first of these (for the z test), we need to think carefully about what we are going to calculate using the relationship.
Estimating statistical power
There are two reasons for estimating the power of a test:
- To produce a power curve - to predict how much data you need to collect to have reasonable confidence (for example, 95%) of obtaining a significant result. This is a sensible and productive practice.
In practice the sample size required for a given desired power is usually calculated directly, rather than by constructing a power curve. Nevertheless, examining a power curve can be very useful, as it can support a more rational experimental design decision. Such a priori power predictions are useful, but they can be criticized when they are based upon insufficient prior information (from too small a pilot study), or when too rough (or inappropriate) a model is used to predict how the statistic under test is likely to vary. Somewhat perversely, reviewers tend to be much more concerned with the exact mathematical model than with the data to which it is applied - possibly because theoretical mathematical flaws are easier to resolve, and their refinement offers interesting career prospects for mathematical statisticians.
- To provide additional information on data already collected and tested. Such post-hoc power predictions are controversial, and are generally discouraged, for two reasons:
- You will invariably find that the power was insufficient to demonstrate a non-significant treatment effect. This is because the estimated power is directly related to the observed P-value; in other words, it cannot tell you any more than the exact P-value does.
- Despite this objection, several standard textbooks (such as Zar) describe how to make post-hoc power determinations.
Unfortunately, post-hoc power determinations have no theoretical justification and are not recommended. Power is a pre-experiment concept: we should not apply the pre-experiment probability of a hypothetical set of outcomes to the outcome we have actually observed. This has been compared to trying to convince someone that buying a lottery ticket was foolish (the pre hoc view) after they have won the lottery jackpot (the post hoc view).
Calculating the power to demonstrate your observed treatment effect also locks you into the significant/non-significant mindset, with a rigid significance level of 0.05. Once you have the data, it is better to use the exact P-value to assess the weight of evidence, and to calculate a confidence interval about the estimated effect size as a measure of the reliability of that estimate.
Accepting these points, there is one way of calculating post hoc power that can be very informative: the empirical power curve - or its equivalent P-value plot, the P-value function - corresponding to every possible confidence interval about the size of the observed effect. Whatever it is called, this function estimates the relationship between the probability of rejecting the null hypothesis and the effect size, given the available data. For simpler models this relationship can be predicted algebraically. Alternatively, and more instructively, the relationship can be estimated by 'test inversion'. Because test inversion exploits the underlying relationship between tests and confidence intervals, we examine this method later.
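For a z statistic the P-value function can be written down directly: for each hypothesized effect size δ0, it is the two-tailed P-value of the test of H0: true difference = δ0. A minimal sketch, with an invented observed difference and standard error:

```python
# P-value function for a z statistic: the two-tailed P-value for each
# hypothesized effect size delta0. The observed difference and standard
# error below are invented for illustration.
from statistics import NormalDist

def p_value_function(observed_diff, se, delta0):
    """Two-tailed P-value for H0: true difference = delta0."""
    z = (observed_diff - delta0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

observed, se = 1.2, 0.5
for delta0 in [0.0, 0.5, 1.2, 2.0]:
    print(f"delta0={delta0:4.1f}  P={p_value_function(observed, se, delta0):.3f}")
```

The curve peaks (P = 1) at the observed effect, and the set of effect sizes whose P-value exceeds α is exactly the 100(1 − α)% confidence interval - which is the sense in which the P-value function corresponds to every possible confidence interval, and the sense in which it is obtained by test inversion.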
Estimating the sample size required for a given power
Predicting the required sample size for a particular statistical test requires values for the power, the significance level, the effect size, and various population parameters. You must also specify whether the test is one-tailed or two-tailed. We consider each of these components in turn.
The values chosen for the statistical power and the significance level depend on the study. Conventionally, the power should be not less than 0.8, and preferably around 0.9. The most commonly used value for the significance level (α) is 0.05. However, there may be good reasons to deviate from these conventional values. If it is more important to avoid a type I error (i.e. a false positive result), the significance level can be lowered to 0.01. If it is more important to avoid a type II error (i.e. a false negative result), the power can be increased to 0.95.
The relevant population parameters depend on the type of statistical test. When comparing means you must provide the population standard deviation. When comparing proportions you must provide the proportion in the reference or control group, which in turn allows the standard deviation to be estimated. These parameters can usually be estimated from the literature or, failing that, from a pilot study. Sometimes it is necessary to re-evaluate these parameters during the course of a study, although statisticians generally advise against this because it can introduce bias into the process.
The effect size (the smallest difference between the means or proportions that you would consider meaningful) is probably the most difficult parameter to settle on, because it is to some degree subjective. When comparing a new malaria treatment with the standard one, how much of an improvement would be worthwhile? In making this decision one must weigh the frequency and severity of side effects, the relative cost of the new treatment, and its relative ease of administration. If the new drug is cheaper than the current one and has fewer side effects, even a small improvement in cure rate (say 5%) would be worthwhile. If it is much more expensive, with similar side effects, you might decide that only a larger improvement (say 20%) would justify it. Only when this decision has been made can a useful sample size be obtained.
The choice of effect size should always be made explicit - a point not sufficiently emphasized in the literature! Too often researchers do what is popularly known as the 'sample size samba' - that is, simply adjusting the effect size until an achievable sample size is obtained. This is very foolish, because if you then observe a smaller effect, you are committed to declaring it unimportant - even if it is not!
Finally, one must decide whether to use a one-tailed or a two-tailed test. Sometimes a one-tailed test is chosen simply to reduce the required sample size - a practice statisticians strongly discourage. The convention nowadays is that sample size should always be estimated for a two-tailed test, even if a one-tailed test is later used for the evaluation.
There is one last important point!
Estimating the required sample size is not a precise science. It is always approximate, because you have to estimate (or sometimes merely guess) the variances of the populations involved. As a result, the power you actually achieve may be well below what you expect.
Therefore, it is a good idea to use a slightly larger sample size than specified in your power analysis.
Estimating power and sample size for the z test
Hypotheses and tails
We now consider the statistical power of the z test, which compares the mean of a random sample drawn from a test population - with true mean μ1 - against the known mean of a reference population (μ0), given a known standard error (σd). This standard error is assumed to be the same under the null and alternative hypotheses.
- For a one-tailed, upper-tail test:
    - the null hypothesis (H0) is μ1 = μ0, so δ = [μ1 − μ0] = 0;
    - the alternative hypothesis (H1) is μ1 > μ0, so δ = [μ1 − μ0] > 0.
- For a one-tailed, lower-tail test:
    - H0 is the same, so δ = 0,
    - but H1 is μ1 < μ0, so δ < 0.
- For a two-tailed test:
    - H0 is the same, so δ = 0,
    - but H1 is μ1 ≠ μ0, so δ ≠ 0.
To reduce computational effort, these comparisons are generally made using standardized values. Unfortunately, this usually leads to some additional notation that we need to explain before proceeding.
- For a standard one-tailed significance test using the upper tail, if α = 0.05 then +zα = +1.645.
- Since the distribution is symmetric, for the lower tail −zα = −1.645.
- For a two-tailed comparison a probability of α/2 is assigned to each tail; if α = 0.05, then −zα/2 = −1.960 and +zα/2 = +1.960.
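These standardized critical values can be reproduced with the inverse cumulative normal distribution; Python's standard library provides one in `statistics.NormalDist` (Python 3.8+).

```python
# Reproduce the critical values quoted above using the inverse normal CDF.
from statistics import NormalDist

alpha = 0.05
z_one_tailed = NormalDist().inv_cdf(1 - alpha)       # +1.645 (all of alpha in one tail)
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)   # +1.960 (alpha/2 in each tail)
print(round(z_one_tailed, 3), round(z_two_tailed, 3))
```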
For the three tests listed above, the probability of correctly rejecting the null hypothesis at a predefined α is as follows.
Algebraically speaking -
a. For a one-tailed test using the upper tail (positive treatment effect):
1 − β = P(Z ≥ zα − zδ)
b. For a one-tailed test using the lower tail (negative treatment effect):
1 − β = P(Z ≤ −zα − zδ)
c. For a two-tailed test using both tails:
1 − β = P(Z ≥ zα/2 − zδ) + P(Z ≤ −zα/2 − zδ)
where:
- P is a probability, determined from the cumulative normal distribution as the proportion of the standard normal distribution greater (or less) than Z. This can be obtained from your statistical package's probability calculator. If you use tables, note that some give the proportion of the distribution less than Z, while others give the proportion greater than Z. A further variation is that the tabulated probability runs from zero to Z, in which case you must add 0.5 to obtain the correct value.
- Z is the standardized normal deviation,
- zα is the critical value for α - the point above which 100α% of the null population lies. It is obtained from your probability calculator or tables, such that P(Z < zα) = 1 − α, where α is the significance level.
- zδ = δ/σd, or [μ1 − μ0]/σd - the standardized treatment effect,
- μ0 is the mean of the reference population (under H0),
- μ1 is the mean of the test population (under H1),
- σd is the population standard error of δ. For a z test, σd = σ/√n - the standard error of the reference population mean - usually calculated as the standard deviation of the reference population observations (σ) divided by the square root of the number of observations in the sample (n).
Sample size estimation
We rearrange the power formula to obtain the number of samples needed to obtain a given power.
For a one-tailed test: n = [(zα + zβ) σ / (μ1 − μ0)]², where:
- zα is obtained from your probability calculator or tables, such that P(Z < zα) = 1 − α, where α is the significance level,
- zβ is obtained from your probability calculator or tables, such that P(Z < zβ) = 1 − β, where 1 − β is the power,
- μ0 is the known population mean,
- μ1 is the mean of the test population,
- σ is the known population standard deviation of the observations.
For a two-tailed test we use an approximation, substituting zα/2 for zα. This ignores the possibility of a Type III error, but for large treatment effects it does not usually introduce serious error.
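The rearranged formula, n = [(zα + zβ) σ / (μ1 − μ0)]², is easily computed; since n must be a whole number, the result is rounded up. The parameter values below are invented for illustration.

```python
# Sample size for a z test from the rearranged power formula.
# n is rounded up to the next whole observation. Values are illustrative.
import math
from statistics import NormalDist

def z_test_sample_size(mu0, mu1, sigma, alpha=0.05, power=0.90, two_tailed=False):
    Z = NormalDist()
    # For a two-tailed test, substitute z(alpha/2) for z(alpha), as above.
    z_a = Z.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_b = Z.inv_cdf(power)
    n = ((z_a + z_b) * sigma / (mu1 - mu0)) ** 2
    return math.ceil(n)

print(z_test_sample_size(mu0=50, mu1=52, sigma=6, alpha=0.05, power=0.90))  # → 78
```

As expected, requiring a two-tailed test, a higher power, or a smaller detectable difference each increases the required sample size.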
The following values of zα and zβ are those most commonly used in sample size calculations:
- for α = 0.05: zα = 1.645 (one-tailed) or zα/2 = 1.960 (two-tailed); for α = 0.01: zα = 2.326 or zα/2 = 2.576;
- for a power of 0.80: zβ = 0.842; for a power of 0.90: zβ = 1.282; for a power of 0.95: zβ = 1.645.
A number of assumptions are made when estimating power and the required sample size. The first set of assumptions applies to all significance tests, namely:
- Samples are drawn at random or individuals are randomly assigned to treatment groups.
- The observations are independent of each other.
The second set of assumptions is specific to the z test:
- The response variable approximates a normal distribution.
- The true population mean and standard deviation are known and are not estimated from a sample.