What is statistical power?
Power is traditionally defined as the probability of rejecting the null hypothesis when the null hypothesis is false and the alternative hypothesis is true.
Suppose we have two populations whose parametric means are different.
 We take samples from the two populations and obtain sample means and variances.
 We perform a statistical test to determine if the means are significantly different
 We repeat sampling and testing many times.
Basically, test power is the proportion of those tests that correctly indicate that the means of the two populations are significantly different.
In practice, as we pointed out, repeating the sampling and testing many times is seldom feasible, so the power is usually calculated directly.
Assuming that the statistic being tested has a "known" distribution (for example, normal), the power of the test can be found as follows:
 Imagine d_{A} is the distribution of your test statistic (for example, Z) under the alternative hypothesis, H_{A}.
 Then the power of your test is simply the proportion of d_{A} that lies beyond the lower and/or upper 'critical values' of your test. These critical values are quantiles of d_{0}, the distribution of the test statistic under the null hypothesis, H_{0}.
This works perfectly well whatever the effect size of your treatment (it could be zero), but it assumes that your treatment effect is fixed (cannot vary) and that the only difference between d_{0} and d_{A} is their location (in other words, the treatment effect, δ, is a simple shift).
 Of course, if d_{0} and d_{A} differ in other ways, or their distributions are unknown (or cannot easily be calculated), the only way to find the power may be empirically; in other words, by simulation. In this case, you repeatedly sample a defined population, apply your test to each sample, and find what proportion of the results are "significant."
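This simulation approach can be sketched in a few lines. The example below estimates the power of a two-sample, two-tailed z test on normal populations; the population values and sample size are purely illustrative, not taken from the text.

```python
import random
import statistics
from statistics import NormalDist

def empirical_power(mu0, mu1, sigma, n, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power empirically: draw many pairs of samples from two
    normal populations, test each pair, and return the proportion of
    tests that reject H0 (two-tailed z test, sigma assumed known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5        # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(mu0, sigma) for _ in range(n)]
        b = [rng.gauss(mu1, sigma) for _ in range(n)]
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims
```

For example, `empirical_power(0, 1, 2, 50)` gives roughly 0.70, close to the algebraic value for this situation, while setting the two means equal returns a rejection rate near α.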
Finally, note that when the distribution of the test statistic is not continuous (smooth) but markedly discrete (stepped), the use of conventional critical values can reduce the achievable power to the point of uselessness. In this situation mid-P values perform better "on average", provided you accept that your test may be conservative or liberal.
Of course, we want the power of our statistical test to be as high as possible, so we need to know what factors determine the power of a test. Other things being equal, the power of a test is greater when:
 the effect size is large,
 the sample size is large,
 the variances of the populations examined are small,
 the significance level (α) is high (for example, 5% rather than 1%),
 a one-tailed test is used instead of a two-tailed one.
For any given statistical test, there is a mathematical relationship between power, significance level, various population parameters, and sample size. For some of the more important statistical tests, we provide the formulas for this relationship. But before introducing the first of these (for the z test), we need to think carefully about what we are going to calculate using the relationship.
Estimating statistical power
There are two reasons for estimating the power of a test:
 To construct a power curve, in order to predict how much data you need to collect to have reasonable confidence (for example, 95%) of obtaining a meaningful result. This is a sensible and productive practice.
In practice, the sample size required for a given desired power is usually calculated directly, rather than by constructing a power curve. Even so, examining a power curve can be very useful, as it can lead to a more rational experimental design decision. Such a priori power predictions are useful, but they can be criticized when they are based on insufficient prior information (say from a pilot study that is too small) or when too crude (or inappropriate) a model is used to predict how the statistic under test is likely to vary. Somewhat perversely, reviewers tend to be much more concerned with the exact mathematical model than with the data to which it is applied, possibly because theoretical mathematical flaws are easier to resolve, and their refinement offers interesting career prospects for mathematical statisticians.
 To obtain additional information on data that have already been collected and tested. Such post hoc power predictions are controversial and generally discouraged, for two reasons:
 You will always find that the power was insufficient whenever you fail to demonstrate a significant treatment effect. This is because the estimated power is directly related to the observed P-value; in other words, it can tell you no more than the P-value itself.
 Despite this objection, several standard textbooks (such as Zar) describe how to make such calculations.
Unfortunately, post hoc power determinations have no theoretical justification and are not recommended. Power is a pre-study concept: we should not apply a pre-experiment probability, of a hypothesized set of outcomes, to the observed outcome. This has been compared to trying to convince someone that buying a lottery ticket was foolish (the a priori view) after they have won the lottery jackpot (the post hoc view).
Accepting these points, there is one way of calculating power after the event that can be very informative: the empirical power curve or its equivalent, the P-value function, which corresponds to every possible confidence interval about the size of the observed effect. Whatever it is called, this function estimates the relationship between the probability of rejecting the null hypothesis and the effect size, given the available data. For simpler models this relationship can be predicted algebraically. Alternatively, and more instructively, the relationship can be estimated by "test inversion." Because test inversion exploits the underlying relationship between tests and confidence intervals, we examine that method elsewhere.
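For the simplest algebraic case, the P-value function is easy to sketch. The function below (illustrative names, assuming a normally distributed effect estimate with known standard error) gives the two-tailed P-value of the observed effect against each hypothesized effect size delta0:

```python
from statistics import NormalDist

def p_value_function(d_obs, se, deltas):
    """Two-tailed P-value of the observed effect d_obs against each
    hypothesized effect size delta0, assuming a normal test statistic.
    The delta0 values with P > alpha form the 100(1 - alpha)%
    confidence interval for the effect size."""
    nd = NormalDist()
    return {d0: 2 * (1 - nd.cdf(abs(d_obs - d0) / se)) for d0 in deltas}
```

For example, with d_obs = 2.0 and se = 0.5 the function peaks at P = 1 when delta0 = 2.0, and falls to about 0.05 at delta0 = 2.0 ± 1.96 × 0.5, reproducing the usual 95% confidence interval.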
Estimating the sample size required for a given power
Predicting the required sample size for a particular statistical test requires values for the power, the significance level, the effect size, and various population parameters. You must also specify whether the test is one-tailed or two-tailed. We consider each of these components in turn.
The values chosen for the statistical power and the significance level depend on the study. Conventionally, the power should be no less than 0.8, and preferably around 0.9. The most commonly used value for the significance level (α) is 0.05. However, there may be good reasons to deviate from these conventional values. If it is more important to avoid a type I error (i.e., a false positive result), the significance level can be lowered to 0.01. If it is more important to avoid a type II error (i.e., a false negative result), the power can be increased to 0.95.
The relevant population parameters depend on the type of statistical test. When comparing means, you must provide the population standard deviation. When comparing proportions, you must provide the proportion in the reference or control group, which in turn allows the standard deviation to be estimated. These parameters can usually be estimated from the literature or, if that is not possible, from a pilot study. Sometimes it is necessary to re-evaluate these parameters during the course of a study, although statisticians generally advise against this because it can introduce bias into the process.
The effect size (the smallest difference between the means or proportions that you consider to be of practical importance) is probably the most difficult parameter to determine, because it is to some degree subjective. When comparing a new malaria treatment with the standard one, how large an improvement is worthwhile? In making this decision one must consider the frequency and severity of side effects, the relative cost of the new treatment, and the relative ease of administration. If the new drug is cheaper than the current one and has fewer side effects, even a small improvement in cure rate (say 5%) may be worthwhile. If it is much more expensive with similar side effects, you might decide that only a larger improvement (say 20%) would be worth it.
Only once you have settled on a meaningful effect size will the calculation give you a useful sample size!
The choice of effect size should always be made explicit, a point not sufficiently emphasized in the literature! Too often, researchers engage in what is popularly known as the 'sample size samba': that is, simply adjusting the effect size until it yields an attainable sample size. This is very unwise, because if you then observe an effect smaller than the one you specified, you are committed to declaring it not worthwhile, even if it is!
Finally, one must decide whether to use a one-tailed or a two-tailed test. Sometimes a one-tailed test is chosen simply to reduce the required sample size, a practice that statisticians strongly discourage. The convention today is that the sample size should always be estimated for a two-tailed test, even if a one-tailed test is later used for the analysis.
There is one last important point!
Estimating the required sample size is not a precise science. It is always approximate, because you must estimate (or sometimes just guess) the variances of the populations involved. As a result, the actual power you achieve may be well below what you expect.
Therefore, it is a good idea to use a slightly larger sample size than specified in your power analysis.
Estimating power and sample size for the z test
Hypotheses and tails
We now consider the statistical power of the z test for comparing the mean of a randomly selected sample from a test population, whose true mean is μ_{1}, with the known mean of a reference population (μ_{0}), given a known standard error (σ_{d}). This standard error is assumed to be the same under the null and alternative hypotheses.
 For a one-tailed, upper-tail test:
 The null hypothesis (H_{0}) is
μ_{1} = μ_{0} .
So δ = [μ_{1} − μ_{0}] = 0.
 The alternative hypothesis (H_{1}) is
μ_{1} > μ_{0} .
So δ = [μ_{1} − μ_{0}] > 0.
 For a one-tailed, lower-tail test:
 H_{0} is the same, so δ = 0, but H_{1} is
μ_{1} < μ_{0} .
So δ < 0.
 For a two-tailed test:
 H_{0} is the same, but under H_{1},
δ ≠ 0 .
Z notation
To reduce computational effort, these comparisons are generally made using standardized values. Unfortunately, this usually leads to some additional notation that we need to explain before proceeding.
 For a standard one-tailed significance test on the upper tail, if α = 0.05 then +z_{α} = +1.645. Since this distribution is symmetric, for the lower tail −z_{α} = −1.645.
 For a two-tailed comparison, a probability of α/2 is allotted to each tail. If α = 0.05, then −z_{α/2} = −1.960 and +z_{α/2} = +1.960.
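These critical values can be reproduced with any statistics package; a minimal check using Python's standard library:

```python
from statistics import NormalDist

nd = NormalDist()                           # standard normal: mean 0, sd 1
z_upper_1tail = nd.inv_cdf(1 - 0.05)        # +z_alpha for one-tailed alpha = 0.05
z_upper_2tail = nd.inv_cdf(1 - 0.05 / 2)    # +z_{alpha/2} for two-tailed alpha = 0.05
print(round(z_upper_1tail, 3), round(z_upper_2tail, 3))  # 1.645 1.96
```

By symmetry, the lower-tail values are simply the negatives of these.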
Power formulas
For the three tests listed above, the probability of correctly rejecting the null hypothesis with a predefined α is as follows:
Algebraically speaking:
a. For a one-tailed test using the upper tail (positive treatment effect):
1 − β = P(Z > z_{α} − z_{δ})
b. For a one-tailed test using the lower tail (negative treatment effect):
1 − β = P(Z < −z_{α} − z_{δ})
c. For a two-tailed test using both tails:
1 − β = P(Z > z_{α/2} − z_{δ}) + P(Z < −z_{α/2} − z_{δ})
where:
 P is a probability, determined from the cumulative normal distribution as the proportion of the standard normal distribution greater than, or less than, Z. It can be obtained from the probability calculator of your computer's statistics package. If you use tables, note that some give the proportion of the distribution less than Z, while others give the proportion greater than Z. A further variation is that the tabulated probability runs from zero to Z, in which case you must add 0.5 to obtain the correct value.
 Z is the standardized normal deviate,
 z_{α} is the critical value for α, above which 100α% of the null population lies; it is obtained from your probability calculator or tables such that P(Z < z_{α}) = 1 − α, where α is the significance level,
 z_{δ} = δ/σ_{d}, or [μ_{1} − μ_{0}]/σ_{d},
 μ_{0} is the mean of the reference population (under H_{0}),
 μ_{1} is the mean of the test population,
 σ_{d} is the population standard error of δ. For a z test, σ_{d} = σ/√n, the standard error of the reference population mean, usually calculated as the standard deviation of the reference population observations (σ) divided by the square root of the number of observations in the sample (n).
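Formulas a to c above can be sketched directly in code. The function below uses Python's standard library; the function and parameter names are illustrative, not from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05, tail="two"):
    """Power = P(correctly rejecting H0) for a z test, per formulas a-c."""
    nd = NormalDist()
    z_delta = (mu1 - mu0) / (sigma / sqrt(n))   # z_delta = delta / sigma_d
    if tail == "upper":                          # formula a
        return 1 - nd.cdf(nd.inv_cdf(1 - alpha) - z_delta)
    if tail == "lower":                          # formula b
        return nd.cdf(-nd.inv_cdf(1 - alpha) - z_delta)
    z_a2 = nd.inv_cdf(1 - alpha / 2)             # formula c, two-tailed
    return (1 - nd.cdf(z_a2 - z_delta)) + nd.cdf(-z_a2 - z_delta)
```

Note that when μ_{1} = μ_{0} (so δ = 0) the "power" collapses to α, as it should: the only rejections are false positives.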
Sample size estimation
We can rearrange the power formula to obtain the sample size needed to achieve a given power.
For a one-tailed test:
n = (z_{α} + z_{β})^{2} σ^{2} / (μ_{1} − μ_{0})^{2}
where:
 z_{α} is obtained from your probability calculator or tables such that P(Z < z_{α}) = 1 − α, where α is the significance level,
 z_{β} is obtained from your probability calculator or tables such that P(Z < z_{β}) = 1 − β, where 1 − β is the power,
 μ_{0} is the known population mean,
 μ_{1} is the mean of the test population,
 σ is the known population standard deviation of the observations.
For a two-tailed test we use an approximation, substituting z_{α/2} for z_{α}. This ignores the possibility of a Type III error, but usually introduces no serious error when treatment effects are large.
The following values of z_{α} and z_{β} are those most commonly used in sample size calculations:

power (1 − β)    80%      90%      95%
z_{β}            0.8416   1.2816   1.6449
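The sample size formula, with the two-tailed substitution of z_{α/2} for z_{α}, can be sketched as follows (illustrative names; result rounded up to a whole number of observations):

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_sample_size(mu0, mu1, sigma, alpha=0.05, power=0.80, tails=1):
    """n = (z_alpha + z_beta)^2 * sigma^2 / (mu1 - mu0)^2, rounded up.
    tails=2 substitutes z_{alpha/2} for z_alpha (the usual approximation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / tails)   # P(Z < z_alpha) = 1 - alpha
    z_b = nd.inv_cdf(power)               # P(Z < z_beta) = 1 - beta
    return ceil((z_a + z_b) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2)
```

For example, detecting a difference of one standard deviation (μ_{1} − μ_{0} = σ) at α = 0.05 with 80% power requires n = 7 for a one-tailed test, or n = 8 for a two-tailed test.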
Assumptions
A number of assumptions are made when estimating power and the required sample size. The first set of assumptions applies to all significance tests, namely:
 Samples are drawn at random or individuals are randomly assigned to treatment groups.
 The observations are independent of each other.
The second set of assumptions is specific to the z test:
 The response variable approximates a normal distribution.
 The true population mean and standard deviation are known and are not estimated from a sample.