Sampling error is inevitable when population parameters are estimated from sample statistics. Its size is directly proportional to the individual variation (the standard deviation) and inversely proportional to the square root of the sample size. The statistic that expresses sampling error is the standard error, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, or, with the sample standard deviation $s$ substituted for $\sigma$, $s_{\bar{x}} = s/\sqrt{n}$ (7). It is equivalent to the standard deviation obtained when the means (or rates) of many samples drawn from the same population are each treated as an individual value, so it reflects the differences among samples from the same population. The formulas for other sampling methods are more complicated.
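As a minimal sketch of the standard error described above, the following Python snippet estimates it from a single sample as the sample standard deviation divided by the square root of the sample size. The function name and the measurements are hypothetical and serve only as illustration.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Hypothetical measurements, for illustration only.
measurements = [112.0, 118.5, 115.2, 110.8, 117.3, 114.1, 116.6, 113.5]
print(f"standard error = {standard_error(measurements):.3f}")
```

With the same individual variation, quadrupling the sample size roughly halves the standard error, which is the inverse square-root relationship stated in the text.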
Two or more groups of data will always differ to a greater or lesser extent. The question is whether the difference merely reflects sampling error or arises because the groups come from different populations, that is, whether there is a real difference. In statistical terms, the task is to judge whether the difference between the data is "significant". Using statistical methods to infer the nature of a difference is called a significance test of the difference. There are many significance tests, but the basic steps are the same. First, assume that all the data come from the same population, that is, that there is no real difference between the data being compared; this is called the null hypothesis. Next, calculate from the original data the probability that sampling error alone would produce a difference of this size. If this probability is very small, the null hypothesis is rejected on the principle that "small-probability events are practically unlikely to happen", and the difference is judged "significant", that is, meaningful from a statistical point of view. If the probability is not very small, the null hypothesis is not rejected and the difference is judged not significant, that is, fluctuation within the range of sampling error cannot be ruled out. Correct use of significance tests puts the conclusions of an experiment or survey on a more scientific and reliable footing and avoids oversimplified and absolute statements.
A small probability is only a relative notion. In significance tests on biological data it is customary to take $\alpha = 0.05$ as the upper limit of a small probability; for greater stringency, $\alpha = 0.01$ is sometimes specified. $\alpha$ is called the significance level: it is the probability of wrongly rejecting the null hypothesis when it is in fact true (a type I error). A smaller $\alpha$ is not always better, however. If the null hypothesis is false but is not rejected (a type II error), the probability of that error increases as the specified $\alpha$ decreases. Increasing the sample size can reduce the probability of type I or type II errors.
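The trade-off between the two kinds of error can be illustrated numerically. The sketch below assumes a simple setting that the text does not specify: a two-sided test of one mean with known standard deviation `sigma`, where the true mean differs from the hypothesized one by `delta`. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(alpha, n, delta, sigma):
    """Approximate type II error probability for a two-sided normal test.

    Assumes a known standard deviation `sigma` and a true mean that
    differs from the hypothesized mean by `delta` (illustrative setup).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)        # critical value, e.g. 1.96 for alpha = 0.05
    shift = delta * math.sqrt(n) / sigma   # true difference in standard-error units
    # Probability that the statistic stays inside (-crit, crit), i.e. no rejection.
    return z.cdf(crit - shift) - z.cdf(-crit - shift)


for alpha in (0.05, 0.01):
    for n in (10, 40):
        beta = type_ii_error(alpha, n, delta=1.0, sigma=2.0)
        print(f"alpha={alpha:<5} n={n:<3} type II error ~ {beta:.3f}")
```

In this setting, lowering the significance level from 0.05 to 0.01 at a fixed sample size raises the type II error, while enlarging the sample lowers it.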
The simplest significance test of a difference is the comparison of two counts. For two counts $a$ and $b$, under the null hypothesis that they come from the same population, the statistic

$$u = \frac{a - b}{\sqrt{a + b}} \qquad (8)$$

approximately follows the standard normal distribution. In other words, the probability that $|u| > 1.96$ is less than 0.05 (Table 1 [tail probability summary of standard normal distribution]).
For example, "714" was compared with aminophylline in the treatment of asthmatic bronchitis. Each patient took the two drugs in turn, one course of treatment each; half of the patients took drug A first and the other half took drug B first. As a result, aminophylline worked better in 16 patients ($a = 16$) and "714" worked better in 5 patients ($b = 5$).
Substituting these results into equation (8) gives $u = (16 - 5)/\sqrt{16 + 5} = 2.40$. Because $u > 1.96$, $P < 0.05$; the null hypothesis is rejected, and the two drugs can be considered to differ in efficacy, that is, "714" is not as good as aminophylline.
Any significance test whose statistic follows the standard normal distribution may be called a $u$ test.
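The drug comparison above can be checked with a few lines of Python. This is a sketch that assumes statistic (8) takes the common form u = (a - b)/sqrt(a + b); the function name is hypothetical, and the two-sided tail probability is computed from the standard normal distribution instead of being read from Table 1.

```python
import math
from statistics import NormalDist


def count_u_test(a, b):
    """u statistic and two-sided tail probability for comparing two counts.

    Assumes u = (a - b) / sqrt(a + b), approximately standard normal
    under the null hypothesis that both counts come from one population.
    """
    u = (a - b) / math.sqrt(a + b)
    p = 2 * (1 - NormalDist().cdf(abs(u)))
    return u, p


# Counts from the example: aminophylline better in 16 patients, "714" in 5.
u, p = count_u_test(16, 5)
print(f"u = {u:.2f}, P = {p:.3f}")
print("significant at 0.05" if abs(u) > 1.96 else "not significant at 0.05")
```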
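Two means can also be compared:

$$u = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \approx \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad (9)$$

where $\bar{x}_1$, $s_1$ and $n_1$ are the mean, standard deviation and size of the first sample, and so on. The population variance $\sigma^2$ is usually unknown, so the approximate formula on the right is often used. When the combined size of the two samples, $n_1 + n_2$, is small, the $t$ test is used instead.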
The $t$ test is a significance test based on the probability distribution of the statistic $t$ (called the $t$ distribution; see Table 2 [Net increase length (cm) after spraying]). When two means are compared,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (10)$$

where $s_c^2$ is the pooled variance, i.e.

$$s_c^2 = \frac{\sum_i (x_{1i} - \bar{x}_1)^2 + \sum_j (x_{2j} - \bar{x}_2)^2}{n_1 + n_2 - 2} \qquad (11)$$

in which $x_{1i}$ denotes the data in the first sample and $x_{2j}$ the data in the second sample; the other symbols have the same meaning as before. The degrees of freedom are $\nu = n_1 + n_2 - 2$, and $t_{\alpha,\nu}$ denotes the critical value for significance level $\alpha$ and $\nu$ degrees of freedom, which can be found in Table 3. If the absolute value of $t$ computed from (10) is greater than $t_{\alpha,\nu}$, then $P < \alpha$. From [1435-33], $P > 0.05$, so the difference is not significant.
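The two-sample comparison can be sketched in Python as follows, assuming the usual pooled-variance form of the t statistic (difference of means divided by the square root of the pooled variance times 1/n1 + 1/n2). The function name and the data are hypothetical; the critical value 2.306 is the standard two-sided 0.05 value of the t distribution for 8 degrees of freedom and would normally be read from a t table.

```python
import math


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations within each sample.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical measurements, for illustration only.
treated = [5.1, 4.8, 6.0, 5.5, 5.2]
control = [4.6, 4.9, 5.0, 4.4, 4.7]

t, df = pooled_t(treated, control)
T_CRIT_05_DF8 = 2.306  # two-sided critical value for alpha = 0.05, 8 degrees of freedom
verdict = "significant" if abs(t) > T_CRIT_05_DF8 else "not significant"
print(f"t = {t:.3f}, df = {df}: {verdict} at the 0.05 level")
```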
Because a population is estimated from a sample, sampling error is inevitable, which raises the question of how reliable a statistic is and within what range it can be trusted; confidence limits answer this question. If the statistic $\bar{x}$ (or a rate $p$) is itself regarded as an individual value, its population mean is $\mu$ and its standard deviation is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Whether or not the distribution of $x$ is normal, as long as $n$ is not very small, $\bar{x}$ is approximately normally distributed, that is, $u = (\bar{x} - \mu)/\sigma_{\bar{x}}$ approximately follows the standard normal distribution. So the inequality

$$-1.96 \le \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \le 1.96 \qquad (14)$$

holds with probability 0.95. Replacing $\sigma_{\bar{x}}$ with $s_{\bar{x}}$ and rearranging slightly gives the interval that estimates the population parameter $\mu$ from the sample statistics $\bar{x}$ and $s_{\bar{x}}$:

$$\bar{x} - 1.96\,s_{\bar{x}} \le \mu \le \bar{x} + 1.96\,s_{\bar{x}} \qquad (15)$$

The numerical limits of interval (15) vary from sample to sample, but the probability that they cover $\mu$, called the confidence level, is 95%, so (15) is called the 95% confidence interval.
For example, from the data in Table 4 [Height frequency distribution of 161 seven-year-old boys], one can calculate the mean height $\bar{x}$ (cm) of the 161 seven-year-old boys, the standard deviation $s = 4.63$, and the standard error $s_{\bar{x}}$. By formula (15), the 95% confidence limits of the population mean height of seven-year-old boys are [114.95, 115.73].
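A 95% confidence interval of this kind can be computed with a short Python sketch. It assumes the normal approximation (sample mean plus or minus 1.96 standard errors), which is reasonable only when the sample is not very small. The function name and the generated data are hypothetical and are not the values of Table 4.

```python
import math
import statistics


def mean_ci95(sample):
    """Approximate 95% confidence limits for the population mean.

    Uses mean +/- 1.96 * s / sqrt(n), the normal approximation,
    which assumes the sample is not very small.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se


# Hypothetical heights in cm (a deterministic made-up sequence, not Table 4).
heights = [110 + (i * 7) % 11 for i in range(50)]
lower, upper = mean_ci95(heights)
print(f"95% confidence limits: [{lower:.2f}, {upper:.2f}]")
```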
The confidence limits of the difference between two population means can be calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha,\nu}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad (16)$$

The symbols in (16) have the same meaning as in the earlier formulas. When the confidence level $(1 - \alpha)$ is 95%, $\alpha = 0.05$ and $\nu = n_1 + n_2 - 2$; the value of $t_{\alpha,\nu}$ can then be found in Table 2 [Net increase length (cm) after spraying].
Analysis of variance is another basic method of statistical analysis and is often used on experimental data. It tests the significance of differences among group means, as well as the significance of the separate effects and interaction effects of several factors. The basic idea is that the variation in normally distributed data can be partitioned into two parts: "error", which is uncontrolled and unexplained, and "effects", whose sources are known and can be clearly interpreted. The latter can be further partitioned among the individual factors and their interactions.
The data structure when the data are grouped by the levels of one factor is:

Observed value = average effect + effect of its level (group) + error (17)

When the significance of differences among groups is tested, the null hypothesis is equivalent to "all group effects are zero"; when the null hypothesis is rejected, the alternative hypothesis is equivalent to "the effect of at least one treatment (level) is not zero".
In general, the variation among data is measured by the sum of squared deviations from the mean (denoted $SS$); dividing it by the degrees of freedom $\nu$ gives the mean square (denoted $MS = SS/\nu$), which reflects the average degree of variation. Let each group contain $n$ data, so that the $k$ groups contain $N = kn$ data in all, and let $\bar{x}$ be the overall mean. The total variation is $SS_T = \sum_i \sum_j (x_{ij} - \bar{x})^2$, where $x_{ij}$ is the $j$-th datum of the $i$-th group; the between-group variation is $SS_B = n \sum_i (\bar{x}_i - \bar{x})^2$, where $\bar{x}_i$ is the mean of the $i$-th group; and the within-group variation (i.e. error) is $SS_E = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$. They are related as follows:

$$SS_T = SS_B + SS_E \qquad (18)$$

Their degrees of freedom are additive as well:

$$(N - 1) = (k - 1) + (N - k) \qquad (19)$$

The ratio of the between-group mean square $MS_B = SS_B/(k - 1)$ to the within-group mean square $MS_E = SS_E/(N - k)$,

$$F = MS_B / MS_E \qquad (20)$$

can be used to test the significance of differences between groups. Critical values of $F$ can be found in a table of $F$ values. Software for analysis of variance prints a table containing $F$ and the corresponding tail probability $P$ (Table 6 [ANOVA table for the data in Table 5]).
For example, 30 hypertensive patients with systolic blood pressure of about 200 mmHg were randomly divided into three groups. Each group received a different drug, and blood pressure was measured after one course of treatment. The results are shown in Table 5 [Blood pressure (mmHg) of three groups of patients after medication].
The results printed by standard computer software are shown in Table 6 [ANOVA table for the data in Table 5].
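The one-factor calculation can be sketched in plain Python as follows: the total sum of squares is split into between-group and within-group parts, and F is the ratio of the two mean squares. The function name and the blood pressure readings are hypothetical (they are not the data of Table 5), and 3.89 is the standard 0.05 critical value of F for 2 and 12 degrees of freedom.

```python
def one_way_anova(groups):
    """Partition the variation of grouped data and compute the F ratio."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)       # between-group mean square
    ms_within = ss_within / (total_n - k)   # within-group (error) mean square
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (k - 1, total_n - k),
        "F": ms_between / ms_within,
    }


# Hypothetical systolic pressures (mmHg) after treatment, three drugs.
drug_a = [168, 172, 165, 170, 175]
drug_b = [180, 178, 185, 176, 182]
drug_c = [171, 169, 177, 173, 170]

result = one_way_anova([drug_a, drug_b, drug_c])
F_CRIT_05_2_12 = 3.89  # critical value of F for alpha = 0.05 with (2, 12) degrees of freedom
print(f"F = {result['F']:.2f} with df = {result['df']}")
print("significant" if result["F"] > F_CRIT_05_2_12 else "not significant")
```

The returned sums of squares satisfy the additive relationship between total, between-group and within-group variation, which is a convenient check on the arithmetic.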
The data structure when the data are grouped by two factors is:

Observed value = mean + row effect + column effect + interaction effect + error (21)

Here "mean" refers to the average effect, the row effect is the group effect when grouping by the first factor, and the column effect is the group effect when grouping by the second factor. The meaning of interaction is as follows: when data are grouped by two or more factors and the factors do not act independently, that is, the action of one factor changes with the level of the other, the two factors are said to interact.
For example, the changes in blood pressure of patients with three types of disease after trying four drugs are shown in Table 7 [Original Data]. Each datum is the treatment result of one patient.
Table 8 [ANOVA Table] is the result given by the computer.
Judging from the values in the table, there is no significant difference among the three disease types; there are significant differences among the drugs; and there is no appreciable interaction between drug and disease type. The "mean" is generally significant, that is, not zero, unless the data are differences between paired observations or differences between two means.
Only with replication, that is, with two or more data for each combination of levels of the two factors, is it possible to calculate the variation of the interaction term. This should be kept in mind when the experiment is designed.
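The role of replication can be seen in a small Python sketch of a two-factor analysis. It assumes a balanced design (the same number of replicates in every cell) and fixed effects, with each F ratio formed against the error mean square; these assumptions, the function name and the numbers are illustrative and are not taken from Tables 7 and 8. With a single datum per cell the error term has zero degrees of freedom, so the function refuses to run.

```python
def two_way_anova(cells):
    """F ratios for rows, columns and interaction in a balanced design.

    `cells[i][j]` is the list of replicate observations for row level i
    and column level j; every cell must hold the same number of replicates.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    if r < 2:
        raise ValueError("at least two replicates per cell are needed "
                         "to estimate the interaction and error terms")

    grand = sum(x for row in cells for cell in row for x in cell) / (a * b * r)
    cell_mean = [[sum(cell) / r for cell in row] for row in cells]
    row_mean = [sum(means) / b for means in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss_rows = b * r * sum((m - grand) ** 2 for m in row_mean)
    ss_cols = a * r * sum((m - grand) ** 2 for m in col_mean)
    ss_cells = r * sum((cell_mean[i][j] - grand) ** 2
                       for i in range(a) for j in range(b))
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_error = sum((x - cell_mean[i][j]) ** 2
                   for i in range(a) for j in range(b) for x in cells[i][j])

    ms_error = ss_error / (a * b * (r - 1))
    return {
        "F_rows": (ss_rows / (a - 1)) / ms_error,
        "F_cols": (ss_cols / (b - 1)) / ms_error,
        "F_interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


# Hypothetical data: 2 row levels x 2 column levels, 2 replicates per cell.
example = [[[1, 3], [5, 7]],
           [[2, 4], [10, 12]]]
print(two_way_anova(example))
```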
The above extends without difficulty to analysis of variance with three or more factors.
Verification of theory ── Biology is an experimental science that relies on experiment and investigation. Theories reached by induction and hypotheses derived by deduction must both be verified in practice. Because individual differences are an inherent characteristic of biological data, this verification can only be statistical.