The origin of statistics
The English word "statistics" derives from the modern Latin statisticum collegium (council of state), the Italian statista (statesman or politician), and the German Statistik. The German term was first used by Gottfried Achenwall in 1749 to denote the body of knowledge for analyzing data about the state, that is, the "science of the state". In the 19th century, the term took on the broader meaning of collecting and classifying data in general, and it was introduced to the English-speaking world by John Sinclair.
Statistics is a very old science. Its theoretical study is generally believed to have begun in Aristotle's time in ancient Greece, which gives it a history of more than 2,300 years. It grew out of the study of social and economic problems. Over those more than 2,000 years, statistics has passed through at least three stages of development: the study of city-state affairs, political arithmetic, and the science of statistical analysis. So-called "mathematical statistics" is not a new discipline independent of statistics. Rather, it is the general name for the new methods of collecting and analyzing data that emerged in the third stage. Probability theory is the theoretical basis of mathematical statistics, but it belongs to mathematics rather than to statistics.
Main terms of statistics
Statistics: The science of collecting, processing, analyzing and interpreting data and drawing conclusions from them.
Descriptive statistics: Statistical methods for collecting, processing and describing data.
Inferential statistics: Statistical methods for inferring the characteristics of a population from sample data.
Variable: A characteristic whose observed value may differ from one observation to the next.
Categorical variable: A variable whose observations fall into categories.
Ordinal variable (rank variable): Also known as an ordered categorical variable; a variable whose observations fall into ordered categories.
Numerical variable (metric variable): Also called a quantitative variable; a variable whose observations are numbers.
Mean: The mean is the average. The term sometimes refers specifically to the arithmetic mean, to distinguish it from averages calculated by other methods. To compute it, add up all the numbers and divide by how many there are. It is one measure of central tendency.
Median: The median is the middle value. To find it, sort the data from smallest to largest and take the middle number (or the average of the two middle numbers when the count is even).
Mode: The mode is the value that appears most frequently in the data set.
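The three measures of central tendency defined above can be illustrated with a short Python sketch. The data values are hypothetical and chosen only for illustration; the functions come from Python's standard statistics module.

```python
import statistics

# Hypothetical observations, used only to illustrate the definitions
data = [2, 3, 3, 5, 7, 10]

# Mean: add up all the values and divide by how many there are
mean = statistics.mean(data)

# Median: sort the values and take the middle one; with an even count,
# the two middle values (3 and 5 here) are averaged
median = statistics.median(data)

# Mode: the value that occurs most often (3 appears twice)
mode = statistics.mode(data)

print("mean:", mean, "median:", median, "mode:", mode)
```

In this small data set the three measures differ, since the single large value 10 pulls the mean above the median.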
Test application of statistics
The central problem of statistics is how to learn the true state of a population from samples. How elements are drawn from the population to form a sample, and what kind of sample best represents the population, therefore directly affect the accuracy of statistical conclusions. If elements are drawn in a way that leaves the composition of the population unchanged, the observed values are independent random variables with the same distribution as the population. Such a sample is a simple random sample, which represents the population best. The process of obtaining a simple random sample is called simple random sampling.
Simple random sampling amounts to repeating the same random trial: each trial is conducted under the same set of conditions, so the probability of each possible result is the same in every trial. For a finite population, simple random sampling means drawing one element at a time and putting it back before drawing again. If the element is not put back, the composition of the population changes, so the probabilities of the various results change accordingly on the next draw. For an infinite population, there is no need to distinguish between drawing with and without replacement.
Beyond these principles, the key question is whether the specific method used to obtain the sample actually ensures that the observed values are independent. Whether a sample is random therefore depends on how it was obtained.
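The difference between putting an element back and not putting it back can be sketched in Python. This is a minimal illustration under assumptions: the five-element population and the fixed seed are hypothetical, and the calls used are random.Random.choices (with replacement) and random.Random.sample (without replacement) from the standard library.

```python
import random

# Hypothetical finite population of five elements
population = ["A", "B", "C", "D", "E"]

# Fixed seed (an arbitrary choice) so that the sketch is repeatable
rng = random.Random(0)

# With replacement: every draw sees the full population, so the draws are
# independent and the same element may appear more than once
with_replacement = rng.choices(population, k=3)

# Without replacement: each draw removes an element, so later draws depend
# on earlier ones and no element can repeat
without_replacement = rng.sample(population, k=3)

# Probability that the second draw is "A", given that the first draw was "A"
p_with = 1 / len(population)  # composition unchanged, still 1 in 5
p_without = 0.0               # "A" is no longer in the population

print(with_replacement, without_replacement, p_with, p_without)
```

Only the draws made with replacement have the same distribution on every trial, which is the property that simple random sampling from a finite population relies on.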
When sampling, we must choose different sampling methods according to different research purposes.
(1) The simple random sampling method first numbers each individual and then draws the sample from the population by lot. This method suits research in which the differences among individuals are small, few individuals are to be selected, or the individuals are concentrated in one place.
(2) The segmented random sampling method randomly divides the population into several parts and then randomly selects several individuals from each part to form the sample. This method is more orderly, and the selected individuals are spread more evenly across the population than under simple random sampling.
(3) The systematic sampling method first divides the population systematically into several groups and then randomly determines a starting point within the first group. For example, with 15 elements in each group and the 13th element of the first group chosen as the starting point, the units selected afterward are the 28th, 43rd, 58th, 73rd, and so on.
(4) The stratified sampling method divides the population into several strata or types according to what is known about its characteristics, and then selects randomly from each stratum in a fixed proportion. This method yields representative samples, but if the strata are poorly chosen, a highly representative sample cannot be obtained.
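Methods (3) and (4) above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names, the population size of 75, the two strata and the 10% sampling fraction are all hypothetical. The systematic example reproduces the positions from the text, with a group size of 15 and a starting point of 13.

```python
import random


def systematic_sample(population_size, interval, start):
    """Return the 1-based positions start, start + interval, ... up to population_size."""
    return list(range(start, population_size + 1, interval))


def stratified_sample(strata, fraction, rng):
    """Draw roughly the same fraction from every stratum, without replacement."""
    sample = []
    for members in strata.values():
        # Take at least one element from each stratum
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Systematic sampling: groups of 15, starting from the 13th element
positions = systematic_sample(population_size=75, interval=15, start=13)
print(positions)  # [13, 28, 43, 58, 73]

# Stratified sampling: two hypothetical strata of 60 and 40 individuals
rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is repeatable
strata = {
    "stratum_1": list(range(0, 60)),
    "stratum_2": list(range(100, 140)),
}
stratified = stratified_sample(strata, fraction=0.1, rng=rng)
print(len(stratified))  # 6 from the first stratum plus 4 from the second
```

In practice the starting point would itself be drawn at random, for example with rng.randint(1, interval), rather than fixed at 13.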