How should we understand the problem of sample selection bias?

1. Sample selection is one of the most common problems in empirical microeconomic research. In statistical and econometric work generally, the parameters of the system under study are estimated from data, and those data are a sample drawn from a population, so the estimates depend on how the sample was drawn.

2. If the sample is random, that is, drawn in a way similar to drawing lots so that every unit of the population has the same chance of being selected, the parameters estimated from it reflect the corresponding characteristics of the population: in theory the estimates are unbiased and consistent. Moreover, the larger the sample, the more precisely it describes the population distribution.

3. If the sample is not random, however, the parameters estimated from it cannot accurately reflect the distribution of the population characteristics under study, no matter how large the sample is. In practice most samples are not random, because the population of events is usually very large and may even be unbounded. Most sampling can therefore only be carried out within a limited scope and under rules chosen by the researcher. This can lead to sample selection bias, to relevant variables being left out of the sample, or to irrelevant variables being included in it.
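
The contrast between random and non-random sampling can be illustrated with a small simulation. This is only a sketch under assumed conditions: the population (100,000 normally distributed values with mean 50 and standard deviation 10), the selection rule (only units with a value above 55 can be sampled) and the helper name estimation_error are all hypothetical and chosen purely for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 values, normal with mean 50 and sd 10.
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Hypothetical selection rule: only units with a value above 55 can be sampled.
selected = [x for x in population if x > 55.0]

def estimation_error(pool, n):
    """Mean of n units drawn at random from pool, minus the population mean."""
    return statistics.fmean(rng.sample(pool, n)) - true_mean

# Columns: sample size, error under random sampling, error under selection.
for n in (10, 100, 1_000, 10_000):
    random_error = estimation_error(population, n)
    selected_error = estimation_error(selected, n)
    print(n, round(random_error, 2), round(selected_error, 2))
```

Under these assumptions, the error of the random sample should shrink toward zero as n grows, while the error of the selected sample should stay well above zero at every sample size. Collecting more data does not repair a non-random selection rule.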