However, if the selected sample is not random, the parameters estimated from it cannot accurately reflect the distribution of the population characteristics under study, no matter how large the sample is. Yet in studies of real-world events, most samples are not random, because the population of relevant events is always very large and may even be unbounded. Most sampling can therefore only be carried out within a limited scope and under rules chosen by the researcher. This can bias sample selection: relevant variables may be left out of the sample, or irrelevant ones brought into it.
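A quick simulation illustrates the claim that a larger sample does not cure selection bias. The sketch below is illustrative only. It assumes normally distributed wages with a hypothetical mean of 10 and standard deviation of 3, and it keeps only observations whose wage exceeds a hypothetical threshold of 9. Inclusion in the sample therefore depends on the very quantity being measured.

```python
import random
import statistics


def selected_mean(n, threshold, mu=10.0, sigma=3.0, seed=0):
    """Draw n wages from N(mu, sigma^2), then keep only those above threshold.

    Returns (mean of all draws, mean of the kept draws). Inclusion depends
    on the wage itself, so the kept draws are not a random sample.
    """
    rng = random.Random(seed)
    wages = [rng.gauss(mu, sigma) for _ in range(n)]
    kept = [w for w in wages if w > threshold]
    return statistics.fmean(wages), statistics.fmean(kept)


for n in (1_000, 100_000):
    full, kept = selected_mean(n, threshold=9.0)
    print(f"n={n:>7}: full mean = {full:.2f}, selected mean = {kept:.2f}")
```

Under these assumed parameters, the full-sample mean should settle near 10 as n grows. The selected mean should stay roughly 1.8 higher at both sample sizes, because the bias comes from how the sample is chosen, not from how many observations it contains.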
For example, consider a study of the wages of working women in a region. All working-age women in the region, employed and not employed, make up the study population. In practice it is generally impossible to obtain full information on all of these women. Only some information about the employed women can be collected, and the women who provide it make up the study sample.
The purpose of the study is to use the labor data reported by the surveyed women to analyze the determinants of the wage level that all working-age women (the population) would earn if employed. The women surveyed can be drawn at random from the population, but only employed women can report their wages, so only the data of employed women can be studied. Working or not working can be regarded as a personal decision. If that decision does not depend on the wage determinants under study, then the sample of employed women can still be treated as random, even though only their data are used. In theory, the factors that decide whether a woman works are then exogenous and do not affect the question being studied.
However, a woman's choice to work or not is often not exogenous; it is partly determined by the very factors under study. For example, the wage level and the quality of the working environment clearly affect whether she chooses to work. The sample of employed women is then no longer random. It is partly determined by the variables of the research question itself, so the question being studied influences which observations enter the sample. Parameters estimated from such a sample with traditional methods cannot reflect the population well and will be biased. If the factors that determine women's employment choices could be measured in the survey, the sample selection bias could be addressed by adding the relevant variables to the traditional analysis. When this information is not available, traditional methods have difficulty with the problem. The method devised by Heckman handles it simply and conveniently. Here the non-randomness of the sample, which biases statistical inference, arises from the personal decisions of the subjects themselves (in this case, the women). For this reason the problem is also called the self-selection problem. Sample selection bias can also arise from decisions or data processing by the researchers themselves.
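A small simulation shows the contrast between an exogenous employment decision and self-selection. This is a sketch under assumed parameters, not Heckman's own setup. The hypothetical wage equation is wage = 1 + 0.5·educ + u, and a woman works when 0.2 + 0.8·educ + z + v > 0. Here z is an exogenous factor (for instance, childcare availability), and rho is the assumed correlation between the unobserved wage factor u and the unobserved employment factor v. Only employed women enter the regression.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def employed_only_slope(rho, n=100_000, beta=0.5, seed=1):
    """Regress wage on education using employed women only.

    rho is the assumed correlation between the unobserved wage factor u
    and the unobserved employment factor v.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        educ = rng.gauss(0.0, 1.0)  # standardized schooling
        z = rng.gauss(0.0, 1.0)     # exogenous factor, e.g. childcare availability
        u = rng.gauss(0.0, 1.0)     # unobserved wage determinant
        v = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        wage = 1.0 + beta * educ + u      # wage she would earn if employed
        if 0.2 + 0.8 * educ + z + v > 0:  # employment decision
            xs.append(educ)
            ys.append(wage)
    return ols_slope(xs, ys)


slope_exog = employed_only_slope(rho=0.0)
slope_self = employed_only_slope(rho=0.7)
print(f"exogenous choice (rho=0.0): slope = {slope_exog:.3f}")
print(f"self-selection   (rho=0.7): slope = {slope_self:.3f}")
```

In this sketch, education also shifts the employment decision, which is an added assumption. With rho = 0 the estimated slope should stay close to the assumed 0.5. With rho = 0.7 it should be pulled noticeably below 0.5, because less-educated women who still choose to work tend to have a high unobserved wage factor u.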
Problem solving
"heckman's two-stage model" or hershey's method is to solve the problems of deviation and self-selection.
Selection bias and self-selection are arguably the most common and unavoidable problems in every kind of social science research, because in most empirical social science studies it is hard to guarantee that the sample data are random. Heckman's first study, on the wage determination of working women, illustrates this point.
In the mid-1970s, Heckman ran into the problem of selected samples while studying labor supply in the United States. This led him to propose what is now called the Heckman correction, also known as the Heckman two-step method or the Heckit method. The method is simple and widely applicable. It has been used extensively not only in microeconomics but also in empirical research across the other social sciences.
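The two steps are simple enough to sketch in plain Python. This version uses only the standard library, so the probit and least-squares routines are written out by hand rather than taken from a statistics package. The data-generating process is hypothetical. The wage is 1 + 0.5·educ + u, a woman works when 0.2 + 0.8·educ + z + v > 0, corr(u, v) = 0.7, and z affects employment but not wages. Step 1 fits a probit of the employment indicator on (1, educ, z). Step 2 regresses the observed wages of employed women on (1, educ, λ), where λ = φ(c)/Φ(c) is the inverse Mills ratio at the fitted probit index c.

```python
import math
import random


def norm_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)


def norm_cdf(c):
    # Phi(c) via the complementary error function, accurate in the lower tail
    return 0.5 * math.erfc(-c / math.sqrt(2.0))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x


def probit(zs, s, iters=25):
    """Step 1: probit MLE of the 0/1 outcome s on rows zs (Newton-Raphson)."""
    k = len(zs[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # minus the log-likelihood Hessian
        for zi, si in zip(zs, s):
            q = 2 * si - 1
            idx = dot(zi, g)
            lam = q * norm_pdf(q * idx) / norm_cdf(q * idx)
            w = lam * (lam + idx)
            for a in range(k):
                grad[a] += lam * zi[a]
                for b in range(k):
                    info[a][b] += w * zi[a] * zi[b]
        step = solve(info, grad)
        g = [gi + d for gi, d in zip(g, step)]
        if max(abs(d) for d in step) < 1e-10:
            break
    return g


def ols(xs, ys):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(xs[0])
    xtx = [[sum(r[a] * r[b] for r in xs) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(xs, ys)) for a in range(k)]
    return solve(xtx, xty)


def heckman_two_step(n=20_000, rho=0.7, beta=0.5, seed=2):
    rng = random.Random(seed)
    zs, s, emp_rows, wages = [], [], [], []
    for _ in range(n):
        educ, z, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        v = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        employed = 0.2 + 0.8 * educ + z + v > 0
        zs.append([1.0, educ, z])
        s.append(1 if employed else 0)
        if employed:  # wages are observed only for employed women
            emp_rows.append(zs[-1])
            wages.append(1.0 + beta * educ + u)
    gamma = probit(zs, s)  # step 1 on the full sample
    naive = ols([[1.0, r[1]] for r in emp_rows], wages)
    # Step 2: add the inverse Mills ratio phi(c)/Phi(c), c = z'gamma, as a regressor
    mills = [norm_pdf(dot(r, gamma)) / norm_cdf(dot(r, gamma)) for r in emp_rows]
    corrected = ols([[1.0, r[1], lam] for r, lam in zip(emp_rows, mills)], wages)
    return gamma, naive, corrected


gamma, naive, corrected = heckman_two_step()
print("probit gamma:        ", [round(g, 3) for g in gamma])
print("naive OLS slope:     ", round(naive[1], 3))
print("corrected slope:     ", round(corrected[1], 3))
print("coef. on Mills ratio:", round(corrected[2], 3))
```

In this sketch, the naive slope should fall well below the assumed 0.5. The corrected slope should land close to 0.5, and the coefficient on the Mills ratio should approximate ρσ_u = 0.7. The sketch reports point estimates only; a real analysis would normally rely on a statistics package, which also adjusts the second-step standard errors.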
In economics, well-known applications of Heckman's method include Lee's 1978 study of the effect of union membership on workers' relative wages. This problem involves self-selection, because whether a worker joins a union is a deliberate choice, not a random one. Many factors determine whether a worker joins a union, and some of them are unobservable. Another well-known application is the study by Willis and Rosen of how education raises wage income. Whether to pursue education is likewise a matter of self-selection.