Instrumental variable estimation with many endogenous variables and dummy variables: how should they be entered?
In econometrics, if we have a large amount of high-quality data and none of the classical assumptions is violated, the estimated parameters are unbiased and, in large samples, consistent. Let's look at the classical assumptions:
OLS1: The model is linear in the parameters to be estimated.
OLS2: The data are a random sample. For ordinary cross-sectional data, the observations are independent and identically distributed.
OLS3: E(u|X) = 0. This is the exogeneity (no endogeneity) assumption.
OLS4: There is no perfect multicollinearity among the X variables.
OLS5: Var(u|X) = σ^2 (σ is a constant).
OLS6: The error term is independently and identically normally distributed.
Of these, OLS1 to OLS4 ensure that the estimated parameters are consistent. The third assumption is the exogeneity assumption.
In practice: in econometrics we want to estimate a partial effect, that is, the effect of an independent variable on the dependent variable. If the independent variable is unrelated to the random error, the OLS estimates are consistent and the estimate can be called good. But reality is usually different. Variables are generally endogenous, meaning that the two variables are not determined in one direction only but determine each other. So in general, whenever there is measurement error or an omitted variable, there may be an endogeneity problem, and we cannot obtain consistent estimates.
Proxy variables and instrumental variables: What is a proxy variable? It is a remedy for omitted variables. Suppose the equation is: y = b0 + b1*x1 + ... + bn*xn + u.
If the variables x in this equation are unrelated to the random error, or we can tolerate some degree of correlation, then the OLS estimates of the parameters are satisfactory. But suppose we know that some variable q contained in u is related to x and has been omitted. If we can find a variable that is related to the omitted variable q in u, but is unrelated to x, then we can add that variable to the equation in place of the omitted one and run the regression. Suppose we find a variable z, or a set of variables z1, ..., zk, that reflects q to some extent. We can then put z into the equation and run OLS. The resulting parameter estimates are better than the original ones. But there is a problem: z is never q itself, so it cannot represent q completely. The estimated parameters are therefore still somewhat inconsistent, although they are better than the estimates obtained without z, and in some cases we can tell whether they are overestimates or underestimates.
To see this, write q = a0 + a1*x1 + a2*x2 + ... + an*xn + c1*z1 + c2*z2 + ... + ck*zk. Substitute this into the original equation with q included (y = b0 + b1*x1 + ... + bn*xn + c*q + u). The coefficient on xi is then estimated as bi + c*ai rather than bi, so this estimate is also biased.
The bias of the parameter estimate depends on two factors. First, the relationship between the omitted variable q and z, that is, whether their covariance is positive or negative. Second, the relationship between q and y.
If cov(q, z) > 0 and cov(q, y) > 0, the estimate is biased upward.
If cov(q, z) > 0 and cov(q, y) < 0, the estimate is biased downward.
[...]
An instrumental variable z for an endogenous variable xi must satisfy two conditions: 1. cov(z, xi) ≠ 0; 2. cov(z, u) = 0. When these two conditions are met, we can estimate in two steps. First, regress xi on the other variables in X (excluding xi) and the set of instrumental variables (there need not be exactly one instrument; there may be more than ten, so the instruments can form a set), and obtain the fitted values of xi from this regression.
Then regress y on X, with xi replaced by the fitted values just obtained. The estimates from this regression are consistent.
Now consider the latent variable problem: how can latent variables be handled with instrumental variables? Generally speaking, the latent variable problem can be handled with the proxy variables described above, but the results are biased and inconsistent. That is still better than doing nothing, but if conditions permit, we can use instrumental variables to get a better result than proxy variables give. The condition is this: if the latent variable q cannot be measured accurately, or there is no accepted standard for measuring it, we can use other indicators related to q as instruments. There must be two related, measurable indicators, and the measurement errors of these two indicators must not be correlated with each other. Putting either indicator into the equation gives a regression model with measurement error, and the problem is then solved in the same way as a measurement error problem. Let q1 and q2 be the two indicators:
1. Regress q1 on x and q2 and obtain the fitted values of q1.
2. Regress y on x and the fitted q1.
This gives a consistent estimator.
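The two-step procedure above can be sketched as a small simulation. This is only a minimal sketch under stated assumptions, not a definitive implementation: the data-generating process, the coefficient values (true b1 = 2.0 on x, c = 1.5 on q), the sample size, and the helper names `ols`, `two_stage_least_squares`, and `simulate` are all made up for illustration. It compares three estimates of the coefficient on x: OLS with q omitted, OLS with the indicator q1 used as a proxy, and two-stage least squares with q2 as the instrument for q1.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, exog, endog, instruments):
    """Manual 2SLS with a single endogenous regressor.

    exog:        (n, k) exogenous regressors, including the constant
    endog:       (n,)   endogenous regressor
    instruments: (n, m) excluded instruments
    Returns coefficients ordered as [exog columns..., endog].
    """
    # Stage 1: regress the endogenous variable on all exogenous
    # regressors plus the instruments, and keep the fitted values.
    Z = np.column_stack([exog, instruments])
    endog_hat = Z @ ols(endog, Z)
    # Stage 2: regress y on the exogenous regressors and the fitted values.
    return ols(y, np.column_stack([exog, endog_hat]))


def simulate(n=100_000, seed=0):
    """Hypothetical data: q is latent, q1 and q2 are noisy indicators of q."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n)                    # latent (unobserved) variable
    x = 0.8 * q + rng.normal(size=n)          # regressor correlated with q
    u = rng.normal(size=n)                    # structural error
    y = 1.0 + 2.0 * x + 1.5 * q + u           # true coefficient on x is 2.0
    q1 = q + rng.normal(size=n)               # indicator 1 of q
    q2 = q + rng.normal(size=n)               # indicator 2, independent error

    const = np.ones(n)
    exog = np.column_stack([const, x])
    b_omit = ols(y, exog)[1]                               # q left in the error
    b_proxy = ols(y, np.column_stack([exog, q1]))[1]       # q1 as a proxy
    b_iv = two_stage_least_squares(y, exog, q1, q2.reshape(-1, 1))[1]
    return b_omit, b_proxy, b_iv


if __name__ == "__main__":
    b_omit, b_proxy, b_iv = simulate()
    print(f"omitted q: {b_omit:.3f}  proxy q1: {b_proxy:.3f}  2SLS: {b_iv:.3f}")
```

In this setup the estimate with q omitted is the furthest from 2.0, the proxy estimate is closer but still off, and the 2SLS estimate is close to 2.0. Running the second stage by hand like this gives the right coefficients but not the right standard errors, so for inference a dedicated IV routine is the safer choice.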