How to understand the local average treatment effect (LATE) in instrumental variable regression

Source: Zhihu.

Change the basic principle zcw

Suppose you want to estimate the effect of military service on later earnings. Service is endogenous, so you look for an instrumental variable (IV): something that affects the decision to serve but does not directly affect income.

You then learn that during the Vietnam War the United States used a draft lottery to decide who would be conscripted. The lottery was based on date of birth. If your birthday was drawn, congratulations, the government required you to join the army. If it was not drawn, the government did not force you to serve.

Because the lottery is random, it does not directly affect income, but it does affect whether you serve, so it is a suitable instrumental variable.

But the lottery explains only part of the decision to serve. Imagine that there are four types of people in the world:

1. Staunch patriot (an "always-taker"): if drawn, he joins the army without hesitation; if not drawn, he finds a way to enlist anyway.

2. Staunch war opponent (a "never-taker"): if not drawn, he does not serve; if drawn, he would rather go to jail than serve.

3. Ordinary person (a "complier"): if drawn, he serves; if not drawn, he does not.

4. Madman (a "defier"): if drawn, he would rather go to jail than serve; if not drawn, he insists on enlisting no matter what.

So the effect of the lottery on whether someone serves is heterogeneous. In this case, what the IV estimator identifies is the LATE, the local average treatment effect.

For example, consider a person who was drawn and served. We cannot tell whether he is an ordinary person or a staunch patriot, because we do not know what he would have done had he not been drawn. Similarly, consider a person who was not drawn and did not serve. We cannot tell whether he is an ordinary person or a staunch war opponent, because we do not know what he would have done had he been drawn.
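The point about not being able to tell the types apart can be sketched in a few lines of Python. This is only an illustration: the dictionary `TYPES` and the helper `observed_pairs` are names made up here, with Z = 1 meaning "drawn" and D = 1 meaning "serves".

```python
# Each type maps the lottery outcome Z (1 = drawn, 0 = not drawn)
# to the service decision D (1 = serves, 0 = does not serve).
TYPES = {
    "patriot": {0: 1, 1: 1},
    "war opponent": {0: 0, 1: 0},
    "ordinary person": {0: 0, 1: 1},
    "madman": {0: 1, 1: 0},
}

def observed_pairs(types=TYPES):
    """Group the types by the (Z, D) pair an analyst actually observes."""
    seen = {}
    for name, decision in types.items():
        for z in (0, 1):
            seen.setdefault((z, decision[z]), []).append(name)
    return seen

for (z, d), names in sorted(observed_pairs().items()):
    print(f"Z={z}, D={d}: could be {names}")
```

Every observable (Z, D) cell contains two types, so no single observation reveals which type a person is.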

Let Y(1) be a person's income if he serves and Y(0) his income if he does not. Then, when not drawn, the incomes of the four types are:

1. Y(1)

2. Y(0)

3. Y(0)

4. Y(1)

When drawn, the incomes of the four types are:

1. Y(1)

2. Y(0)

3. Y(1)

4. Y(0)

In other words, being drawn does not correspond one-to-one to being treated (serving); the response is heterogeneous. If we now subtract income when not drawn from income when drawn, types 1 and 2 cancel out, because their income is the same either way.

So is what remains the treatment effect we are interested in? For type 3 it is Y(1) - Y(0), but for type 4 it is Y(0) - Y(1). If we average the two, weighting by the numbers of type 3 and type 4 people, the result can be anything: positive, negative, or zero, even if the average treatment effect for type 3 is truly positive.
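A small worked example makes the sign problem concrete. The shares and effects below are invented for illustration, and the function name `wald` is my own; it computes the difference in mean income between drawn and not drawn, the difference in service rates, and their ratio (the usual IV or Wald ratio), assuming the lottery is independent of type.

```python
from fractions import Fraction as F

def wald(shares, effects):
    """Return (reduced_form, first_stage, ratio) for a population of types.

    shares:  population share of each type
    effects: average Y(1) - Y(0) within each type
    """
    # Change in service D when moving from not drawn to drawn, by type.
    d_change = {"always": 0, "never": 0, "complier": 1, "defier": -1}
    reduced_form = sum(shares[t] * d_change[t] * effects[t] for t in shares)
    first_stage = sum(shares[t] * d_change[t] for t in shares)
    return reduced_form, first_stage, reduced_form / first_stage

# Hypothetical population: every type has a POSITIVE effect of service.
SHARES = {"always": F(3, 10), "never": F(3, 10),
          "complier": F(3, 10), "defier": F(1, 10)}
EFFECTS = {"always": 2, "never": 2, "complier": 1, "defier": 4}

reduced_form, first_stage, ratio = wald(SHARES, EFFECTS)
print(reduced_form, first_stage, ratio)  # -1/10 1/5 -1/2
```

Types 1 and 2 contribute nothing because their service decision does not change. The few madmen have a large effect that enters with a minus sign, so the ratio comes out negative even though service raises everyone's income in this made-up population.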

So let's assume there are no madmen.

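In other words, given a positive incentive, nobody becomes less likely to do something with the incentive than without it (this assumption is usually called monotonicity). So type 4 is gone, and the subtraction leaves only the Y(1) - Y(0) of type 3, scaled by the share of type 3 in the population; dividing by the lottery's effect on the service rate removes that scaling. This is indeed a treatment effect we are interested in, but it is the treatment effect of type 3 only, which is why it is "local": the LATE.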
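To see this at work, here is a minimal simulation sketch with no madmen. All numbers (type shares, a baseline income of 10, the effects 5 and 2, the noise) are assumptions chosen for illustration, and `simulate_wald` is a hypothetical helper, not an established routine.

```python
import random

def simulate_wald(n=200_000, seed=0):
    """Simulate a draft lottery without defiers and return the Wald estimate."""
    rng = random.Random(seed)
    # Hypothetical type shares and treatment effects Y(1) - Y(0).
    types = ["always", "never", "complier"]
    weights = [0.3, 0.3, 0.4]
    effect = {"always": 5.0, "never": 0.0, "complier": 2.0}

    # Per lottery outcome Z: [sum of Y, sum of D, count].
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for _ in range(n):
        t = rng.choices(types, weights)[0]
        z = rng.randint(0, 1)  # random lottery, independent of type
        d = 1 if t == "always" else 0 if t == "never" else z
        y = 10.0 + effect[t] * d + rng.gauss(0, 1)  # Y(0) = 10 plus noise
        sums[z][0] += y
        sums[z][1] += d
        sums[z][2] += 1

    mean_y = {z: s[0] / s[2] for z, s in sums.items()}
    mean_d = {z: s[1] / s[2] for z, s in sums.items()}
    reduced_form = mean_y[1] - mean_y[0]  # effect of the lottery on income
    first_stage = mean_d[1] - mean_d[0]   # effect of the lottery on service
    return reduced_form / first_stage

print(round(simulate_wald(), 2))  # should land close to 2.0
```

The estimate should come out near 2.0, the effect for ordinary people, and not near the population average effect in this setup (0.3 * 5 + 0.3 * 0 + 0.4 * 2 = 2.3). The patriots' larger effect never shows up, because the lottery does not change what they do.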