  • Question about weird instrument result with xtivreg

    I am using a measure of immigration inflow (typically called the shift-share instrument) to predict population by municipality, and then using population to predict housing prices. Immigrants are divided into two ancestry groups, western and non-western, and the instrument can be constructed from either group. I am using the Stata command xtivreg. The first stage reports that the non-western version is a much better instrument by R-squared, yet the coefficients reported in the instrumental regression are all the same. Shouldn't a worse instrument produce a lower coefficient? In the attached images, I have first the non-western first stage, then the non-western instrumental regression, and then the same two for western.

    [Attached images: ikkevestlig 1.PNG (non-western first stage), ikkevestlig 2.PNG (non-western instrumental regression), Vestlig 1.PNG (western first stage), Vestlig 2.PNG (western instrumental regression)]

  • #2
    There are no results I know of that suggest the magnitude of the coefficient in the second stage depends on the R-squared from the first stage. Having said that, everything is identical -- including the first stage R-squareds -- which makes me think you're actually using the same instrument. What's the correlation between the instruments?

    These IVs are so strong I bet you'll get a similar answer if you don't instrument. Also, you should cluster your standard errors by omr.

    • #3
      In reply to Jeff Wooldridge (#2):
      Thanks for the answer.
      The correlation between the two instruments is .8182686 (the coefficient from a simple regression of one instrument on the other), and the R-squared is 0.6917. So it is quite high, but not high enough that the results should be identical when using the two different instruments. The shift-share I'm using is the same one as the first equation in the paper "Shift-Share Instruments and the Impact of Immigration".
      Last edited by Carl Kier; 20 Nov 2023, 08:26.

      • #4
        The chance that 18 estimates would agree to 7 decimal places is essentially zero, unless the variables are defined in a way that makes them equivalent. I don't know why the IVs give identical results, but you don't have two different sets of estimates. You have one, and now you need to figure out why. Is one instrument equal to the other plus a term that varies only by time, or only by cross section? The simple regression doesn't tell you what you want. You need to use two-way fixed effects, just as when you estimate the model. What happens when you do the following?

        Code:
        xtreg z1 z2 i.year, fe
        where z1 and z2 are the different instruments? I wouldn't be surprised if this gives a perfect fit.

        • #5
          In reply to Jeff Wooldridge (#4):
          Yes, that regression does give a coefficient of 1, indicating that they are the same. After some fiddling around, I have found a clue as to why. With both instruments, I'm taking log because I'm using them to instrument for the log of population. When logged, the regression you gave gives a coefficient of 1, but this is not the case when I use the non-log version of the instruments. Indeed, in that case the coefficient is only 0.185. Why this is the case I don't know though. Is there some kind of problem with using log with instruments?

          • #6
            They aren't the same instrument, but the difference between them varies only over time or only across units. There's nothing wrong with the log; it's preferred for the reason you said. If z1(i,t) = z2(i,t)*q(t), where q(t) is a function only of time, then log(z1(i,t)) = log(z2(i,t)) + log(q(t)), and the second term is absorbed by the time fixed effects. Then the IV estimates are identical. Is something like that going on? Do they differ only by some time-varying multiplicative term? Even if you have z1(i,t) = z2(i,t)*q(t)*r(i), the same result holds, because log(r(i)) is absorbed by the unit fixed effects.
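
            The argument can be written out as a short sketch. This assumes a single endogenous regressor x, a single instrument, and strictly positive z1, z2, q and r so that the logs exist. The tilde notation is introduced here only for illustration: it denotes the residual after regressing a variable on the unit dummies, the time dummies, and any other exogenous regressors.

```latex
\begin{align*}
\log z_{1,it} &= \log z_{2,it} + \log q_t + \log r_i \\[4pt]
\widetilde{\log z_{1}}_{,it}
  &= \widetilde{\log z_{2}}_{,it}
   + \underbrace{\widetilde{\log q_t}}_{=\,0}
   + \underbrace{\widetilde{\log r_i}}_{=\,0}
   = \widetilde{\log z_{2}}_{,it} \\[4pt]
\hat\beta_{IV}
  &= \frac{\sum_{i,t} \tilde z_{it}\,\tilde y_{it}}
          {\sum_{i,t} \tilde z_{it}\,\tilde x_{it}},
  \qquad \tilde z_{it} \in
  \bigl\{\widetilde{\log z_{1}}_{,it},\ \widetilde{\log z_{2}}_{,it}\bigr\}
\end{align*}
```

            The two middle terms vanish because log q(t) is an exact linear combination of the time dummies and log r(i) is an exact linear combination of the unit dummies. Since the residualized instrument is the same in both cases, the fixed effects IV estimate is the same as well.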

            • #7
              In reply to Jeff Wooldridge (#6):
              This explains it. They differ both across time and cross-sectionally, but due to the way the instruments are constructed, the equation z1(i,t) = z2(i,t)*q(t)r(i) should hold. When using a non-log specification, it is clear that the instrument using non-western immigrants is a far stronger instrument. So presumably the instrument using western immigrants is only strong in the log specification because it can be related to the instrument using non-western immigrants with that equation in a fixed effects specification with time dummies. Is there some kind of problem with this? Or should I just go ahead and use either one of them (I'm inclined to use the instrument using non-western immigrants)?

              I do have another question. If the two instruments produce the same estimates, why does Stata report a different between and overall R-squared in the first stage with the two different instruments?

              • #8
                The nice thing about using the log is that you don't have to choose between two IVs. There's only one IV that's relevant if you're using fixed effects, and it doesn't matter what you call it. The larger R-squared with one IV is a mirage: it doesn't matter one bit. One IV is getting credit because it has extra variation (across time, I'm guessing). But it's all irrelevant. I do wonder why these two definitions differ only by factors that don't change by (i,t). That's unexpected to me.
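
                A small simulation can illustrate the point. This is only a sketch on made-up data: every variable name (id, year, lz1, lz2, lpop, lprice) and every number is hypothetical and has nothing to do with the actual dataset in this thread. It assumes a Stata version in which xtivreg accepts factor variables such as i.year with the fe estimator; if yours does not, create the year dummies by hand.

```stata
* Simulated panel: 50 units, 10 years (all names and numbers are hypothetical)
clear
set seed 12345
set obs 50
gen id = _n
gen double r = exp(rnormal())          // r(i): varies only across units
expand 10
bysort id: gen year = 1996 + _n        // years 1997-2006 within each unit
xtset id year

gen double q   = exp(sin(year))        // q(t): varies only over time
gen double z2  = exp(rnormal())        // instrument 2: varies over (i,t)
gen double z1  = z2*q*r                // instrument 1 = z2 * q(t) * r(i)
gen double lz1 = ln(z1)
gen double lz2 = ln(z2)

gen double u      = rnormal()          // common shock, makes lpop endogenous
gen double lpop   = 0.5*lz2 + u + rnormal()
gen double lprice = 1.0*lpop + u + rnormal()

* First stage with each instrument, two-way fixed effects
xtreg lpop lz1 i.year, fe
scalar w1 = e(r2_w)
scalar o1 = e(r2_o)
xtreg lpop lz2 i.year, fe
scalar w2 = e(r2_w)
scalar o2 = e(r2_o)

* Fixed effects IV with each instrument
xtivreg lprice i.year (lpop = lz1), fe
scalar b1 = _b[lpop]
xtivreg lprice i.year (lpop = lz2), fe
scalar b2 = _b[lpop]

display "within R2:  " w1 "  " w2
display "overall R2: " o1 "  " o2
display "IV coef:    " b1 "  " b2
```

                The within R-squared and the IV coefficient should agree across the two instruments up to numerical precision. The overall (and between) R-squared can differ, because Stata computes them from fitted values that exclude the estimated unit effects, and the r(i) component of lz1 is not absorbed there.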

                • #9
                  In reply to Jeff Wooldridge (#8):
                  I will just briefly explain how the instrument is constructed. This is a variant of a commonly used instrument called the Shift-Share instrument.
                  [Image: Instrument.PNG (formula for the instrument)]

                  The first division is the number of immigrants and descendants M of origin o in municipality i in the base year, which in my case is 1997, divided by the total number of immigrants and descendants of origin o in the entire country in the base year. This is called the share. The second division is the sum of residence permits granted to people of origin o from 1997 up to time t (in the entire country), divided by the population in municipality i in the previous year. The instrument is a weighted average of the inflow rates from each origin (the “shift”), with weights that depend on the distribution of earlier immigrants in the base year (the “shares”).
                  Usually this instrument is constructed as a sum of the different instruments for each origin, but I found that this did not work as well as simply using one origin (though it still worked pretty well: the within R-squared is 0.7808 in the first stage).
                  Then it is clear that z1(i,t) = z2(i,t)*q(t)r(i) should hold. For a given origin, the first division differs only across i. The sum of residence permits differs only across t (as it is a national-level measure). L_it-1 differs across both time and municipality but is of course the same for both instruments, so it cancels in the ratio. So the equation should hold.
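
                  Written out, the construction looks roughly as follows. This is reconstructed from the verbal description above rather than copied from the image. M and L follow the notation in the text; P for the residence permits and the superscripts W and N for the western and non-western versions are symbols assumed here for illustration.

```latex
\[
z^{o}_{it}
  \;=\;
  \frac{M_{o,i,1997}}{M_{o,1997}}
  \times
  \frac{\sum_{s=1997}^{t} P_{o,s}}{L_{i,t-1}}
\]
\[
\frac{z^{W}_{it}}{z^{N}_{it}}
  \;=\;
  \underbrace{\frac{M_{W,i,1997}/M_{W,1997}}{M_{N,i,1997}/M_{N,1997}}}_{r(i)}
  \times
  \underbrace{\frac{\sum_{s=1997}^{t} P_{W,s}}{\sum_{s=1997}^{t} P_{N,s}}}_{q(t)}
\]
```

                  The population term L_{i,t-1} cancels in the ratio, so the two versions differ only by a municipality-specific factor times a time-specific factor, provided the shares and permit sums are nonzero.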
