  • Two periods panel analysis issue

    Dear all,

    I have a simple question but haven't found a satisfactory answer to it:
    I want to establish causality between Y and X, where Y is the productivity of a city, measured by average wages, and X is average education in that city. X is certainly endogenous, so OLS doesn't work.
    I'm thinking of two methods. The first is a model with city fixed effects and time fixed effects, which eliminates time-invariant omitted variables and controls for common time trends, but endogeneity caused by time-varying unobservables remains. The second is IV. Since my instrument is a one-period variable, it doesn't fit the two-period panel directly, so I plan to instrument for the change in average education: something that happened in the past would affect later changes in average education but would not affect changes in productivity directly. The first step of this method is to take first differences to obtain the changes in Y and X over time, which turns the panel into a cross-section; then I apply the IV I proposed.

    My question is, which method is better and why?

    Thanks a lot, this is my first post.

    Best, Jack
    Last edited by Jack Qi; 10 Jun 2017, 05:00.
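
    The two options can be written down compactly, as a sketch. The symbols (\alpha_c for the city effect, \lambda_t for the period effect, u_{ct} for the time-varying error, Z_c for the one-period instrument) are notation assumed here, not taken from the post.

    ```latex
    \begin{align*}
    % levels model with city and period effects, two periods
    Y_{ct} &= \alpha_c + \lambda_t + \beta X_{ct} + u_{ct}, \qquad t = 1, 2 \\
    % first differencing removes the city effect \alpha_c
    \Delta Y_c &= \delta + \beta\,\Delta X_c + \Delta u_c, \qquad \delta = \lambda_2 - \lambda_1 \\
    % conditions the instrument must satisfy in the differenced equation
    \operatorname{Cov}(Z_c, \Delta X_c) &\neq 0, \qquad \operatorname{Cov}(Z_c, \Delta u_c) = 0
    \end{align*}
    ```

    In this notation the fixed-effects estimate is consistent only if \Delta X_c is uncorrelated with \Delta u_c, whereas IV on the differenced equation replaces that requirement with the two conditions on Z_c.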

  • #2
    Jack:
    welcome to the list.
    I would go first difference and then IV.
    Two remarks:
    - if your data are at the city level and you try to draw inferences at the individual level, your results will suffer from the ecological fallacy (https://en.wikipedia.org/wiki/Ecological_fallacy);
    - when data are collected at the individual level, the endogeneity in your regression model is caused by the omitted predictor -ability- (which lurks within the residuals and is correlated with both the dependent variable and the vector of regressors). A suggested instrument, when data are collected at the individual level, is -proximity to college- (see: http://www.stata.com/bookstore/micro...ata/index.html, p. 178)
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hi Carlo,
      Thanks for your quick reply. I have both individual- and city-level data. The individual data are used to create the aggregated data as follows: first, I run OLS of log wage on individual characteristics with city dummies:
      regress lnw ibn.city i.female edu_yr age age2, hascons vce(cluster city)

      This gives me an intercept for every city; these are the "adjusted averages" for each city, controlling for individual background.
      Then I regress this average wage on city-level average education, which is obtained either from the individual survey data or from the national census.

      What you have suggested seems to be related to my question, but it is not clear to me how. My first OLS regression is certainly endogenous, but I am not interested in any of its coefficients, only in the city intercepts, where I guess "ability" is still involved...

      Back to my original question, which maybe was not very clear: in a two-period panel, is the IV model of changes (growth) better than the city and year FE model? I use IV in the FD model (I'm still looking for such an IV, though).

      * assumes the city-level data are xtset with consecutive period codes (e.g. year = 1, 2)
      xtset city year
      ivregress 2sls D.lnw (D.edu = Z)
      vs.
      regress lnw edu i.city i.year

      The estimate from the first model is much larger than the one from the second. I am wondering whether I should worry about the time trend in a two-period panel...

      Best, Jack
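
      One point that may help with the comparison: in a balanced two-period panel, OLS with city and year effects and OLS on first differences with a constant give the same coefficient on education. This suggests that the gap between the two estimates comes from instrumenting rather than from differencing. A minimal sketch on simulated data (do-file syntax; all variable names, coefficients and the sample size are hypothetical):

      ```stata
      * Simulated two-period city panel; every number here is made up.
      clear
      set seed 12345
      set obs 200
      generate city = _n
      * time-invariant city effect
      generate double a = rnormal()
      * two rows per city, periods coded 1 and 2
      expand 2
      bysort city: generate year = _n
      generate double edu = 10 + a + 0.5*year + rnormal()
      generate double lnw = 1 + 0.08*edu + 2*a + 0.1*year + rnormal(0, 0.2)
      xtset city year

      * City and year fixed effects on the levels data
      xtreg lnw edu i.year, fe
      scalar b_fe = _b[edu]

      * OLS on first differences; the constant absorbs the common period effect
      regress D.lnw D.edu
      scalar b_fd = _b[D.edu]

      * The two coefficients coincide up to numerical precision
      display "FE: " b_fe "   FD: " b_fd
      assert reldif(b_fe, b_fd) < 1e-8
      ```

      The simulation has no time-varying confounder, so it says nothing about which estimate is closer to the truth in real data; it only shows that the FE and FD point estimates are the same object when T = 2.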


      • #4
        Jack:
        you may be better off posting what you typed and what Stata gave you back (as per the FAQ).
        As far as your last question is concerned, the fixed-effects specification (assuming that it's the right one for your dataset) removes only time-invariant observed and unobserved heterogeneity. Hence, it's up to you to decide how to deal with time-varying heterogeneity (if any).
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Hi all,

          I have a very similar problem. In essence I have a panel data set (more than two time periods) with an endogenous regressor. I have a time-fixed instrument that predicts the cross-section of the endogenous variable very well in each time period.
          I would like to use the panel characteristics of my data (i.e. remove time-invariant individual effects) AND the instrument.
          @ Jack: If you proceed as you did, you are implicitly assuming that your instrument has a different effect in each time period; more precisely, an effect that is linear in time:
          1. Endogenous regressor in year (t+1) = constant + coefficient*(t+1)*instrument + error in year (t+1)
          2. Endogenous regressor in year (t) = constant + coefficient*(t)*instrument + error in year (t)
          Subtract 2 from 1 and you are left with exactly the first-stage regression you described in your post (regressing the difference of the endogenous variable on the instrument). This interpretation is not limited to two time periods: with three time periods, first differencing leaves you with a panel of two periods.
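
          Written out, the subtraction looks as follows. The symbols (x for the endogenous regressor, z_i for the time-invariant instrument, c for the constant, \pi for the coefficient, e for the error) are shorthand assumed here.

          ```latex
          \begin{align*}
          x_{i,t+1} &= c + \pi\,(t+1)\,z_i + e_{i,t+1} \\
          x_{i,t}   &= c + \pi\,t\,z_i + e_{i,t} \\
          % subtracting the second line from the first
          \Delta x_{i,t+1} &= \pi\,z_i + \Delta e_{i,t+1}
          \end{align*}
          ```

          By contrast, if the instrument entered with a coefficient that is constant over time, x_{i,t} = c + \pi z_i + e_{i,t}, differencing would give \Delta x_{i,t+1} = \Delta e_{i,t+1}: the instrument would drop out and have no first-stage power in the differenced equation.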
          So my approach to the problem would be:
          1. Multiply your instrument by each of the years in the time dimension of your panel. That way you create a "time-varying" instrument.
          2. Take first differences of the instruments, all the regressors (including the endogenous ones) and the dependent variable.
          3. You are left with a panel data set whose time dimension is one period shorter and in which all observations are first differences.
          4. As a first step you could now run a simple reduced-form pooled OLS regression of the difference in the dependent variable on the difference of the now time-varying instrument.
            1. At least here you have a clear interpretation of what you are doing.
          5. However, I am still not sure how to implement an "interpretable" 2SLS with that data set.
          My Questions:
          1. Is the above approach really correct?
          2. To incorporate an IV approach: would it make sense to implement only step 1 from above and then proceed with "xtivreg2, fe" (so that I have one instrument for each endogenous variable IN EACH PERIOD)?
            1. I am asking because xtivreg2 uses the classical fixed-effects estimator (demeaned variables instead of differenced variables), and that would change the above interpretation of a linear trend.
          Hope this helped.
          Best,
          Konstantin
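
          Steps 1 to 4 of the approach above can be sketched in Stata on simulated data. Everything in the block is an assumption made for illustration: the variable names (id, year, y, x, z, zt), the coefficients and the sample size. The final ivregress line is only one possible way to run 2SLS on the differenced data and is not meant to settle the questions raised in the post.

          ```stata
          * Simulated three-period panel; all names and numbers are hypothetical.
          clear
          set seed 2017
          set obs 300
          generate id = _n
          * time-invariant instrument and unit effect
          generate double z = rnormal()
          generate double a = rnormal()
          expand 3
          bysort id: generate year = _n
          xtset id year
          * time-varying confounder that makes x endogenous even after differencing
          generate double u = rnormal()
          generate double x = a + 0.5*year*z + u + rnormal()
          generate double y = 2*a + x - u + rnormal()

          * Step 1: interact the instrument with time
          generate double zt = z*year

          * Steps 2-3: the D. operator differences within panel; the first period drops out
          * Step 4: reduced form, pooled OLS on the differenced data
          regress D.y D.zt, vce(cluster id)

          * One possible 2SLS on the differenced data
          ivregress 2sls D.y (D.x = D.zt), vce(cluster id)
          ```

          With consecutive year codes, D.zt equals z up to rounding, so under a linear interaction the differenced "time-varying" instrument is just the original instrument. That matches the first-stage interpretation given for the two-period case.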
