  • Low within-unit variation in independent variable – does using 2-year change help?

    Hello all,

    I’m working with a balanced panel dataset and am interested in estimating the effect of changes in an independent variable (X) on a dependent variable (Y).

    However, the independent variable in levels appears to have low within-unit variation, as shown below:

    Code:
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    X        overall |  .4074      .0444         .2675     .5905  |     N = 1152
             between |             .0435         .2800     .5768  |     n = 192
             within  |             .0092         .3706     .4657  |     T = 6

    To address this, I constructed a 2-year change variable (the difference between X's current value and its value two years earlier) and obtained the following:

    Code:
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    X_change2 overall|  .0057      .0105        -.0266     .0702  |     N = 1152
              between|             .0050        -.0102     .0219  |     n = 192
              within |             .0092        -.0232     .0593  |     T = 6

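    As you can see, within-unit variation accounts for a much larger share of the total variation in the change variable (a within SD of .0092 out of an overall SD of .0105) than in the level variable (.0092 out of .0444).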
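
    A minimal sketch of how such a variable could be built and summarized in Stata. The original data are not shown, so this assumes the Grunfeld data from #2 as a stand-in, with mvalue playing the role of X; the name mvalue_ch2 is hypothetical. The S2. operator gives the 2-year difference \(x_t - x_{t-2}\) (D2. would give the second difference, a different quantity), and the first two years of each panel become missing. The loop prints the within share of total variation, r(sd_w)/r(sd) from xtsum, for the level and for the change.

    ```stata
    * Sketch: 2-year change and its within/between decomposition
    * (Grunfeld data as a stand-in; mvalue plays the role of X)
    webuse grunfeld, clear
    xtset company year

    * S2. is the lag-2 seasonal difference: mvalue - L2.mvalue
    * (D2. would be the second difference, which is not what we want)
    generate mvalue_ch2 = S2.mvalue

    * Within SD as a share of overall SD, for levels and for 2-year changes
    foreach v in mvalue mvalue_ch2 {
        quietly xtsum `v'
        display "`v': within SD / overall SD = " %5.3f r(sd_w)/r(sd)
    }
    ```

    A higher share means that more of the variable's variation is left for a fixed-effects model to use; it does not by itself say the absolute within variation is larger.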

    Since my goal is to assess whether changes in X over time influence Y, I’m considering using the 2-year change variable in a fixed- or random-effects model.

    So, my question is: Does using the 2-year change variable adequately address the problem of low within-unit variation in X?

    Many thanks!

    Eran


  • #2
    In macroeconomic panels, for example, researchers often use 5-year or 10-year averages of slow-moving variables such as population in fixed-effects regressions as a robustness check. Such variables have limited within variation, which can make their coefficients imprecise or unstable. Taking multi-year averages can highlight more meaningful medium- or long-term trends, which I think is what you have in mind. In your case, the \(T\) dimension is small (6 years), so averaging would leave only a few periods per unit and the benefit may be marginal. Here is an illustration using the Grunfeld dataset:

    Code:
    webuse grunfeld, clear
    xtreg invest mvalue kstock, absorb(year) fe
    
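    sort year
    *CREATE PERIOD IDENTIFIERS (ceil() puts 1935 alone in period 0; 1936-1940 is 1, ..., 1951-1954 is 4)
    gen period= ceil((year-year[1])/5)
    *GENERATE PERIOD AVERAGES (collapse computes means by default)
    collapse invest mvalue kstock, by(company period)
    xtset company period
    xtreg invest mvalue kstock, absorb(period) fe
    Results: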

    Code:
    . xtreg invest mvalue kstock, absorb(year) fe
    
    Halperin APM for regression coefficients:
    
    Dependent variable:
    Iteration 1:  Maximum absolute difference = 1.072e-12
    
    Independent variables:
    Iteration 1:  Maximum absolute difference = 1.599e-12
    
    Halperin APM for panel effects:
    Iteration 1:  Maximum absolute difference =       153
    Iteration 2:  Maximum absolute difference = 4.263e-14
    
    Fixed-effects (within) regression               Number of obs     =        200
    Group variable: company                         Number of groups  =         10
    
    R-squared:                                      Obs per group:
         Within  = 0.7985                                         min =         20
         Between = 0.8143                                         avg =       20.0
         Overall = 0.8025                                         max =         20
    
                                                    F(2, 169)         =     217.44
    corr(u_i, Xb) = -0.3186                         Prob > F          =     0.0000
    
    --------------------------
    Absorbed variable | Levels
    ------------------+-------
                 year |     20
    --------------------------
    ------------------------------------------------------------------------------
          invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1177158   .0137513     8.56   0.000     .0905694    .1448623
          kstock |   .3579163    .022719    15.75   0.000     .3130667    .4027659
           _cons |  -80.16378   14.84402    -5.40   0.000    -109.4674   -50.86019
    -------------+----------------------------------------------------------------
         sigma_u |  91.798268
         sigma_e |  51.724523
             rho |  .75902159   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(9, 169) = 54.14                     Prob > F = 0.0000
    
    . 
    . collapse invest mvalue kstock, by(company period)
    
    . 
    . xtset company period
    
    Panel variable: company (strongly balanced)
     Time variable: period, 0 to 4
             Delta: 1 unit
    
    . 
    . xtreg invest mvalue kstock, absorb(period) fe
    
    Halperin APM for regression coefficients:
    
    Dependent variable:
    Iteration 1:  Maximum absolute difference = 1.615e-13
    
    Independent variables:
    Iteration 1:  Maximum absolute difference = 8.886e-13
    
    Halperin APM for panel effects:
    Iteration 1:  Maximum absolute difference =     159.9
    Iteration 2:  Maximum absolute difference = 4.263e-15
    
    Fixed-effects (within) regression               Number of obs     =         50
    Group variable: company                         Number of groups  =         10
    
    R-squared:                                      Obs per group:
         Within  = 0.8648                                         min =          5
         Between = 0.8055                                         avg =        5.0
         Overall = 0.8133                                         max =          5
    
                                                    F(2, 34)          =      65.98
    corr(u_i, Xb) = -0.0427                         Prob > F          =     0.0000
    
    --------------------------
    Absorbed variable | Levels
    ------------------+-------
               period |      5
    --------------------------
    ------------------------------------------------------------------------------
          invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |   .0973246   .0337442     2.88   0.007     .0287481    .1659011
          kstock |   .3498738   .0475963     7.35   0.000     .2531466    .4466011
           _cons |  -50.50475   30.16873    -1.67   0.103     -111.815     10.8055
    -------------+----------------------------------------------------------------
         sigma_u |  83.641639
         sigma_e |  44.796401
             rho |   .7770968   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(9, 34) = 15.43                      Prob > F = 0.0000
    
    .
    Comparing the annual estimates with those from the 5-year averages, the coefficient on capital stock (kstock) barely changes (from 0.358 to 0.350), suggesting that the association between kstock and investment (invest) is stable and robust across time frequencies. The coefficient on market valuation (mvalue), however, drops from 0.118 to 0.097, a reduction of about 17%, which suggests that the association between mvalue and invest is more sensitive to short-run variation. Note also that the standard errors are roughly 2 to 2.5 times larger in the averaged regression, where the sample shrinks from 200 to 50 observations.
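
    One detail of the code above: because of ceil(), 1935 forms a one-year period 0 and 1951-1954 a four-year period 4, so the blocks are not all five years long. If equal-length blocks are preferred, here is a sketch using floor() that yields four 5-year periods (1935-1939 through 1950-1954). It uses i.year and i.period dummies instead of absorb(), so it does not depend on that option being available, and estimates table to line up the coefficients. Its estimates will differ somewhat from those reported above, since the periods are defined differently.

    ```stata
    * Sketch: equal-length 5-year blocks (1935-39, 1940-44, 1945-49, 1950-54)
    webuse grunfeld, clear
    xtset company year

    * floor() maps 1935-1939 to 0, 1940-1944 to 1, and so on
    quietly summarize year
    generate period = floor((year - r(min))/5)

    * Annual two-way FE: company effects via fe, year effects via dummies
    quietly xtreg invest mvalue kstock i.year, fe
    estimates store annual

    * Average each company's data within each block
    collapse (mean) invest mvalue kstock, by(company period)
    xtset company period
    quietly xtreg invest mvalue kstock i.period, fe
    estimates store fiveyr

    * Coefficients and standard errors side by side
    estimates table annual fiveyr, keep(mvalue kstock) b se
    ```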




    • #3
      Thank you, Andrew Musau! So I understand that fixed effects will be problematic here. Will random effects be more reliable?



      • #4
        Originally posted by Eran Itskovich
        Thank you, Andrew Musau! So I understand that fixed effects will be problematic here.
        Not at all. As long as there is some within-unit variation, fixed effects can still estimate the effect; less within variation mainly translates into less precise estimates (larger standard errors). In my illustration in #2, the results remain robust despite a slight decrease in the magnitude of the coefficient on mvalue. We can still conclude that larger companies (i.e., those with higher market valuations) tend to attract higher levels of investment. Your primary analysis should use annual data, while the analysis using multi-year averages should serve as a robustness check.
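
        To make the precision point concrete, here is the textbook case of a single regressor with homoskedastic, serially uncorrelated errors in a balanced panel (a simplifying assumption, not a description of the data in this thread). \(\bar x_i\) and \(\bar y_i\) are unit \(i\)'s means over its \(T\) periods, \(N\) is the number of units, \(\sigma_\varepsilon^2\) is the idiosyncratic error variance, and \(s_w\) is the within standard deviation that xtsum reports:

        ```latex
        \hat\beta_{FE}=\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i)}{\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2},
        \qquad
        \operatorname{Var}\!\left(\hat\beta_{FE}\mid X\right)=\frac{\sigma_\varepsilon^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2}=\frac{\sigma_\varepsilon^2}{(NT-1)\,s_w^2}
        ```

        The last equality uses the fact that xtsum's within SD is computed from \(x_{it}-\bar x_i+\bar{\bar x}\) over all \(NT\) observations. The standard error therefore scales with \(1/s_w\): small within variation widens confidence intervals, but it does not by itself bias the within estimator.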

        Will random effects be more reliable?
        Random effects rely on a strong assumption: that the unobserved individual effects are uncorrelated with the explanatory (right-hand-side) variables. This assumption is often violated in observational data, so unless you are working with experimental data, it is safer to assume that it does not hold. An alternative is correlated random effects (also known as the Mundlak approach), which adds the unit-level means of the time-varying regressors to a random-effects model. You can implement this using the -cre- option in xtreg.

        Code:
        help xtreg
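
        The Mundlak idea can also be done by hand, which makes each step visible. A sketch on the Grunfeld data, assuming one-way effects (year effects are left out for simplicity); the mean_* and b_* names are hypothetical. In a balanced panel like this one, the coefficients on mvalue and kstock should reproduce the fixed-effects estimates, and a joint test on the mean terms is a common check of the plain random-effects assumption.

        ```stata
        * Sketch: correlated random effects (Mundlak) by hand, Grunfeld data
        webuse grunfeld, clear
        xtset company year

        * Company means of each time-varying regressor (hypothetical names mean_*)
        foreach v in mvalue kstock {
            bysort company: egen mean_`v' = mean(`v')
        }
        sort company year

        * Fixed-effects benchmark (one-way, no year effects)
        quietly xtreg invest mvalue kstock, fe
        scalar b_fe = _b[mvalue]

        * Random effects plus the company means, clustered by company
        xtreg invest mvalue kstock mean_mvalue mean_kstock, re vce(cluster company)
        scalar b_cre = _b[mvalue]

        * Joint Wald test of the mean terms; rejection suggests the plain RE
        * assumption (effects uncorrelated with regressors) is doubtful
        test mean_mvalue mean_kstock

        display "mvalue coefficient, FE: " scalar(b_fe) "   CRE: " scalar(b_cre)
        ```

        With only 10 companies, clustered standard errors rest on few clusters, so the test is best read as illustrative here.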
