Low within-unit variation in independent variable – does using 2-year change help?

Eran Itskovich

Join Date: Aug 2019

Posts: 36
#1


25 May 2025, 00:33

Hello all,

I’m working with a balanced panel dataset and am interested in estimating the effect of a change in an independent variable (X) on a dependent variable (Y).

However, the independent variable in levels appears to have low within-unit variation, as shown below:

Code:

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
X        overall |     .4074       .0444     .2675      .5905 |     N =    1152
         between |                 .0435     .2800      .5768 |     n =     192
         within  |                 .0092     .3706      .4657 |     T =       6

To address this, I constructed a 2-year change variable (the difference between X's current value and its value 2 years before), and obtained the following:

Code:

Variable          |      Mean   Std. dev.       Min        Max |    Observations
------------------+--------------------------------------------+----------------
X_change2 overall |     .0057       .0105    -.0266      .0702 |     N =    1152
          between |                 .0050    -.0102      .0219 |     n =     192
          within  |                 .0092    -.0232      .0593 |     T =       6

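As you can see, relative to its between-unit variation, the within-unit variation is much larger for the change variable than for the level variable (within SD .0092 against between SD .0050, compared with .0092 against .0435 for X in levels).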
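
A minimal sketch of how such a 2-year change can be constructed, using the Grunfeld data as a stand-in for the data described above (mvalue plays the role of X, and the name mvalue_change2 is only illustrative):

```stata
webuse grunfeld, clear
xtset company year
* L2. is the second lag, so this is the current value minus the value two years earlier
gen mvalue_change2 = mvalue - L2.mvalue
* the first two years of each panel have no second lag, so the change is missing there
count if missing(mvalue_change2)
* compare the between/within decomposition of the level and of the change
xtsum mvalue mvalue_change2
```

If the change is built this way, it is missing for the first two periods of every unit, so a panel with T = 6 would contribute at most four observations per unit to a regression on the 2-year change.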

Since my goal is to assess whether changes in X over time influence Y, I’m considering using the 2-year change variable in a fixed/random effects model.

So, my question is: Does using the 2-year change variable adequately address the problem of low within-unit variation in X?

Many thanks!

Eran
Tags: fixed effects, longitudinal data, low variability, panel data

Andrew Musau

Join Date: Oct 2014
Posts: 10168
#2

25 May 2025, 03:20

In macroeconomic panels, for example, you often see people taking 5-year or 10-year averages of slow-moving variables like population in fixed effects regressions as a robustness check. Here, slow-moving variables have limited within variation, which can make their coefficients imprecise or unstable. Taking multi-year averages can highlight more meaningful medium- or long-term trends, which I think is what you have in mind. In your case, your \(T\) dimension is not that large, so maybe the benefit is marginal. Here is an illustration using the Grunfeld dataset:

Code:

webuse grunfeld, clear
xtreg invest mvalue kstock, absorb(year) fe

sort year
*ASSIGN EACH YEAR TO A PERIOD (1935 -> 0; 1936-40 -> 1; ...; 1951-54 -> 4)
gen period= ceil((year-year[1])/5)
*GENERATE PERIOD AVERAGES (collapse DEFAULTS TO MEANS)
collapse invest mvalue kstock, by(company period)
xtset company period
xtreg invest mvalue kstock, absorb(period) fe

Res.:

Code:

. xtreg invest mvalue kstock, absorb(year) fe

Halperin APM for regression coefficients:

Dependent variable:
Iteration 1:  Maximum absolute difference = 1.072e-12

Independent variables:
Iteration 1:  Maximum absolute difference = 1.599e-12

Halperin APM for panel effects:
Iteration 1:  Maximum absolute difference =       153
Iteration 2:  Maximum absolute difference = 4.263e-14

Fixed-effects (within) regression               Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-squared:                                      Obs per group:
     Within  = 0.7985                                         min =         20
     Between = 0.8143                                         avg =       20.0
     Overall = 0.8025                                         max =         20

                                                F(2, 169)         =     217.44
corr(u_i, Xb) = -0.3186                         Prob > F          =     0.0000

--------------------------
Absorbed variable | Levels
------------------+-------
             year |     20
--------------------------
------------------------------------------------------------------------------
      invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      mvalue |   .1177158   .0137513     8.56   0.000     .0905694    .1448623
      kstock |   .3579163    .022719    15.75   0.000     .3130667    .4027659
       _cons |  -80.16378   14.84402    -5.40   0.000    -109.4674   -50.86019
-------------+----------------------------------------------------------------
     sigma_u |  91.798268
     sigma_e |  51.724523
         rho |  .75902159   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 169) = 54.14                     Prob > F = 0.0000

. 
. collapse invest mvalue kstock, by(company period)

. 
. xtset company period

Panel variable: company (strongly balanced)
 Time variable: period, 0 to 4
         Delta: 1 unit

. 
. xtreg invest mvalue kstock, absorb(period) fe

Halperin APM for regression coefficients:

Dependent variable:
Iteration 1:  Maximum absolute difference = 1.615e-13

Independent variables:
Iteration 1:  Maximum absolute difference = 8.886e-13

Halperin APM for panel effects:
Iteration 1:  Maximum absolute difference =     159.9
Iteration 2:  Maximum absolute difference = 4.263e-15

Fixed-effects (within) regression               Number of obs     =         50
Group variable: company                         Number of groups  =         10

R-squared:                                      Obs per group:
     Within  = 0.8648                                         min =          5
     Between = 0.8055                                         avg =        5.0
     Overall = 0.8133                                         max =          5

                                                F(2, 34)          =      65.98
corr(u_i, Xb) = -0.0427                         Prob > F          =     0.0000

--------------------------
Absorbed variable | Levels
------------------+-------
           period |      5
--------------------------
------------------------------------------------------------------------------
      invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      mvalue |   .0973246   .0337442     2.88   0.007     .0287481    .1659011
      kstock |   .3498738   .0475963     7.35   0.000     .2531466    .4466011
       _cons |  -50.50475   30.16873    -1.67   0.103     -111.815     10.8055
-------------+----------------------------------------------------------------
     sigma_u |  83.641639
     sigma_e |  44.796401
         rho |   .7770968   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 34) = 15.43                      Prob > F = 0.0000

.

Comparing the annual estimates with those from the 5-year averages, the coefficient on capital stock (kstock) barely changes (from 0.358 to 0.350), suggesting that the association between kstock and investment (invest) is stable across time frequencies. For market valuation (mvalue), however, the coefficient drops from 0.118 to 0.097, a reduction of about 17%. This suggests that the association between mvalue and invest is more sensitive to short-run variation.
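
One caveat about the binning in the illustration: because the Grunfeld years run from 1935 to 1954, ceil() puts 1935 alone in period 0 and leaves only four years (1951-1954) in period 4. Below is a sketch of an alternative with four equal 5-year bins, using floor() instead. It is a variation on the code above rather than what produced the results shown, and i.period is used in place of absorb(period) only so that it also runs on older Stata versions.

```stata
webuse grunfeld, clear
* floor() gives four bins of exactly five years each:
* 0 = 1935-39, 1 = 1940-44, 2 = 1945-49, 3 = 1950-54
summarize year, meanonly
gen period = floor((year - r(min))/5)
* check how many company-years fall in each period
tabulate period
* average each variable within company and period
collapse (mean) invest mvalue kstock, by(company period)
xtset company period
* company fixed effects plus period dummies (two-way fixed effects)
xtreg invest mvalue kstock i.period, fe
```

Whether the estimates move much relative to the uneven binning is something to check rather than assume.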


Eran Itskovich

Join Date: Aug 2019

Posts: 36
#3

26 May 2025, 02:48

Thank you Andrew Musau! So I understand that fixed-effects will be problematic here. Will random-effects be more reliable?
Andrew Musau

Join Date: Oct 2014

Posts: 10168
#4

26 May 2025, 14:26

Originally posted by Eran Itskovich:

Thank you Andrew Musau! So I understand that fixed-effects will be problematic here.

Not at all. As long as there is some within-unit variation, fixed effects perform well. In my illustration in #2, the results remain robust despite a slight decrease in the magnitude of the coefficient on mvalue. We can still conclude that larger companies (i.e., those with higher market valuations) tend to attract higher levels of investment. Your primary analysis should use annual data, while the analysis using multi-year averages should serve as a robustness check.

Will random-effects be more reliable?

Random effects rely on a strong assumption: that there is no correlation between the unobserved individual effects and the explanatory (right-hand-side) variables. This assumption is often violated in observational data. Unless you are working with experimental data, it is safer to assume that this condition does not hold. An alternative approach is correlated random effects (also known as the Mundlak regression). You can implement this using the -cre- option in xtreg.

Code:

help xtreg
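
For readers whose xtreg does not have the -cre- option, the Mundlak device can also be set up by hand. The following is a rough sketch on the Grunfeld data, not output from this thread; the variable names mean_mvalue and mean_kstock are made up for the illustration.

```stata
webuse grunfeld, clear
xtset company year
* company-level means of the time-varying regressors (the Mundlak terms)
bysort company: egen mean_mvalue = mean(mvalue)
bysort company: egen mean_kstock = mean(kstock)
* random-effects regression augmented with the company means
xtreg invest mvalue kstock mean_mvalue mean_kstock, re vce(cluster company)
* joint test of the Mundlak terms
test mean_mvalue mean_kstock
```

In a balanced panel such as this one, the coefficients on mvalue and kstock should match the fixed-effects estimates, and a joint test that rejects zero coefficients on the means is evidence against the random-effects assumption.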