Confusion with a fixed effect regression for panel data with lags/leads

James Glass

Join Date: Apr 2018
Posts: 24


12 Aug 2018, 09:36

I ran a fixed-effects regression but got only about 7,000 observations when I expected more than 100,000.

I am using Stata 15.

For context, I am using panel data to estimate the effect of internal migration on a person's subjective well-being.
I think one issue is that I am using the explanatory variable M0 (which indicates whether the person migrated internally) together with its lags and leads (e.g. L3.M0, F3.M0). I created M0 from the original variable (movest), which also includes categories such as 'new entrant' and 'moved back to GB'. Will this affect my results (since M0 ignores whether someone was a new entrant or moved back to GB), and are there recommended solutions for this?
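To make the question concrete, here is a minimal sketch of how the movest categories end up inside M0. The rule used to build M0 is an assumption (it is only inferred from the sample data further down, where movest == 2 goes with M0 == 1), and the four rows are toy values, not real records.

```stata
* Toy rows in the spirit of the -dataex- sample below
clear
input long pid byte movest
10007857 4
10007857 1
10014578 1
10014578 2
end
label define fmovest 1 "non-mover" 2 "mover within gb" 4 "mover back to gb"
label values movest fmovest

* Assumed rule (not confirmed): M0 = 1 only for "mover within gb"
generate byte M0 = (movest == 2) if !missing(movest)

* Every other movest category is folded into M0 == 0
tabulate movest M0, missing
```

The cross-tabulation shows which categories are being treated as "did not migrate", which is the grouping the question is about.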

This is what I got from the command:

xtreg lfsato age nkids mastat L5.M0 L4.M0 L3.M0 L2.M0 L1.M0 M0 F1.M0 F2.M0 F3.M0 F4.M0 F5.M0, fe

Code:

Fixed-effects (within) regression               Number of obs     =      7,667
Group variable: pid                             Number of groups  =      4,027

R-sq:                                           Obs per group:
     within  = 0.0083                                         min =          1
     between = 0.0274                                         avg =        1.9
     overall = 0.0238                                         max =          2

                                                F(14,3626)        =       2.17
corr(u_i, Xb)  = -0.4845                        Prob > F          =     0.0068

------------------------------------------------------------------------------
      lfsato |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0483116   .0175464     2.75   0.006     .0139098    .0827133
       nkids |  -.1035138   .0644501    -1.61   0.108     -.229876    .0228483
      mastat |  -.0206515   .0396681    -0.52   0.603    -.0984256    .0571225
             |
          M0 |
         L5. |   .1455779   .0881316     1.65   0.099    -.0272146    .3183703
         L4. |   .2665378   .1237939     2.15   0.031     .0238253    .5092504
         L3. |   .2677237   .1488129     1.80   0.072    -.0240415     .559489
         L2. |   .0835615   .1711475     0.49   0.625    -.2519934    .4191163
         L1. |    .145695   .1847631     0.79   0.430    -.2165549     .507945
         --. |   .1794107   .1960381     0.92   0.360    -.2049451    .5637665
         F1. |  -.1019313    .203507    -0.50   0.616    -.5009308    .2970682
         F2. |  -.3197264   .2059309    -1.55   0.121    -.7234784    .0840255
         F3. |  -.2678047   .1987484    -1.35   0.178    -.6574743     .121865
         F4. |  -.1905153   .1841593    -1.03   0.301    -.5515814    .1705508
         F5. |    -.02172   .1577854    -0.14   0.891     -.331077     .287637
             |
       _cons |   2.857905   .9339647     3.06   0.002     1.026757    4.689053
-------------+----------------------------------------------------------------
     sigma_u |   1.224461
     sigma_e |   .7531382
         rho |  .72552085   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4026, 3626) = 3.70                  Prob > F = 0.0000

and here is a snippet of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid byte lfsato int age byte(nkids mastat movest) float M0
10007857 6 64 0 3 4 0
10007857 5 65 0 3 1 0
10007857 5 66 0 3 1 0
10007857 . 67 0 3 1 0
10014578 5 59 0 1 1 0
10014578 7 61 0 1 1 0
10014578 7 62 0 1 1 0
10014578 6 63 0 1 1 0
10014578 6 65 0 1 1 0
10014578 6 66 0 1 1 0
10014578 5 67 0 1 1 0
10014578 5 68 0 1 1 0
10014578 6 69 0 1 1 0
10014578 7 70 0 1 2 1
10014578 6 71 0 1 1 0
10014608 5 62 0 1 1 0
10014608 6 64 0 1 1 0
10014608 6 65 0 1 1 0
10014608 5 66 0 1 1 0
10014608 6 68 0 1 1 0
10014608 6 69 0 1 1 0
10014608 6 70 0 1 1 0
10014608 5 71 0 1 1 0
10014608 3 72 0 1 1 0
10014608 6 73 0 1 2 1
10014608 3 74 0 1 1 0
10016813 5 41 1 5 1 0
10016813 . 43 0 4 1 0
10016813 6 44 0 4 1 0
10016813 . 45 0 4 1 0
10016813 . 47 0 4 1 0
10016813 . 49 0 3 1 0
10016848 2 37 0 5 2 1
10016848 1 39 0 4 1 0
10016848 4 40 0 2 1 0
10016848 3 41 0 2 1 0
10016848 . 42 0 2 1 0
10016848 6 43 0 2 1 0
10016848 6 44 0 2 1 0
10016848 5 45 0 2 1 0
10016848 5 46 0 1 1 0
10016848 6 47 0 1 1 0
10016848 5 48 0 1 1 0
10016848 4 49 0 1 1 0
10016872 . 17 0 6 1 0
10016872 6 18 0 6 1 0
10016872 . 19 0 6 1 0
10016872 . 19 0 6 1 0
10016872 6 21 0 6 1 0
10016872 6 22 0 2 1 0
10016872 . 23 0 2 1 0
10016872 6 24 0 2 1 0
10016872 6 25 0 2 1 0
10016872 6 26 0 2 1 0
10016872 6 27 1 2 2 1
10017933 6 54 0 5 1 0
10017933 6 55 0 4 1 0
10017933 6 56 0 4 1 0
10017933 3 56 0 4 1 0
10017933 5 58 0 4 1 0
10017933 . 59 0 4 1 0
10017933 5 60 0 4 1 0
10017933 6 60 0 4 1 0
10017933 4 62 0 4 1 0
10017933 6 63 0 4 1 0
10017933 5 64 0 4 1 0
10017933 6 65 0 4 1 0
10017933 6 66 0 4 1 0
10017992 5 17 0 6 1 0
10017992 6 18 0 6 1 0
10017992 5 19 0 6 1 0
10017992 4 19 0 6 1 0
10017992 6 21 0 6 1 0
10017992 . 22 0 6 1 0
10017992 5 23 0 6 1 0
10017992 . 23 0 6 1 0
10017992 5 25 0 6 1 0
10017992 7 26 0 6 1 0
10017992 4 27 0 6 1 0
10017992 5 28 0 6 1 0
10019057 . 64 0 6 2 1
10019057 6 65 0 6 1 0
10019057 6 66 0 6 1 0
10019057 6 67 0 6 1 0
10019057 5 67 0 6 1 0
10019057 . 68 0 6 1 0
10019057 5 69 0 6 1 0
10019057 . 71 0 6 1 0
10019057 6 71 0 6 1 0
10019057 5 73 0 6 1 0
10019057 6 74 0 6 1 0
10019057 6 75 0 6 1 0
10019057 5 76 0 6 1 0
10023526 4 43 0 4 1 0
10023526 5 44 0 4 1 0
10023526 . 48 0 1 1 0
10023526 5 49 0 4 1 0
10023526 5 50 0 4 1 0
10023526 5 51 0 4 1 0
10023526 5 52 0 4 1 0
end
label values lfsato flfsato
label def flfsato 1 "not satisfied at all", modify
label def flfsato 7 "completely satisfied", modify
label values age fage
label values nkids fnkids
label def fnkids 0 "none", modify
label values mastat fmastat
label def fmastat 1 "married", modify
label def fmastat 2 "living as couple", modify
label def fmastat 3 "widowed", modify
label def fmastat 4 "divorced", modify
label def fmastat 5 "separated", modify
label def fmastat 6 "never married", modify
label values movest fmovest
label def fmovest 1 "non-mover", modify
label def fmovest 2 "mover within gb", modify
label def fmovest 4 "mover back to gb", modify

This follows on from an earlier thread, where I dropped individuals who migrated multiple times:
https://www.statalist.org/forums/for...multiple-times

Thanks for the help.

Tags: fixed effects, panel data, regression

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

12 Aug 2018, 09:57

Consider your sample data for pid 10014578. You have 11 observations. Of these, only the sixth observation will have nonmissing values for all 10 lags and leads of M0. So xtreg will omit 10 of those 11 observations from the model, retaining only the sixth observation.

Starting with your sample data, we'll generate a fake "time" variable because you neglected to include it in your sample data, and then apply xtset and xtdescribe and review the results.

Code:

. sort pid, stable

. by pid: generate time=_n

. xtset pid time
       panel variable:  pid (unbalanced)
        time variable:  time, 1 to 13
                delta:  1 unit

. xtdescribe

     pid:  10007857, 10014578, ..., 10023526                 n =         10
    time:  1, 2, ..., 13                                     T =         13
           Delta(time) = 1 unit
           Span(time)  = 13 periods
           (pid*time uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       7        11        12      13      13

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------
        3     30.00   30.00 |  11111111111..
        2     20.00   50.00 |  111111111111.
        2     20.00   70.00 |  1111111111111
        1     10.00   80.00 |  1111.........
        1     10.00   90.00 |  111111.......
        1     10.00  100.00 |  1111111......
 ---------------------------+---------------
       10    100.00         |  XXXXXXXXXXXXX

So we have two pids with 13 periods, which will each yield 3 observations in xtreg; two pids with 12 periods, which will each yield 2 observations, and three pids with 11 periods, which will each yield 1 observation. All the other pids do not yield any observations. So from your 100 original observations, you will have 3*2 + 2*2 + 1*3 = 13 observations in xtreg.

Note that your xtreg output told you that you apparently got at most 2 observations per pid.
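The arithmetic above can be checked with a small sketch. It builds a toy panel with the same T_i distribution as the xtdescribe output (the pid values 1 to 10 and the constant M0 are placeholders, not the real data) and flags the observations for which M0 and all ten of its lags and leads are nonmissing. It ignores missing values of lfsato, which would reduce the count further.

```stata
clear
input byte(pid T)
1 13
2 13
3 12
4 12
5 11
6 11
7 11
8 7
9 6
10 4
end

* One row per period: T copies of each pid, numbered 1..T
expand T
bysort pid: generate byte time = _n
xtset pid time

* Placeholder regressor; only its (non)missingness matters here
generate byte M0 = 0

* An observation is usable only if M0 and all 10 lags/leads exist
generate byte complete = !missing(L5.M0, L4.M0, L3.M0, L2.M0, L1.M0, M0, ///
    F1.M0, F2.M0, F3.M0, F4.M0, F5.M0)

count if complete
bysort pid: egen byte usable = total(complete)
tabulate T usable
```

With consecutive periods, a pid observed for T periods contributes max(0, T - 10) observations, so the count here is 13 out of 100.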

Last edited by William Lisowski; 12 Aug 2018, 09:59.


James Glass

Join Date: Apr 2018

Posts: 24
#3

12 Aug 2018, 10:15

Originally posted by William Lisowski

So we have two pids with 13 periods, which will each yield 3 observations in xtreg; two pids with 12 periods, which will each yield 2 observations, and three pids with 11 periods, which will each yield 1 observation. All the other pids do not yield any observations. So from your 100 original observations, you will have 3*2 + 2*2 + 1*3 = 13 observations in xtreg.

Note that your xtreg output told you that you apparently got at most 2 observations per pid.

Thank you for this. Is there a way to take into account all observations per pid?
James Glass

Join Date: Apr 2018

Posts: 24
#4

12 Aug 2018, 12:27

Completely my bad, I understand it a bit better now. Would the reduced number of observations be an issue for my regression?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

12 Aug 2018, 13:51

I cannot comment. My own thought is that there might be a better way to parameterize this. Each observation represents SWB at some point before or after a move, except for the pids that do not move. And in that case, I don't think they tell you anything about the effect of migration. So I'd be inclined to drop nonmovers and replace your lags and leads with something that indicates the number of years before or number of years after the move.

Beyond that, your outcome would perhaps be better modeled with an ordered logit or ordered probit model.
James Glass

Join Date: Apr 2018

Posts: 24
#6

13 Aug 2018, 10:28

Originally posted by William Lisowski

So I'd be inclined to drop non-movers and replace your lags and leads with something that indicates the number of years before or number of years after the move.

Thank you for the suggestion. I am trying it with movers only. Is there a command to indicate the number of years before and after a move?

William Lisowski

Join Date: Dec 2014
Posts: 10150
#7

13 Aug 2018, 10:45

The following should point you in a useful direction.

Code:

sort pid, stable
by pid: generate time = _n
bysort pid (time): egen movetime = max(cond(M0==1,time,.))
generate years = time-movetime
list if pid <= 10014578, nolabel sepby(pid)

Code:

. list if pid <= 10014578, nolabel sepby(pid)

     +----------------------------------------------------------------------------------+
     |      pid   lfsato   age   nkids   mastat   movest   M0   time   movetime   years |
     |----------------------------------------------------------------------------------|
  1. | 10007857        6    64       0        3        4    0      1          .       . |
  2. | 10007857        5    65       0        3        1    0      2          .       . |
  3. | 10007857        5    66       0        3        1    0      3          .       . |
  4. | 10007857        .    67       0        3        1    0      4          .       . |
     |----------------------------------------------------------------------------------|
  5. | 10014578        5    59       0        1        1    0      1         10      -9 |
  6. | 10014578        7    61       0        1        1    0      2         10      -8 |
  7. | 10014578        7    62       0        1        1    0      3         10      -7 |
  8. | 10014578        6    63       0        1        1    0      4         10      -6 |
  9. | 10014578        6    65       0        1        1    0      5         10      -5 |
 10. | 10014578        6    66       0        1        1    0      6         10      -4 |
 11. | 10014578        5    67       0        1        1    0      7         10      -3 |
 12. | 10014578        5    68       0        1        1    0      8         10      -2 |
 13. | 10014578        6    69       0        1        1    0      9         10      -1 |
 14. | 10014578        7    70       0        1        2    1     10         10       0 |
 15. | 10014578        6    71       0        1        1    0     11         10       1 |
     +----------------------------------------------------------------------------------+
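As a rough sketch of where this could lead, a years variable like the one above might enter a fixed-effects model as a set of indicator variables. This is only an illustration on simulated data, not a recommended specification. Everything apart from Stata's own commands is an assumption: the simulated panel, the names movewave and evt, the window of five years either side of the move, and the choice of the year before the move as the base category.

```stata
clear
set seed 12345

* Simulated movers: 50 people, 12 waves, each moving once in wave 4..9
set obs 50
generate long pid = _n
generate byte movewave = 4 + mod(_n, 6)
expand 12
bysort pid: generate byte wave = _n
generate byte M0 = (wave == movewave)
generate double lfsato = 5 + rnormal()   // simulated outcome, no true effect
xtset pid wave

* Years relative to the move, restricted to a +/- 5 year window
generate int years = wave - movewave
keep if inrange(years, -5, 5)

* Factor variables need nonnegative integers, so shift -5..5 to 0..10
generate byte evt = years + 5

* Base category 4 corresponds to years == -1 (the year before the move)
xtreg lfsato ib4.evt, fe
```

Because years is defined for every observation of a mover, no observations are lost to missing lags and leads, which was the original problem.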


James Glass

Join Date: Apr 2018

Posts: 24
#8

13 Aug 2018, 12:24

Thank you, but what if the data are unbalanced (as mine are)? Is there a way to account for this?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

13 Aug 2018, 12:47

Stata handles unbalanced panel data naturally. The documentation is perhaps a little more matter-of-fact about it than it should be - the only explicit acknowledgement I can find is in Example 2 in the xt chapter in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF accessible from Stata's Help menu.

I guess they think concerns about balanced v. unbalanced are passé but they underestimate the inertia of received knowledge.
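A small sketch of what "handles naturally" means for time-series operators, using three made-up rows for a single pid with a gap between waves 6 and 8. The variable name lagM0 is just for illustration.

```stata
clear
input long pid byte(wave M0)
1 6 0
1 8 1
1 9 0
end

* xtset accepts panels with gaps in the time variable
xtset pid wave

* L1. refers to the observation one wave earlier, not the previous row,
* so it is missing wherever that wave is absent
generate byte lagM0 = L1.M0
list, noobs
```

The lag is missing in wave 8, because wave 7 is not in the data, and equals 1 in wave 9, the value of M0 in wave 8.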

James Glass

Join Date: Apr 2018
Posts: 24

#10

13 Aug 2018, 14:02

I meant for the method that you proposed to indicate the number of years before and after a move.
As you can see, waves 7 and 11 are missing, but the corresponding value of 'years' doesn't take this into account. Is there a way to do this?

Code:

       +-------------------------------------------------------------------------+
       | lfsato        pid   age   wave   M0   time   movetime   years   regyear |
       |-------------------------------------------------------------------------|
    1. |      5   10014578    59      6    0      1         10      -9         . |
    2. |      7   10014578    61      8    0      2         10      -8         . |
    3. |      7   10014578    62      9    0      3         10      -7         . |
    4. |      6   10014578    63     10    0      4         10      -6         . |
    5. |      6   10014578    65     12    0      5         10      -5         . |
    6. |      6   10014578    66     13    0      6         10      -4         . |
    7. |      5   10014578    67     14    0      7         10      -3         . |
    8. |      5   10014578    68     15    0      8         10      -2         . |
    9. |      6   10014578    69     16    0      9         10      -1         . |
   10. |      7   10014578    70     17    1     10         10       0         . |
   11. |      6   10014578    71     18    0     11         10       1         . |
       +-------------------------------------------------------------------------+


William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

13 Aug 2018, 14:29

The only purpose of creating and using the time variable was because your sample data in post #1 did not include any indication of wave, or year, or much of anything to make a time variable for your panel, although you had obviously used one in an xtset command to enable time series lag and lead notation. As I commented in my post #2

we'll generate a fake "time" variable because you neglected to include it in your sample data

and even in post #8, where you raised the issue of unbalancedness, you did not make it clear what your problem was, nor did you provide sample data showing what your data actually look like.

Having been given the sample code in post #7, your task now is to read it and understand it, using the help files and manuals to figure out how it works, and if those aren't sufficient, to reply back with questions on specific issues that you've been unable to figure out. It's not very difficult code to figure out, especially with all the intermediate results available in the data.
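For readers following along, one possible adaptation of the post #7 code is sketched here, assuming the real panel is xtset on the wave variable shown in post #10. It swaps the fake time variable for wave, so that gaps between waves carry through to years. The name movewave is introduced for illustration, and the rows are the pid, wave and M0 values listed in post #10.

```stata
clear
input long pid byte(wave M0)
10014578  6 0
10014578  8 0
10014578  9 0
10014578 10 0
10014578 12 0
10014578 13 0
10014578 14 0
10014578 15 0
10014578 16 0
10014578 17 1
10014578 18 0
end
xtset pid wave

* Wave in which the move happened (missing for pids that never move)
bysort pid (wave): egen movewave = max(cond(M0 == 1, wave, .))

* Waves before (negative) or after (positive) the move, gaps included
generate years = wave - movewave
list, sepby(pid)
```

Here the first observation gets years = -11 rather than -9, because the two skipped waves are now counted.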

Others may have other motivations for responding on Statalist, but my hope is to help other members become better Stata programmers, as others have helped me. Teaching them to fish, as the old adage goes, rather than giving them a fish.
James Glass

Join Date: Apr 2018

Posts: 24
#12

13 Aug 2018, 17:18

Thank you for the help so far.