  • Fixed effects model: how do I take account of non-uniform time intervals?

    Good morning all,

    Hope you're well. Apologies if this is a very basic enquiry but I am still finding my feet in Stata and am completely new to panel data analysis.

    I am trying to use a standard fixed effects model on a balanced panel to estimate the effect of lockdown on individual workers' well-being (i.e. within-individual variation). I am using primary survey data collected from approximately 700 individuals in two waves. Wave 1 took place pre-lockdown (data were collected between Nov 2019 and Feb 2020). Wave 2 was collected during and immediately after lockdown (May-June). The data have been reshaped into long panel format using the individual identifier id and the time variable wave (i.e. all variables are suffixed with 1 or 2 to indicate whether they come from wave 1 or 2, e.g. parentalstatus1, parentalstatus2, etc.).

    My dependent variables are various outcome variables e.g. job satisfaction ("ws"). My main independent variable is a binary dummy variable ("wave2") which indicates whether the observations originate in wave 1 (pre-lockdown) or wave 2 (during lockdown). wave2 was generated as follows:

    gen wave2=0 if wave==1
    replace wave2=1 if wave==2
    label var wave2 "wave dummy indicating whether survey was taken pre or during c19"


    My basic model specification is: xtreg ws wave2, fe vce(cluster id)

    I have two questions:

    1) How do I get the model to take account of the fact that there is a non-uniform time gap between the two surveys i.e. worker1 may have answered Survey 1 in Nov 2019 and Survey2 in June 2020 whereas worker2 may have answered Survey 1 in Feb 2020 and Survey 2 in July 2020?

    I attempted to generate a duration variable as follows:

    gen surveydate1=date(date1,"YMD###")
    gen surveydate2=date(date2,"YMD###")
    format surveydate1 %td
    format surveydate2 %td
    gen timebetweensurveys=surveydate2-surveydate1
    label var timebetweensurveys "no of days between completing surveys 1 and 2"


    Where surveydate1 is the date on which survey 1 was completed by worker i etc. The variable 'works' in that it generates the number of days between surveys, which is what I am trying to capture, BUT when I try to use it in my xtreg regression, e.g. xtreg ws wave2 timebetweensurveys, fe vce(cluster id), it is of course omitted due to collinearity, so I am back to square one! I thought about adding i.surveydate to the regression instead, but that seems to mess up all my results, i.e. it changes the sign of the main coefficient etc. I assume this is because my main independent variable is essentially time variation, so by introducing a time fixed effect I am using up all the variation that I need to make the model run? So my question is: how do I account for duration in my model?

    2) My second question is more general. It relates to the fact that when I run my basic model, for some (not all) of my outcome variables I am getting a Prob > F which is greater than zero and a very low within R-sq figure. See below

    . xtreg ws wave2, fe vce (cluster id)

    Fixed-effects (within) regression               Number of obs     =      1,238
    Group variable: id                              Number of groups  =        621

    R-sq:                                           Obs per group:
         within  = 0.0042                                         min =          1
         between = 0.0028                                         avg =        2.0
         overall = 0.0010                                         max =          2

                                                    F(1,620)          =       2.60
    corr(u_i, Xb)  = 0.0020                         Prob > F          =     0.1077

                                       (Std. Err. adjusted for 621 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           wave2 |   .1296596   .0804776     1.61   0.108    -.0283821    .2877014
           _cons |   5.978086   .0401738   148.81   0.000     5.899193    6.056979
    -------------+----------------------------------------------------------------
         sigma_u |  1.9463651
         sigma_e |  1.4129581
             rho |  .65487919   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------


    Similarly, when I just run reg ws wave2 i.id, I get missing values for F and Prob > F, although the R-squared is quite high. The coefficient on wave2 is the same under both models.


    . reg ws wave2 i.id,vce (cluster id)

    Linear regression                                           Number of obs     =      1,238
                                                                F(0, 620)         =          .
                                                                Prob > F          =          .
                                                                R-squared         =     0.7924
                                                                Root MSE          =      1.413

                                                   (Std. Err. adjusted for 621 clusters in id)
    ------------------------------------------------------------------------------------------
                             |               Robust
                          ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------------------+----------------------------------------------------------------
                       wave2 |   .1296596   .1139971     1.14   0.256    -.0942077    .3535269
                             |
                          id |
     543e85adfdf99b735690~90 |        1.5   9.82e-14  1.5e+13   0.000          1.5         1.5
     546aa9acfdf99b3f01f12~4 |        4.5   9.82e-14  4.6e+13   0.000          4.5         4.5
     547a4f58fdf99b5321ba5~4 |          2   9.82e-14  2.0e+13   0.000            2           2
     54876fe7fdf99b03e~64374 |          1   9.82e-14  1.0e+13   0.000            1           1
     54b8ea6cfdf99b34ce257~5 |         -1   9.82e-14 -1.0e+13   0.000           -1          -1
     54d35e1afdf99b68c74dd~c |          1   9.82e-14  1.0e+13   0.000            1           1


    Also, I am concerned that my model is unstable: when I add the control variable ("wwbpriority", which is a 0-10 rating of org's prioritisation of wellbeing), the main coefficient switches sign from positive to negative and the Prob > F reverts to 0.000. A positive sign is what I would expect, i.e. the mean value of ws DOES increase between survey 1 and 2, albeit non-significantly.

    . xtreg ws wave2 wwbpriority, fe vce (cluster id)

    Fixed-effects (within) regression               Number of obs     =      1,234
    Group variable: id                              Number of groups  =        621

    R-sq:                                           Obs per group:
         within  = 0.1127                                         min =          1
         between = 0.3198                                         avg =        2.0
         overall = 0.2680                                         max =          2

                                                    F(2,620)          =      30.59
    corr(u_i, Xb)  = 0.2239                         Prob > F          =     0.0000

                                       (Std. Err. adjusted for 621 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           wave2 |  -.0523467   .0779918    -0.67   0.502    -.2055069    .1008134
     wwbpriority |   .3048209   .0393393     7.75   0.000     .2275665    .3820752
           _cons |   4.313767   .2212248    19.50   0.000     3.879326    4.748208
    -------------+----------------------------------------------------------------
         sigma_u |  1.6614572
         sigma_e |  1.3367479
             rho |  .60704568   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------


    Does this mean that my model is unstable and could be misspecified? I am worried that I am doing something basic wrong here that I need to correct before I go any further and try to introduce further controls etc.

    Thanks in advance for taking the time to read this message. Any help or advice that you can provide would be very much appreciated.

    Diane

  • #2
    Diane:
    admittedly, I found your post too long to delve into all the details you wanted to convey.
    That said, some comments follow:
    - in my opinion, the way you create -wave2- implies a more substantive issue: is that approach usual in your research field?
    - in your -xtreg,fe- the within R-sq is low, probably because your model is misspecified (at first glance, too few predictors and/or missing interactions);
    - with the -regress- code you simply replicated the -xtreg,fe- estimate. As expected, the coefficient for -wave2- is the same for both codes, whereas standard errors and related statistics differ. As you wisely used clustered standard errors in -regress- to account for the panel structure of your data, the F statistic is left unreported for the reason explained in -help j_robustsingular-.
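
A minimal sketch of that comparison, run on simulated data because the thread's dataset is not available. The variable names mirror the thread (id, wave, wave2, ws); the sample size, seed, and data-generating values are assumptions chosen purely for illustration. Both commands should return the same coefficient on wave2, while the clustered standard errors differ, which is consistent with -regress- counting the id dummies in its small-sample degrees-of-freedom correction and -xtreg, fe- not doing so. In the output posted in #1, the ratio .1139971/.0804776 is about 1.4165, close to sqrt(1236/616) for 1,238 observations with 2 versus 622 estimated parameters.

```stata
* Simulated two-wave panel; every number here is an illustrative assumption
clear
set seed 2020
set obs 200
gen id = _n
gen u  = rnormal(0, 2)              // person-specific effect, fixed within id
expand 2
bysort id: gen wave = _n            // wave = 1 or 2
gen byte wave2 = (wave == 2)
gen ws = 6 + u + 0.1*wave2 + rnormal()

xtset id wave

* within estimator with clustered standard errors
xtreg ws wave2, fe vce(cluster id)
scalar b_fe  = _b[wave2]
scalar se_fe = _se[wave2]

* same model with explicit id dummies
regress ws wave2 i.id, vce(cluster id)
scalar b_ols  = _b[wave2]
scalar se_ols = _se[wave2]

display "coefficient difference:     " %12.0g b_fe-b_ols
display "SE ratio (regress / xtreg): " %9.4f se_ols/se_fe
```
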
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      1) You have already taken account of the person specific length between the two surveys. It is absorbed by the person fixed effect.
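
A minimal sketch of this point on simulated data (the thread's data are not available, so the variable gap, the sample size, and all coefficients are assumptions for illustration). The level of a person-specific gap is constant within id, so -xtreg, fe- drops it. If the question is instead whether the wave-2 change differs with the length of the gap, the product of wave2 and the gap does vary within person and can be estimated; that second model is offered only as one possible specification, not as something proposed in the thread.

```stata
* Simulated two-wave panel; gap and all coefficients are illustrative assumptions
clear
set seed 2020
set obs 200
gen id  = _n
gen u   = rnormal(0, 2)
gen gap = 90 + floor(150*runiform())    // hypothetical days between the two surveys
expand 2
bysort id: gen wave = _n
gen byte wave2 = (wave == 2)
gen ws = 6 + u + 0.1*wave2 + rnormal()

xtset id wave

* gap is constant within id, so the within transformation removes it (reported as omitted)
xtreg ws wave2 gap, fe vce(cluster id)

* the product wave2*gap varies within id: 0 in wave 1, centred gap in wave 2
summarize gap, meanonly
gen gap_c     = gap - r(mean)           // centred, so wave2 is the change at the mean gap
gen wave2Xgap = wave2 * gap_c
xtreg ws wave2 wave2Xgap, fe vce(cluster id)
```
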

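      2) Prob > F has to be greater than 0 of course, nothing unusual here. The R-squares in xtreg and regress mean different things, so you cannot compare them: the R-squared from -regress- counts the variation explained by the id dummies, whereas the within R-sq from -xtreg, fe- does not. R-squares are not overly important anyway; they are overrated.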

      You have a very interesting result, that your coefficient on Lockdown reverses when you include as a control "a 0-10 rating of org's prioritisation of wellbeing." I think you have a publishable paper right there. And it makes perfect sense to me: the higher the importance the organisation places on wellbeing, the higher the job satisfaction. And this is correlated with the Lockdown variable. The fact that you are able to estimate it at all tells me that "a 0-10 rating of org's prioritisation of wellbeing" is a time-varying variable. The interpretation to me is simple: some organisations reacted adequately to the Lockdown, and some did not. Hence yes, if you are omitting this reaction, your model is misspecified, because this reaction is clearly related to Lockdown.

      What you need to consider in this very interesting and publishable result:

      A) do you want to treat the 0-10 well being variable as continuous (as you are doing now) or as a categorical variable?

      B) do you want to interact the well being variable with the Lockdown dummy?
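
Options A) and B) can be sketched with factor-variable notation. The example below uses simulated data, so the values of wwbpriority, the sample size, and the coefficients are assumptions for illustration only; the variable names follow the thread.

```stata
* Simulated two-wave panel; wwbpriority and all coefficients are illustrative assumptions
clear
set seed 2020
set obs 200
gen id = _n
gen u  = rnormal(0, 2)
expand 2
bysort id: gen wave = _n
gen byte wave2 = (wave == 2)
gen wwbpriority = floor(11*runiform())      // integer rating 0-10, varies over time
gen ws = 4 + u + 0.3*wwbpriority + rnormal()

xtset id wave

* A) rating treated as categorical: one coefficient per level, relative to the base level
xtreg ws i.wave2 i.wwbpriority, fe vce(cluster id)

* B) rating treated as continuous and interacted with the wave dummy
xtreg ws i.wave2##c.wwbpriority, fe vce(cluster id)

* estimated wave-2 change at a rating of 5 (5 is an arbitrary example value)
lincom 1.wave2 + 5*1.wave2#c.wwbpriority
```

With the interaction in the model, the coefficient on 1.wave2 alone is the estimated change at a rating of 0, so -lincom- (or a centred rating) is needed to read the change at other ratings.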



      • #4
        Many thanks Joro and Carlo. Really appreciate your input.
