  • Using a cross-sectional analysis to support the results of a panel analysis

    Hi all,

    I have a 3-wave panel of children's height and weight, together with whether either of their parents went from employment to unemployment (binary) during a recession. As you will see below, I use fixed-effects models (a linear regression for BMI z-score and a conditional logit for overweight) to test this relationship.

    BMI Z SCORE

    Code:
    . xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y i.year i.mothers_age_y i.mothers_education_y i.mothers_marital_status_y child_age_y, fe
    note: child_age_y omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =     28,723
    Group variable: id                              Number of groups  =     10,998
    
    R-sq:                                           Obs per group:
         within  = 0.0476                                         min =          1
         between = 0.0001                                         avg =        2.6
         overall = 0.0140                                         max =          3
    
                                                    F(12,17713)       =      73.74
    corr(u_i, Xb)  = -0.0131                        Prob > F          =     0.0000
    
    ----------------------------------------------------------------------------------------------------
                           z_score_bmi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                  parents_unemployed_y |   .0446461   .0200391     2.23   0.026     .0053676    .0839247
                    1.urban_or_rural_y |   .0057002   .0279775     0.20   0.839    -.0491384    .0605388
                                       |
                                  year |
                                    1  |   .1900417   .0119574    15.89   0.000     .1666042    .2134793
                                    2  |  -.1473351   .0139738   -10.54   0.000    -.1747251   -.1199451
                                       |
                         mothers_age_y |
                                19-29  |   .0120552   .0357093     0.34   0.736    -.0579385    .0820489
                                30-39  |   .0150572   .0224131     0.67   0.502    -.0288746    .0589889
                                       |
                   mothers_education_y |
    Leaving Certificate to Non Degree  |   .0791826    .045583     1.74   0.082    -.0101644    .1685297
            Primary Degree or greater  |   .0667024   .0555739     1.20   0.230    -.0422278    .1756327
                                       |
              mothers_marital_status_y |
                                    2  |   .0312368   .0577671     0.54   0.589    -.0819924     .144466
                                    3  |   .0069348   .0835213     0.08   0.934    -.1567751    .1706446
                                    4  |  -.0289106   .0350781    -0.82   0.410    -.0976672     .039846
                                    5  |   -.117364   .2767071    -0.42   0.671     -.659737     .425009
                                       |
                           child_age_y |          0  (omitted)
                                 _cons |   .6499031   .0508691    12.78   0.000     .5501947    .7496115
    -----------------------------------+----------------------------------------------------------------
                               sigma_u |  .90798198
                               sigma_e |  .75347727
                                   rho |  .59219609   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------------
    F test that all u_i=0: F(10997, 17713) = 3.54                Prob > F = 0.0000

    Child Overweight

    Code:
    . xtlogit child_overweight_y parents_unemployed_y i.urban_or_rural_y i.year i.mothers_age_y i.mothers_education_y i.mothers_marital_status_y child_age_y, fe nolog
    note: child_age_y omitted because of collinearity
    note: multiple positive outcomes within groups encountered.
    note: 9,053 groups (23,188 obs) dropped because of all positive or
          all negative outcomes.
    
    Conditional fixed-effects logistic regression   Number of obs     =      5,535
    Group variable: id                              Number of groups  =      1,945
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.8
                                                                  max =          3
    
                                                    LR chi2(12)       =     239.12
    Log likelihood  = -1895.5991                    Prob > chi2       =     0.0000
    
    ----------------------------------------------------------------------------------------------------
                    child_overweight_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                  parents_unemployed_y |   .2705773   .1005702     2.69   0.007     .0734634    .4676913
                    1.urban_or_rural_y |  -.0209971   .1507919    -0.14   0.889    -.3165439    .2745497
                                       |
                                  year |
                                    1  |   .3195079   .0596055     5.36   0.000     .2026833    .4363326
                                    2  |  -.5625906   .0751008    -7.49   0.000    -.7097854   -.4153958
                                       |
                         mothers_age_y |
                                19-29  |   .1622443   .1849878     0.88   0.380    -.2003252    .5248138
                                30-39  |   .1249393   .1217915     1.03   0.305    -.1137676    .3636462
                                       |
                   mothers_education_y |
    Leaving Certificate to Non Degree  |   .4089109   .2192694     1.86   0.062    -.0208492    .8386711
            Primary Degree or greater  |   .4875182   .2805553     1.74   0.082    -.0623601    1.037397
                                       |
              mothers_marital_status_y |
                                    2  |  -.0491349   .2948536    -0.17   0.868    -.6270372    .5287675
                                    3  |  -.3627668   .4261465    -0.85   0.395    -1.197999    .4724649
                                    4  |  -.1200269   .1744402    -0.69   0.491    -.4619234    .2218696
                                    5  |    .762811   1.064853     0.72   0.474    -1.324263    2.849884
                                       |
                           child_age_y |          0  (omitted)
    ----------------------------------------------------------------------------------------------------


    Having shown a relationship between parental unemployment and child weight, I want to identify the possible mechanisms behind it.

    I suspect that parents who lose their jobs find it harder to afford things for their children such as nutritious food or expensive exercise clubs. I therefore replicate the models above, replacing the child's weight with outcomes such as calorie consumption and sports-club membership, as shown below.


    Calorie consumption


    Code:
    . reg calories3 parents_unemployed_y2 i.urban_or_rural_y2 i.mothers_age_y2 i.mothers_education_y2 i.mothers_marital_status_y2 child_age_y2
    note: child_age_y2 omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =     8,738
    -------------+----------------------------------   F(10, 8727)     =     56.06
           Model |   138278595        10  13827859.5   Prob > F        =    0.0000
        Residual |  2.1527e+09     8,727  246676.696   R-squared       =    0.0604
    -------------+----------------------------------   Adj R-squared   =    0.0593
           Total |  2.2910e+09     8,737  262221.143   Root MSE        =    496.67
    
    -------------------------------------------------------------------------------------------
                    calories3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
        parents_unemployed_y2 |   101.2964   15.69874     6.45   0.000     70.52316    132.0696
                              |
            urban_or_rural_y2 |
                       Urban  |   1.654679   10.93902     0.15   0.880    -19.78839    23.09774
                              |
               mothers_age_y2 |
                       19-29  |   163.4985   20.73145     7.89   0.000     122.8599     204.137
                       30-39  |   58.80138   11.90259     4.94   0.000     35.46949    82.13327
                              |
         mothers_education_y2 |
    Leaving Certificate to..  |   -102.457   22.42631    -4.57   0.000    -146.4179   -58.49615
    Primary Degree or grea..  |  -254.1752   23.51429   -10.81   0.000    -300.2688   -208.0817
                              |
    mothers_marital_status_y2 |
    Married and separated ..  |   85.05454   32.51722     2.62   0.009     21.31311     148.796
          Divorced / Widowed  |    106.224   47.42236     2.24   0.025     13.26496     199.183
               Never married  |   41.64441   15.10367     2.76   0.006     12.03767    71.25116
                  Don't know  |   -94.0545   165.8077    -0.57   0.571    -419.0767    230.9677
                              |
                 child_age_y2 |          0  (omitted)
                        _cons |   1605.807   24.11237    66.60   0.000     1558.541    1653.073
    -------------------------------------------------------------------------------------------

    Sports Club Membership

    Code:
    . logit binarysportsclub3 parents_unemployed_y2 i.urban_or_rural_y2 i.mothers_age_y2 i.mothers_education_y2 i.mothers_marital_status_y2 child_age_y2
    
    note: child_age_y2 omitted because of collinearity
    Iteration 0:   log likelihood = -6076.5431  
    Iteration 1:   log likelihood = -5886.1443  
    Iteration 2:   log likelihood = -5885.4165  
    Iteration 3:   log likelihood = -5885.4162  
    
    Logistic regression                             Number of obs     =      8,773
                                                    LR chi2(10)       =     382.25
                                                    Prob > chi2       =     0.0000
    Log likelihood = -5885.4162                     Pseudo R2         =     0.0315
    
    -------------------------------------------------------------------------------------------
            binarysportsclub3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
        parents_unemployed_y2 |  -.4062894   .0653562    -6.22   0.000    -.5343851   -.2781936
                              |
            urban_or_rural_y2 |
                       Urban  |   .0365025   .0449827     0.81   0.417    -.0516618    .1246669
                              |
               mothers_age_y2 |
                       19-29  |  -.2919594   .0858323    -3.40   0.001    -.4601877   -.1237312
                       30-39  |  -.0474992   .0488224    -0.97   0.331    -.1431894     .048191
                              |
         mothers_education_y2 |
    Leaving Certificate to..  |   .7879872   .1000404     7.88   0.000     .5919116    .9840627
    Primary Degree or grea..  |   1.189186   .1039254    11.44   0.000     .9854956    1.392876
                              |
    mothers_marital_status_y2 |
    Married and separated ..  |  -.1125892   .1323738    -0.85   0.395    -.3720372    .1468587
          Divorced / Widowed  |  -1.145787   .2191226    -5.23   0.000    -1.575259   -.7163142
               Never married  |  -.1271735   .0619616    -2.05   0.040    -.2486161   -.0057309
                  Don't know  |  -.8121472    .695992    -1.17   0.243    -2.176266     .551972
                              |
                 child_age_y2 |          0  (omitted)
                        _cons |  -.6890735   .1060969    -6.49   0.000    -.8970197   -.4811274
    -------------------------------------------------------------------------------------------

    My questions are as follows:
    1. The core analysis uses longitudinal data, but each mechanism I investigate is available in only 1 of the 3 waves, usually a later wave when the recession had really struck. Is it OK to use a cross-sectional analysis to investigate the mechanisms I propose to explain the results of my longitudinal analysis?
    2. I wanted my cross-sectional analysis to be as close as possible to my longitudinal analysis; did I succeed? Because my longitudinal analysis was a fixed-effects regression, I wanted to repeat that here. According to this post (https://www.stata.com/statalist/arch.../msg00413.html), including dummy variables is equivalent to including fixed effects. Is that the case, and is there anything I need to do to make this a closer replication of my original fixed-effects analysis?
    3. Also, should I include a variable for year, and if so, how? In my longitudinal analysis I included year fixed effects using the year variable created by xtset, so I feel I should match that in my cross-sectional analysis to keep things consistent, but I'm not sure that I have a year variable in the cross-sectional data.
    Thank you for any advice,

    John

  • #2
    1. You can't make solid inferences about a longitudinal process from data in a single cross-section. It may be suggestive, but without longitudinal data it falls a bit short. All you can conclude is that children whose parents are unemployed at a given moment have different outcomes from those whose parents are employed at that time; the conclusion doesn't extend beyond that.

    2. For a linear regression, a fixed-effects analysis is the same as including indicator ("dummy") variables for the panels. But for non-linear models such as logistic, this is not the case.
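    A minimal Stata sketch of point 2 on simulated data. Every variable name here (id, year, x, y, u) is hypothetical and the numbers are made up; only the comparison matters. The same linear model is fitted once with -xtreg, fe- and once with -regress- plus one indicator per panel (i.id), and the coefficient on x should agree to numerical precision.

```stata
* Hypothetical simulated panel: 200 children observed in 3 waves.
clear
set seed 12345
set obs 200
gen id = _n
gen u = rnormal()                   // unobserved child-level effect
expand 3                            // three waves per child; u is copied
bysort id: gen year = _n - 1        // waves coded 0, 1, 2
gen x = (rnormal() + u > 1)         // binary regressor correlated with u
gen y = 0.2*x + 0.1*year + u + rnormal()

xtset id year
xtreg y x i.year, fe                // within (fixed-effects) estimator
scalar b_fe = _b[x]

regress y x i.year i.id             // same model with an indicator per panel
scalar b_lsdv = _b[x]

display "FE: " scalar(b_fe) "   indicators: " scalar(b_lsdv)
```

    Swapping in -logit- with i.id against -xtlogit, fe- would not give matching coefficients, which is the non-linear case described above.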

    3.
    "In my longitudinal analysis I included year fixed effects by including the year variable created by xtset"
    That's wrong. First of all, -xtset- doesn't create any variables at all. And -xtset-ing a time variable does not result in the inclusion of a time fixed effect in subsequent regression commands. It does automatically lead to inclusion of a panel variable, but not a time variable. The panel regressions you show do not mention a year variable explicitly, so they do not incorporate year at all.
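    A small sketch, again on made-up data with hypothetical names, of what -xtset- does and does not do: the variable count c(k) is the same before and after it, and a fixed-effects model only adjusts for year when year is listed among the regressors (e(df_m) counts the regressors used).

```stata
* Hypothetical simulated panel (names made up for illustration).
clear
set seed 2468
set obs 100
gen id = _n
expand 3
bysort id: gen year = _n - 1
gen x = runiform() < 0.3
gen y = 0.2*x - 0.3*(year == 2) + rnormal()

local k_before = c(k)               // number of variables in memory
xtset id year
display "variables before: `k_before'   after: " c(k)   // unchanged

xtreg y x, fe                       // year is NOT adjusted for here
display "regressors without i.year: " e(df_m)
xtreg y x i.year, fe                // year effects appear only when listed
display "regressors with i.year: " e(df_m)
```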

    Whether you need a year variable is a content issue. When the dependent variable is subject to short-term shocks, year indicator variables reduce that source of noise variance, allowing sharper estimates of the other effects. Year can also be used to capture more continuous trends (such as the obesity epidemic), although that is more efficiently done with a continuous year variable than with indicators.

    As some of your outcome variables fall within the domain of epidemiology, I am comfortable addressing them. There has been a long-term secular trend of increasing weight (and body mass index) in both children and adults over the past 4 decades in the economically developed countries. It may have leveled off recently. If your three waves fall within a relatively short, recent time period, you would probably be fine without a time variable. But if they extend over a longer period, say a decade or more, then I think you would benefit from including year as a continuous variable to capture that trend.

    For the non-epidemiologic variables, I cannot comment with authority. It strikes me that those are probably more subject to the kind of short-term shocks that afflict most economic variables. If I have that right, you would need year indicators to adjust for that adequately. Since the year indicators would also capture the secular trend in weight, if less efficiently than a continuous year variable, and since it is best to have the models be as similar as possible, I would probably use year indicators in all cases.
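    A hedged sketch of the two ways of entering time discussed above, on simulated data. The mapping of wave codes 0/1/2 to the calendar years 2000/2003/2006 is an assumption made for illustration, and calyear, x and y are hypothetical names. Specification (a) spends one coefficient per non-base wave and absorbs any wave-specific shock; specification (b) spends a single slope on a linear secular trend.

```stata
* Hypothetical simulated panel; the wave-to-calendar-year mapping is assumed.
clear
set seed 13579
set obs 300
gen id = _n
expand 3
bysort id: gen year = _n - 1                          // wave codes 0, 1, 2
recode year (0 = 2000) (1 = 2003) (2 = 2006), generate(calyear)
gen x = runiform() < 0.3
gen y = 0.1*x + 0.02*(calyear - 2000) + rnormal()     // linear secular trend
xtset id year

* (a) Year indicators: one coefficient per non-base wave.
xtreg y x i.year, fe
* (b) Continuous calendar year: a single slope for a linear trend.
xtreg y x c.calyear, fe
```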



    • #3
      Dear Clyde,

      As always, thank you for a very informative response; I really appreciate it.
      1. I think your assessment is right. Unfortunately, because the mechanisms are not measured longitudinally, I will probably have to use a few cross-sections to provide some support for the mechanisms I argue for, while noting the limitations of the data and of a cross-sectional approach in general.
      2. Can you explain what you mean by including indicator variables for the panels? How do you define panels? Am I to include dummy variables per individual id or is the approach I described OK? Similarly, how can I best replicate my longitudinal logit fixed effects analysis for the logistic cross-sectional regression, as option fe is not allowed in a cross-sectional logit model?
      3. Regarding the year variable: the panel was collected in three waves, one in year 1 (2000), one in year 2 (2003) and one in year 3 (2006). I reshaped and xtset the data as follows:
      Code:
      . reshape long z_score_bmi child_overweight_y parents_unemployed_y urban_or_rural_y child_age_y mothers_age_y mothers_education_y mothers_marital_status_y, i(id) j(year)
      (note: j = 0 1 2 3)
      (note: parents_unemployed_y3 not found)
      (note: urban_or_rural_y3 not found)
      (note: mothers_age_y3 not found)
      (note: mothers_education_y3 not found)
      (note: mothers_marital_status_y3 not found)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                    11134   ->   44536
      Number of variables                3685   ->    3667
      j variable (4 values)                     ->   year
      xij variables:
      z_score_bmi0 z_score_bmi1 ... z_score_bmi3->   z_score_bmi
      child_overweight_y0 child_overweight_y1 ... child_overweight_y3->child_overweight_y
      parents_unemployed_y0 parents_unemployed_y1 ... parents_unemployed_y3->parents_unemployed_y
      urban_or_rural_y0 urban_or_rural_y1 ... urban_or_rural_y3->urban_or_rural_y
      child_age_y0 child_age_y1 ... child_age_y3->   child_age_y
      mothers_age_y0 mothers_age_y1 ... mothers_age_y3->mothers_age_y
      mothers_education_y0 mothers_education_y1 ... mothers_education_y3->mothers_education_y
      mothers_marital_status_y0 mothers_marital_status_y1 ... mothers_marital_status_y3->mothers_marital_status_y
      -----------------------------------------------------------------------------
      
      .  
      . xtset id year
             panel variable:  id (strongly balanced)
              time variable:  year, 0 to 3
                      delta:  1 unit

      Above, a time variable is created: year, 0 to 3. My understanding was that this "year" variable indicated each survey year, i.e. 2000, 2003 and 2006 (there was also a 2007 measure, wave 4, but I drop it as it is not relevant to my analysis). I then included it in my analysis to capture time trends, as follows:


      BMI Z SCORE

      . xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y i.year i.mothers_age_y i.mothers_education_y i.mothers_marital_status_y child_age_y, fe


      Child Overweight

      . xtlogit child_overweight_y parents_unemployed_y i.urban_or_rural_y i.year i.mothers_age_y i.mothers_education_y i.mothers_marital_status_y child_age_y, fe nolog

      From your explanation it seems that I have misunderstood this entirely? As I do not have a built in year variable other than the one created when I xtset the data, how can I best include time fixed effects?

      Thank you again for your insight,

      Kindest regards,

      John



      • #4
        "Can you explain what you mean by including indicator variables for the panels? How do you define panels? Am I to include dummy variables per individual id or is the approach I described OK? Similarly, how can I best replicate my longitudinal logit fixed effects analysis for the logistic cross-sectional regression, as option fe is not allowed in a cross-sectional logit model?"
        By panels, I mean the variable you mentioned in your -xtset- command (other than the time variable). From the output in #1, that appears to be id. I do not know what that refers to, but you do. So one can emulate -xtreg ..., fe- by running -reg ... i.id- in longitudinal data. Of course, if you are reduced to one observation per id, then i.id will be collinear with the constant term and will be deleted anyway. So if you have only one observation per id, there is really nothing analogous to fixed-effects regression.

        There really isn't anything analogous to that for logistic regression. The closest you can come is just an ordinary logistic regression. But that isn't really the same thing.
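        A sketch of why this is so, on simulated data with hypothetical names (yb, x). Once the data are cut down to a single wave, every id has one observation, so a conditional (fixed-effects) logit has no group with a varying outcome and -xtlogit, fe- fails; an ordinary -logit- still runs. The exact error text is not relied on here, only that the command does not succeed.

```stata
* Hypothetical simulated data: 500 children, 3 waves, binary outcome yb.
clear
set seed 424242
set obs 500
gen id = _n
expand 3
bysort id: gen year = _n - 1
gen x  = runiform() < 0.3
gen yb = runiform() < invlogit(-0.5 + 0.4*x)
xtset id year

* Reduce to a single wave, so each id has exactly one observation.
keep if year == 2

* Conditional logit only uses ids whose outcome varies over time;
* with one observation per id no group qualifies, so the command fails.
capture noisily xtlogit yb x, fe
scalar rc_fe = _rc

* The closest available option is an ordinary cross-sectional logit.
logit yb x
```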

        "From your explanation it seems that I have misunderstood this entirely? As I do not have a built in year variable other than the one created when I xtset the data, how can I best include time fixed effects?"
        That's right, you misunderstood it. First, -xtset- does not create any variables. It just designates the panel variable (id, in your case) to be used in fixed-effects models. But the time variable is not used that way. To include time fixed effects, you need to include i.year as a covariate in the regressions.
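        For reference, the year variable in #3 comes from -reshape long ..., j(year)-: the reshape output there lists "j variable (4 values) -> year". A minimal sketch with two made-up children in the _y0/_y1/_y2 naming style of #6 (the values are invented) shows -reshape- creating year from the numeric suffixes, while -xtset- afterwards only declares the structure.

```stata
* Hypothetical wide-format data in the _y0/_y1/_y2 style (values made up).
clear
input id child_overweight_y0 child_overweight_y1 child_overweight_y2
1 0 0 1
2 1 1 1
end

* -reshape- strips the numeric suffix and stores it in the new j() variable.
reshape long child_overweight_y, i(id) j(year)
list, sepby(id)

* -xtset- reads id and year but adds no variables.
xtset id year
```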



        • #5
          Hi Clyde,

          Thank you for your response.

          I'm confused, I do include i.year as a covariate in my regressions, see BMI Z-Score and Child Overweight regressions above, is there something wrong with how I do this or where I get the year variable from?


          All the best,

          John



          • #6
            I should probably clarify: before I reshaped and xtset the data, the variables appeared as follows:

            child_overweight_y0
            child_overweight_y1
            child_overweight_y2

            parents_unemployed_y0
            parents_unemployed_y1
            parents_unemployed_y2

            urban_or_rural_y0
            urban_or_rural_y1
            urban_or_rural_y2

            child_age_y0
            child_age_y1
            child_age_y2

            mothers_age_y0
            mothers_age_y1
            mothers_age_y2

            mothers_education_y0
            mothers_education_y1
            mothers_education_y2

            mothers_marital_status_y0
            mothers_marital_status_y1
            mothers_marital_status_y2

            I coded them with the suffixes _y0/_y1/_y2, corresponding to the year in which they appeared, to help create this year variable during xtset id year.

            Thanks again,

            John



            • #7
              "I'm confused, I do include i.year as a covariate in my regressions, see BMI Z-Score and Child Overweight regressions above, is there something wrong with how I do this or where I get the year variable from?"
              I'm sorry for confusing you. You do include it in those regressions, but in #1 you did not include it in the regressions for calorie consumption and sports club membership. I think it is needed there.

              The way you use i.year in the regressions where you do use it is exactly right.
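              If the mechanism outcomes are pooled across more than one wave (the "few cross-sections" mentioned in #3), i.year enters exactly as in the panel models; in a single wave it would be constant and simply omitted. A minimal sketch on simulated long-format data, where calories, club and x are hypothetical stand-ins for calories3, binarysportsclub3 and parents_unemployed_y. Clustering by id is an added assumption, appropriate if the same children appear in several pooled waves.

```stata
* Hypothetical simulated long-format data: 300 children, 3 waves.
clear
set seed 8675309
set obs 300
gen id = _n
expand 3
bysort id: gen year = _n - 1
gen x = runiform() < 0.2 + 0.05*year          // stand-in for parental unemployment
gen calories = 1600 + 100*x - 50*year + rnormal(0, 400)
gen club = runiform() < invlogit(-0.7 - 0.4*x + 0.1*year)

* Pooled cross-sections: year indicators enter as in the panel models.
regress calories x i.year, vce(cluster id)
logit club x i.year, vce(cluster id)
```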



              • #8
                That's great, thank you Clyde. I'm sorry I wasn't clearer initially. Thanks again!
