Does my fixed effects regression set-up make sense?

John Adler

Join Date: Apr 2017
Posts: 173

10 Aug 2019, 10:11

I have a 4-wave panel of children's height and weight, along with a binary indicator of whether either of their parents has moved from employment to unemployment. The anthropometric measures are objectively measured, so I convert them to z-scores using the zanthro package in Stata. I create weight-for-age, weight-for-height, height-for-age and BMI z-scores, and then create binary overweight variables from these continuous measures using z-score cut-offs from the WHO child growth charts.

Before I begin my analysis I remove wave 4 as there is a lack of response in this wave:

Code:

drop if wave==4

I would like to analyse the data in a fixed effects analysis, but the data providers have suggested I apply a wave 3 weight they provide to make the sample representative of the national child population.

xtlogit will not allow me to apply weights so instead I do the following:

Code:

clogit child_overweight_y parents_unemployed_y  i.urban_or_rural_y child_age_y [pw=weighting_factor], group(id) nolog robust
margins, dydx(parents_unemployed_y) post
estimates store logitmod
estimates table logitmod, star stats(N r2 r2_a)

Which provides the following output:

Code:

. clogit child_overweight_y parents_unemployed_y  i.urban_or_rural_y child_age_y [pw=weighting_factor], gro
> up(id) nolog robust
note: multiple positive outcomes within groups encountered.
note: 7,150 groups (20,713 obs) dropped because of all positive or
      all negative outcomes.

Conditional (fixed-effects) logistic regression

                                                Number of obs     =      5,341
                                                Wald chi2(3)      =      51.26
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -1939.9054               Pseudo R2         =     0.0199

                                             (Std. Err. adjusted for clustering on id)
--------------------------------------------------------------------------------------
                     |               Robust
  child_overweight_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .4196281   .1226047     3.42   0.001     .1793273     .659929
  1.urban_or_rural_y |   .1462153   .1653259     0.88   0.376    -.1778174    .4702481
         child_age_y |  -.0089368   .0013886    -6.44   0.000    -.0116585   -.0062151
--------------------------------------------------------------------------------------

. margins, dydx(parents_unemployed_y) post

Average marginal effects                        Number of obs     =      5,341
Model VCE    : Robust

Expression   : Pr(child_overweight_y|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : parents_unemployed_y

--------------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .1024646    .029751     3.44   0.001     .0441537    .1607755
--------------------------------------------------------------------------------------

. estimates store logitmod

. estimates table logitmod, star stats(N r2 r2_a)

------------------------------
    Variable |   logitmod    
-------------+----------------
parents_un~y |  .10246458***  
-------------+----------------
           N |       5341    
          r2 |                
        r2_a |                
------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001

I would like to cluster the standard errors by the child's location, but the closest variable I have is urban_or_rural_y, a binary variable indicating whether the child lives in an urban or rural region:

Code:

. tab urban_or_rural_y

urban_or_ru |
      ral_y |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,091       57.34       57.34
          1 |     12,713       42.66      100.00
------------+-----------------------------------
      Total |     29,804      100.00

.

Where 0 is urban and 1 is rural. When I try to cluster on this variable I get the following error:

Code:

groups (strata) are not nested within clusters

I'm not quite sure what that means. Is it that I don't have enough clusters, i.e. that I only have urban and rural?

I want to look at whether parental unemployment increases the probability of being overweight, so above I take this result as indicating that parental unemployment increases the probability of being overweight by 10%, i.e. as either parent goes from employed to unemployed the probability of the child going from normal weight to overweight increases by 10%

Having done that, I would like to know whether either parent being unemployed increases the z-score. My reasoning is that a large positive z-score means the child is further above the mean and closer to being overweight, so I do the following:

Code:

xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y child_age_y [pw=weighting_factor], fe

Which gives me the following result:

Code:

. xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y child_age_y [pw=weighting_factor], fe

Fixed-effects (within) regression               Number of obs     =     26,054
Group variable: id                              Number of groups  =      8,972

R-sq:                                           Obs per group:
     within  = 0.0089                                         min =          1
     between = 0.0000                                         avg =        2.9
     overall = 0.0024                                         max =          3

                                                F(3,8971)         =      30.52
corr(u_i, Xb)  = -0.0192                        Prob > F          =     0.0000

                                         (Std. Err. adjusted for 8,972 clusters in id)
--------------------------------------------------------------------------------------
                     |               Robust
         z_score_bmi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .1075761   .0263291     4.09   0.000      .055965    .1591872
  1.urban_or_rural_y |   .0516994   .0344913     1.50   0.134    -.0159113    .1193102
         child_age_y |  -.0026084   .0003005    -8.68   0.000    -.0031974   -.0020193
               _cons |   .8034086   .0191922    41.86   0.000     .7657874    .8410298
---------------------+----------------------------------------------------------------
             sigma_u |  .86341018
             sigma_e |  .77595763
                 rho |  .55319391   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

Which I take as indicating that as either parent becomes unemployed the child's weight increases by a tenth of a standard deviation.

Does my approach, and understanding of my results make sense?

I would hate to make a mistake and would really appreciate if anyone could point out my mistakes now so that I could correct them at the beginning of my study and do better!

Thank you so much,

John

Last edited by John Adler; 10 Aug 2019, 10:17.

Tags: fixed effects, panel, panel data, regression, syntax

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

10 Aug 2019, 12:04

Most of this sounds right.

As for the clustering situation, the message does not mean that you have too few clusters. It means that some of your children are sometimes urban and sometimes rural, and Stata can't abide that: you could only do this kind of clustering if every child were consistently urban or consistently rural. That said, it is also true that you have far too few clusters here. While there is no universally agreed-upon minimum number of acceptable clusters, everyone would agree that 2 is not even close. You simply shouldn't attempt to do this kind of clustering.
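
As a rough check of whether this is what is happening in the data, the sketch below flags children whose urban/rural status differs across waves. It uses the variable names from the original post; mover and one_per_child are hypothetical names introduced only for illustration.

Code:

* flag children whose urban_or_rural_y is not constant across their waves
bysort id (urban_or_rural_y): gen byte mover = urban_or_rural_y[1] != urban_or_rural_y[_N]
* mark one row per child so the tabulation counts children, not observations
egen byte one_per_child = tag(id)
tabulate mover if one_per_child

Any children with mover equal to 1 would account for the "not nested within clusters" message. Missing values sort last in Stata, so a child with a missing urban_or_rural_y in some wave would also be flagged.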

"I want to look at whether parental unemployment increases the probability of being overweight, so above I take this result as indicating that parental unemployment increases the probability of being overweight by 10%, i.e. as either parent goes from employed to unemployed the probability of the child going from normal weight to overweight increases by 10%"

No. The increase in probability of being overweight is 10 percentage points, not 10 percent. Percentage points are additive, percents are multiplicative. So, if the probability of being overweight in the employed situation is, let's say, 25%, a 10% increase on that would be 25%*1.10 = 27.5%--which is not what your model gives you. Your model gives you 25% + 10 percentage points = 35%.
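
The arithmetic can be reproduced with Stata's display command. This is only a sketch of the calculation above, using the same hypothetical 25% baseline; the last line adds the relative change that a 10 percentage point difference would imply at that baseline.

Code:

* additive: baseline plus 10 percentage points
display 0.25 + 0.10
* multiplicative: baseline increased by 10 percent
display 0.25 * 1.10
* relative change implied by a 10 percentage point rise from a 25% baseline
display 0.10 / 0.25

At this assumed baseline, a 10 percentage point rise corresponds to a 40 percent relative increase, which is why the two units should not be interchanged.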

Also, unless you have external information that enables you to infer that the relationship is causal, you should not use causal language in describing a marginal effect. Use descriptive language: "according to the model, the probability of a child being overweight is 10 percentage points higher if the family has experienced unemployment than if it has not."

"Which I take as indicating that as either parent becomes unemployed the child's weight increases by a tenth of a standard deviation."

My caution about the use of causal language applies to this as well. But other than that, it is correct.
John Adler

Join Date: Apr 2017

Posts: 173
#3

26 Sep 2019, 14:02

I'm sorry to drop back in on this question, Clyde, but I had a brief question on standard deviations. Above I say that as either parent becomes unemployed the child's weight increases by a tenth of a standard deviation, but how would I describe a less neat result, say 0.04*? Would I call that 0.04 of a standard deviation? Or a fourth of a standard deviation?

Sorry I just felt I should try to understand the basic concepts a bit better than Stata's output!

Thanks again,

John
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

26 Sep 2019, 14:25

"would I call that 0.04 of a standard deviation?"

In writing, I would call it 0.04 standard deviations. In speaking, I would read it as "four one-hundredths of a standard deviation," or, since 4/100 = 1/25 in this case, I might say "one twenty-fifth of a standard deviation."