Correlated random effects models

Thorben Schmidt

Join Date: Oct 2018

Posts: 6
#1

Correlated random effects models

31 Oct 2018, 07:36

Dear Statalisters!

I have more of a conceptual question. To relax the assumptions of a random-effects model, I want to apply a Mundlak transformation along the lines of the following suggestion:

Originally posted by daniel klein

The "hybrid-model" is actually a rather simple thing, that can be explained in three steps

1. Calculate the panel-unit-specific mean for all time-varying predictors (but not the response/outcome). This is something along the lines of by <id>, sort : egen x1_between = mean(x1)

2. Subtract the panel-unit-specific mean from the original values, i.e. perform the fixed-effects/within-transformation. This is as simple as generate x1_within = x1 - x1_between

3. Run a random-effects/mixed model where you include the time-varying predictors in their de-meaned form (those from step 2) and their mean (those calculated in step 1) along with the time-invariant predictors. This is, in the simplest form, xtreg depvar x1_within x1_between x2_within x2_between x3_within x3_between x4

You are done. The coefficients for the *_within variables resemble the fixed-effects estimates, while the *_between variables can be interpreted as a between estimator. The coefficients for time-invariant predictors are those from a random-effects model.

Best
Daniel
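
As a minimal sketch, the three steps quoted above might look as follows in Stata. The nlswork example data and the variable choice (hours and tenure as time-varying predictors, race treated as the time-invariant one) are assumptions made for illustration and are not part of the quoted post.

```stata
// toy data (an assumption for illustration)
webuse nlswork, clear

// restrict to complete cases so that the means match the estimation sample
quietly regress ln_wage hours tenure race
keep if e(sample)

// steps 1 and 2: panel-unit means and deviations from those means
foreach v of varlist hours tenure {
    bysort idcode : egen double `v'_between = mean(`v')
    generate double `v'_within = `v' - `v'_between
}

// declare the panel structure
xtset idcode year

// step 3: random-effects model with within and between components
xtreg ln_wage hours_within hours_between tenure_within tenure_between i.race, re
```

The *_within coefficients from this model can then be compared with those from xtreg ln_wage hours tenure, fe on the same sample.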

Instead of the hybrid model, I would rather go with the correlated random effects as Sebastian Kripfganz proposed:

Originally posted by Sebastian Kripfganz

Daniel already gave some good advice. Let me add my few cents to it.

You will get exactly the same results in the third step by using the original variables x1 instead of x1_within ..., which you can easily verify. In the literature, this approach is also known as "correlated random effects".
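
The equivalence stated in this quote can be seen by rearranging the hybrid specification. In the sketch below, the symbols \beta_W, \beta_B, \bar{x}_i, z_i, \gamma, u_i and e_{it} are notation assumed here for illustration (one time-varying predictor x, one time-invariant predictor z); they do not appear in the quoted posts.

```latex
\begin{align*}
y_{it} &= \beta_W \,(x_{it} - \bar{x}_i) + \beta_B \,\bar{x}_i + \gamma z_i + u_i + e_{it} \\
       &= \beta_W \,x_{it} + (\beta_B - \beta_W)\,\bar{x}_i + \gamma z_i + u_i + e_{it}
\end{align*}
```

Replacing x_within by the original x therefore leaves the coefficient on x unchanged, while the coefficient on the mean changes from the between coefficient to the difference between the between and the within coefficient.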

The principle is clear to me as such in the case of continuous variables. My question is, in this context, how do I treat categorical variables?

It would be great to get a response!

Best,
Thorben

Last edited by Thorben Schmidt; 31 Oct 2018, 08:03.
Tags: categorical variables, correlated random effects, mundlak, random effects
daniel klein

Join Date: Mar 2014

Posts: 3859
#2

31 Oct 2018, 08:11

You will have to create k-1 indicator variables for your k-level categorical variables, then include the means of these indicators. Unfortunately, you cannot use factor-variable notation any longer. This is inconvenient, but on the plus side it will prevent you from trying to include interaction terms in the wrong way (see Schunck 2013).

Best
Daniel

Schunck, R. 2013. Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. The Stata Journal, 13(1), pp. 65–76.
Thorben Schmidt

Join Date: Oct 2018

Posts: 6
#3

31 Oct 2018, 09:02

Dear Daniel,

thank you so so much for the quick and useful reply! I think it worked, so just to be on the safe side:

In the case of the categorical variable "Owner" with the following 4 values, I created 4 dummy variables

tab Owner, gen (linkdum)
rename linkdum1 Owners
rename linkdum2 Main
rename linkdum3 Sub
rename linkdum4 Tenant

Subsequently, I created the means of three of these dummies

by pid, sort : egen Main_mean = mean(Main)
by pid, sort : egen Sub_mean = mean(Sub)
by pid, sort : egen Tenant_mean = mean(Tenant)

In my regression, I integrated the dummies (Main Sub Tenant) and their means (Main_mean Sub_mean Tenant_mean), leaving out the first value in order to avoid the dummy variable trap.

If this is correct, then I am more than grateful for your help!!!

Best regards,
Thorben
daniel klein

Join Date: Mar 2014

Posts: 3859
#4

31 Oct 2018, 09:37

The approach looks correct; be sure to calculate the mean values in the same sample that you use in the regression model.

Best
Daniel
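
One way to keep the means and the regression on the same observations, without dropping any data, is to flag complete cases first. This is only a sketch: the nlswork data, the variable names, and the marker name touse are assumptions for illustration.

```stata
// toy data (an assumption for illustration)
webuse nlswork, clear

// touse = 1 for every observation, then 0 where any model variable is missing
mark touse
markout touse ln_wage hours union race

// panel-unit means computed over the flagged observations only
foreach v of varlist hours union {
    bysort idcode : egen double mean_`v' = mean(`v') if touse
}

// declare the panel structure
xtset idcode year

// CRE model restricted to the flagged sample
xtreg ln_wage hours union i.race mean_hours mean_union if touse, re
```

Because the if touse qualifier appears both in the egen calls and in the xtreg call, the means are based on exactly the observations that enter the model.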
Thorben Schmidt

Join Date: Oct 2018

Posts: 6
#5

01 Nov 2018, 03:11

I am sorry to bother you again but what exactly do you mean by that? Controlling for the missing values?

Also, how would you precisely describe the effect of the time-invariant variable in econometric terms? Wooldridge only says

"In addition, we obtain an estimate of the time-invariant regressor, although the estimate should be interpreted with caution because it does not necessarily estimate a causal effect of the variable on the dependent variable."

Thank you very much for your help!

Best,
Thorben
daniel klein

Join Date: Mar 2014

Posts: 3859
#6

01 Nov 2018, 04:26

Originally posted by Thorben Schmidt

I am sorry to bother you again but what exactly do you mean by that? Controlling for the missing values?

I think I have to ask you back what you mean by "controlling" for missing values. I just wanted to point out that the means should be calculated from the sample, i.e., the same observations that enter the regression model. Since Stata will exclude observations with missing values on one or more variables from the regression model, you will indeed need to watch out for missing values when calculating the means.

Originally posted by Thorben Schmidt

Also, how would you precisely describe the effect of the time-invariant variable in econometric terms?

I do not think that the mean variables have a substantive interpretation in the CRE-model. They estimate the difference between the within and between coefficient (see the article by Schunck that I have pointed to). You may interpret these coefficients like a Hausman test for each variable. A significant coefficient means that within and between coefficients differ significantly; thus the simple RE model would be biased.

If you want a more substantive interpretation, use the hybrid model, instead. Here, the mean variables represent the between units effects.

Best
Daniel
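
The Hausman-like reading of the mean coefficients could be checked with a Wald test after the CRE model. The sketch below assumes the nlswork data and the variable names shown; it illustrates the idea rather than reproducing code from this thread.

```stata
// toy data (an assumption for illustration)
webuse nlswork, clear

// restrict to complete cases so that the means match the estimation sample
quietly regress ln_wage hours union race
keep if e(sample)

// panel-unit means of the time-varying predictors
foreach v of varlist hours union {
    bysort idcode : egen double mean_`v' = mean(`v')
}

// declare the panel structure
xtset idcode year

// CRE model: original variables plus their panel-unit means
xtreg ln_wage hours union i.race mean_hours mean_union, re

// joint Wald test that all mean coefficients are zero
test mean_hours mean_union

// test for a single predictor
test mean_hours
```

A small p-value in the joint test suggests that within and between coefficients differ, which speaks against the plain RE model.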

Last edited by daniel klein; 01 Nov 2018, 04:29. Reason: spelling
Thorben Schmidt

Join Date: Oct 2018

Posts: 6
#7

01 Nov 2018, 05:04

Originally posted by daniel klein

I think I have to ask you back what you mean by "controlling" for missing values. I just wanted to point out that the means should be calculated from the sample, i.e., the same observations that enter the regression model. Since Stata will exclude observations with missing values on one or more variables from the regression model, you will indeed need to watch out for missing values when calculating the means.

Ok, then we were on the same page - I indeed did that by creating a sample variable via e(sample) that indicated the regression's sample while calculating the means. I was just wondering because I thought xtreg dependent x1 x2 x2mean x3 x3mean, re (where x1 is the time-invariant variable) should yield the same result for time-varying variables as xtreg dependent x1 x2 x3, fe (where x1 is omitted because it is time-invariant).

Originally posted by daniel klein

I do not think that the mean variables have a substantive interpretation in the CRE-model. They estimate the difference between the within and between coefficient (see the article by Schunck that I have pointed to). You may interpret these coefficients like a Hausman test for each variable. A significant coefficient means that within and between coefficients differ significantly; thus the simple RE model would be biased.

If you want a more substantive interpretation, use the hybrid model, instead. Here, the mean variables represent the between units effects.

On that point I was not clear enough, sorry. I meant the time-invariant variable (my main variable of interest), which is the reason I estimated the correlated random effects model in the first place and which, of course, is not accompanied by its mean. But your point is interesting, thank you!

Best,
Thorben

Last edited by Thorben Schmidt; 01 Nov 2018, 05:06. Reason: structure
daniel klein

Join Date: Mar 2014

Posts: 3859
#8

01 Nov 2018, 06:36

Originally posted by Thorben Schmidt

I was just wondering because I thought xtreg dependent x1 x2 x2mean x3 x3mean, re [...] should yield the same result for time-varying variables as xtreg dependent x1 x2 x3, fe

It should indeed.

I meant the time-invariant variable (my main variable of interest), which is the reason I estimated the correlated random effects model in the first place

I do not have a good answer to that; the CRE and hybrid models really focus on the time-varying predictors. The coefficients for time-constant predictors are still based only on the between panel-unit variation, so they will still be biased by any unobserved between panel-unit heterogeneity. I would guess that the estimate should be close to the between effect.

Hopefully, someone else has a better answer.

Best
Daniel
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#9

01 Nov 2018, 08:03

The CRE coefficients of the time-invariant variables are equal to the between effects. There are no within effects for these variables. For any between-effects estimator to be unbiased, the time-invariant variables and the averages of the time-varying variables need to be uncorrelated with any unobserved between panel-unit heterogeneity. In other words, there must be no omitted variable bias in the between-effects model.

https://www.kripfganz.de/stata/

Thorben Schmidt

Join Date: Oct 2018
Posts: 6

#10

01 Nov 2018, 08:08

Hm, so I should be concerned about it. The code below is the procedure I applied to estimate the CRE model (with more variables in practice; this is just an example with continuous and categorical variables). Note that Instability is the time-invariant variable. Do you see why the estimates differ between re and fe?

Code:

tab Education if sample==1, gen (linkdum)
    rename linkdum1 Low
    rename linkdum2 Intermediate    
    rename linkdum3 High

gen sample=0
xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High, re
replace sample=1 if e(sample)
by pid, sort : egen income_mean = mean(income) if sample==1
by pid, sort : egen Public_Transfers_mean = mean(Public_Transfers) if sample==1
by pid, sort : egen Asset_Flows_mean = mean(Asset_Flows) if sample==1
by pid, sort : egen Intermediate_mean = mean(Intermediate) if sample==1
by pid, sort : egen High_mean = mean(High) if sample==1

xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High *mean if sample==1, re
xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High if sample==1, fe

Thank you for your time and help!

Last edited by Thorben Schmidt; 01 Nov 2018, 08:11. Reason: minor mistake


daniel klein

Join Date: Mar 2014

Posts: 3859
#11

01 Nov 2018, 08:51

Originally posted by Sebastian Kripfganz

The CRE coefficients of the time-invariant variables are equal to the between effects.

That is what I thought; however, the coefficients do not match exactly. I believe this is because the between model does not account for any within variation while the CRE does. That is: while the within estimates in CRE are, naturally, not affected by the inclusion or omission of time-invariant (within panel-unit constant) variables, the between estimates are sensitive to the inclusion of both time-varying (within panel-unit varying) and time-constant (within panel-unit constant) variables.

Edit: here is an example

Code:

// toy data
webuse nlswork

// mark the sample
quietly regress ln_wage hours union collgrad race
keep if e(sample)

// get means
foreach v in hours union {
    bysort id : egen double mean_`v' = mean(`v')
}

// declare panel
xtset id year

// fe model
xtreg ln_wage hours i.union i.collgrad i.race , fe

// cre
xtreg ln_wage hours i.union i.collgrad i.race mean*

// between
xtreg ln_wage hours i.union i.collgrad i.race , be

Best
Daniel

Last edited by daniel klein; 01 Nov 2018, 09:15.
daniel klein

Join Date: Mar 2014

Posts: 3859
#12

01 Nov 2018, 09:12

Originally posted by Thorben Schmidt

The code below is the procedure I applied to estimate the CRE

But you generate sample in line 6 after you have referred to it in the very first line ... If you show code, please show what you have typed exactly.

You might want to get rid of all these if qualifiers that you miss so easily and just code

Code:

preserve
quietly regress ...
keep if e(sample)
...
restore

Best
Daniel
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#13

01 Nov 2018, 12:03

Originally posted by daniel klein

That is what I thought; however, the coefficients do not match exactly. I believe this is because the between model does not account for any within variation while the CRE does. That is: while the within estimates in CRE are, naturally, not affected by the inclusion or omission of time-invariant (within panel-unit constant) variables, the between estimates are sensitive to the inclusion of both time-varying (within panel-unit varying) and time-constant (within panel-unit constant) variables.

The reason for the differences is probably the unbalanced nature of the panel data set. With balanced panel data, the coefficients from CRE and BE for the time-invariant regressors should exactly coincide.

https://www.kripfganz.de/stata/
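
To explore this explanation, one could rerun the comparison on a subsample in which every panel unit contributes the same number of observations. The sketch below is an assumption-laden illustration: it uses the nlswork data, treats race as the time-invariant regressor, and uses an equal number of observations per unit as a simple stand-in for a balanced panel. The helper variable n_obs is hypothetical.

```stata
// toy data (an assumption for illustration)
webuse nlswork, clear

// restrict to complete cases
quietly regress ln_wage hours union race
keep if e(sample)

// keep only panel units that have the maximum number of observations
bysort idcode : generate long n_obs = _N
summarize n_obs, meanonly
keep if n_obs == r(max)

// panel-unit means on the reduced sample
foreach v of varlist hours union {
    bysort idcode : egen double mean_`v' = mean(`v')
}

// declare the panel structure
xtset idcode year

// CRE model
xtreg ln_wage hours union i.race mean_hours mean_union, re

// between-effects model
xtreg ln_wage hours union i.race, be
```

If the explanation above holds, the race coefficients from the two models should agree on this subsample, while they need not agree on the full unbalanced sample.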
daniel klein

Join Date: Mar 2014

Posts: 3859
#14

01 Nov 2018, 12:10

Sebastian: Thanks for the hint; now I will have a good place to start when I find the time to look into this again.

Best
Daniel
Thorben Schmidt

Join Date: Oct 2018

Posts: 6
#15

02 Nov 2018, 07:15

Thank you both so much, you really helped me get through this!