  • Correlated random effects models

    Dear Statalisters!

    I have more of a conceptual question. In order to relax the assumptions of a random effects model, I want to integrate a Mundlak transformation that is similar to the following suggestions:

    Originally posted by daniel klein

    The "hybrid-model" is actually a rather simple thing, that can be explained in three steps

    1. Calculate the panel-unit-specific mean for all time-varying predictors (but not the response/outcome). This is something along the lines of by <id>, sort : egen x1_between = mean(x1)

    2. Subtract the panel-unit-specific mean from the original values, i.e. perform the fixed-effects/within-transformation. This is as simple as generate x1_within = x1 - x1_between

    3. Run a random-effects/mixed model where you include the time-varying predictors in their de-meaned form (those from step 2) and their mean (those calculated in step 1) along with the time-invariant predictors. This is, in the simplest form, xtreg depvar x1_within x1_between x2_within x2_between x3_within x3_between x4

    You are done. The coefficients for the *_within variables resemble the fixed-effects estimates, while the *_between variables can be interpreted as a between estimator. The coefficients for time-invariant predictors are those from a random-effects model.

    Best
    Daniel
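
    The three steps quoted above can be sketched as follows. This is only a minimal illustration, assuming Stata's nlswork example data with idcode as the panel identifier. The choice of hours and tenure as time-varying predictors, and of collgrad as a predictor treated as time-invariant, is an assumption made for this example and not part of the quoted post.

    Code:
    webuse nlswork, clear
    // restrict to the observations that will enter the model
    keep if !missing(ln_wage, hours, tenure, collgrad)
    xtset idcode year
    foreach v in hours tenure {
        // step 1: panel-unit-specific mean
        bysort idcode : egen double `v'_between = mean(`v')
        // step 2: within transformation
        generate double `v'_within = `v' - `v'_between
    }
    // step 3: random-effects model with both components
    xtreg ln_wage hours_within hours_between tenure_within tenure_between collgrad, re

    Replacing hours_within and tenure_within with the original hours and tenure in the last line would give the correlated random effects variant.
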
    Instead of the hybrid model, I would rather go with the correlated random effects model, as Sebastian Kripfganz proposed:

    Originally posted by Sebastian Kripfganz
    Daniel already gave some good advice. Let me add my few cents to it.

    You will get exactly the same results in the third step by using the original variables x1 instead of x1_within ..., which you can easily verify. In the literature, this approach is also known as "correlated random effects".
    The principle is clear to me for continuous variables. My question is: how do I treat categorical variables in this context?

    It would be great to get a response!

    Best,
    Thorben
    Last edited by Thorben Schmidt; 31 Oct 2018, 08:03.

  • #2
    You will have to create k-1 indicator variables for each k-level categorical variable, then include the means of these indicators. Unfortunately, you can no longer use factor-variable notation. This is inconvenient, but on the plus side it will prevent you from trying to include interaction terms in the wrong way (see Schunck 2013).

    Best
    Daniel


    Schunck, R. 2013. Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. The Stata Journal, 13(1), pp. 65–76.


    • #3
      Dear Daniel,

      thank you so so much for the quick and useful reply! I think it worked, so just to be on the safe side:

      In the case of the categorical variable "Owner" with the following 4 values, I created 4 dummy variables

      tab Owner, gen(linkdum)
      rename linkdum1 Owners
      rename linkdum2 Main
      rename linkdum3 Sub
      rename linkdum4 Tenant

      Subsequently, I created the means of three of these dummies

      by pid, sort : egen Main_mean = mean(Main)
      by pid, sort : egen Sub_mean = mean(Sub)
      by pid, sort : egen Tenant_mean = mean(Tenant)

      In my regression, I integrated the dummies (Main Sub Tenant) and their means (Main_mean Sub_mean Tenant_mean), leaving out the first value in order to avoid the dummy variable trap.

      If this is correct, then I am more than grateful for your help!!!

      Best regards,
      Thorben


      • #4
        The approach looks correct; be sure to calculate the mean values in the same sample that you use in the regression model.

        Best
        Daniel


        • #5
          I am sorry to bother you again but what exactly do you mean by that? Controlling for the missing values?

          Also, how would you precisely describe the effect of the time-invariant variable in econometric terms? Wooldridge only says

          "In addition, we obtain an estimate of the time-invariant regressor, although the estimate should be interpreted with caution because it does not necessarily estimate a causal effect of the variable on the dependent variable."

          Thank you very much for your help!

          Best,
          Thorben


          • #6
            Originally posted by Thorben Schmidt
            I am sorry to bother you again but what exactly do you mean by that? Controlling for the missing values?
            I think I have to ask you back what you mean by "controlling" for missing values. I just wanted to point out that the means should be calculated from the sample, i.e., the same observations that enter the regression model. Since Stata will exclude observations with missing values on one or more variables from the regression model, you will indeed need to watch out for missing values when calculating the means.
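
            One way to keep the means and the model on the same observations is to flag the complete cases first and compute the means only within them. A minimal sketch, assuming the data are already xtset with panel identifier pid; y, x1 (time-invariant), x2, x3 and the flag touse are hypothetical names used only for illustration:

            Code:
            // 1 if no model variable is missing, 0 otherwise
            generate byte touse = !missing(y, x1, x2, x3)
            foreach v in x2 x3 {
                // panel means over the flagged observations only
                bysort pid : egen double `v'mean = mean(`v') if touse
            }
            xtreg y x1 x2 x2mean x3 x3mean if touse, re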

            Originally posted by Thorben Schmidt
            Also, how would you precisely describe the effect of the time-invariant variable in econometric terms?
            I do not think that the mean variables have a substantive interpretation in the CRE model. They estimate the difference between the within and between coefficients (see the article by Schunck that I have pointed to). You may interpret these coefficients like a Hausman test for each variable. A significant coefficient means that the within and between coefficients differ significantly; thus the simple RE model would be biased.

            If you want a more substantive interpretation, use the hybrid model, instead. Here, the mean variables represent the between units effects.
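
            The Hausman-like reading of the mean coefficients can be examined with a Wald test after fitting the CRE model. A sketch only, assuming the data are xtset and using hypothetical names y, x1, x2, x3, with x2mean and x3mean being panel means computed on the estimation sample:

            Code:
            xtreg y x1 x2 x2mean x3 x3mean, re
            // H0: within and between coefficients are equal for x2 and x3 jointly
            test x2mean x3mean
            // the same question for x2 alone
            test x2mean

            Rejecting the joint null hypothesis would point in the same direction as a Hausman test of RE against FE.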

            Best
            Daniel
            Last edited by daniel klein; 01 Nov 2018, 04:29. Reason: spelling


            • #7
              Originally posted by daniel klein

              I think I have to ask you back what you mean by "controlling" for missing values. I just wanted to point out that the means should be calculated from the sample, i.e., the same observations that enter the regression model. Since Stata will exclude observations with missing values on one or more variables from the regression model, you will indeed need to watch out for missing values when calculating the means.
              Ok, then we were on the same page - I indeed did that by creating a sample variable via e(sample) that indicated the regression's sample while calculating the means. I was just wondering because I thought xtreg dependent x1 x2 x2mean x3 x3mean, re (where x1 is the time-invariant variable) should yield the same result for time-varying variables as xtreg dependent x1 x2 x3, fe (where x1 is omitted because it is time-invariant).

              Originally posted by daniel klein

              I do not think that the mean variables have a substantive interpretation in the CRE model. They estimate the difference between the within and between coefficients (see the article by Schunck that I have pointed to). You may interpret these coefficients like a Hausman test for each variable. A significant coefficient means that the within and between coefficients differ significantly; thus the simple RE model would be biased.

              If you want a more substantive interpretation, use the hybrid model, instead. Here, the mean variables represent the between units effects.
              Sorry, I was not clear enough there. I meant the time-invariant variable (my main variable of interest), which is why I estimated the correlated random effects model in the first place, and which of course is not accompanied by its mean. But your point is interesting, thank you!

              Best,
              Thorben

              Last edited by Thorben Schmidt; 01 Nov 2018, 05:06. Reason: structure


              • #8
                Originally posted by Thorben Schmidt
                I was just wondering because I thought xtreg dependent x1 x2 x2mean x3 x3mean, re [...] should yield the same result for time-varying variables as xtreg dependent x1 x2 x3, fe
                It should indeed.

                I meant the time-invariant variable (my main variable of interest), which is why I estimated the correlated random effects model in the first place
                I do not have a good answer to that; the CRE and hybrid models really focus on the time-varying predictors. The coefficients for time-constant predictors are still based only on the between panel-unit variation, so they will still be biased by any unobserved between panel-unit heterogeneity. I guess the estimates should be close to the between effects.

                Hopefully, someone else has a better answer.

                Best
                Daniel


                • #9
                  The CRE coefficients of the time-invariant variables are equal to the between effects. There are no within effects for these variables. For any between-effects estimator to be unbiased, the time-invariant variables and the averages of the time-varying variables need to be uncorrelated with any unobserved between panel-unit heterogeneity. In other words, there must be no omitted variable bias in the between-effects model.
                  https://twitter.com/Kripfganz


                  • #10
                    Hm, so I should be concerned about it. The code below is the procedure I applied to estimate the CRE model (with more variables; this is just an example with continuous and categorical variables). Note that Instability is the time-invariant variable. Do you see why the re and fe estimates differ?

                    Code:
                    tab Education if sample==1, gen (linkdum)
                        rename linkdum1 Low
                        rename linkdum2 Intermediate    
                        rename linkdum3 High
                    
                    gen sample=0
                    xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High, re
                    replace sample=1 if e(sample)
                    by pid, sort : egen income_mean = mean(income) if sample==1
                    by pid, sort : egen Public_Transfers_mean = mean(Public_Transfers) if sample==1
                    by pid, sort : egen Asset_Flows_mean = mean(Asset_Flows) if sample==1
                    by pid, sort : egen Intermediate_mean = mean(Intermediate) if sample==1
                    by pid, sort : egen High_mean = mean(High) if sample==1
                    
                    xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High *mean if sample==1, re
                    xtreg economic Instability income Public_Transfers Asset_Flows Intermediate High if sample==1, fe
                    Thank you for your time and help!
                    Last edited by Thorben Schmidt; 01 Nov 2018, 08:11. Reason: minor mistake


                    • #11
                      Originally posted by Sebastian Kripfganz
                      The CRE coefficients of the time-invariant variables are equal to the between effects.
                      That is what I thought; however, the coefficients do not match exactly. I believe this is because the between model does not account for any within variation while the CRE does. That is: while the within estimates in CRE are, naturally, not affected by the inclusion or omission of time-invariant (within panel-unit constant) variables, the between estimates are sensitive to the inclusion of both time-varying (within panel-unit varying) and time-constant (within panel-unit constant) variables.

                      Edit: here is an example

                      Code:
                      // toy data
                      webuse nlswork, clear
                      // mark the sample
                      quietly regress ln_wage hours union collgrad race
                      keep if e(sample)
                      // get panel means of the time-varying predictors
                      foreach v in hours union {
                          bysort idcode : egen double mean_`v' = mean(`v')
                      }
                      // declare panel
                      xtset idcode year
                      // fe model
                      xtreg ln_wage hours i.union i.collgrad i.race , fe
                      // cre (random effects plus the panel means)
                      xtreg ln_wage hours i.union i.collgrad i.race mean_hours mean_union , re
                      // between
                      xtreg ln_wage hours i.union i.collgrad i.race , be
                      Best
                      Daniel
                      Last edited by daniel klein; 01 Nov 2018, 09:15.


                      • #12
                        Originally posted by Thorben Schmidt
                        The code below is the procedure I applied to estimate the CRE
                        But you generate sample in line 6 after you have referred to it in the very first line ... If you show code, please show what you have typed exactly.

                        You might want to get rid of all these if qualifiers that you miss so easily and just code

                        Code:
                        preserve
                        quietly regress ...
                        keep if e(sample)
                        ...
                        restore
                        Best
                        Daniel


                        • #13
                          Originally posted by daniel klein
                          That is what I thought; however, the coefficients do not match exactly. I believe this is because the between model does not account for any within variation while the CRE does. That is: while the within estimates in CRE are, naturally, not affected by the inclusion or omission of time-invariant (within panel-unit constant) variables, the between estimates are sensitive to the inclusion of both time-varying (within panel-unit varying) and time-constant (within panel-unit constant) variables.
                          The reason for the differences is probably the unbalanced nature of the panel data set. With balanced panel data, the coefficients from CRE and BE for the time-invariant regressors should coincide exactly.
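
                          This could be examined on a subsample with equal panel length. A rough sketch, assuming the nlswork data and the variables from the example in #11. Keeping only the panel units with the most frequent number of observations is a shortcut chosen for this illustration (it equalizes panel length but does not guarantee identical time periods), and nobs and modeT are hypothetical names:

                          Code:
                          webuse nlswork, clear
                          keep if !missing(ln_wage, hours, union, collgrad, race)
                          // number of observations per panel unit
                          bysort idcode : generate nobs = _N
                          // keep the units that share the most frequent panel length
                          egen modeT = mode(nobs), maxmode
                          keep if nobs == modeT
                          foreach v in hours union {
                              bysort idcode : egen double mean_`v' = mean(`v')
                          }
                          xtset idcode year
                          // compare the coefficients on collgrad and race across the two models
                          xtreg ln_wage hours i.union i.collgrad i.race mean_hours mean_union, re
                          xtreg ln_wage hours i.union i.collgrad i.race, be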


                          • #14
                            Sebastian: Thanks for the hint; now I will have a good place to start when I find the time to look into this again.

                            Best
                            Daniel


                            • #15
                              Thank you both so much, you really helped me get through this!
