  • Panel Data: FE Regression = Panel Quasi-RE Regression = RE with Controls

    Hello all,

    My name is Marcus Fiedler. I am writing a PhD dissertation about the human resources director.
    So far I have been working with Stata 14. The panel structure is set with "xtset ID Year".

    The literature suggests three different methods that give the "same" results:
    Panel FE regression (the FE procedure group-mean-centers all independent variables (IVs), but I have time-invariant variables...) or
    Panel quasi-RE regression or
    RE with controls (the controls are the group means of the IVs)

    What I want to do:
    My data contain time-invariant variables, which get no coefficient in an FE regression and are omitted. So I have to use either a quasi-RE model (a), with group-mean-centered variables, or RE with controls (b), with each IV plus its mean.

    Example (simple), where y1m is the group mean of y1:
    (a) x = (y1 - y1m) + (y2 - y2m) + ...
    (b) x = y1 + y1m + y2 + y2m + ...
    Both methods produce the same coefficients as FE. The first method is also called "hybrid".
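A minimal sketch of specifications (a) and (b) next to FE, on simulated data. Everything here is an assumption for illustration: the seed, the data-generating process, and the names (x is the outcome and y1, y2 the regressors, following the notation above; z is a made-up time-invariant regressor). Under these assumptions the coefficient on y1 should be identical in all three models, while z can only be estimated by the two RE variants.

```stata
* hypothetical simulated panel: 100 firms observed for 8 years
clear
set seed 20240101
set obs 100
gen ID = _n
gen u = rnormal()                       // firm-specific effect
gen z = rnormal()                       // time-invariant regressor
expand 8
bysort ID: gen Year = 2004 + _n
gen y1 = u + rnormal()                  // regressor correlated with the firm effect
gen y2 = rnormal()
gen x = 1 + 2*y1 - y2 + 0.5*z + u + rnormal()
xtset ID Year

* group means and group-mean-centered versions of the time-varying regressors
bysort ID: egen double y1m = mean(y1)
bysort ID: egen double y2m = mean(y2)
gen double y1c = y1 - y1m
gen double y2c = y2 - y2m

xtreg x y1 y2 z, fe                     // z is omitted: it is time-invariant
scalar b_fe = _b[y1]
xtreg x y1c y2c z, re                   // (a) centered ("quasi") RE
scalar b_a = _b[y1c]
xtreg x y1 y2 y1m y2m z, re             // (b) RE with mean controls
scalar b_b = _b[y1]
display "FE: " b_fe "   (a): " b_a "   (b): " b_b
```

The equality holds because the within part of each regressor is orthogonal to anything that is constant within a firm, so the RE weighting cannot change its coefficient; the standard errors, however, are not forced to match.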

    What my problem is:
    I want to estimate RE with controls. My problem is that I have missing data, so I cannot simply calculate the IV means and include them before running the regression.

    Example (x = missing):
    t     a    b
    2005  12   40
    2006  17   30
    2007  13   x
    2008  x    20
    Mean calculated before running: mean a = 14
    Mean calculated before running: mean b = 30

    BUT calculating the means before running the RE regression is misleading, because my calculation does not account for listwise deletion. The missing values "erase" two rows, so the technically correct means would be:
    a = 14.5 (2007 and 2008 deleted)
    b = 35 (2007 and 2008 deleted)
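The numbers above can be reproduced in Stata with the toy data. ID = 1 is an added assumption standing in for a single firm, and "." replaces x as Stata's missing value. In a real model the dependent variable would also have to enter rowmiss(), because missing values in y drop rows as well.

```stata
* toy example for one firm; "." marks a missing value
clear
input ID t a b
1 2005 12 40
1 2006 17 30
1 2007 13 .
1 2008 .  20
end

* naive means: each variable ignores only its own missing values
bysort ID: egen double a_naive = mean(a)     // (12+17+13)/3 = 14
bysort ID: egen double b_naive = mean(b)     // (40+30+20)/3 = 30

* listwise means: only rows where neither a nor b is missing
egen misobs = rowmiss(a b)
bysort ID: egen double a_lw = mean(cond(misobs == 0, a, .))   // (12+17)/2 = 14.5
bysort ID: egen double b_lw = mean(cond(misobs == 0, b, .))   // (40+30)/2 = 35
list, noobs
```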

    Calculating this manually is very tricky, so my question is:
    Can I save the mean variables produced by Stata (taking missing values into account) after running a regression?
    That would mean I don't have to calculate the means beforehand.

    I hope you can understand my problem and suggest a practical solution.

    Have a nice day,
    Marcus

  • #2
    Marcus:
    welcome to the list.
    First off, please devote some of your time to reading the FAQ and learning how to post more effectively (-search dataex- for posting examples/excerpts of your dataset is one wise step to take).
    That said:
    - despite its limitations, the -hausman- test (as well as the literature in your research field) can point you to the -fe- or -re- specification;
    - if you have missing data, Stata applies listwise deletion to observations with a missing value in any of the model variables. Hence, if you cannot retrieve the missing values by submitting queries to investigators or the like, you may want to impute them (please, see -help mi-) or interpolate them linearly (please, see -help ipolate-).
    It is also advisable to investigate whether missingness is informative or not before deciding how to fix this issue.
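A minimal sketch of the checks mentioned above, on simulated data. The variable names, the seed, and the 10% missing-completely-at-random pattern are assumptions. The t-test is only a crude first look at whether missingness is informative, not a formal test.

```stata
* hypothetical simulated panel with some missing values in x1
clear
set seed 101
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = u + rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + u + rnormal()
replace x1 = . if runiform() < 0.10      // about 10% missing, at random by construction
xtset ID Year

* how much is missing, and where?
misstable summarize y x1 x2
* crude check: does the outcome differ when x1 is missing?
gen byte x1_missing = missing(x1)
ttest y, by(x1_missing)

* Hausman test of FE against RE (keep its limitations in mind)
quietly xtreg y x1 x2, fe
estimates store fe
quietly xtreg y x1 x2, re
estimates store re
hausman fe re
```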
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      There are different ways to accomplish your task, for example:
      Code:
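      egen misobs = rowmiss(y a b)                     // missing values across all model variables, incl. y
      by ID: egen a_mean = mean(a) if misobs == 0      // group means over complete rows (data sorted by ID after xtset)
      by ID: egen b_mean = mean(b) if misobs == 0
      xtreg y a b a_mean b_mean, re                    // RE with mean controls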
      Alternatively, you could run the regression without the mean variables first and then use the stored estimation sample to determine the relevant observations:
      Code:
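      xtreg y a b, re                                  // first pass: only marks the estimation sample
      by ID: egen a_mean = mean(a) if e(sample)        // means over the rows actually used
      by ID: egen b_mean = mean(b) if e(sample)
      xtreg y a b a_mean b_mean, re                    // same sample, now with mean controls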
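The difference between the two kinds of means can be seen in a small simulation. The data are hypothetical: x1 and x2 are each missing at random in different rows, and all names are made up. Means computed before estimation ("naive") should not reproduce the FE coefficient, whereas means computed over e(sample), as in the second approach above, should reproduce it up to rounding.

```stata
* hypothetical simulated panel with missing values in x1 and x2
clear
set seed 202
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = u + rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + u + rnormal()
replace x1 = . if runiform() < 0.10
replace x2 = . if runiform() < 0.10
xtset ID Year

* naive means: each variable drops only its own missing values
bysort ID: egen double x1_naive = mean(x1)
bysort ID: egen double x2_naive = mean(x2)

* means over the estimation sample
quietly xtreg y x1 x2, re
gen byte insample = e(sample)            // keep the sample marker as a variable
bysort ID: egen double x1_mean = mean(x1) if insample
bysort ID: egen double x2_mean = mean(x2) if insample

quietly xtreg y x1 x2, fe
scalar b_fe = _b[x1]
quietly xtreg y x1 x2 x1_naive x2_naive, re
scalar b_naive = _b[x1]
quietly xtreg y x1 x2 x1_mean x2_mean, re
scalar b_mean = _b[x1]
display "FE: " b_fe "   naive means: " b_naive "   e(sample) means: " b_mean
```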
      With regard to time-invariant variables in panel data models, you can find various other discussions here on Statalist:
      Fixed Effects and time-invariant variables
      https://www.kripfganz.de/stata/



      • #4
        Hello Carlo,
        hello Sebastian,

        thank you for your comments. I think my description was not good enough.

        The Hausman test is a nice tool, but it has an inherent problem: it tends toward RE with huge samples (Stata also includes an alternative likelihood-ratio test for testing FE against RE); see Wooldridge, Allison and others.
        FE is not a practical solution in my case, but the literature tells us FE is the best way to get the "right" structure without bias. So I have to use a quasi-RE or an RE with mean controls (= hybrid, CRE), which reproduce the FE estimates.
        Imputation and interpolation are simply misleading because of the structure and bias of my data.
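Since the thread already works with RE plus mean controls, one alternative to -hausman- that is often discussed in the correlated-random-effects literature (it is not proposed in this thread) is a cluster-robust Wald test that the coefficients on the mean variables are jointly zero. A minimal sketch on simulated data; all names, the seed and the data-generating process are assumptions.

```stata
* hypothetical simulated panel
clear
set seed 303
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = u + rnormal()                 // correlated with the firm effect
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + u + rnormal()
xtset ID Year

* correlated random effects: add the group means of the time-varying regressors
bysort ID: egen double x1_mean = mean(x1)
bysort ID: egen double x2_mean = mean(x2)
xtreg y x1 x2 x1_mean x2_mean, re vce(cluster ID)

* mean coefficients jointly zero is consistent with RE; rejection points to FE
test x1_mean x2_mean
```

In this design x1 is correlated with the firm effect, so the test should reject; with real data the outcome depends on the data.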

        I attach an example to illustrate my problem. In this example there are 48 missing values in x1 and 14 missing values in x2. Stata deletes listwise:
        48 missing values + 14 missing values = 62 missing values for every variable (the two sets of missing values fall in different rows)!
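The arithmetic above can be checked with a small hypothetical data set in which the 48 missing values of x1 and the 14 of x2 fall in different rows. That layout is an assumption; if some rows were missing in both variables, fewer than 62 rows would be lost.

```stata
* hypothetical data: 500 rows, x1 missing in rows 1-48, x2 in rows 49-62
clear
set obs 500
gen x1 = rnormal()
gen x2 = rnormal()
replace x1 = . in 1/48
replace x2 = . in 49/62

egen nmiss = rowmiss(x1 x2)
count if missing(x1)          // 48
count if missing(x2)          // 14
count if nmiss > 0            // 62 rows lost to listwise deletion
```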

        This is exactly my problem, because BEFORE running the regression I generate the mean variables (for group-mean-centering) for quasi-RE/RE with controls, taking into account only each variable's own missing values. So I include x1 and mean x1 with 48 missing values and x2 and mean x2 with 14 missing values, but listwise deletion leaves x1 and mean x1 with 62 missing values and x2 and mean x2 with 62 missing values. This is the reason why my quasi-RE or RE with mean controls does not equal FE.
        What I would have to do is recalculate every mean variable, accounting for the missing values of all other variables, before running the regression. This is very complicated.

        So my question: can I store the mean variables from the FE procedure and use them for my quasi-RE or RE with controls?


        Best regards,
        Marcus


        Attached Files



        • #5
          Marcus:
          I would say you're making your life harder than necessary.
          You have missing values and Stata behaves as expected.
          You state that interpolation and multiple imputation induce biases in your regression (but leaving informative missingness unaddressed, if that were the case with your dataset, would bias your results as well).
          My proposal would be to report the regression both with and without missing values and comment on the differences in your research report.
          As an aside, please do not attach spreadsheets (by the way, please read the FAQ about attachments. Thanks), as most of us do not download them, due to the risk of active malware. Thanks.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Originally posted by Marcus Fiedler:
            And so my question, can I store mean variables from FE procedure to use this ones for my Quasi-RE or RE with controls?
            I still believe that my earlier suggestion does exactly what you want.
            https://www.kripfganz.de/stata/



            • #7
              Hello Sebastian,

              I believe so too.
              You are right: I tested it with my own way of creating the means, i.e. egen A_mean = mean(A), by(ID_Unt), on different data sets, with and without missing values.
              Stata always gives the same results.

              Let me add a further question: as in the literature, all coefficients are the same but the standard errors differ. Many researchers mention that significance in RE with controls equals that in FE, but in my data it does not. Do you have an explanation why not? Maybe a problem with my procedure?

              code: xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 eAnmeldung_2013_mean UnternehmensAlter_CAP UnternehmensAlter_CAP_mean Ansprueche_Patent_CAP2013 Ansprueche_Patent_CAP2013_mean , re

              eAnmeldung_gesam~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
              -------------------+----------------------------------------------------------------
              eAnmeldung_2013 | .9604805 .0141822 67.72 0.000 .9326838 .9882771
              eAnmeldung_2013_~n | -.1490229 .0159141 -9.36 0.000 -.180214 -.1178317
              UnternehmensAlte~P | -3.190285 1.121075 -2.85 0.004 -5.38755 -.9930188
              UnternehmensAlte~n | 3.149205 1.12189 2.81 0.005 .9503412 5.348069
              Ansprueche_Pa~2013 | .2975801 1.537106 0.19 0.846 -2.715092 3.310253
              Ansprueche_Paten~n | -.0756286 2.427935 -0.03 0.975 -4.834294 4.683037
              _cons | 5.946127 5.482549 1.08 0.278 -4.799472 16.69172

              code: xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP2013 , fe

              eAnmeldung_gesam~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
              -------------------+----------------------------------------------------------------
              zenteAnmeldung_2~3 | .9604805 .0180785 53.13 0.000 .9250472 .9959137
              zentUnternehmens~P | -3.190283 1.429068 -2.23 0.026 -5.991204 -.389362
              zentAnspruech~2013 | .2975803 1.959396 0.15 0.879 -3.542765 4.137925
              _cons | 106.6609 16.9874 6.28 0.000 73.36621 139.9556


              Best regards,
              Marcus




              • #8
                Hello again,

                can anybody delete my last post? It contains mistakes, sorry.
                The correct example: FE vs. QRE vs. CRE

                xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP2013 , i(ID_Unt) fe
                eAnmeldung_gesamt_RO~g | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                eAnmeldung_2013 | .9604805 .01486 64.64 0.000 .9312978 .9896632
                UnternehmensAlter_CAP | -3.190284 1.17465 -2.72 0.007 -5.497112 -.883457
                Ansprueche_Patent~2013 | .2975802 1.610564 0.18 0.853 -2.865312 3.460473
                _cons | 291.9372 115.2268 2.53 0.012 65.64994 518.2244

                mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013_z UnternehmensAlter_CAP_z Ansprueche_Patent_CAP2013_z, || ID_Unt:,
                eAnmeldung_gesamt_RO~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                eAnmeldung_2013_z | .9604805 .0148204 64.81 0.000 .9314331 .9895279
                UnternehmensAlter_CA~z | -3.190284 1.171519 -2.72 0.006 -5.486419 -.8941494
                Ansprueche_Patent_CA~z | .2975802 1.60627 0.19 0.853 -2.850651 3.445812
                _cons | 105.4913 29.95113 3.52 0.000 46.78819 164.1945
                ...

                mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 eAnmeldung_2013_mean UnternehmensAlter_CAP UnternehmensAlter_CAP_mean Ansprueche_Patent_CAP2013 Ansprueche_Patent_CAP2013_mean, || ID_Unt:,
                eAnmeldung_gesamt_RO~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                eAnmeldung_2013 | .9604805 .014112 68.06 0.000 .9328214 .9881396
                eAnmeldung_2013_mean | -.1490229 .0158354 -9.41 0.000 -.1800596 -.1179861
                UnternehmensAlter_CAP | -3.190285 1.115527 -2.86 0.004 -5.376677 -1.003893
                UnternehmensAlter_CA~n | 3.149205 1.116338 2.82 0.005 .9612229 5.337187
                Ansprueche_Patent~2013 | .2975801 1.529499 0.19 0.846 -2.700183 3.295344
                Ansprueche_Patent_CA~n | -.0756286 2.41592 -0.03 0.975 -4.810745 4.659488
                _cons | 5.946127 5.455417 1.09 0.276 -4.746294 16.63855
                ...

                Let me add a further question: as in the literature, all coefficients are the same but the standard errors differ. Many researchers mention that significance does not differ between RE with controls and FE, but in my data it does. Do you have an explanation why?
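A minimal sketch for putting FE and CRE estimates side by side, on simulated data (names, seed and data-generating process are assumptions; it uses -mixed- as in the output above). It only reproduces the pattern described here, identical coefficients with standard errors that are computed differently, and does not by itself explain the size of the differences in real data.

```stata
* hypothetical simulated panel
clear
set seed 404
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = u + rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + u + rnormal()
xtset ID Year
bysort ID: egen double x1_mean = mean(x1)
bysort ID: egen double x2_mean = mean(x2)

quietly xtreg y x1 x2, fe
scalar b_fe = _b[x1]
scalar se_fe = _se[x1]
quietly mixed y x1 x2 x1_mean x2_mean || ID:
scalar b_cre = _b[x1]
scalar se_cre = _se[x1]

* coefficients should agree; standard errors (and hence p-values) need not
display "coef:  FE " %9.6f b_fe "   CRE " %9.6f b_cre
display "s.e.:  FE " %9.6f se_fe "   CRE " %9.6f se_cre
```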

                Best regards,
                Marcus



                • #9
                  Marcus:
                  your message is difficult to read.
                  Please post what you typed and what Stata gave you back using CODE delimiters (see the FAQ on that topic). Thanks.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Hello Carlo,

                    I checked my regressions: the centered variables produce the same results as variable + mean variable.
                    The FE linear regression produces the "same" results; the deviations are very small.

                    FE
                    . xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP20
                    > 13 Selbst_Vorw2013 AlterVorw2013 Vorw_Patent2013 AlterRueck2013 Groesse_IMPADOC2013 FuE_Umsat
                    > z TsQ Risiko_NEU ROA Bilanzsumme_ROT , i(ID_Unt) fe vce(robust)

                    Fixed-effects (within) regression Number of obs = 709
                    Group variable: ID_Unt Number of groups = 93

                    R-sq: Obs per group:
                    within = 0.8914 min = 1
                    between = 0.9844 avg = 7.6
                    overall = 0.9521 max = 9

                    F(12,92) = .
                    corr(u_i, Xb) = -0.7010 Prob > F = .

                    (Std. Err. adjusted for 93 clusters in ID_Unt)
                    -------------------------------------------------------------------------------------------
                    | Robust
                    eAnmeldung_gesamt_ROT_lag | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                    --------------------------+----------------------------------------------------------------
                    eAnmeldung_2013 | .9559886 .0445846 21.44 0.000 .8674397 1.044538
                    UnternehmensAlter_CAP | -.3083685 .5470274 -0.56 0.574 -1.394812 .7780751
                    Ansprueche_Patent_CAP2013 | -.1538362 .7700064 -0.20 0.842 -1.683136 1.375463
                    Selbst_Vorw2013 | -9.698364 15.88519 -0.61 0.543 -41.24771 21.85099
                    AlterVorw2013 | 4.412097 2.950669 1.50 0.138 -1.448187 10.27238
                    Vorw_Patent2013 | -1.300005 2.939579 -0.44 0.659 -7.138263 4.538253
                    AlterRueck2013 | -1.253429 .5782712 -2.17 0.033 -2.401926 -.1049329
                    Groesse_IMPADOC2013 | 20.41099 10.26867 1.99 0.050 .0165373 40.80545
                    FuE_Umsatz | -1286.19 888.7834 -1.45 0.151 -3051.39 479.011
                    TsQ | 4.041007 8.375762 0.48 0.631 -12.59398 20.67599
                    Risiko_NEU | -60.14005 55.47918 -1.08 0.281 -170.3265 50.0464
                    ROA | 113.5713 45.21094 2.51 0.014 23.77845 203.3641
                    Bilanzsumme_ROT | -3.01e-11 2.63e-11 -1.14 0.255 -8.24e-11 2.21e-11
                    _cons | 64.43248 82.11338 0.78 0.435 -98.65177 227.5167

                    Quasi-FE (same results with CRE):
                    . mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013_z UnternehmensAlter_CAP_z Ansprueche_Patent_
                    > CAP2013_z Selbst_Vorw2013_z AlterVorw2013_z Vorw_Patent2013_z AlterRueck2013_z Groesse_IMPADO
                    > C2013_z FuE_Umsatz_z TsQ_z Risiko_NEU_z ROA_z Bilanzsumme_ROT_z , || ID_Unt:, vce(robust)

                    Performing EM optimization:

                    Performing gradient-based optimization:

                    Iteration 0: log pseudolikelihood = -4241.2987
                    Iteration 1: log pseudolikelihood = -4241.2987

                    Computing standard errors:

                    Mixed-effects regression Number of obs = 709
                    Group variable: ID_Unt Number of groups = 93

                    Obs per group:
                    min = 1
                    avg = 7.6
                    max = 9

                    Wald chi2(12) = .
                    Log pseudolikelihood = -4241.2987 Prob > chi2 = .

                    (Std. Err. adjusted for 93 clusters in ID_Unt)
                    ---------------------------------------------------------------------------------------------
                    | Robust
                    eAnmeldung_gesamt_ROT_lag | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                    ----------------------------+----------------------------------------------------------------
                    eAnmeldung_2013_z | .9559886 .0441734 21.64 0.000 .8694103 1.042567
                    UnternehmensAlter_CAP_z | -.3083681 .541982 -0.57 0.569 -1.370633 .753897
                    Ansprueche_Patent_CAP2013_z | -.1538362 .7629044 -0.20 0.840 -1.649101 1.341429
                    Selbst_Vorw2013_z | -9.698364 15.73867 -0.62 0.538 -40.54559 21.14887
                    AlterVorw2013_z | 4.412097 2.923454 1.51 0.131 -1.317767 10.14196
                    Vorw_Patent2013_z | -1.300005 2.912466 -0.45 0.655 -7.008335 4.408324
                    AlterRueck2013_z | -1.253429 .5729376 -2.19 0.029 -2.376366 -.1304923
                    Groesse_IMPADOC2013_z | 20.41099 10.17395 2.01 0.045 .4704108 40.35158
                    FuE_Umsatz_z | -1286.19 880.5858 -1.46 0.144 -3012.106 439.7269
                    TsQ_z | 4.041007 8.29851 0.49 0.626 -12.22377 20.30579
                    Risiko_NEU_z | -60.14005 54.96748 -1.09 0.274 -167.8743 47.59423
                    ROA_z | 113.5713 44.79395 2.54 0.011 25.77675 201.3658
                    Bilanzsumme_ROT_z | -3.01e-11 2.61e-11 -1.16 0.248 -8.12e-11 2.10e-11
                    _cons | 105.4736 30.1092 3.50 0.000 46.46068 164.4866
                    ---------------------------------------------------------------------------------------------

                    ------------------------------------------------------------------------------
                    | Robust
                    Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
                    -----------------------------+------------------------------------------------
                    ID_Unt: Identity |
                    var(_cons) | 82495.12 46210.28 27518.27 247306.4
                    -----------------------------+------------------------------------------------
                    var(Residual) | 4926.054 1837.542 2371.263 10233.37

                    MIXED is only a baseline for me. I am using MENBREG (random intercept), because my model is not linear; its results deviate "more" than those from MIXED, but I can overlook that.
                    To deal with heteroskedasticity I am using vce(robust) = vce(cluster ID_Unt).
                    To deal with heterogeneity I include a lagged dependent variable as a control.
                    From a logical point of view, my data have an n-th order autoregressive structure.
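The statement vce(robust) = vce(cluster ID_Unt) can be checked directly: for -xtreg, fe- Stata documents vce(robust) as equivalent to clustering on the panel variable. A sketch on simulated data, where ID stands in for ID_Unt and everything else is made up:

```stata
* hypothetical simulated panel
clear
set seed 505
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = u + rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 - x2 + u + rnormal()
xtset ID Year

quietly xtreg y x1 x2, fe vce(robust)
scalar se_r = _se[x1]
quietly xtreg y x1 x2, fe vce(cluster ID)
scalar se_c = _se[x1]
display "vce(robust): " se_r "   vce(cluster ID): " se_c
```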

                    Does anybody know how I can deal with an n-th order autoregressive structure?
                    Do you know a test for n-th order autoregression in multiple panels?

                    Durbin-Watson has several problems:
                    1. From Stata: "dwstat sample may not include multiple panels r(459);"
                    2. It is not possible to include a lagged dependent variable as a control.
                    3. dwstat can only test first order.
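One option not mentioned in the thread is the Arellano-Bond test for serial correlation, which handles multiple panels, a lagged dependent variable, and orders higher than one. It belongs to linear dynamic-panel GMM (-xtabond-), a different estimator from the models above, so this is a swapped-in technique rather than a drop-in test for -menbreg-. In the sketch (simulated data, made-up names), AR(1) in the first-differenced residuals is expected by construction; evidence of AR(2) or higher is what signals serial correlation in the level errors.

```stata
* hypothetical simulated dynamic panel: y depends on its own lag
clear
set seed 606
set obs 100
gen ID = _n
gen u = rnormal()
expand 8
bysort ID: gen Year = 2004 + _n
gen x1 = rnormal()
gen y = u + x1 + rnormal()
* replace runs row by row, so y[_n-1] is already the updated value
bysort ID (Year): replace y = 0.5*y[_n-1] + x1 + u + rnormal() if _n > 1
xtset ID Year

* Arellano-Bond GMM with one lag of y, then AR tests up to order 3
xtabond y x1, lags(1) vce(robust)
estat abond, artests(3)
```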

                    Best regards,
                    Marcus



                    • #11
                      Hello again,

                      I want to give you an update.
                      Looking for FE = CRE = Hybrid, I tested models (always with the same variables) with xtreg, poisson and nbreg. These are the results:
                      Comparison                                   | xtreg | poisson | nbreg
                      FE = centered RE                             | yes   | yes*    | yes*
                      Hybrid = CRE                                 | yes   | yes     | yes
                      centered RE = centered RE with mean controls | yes   | yes     | NO! Why?
                      BE = centered RE with mean controls          | yes   | yes     | NO! Why?
                      centered RE = Hybrid                         | yes   | yes     | NO! Why?
                      FE = Hybrid = CRE                            | yes   | yes**   | NO! Why?
                      * presumed, because Stata only has conditional FE
                      ** presumed, because Stata only has conditional FE (centered RE = Hybrid >>> Hybrid = FE)
                      I cannot post my exact results; they would take too much space.

                      Do you have an explanation? Why are there so many problems with nbreg?
                      Why do the linear and Poisson models fit, but nbreg does not?

                      Thank you for your help.

                      Best regards,
                      Marcus

