  • Variance of Fixed effect residuals

    Hi all, I am performing the following fixed effect regression on an unbalanced panel model:
    Code:
    quietly xtreg tasso_crescita_sales_prod L.log_sales L.dummy_2 L2.dummy_2 L3.dummy_2 mean_gr_rate_atc2 recalls_sales ageprodcat1 ageprodcat2 ageprodcat3 ageprodcat4 newmolfirm newmolmarket i.Year, fe vce(cluster idpr)
    I would like to compute the variance of all the residuals and store it in a variable.
    Code:
    e(sigma_e) // or e(sigma_u)
    do not seem to do what I want. Specifically, I constructed a variable containing the residuals from the estimation (variable av_it):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(idproduct Year av_it)
      8 2004          .
      8 2015          .
     22 2004          .
     22 2005          .
     22 2006          .
     22 2007          0
     22 2008          .
     22 2009          .
     22 2010          .
     22 2011          .
     41 2004          .
     41 2005          .
     41 2006          .
     41 2007    .303154
     41 2008 -.14690971
     41 2009 .006593704
     41 2010   .2429447
     41 2011  -.1774826
     41 2012  -.8590889
     41 2013  .12342644
     41 2014  .50736046
     41 2015          .
     44 2005          .
     44 2006          .
     44 2007          .
     44 2008  -.9983988
     44 2009  .12073898
     44 2010  -.5993109
     44 2011          .
     44 2012 -1.4308786
     44 2013    1.45887
     44 2014  1.4698343
     44 2015 -.02085495
     62 2004          .
     62 2005          .
     62 2006          .
     62 2007          .
     62 2008          .
     62 2009          .
     62 2010          .
     62 2011          .
     62 2012          .
     62 2013          .
     62 2014          .
     62 2015          0
     99 2004          .
     99 2005          .
     99 2006          .
     99 2007          .
     99 2008          .
     99 2009          .
     99 2010          .
    107 2004          .
    107 2005          .
    107 2006          .
    107 2007          .
    107 2008          .
    107 2009   2.132491
    107 2010          .
    107 2011          .
    107 2012          .
    107 2013          .
    107 2014          .
    107 2015  -2.132491
    108 2004          .
    108 2005          .
    108 2006          .
    108 2007          .
    108 2008          .
    108 2009          .
    108 2010          .
    108 2011          .
    108 2012          .
    108 2013          .
    108 2014          .
    108 2015          .
    114 2004          .
    114 2005          .
    114 2006          .
    114 2007  -.7792645
    114 2008  .26324368
    114 2009   .1756668
    114 2010   .4246168
    114 2011  -.0842619
    114 2012          .
    114 2013          .
    114 2014          .
    114 2015          .
    130 2004          .
    130 2005          .
    130 2006          .
    130 2007   .6940765
    130 2008  -.6289883
    130 2009   .6858149
    130 2010          .
    130 2011  -.4151192
    130 2012   -.335784
    130 2013          .
    130 2014          .
    130 2015          .
    end
    What I would like to do is to compute Var(av_it) and then average it over time.
    Last edited by Federico Nutarelli; 29 Oct 2019, 16:39.

  • #2
    You may use - scalar - to save both values, then include them wherever you wish. I fail to understand the reason for saving a variable that holds just one (repeated) value across the whole dataset.

    I don't have Stata at hand right now, so this is untested, but I gather - gen myvar = e(sigma_u) - would do the trick as well.
    Best regards,

    Marcos



    • #3
      Thank you, Marcos, for the reply.

      "I fail to understand the reason of saving a variable with just one (repeated) value in the whole dataset."
      Indeed, I would like to store a value for each one of the residuals. That's what I am struggling to do.



      • #4
        I do understand "what" you want to get. But I do not understand "why" you wish to create a variable with all observations having the same values. That is the reason I recommended to use - scalar - instead.

        That said, keep in mind that, if you need the variance you just have to square the SDs.

        Last but not least, should you wish to focus on the variance of the residuals, please keep in mind that they are some sort of "nuisance" under xtreg, whereas they thrive in hierarchical "mixed" models as well as structural equation models.

        Please take a look at the example below, where I present the way to get both variables, the use of scalars and the estimations under both xtreg and mixed.

        Hopefully that helps.


        Code:
         
        
        . webuse nlswork
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtset idcode
               panel variable:  idcode (unbalanced)
        
        . xtreg ln_wage union c.age
        
        Random-effects GLS regression                   Number of obs     =     19,229
        Group variable: idcode                          Number of groups  =      4,150
        
        R-sq:                                           Obs per group:
             within  = 0.0955                                         min =          1
             between = 0.0504                                         avg =        4.6
             overall = 0.0610                                         max =         12
        
                                                        Wald chi2(2)      =    1808.17
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               union |   .1305272   .0066639    19.59   0.000     .1174661    .1435883
                 age |   .0147793    .000396    37.32   0.000     .0140031    .0155556
               _cons |    1.22913   .0139745    87.96   0.000      1.20174    1.256519
        -------------+----------------------------------------------------------------
             sigma_u |  .38505558
             sigma_e |  .26213464
                 rho |  .68331727   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . gen sig_u = e(sigma_u)
        
        . gen sig_e = e(sigma_e)
        
        . gen sig2_u = sig_u^2
        
        . gen sig2_e = sig_e^2
        
        . list sig_u sig_e sig2_u sig2_e in 1
        
             +-------------------------------------------+
             |    sig_u      sig_e     sig2_u     sig2_e |
             |-------------------------------------------|
          1. | .3850556   .2621346   .1482678   .0687146 |
             +-------------------------------------------+
        
        
        
        . mixed ln_wage union c.age  || idcode:,
        
        Performing EM optimization:
        
        Performing gradient-based optimization:
        
        Iteration 0:   log likelihood = -6157.0931  
        Iteration 1:   log likelihood = -6157.0931  
        
        Computing standard errors:
        
        Mixed-effects ML regression                     Number of obs     =     19,229
        Group variable: idcode                          Number of groups  =      4,150
        
                                                        Obs per group:
                                                                      min =          1
                                                                      avg =        4.6
                                                                      max =         12
        
                                                        Wald chi2(2)      =    1806.54
        Log likelihood = -6157.0931                     Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               union |   .1307709   .0066669    19.61   0.000     .1177039    .1438379
                 age |   .0147737   .0003963    37.28   0.000      .013997    .0155504
               _cons |   1.229306   .0139682    88.01   0.000     1.201929    1.256683
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
          Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
        -----------------------------+------------------------------------------------
        idcode: Identity             |
                          var(_cons) |   .1471143   .0037929      .1398649    .1547393
        -----------------------------+------------------------------------------------
                       var(Residual) |   .0690614   .0007978      .0675153     .070643
        ------------------------------------------------------------------------------
        LR test vs. linear model: chibar2(01) = 11681.89      Prob >= chibar2 = 0.0000
        
        . mixed ln_wage union c.age  || idcode:, stddev
        
        Performing EM optimization:
        
        Performing gradient-based optimization:
        
        Iteration 0:   log likelihood = -6157.0931  
        Iteration 1:   log likelihood = -6157.0931  
        
        Computing standard errors:
        
        Mixed-effects ML regression                     Number of obs     =     19,229
        Group variable: idcode                          Number of groups  =      4,150
        
                                                        Obs per group:
                                                                      min =          1
                                                                      avg =        4.6
                                                                      max =         12
        
                                                        Wald chi2(2)      =    1806.54
        Log likelihood = -6157.0931                     Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               union |   .1307709   .0066669    19.61   0.000     .1177039    .1438379
                 age |   .0147737   .0003963    37.28   0.000      .013997    .0155504
               _cons |   1.229306   .0139682    88.01   0.000     1.201929    1.256683
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
          Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
        -----------------------------+------------------------------------------------
        idcode: Identity             |
                           sd(_cons) |   .3835548   .0049445      .3739852    .3933692
        -----------------------------+------------------------------------------------
                        sd(Residual) |   .2627954    .001518       .259837    .2657875
        ------------------------------------------------------------------------------
        LR test vs. linear model: chibar2(01) = 11681.89      Prob >= chibar2 = 0.0000
        
        . scalar var_u = sig2_u
        
        . scalar var_e = sig2_e
        
        . scalar list
             var_e =  .06871457
             var_u =  .14826779
        Best regards,

        Marcos



        • #5
          The example above fits a random-effects model. This message is just to add that the commands are the same for a fixed-effects model (as you wanted):

          Code:
          . xtreg ln_wage union c.age, fe
          
          Fixed-effects (within) regression               Number of obs     =     19,229
          Group variable: idcode                          Number of groups  =      4,150
          
          R-sq:                                           Obs per group:
               within  = 0.0963                                         min =          1
               between = 0.0433                                         avg =        4.6
               overall = 0.0562                                         max =         12
          
                                                          F(2,15077)        =     803.76
          corr(u_i, Xb)  = 0.0127                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 union |   .1055274   .0070879    14.89   0.000     .0916342    .1194205
                   age |   .0153507   .0004157    36.92   0.000     .0145358    .0161656
                 _cons |   1.248435   .0132474    94.24   0.000     1.222468    1.274401
          -------------+----------------------------------------------------------------
               sigma_u |  .42353003
               sigma_e |  .26213464
                   rho |  .72302816   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(4149, 15077) = 10.12                Prob > F = 0.0000
          
          . gen sig_u = e(sigma_u)
          
          . gen sig_e = e(sigma_e)
          
          . gen sig2_u = sig_u^2
          
          . gen sig2_e = sig_e^2
          
          . list sig_u sig_e sig2_u sig2_e in 1
          
               +-----------------------------------------+
               |  sig_u      sig_e     sig2_u     sig2_e |
               |-----------------------------------------|
            1. | .42353   .2621346   .1793777   .0687146 |
               +-----------------------------------------+
          
          . scalar var_u = sig2_u
          
          . scalar var_e = sig2_e
           
          . scalar list var_u var_e
               var_u =  .17937769
               var_e =  .06871457
          Best regards,

          Marcos
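As a possible shortcut, the same two variances can be taken straight from the stored results, without first creating four constant variables. This is only a sketch: it assumes the nlswork example data used above, and it reuses the scalar names var_u and var_e from that example.

```stata
* Sketch on the nlswork example data used in this thread
webuse nlswork, clear
xtset idcode
quietly xtreg ln_wage union c.age, fe

* e(sigma_u) and e(sigma_e) are standard deviations; square them for variances
scalar var_u = e(sigma_u)^2
scalar var_e = e(sigma_e)^2

scalar list var_u var_e
```

Scalars are held in double precision, so this route also avoids the rounding introduced by passing the values through float variables.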



          • #6
            That is a very detailed explanation. Thank you a lot.
            Well, the point is that I am trying to obtain something like this:

            Code:
            * r(mean) and r(N) are only available after -summarize-
            quietly summarize av_t
            gen var_avt=(av_t-r(mean))^2/(r(N)-1)
            That is, a residual-specific variance (specific because of the presence of "av_t"), which is why I am trying to store it in a variable. I think that
            Code:
             
             gen sig_e = e(sigma_e)
            gives the overall residual standard deviation. I guess it is the standard deviation of the mean residual or something like that. My doubts with respect to
            Code:
            gen var_avt=(av_t-r(mean))^2/(r(N)-1)
            are whether I need to do it by(id) or by(time variable), or whether it is fine to leave it as is.

            Thanks again!
            Last edited by Federico Nutarelli; 30 Oct 2019, 08:07.
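To make the by(id) versus pooled question concrete, here is a minimal sketch using six rows copied from the dataex listing in the first post (products 41 and 44). The names dev2_all, mean_i, n_i, dev2_i, var_i, sd_i and var_all are hypothetical and chosen only for illustration. It shows that the formula (x - mean)^2/(N - 1) yields one contribution per observation, and that summing those contributions gives the sample variance of whichever group the mean and N were computed over.

```stata
* A few rows copied from the dataex listing above (products 41 and 44)
clear
input float(idproduct Year av_it)
41 2007  .303154
41 2008 -.14690971
41 2009  .006593704
44 2008 -.9983988
44 2009  .12073898
44 2010 -.5993109
end

* Pooled version: one mean and one N for the whole sample
quietly summarize av_it
generate double dev2_all = (av_it - r(mean))^2 / (r(N) - 1)
scalar var_all = r(Var)

* Group version: mean and N computed within each idproduct
bysort idproduct: egen double mean_i = mean(av_it)
bysort idproduct: egen n_i = count(av_it)
generate double dev2_i = (av_it - mean_i)^2 / (n_i - 1)

* Summing the contributions within a group gives that group's sample variance
bysort idproduct: egen double var_i = total(dev2_i)

* Cross-check against egen's built-in sd() function
bysort idproduct: egen double sd_i = sd(av_it)
list idproduct Year av_it dev2_i var_i sd_i, sepby(idproduct)
```

Replacing idproduct with Year in the bysort prefixes would give the per-year version instead. Groups with a single non-missing residual have n_i - 1 = 0, so their contribution is missing.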



            • #7
              Better, to quote "Econometric Analysis of Cross Section and Panel Data" (https://jrvargas.files.wordpress.com › 2011/01 › wooldridge_j-_2002_eco..., pp. 271-272): what I would like to obtain are the (u_hat_it)^2 and not the (sigma_u).
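One way to get a squared residual for every observation is to predict the idiosyncratic residuals after the fixed-effects fit and square them. The sketch below is only illustrative: it assumes the nlswork example data from earlier in the thread rather than the original model, and the variable names uhat and uhat2 are hypothetical.

```stata
* Sketch on the nlswork example data (an assumption; not the original model)
webuse nlswork, clear
xtset idcode
quietly xtreg ln_wage union c.age, fe

* Idiosyncratic residuals e_it, restricted to the estimation sample
predict double uhat if e(sample), e

* One squared residual per observation
generate double uhat2 = uhat^2

* Sum of squared residuals over the residual degrees of freedom
quietly summarize uhat2
display "sum(uhat2)/df_r = " r(sum)/e(df_r)
display "e(sigma_e)^2    = " e(sigma_e)^2
```

Without a cluster option the two displayed numbers should agree, which ties the per-observation values back to the single sigma_e reported by xtreg. With vce(cluster ...), as in the original model, e(df_r) is based on the number of clusters, so that comparison would no longer hold as written.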
