  • Panel Data Regression and ANOVA

    Hi,

    I am using panel data for US manufacturing companies for the period 2000-19. I am focusing on the impact of international diversification (Geographic Segment Diversification, i.e. GSD) on performance (EBIT_ROA). I have divided my data into 3 eras, i.e. pre-crisis period 2001-06 (era=1), crisis period 2007-09 (era=2) and post-crisis period (era=3). Based on my analysis, the marginal impact of GSD on performance does not differ significantly across the 3 eras.

    I would now like to check whether the level of GSD itself varies across the 3 eras. I did this analysis using xtreg as shown below. My interpretation is that GSD varies significantly across the 3 eras. I would like to know whether there is any way to do this analysis for panel data using ANOVA. Thank you.

    Code:
    . xtreg Ln_GSD l1.era2 l1.era3 if  CoAge>=0 & NATION=="UNITED STATES" & NATIONCODE==840 & FSTS>=1
    > 0 & GENERALINDUSTRYCLASSIFICATION ==1 & Year_<2020 & Year_<YearInactive & Discr_GS_Rev!=1, fe c
    > luster(n_CUSIP)
    
    Fixed-effects (within) regression               Number of obs     =     26,796
    Group variable: n_CUSIP                         Number of groups  =      3,563
    
    R-sq:                                           Obs per group:
         within  = 0.0203                                         min =          1
         between = 0.0000                                         avg =        7.5
         overall = 0.0022                                         max =         19
    
                                                    F(2,3562)         =      54.03
    corr(u_i, Xb)  = -0.0417                        Prob > F          =     0.0000
    
                                (Std. Err. adjusted for 3,563 clusters in n_CUSIP)
    ------------------------------------------------------------------------------
                 |               Robust
          Ln_GSD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            era2 |
             L1. |   .0624621   .0080676     7.74   0.000     .0466444    .0782798
                 |
            era3 |
             L1. |   .0999507    .009785    10.21   0.000     .0807659    .1191355
                 |
           _cons |  -.4679418    .004862   -96.24   0.000    -.4774745   -.4584091
    -------------+----------------------------------------------------------------
         sigma_u |  .60796916
         sigma_e |  .26823125
             rho |  .83706486   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . test l1.era2 l1.era3
    
     ( 1)  L.era2 = 0
     ( 2)  L.era3 = 0
    
           F(  2,  3562) =   54.03
                Prob > F =    0.0000
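
    The same kind of joint test can also be set up with factor-variable notation instead of hand-made era dummies. What follows is only a minimal sketch on the public nlswork data, not on the data above: the three-way split of the survey years into `era` is hypothetical and chosen purely for illustration, and `ln_wage` stands in for the outcome.

    ```stata
    * Toy data only; the era cut-offs below are an assumption for illustration
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear
    xtset idcode year

    * era = 1 for years before 75, 2 for 75-79, 3 for 80 onwards
    generate byte era = 1 + (year >= 75) + (year >= 80) if !missing(year)

    * Fixed-effects regression with era as a categorical predictor (era 1 is the base)
    xtreg ln_wage i.era, fe vce(cluster idcode)

    * Joint test that the era 2 and era 3 coefficients are both zero
    testparm i.era
    ```

    With `i.era`, Stata creates the indicators itself and `testparm` picks up every non-base level, so the test does not have to be rewritten if the number of eras changes.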

  • #2
    Deepika:
    I'm not sure I'm following you correctly.
    In the following toy example I try to replicate your research path as far as I understand it:
    Code:
    use "https://www.stata-press.com/data/r16/nlswork.dta"
    . g lag1age=L1.age
    (17,643 missing values generated)
    
    . g lag2age=L2.age
    (15,300 missing values generated)
    
    . g lag3age=L3.age
    (17,566 missing values generated)
    
    . xtreg ln_wage lag1age lag2age lag3age, fe vce(cluster idcode )
    
    Fixed-effects (within) regression               Number of obs     =      1,950
    Group variable: idcode                          Number of groups  =      1,020
    
    R-sq:                                           Obs per group:
         within  = 0.0274                                         min =          1
         between = 0.0305                                         avg =        1.9
         overall = 0.0273                                         max =          3
    
                                                    F(3,1019)         =       6.84
    corr(u_i, Xb)  = 0.0091                         Prob > F          =     0.0001
    
                                 (Std. Err. adjusted for 1,020 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         lag1age |   .0243573   .0322829     0.75   0.451    -.0389913    .0877059
         lag2age |    .026116    .027086     0.96   0.335    -.0270347    .0792667
         lag3age |  -.0261597   .0304249    -0.86   0.390    -.0858622    .0335429
           _cons |    1.11441   .1442403     7.73   0.000     .8313679    1.397452
    -------------+----------------------------------------------------------------
         sigma_u |  .36626556
         sigma_e |  .14609876
             rho |  .86272996   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . test lag1age= lag2age= lag3age
    
     ( 1)  lag1age - lag2age = 0
     ( 2)  lag1age - lag3age = 0
    
           F(  2,  1019) =    0.66
                Prob > F =    0.5168
    
    .
    That said, ANOVA is basically -regress-, and if you have panel data, as it seems from your description, I do not think you can get any further by replacing -xtreg, fe- with -anova- here.
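    The "ANOVA is basically -regress-" point can be checked on cross-sectional data. A minimal sketch, assuming the auto dataset that ships with Stata (it is not panel data and is used here only for illustration):

    ```stata
    * Cross-sectional toy data shipped with Stata
    sysuse auto, clear

    * One-way ANOVA of price on repair record
    anova price rep78

    * The same model as a regression on indicators for each level of rep78
    regress price i.rep78
    ```

    Both commands fit the same linear model on the same observations, so the overall model F statistic and its p-value should agree across the two outputs; -regress- additionally reports the coefficient for each level.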
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you, Carlo. I will continue working with -xtreg, fe-. The reason for my confusion was that in some places I find ANOVA being used in place of regress (e.g. https://stats.idre.ucla.edu/stata/we...al-predictors/), and I was wondering whether it can be used in place of xtreg as well. Thanks for your clarification.


      • #4
        Deepika:
        when I attended statistics classes at the university before computers were widespread (that is, during the 1980s), a simple one-way anova model could be calculated by hand. The ratio of between-group variance to within-group variance, used to assess whether the means of the >2 groups under investigation differed in a statistically significant way, was also fascinating. In my opinion, these two characteristics made ANOVA so wonderful in those days.
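        For reference, the hand calculation mentioned above is the usual one-way ANOVA F ratio. The notation is introduced here for illustration and is not from the thread: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, group means $\bar{y}_j$ and grand mean $\bar{y}$.

        ```latex
        \[
        F = \frac{\mathrm{SS}_{\text{between}}/(k-1)}{\mathrm{SS}_{\text{within}}/(N-k)},
        \qquad
        \mathrm{SS}_{\text{between}} = \sum_{j=1}^{k} n_j\,(\bar{y}_j - \bar{y})^2,
        \qquad
        \mathrm{SS}_{\text{within}} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2
        \]
        ```

        Under the usual assumptions (independent, normally distributed errors with equal variance), this ratio is compared with an $F(k-1,\,N-k)$ distribution.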
        Unfortunately, more demanding anova models (say, with two predictors) were pretty much unfeasible to calculate by hand and, even more importantly, the post-estimation procedures were not considered at all.
        In addition, if I look at the number of emails about -anova- on this list before it became a forum, it is about 10 times the number that can be found after 2014 on Stataforum.
        During the first half of the 2000s I was also more interested in -anova- than in -regress-, basically because OLS was covered very poorly when I was at the university. Having become more familiar with quants, thanks also to Stata, I've convinced myself that there's basically nothing that -anova- can do that -regress- can't do (much) better.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Thank you, Carlo. I am new to Stata and I am re-acquainting myself with statistics (I have pivoted to academia after a long career as a banker). I enjoy using Stata and I am very impressed with the level of help and support on this forum.
