
  • #46
    Maria:
1) under -xtreg-, you use robust/clustered standard errors when you suspect heteroskedasticity and/or autocorrelation (although the latter usually does not bite that hard with large N, small T panel datasets, for which -xtreg- is appropriate);
2) -help xtoverid- (now that you've installed -xtoverid- you can use -help- instead of -search- to select its helpfile) will point you to the necessary reference.
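For instance, a minimal sketch of that workflow, using the -nlswork- toy dataset that also appears later in this thread (the single covariate is illustrative only):
Code:
* install the user-written command once, then run the robust Hausman-type test
ssc install xtoverid
use http://www.stata-press.com/data/r15/nlswork.dta, clear
xtset idcode year
xtreg ln_wage tenure, re vce(cluster idcode)
xtoverid
A rejection of the null is usually read as evidence in favour of -fe- over -re-.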
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #47
Thank you very much.
When I try to check for heteroskedasticity with
      Code:
       rvfplot
      or
      Code:
       hettest
in my regression (xtreg), it will not work. It only works with -regress-... which is not the same, right? So far I could not find any information on that here.

      Comment


      • #48
        Maria:
        you can visually inspect your residual distribution (ei) to check for heteroskedasticity.
As you can see from the following toy example, with many observations even a minimal departure from normality can formally reject the null, whereas the visual inspection (with a superimposed normal plot) looks more reassuring:
        Code:
        . use http://www.stata-press.com/data/r15/nlswork.dta
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtreg ln_wage i.race tenure, vce(robust)
        
        Random-effects GLS regression                   Number of obs     =     28,101
        Group variable: idcode                          Number of groups  =      4,699
        
        R-sq:                                           Obs per group:
             within  = 0.0972                                         min =          1
             between = 0.2079                                         avg =        6.0
             overall = 0.1569                                         max =         15
        
                                                        Wald chi2(3)      =    1797.00
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
                                     (Std. Err. adjusted for 4,699 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                race |
              black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
              other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                     |
              tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
               _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
        -------------+----------------------------------------------------------------
             sigma_u |  .33623102
             sigma_e |  .30357621
                 rho |  .55090591   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . predict e_res, e
        (433 missing values generated)
        
        . sfrancia e_res
        
                          Shapiro-Francia W' test for normal data
        
            Variable |       Obs       W'          V'        z       Prob>z
        -------------+-----------------------------------------------------
               e_res |    28,101    0.92713   1077.971    19.615    0.00001
        
        Note: The normal approximation to the sampling distribution of W'
              is valid for 10<=n<=5000 under the log transformation.
        
        . histogram e_res, normal
        (bin=44, start=-1.8595626, width=.11358524)
        Last edited by Carlo Lazzaro; 29 Dec 2017, 11:11.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #49
          Dear Carlo,
I have a question about the regression code you suggested to me.
          My old code reads:
          Code:
           xtreg RDlog POST_FINE_DUMMY LENIENCY_DUMMY post_len_inter fine_category fine_cat_inter i.year , fe vce(robust)
with the variables POST_FINE_DUMMY, LENIENCY_DUMMY, fine_category and the interactions between POST_FINE_DUMMY and LENIENCY_DUMMY, as well as POST_FINE_DUMMY and fine_category.

I used
Code:
gen index = 0
replace index = 1 if ...
to recode fine_category, so I have it as a factor variable.
The new code reads:

          Code:
 xtreg RDlog POST_FINE_DUMMY##LENIENCY_DUMMY POST_FINE_DUMMY##index i.year, fe vce(robust)
          or
          Code:
xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.POST_FINE_DUMMY##i.index, fe vce(robust)
However, with the new code, my POST_FINE_DUMMY turns insignificant...
For the old code, I created the interaction terms by
Code:
gen post_len_inter = POST_FINE_DUMMY * LENIENCY_DUMMY
The fine_category variable merely got recoded and turned into the index variable. Why does the significance of the variables change?

          Comment


          • #50
            Maria:
a change in statistical significance (for what it's worth) may depend on the absence of conditional main effects in your previous code.
            Compare in detail the old vs the new regression code and see where they differ.
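For instance, a sketch with your own variable names (the point being that -##- adds the conditional main effects automatically):
Code:
* hand-made product term: the main effects enter only if you list them yourself
gen post_len_inter = POST_FINE_DUMMY * LENIENCY_DUMMY
xtreg RDlog POST_FINE_DUMMY LENIENCY_DUMMY post_len_inter i.year, fe vce(robust)

* factor-variable syntax: main effects and interaction in one go
xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.year, fe vce(robust)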
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #51
              Dear Carlo,
That's strange. The two regressions should be the same.
One includes only the interaction terms that I created by multiplying the two variables (is that correct to do?),
and the other one includes them the way you suggested, using
Code:
##
Also, looking at the output (exported with outreg2), it looks very confusing (just looking at the interaction terms):
Code:
                                         (1)         (2)
VARIABLES                              RDlog       RDlog
1.POST_FINE_DUMMY                    0.105**     0.105**
                                    (0.0482)    (0.0482)
1o.LENIENCY_DUMMY                          -           -
0b.POST_FINE_DUMMY#0b.LENIENCY_DUMMY       0           0
                                         (0)         (0)
0b.POST_FINE_DUMMY#1o.LENIENCY_DUMMY       0           0
                                         (0)         (0)
1o.POST_FINE_DUMMY#0b.LENIENCY_DUMMY       0           0
                                         (0)         (0)
1.POST_FINE_DUMMY#1.LENIENCY_DUMMY   -0.0354     -0.0354
                                    (0.0905)    (0.0905)
1o.index                                   -           -
2o.index                                   -           -
0b.POST_FINE_DUMMY#0b.index                0           0
                                         (0)         (0)
0b.POST_FINE_DUMMY#1o.index                0           0
                                         (0)         (0)
0b.POST_FINE_DUMMY#2o.index                0           0
                                         (0)         (0)
1o.POST_FINE_DUMMY#0b.index                0           0
                                         (0)         (0)
1.POST_FINE_DUMMY#1.index             0.0667      0.0667
                                    (0.0846)    (0.0846)
1.POST_FINE_DUMMY#2.index             0.0432      0.0432
                                     (0.113)     (0.113)
1997.year                              0.142       0.142
                                    (0.0955)    (0.0955)
1998.year                           0.242***    0.242***
                                    (0.0907)    (0.0907)
1999.year                            0.225**     0.225**
                                    (0.0977)    (0.0977)
2000.year                           0.399***    0.399***
                                    (0.0990)    (0.0990)
2001.year                           0.343***    0.343***
                                     (0.107)     (0.107)
2002.year                           0.320***    0.320***
                                     (0.111)     (0.111)
2003.year                            0.261**     0.261**
                                     (0.111)     (0.111)
2004.year                            0.256**     0.256**
                                     (0.113)     (0.113)
2005.year                            0.252**     0.252**
                                     (0.117)     (0.117)
2006.year                            0.271**     0.271**
                                     (0.122)     (0.122)
2007.year                             0.240*      0.240*
                                     (0.129)     (0.129)
2008.year                              0.178       0.178
                                     (0.134)     (0.134)
2009.year                              0.166       0.166
                                     (0.137)     (0.137)
2010.year                              0.233       0.233
                                     (0.142)     (0.142)
2011.year                             0.272*      0.272*
                                     (0.149)     (0.149)
2012.year                            0.394**     0.394**
                                     (0.159)     (0.159)
2013.year                              0.136       0.136
                                     (0.173)     (0.173)
2014.year                              0.163       0.163
                                     (0.182)     (0.182)
2015.year                            0.517**     0.517**
                                     (0.211)     (0.211)
Constant                            18.81***    18.81***
                                     (0.117)     (0.117)

Observations                           1,446       1,446
R-squared                              0.096       0.096
Number of ID                             145         145
Time FE                                  YES
Year FE                                  YES

Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

I don't think I have to include the year dummies when referring to them at the bottom, right?
Is there a better way to export output tables? What is the preferred and best way?
Thank you

              Comment


                • #53
                  Dear Carlo,
That's strange, the two regressions are the same...
For the first, I just created the interaction terms by multiplying the two variables (is that correct?).
For the second, I used your suggestion with
Code:
##
In the first there is a categorical variable measuring the level of the fine, and in the second the index I created.

Also,
when I run the regression as an RE instead of an FE model, two coefficients (that were dropped in the FE model) become highly significant. However, their interaction terms do not. Here, the interaction terms are of interest, right?
                  Code:
 xtreg RDlog POST_FINE_DUMMY##LENIENCY_DUMMY POST_FINE_DUMMY##index i.year, re vce(robust)
                  
                  Random-effects GLS regression                   Number of obs     =      1,446
                  Group variable: ID                              Number of groups  =        145
                  
                  R-sq:                                           Obs per group:
                       within  = 0.0954                                         min =          7
                       between = 0.2532                                         avg =       10.0
                       overall = 0.2934                                         max =         19
                  
                                                                  Wald chi2(26)     =     284.05
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                  
                                                                       (Std. Err. adjusted for 145 clusters in ID)
                  ------------------------------------------------------------------------------------------------
                                                 |               Robust
                                           RDlog |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------------------------+----------------------------------------------------------------
                               1.POST_FINE_DUMMY |   .0937966    .047636     1.97   0.049     .0004317    .1871615
                                1.LENIENCY_DUMMY |  -1.188936   .5741838    -2.07   0.038    -2.314315   -.0635563
                                                 |
                  POST_FINE_DUMMY#LENIENCY_DUMMY |
                                            1 1  |   -.032627   .0906532    -0.36   0.719     -.210304      .14505
                                                 |
                                           index |
                                              1  |  -1.414091   .3144783    -4.50   0.000    -2.030457   -.7977244
                                              2  |  -3.318151   .4542453    -7.30   0.000    -4.208456   -2.427847
                                                 |
                           POST_FINE_DUMMY#index |
                                            1 1  |   .0667658   .0847663     0.79   0.431    -.0993731    .2329047
                                            1 2  |    .043198   .1129048     0.38   0.702    -.1780913    .2644874
                                                 |
                                            year |
                                           1997  |   .1428365   .0964652     1.48   0.139    -.0462318    .3319048
                                           1998  |   .2415162   .0897411     2.69   0.007     .0656268    .4174055
                                           1999  |   .2279303   .0960447     2.37   0.018     .0396862    .4161744
                                           2000  |   .4015464   .0971418     4.13   0.000      .211152    .5919409
                                           2001  |   .3466999   .1053512     3.29   0.001     .1402154    .5531845
                                           2002  |   .3250426   .1089631     2.98   0.003     .1114788    .5386064
                                           2003  |   .2719324   .1089493     2.50   0.013     .0583956    .4854692
                                           2004  |   .2688308   .1111449     2.42   0.016     .0509907    .4866708
                                           2005  |     .26701   .1150357     2.32   0.020     .0415442    .4924758
                                           2006  |   .2871169   .1197246     2.40   0.016      .052461    .5217727
                                           2007  |   .2585619   .1270097     2.04   0.042     .0096274    .5074964
                                           2008  |   .1993598   .1317654     1.51   0.130    -.0588956    .4576152
                                           2009  |    .189391    .134246     1.41   0.158    -.0737264    .4525083
                                           2010  |   .2571608   .1396395     1.84   0.066    -.0165276    .5308492
                                           2011  |   .2989827   .1465978     2.04   0.041     .0116562    .5863091
                                           2012  |   .4223348   .1564944     2.70   0.007     .1156114    .7290581
                                           2013  |   .1673456   .1705159     0.98   0.326    -.1668593    .5015505
                                           2014  |   .1950685   .1806357     1.08   0.280     -.158971     .549108
                                           2015  |   .5490416   .2103324     2.61   0.009     .1367977    .9612854
                                                 |
                                           _cons |   19.86893   .2233757    88.95   0.000     19.43112    20.30674
                  -------------------------------+----------------------------------------------------------------
                                         sigma_u |  1.6148069
                                         sigma_e |  .33627256
                                             rho |  .95843715   (fraction of variance due to u_i)
                  Last edited by Maria Kohnen; 30 Dec 2017, 04:11.

                  Comment


                  • #54
I just want to end up with a simple regression... haha... this is becoming very frustrating.
All I want to test is the influence of a fine on the R&D expenses of a firm.

I have RDlog as the DV.
I have POST_FINE_DUMMY as a predictor, comparing the period before and after the fine.
BUT, since the data also contain firms that were granted full leniency and ultimately paid no fine at all, I want to include a LENIENCY_DUMMY with 0 = leniency and 1 = no leniency, and create an interaction between POST_FINE_DUMMY and LENIENCY_DUMMY to control for that and see if the fine is significant.
Further, I want to test if the level of the fine has an impact. I created the categorical variable small, medium and large fine and created the index variable accounting for it with 0, 1, 2.
Now I create an interaction term between the index variable and POST_FINE_DUMMY to check if it is relevant.
Does that make sense? If yes, what regression should I use?

                    in the first model i check:

                    Code:
 xtreg RDlog POST_FINE_DUMMY i.year, fe vce(robust)
just to see if there is an effect. Of course I know the full-leniency firms are included... so I check my second model, including leniency (and also the fine level, though I could make a third model from this):

                    Code:
                     xtreg RDlog POST_FINE_DUMMY LENIENCY_DUMMY post_len_inter fine_category fine_cat_inter i.year , fe vce(robust)
                    or
                    Code:
 xtreg RDlog POST_FINE_DUMMY##LENIENCY_DUMMY POST_FINE_DUMMY##index i.year, fe vce(robust)
Only POST_FINE_DUMMY on its own is significant.

If I run it as an RE model, the index variables are also significant. However, the interaction terms are never significant...

I don't know what is correct to use... maybe you have a suggestion?

                    thank you,
                    best

                    Comment


                    • #55
                      Maria:
1) I would implement the specification (fe or re) supported by the -xtoverid- output;
2) if the interaction is not significant, you can decide to remove it (see the sketch below);
3) making a methodological decision (and justifying it) is often not that easy. However, you're seemingly circling around that issue: hence, break the loop and implement your model!
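On point 2), a joint Wald test of the interaction terms can support that decision; a sketch with your variable names:
Code:
* joint test of all interaction terms after the -fe- estimation
xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.POST_FINE_DUMMY##i.index i.year, fe vce(robust)
testparm i.POST_FINE_DUMMY#i.LENIENCY_DUMMY i.POST_FINE_DUMMY#i.index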
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      Comment


                      • #56

Thank you, Carlo,
your feedback is much appreciated.

                        Comment


                        • #57
                          Dear Carlo,

                          I have a question about interpretation of my findings.

                          I ran the regression:

                          Code:
                           xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.POST_FINE_DUMMY##i.index i.year, fe vce(robust)
                          and received the output:

                          Code:
. xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.POST_FINE_DUMMY##i.index i.year, fe vce(robust)
                          note: 1.LENIENCY_DUMMY omitted because of collinearity
                          note: 2.index omitted because of collinearity
                          note: 3.index omitted because of collinearity
                          
                          Fixed-effects (within) regression               Number of obs     =      1,446
                          Group variable: ID                              Number of groups  =        145
                          
                          R-sq:                                           Obs per group:
                               within  = 0.0956                                         min =          7
                               between = 0.0973                                         avg =       10.0
                               overall = 0.0007                                         max =         19
                          
                                                                          F(23,144)         =       7.51
                          corr(u_i, Xb)  = -0.0796                        Prob > F          =     0.0000
                          
                                                             (Std. Err. adjusted for 145 clusters in ID)
                          ------------------------------------------------------------------------------
                                       |               Robust
                                 RDlog |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                          1.POST_FIN~Y |   .1048217   .0482441     2.17   0.031     .0094637    .2001798
                          1.LENIENCY~Y |          0  (omitted)
                                       |
                          POST_FINE_~Y#|
                          LENIENCY_D~Y |
                                  1 1  |  -.0353759     .09053    -0.39   0.697    -.2143153    .1435635
                                       |
                                 index |
                                    2  |          0  (omitted)
                                    3  |          0  (omitted)
                                       |
                          POST_FINE_~Y#|
                                 index |
                                  1 2  |   .0667084   .0845967     0.79   0.432    -.1005034    .2339202
                                  1 3  |   .0432342   .1125363     0.38   0.701    -.1792023    .2656706
                                       |
                                  year |
                                 1997  |   .1420507   .0955104     1.49   0.139    -.0467328    .3308342
                                 1998  |   .2417893   .0907284     2.66   0.009     .0624577    .4211208
                                 1999  |   .2254666   .0977008     2.31   0.022     .0323537    .4185795
                                 2000  |   .3989208   .0990069     4.03   0.000     .2032262    .5946153
                                 2001  |   .3433501   .1072809     3.20   0.002     .1313014    .5553989
                                 2002  |   .3195207   .1108762     2.88   0.005     .1003656    .5386757
                                 2003  |    .260894   .1112029     2.35   0.020     .0410931     .480695
                                 2004  |   .2556291   .1134245     2.25   0.026      .031437    .4798211
                                 2005  |    .252128   .1173151     2.15   0.033     .0202459    .4840101
                                 2006  |   .2709381   .1220501     2.22   0.028     .0296969    .5121794
                                 2007  |   .2402154   .1294255     1.86   0.065    -.0156037    .4960346
                                 2008  |   .1780891   .1343336     1.33   0.187    -.0874314    .4436095
                                 2009  |    .166175   .1366889     1.22   0.226    -.1040009    .4363509
                                 2010  |   .2325389   .1420704     1.64   0.104    -.0482739    .5133518
                                 2011  |   .2720732   .1491341     1.82   0.070    -.0227016    .5668479
                                 2012  |   .3936022   .1586722     2.48   0.014     .0799746    .7072298
                                 2013  |   .1361632   .1726021     0.79   0.431    -.2049979    .4773242
                                 2014  |   .1633292   .1820071     0.90   0.371    -.1964214    .5230799
                                 2015  |   .5168145   .2113234     2.45   0.016     .0991178    .9345112
                                       |
                                 _cons |   18.81188   .1168634   160.97   0.000     18.58089    19.04287
                          -------------+----------------------------------------------------------------
                               sigma_u |  2.0608159
                               sigma_e |  .33627256
                                   rho |  .97406464   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
As my DV is a log variable (RDlog), and the predictor variable POST_FINE_DUMMY is a dummy comparing two periods of time (0 = 5 years pre-fine, 1 = 5 years post-fine), does the coefficient .1048217 mean that R&D expenses are around 10.5% higher in the post period?

And second, about the index variable I created: I have three categories, small, medium and large fine (1, 2 and 3). Should I rather create two dummies and include them in the regression (taking small as the baseline, and including medium and large), or is it okay to use this index variable as a categorical variable with three levels?
When interpreting this variable in the interaction with POST_FINE_DUMMY, I guess category 1 is taken as the baseline? And the regression compares categories 1 and 2 and categories 1 and 3 and sees if there is a significant difference with respect to the periods?

Or, something else I noticed: when I include all three categories (small, medium, large) as interactions with POST_FINE_DUMMY, the whole model changes...
I mean, usually you leave out one level of the dummies as the baseline; that's why I thought the small category would be left out. But isn't the 0 for each individual dummy already the baseline? So, category small: 0 = no, 1 = yes? Whereas, if taken as a three-category variable, small is the baseline. The results, however, are very, very different... what's the right way?

                          thank you!
                          Last edited by Maria Kohnen; 31 Dec 2017, 01:11.

                          Comment


                          • #58
                            Maria:
your interpretation of the log-linear model is correct (+10.5%).
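For the exact figure you can transform the coefficient directly: with a dummy predictor, the exact percent change is 100*(exp(b) - 1). A quick check on your coefficient:
Code:
* exact effect implied by the coefficient: about 11.05%, close to the 10.5% approximation
display 100*(exp(.1048217) - 1)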
As far as the query about interactions is concerned, you should not include all the categories, to avoid the so-called dummy trap.
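A sketch of what that means with your three-level index: Stata's factor-variable machinery omits the base level automatically, so you never enter all the category dummies at once:
Code:
* i.index expands to indicators for the non-base levels only; the base (small) is
* omitted (in your -fe- model the main effects are then also absorbed, as your
* output's "omitted because of collinearity" notes show)
xtreg RDlog i.POST_FINE_DUMMY##i.index i.year, fe vce(robust)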
                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            Comment


                            • #59
                              Dear Carlo,

I hope you had a nice New Year's Eve. Thank you very much for the answer.
Unfortunately, the correct log-linear interpretation means that my model is absolute nonsense. In no way did R&D expenses rise by 10% for any of the companies over the two periods I compared... Also, R-squared and adjusted R-squared are around 0.08... meaning it's a totally useless model, I guess... especially when you go fixed effects... those should usually become quite high, right?

Something with my regressions seems off...
First of all, I rechecked my sample data... the data on R&D, taken from Compustat and Datastream, are correct... but there is no chance that R&D expenses went up by 10.5% between the two periods, 3 years prior to the fine and 3 years after the fine... it seems that my design is off...
Is it correct to do what I do: create a dummy variable which is 0 for the period before the fine and 1 for the period after, and then run
Code:
 xtreg RDlog POST_FINE_DUMMY, fe vce(robust)
to check if there is a significant difference in R&D spending between these two periods?

Thank you very much,
                              best
                              Last edited by Maria Kohnen; 01 Jan 2018, 10:49.

                              Comment


                              • #60
                                Maria:
thanks. I hope the same was true for you and your dear ones.
I would interpret the result of your regression with a further qualification: it holds when adjusted for the remaining predictors.
Please note that, in multiple regression, the statistical significance (for what it's worth) of a given predictor should be considered in the light of (i.e., adjusted for) the remaining ones.
If you use your last regression code to check whether a statistically significant difference exists in R&D spending between the periods included in -POST_FINE_DUMMY-, you're actually running a regression model which is totally different from the previous one.
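Side by side (both commands are taken from earlier in this thread):
Code:
* unadjusted difference between the two periods only
xtreg RDlog POST_FINE_DUMMY, fe vce(robust)
* difference adjusted for leniency, fine level and year effects
xtreg RDlog i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.POST_FINE_DUMMY##i.index i.year, fe vce(robust)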
Again, I would recommend that you take a look at the literature in your research field and see what others did in the past when presented with the same research topic.
                                Kind regards,
                                Carlo
                                (Stata 19.0)

                                Comment
