
  • Normality of dependent variable in panel data

    Hi,

    I want to perform a panel data regression analysis. I have a dataset of 40 countries for 15 years. I plotted my dependent variable and found that it is not normally distributed. I know that one way to achieve normality is to take the natural log of the dependent variable. However, does the variable actually need to be normally distributed? I could not convince myself that it is absolutely necessary for the dependent variable to be normally distributed in panel data. I would be thankful for any help and guidance on this!

  • #2
    Assuming you want to use OLS regression / -xtreg-, normality of the dependent variable is nice to have but not a must: the classical assumptions concern the error term, conditional on the regressors, not the marginal distribution of the dependent variable. The internet is full of discussion about this; just google it a bit (https://www.researchgate.net/post/Is...ly_distributed). That said, maybe OLS is not the best approach for your variable? Tell us what you are measuring, or show some plots, so we get an impression of your data. Otherwise, if taking the log gives you a nicely normal variable, I do not see a reason not to use it (given that I know nothing more about your variable at the moment).
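A minimal sketch of the check suggested above: compare the skewness of a right-skewed variable before and after taking logs. The data are simulated, and the lognormal draw and the names `unemp` and `ln_unemp` are hypothetical stand-ins for an unemployment rate; with real data only the last lines would be needed. Keep in mind that the log also changes the interpretation: coefficients then refer, approximately, to proportional rather than absolute changes.

```stata
* Hypothetical, simulated right-skewed "rate" (lognormal draws)
clear
set seed 101
set obs 600
generate unemp = exp(rnormal(1.8, 0.5))
generate ln_unemp = ln(unemp)

* Compare the skewness of the raw and the logged variable
summarize unemp, detail
scalar sk_raw = r(skewness)
summarize ln_unemp, detail
scalar sk_log = r(skewness)
display "skewness, raw: " sk_raw "   skewness, log: " sk_log
```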
    Best wishes

    (Stata 16.1 MP)



    • #3
      Replying to Felix Bittmann:
      Thanks for your response! I am measuring unemployment rates of different countries. I am using a fixed-effects regression model and account for heteroskedasticity with heteroskedasticity-robust standard errors. I am not using pooled OLS.



      • #4
        Himani:
        if you actually have panel data and use -regress-, the -robust- option does not take the within-panel correlation of your observations into account: it treats them as independent, ignoring the panel structure of your data.
        You should switch to -vce(cluster panelid)- instead.
        That said, as recommended by the FAQ, things would be easier if you post what you typed and what Stata gave you back. Thanks.
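A minimal sketch on simulated data (all names, the seed and the data-generating process are hypothetical) comparing heteroskedasticity-robust and cluster-robust standard errors after pooled -regress-. Both the regressor and the error contain a country-level component, so observations within a country are correlated and the -robust- standard error should come out too small. The last lines reflect my reading of the -xtreg- documentation: after -xtreg, fe-, -vce(robust)- already clusters on the panel variable.

```stata
* Simulated panel: 40 hypothetical countries observed for 15 years
clear
set seed 7
set obs 40
generate id = _n
generate a  = rnormal()      // country-level component of the error
generate cx = rnormal()      // country-level component of the regressor
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = cx + rnormal()
generate y = 1 + 0.5*x + a + rnormal()

* Pooled OLS: -robust- treats all 600 observations as independent
regress y x, vce(robust)
scalar se_rob = _se[x]

* Pooled OLS: clustering allows for correlation within each country
regress y x, vce(cluster id)
scalar se_cl = _se[x]
display "robust SE: " se_rob "   clustered SE: " se_cl

* After -xtreg, fe-, vce(robust) should match vce(cluster id)
xtreg y x, fe vce(robust)
scalar se_fe_rob = _se[x]
xtreg y x, fe vce(cluster id)
scalar se_fe_cl = _se[x]
```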
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Replying to Carlo Lazzaro:
          Hi Carlo,

          Thanks for your response, and sorry for the lack of details. I am estimating a fixed-effects regression model with 40 countries over 15 years. My dependent variable is the unemployment rate. My command is as follows:

          xtreg unemployment (X regressors) i.Country i.Year i.Country1##c.Year, fe vce(cluster Country)

          Hence, my standard errors are robust to heteroskedasticity and within-country autocorrelation. My main question is whether I have to ensure that my dependent variable, the unemployment rate, is normally distributed. Since it refers to different countries and different years, it is not normally distributed. In general, is it necessary for the dependent variable to be normal in order to get correct results? From what I found on the internet the answer seems to be "no", but I am still confused about it. Thank you.



          • #6
            Himani:
            normality is a (weak) requirement for the distribution of the residuals only, not for the dependent variable.
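A minimal simulation illustrating this point (hypothetical names and seed; the country effects are deliberately split into two groups). The dependent variable is clearly bimodal, hence far from normal, while the idiosyncratic errors are normal by construction. The check is therefore run on the residuals from -xtreg, fe- (-predict, e-), not on the dependent variable.

```stata
clear
set seed 2024
set obs 40
generate id = _n
* Hypothetical country effects: two groups with very different levels
generate a = cond(id <= 20, 2, 12)
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = rnormal()
generate y = a + 0.5*x + rnormal()    // normal errors by construction

* The dependent variable itself is bimodal, so Shapiro-Wilk rejects normality
swilk y
scalar p_y = r(p)

* Fit the fixed-effects model and check the idiosyncratic residuals instead
xtreg y x, fe
predict e_hat, e
swilk e_hat
* -qnorm e_hat- would add a normal quantile plot of the same residuals
```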
            That said:
            - if you have -xtset- your data with -Country- as -panelid-, why include -i.Country- on the right-hand side of your regression equation?
            - what is the point of including -Year- as both a categorical and a continuous regressor?
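On the first question, a minimal simulated check (hypothetical names and seed) that country dummies add nothing once -xtreg, fe- has removed the country means. The slope from the within estimator should match the one from a least-squares dummy-variable regression with -regress ... i.id-, and adding -i.id- to -xtreg, fe- should only produce "omitted because of collinearity" notes.

```stata
clear
set seed 11
set obs 40
generate id = _n
generate a = rnormal()
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = a + rnormal()          // regressor correlated with the country effect
generate y = 2 + 0.5*x + a + rnormal()

* Within (fixed-effects) estimator
xtreg y x, fe
scalar b_fe = _b[x]

* Least-squares dummy-variable regression: one dummy per country
regress y x i.id
scalar b_lsdv = _b[x]

* Adding the dummies to -xtreg, fe-: they are omitted because of collinearity
xtreg y x i.id, fe
scalar b_fe_dum = _b[x]
display "FE: " b_fe "   LSDV: " b_lsdv "   FE + i.id: " b_fe_dum
```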
            Finally, the previous recommendation (as per the FAQ) to post what you typed and what Stata gave you back still applies. Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Replying to Carlo Lazzaro:
              Hi Carlo,

              Thanks for your response. On the first point: yes, I do not need to add -i.Country1-, since -xtreg, fe- already accounts for the country fixed effects.
              On the second point: -i.Year- enters as time dummies (hence categorical), while -i.Country1##c.Year- adds a linear time trend for each country (hence -Year- is treated as continuous there).
              I have also attached the output for your reference this time. Thanks a lot!
              ------------------------------------------------------------------------------------------------------
              . egen Country1=group(Country)

              .
              . xtset Country1 Year,yearly
              panel variable: Country1 (strongly balanced)
              time variable: Year, 2000 to 2018
              delta: 1 year

              .
              .
              .
              . gen BB1=BB/1000000
              (27 missing values generated)

              . xtreg UTA HC BB1 FD Pop i.Country1 i.Year i.Country1##c.Year,fe vce(cluster Country1)
              note: 2.Country1 omitted because of collinearity
              note: 3.Country1 omitted because of collinearity
              note: 4.Country1 omitted because of collinearity
              note: 5.Country1 omitted because of collinearity
              note: 6.Country1 omitted because of collinearity
              note: 7.Country1 omitted because of collinearity
              note: 8.Country1 omitted because of collinearity
              note: 9.Country1 omitted because of collinearity
              note: 10.Country1 omitted because of collinearity
              note: 11.Country1 omitted because of collinearity
              note: 12.Country1 omitted because of collinearity
              note: 13.Country1 omitted because of collinearity
              note: 14.Country1 omitted because of collinearity
              note: 15.Country1 omitted because of collinearity
              note: 16.Country1 omitted because of collinearity
              note: 17.Country1 omitted because of collinearity
              note: 18.Country1 omitted because of collinearity
              note: 19.Country1 omitted because of collinearity
              note: 20.Country1 omitted because of collinearity
              note: 21.Country1 omitted because of collinearity
              note: 22.Country1 omitted because of collinearity
              note: 23.Country1 omitted because of collinearity
              note: 24.Country1 omitted because of collinearity
              note: 25.Country1 omitted because of collinearity
              note: 26.Country1 omitted because of collinearity
              note: 27.Country1 omitted because of collinearity
              note: 28.Country1 omitted because of collinearity
              note: 29.Country1 omitted because of collinearity
              note: 30.Country1 omitted because of collinearity
              note: 31.Country1 omitted because of collinearity
              note: 32.Country1 omitted because of collinearity
              note: 33.Country1 omitted because of collinearity
              note: 34.Country1 omitted because of collinearity
              note: 35.Country1 omitted because of collinearity
              note: 36.Country1 omitted because of collinearity
              note: 37.Country1 omitted because of collinearity
              note: 38.Country1 omitted because of collinearity
              note: 39.Country1 omitted because of collinearity
              note: 40.Country1 omitted because of collinearity
              note: 41.Country1 omitted because of collinearity
              note: 42.Country1 omitted because of collinearity
              note: Year omitted because of collinearity

              Fixed-effects (within) regression               Number of obs     =        696
              Group variable: Country1                        Number of groups  =         42

              R-sq:                                           Obs per group:
                   within  = 0.6703                                         min =         12
                   between = 0.2352                                         avg =       16.6
                   overall = 0.1419                                         max =         18

                                                              F(21,41)          =          .
              corr(u_i, Xb)  = -1.0000                        Prob > F          =          .

                                            (Std. Err. adjusted for 42 clusters in Country1)
              ---------------------------------------------------------------------------------
              | Robust
              UTA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
              ----------------+----------------------------------------------------------------
              HC | -4.676685 2.180334 -2.14 0.038 -9.079959 -.2734105
              BB1 | .0437888 .0317947 1.38 0.176 -.020422 .1079996
              FD | .0007307 .0007847 0.93 0.357 -.0008541 .0023155
              Pop| -.6092847 .4089752 -1.49 0.144 -1.435227 .2166576
              |
              Country1 |
              2 | 0 (omitted)
              3 | 0 (omitted)
              4 | 0 (omitted)
              5 | 0 (omitted)
              6 | 0 (omitted)
              7 | 0 (omitted)
              8 | 0 (omitted)
              9 | 0 (omitted)
              10 | 0 (omitted)
              11 | 0 (omitted)
              12 | 0 (omitted)
              13 | 0 (omitted)
              14 | 0 (omitted)
              15 | 0 (omitted)
              16 | 0 (omitted)
              17 | 0 (omitted)
              18 | 0 (omitted)
              19 | 0 (omitted)
              20 | 0 (omitted)
              21 | 0 (omitted)
              22 | 0 (omitted)
              23 | 0 (omitted)
              24 | 0 (omitted)
              25 | 0 (omitted)
              26 | 0 (omitted)
              27 | 0 (omitted)
              28 | 0 (omitted)
              29 | 0 (omitted)
              30 | 0 (omitted)
              31 | 0 (omitted)
              32 | 0 (omitted)
              33 | 0 (omitted)
              34 | 0 (omitted)
              35 | 0 (omitted)
              36 | 0 (omitted)
              37 | 0 (omitted)
              38 | 0 (omitted)
              39 | 0 (omitted)
              40 | 0 (omitted)
              41 | 0 (omitted)
              42 | 0 (omitted)
              |
              Year |
              2001 | -.1952207 .2280089 -0.86 0.397 -.655694 .2652526
              2002 | -.1966589 .2317793 -0.85 0.401 -.6647467 .271429
              2003 | .164012 .2776475 0.59 0.558 -.3967085 .7247326
              2004 | .0180518 .3075936 0.06 0.953 -.6031461 .6392497
              2005 | -.2736581 .3532516 -0.77 0.443 -.9870643 .439748
              2006 | -.8547991 .3835659 -2.23 0.031 -1.629426 -.0801721
              2007 | -1.283025 .4222257 -3.04 0.004 -2.135727 -.4303226
              2008 | -1.634142 .3938418 -4.15 0.000 -2.429521 -.8387618
              2009 | -.6105343 .3959356 -1.54 0.131 -1.410142 .1890739
              2010 | -.5005376 .4033419 -1.24 0.222 -1.315103 .3140278
              2011 | -.6435924 .3680321 -1.75 0.088 -1.386848 .0996636
              2012 | -.3768544 .3704019 -1.02 0.315 -1.124896 .3711873
              2013 | -.3017373 .3932904 -0.77 0.447 -1.096003 .4925289
              2014 | -.5574258 .4027407 -1.38 0.174 -1.370777 .2559256
              2015 | -1.013585 .4516883 -2.24 0.030 -1.925788 -.1013822
              2016 | -1.526365 .5184349 -2.94 0.005 -2.573365 -.4793641
              2017 | -2.127781 .5755326 -3.70 0.001 -3.290093 -.9654697
              |
              Year | 0 (omitted)
              |
              Country1#c.Year |
              2 | .2385705 .0469372 5.08 0.000 .1437789 .3333621
              3 | .1908602 .0326708 5.84 0.000 .1248802 .2568401
              4 | .1162035 .0390429 2.98 0.005 .0373548 .1950523
              5 | .2623986 .0533175 4.92 0.000 .1547218 .3700754
              6 | .4052331 .0805052 5.03 0.000 .2426496 .5678167
              7 | .7810907 .0494137 15.81 0.000 .6812976 .8808837
              8 | .100915 .0280542 3.60 0.001 .0442584 .1575717
              9 | .2521241 .0476898 5.29 0.000 .1558126 .3484357
              10 | .1183214 .0586837 2.02 0.050 -.0001927 .2368354
              11 | .2251814 .0549353 4.10 0.000 .1142373 .3361256
              12 | .0751454 .062816 1.20 0.238 -.0517141 .2020049
              13 | -.1629682 .0614669 -2.65 0.011 -.287103 -.0388334
              14 | 1.225554 .0519606 23.59 0.000 1.120618 1.330491
              15 | .0591233 .0516546 1.14 0.259 -.0451953 .1634419
              16 | .270799 .0645895 4.19 0.000 .1403579 .4012401
              17 | .3094958 .0594255 5.21 0.000 .1894836 .429508
              18 | .3938101 .0539832 7.30 0.000 .2847889 .5028313
              19 | -.0022528 .0756756 -0.03 0.976 -.1550829 .1505772
              20 | .2295271 .0562484 4.08 0.000 .1159311 .3431231
              21 | .1027779 .0481674 2.13 0.039 .0055018 .200054
              22 | -.0065416 .0493304 -0.13 0.895 -.1061662 .0930831
              23 | .3952148 .1137134 3.48 0.001 .1655659 .6248637
              24 | .1020943 .0915495 1.12 0.271 -.0827937 .2869824
              25 | .2058541 .0766968 2.68 0.010 .0509618 .3607464
              26 | .4610483 .0490938 9.39 0.000 .3619015 .5601952
              27 | .2286578 .039286 5.82 0.000 .149318 .3079976
              28 | .1745403 .0295912 5.90 0.000 .1147796 .234301
              29 | .1576823 .0561821 2.81 0.008 .0442201 .2711444
              30 | -.7348902 .0418542 -17.56 0.000 -.8194164 -.650364
              31 | -.0457858 .0571643 -0.80 0.428 -.1612314 .0696599
              32 | .5076547 .0479479 10.59 0.000 .4108221 .6044874
              33 | .2181217 .0577357 3.78 0.001 .1015221 .3347212
              34 | .5318886 .1693027 3.14 0.003 .1899748 .8738024
              35 | .3337262 .068911 4.84 0.000 .1945577 .4728947
              36 | .4056563 .0473992 8.56 0.000 .3099318 .5013809
              37 | .4238595 .0557037 7.61 0.000 .3113636 .5363554
              38 | .2547221 .0531261 4.79 0.000 .1474318 .3620124
              39 | .2385692 .0378104 6.31 0.000 .1622095 .3149288
              40 | .1551168 .0597813 2.59 0.013 .034386 .2758476
              41 | -.0898687 .1842088 -0.49 0.628 -.461886 .2821486
              42 | .0503888 .0454926 1.11 0.274 -.0414854 .142263
              |
              _cons | -417.1883 92.74888 -4.50 0.000 -604.4984 -229.8781
              ----------------+----------------------------------------------------------------
              sigma_u | 568.34893
              sigma_e | 1.0597447
              rho | .99999652 (fraction of variance due to u_i)
              ---------------------------------------------------------------------------------

              ------------------------------------------------------------------------------------------------------
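The notes and the last lines of this output can be traced to the specification. -i.Country1##c.Year- expands to -i.Country1 c.Year i.Country1#c.Year-: the country dummies are absorbed by -fe- (hence the "omitted because of collinearity" notes) and -c.Year- is collinear with the year dummies. Moreover, since each country trend is measured from year 0, every fixed effect has to offset roughly the trend times 2009, which would explain sigma_u = 568, rho close to 1 and corr(u_i, Xb) = -1.0000. Below is a minimal sketch on simulated data (hypothetical names and seed; centering at 2000 is my assumption, not something from the thread) of a leaner specification with the same slopes: year dummies plus one linear trend per country, with time centered. Expect Stata to drop one country trend as collinear with the year dummies.

```stata
clear
set seed 99
set obs 40
generate id = _n
generate a = rnormal()
generate slope = rnormal(0, 0.2)      // hypothetical country-specific trend
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = rnormal()
generate y = 5 + a + slope*(year - 2000) + 0.5*x + rnormal()

* Time centered at 2000 (an arbitrary, assumed reference year)
generate t = year - 2000

* Year dummies plus one linear trend per country; no i.id, no c.t main effect
xtreg y x i.year i.id#c.t, fe vce(cluster id)
scalar b_centered = _b[x]
display "centered: sigma_u = " e(sigma_u) "   corr(u_i, Xb) = " e(corr)

* Same model with raw calendar years: same slope on x, but the fixed effects
* must offset slope*year, which inflates sigma_u and pushes corr(u_i, Xb) toward -1
xtreg y x i.year i.id#c.year, fe vce(cluster id)
scalar b_raw = _b[x]
display "raw year: sigma_u = " e(sigma_u) "   corr(u_i, Xb) = " e(corr)
```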



              • #8
                Replying to Carlo Lazzaro:
                Also, isn't the residual distribution affected by the distribution of the dependent variable? In that case, wouldn't the distribution of the dependent variable become important?



                • #9
                  Himani:
                  with 696 observations, residual normality is not an issue.
                  As far as your interaction is concerned, since you do not have the linear term for -Country- (due to its collinearity with -panelid-), I do not think it is informative.
                  That said, I would re-run the regression without -i.Country- and with categorical -i.Year-, and see what Stata gives you back.
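To see why the sample size matters, here is a minimal Monte Carlo sketch (cross-sectional for simplicity; the program -onedraw-, the seed and all variable names are hypothetical). The errors are strongly skewed (a chi-squared(1) draw minus its mean), yet across 500 samples of 696 observations the OLS slope should stay centered on its true value of 0.5 with a roughly symmetric, bell-shaped distribution, which is what the usual t and F tests rely on.

```stata
clear
capture program drop onedraw
program define onedraw, rclass
    drop _all
    set obs 696
    generate x = rnormal()
    * Skewed, mean-zero error: a chi-squared(1) draw minus its mean of 1
    generate y = 1 + 0.5*x + (rchi2(1) - 1)
    regress y x
    return scalar b = _b[x]
end

* 500 replications; each keeps the estimated slope
simulate b = r(b), reps(500) seed(2024) nodots: onedraw

* Mean close to 0.5; skewness and kurtosis close to the normal values 0 and 3
summarize b, detail
```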
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Replying to Carlo Lazzaro:
                    Hi Carlo,

                    Thanks for your response. The results differ this time, which suggests that the omitted terms make a difference to the time trends. Thank you!

                    . egen Country1=group(Country)

                    . xtset Country1 Year,yearly
                    panel variable: Country1 (strongly balanced)
                    time variable: Year, 2000 to 2018
                    delta: 1 year

                    . gen BB1=BB/1000000
                    (27 missing values generated)

                    . xtreg UTA HC BB1 FD Pop i.Year ,fe vce(cluster Country1)

                    Fixed-effects (within) regression               Number of obs     =        696
                    Group variable: Country1                        Number of groups  =         42

                    R-sq:                                           Obs per group:
                         within  = 0.2171                                         min =         12
                         between = 0.1464                                         avg =       16.6
                         overall = 0.1700                                         max =         18

                                                                    F(21,41)          =       9.08
                    corr(u_i, Xb)  = -0.0918                        Prob > F          =     0.0000

                                                  (Std. Err. adjusted for 42 clusters in Country1)
                    ------------------------------------------------------------------------------
                    | Robust
                    UTA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                    HC | -1.255373 1.451126 -0.87 0.392 -4.185982 1.675235
                    BB1 | -.0141203 .0167748 -0.84 0.405 -.0479978 .0197571
                    FD | .0040062 .0043229 0.93 0.359 -.004724 .0127365
                    Pop | -1.128107 .5515876 -2.05 0.047 -2.24206 -.0141529
                    |
                    Year |
                    2001 | .0418953 .2798289 0.15 0.882 -.5232306 .6070212
                    2002 | .1500829 .2882968 0.52 0.605 -.4321443 .7323102
                    2003 | .6327896 .377685 1.68 0.101 -.1299608 1.39554
                    2004 | .7775548 .3484674 2.23 0.031 .0738106 1.481299
                    2005 | .6872495 .3464599 1.98 0.054 -.0124405 1.38694
                    2006 | .3309535 .3631662 0.91 0.367 -.4024756 1.064383
                    2007 | .128128 .4167739 0.31 0.760 -.7135639 .9698199
                    2008 | .0024826 .4049075 0.01 0.995 -.8152447 .82021
                    2009 | 1.113937 .3942821 2.83 0.007 .3176682 1.910206
                    2010 | 1.334925 .4331057 3.08 0.004 .4602505 2.2096
                    2011 | 1.329069 .520583 2.55 0.014 .2777306 2.380408
                    2012 | 1.803349 .5965465 3.02 0.004 .5985985 3.008099
                    2013 | 2.065362 .6834191 3.02 0.004 .6851691 3.445555
                    2014 | 2.006568 .674044 2.98 0.005 .6453085 3.367827
                    2015 | 1.722273 .6931554 2.48 0.017 .3224175 3.122129
                    2016 | 1.393074 .6905086 2.02 0.050 -.0014361 2.787585
                    2017 | 1.036102 .7078948 1.46 0.151 -.3935203 2.465725
                    |
                    _cons | 8.173861 4.492838 1.82 0.076 -.8996091 17.24733
                    -------------+----------------------------------------------------------------
                    sigma_u | 1.9689641
                    sigma_e | 1.5792198
                    rho | .60853379 (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------



                    • #11
                      Himani:
                      I would try:
                      Code:
                      xtreg UTA HC BB1 FD Pop c.Year##c.Year,fe vce(cluster Country1)
                      This code explores whether a non-linear relationship exists between the regressand and -Year- within each panel.

                      In the postestimation phase, I would also check whether your model is correctly specified.
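A minimal sketch (simulated data, hypothetical names and seed) of how the quadratic specification can be read: fit it with -xtreg, fe- and test the squared term with -testparm-. Time is centered at 2007 here, an assumption that is not part of Carlo's code: squaring raw calendar years around 2000 produces very large, highly collinear values, while centering changes how the linear term is read but leaves the coefficient and test on the squared term unchanged.

```stata
clear
set seed 5
set obs 40
generate id = _n
generate a = rnormal()
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = rnormal()
* Hypothetical common U-shaped time path, bottoming out in 2007
generate y = a + 0.5*x + 0.02*(year - 2007)^2 + rnormal()

* Center time before squaring (2007 is an arbitrary, assumed reference year)
generate t = year - 2007
xtreg y x c.t##c.t, fe vce(cluster id)

* Wald test of the squared term: a small p-value points to a non-linear time path
testparm c.t#c.t
```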
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Replying to Carlo Lazzaro:
                        Hi Carlo,

                        Thanks for your response. Using your code, I found no evidence of a non-linear relationship between the regressand and -Year- within each panel.
                        As for the postestimation phase, which tests should be performed? Is there a reference I could follow for the procedure? Thanks a lot for all your help.



                        • #13
                          Himani:
                          after estimating your regression model, -predict- the linear prediction (-fitted-) and -generate- its square (-sq_fitted-).
                          Then you can run:
                          - an augmented regression;
                          - an auxiliary regression (-fitted- and -sq_fitted- as the only regressors):
                          Code:
                          . use "https://www.stata-press.com/data/r16/nlswork.dta"
                          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                          
                          . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-sq:                                           Obs per group:
                               within  = 0.1087                                         min =          1
                               between = 0.1006                                         avg =        6.1
                               overall = 0.0865                                         max =         15
                          
                                                                          F(2,4709)         =     507.42
                          corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
                          
                                                       (Std. Err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                                       |
                           c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                                       |
                                 _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
                          -------------+----------------------------------------------------------------
                               sigma_u |   .4039153
                               sigma_e |  .30245467
                                   rho |  .64073314   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . predict fitted, xb
                          (24 missing values generated)
                          
                          . g sq_fitted=fitted^2
                          (24 missing values generated)
                          
                          . xtreg ln_wage c.age##c.age fitted sq_fitted , fe vce(cluster idcode)
                          note: c.age#c.age omitted because of collinearity
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-sq:                                           Obs per group:
                               within  = 0.1105                                         min =          1
                               between = 0.1029                                         avg =        6.1
                               overall = 0.0882                                         max =         15
                          
                                                                          F(3,4709)         =     355.44
                          corr(u_i, Xb)  = 0.0411                         Prob > F          =     0.0000
                          
                                                       (Std. Err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0184474    .004408     4.19   0.000     .0098057    .0270891
                                       |
                           c.age#c.age |          0  (omitted)
                                       |
                                fitted |   6.920927   1.152074     6.01   0.000     4.662324    9.179531
                             sq_fitted |  -2.079755   .4060541    -5.12   0.000    -2.875811   -1.283699
                                 _cons |  -4.586115   .8935105    -5.13   0.000    -6.337813   -2.834416
                          -------------+----------------------------------------------------------------
                               sigma_u |  .40319282
                               sigma_e |  .30215883
                                   rho |  .64035936   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . test sq_fitted
                          
                           ( 1)  sq_fitted = 0
                          
                                 F(  1,  4709) =   26.23
                                      Prob > F =    0.0000
                          
                          . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-sq:                                           Obs per group:
                               within  = 0.1092                                         min =          1
                               between = 0.1033                                         avg =        6.1
                               overall = 0.0881                                         max =         15
                          
                                                                          F(2,4709)         =     523.09
                          corr(u_i, Xb)  = 0.0467                         Prob > F          =     0.0000
                          
                                                       (Std. Err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
                             sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
                                 _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
                          -------------+----------------------------------------------------------------
                               sigma_u |    .403403
                               sigma_e |  .30238578
                                   rho |  .64025357   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . test sq_fitted
                          
                           ( 1)  sq_fitted = 0
                          
                                 F(  1,  4709) =    4.85
                                      Prob > F =    0.0276
                          
                          .
                          Whichever approach you take, the -test- on -sq_fitted- reaches statistical significance, which tells us that the model is misspecified (as expected: with one predictor only, it is hard to give a fair and true view of the data-generating process under investigation).
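The same steps as a short do-file, sketched on simulated data (variable names, seed and data-generating process are hypothetical) where the true relationship is quadratic in -x- but the estimated model is linear, so the test on -sq_fitted- should reject. To apply it to your own model, replace the simulation lines with your data and your -xtreg- command. It is a check in the spirit of Ramsey's RESET, not a complete specification-testing protocol.

```stata
clear
set seed 3
set obs 40
generate id = _n
generate a = rnormal()
expand 15
bysort id: generate year = 1999 + _n
xtset id year
generate x = rnormal()
* True model is quadratic in x; the model below deliberately omits x^2
generate y = a + x + 0.5*x^2 + rnormal()

* 1. Estimate the (misspecified) linear fixed-effects model
xtreg y x, fe vce(cluster id)

* 2. Linear prediction (xb, which excludes u_i) and its square
predict double fitted, xb
generate double sq_fitted = fitted^2

* 3a. Augmented regression: expect one of x and fitted to be omitted (collinear)
xtreg y x fitted sq_fitted, fe vce(cluster id)
test sq_fitted
scalar p_aug = r(p)

* 3b. Auxiliary regression: fitted and sq_fitted as the only regressors
xtreg y fitted sq_fitted, fe vce(cluster id)
test sq_fitted
scalar p_aux = r(p)
display "p-values: augmented = " p_aug "   auxiliary = " p_aux
```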
                          Kind regards,
                          Carlo
                          (Stata 19.0)
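                          For readers who want to rerun these steps as a do-file rather than read them from the log, here is a minimal sketch. It assumes the same -nlswork- example data and the same commands used in this thread. The comments are an editorial reading of what each step does, not Carlo's own wording.

```stata
* Sketch of the augmented and auxiliary regressions discussed in this thread,
* using the nlswork example data from the Stata manuals
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* 1. Original fixed-effects model with cluster-robust standard errors
xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

* 2. Linear prediction (xb excludes the u_i fixed effects) and its square
predict fitted, xb
generate sq_fitted = fitted^2

* 3a. Augmented regression: original regressors plus fitted and sq_fitted.
*     c.age#c.age is an exact linear combination of fitted, age and the
*     constant, so Stata omits it for collinearity
xtreg ln_wage c.age##c.age fitted sq_fitted, fe vce(cluster idcode)
test sq_fitted

* 3b. Auxiliary regression: fitted and sq_fitted as the only regressors
xtreg ln_wage fitted sq_fitted, fe vce(cluster idcode)

* A significant coefficient on sq_fitted suggests a misspecified functional form
test sq_fitted
```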



                          • #14
                            Hi Carlo,

                            Thanks for your detailed response! One question I have about the code: why can't we infer the statistical significance from the p-value in the regression output? Why do we need to use the -test- command after the regression? Is the p-value in the regression not enough? Thanks a lot for your help!



                            • #15
                              Himani:
                              the test is intended to explore the functional form of the regressand (see the -linktest- entry in the Stata .pdf manual).
                              Kind regards,
                              Carlo
                              (Stata 19.0)
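                              On the question in #14: for a single coefficient, the P>|t| column of the regression table and -test- carry out the same Wald test, because -test- reports F(1, df) = t². In the auxiliary regression from #13, for instance, (-.47432 / .2153021)² ≈ 4.85, which is the F reported by -test-, and the p-values match (0.028 vs 0.0276). The sketch below checks this identity and also runs -linktest-, the built-in command for this kind of check. Assumption: it uses pooled -regress- instead of -xtreg, fe-, because -linktest- is documented for single-equation estimation commands such as -regress-, and its behaviour after -xtreg- has not been verified here.

```stata
* Sketch: pooled OLS version of the model (NOT the fixed-effects model above)
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
regress ln_wage c.age##c.age, vce(cluster idcode)

* With one restriction, -test- reports F(1, df) equal to the squared t
* statistic shown in the regression table, so the two p-values coincide
test c.age#c.age
scalar F_test = r(F)
scalar t_sq   = (_b[c.age#c.age] / _se[c.age#c.age])^2
display "F from -test-: " F_test "   squared t: " t_sq

* -linktest- refits ln_wage on the prediction (_hat) and its square (_hatsq);
* a significant _hatsq points to a misspecified functional form.
* The estimation options are repeated, as the linktest syntax expects
linktest, vce(cluster idcode)
```

So for a single coefficient the table's p-value is enough; -test- becomes necessary for joint hypotheses, e.g. -test fitted sq_fitted-.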
