  • Problem with dummy variable regression

    Hello everyone,

    I am currently working on my bachelor's thesis and I am having a problem interpreting the output. I am also new to Stata and am learning how to use it while writing the thesis. I am running this regression:

    xi: reg rca_v rca_c i.sector, r

    where rca_v is Vietnam's revealed comparative advantage (RCA) over a 15-year period in 20 product groups, rca_c is the same measure for China, and i.sector creates the dummies for the 20 product groups I am considering. I know that the coefficients of the dummy variables are measured relative to the sector that Stata omitted in order to run the regression, so that, for example, Vietnam's RCA in sector 6 is 6.56 units higher than in the omitted sector at the same level of China's RCA. However, I cannot fit this interpretation into the comparison between the two countries.

    Thank you all in advance for taking the time to read my question; I would be really grateful to anyone who could help me understand this problem. Thank you!

    Alexia






    . xi: reg rca_v rca_c i.sector, r
    i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)

    Linear regression                               Number of obs     =        320
                                                    F(20, 299)        =     157.00
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.9607
                                                    Root MSE          =      .7143

    ------------------------------------------------------------------------------
                 |               Robust
           rca_v |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rca_c |   1.522164   .3141637     4.85   0.000     .9039122    2.140416
      _Isector_2 |  -3.916549   .3693484   -10.60   0.000      -4.6434   -3.189697
      _Isector_3 |  -2.875985   .2018616   -14.25   0.000    -3.273235   -2.478736
      _Isector_4 |  -2.857842   .3585056    -7.97   0.000    -3.563356   -2.152328
      _Isector_5 |  -2.182294   .2021623   -10.79   0.000    -2.580135   -1.784453
      _Isector_6 |   6.560848   1.641889     4.00   0.000     3.329726     9.79197
      _Isector_7 |  -1.282726   .2865632    -4.48   0.000    -1.846662   -.7187895
      _Isector_8 |  -3.006791    .216144   -13.91   0.000    -3.432147   -2.581435
      _Isector_9 |  -4.306507   .4645494    -9.27   0.000    -5.220708   -3.392307
     _Isector_10 |  -3.447117   .2559917   -13.47   0.000     -3.95089   -2.943343
     _Isector_11 |  -2.244465   .2230231   -10.06   0.000    -2.683358   -1.805571
     _Isector_12 |  -3.815585   .3967278    -9.62   0.000    -4.596317   -3.034852
     _Isector_13 |  -2.842299   .2330736   -12.19   0.000    -3.300971   -2.383626
     _Isector_14 |   -.619659   .2875194    -2.16   0.032    -1.185477   -.0538411
     _Isector_15 |   -4.45899   .9704257    -4.59   0.000    -6.368719    -2.54926
     _Isector_16 |  -2.858558   .2369532   -12.06   0.000    -3.324865   -2.392251
     _Isector_17 |  -2.627071   .7526744    -3.49   0.001    -4.108281   -1.145861
     _Isector_18 |  -2.614191    .210671   -12.41   0.000    -3.028777   -2.199605
     _Isector_19 |   .0892393   .2573679     0.35   0.729    -.4172426    .5957213
     _Isector_20 |  -2.670938   .2141246   -12.47   0.000     -3.09232   -2.249556
           _cons |    2.38135   .2390579     9.96   0.000     1.910901    2.851799
    ------------------------------------------------------------------------------
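A minimal sketch of how the sector coefficients can be read, run on made-up data rather than the real RCA series: the simulated values, the seed and the time variable year are assumptions, and only the layout (20 sectors observed for 16 years, 320 observations, as in the output) follows the thread. In a linear model each sector coefficient is an additive gap in the level of rca_v relative to the omitted sector at the same value of rca_c, so 6.56 for sector 6 is a difference in RCA units, not a ratio.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)

* The posted model with factor-variable notation (no -xi:- prefix);
* sector 1 is the omitted base category by default
regress rca_v rca_c i.sector, vce(robust)

* _b[6.sector] is the gap in rca_v between sector 6 and the base sector
* at the same value of rca_c: an additive difference, not a multiple
display "sector 6 minus base sector: " _b[6.sector]

* Gap between two non-base sectors, with its standard error
lincom 6.sector - 2.sector

* Predicted rca_v in each sector with rca_c held at its sample mean
margins sector, atmeans
```

The choice of sectors 6 and 2 in -lincom- is only illustrative; any pair of sector levels can be compared the same way.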


  • #2
    Alexia:
    welcome to this forum.
    Some comments about your query:
    - if you're using a reasonably recent Stata release, please note that the -xi:- prefix is definitely redundant if you use factor-variable notation (-fvvarlist-);
    - more substantively: if your dataset is actually composed of a cross-sectional dimension (20 groups of products) and a time-series dimension (15 years), you should consider a panel-data regression via -xtreg-.
    As an aside, in the future please use CODE delimiters (# symbol) instead of HTML delimiters (<>) to share what you typed and what Stata gave you back. Thanks.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
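A minimal sketch of both suggestions on simulated data (the data-generating lines and the time variable year are assumptions; id matches the group variable shown in the later -xtreg- output): declare the panel structure with -xtset-, then fit the model with factor-variable notation and no -xi:- prefix.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)

* Declare the panel: -id- identifies the product group, -year- the period
xtset id year

* Factor-variable notation: i.sector works directly, no -xi:- prefix needed
xtreg rca_v rca_c i.sector, re
```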



    • #3
      Thank you very much for your kind answer!
      I just used xtreg instead of the simple command reg (that is, xtreg rca_v rca_c i.sector), and the output I got uses random effects automatically. I tried running the same regression with the fe option (xtreg rca_v rca_c i.sector, fe), but the output omits all the sector dummies because of collinearity, so I suppose that the right model uses random effects. I ran a Hausman test to check, using xtreg rca_v rca_c i.sector, fe and xtreg rca_v rca_c i.sector, re, but the chi2 is less than 0. I suppose I am doing something wrong...

      Thank you,

      Kind regards,

      Alexia

      Code:
      . hausman fe re

                       ---- Coefficients ----
                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                   |       fe           re         Difference          S.E.
      -------------+----------------------------------------------------------------
             rca_c |    1.522164     1.522164        8.88e-16               .
      ------------------------------------------------------------------------------
                                 b = consistent under Ho and Ha; obtained from xtreg
                  B = inconsistent under Ha, efficient under Ho; obtained from xtreg

          Test:  Ho:  difference in coefficients not systematic

                        chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                =    -0.00    chi2<0 ==> model fitted on these
                                              data fails to meet the asymptotic
                                              assumptions of the Hausman test;
                                              see suest for a generalized test
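One way to check why the -fe- run drops every sector dummy, sketched on simulated data in which, as the output suggests (20 groups, 20 sectors), each panel unit id belongs to exactly one sector; the data and the name year are assumptions. A variable that never changes within id is absorbed by the unit fixed effects, so only rca_c is left in the within regression.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)
xtset id year

* Flag observations where -sector- differs from the previous year in the same unit
bysort id (year): generate byte sector_changes = (sector != sector[_n-1]) & _n > 1
count if sector_changes                 // 0 here: sector never varies within id

* -xtsum- reports the same fact as a within standard deviation of zero
xtsum sector

* Every sector dummy is therefore collinear with the unit fixed effects:
* the within estimator reports them as omitted and keeps only rca_c
xtreg rca_v rca_c i.sector, fe
```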



      • #4
        Alexia:
        some comments about your query:
        - as you surmise, -xtreg- fits the -re- specification by default;
        - as you know, the -fe- machinery wipes out all time-invariant predictors (and those collinear with the fixed effect);
        - you do not report anything about the F test that appears as a footnote to the -xtreg, fe- outcome table: if it lacks statistical significance, you should switch to pooled OLS;
        - the way you compare the -fe- and -re- specifications via -hausman- is correct. The nasty -hausman- output is not unusual; try re-running -hausman- with the -sigmamore- option and see what happens.
        Kind regards,
        Carlo
        (Stata 18.0 SE)
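A sketch of the two checks mentioned above on simulated data (the data and the time variable year are assumptions): reading the F test that all u_i = 0 from the stored results of -xtreg, fe-, and storing both fits under names before calling -hausman- with -sigmamore-. It uses the simpler specification without i.sector.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)
xtset id year

* Fixed effects; the footnote F test (all u_i = 0) is stored in e(F_f)
xtreg rca_v rca_c, fe
estimates store fe
local p = Ftail(e(df_a), e(df_r), e(F_f))
display "F test that all u_i=0: F = " %9.2f e(F_f) ",  Prob > F = " %6.4f `p'

* Random effects on the same specification
xtreg rca_v rca_c, re
estimates store re

* Consistent estimator first, efficient one second; -sigmamore- bases both
* covariance matrices on the disturbance variance of the efficient (re) fit
hausman fe re, sigmamore
```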



        • #5
          Thank you very much again! I tried to re-run the Hausman test with the sigmamore option, but unfortunately the result is the same...

          Both outcome tables seem to be statistically significant, including the F test that all u_i=0 at the foot of the -fe- table (Prob > F = 0.0000). Before running a model with i.sector, and therefore with all the sector dummies in the output, I ran the simple regression xtreg rca_v rca_c, fe and compared it with the same model with random effects (xtreg rca_v rca_c, re); the Hausman test then rejected the null hypothesis, suggesting that I use the fixed-effects model. (The variable rca_j, where j = v, c denotes the country, stacks the revealed comparative advantage of the 20 sectors in 2000, then the same for 2001, and so on.)

          Sorry for the long answer, and thank you very much for your time and your precious help.

          Kind regards,

          Alexia

          Code:
          . xi: xtreg rca_v rca_c i.sector, re
          i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)
          
          Random-effects GLS regression                   Number of obs     =        320
          Group variable: id                              Number of groups  =         20
          
          R-sq:                                           Obs per group:
               within  = 0.3229                                         min =         16
               between = 1.0000                                         avg =       16.0
               overall = 0.9607                                         max =         16
          
                                                          Wald chi2(20)     =    7309.23
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
          ------------------------------------------------------------------------------
                 rca_v |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 rca_c |   1.522164   .1274647    11.94   0.000     1.272338     1.77199
            _Isector_2 |  -3.916549    .280524   -13.96   0.000    -4.466366   -3.366732
            _Isector_3 |  -2.875985   .2525529   -11.39   0.000     -3.37098   -2.380991
            _Isector_4 |  -2.857842   .2790836   -10.24   0.000    -3.404836   -2.310848
            _Isector_5 |  -2.182294   .2525897    -8.64   0.000     -2.67736   -1.687227
            _Isector_6 |   6.560848   .5925776    11.07   0.000     5.399417    7.722279
            _Isector_7 |  -1.282726   .2554231    -5.02   0.000    -1.783346   -.7821057
            _Isector_8 |  -3.006791   .2545469   -11.81   0.000    -3.505694   -2.507888
            _Isector_9 |  -4.306507   .3026671   -14.23   0.000    -4.899724   -3.713291
           _Isector_10 |  -3.447117   .2605991   -13.23   0.000    -3.957881   -2.936352
           _Isector_11 |  -2.244465   .2529246    -8.87   0.000    -2.740188   -1.748742
           _Isector_12 |  -3.815585   .2861533   -13.33   0.000    -4.376435   -3.254734
           _Isector_13 |  -2.842299   .2566227   -11.08   0.000     -3.34527   -2.339328
           _Isector_14 |   -.619659   .2541948    -2.44   0.015    -1.117872   -.1214464
           _Isector_15 |   -4.45899   .4507773    -9.89   0.000    -5.342497   -3.575483
           _Isector_16 |  -2.858558   .2557288   -11.18   0.000    -3.359777   -2.357339
           _Isector_17 |  -2.627071    .382199    -6.87   0.000    -3.376167   -1.877975
           _Isector_18 |  -2.614191   .2535559   -10.31   0.000    -3.111151    -2.11723
           _Isector_19 |   .0892393   .2526777     0.35   0.724    -.4059999    .5844785
           _Isector_20 |  -2.670938   .2541767   -10.51   0.000    -3.169115   -2.172761
                 _cons |    2.38135   .1858658    12.81   0.000      2.01706     2.74564
          -------------+----------------------------------------------------------------
               sigma_u |          0
               sigma_e |  .71430351
                   rho |          0   (fraction of variance due to u_i)
          --------------------------------------------------------------------------
          Code:
          . xi: xtreg rca_v rca_c i.sector, fe
          i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)
          note: _Isector_2 omitted because of collinearity
          note: _Isector_3 omitted because of collinearity
          note: _Isector_4 omitted because of collinearity
          note: _Isector_5 omitted because of collinearity
          note: _Isector_6 omitted because of collinearity
          note: _Isector_7 omitted because of collinearity
          note: _Isector_8 omitted because of collinearity
          note: _Isector_9 omitted because of collinearity
          note: _Isector_10 omitted because of collinearity
          note: _Isector_11 omitted because of collinearity
          note: _Isector_12 omitted because of collinearity
          note: _Isector_13 omitted because of collinearity
          note: _Isector_14 omitted because of collinearity
          note: _Isector_15 omitted because of collinearity
          note: _Isector_16 omitted because of collinearity
          note: _Isector_17 omitted because of collinearity
          note: _Isector_18 omitted because of collinearity
          note: _Isector_19 omitted because of collinearity
          note: _Isector_20 omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs     =        320
          Group variable: id                              Number of groups  =         20
          
          R-sq:                                           Obs per group:
               within  = 0.3229                                         min =         16
               between = 0.5804                                         avg =       16.0
               overall = 0.5615                                         max =         16
          
                                                          F(1,299)          =     142.61
          corr(u_i, Xb)  = 0.3410                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                 rca_v |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 rca_c |   1.522164   .1274647    11.94   0.000     1.271323    1.773006
            _Isector_2 |          0  (omitted)
            _Isector_3 |          0  (omitted)
            _Isector_4 |          0  (omitted)
            _Isector_5 |          0  (omitted)
            _Isector_6 |          0  (omitted)
            _Isector_7 |          0  (omitted)
            _Isector_8 |          0  (omitted)
            _Isector_9 |          0  (omitted)
           _Isector_10 |          0  (omitted)
           _Isector_11 |          0  (omitted)
           _Isector_12 |          0  (omitted)
           _Isector_13 |          0  (omitted)
           _Isector_14 |          0  (omitted)
           _Isector_15 |          0  (omitted)
           _Isector_16 |          0  (omitted)
           _Isector_17 |          0  (omitted)
           _Isector_18 |          0  (omitted)
           _Isector_19 |          0  (omitted)
           _Isector_20 |          0  (omitted)
                 _cons |   .2824759   .1487423     1.90   0.059    -.0102384    .5751903
          -------------+----------------------------------------------------------------
               sigma_u |  2.4015857
               sigma_e |  .71430351
                   rho |  .91872535   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(19, 299) = 159.83                   Prob > F = 0.0000
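All three posted outputs (robust -regress-, -xtreg, re- and -xtreg, fe-) report 1.522164 for rca_c. A minimal sketch on simulated data, assuming as the outputs suggest that each panel unit id is exactly one sector (data and the name year are assumptions): pooled OLS with a full set of sector dummies (the least-squares dummy-variable, or LSDV, regression) reproduces the within estimator's coefficient and conventional standard error for rca_c. Since the posted -re- fit with i.sector estimates sigma_u = 0, it reduces to that same pooled regression, which is consistent with the near-zero fe-re difference (8.88e-16) in the Hausman output.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)
xtset id year

* LSDV: pooled OLS with one dummy per sector (here, one per panel unit)
regress rca_v rca_c i.sector
scalar b_lsdv  = _b[rca_c]
scalar se_lsdv = _se[rca_c]

* Within (fixed-effects) estimator on the same data
xtreg rca_v rca_c, fe
scalar b_fe  = _b[rca_c]
scalar se_fe = _se[rca_c]

* Same point estimate and same conventional standard error
display "LSDV: " scalar(b_lsdv) " (" scalar(se_lsdv) ")   FE: " scalar(b_fe) " (" scalar(se_fe) ")"
```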



          • #6
            Alexia:
            thanks for providing Stata stuff.
            My educated guess is that your models are probably overfitted: a between R-sq = 1 for -re- is particularly striking in this respect.
            I would start all over again, trying to collect more predictors.
            That said, the main issue is that you seemingly have a T>N panel dataset (i.e., the cross-sectional dimension is smaller than the time-series dimension): if my diagnosis is correct, you should switch to Stata commands suited for long panel datasets, such as -xtgls-.
            Kind regards,
            Carlo
            (Stata 18.0 SE)
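A sketch of how to check the panel dimensions and, only if T really exceeded N, fit a long-panel model with -xtgls-. The simulated data, the name year and the chosen error structure (panel-specific variances with a common AR(1) coefficient) are assumptions for illustration, not the only options -xtgls- offers.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)
xtset id year

* Compare the number of panel units (n) with the number of periods (T)
xtdescribe

* Feasible GLS with panel-specific error variances and a common AR(1) process
xtgls rca_v rca_c i.sector, panels(heteroskedastic) corr(ar1)
```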



            • #7
              Thank you, Carlo, I am sure it is so. I used this model because it is particularly difficult to retrieve data for Vietnam (for example, total factor productivity per sector/group of products), but I realize that these data alone are not enough for a well-fitted model. I will try to collect more predictors.
              My T is 15 (years) and my N is 20 (product groups)... Thank you very much for your help; it was really needed and appreciated. I hope you have a good day!

              Kind regards,

              Alexia



              • #8
                Alexia:
                with N still slightly larger than T, you can stick with -xtreg-.
                Perhaps you should consider using clustered standard errors to take autocorrelation into account.
                As you noticed, the issue of a scant number of predictors is the main problem here.
                Kind regards,
                Carlo
                (Stata 18.0 SE)
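A minimal sketch of cluster-robust standard errors by product group on simulated data (the data and the name year are assumptions). Clustering on id allows the residuals of the same unit to be correlated across years; with only 20 clusters, though, the cluster-robust standard errors are themselves fairly imprecise and are best read with some caution.

```stata
* Hypothetical simulated data that only mimics the thread's layout:
* 20 panel units (id), one sector each, 16 years each = 320 observations
clear
set seed 2024
set obs 20
generate id     = _n
generate sector = _n                    // one sector per panel unit
generate u      = rnormal(0, 2)         // made-up sector-specific level
expand 16
bysort id: generate year = 1999 + _n    // assumed name for the time variable
generate rca_c = u/2 + rnormal()        // made-up Chinese RCA
generate rca_v = 2 + 1.5*rca_c + u + rnormal(0, .7)
xtset id year

* Fixed effects with standard errors clustered on the panel unit
xtreg rca_v rca_c, fe vce(cluster id)
```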
