  • Multicollinearity and heteroskedasticity issue in small panel data

    Hi. I am currently working on my dissertation project. I have data on 29 provinces over 11 years, with 6 independent variables (none of them categorical). Initially, I found that my fixed-effects model has heteroskedasticity and autocorrelation issues, so I tried the robust option. The VIFs after xtreg with robust are really high. However, after going through some discussions in the forum, I found out that with only 29 provinces I should not use the robust option, and that I should not use the Hausman test if I use robust. I also read some advice to perform linktest by hand, but I still don't understand how it works. I took logs (ln) of several variables because their histograms showed that the distributions are not normal. I am also not sure whether I should add i.year to my FE syntax. Is there any advice for my problem?

    Code:
    . egen prov_id = group (province)
    
    . xtset prov_id year
    
    Panel variable: prov_id (strongly balanced)
     Time variable: year, 2010 to 2020
             Delta: 1 unit
    Code:
    . * data summary *
    . xtsum pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    pop65    overall |  5.286301   1.833573       1.54       10.5 |     N =     319
             between |              1.80598   1.931818   10.42727 |     n =      29
             within  |             .4505427   3.736301   7.326301 |     T =      11
                     |                                            |
    lntfr    overall |  .9595275   .1336687   .6678294   1.302913 |     N =     319
             between |             .1239804   .7653753   1.233511 |     n =      29
             within  |             .0545853   .7779375   1.109783 |     T =      11
                     |                                            |
    lngdpcap overall |  10.30609   .5308984   9.139573   12.07147 |     N =     319
             between |             .5233785   9.312996   11.86483 |     n =      29
             within  |             .1286158   9.908724   10.74314 |     T =      11
                     |                                            |
    lifeexp  overall |  69.42608   2.436952      63.82      74.99 |     N =     319
             between |             2.425443   65.12727       74.6 |     n =      29
             within  |             .4908727   68.05699    70.8479 |     T =      11
                     |                                            |
    lnpopu~n overall |  3.662959   .3840131   2.827314    4.60517 |     N =     319
             between |             .3872141   2.923856    4.60517 |     n =      29
             within  |              .047395   3.434371   3.832794 |     T =      11
                     |                                            |
    meanye~u overall |  8.412665   .8615742       6.53       10.7 |     N =     319
             between |             .7618745   7.222727   10.56364 |     n =      29
             within  |             .4243944   7.262665   9.802665 |     T =      11
                     |                                            |
    fpaide~d overall |  55.74514   6.781141       40.2       80.3 |     N =     319
             between |             6.316996   42.76364   76.41818 |     n =      29
             within  |             2.708189    47.3815    65.3815 |     T =      11
    This is what I have done so far:
    1. I used the corr command to see whether any correlation is higher than 0.75. The result suggested no high correlation.
    Code:
    corr pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    Code:
    . corr pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    (obs=319)
    
                 |    pop65    lntfr lngdpcap  lifeexp lnpopu~n meanye~u fpaide~d
    -------------+---------------------------------------------------------------
           pop65 |   1.0000
           lntfr |  -0.5376   1.0000
        lngdpcap |  -0.3408  -0.2165   1.0000
         lifeexp |   0.4290  -0.6272   0.4329   1.0000
      lnpopurban |   0.2166  -0.5700   0.5091   0.6036   1.0000
    meanyearsedu |   0.0143  -0.2484   0.5286   0.3607   0.4317   1.0000
    fpaidemplo~d |   0.2707  -0.1676  -0.0185   0.1182   0.0852  -0.0450   1.0000
    2. I used reg and then vif to check for multicollinearity. All VIFs are below 10.
    Code:
    reg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    vif
    Code:
    . reg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    
          Source |       SS           df       MS      Number of obs   =       319
    -------------+----------------------------------   F(6, 312)       =     96.22
           Model |  694.035273         6  115.672546   Prob > F        =    0.0000
        Residual |  375.077162       312  1.20217039   R-squared       =    0.6492
    -------------+----------------------------------   Adj R-squared   =    0.6424
           Total |  1069.11244       318  3.36198879   Root MSE        =    1.0964
    
    -------------------------------------------------------------------------------
            pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    --------------+----------------------------------------------------------------
            lntfr |  -4.596609   .6382869    -7.20   0.000      -5.8525   -3.340718
         lngdpcap |  -2.370571   .1511881   -15.68   0.000    -2.668048   -2.073094
          lifeexp |   .3157796   .0363818     8.68   0.000      .244195    .3873642
       lnpopurban |   .2589153   .2316014     1.12   0.264    -.1967828    .7146134
     meanyearsedu |    .268064   .0867419     3.09   0.002     .0973909    .4387371
    fpaidemployed |   .0414489    .009249     4.48   0.000     .0232507    .0596471
            _cons |   6.690743   2.726136     2.45   0.015     1.326807    12.05468
    -------------------------------------------------------------------------------
    
    . vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
      lnpopurban |      2.09    0.477930
         lifeexp |      2.08    0.480925
           lntfr |      1.93    0.519335
        lngdpcap |      1.70    0.586788
    meanyearsedu |      1.48    0.676855
    fpaidemplo~d |      1.04    0.961055
    -------------+----------------------
        Mean VIF |      1.72
    3. I used fixed effects without the robust option. (Before I checked for heteroskedasticity, I was quite happy with the result, since I had 2 significant variables.)
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    Code:
    . xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    
    Fixed-effects (within) regression               Number of obs     =        319
    Group variable: prov_id                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.3317                                         min =         11
         Between = 0.1525                                         avg =       11.0
         Overall = 0.1620                                         max =         11
    
                                                    F(6, 284)         =      23.49
    corr(u_i, Xb) = -0.1498                         Prob > F          =     0.0000
    
    -------------------------------------------------------------------------------
            pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    --------------+----------------------------------------------------------------
            lntfr |  -2.311204   .6861332    -3.37   0.001    -3.661756    -.960652
         lngdpcap |   .3397021   .3815607     0.89   0.374    -.4113438    1.090748
          lifeexp |   .3442395   .1246646     2.76   0.006     .0988558    .5896233
       lnpopurban |  -.3996937   .5352642    -0.75   0.456    -1.453282    .6538948
     meanyearsedu |  -.1877593   .1341237    -1.40   0.163    -.4517621    .0762434
    fpaidemployed |  -.0175182   .0089955    -1.95   0.052    -.0352244    .0001881
            _cons |  -15.87607   7.224003    -2.20   0.029    -30.09545   -1.656684
    --------------+----------------------------------------------------------------
          sigma_u |  1.6829894
          sigma_e |  .38975102
              rho |  .94909934   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------
    F test that all u_i=0: F(28, 284) = 78.04                    Prob > F = 0.0000
    4. I used random effects without the robust option
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, re
    Code:
     xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, re
    
    Random-effects GLS regression                   Number of obs     =        319
    Group variable: prov_id                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.3127                                         min =         11
         Between = 0.4259                                         avg =       11.0
         Overall = 0.4189                                         max =         11
    
                                                    Wald chi2(6)      =     152.56
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
    -------------------------------------------------------------------------------
            pop65 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
            lntfr |  -3.135798   .6510992    -4.82   0.000    -4.411929   -1.859667
         lngdpcap |   -.649408    .302663    -2.15   0.032    -1.242617   -.0561994
          lifeexp |   .3743336   .0849561     4.41   0.000     .2078228    .5408444
       lnpopurban |  -.0605336   .4189387    -0.14   0.885    -.8816384    .7605712
     meanyearsedu |  -.0495519   .1127146    -0.44   0.660    -.2704684    .1713646
    fpaidemployed |  -.0099397   .0086585    -1.15   0.251    -.0269101    .0070307
            _cons |  -9.807787   5.293423    -1.85   0.064    -20.18271    .5671316
    --------------+----------------------------------------------------------------
          sigma_u |  1.1319697
          sigma_e |  .38975102
              rho |  .89401384   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------
    5. Hausman test (result indicated fixed effect is more appropriate)
    Code:
    quietly xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    estimates store fixed
    
    quietly xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, re
    estimates store random
    
    hausman fixed random, sigmamore
    Code:
    . hausman fixed random, sigmamore
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |     fixed        random       Difference       Std. err.
    -------------+----------------------------------------------------------------
           lntfr |   -2.311204    -3.135798        .8245942        .2795573
        lngdpcap |    .3397021     -.649408        .9891101         .252318
         lifeexp |    .3442395     .3743336        -.030094        .0967322
      lnpopurban |   -.3996937    -.0605336       -.3391601         .360623
    meanyearsedu |   -.1877593    -.0495519       -.1382075        .0805032
    fpaidemplo~d |   -.0175182    -.0099397       -.0075785        .0033659
    ------------------------------------------------------------------------------
                              b = Consistent under H0 and Ha; obtained from xtreg.
               B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
    
    Test of H0: Difference in coefficients not systematic
    
        chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =  25.08
    Prob > chi2 = 0.0003
    6. Heteroskedasticity
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    xttest3
    Code:
    . xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (29)  =   73302.94
    Prob>chi2 =      0.0000
    7. Autocorrelation
    Code:
    xtserial pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    Code:
    . xtserial pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed
    
    Wooldridge test for autocorrelation in panel data
    H0: no first-order autocorrelation
        F(  1,      28) =    638.418
               Prob > F =      0.0000
    8. I used estat vce, corr to diagnose multicollinearity
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe 
    estat vce, corr
    Code:
    . estat vce, corr
    
    Correlation matrix of coefficients of xtreg model
    
            e(V) |    lntfr  lngdpcap   lifeexp  lnpopu~n  meanye~u  fpaide~d     _cons 
    -------------+---------------------------------------------------------------------
           lntfr |   1.0000                                                             
        lngdpcap |   0.2507    1.0000                                                   
         lifeexp |   0.1683   -0.3903    1.0000                                         
      lnpopurban |  -0.2512   -0.3562   -0.0661    1.0000                               
    meanyearsedu |   0.1599   -0.2257   -0.6059    0.2122    1.0000                     
    fpaidemplo~d |  -0.0733   -0.2164    0.0498    0.3418    0.1803    1.0000           
           _cons |  -0.3810    0.0473   -0.8918   -0.0323    0.6079   -0.1255    1.0000
    9. I also checked the VIF after xtreg
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    vif, uncentered
    Code:
    . vif, uncentered
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
        lngdpcap |    633.03    0.001580
         lifeexp |    586.08    0.001706
      lnpopurban |    192.76    0.005188
    meanyearsedu |    142.67    0.007009
    fpaidemplo~d |     68.55    0.014588
           lntfr |     56.15    0.017811
    -------------+----------------------
        Mean VIF |    279.87
    10. I also used xtoverid command
    Code:
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, fe
    xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed, re
    xtoverid
    Code:
    xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re   
    Sargan-Hansen statistic  26.746  Chi-sq(6)    P-value = 0.0002
    Additionally, I performed a joint F-test, and the result indicated that fixed effects are indeed needed. The LM test indicated that random effects are needed as well. So I figured I should go with fixed effects.

    I apologize for the long question. In summary, my problems are heteroskedasticity, autocorrelation, a possible indication of multicollinearity (although I am not sure, since the initial corr result doesn't show any high correlation between the independent variables), and insignificant variables. I have tried many combinations of variables and the results are the same: insignificant variables, heteroskedasticity, and so on.

    Thank you so much for your time and assistance.

  • #2
    Some additional information:

    After I added the robust option, no variable is significant in the FE robust model and only 1 variable is significant in the RE robust model. Both models have high VIFs.


    • #3
      Use reghdfe. Absorb province and year, and use clustered errors on province. With 29 clusters, you're probably safe. You can boottest the coefficients/stats of interest just to make sure. The typical rule of thumb is 30, and you're right at it.

      You don't need to worry about autocorr or hetero if you use clustered errors--it corrects for both.

      I wouldn't worry about VIFs unless you're getting low t's and a high R2. The ones reported above are not large, but you should include i.province i.year in that regression.

      I'd include year fixed effects in all models, otherwise your estimates have time embedded in them.
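
      A minimal sketch of the checks above, assuming the variable names from the first post and that the panel is already xtset. It uses xtreg, fe with explicit year dummies in place of reghdfe, which should give the same coefficient estimates. boottest is community-contributed (ssc install boottest); the seed and the number of replications below are arbitrary illustrative values, and help boottest lists which estimators it supports.

      ```stata
      * VIFs with province and year dummies included explicitly
      regress pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed i.prov_id i.year
      estat vif

      * Two-way fixed effects, standard errors clustered on province
      xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed i.year, fe vce(cluster prov_id)

      * Wild cluster bootstrap test of one coefficient of interest
      * (reps and seed are arbitrary choices for illustration)
      boottest lntfr, reps(9999) seed(12345) nograph
      ```

      If the bootstrap p-value is close to the one in the xtreg table, 29 clusters are probably enough for the usual cluster-robust inference here.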


      • #4
        Thank you George.

        I tried this syntax. However, the F-test showed that the model is not okay.

        Code:
        reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(prov_id year) cluster(prov_id)
        Code:
        . reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(prov_id year) cluster(prov_id)
        (MWFE estimator converged in 2 iterations)
        
        HDFE Linear regression                            Number of obs   =        319
        Absorbing 2 HDFE groups                           F(   6,     28) =       1.74
        Statistics robust to heteroskedasticity           Prob > F        =     0.1475
                                                          R-squared       =     0.9693
                                                          Adj R-squared   =     0.9644
                                                          Within R-sq.    =     0.1925
        Number of clusters (prov_id) =         29         Root MSE        =     0.3461
        
                                        (Std. err. adjusted for 29 clusters in prov_id)
        -------------------------------------------------------------------------------
                      |               Robust
                pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        --------------+----------------------------------------------------------------
             lngdpcap |  -.6483645   .6460922    -1.00   0.324    -1.971824    .6750953
                lntfr |   -2.61743   1.995649    -1.31   0.200    -6.705331    1.470471
              lifeexp |  -.4342002   .3650981    -1.19   0.244     -1.18207    .3136694
         meanyearsedu |  -.8228862   .4402884    -1.87   0.072    -1.724776    .0790038
        fpaidemployed |  -.0333591   .0188987    -1.77   0.088    -.0720714    .0053532
           lnpopurban |  -.1921305   1.341664    -0.14   0.887    -2.940405    2.556144
                _cons |   54.11075   26.29352     2.06   0.049     .2509161    107.9706
        -------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
             prov_id |        29          29           0    *|
                year |        11           1          10     |
        -----------------------------------------------------+
        * = FE nested within cluster; treated as redundant for DoF computation


        • #5
          I tried absorbing year only and clustering the errors on prov_id. The F-test result is okay. However, the constant (_cons) is not significant.
          Code:
          reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(year) cluster(prov_id)
          Code:
          . reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(year) cluster(prov_id)
          (MWFE estimator converged in 1 iterations)
          
          HDFE Linear regression                            Number of obs   =        319
          Absorbing 1 HDFE group                            F(   6,     28) =      17.57
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.6568
                                                            Adj R-squared   =     0.6386
                                                            Within R-sq.    =     0.6490
          Number of clusters (prov_id) =         29         Root MSE        =     1.1022
          
                                          (Std. err. adjusted for 29 clusters in prov_id)
          -------------------------------------------------------------------------------
                        |               Robust
                  pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          --------------+----------------------------------------------------------------
               lngdpcap |  -2.417531   .3793451    -6.37   0.000    -3.194585   -1.640478
                  lntfr |  -3.860103    2.29263    -1.68   0.103    -8.556344    .8361372
                lifeexp |   .3183152     .11596     2.75   0.010     .0807819    .5558486
           meanyearsedu |   .1511429   .3255904     0.46   0.646    -.5157989    .8180847
          fpaidemployed |   .0440367   .0288885     1.52   0.139    -.0151388    .1032122
             lnpopurban |   .5323898   .7956578     0.67   0.509    -1.097441    2.162221
                  _cons |   6.129618   8.466332     0.72   0.475    -11.21288    23.47211
          -------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                  year |        11           0          11     |
          -----------------------------------------------------+
          
          .


          • #6
            It’s not appropriate to use the method that gives you the results you want. Most of the tests you’ve done are unnecessary. This problem points to fixed effects estimation with clustered standard errors. But you should also include year fixed effects.
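
            In Stata this could look roughly like the sketch below, which assumes the variable names from the first post and a panel already declared with xtset prov_id year. The testparm line is an optional addition that checks whether the year effects matter jointly.

            ```stata
            * Province fixed effects via -fe-, year fixed effects via i.year,
            * standard errors clustered on province
            xtreg pop65 lntfr lngdpcap lifeexp lnpopurban meanyearsedu fpaidemployed i.year, fe vce(cluster prov_id)

            * Joint test that all year dummies are zero
            testparm i.year
            ```

            Clustering on the panel variable handles both heteroskedasticity and within-province serial correlation, so no separate tests for those are needed before running this.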


            • #7
              Thank you Jeff.
              Is this the right syntax?
              Code:
              xtreg pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed i.year, fe cluster(prov_id)
              I discussed this matter with one of my lecturers. He thought I might have a multicollinearity problem and suggested doing forward stepwise selection and transforming variables.
              I am a bit lost here. Does this mean I have to run the fixed-effects model with year fixed effects and clustered standard errors for each variable individually? When I did, none of the variables was significant, so I'm not sure which variable I should start building the model with.


              • #8
                Niara:
                what if you go:
                Code:
                 
                 reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(prov_id year) cluster(prov_id)
                As far as stepwise regression is concerned, please see Stata | FAQ: Problems with stepwise regression
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9

                  This is the proper form:
                  reghdfe pop65 lngdpcap lntfr lifeexp meanyearsedu fpaidemployed lnpopurban, absorb(prov_id year) cluster(prov_id)
                  If the F is insignificant, then the model doesn't explain pop65 very well. The constant term is irrelevant in FE regression.


                  • #10
                    Niara:
                    shamelessly exploiting George's elegant assist, you may want to take a look at FAQ: Interpreting the intercept in the fixed-effects model | Stata
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Thank you everyone for your input.

                      I have tried the syntax, but unfortunately the F is insignificant. So I tried transforming everything to logs, and then the F is significant. I suppose I don't have to run the RE model and the Hausman test then.
                      Do you think it's good if I go with this log-log model instead?
                      There is still only 1 significant variable. However, I think I can still discuss the direction of the coefficients instead of focusing on p-values? Nonetheless, according to the literature, these variables should be significant.
                      And with that, is this model already immune to heteroskedasticity and autocorrelation? Should I check for multicollinearity?


                      Code:
                      reghdfe pop65 lntfr lifeexp lngdpcap meanyearsedu fpaidemployed, absorb(prov_id year) cluster(prov_id)
                      Code:
                      . reghdfe pop65 lntfr lifeexp lngdpcap meanyearsedu fpaidemployed, absorb(prov_id year) cluster(prov_id)
                      (MWFE estimator converged in 2 iterations)
                      
                      HDFE Linear regression                            Number of obs   =        319
                      Absorbing 2 HDFE groups                           F(   5,     28) =       1.98
                      Statistics robust to heteroskedasticity           Prob > F        =     0.1120
                                                                        R-squared       =     0.9693
                                                                        Adj R-squared   =     0.9645
                                                                        Within R-sq.    =     0.1921
                      Number of clusters (prov_id) =         29         Root MSE        =     0.3455
                      
                                                      (Std. err. adjusted for 29 clusters in prov_id)
                      -------------------------------------------------------------------------------
                                    |               Robust
                              pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      --------------+----------------------------------------------------------------
                              lntfr |  -2.669191   2.027476    -1.32   0.199    -6.822288    1.483906
                            lifeexp |  -.4349448   .3626482    -1.20   0.240    -1.177796    .3079063
                           lngdpcap |  -.6968168   .6382243    -1.09   0.284     -2.00416    .6105264
                       meanyearsedu |  -.8141222   .4619964    -1.76   0.089    -1.760479    .1322346
                      fpaidemployed |  -.0322921    .019881    -1.62   0.116    -.0730166    .0084323
                              _cons |    53.8745   26.05559     2.07   0.048     .5020492    107.2469
                      -------------------------------------------------------------------------------
                      
                      Absorbed degrees of freedom:
                      -----------------------------------------------------+
                       Absorbed FE | Categories  - Redundant  = Num. Coefs |
                      -------------+---------------------------------------|
                           prov_id |        29          29           0    *|
                              year |        11           1          10     |
                      -----------------------------------------------------+
                      * = FE nested within cluster; treated as redundant for DoF computation
                      
                      . 
                      end of do-file
                      
                      .

                      Code:
                      reghdfe ln_pop65 lngdpcap ln_tfr ln_lifeexp ln_meanyearsedu ln_fpaidemployed, absorb(prov_id year) cluster(prov_id)
                      Code:
                      . reghdfe ln_pop65 lngdpcap ln_tfr ln_lifeexp ln_meanyearsedu ln_fpaidemployed, absorb(prov_id year) cluster(prov_id)
                      (MWFE estimator converged in 2 iterations)
                      
                      HDFE Linear regression                            Number of obs   =        319
                      Absorbing 2 HDFE groups                           F(   5,     28) =       3.20
                      Statistics robust to heteroskedasticity           Prob > F        =     0.0208
                                                                        R-squared       =     0.9710
                                                                        Adj R-squared   =     0.9665
                                                                        Within R-sq.    =     0.1587
                      Number of clusters (prov_id) =         29         Root MSE        =     0.0632
                      
                                                         (Std. err. adjusted for 29 clusters in prov_id)
                      ----------------------------------------------------------------------------------
                                       |               Robust
                              ln_pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      -----------------+----------------------------------------------------------------
                              lngdpcap |  -.3108015   .1231624    -2.52   0.018    -.5630883   -.0585147
                                ln_tfr |  -.3775037   .3353599    -1.13   0.270    -1.064457    .3094499
                            ln_lifeexp |  -5.121433   4.587409    -1.12   0.274    -14.51832    4.275449
                       ln_meanyearsedu |  -.4852401   .8291874    -0.59   0.563    -2.183754    1.213273
                      ln_fpaidemployed |  -.2065409   .1978102    -1.04   0.305    -.6117367     .198655
                                 _cons |   28.74542    18.7995     1.53   0.137    -9.763608    67.25445
                      ----------------------------------------------------------------------------------
                      
                      Absorbed degrees of freedom:
                      -----------------------------------------------------+
                       Absorbed FE | Categories  - Redundant  = Num. Coefs |
                      -------------+---------------------------------------|
                           prov_id |        29          29           0    *|
                              year |        11           1          10     |
                      -----------------------------------------------------+
                      * = FE nested within cluster; treated as redundant for DoF computation


                      • #12
                        Please define these variables for us.


                        • #13
                          Niara:
                          you're striving to reach statistical significance, but this is not scientific: results are what they are.
                          I suspect that you have an overfitting problem that you should deal with (see https://stats.stackexchange.com/ques...t-coefficients).
                          Kind regards,
                          Carlo
                          (Stata 19.0)


                          • #14
                            Thank you for your responses.

                            The variables I use:
                            1. pop65: percentage of the population aged 65 and over
                            2. lntfr: ln of the total fertility rate
                            3. lngdpcap: ln of GDP per capita
                            4. meanyearsedu: mean years of education of adults aged 20 and over
                            5. fpaidemployed: percentage of women in paid employment

                            I tried running reghdfe separately for each variable.

                            Code:
                            . reghdfe pop65 lntfr, absorb(prov_id year) cluster(prov_id)
                            (MWFE estimator converged in 2 iterations)
                            
                            HDFE Linear regression                            Number of obs   =        319
                            Absorbing 2 HDFE groups                           F(   1,     28) =       0.29
                            Statistics robust to heteroskedasticity           Prob > F        =     0.5914
                                                                              R-squared       =     0.9625
                                                                              Adj R-squared   =     0.9573
                                                                              Within R-sq.    =     0.0140
                            Number of clusters (prov_id) =         29         Root MSE        =     0.3790
                            
                                                           (Std. err. adjusted for 29 clusters in prov_id)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                   pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                   lntfr |  -1.209807   2.227722    -0.54   0.591    -5.773088    3.353474
                                   _cons |   6.447144    2.13756     3.02   0.005     2.068551    10.82574
                            ------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            -----------------------------------------------------+
                             Absorbed FE | Categories  - Redundant  = Num. Coefs |
                            -------------+---------------------------------------|
                                 prov_id |        29          29           0    *|
                                    year |        11           1          10     |
                            -----------------------------------------------------+
                            * = FE nested within cluster; treated as redundant for DoF computation
                            
                            . reghdfe pop65 lifeexp, absorb(prov_id year) cluster(prov_id)
                            (MWFE estimator converged in 2 iterations)
                            
                            HDFE Linear regression                            Number of obs   =        319
                            Absorbing 2 HDFE groups                           F(   1,     28) =       1.48
                            Statistics robust to heteroskedasticity           Prob > F        =     0.2342
                                                                              R-squared       =     0.9629
                                                                              Adj R-squared   =     0.9577
                                                                              Within R-sq.    =     0.0228
                            Number of clusters (prov_id) =         29         Root MSE        =     0.3773
                            
                                                           (Std. err. adjusted for 29 clusters in prov_id)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                   pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                 lifeexp |  -.3858727   .3173604    -1.22   0.234    -1.035956    .2642105
                                   _cons |   32.07593   22.03309     1.46   0.157     -13.0568    77.20867
                            ------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            -----------------------------------------------------+
                             Absorbed FE | Categories  - Redundant  = Num. Coefs |
                            -------------+---------------------------------------|
                                 prov_id |        29          29           0    *|
                                    year |        11           1          10     |
                            -----------------------------------------------------+
                            * = FE nested within cluster; treated as redundant for DoF computation
                            
                            . reghdfe pop65 lngdpcap, absorb(prov_id year) cluster(prov_id)
                            (MWFE estimator converged in 2 iterations)
                            
                            HDFE Linear regression                            Number of obs   =        319
                            Absorbing 2 HDFE groups                           F(   1,     28) =       1.45
                            Statistics robust to heteroskedasticity           Prob > F        =     0.2391
                                                                              R-squared       =     0.9624
                                                                              Adj R-squared   =     0.9572
                                                                              Within R-sq.    =     0.0111
                            Number of clusters (prov_id) =         29         Root MSE        =     0.3795
                            
                                                           (Std. err. adjusted for 29 clusters in prov_id)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                   pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                lngdpcap |   -.644752   .5359797    -1.20   0.239    -1.742657    .4531527
                                   _cons |   11.93117   5.523854     2.16   0.039     .6160695    23.24627
                            ------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            -----------------------------------------------------+
                             Absorbed FE | Categories  - Redundant  = Num. Coefs |
                            -------------+---------------------------------------|
                                 prov_id |        29          29           0    *|
                                    year |        11           1          10     |
                            -----------------------------------------------------+
                            * = FE nested within cluster; treated as redundant for DoF computation
                            
                            . reghdfe pop65 meanyearsedu, absorb(prov_id year) cluster(prov_id)
                            (MWFE estimator converged in 2 iterations)
                            
                            HDFE Linear regression                            Number of obs   =        319
                            Absorbing 2 HDFE groups                           F(   1,     28) =       2.13
                            Statistics robust to heteroskedasticity           Prob > F        =     0.1554
                                                                              R-squared       =     0.9647
                                                                              Adj R-squared   =     0.9598
                                                                              Within R-sq.    =     0.0719
                            Number of clusters (prov_id) =         29         Root MSE        =     0.3677
                            
                                                           (Std. err. adjusted for 29 clusters in prov_id)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                   pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                            meanyearsedu |  -.6262353   .4289504    -1.46   0.155      -1.5049    .2524298
                                   _cons |   10.55461   3.608616     2.92   0.007     3.162694    17.94652
                            ------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            -----------------------------------------------------+
                             Absorbed FE | Categories  - Redundant  = Num. Coefs |
                            -------------+---------------------------------------|
                                 prov_id |        29          29           0    *|
                                    year |        11           1          10     |
                            -----------------------------------------------------+
                            * = FE nested within cluster; treated as redundant for DoF computation
                            
                            . reghdfe pop65 fpaidemployed, absorb(prov_id year) cluster(prov_id)
                            (MWFE estimator converged in 2 iterations)
                            
                            HDFE Linear regression                            Number of obs   =        319
                            Absorbing 2 HDFE groups                           F(   1,     28) =       0.61
                            Statistics robust to heteroskedasticity           Prob > F        =     0.4426
                                                                              R-squared       =     0.9626
                                                                              Adj R-squared   =     0.9574
                                                                              Within R-sq.    =     0.0159
                            Number of clusters (prov_id) =         29         Root MSE        =     0.3786
                            
                                                            (Std. err. adjusted for 29 clusters in prov_id)
                            -------------------------------------------------------------------------------
                                          |               Robust
                                    pop65 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            --------------+----------------------------------------------------------------
                            fpaidemployed |  -.0170178   .0218494    -0.78   0.443    -.0617742    .0277386
                                    _cons |    6.23496   1.217997     5.12   0.000     3.740007    8.729913
                            -------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            -----------------------------------------------------+
                             Absorbed FE | Categories  - Redundant  = Num. Coefs |
                            -------------+---------------------------------------|
                                 prov_id |        29          29           0    *|
                                    year |        11           1          10     |
                            -----------------------------------------------------+
                            * = FE nested within cluster; treated as redundant for DoF computation

                            I also ran a simple linear regression for each variable. Based on these results, the only insignificant variable is meanyearsedu.
                            Code:
                            . regress pop65 lntfr
                            
                                  Source |       SS           df       MS      Number of obs   =       319
                            -------------+----------------------------------   F(1, 317)       =    128.88
                                   Model |  309.014605         1  309.014605   Prob > F        =    0.0000
                                Residual |   760.09783       317  2.39778495   R-squared       =    0.2890
                            -------------+----------------------------------   Adj R-squared   =    0.2868
                                   Total |  1069.11244       318  3.36198879   Root MSE        =    1.5485
                            
                            ------------------------------------------------------------------------------
                                   pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                   lntfr |   -7.37473   .6496235   -11.35   0.000    -8.652849   -6.096612
                                   _cons |   12.36256   .6293321    19.64   0.000     11.12436    13.60075
                            ------------------------------------------------------------------------------
                            
                            . regress pop65 lifeexp
                            
                                  Source |       SS           df       MS      Number of obs   =       319
                            -------------+----------------------------------   F(1, 317)       =     71.51
                                   Model |  196.775225         1  196.775225   Prob > F        =    0.0000
                                Residual |   872.33721       317   2.7518524   R-squared       =    0.1841
                            -------------+----------------------------------   Adj R-squared   =    0.1815
                                   Total |  1069.11244       318  3.36198879   Root MSE        =    1.6589
                            
                            ------------------------------------------------------------------------------
                                   pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                 lifeexp |   .3227934   .0381726     8.46   0.000     .2476897    .3978971
                                   _cons |  -17.12398   2.651803    -6.46   0.000    -22.34134   -11.90662
                            ------------------------------------------------------------------------------
                            
                            . regress pop65 lngdpcap
                            
                                  Source |       SS           df       MS      Number of obs   =       319
                            -------------+----------------------------------   F(1, 317)       =     41.66
                                   Model |   124.17711         1   124.17711   Prob > F        =    0.0000
                                Residual |  944.935325       317  2.98086853   R-squared       =    0.1161
                            -------------+----------------------------------   Adj R-squared   =    0.1134
                                   Total |  1069.11244       318  3.36198879   Root MSE        =    1.7265
                            
                            ------------------------------------------------------------------------------
                                   pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                lngdpcap |  -1.177052   .1823671    -6.45   0.000    -1.535855   -.8182496
                                   _cons |   17.41711   1.881975     9.25   0.000     13.71437    21.11985
                            ------------------------------------------------------------------------------
                            
                            . regress pop65 meanyearsedu
                            
                                  Source |       SS           df       MS      Number of obs   =       319
                            -------------+----------------------------------   F(1, 317)       =      0.06
                                   Model |  .217240191         1  .217240191   Prob > F        =    0.7998
                                Residual |  1068.89519       317  3.37190913   R-squared       =    0.0002
                            -------------+----------------------------------   Adj R-squared   =   -0.0030
                                   Total |  1069.11244       318  3.36198879   Root MSE        =    1.8363
                            
                            ------------------------------------------------------------------------------
                                   pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                            meanyearsedu |   .0303364   .1195175     0.25   0.800    -.2048115    .2654842
                                   _cons |   5.031091   1.010704     4.98   0.000     3.042556    7.019626
                            ------------------------------------------------------------------------------
                            
                            . regress pop65 fpaidemployed
                            
                                  Source |       SS           df       MS      Number of obs   =       319
                            -------------+----------------------------------   F(1, 317)       =     25.07
                                   Model |  78.3430653         1  78.3430653   Prob > F        =    0.0000
                                Residual |   990.76937       317  3.12545543   R-squared       =    0.0733
                            -------------+----------------------------------   Adj R-squared   =    0.0704
                                   Total |  1069.11244       318  3.36198879   Root MSE        =    1.7679
                            
                            -------------------------------------------------------------------------------
                                    pop65 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            --------------+----------------------------------------------------------------
                            fpaidemployed |   .0731954   .0146198     5.01   0.000     .0444314    .1019595
                                    _cons |   1.206011   .8209698     1.47   0.143    -.4092267    2.821249
                            -------------------------------------------------------------------------------
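
                            One likely reason the pooled results differ so much from the two-way fixed-effects ones: regress uses both between-province and within-province variation, while the fixed-effects estimator uses only within-province variation. The xtsum output at the top of the thread shows that pop65 varies far more between provinces (std. dev. 1.81) than within them (0.45). The sketch below illustrates this on simulated data, not on the data in this thread; the variable names u, x and y and all parameter values are made up for the illustration.

                            ```stata
                            * Simulated panel: 29 units, 11 years (hypothetical data)
                            clear
                            set seed 2024
                            set obs 29
                            generate prov_id = _n
                            * u is a time-constant unit effect shared by x and y
                            generate u = rnormal(0, 2)
                            expand 11
                            bysort prov_id: generate year = 2009 + _n
                            xtset prov_id year
                            * x and y are related only through u, not within units
                            generate x = u + rnormal(0, 0.5)
                            generate y = 5 + u + rnormal(0, 0.4)
                            * Between vs. within variation
                            xtsum y x
                            * Pooled OLS picks up the between-unit association
                            regress y x
                            * Fixed effects uses only within-unit variation
                            xtreg y x, fe vce(cluster prov_id)
                            ```

                            By construction, x has no within-unit effect on y here, so any association in the pooled regression comes from differences between units.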



                            • #15
                              Niara:
                              running a separate regression for each predictor is not helpful, because you are not adjusting for the other independent variables that contribute to the data-generating process.
                              And, if I may, please free yourself from significance-addiction!
                              Kind regards,
                              Carlo
                              (Stata 19.0)
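
                              A minimal sketch of the adjustment point above, using simulated data rather than the data in this thread. The variable names a, x1, x2 and y and the coefficient values are assumptions chosen for the illustration. It uses official xtreg so that it runs without installing anything.

                              ```stata
                              * Simulated panel: 29 units, 11 years (hypothetical data)
                              clear
                              set seed 12345
                              set obs 29
                              generate prov_id = _n
                              generate a = rnormal()
                              expand 11
                              bysort prov_id: generate year = 2009 + _n
                              xtset prov_id year
                              * Two correlated predictors
                              generate x1 = a + rnormal()
                              generate x2 = 0.7*x1 + rnormal()
                              * True coefficients by construction: 0.5 on x1, -0.5 on x2
                              generate y = 1 + 0.5*x1 - 0.5*x2 + a + rnormal()
                              * One predictor at a time: each coefficient absorbs part of
                              * the effect of the omitted, correlated predictor
                              xtreg y x1 i.year, fe vce(cluster prov_id)
                              xtreg y x2 i.year, fe vce(cluster prov_id)
                              * Joint model: each coefficient is adjusted for the other
                              xtreg y x1 x2 i.year, fe vce(cluster prov_id)
                              * With the variables named in this thread, the joint model
                              * would look something like this (needs reghdfe from SSC):
                              * reghdfe pop65 lntfr lifeexp lngdpcap meanyearsedu fpaidemployed, absorb(prov_id year) cluster(prov_id)
                              ```

                              Comparing the one-at-a-time coefficients with the joint ones shows how far unadjusted estimates can drift when predictors are correlated.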
