  • Why variables are insignificant

    I am trying to identify the factors that most affect 'Access to electricity' using 24 countries over a two-year span. Even though this is a short panel data set, the article I am replicating used a similar approach.
    My results are:
    Code:
     xtreg accesstoelectricityofpopulatione loans renew gdp rents edu var24, re
    
    Random-effects GLS regression                   Number of obs      =        48
    Group variable: country                         Number of groups   =        24
    
    R-sq:  within  = 0.7520                         Obs per group: min =         2
           between = 0.0001                                        avg =       2.0
           overall = 0.0011                                        max =         2
    
                                                    Wald chi2(6)       =     58.62
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
    accesstoel~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           loans |   .0287223   .0651532     0.44   0.659    -.0989756    .1564202
           renew |  -.0032136   .1823182    -0.02   0.986    -.3605507    .3541234
             gdp |  -.0009196   .0005703    -1.61   0.107    -.0020374    .0001982
           rents |  -.0590546   .0385032    -1.53   0.125    -.1345195    .0164103
             edu |  -.0051925   .0076635    -0.68   0.498    -.0202127    .0098277
           var24 |   2.309331   .3686467     6.26   0.000     1.586796    3.031865
           _cons |   72.37429    5.85379    12.36   0.000     60.90107     83.8475
    -------------+----------------------------------------------------------------
         sigma_u |  28.521862
         sigma_e |  1.1818464
             rho |  .99828596   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    My problem is that not only are the variables insignificant, but the coefficients also do not have the expected signs. For example, GDP per capita (gdp in the model) should be positively correlated with the dependent variable. However, that is not the case here.
    Can someone please suggest a solution?
    The data look like this:
    time country access loans renew gdp rents edu
    1 1 41 3.9 0 1629 2.37 45
    2 1 43 4.3 0 1933 1.75 48
    1 2 52.2 66 0 2401 4.5 53
    2 2 59.6 85 0 2763 3.8 60

  • #2
    Hello Drau,

    Welcome to the Stata Forum. First, it seems you have too many predictors for such a small sample size. Also, var24 (not explained by you in #1) seems to carry most of the effect in the model. Since each coefficient is "adjusted" for the remaining predictors, the sign of the non-significant coefficients, IMHO, may not be a core issue under certain situations. By the way, the coefficient for gdp is almost zero, as are the remaining ones, with the exception of var24. Finally, the very high ICC caught my attention.
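
    For reference, the ICC mentioned here is the rho that -xtreg, re- reports in #1: the share of the unexplained variance attributed to the country effect u_i. A quick check from the reported sigma_u and sigma_e (intermediate figures rounded):

    ```latex
    \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
         = \frac{28.521862^2}{28.521862^2 + 1.1818464^2}
         \approx \frac{813.50}{813.50 + 1.40}
         \approx 0.9983
    ```

    In other words, almost all of the unexplained variation lies between countries rather than within them.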

    Best,

    Marcos
    Last edited by Marcos Almeida; 28 May 2016, 07:15.
    Best regards,

    Marcos


    • #3
      Hello Marcos,

      Thank you very much for the comment.
      var24 = dummytime variable.
      Well, my issue is how the authors of the paper I am replicating obtained such significant results.
      The paper I am referring to is http://www.sciencedirect.com/science...60544216301682 They, too, have few observations but many variables.
      Please help me with this.

      Thank you.


      • #4
        Drau:
        echoing Marcos' wise remarks, your results are not that surprising given the handful of observations in your data set.
        Anyway, I would offer some comments:
        - -gdp- explains no meaningful variation in the -depvar- when adjusted for the other predictors, as (possibly for the reason mentioned above) it does not reach statistical significance; the same remark would hold true even if the coefficient for -gdp- were positive but not significant;
        - your -re- specification should be carefully revised, given your very low between and overall R-sq (0.0001 and 0.0011 in #1);
        - residuals seem to play too large a role in your model;
        - I've read the working paper version of the article you quoted (available free of charge at http://dse.univr.it/workingpapers/wp2014n15.pdf) and I was surprised by the huge number of predictors given the limited sample size.
        Moreover, given the high R2 (page 35), I suspect that their model may suffer from overfitting.
        It also seems worth noting that the authors report that results from a (pooled?) OLS were not so different from the ones obtained via panel data regression (page 36); were the individual effects negligible (i.e., was the F-test at the foot of the outcome table after -xtreg, fe- non-significant)?
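
        A minimal sketch of the check mentioned in the last point, assuming the panel is declared with country as the group variable and time as the time variable, and using the variable names from #1. It is only an illustration of where to look, not a recommended final specification:

        ```stata
        * Sketch only: variable names are taken from the output in #1
        xtset country time

        * Fixed-effects fit; the F test that all u_i = 0 is printed
        * at the foot of the coefficient table
        xtreg accesstoelectricityofpopulatione loans renew gdp rents edu var24, fe

        * Pooled OLS with the same predictors, for comparison
        regress accesstoelectricityofpopulatione loans renew gdp rents edu var24
        ```

        If that F test is far from significant, pooled OLS and the panel estimators would be expected to tell a similar story; with rho as high as in #1, that seems unlikely for these data.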

        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Given that there's so little within-country variation from only two years of data, there's just a ton of noise in predicting access to electricity that you are not explaining. Noise blows up standard errors. If you want descriptive statistics on the correlations between access to electricity and other factors, gather more years of data. You should also use logged gdp or logged gdp per capita; taking logs helps reduce the effects of outliers. I'd also recommend avoiding replicating papers that are pretty weak, like the one you cited; it's a lot harder, but also more fruitful, to try to replicate ones in solid field journals at least!
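
          A rough sketch of how one might look at this, assuming the variable names from #1 and that the panel is identified by country and time. The name lgdp is just a label for the logged variable:

          ```stata
          * Sketch only: variable names follow the output in #1
          xtset country time

          * Log of GDP per capita (skip this line if lgdp already exists)
          generate double lgdp = ln(gdp)

          * Overall, between, and within standard deviations of each variable
          xtsum accesstoelectricityofpopulatione gdp lgdp loans renew rents edu
          ```

          A within standard deviation that is tiny relative to the between one means the two years add very little information beyond a single cross-section.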


          • #6
            Thank you everyone for your valuable comments.
            I chose this paper to replicate because it has some great results. It was published in the journal 'Energy'.
            http://ac.els-cdn.com/S0360544216301...72734bf5720f09


            Well, I tried different models to see what the issue could be. They are as follows.
            Pooled OLS. The results are:
            Code:
            reg accesstoelectricityofpopulatione loans renew gdp ruralpopulationoftotalpopulation rents edu dummytime
            
                  Source |       SS       df       MS              Number of obs =      48
            -------------+------------------------------           F(  7,    40) =    3.41
                   Model |  12292.3596     7  1756.05138           Prob > F      =  0.0060
                Residual |  20613.4948    40  515.337371           R-squared     =  0.3736
            -------------+------------------------------           Adj R-squared =  0.2639
                   Total |  32905.8545    47  700.124563           Root MSE      =  22.701
            
            --------------------------------------------------------------------------------------------------
            accesstoelectricityofpopulatione |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            ---------------------------------+----------------------------------------------------------------
                                       loans |  -.2215006   .4381079    -0.51   0.616     -1.10695    .6639486
                                       renew |   .5811815   .5952781     0.98   0.335    -.6219203    1.784283
                                         gdp |  -.0038491   .0055828    -0.69   0.495    -.0151323    .0074342
            ruralpopulationoftotalpopulation |  -.9185718   .2081521    -4.41   0.000    -1.339263   -.4978806
                                       rents |  -.1312647   .2819288    -0.47   0.644    -.7010641    .4385347
                                         edu |   .0651776   .1034794     0.63   0.532    -.1439621    .2743173
                                   dummytime |   .3721357   6.686289     0.06   0.956    -13.14136    13.88563
                                       _cons |   126.7854   17.32538     7.32   0.000     91.76954    161.8013
            --------------------------------------------------------------------------------------------------
            When I ran the between regression, the results are:
            Code:
            Between regression (regression on group means)  Number of obs      =        48
            Group variable: country                         Number of groups   =        24
            
            R-sq:  within  = 0.0386                         Obs per group: min =         2
                   between = 0.4119                                        avg =       2.0
                   overall = 0.3947                                        max =         2
            
                                                            F(6,17)            =      1.98
            sd(u_i + avg(e_i.))=   23.8208                  Prob > F           =    0.1246
            
            --------------------------------------------------------------------------------------------------
            accesstoelectricityofpopulatione |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            ---------------------------------+----------------------------------------------------------------
                                       loans |  -.4677628   .7692754    -0.61   0.551    -2.090792    1.155266
                                       renew |   .3680803   .9854063     0.37   0.713    -1.710945    2.447106
                                        lgdp |  -5.617209    5.41582    -1.04   0.314    -17.04359    5.809173
            ruralpopulationoftotalpopulation |  -.9802427    .324824    -3.02   0.008    -1.665561   -.2949238
                                       rents |   .0561992   .5190417     0.11   0.915    -1.038883    1.151281
                                         edu |   .0753235   .2274486     0.33   0.745    -.4045512    .5551981
                                   dummytime |          0  (omitted)
                                       _cons |   146.6539   37.23083     3.94   0.001     68.10367     225.204
            --------------------------------------------------------------------------------------------------
            Finally, for the model suggested by the paper, that is, random effects with robust standard errors, the results are:
            Code:
            xtreg accesstoelectricityofpopulatione loans renew lgdp ruralpopulationoftotalpopulation rents edu dummytime, re vce(robust)
            
            Random-effects GLS regression                   Number of obs      =        48
            Group variable: country                         Number of groups   =        24
            
            R-sq:  within  = 0.6744                         Obs per group: min =         2
                   between = 0.3148                                        avg =       2.0
                   overall = 0.3148                                        max =         2
            
                                                            Wald chi2(7)       =     46.86
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
            
                                                               (Std. Err. adjusted for 24 clusters in country)
            --------------------------------------------------------------------------------------------------
                                             |               Robust
            accesstoelectricityofpopulatione |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------------------------+----------------------------------------------------------------
                                       loans |  -.0110571   .0949275    -0.12   0.907    -.1971116    .1749973
                                       renew |  -.1077398   .2495429    -0.43   0.666    -.5968348    .3813553
                                        lgdp |  -.3808155   .4217631    -0.90   0.367    -1.207456    .4458249
            ruralpopulationoftotalpopulation |  -.5285755   .2495516    -2.12   0.034    -1.017688   -.0394634
                                       rents |   -.034656   .0285442    -1.21   0.225    -.0906016    .0212896
                                         edu |  -.0053836   .0082997    -0.65   0.517    -.0216507    .0108834
                                   dummytime |   1.929858   .4812587     4.01   0.000     .9866082    2.873108
                                       _cons |   106.4823   14.47746     7.36   0.000     78.10695    134.8576
            ---------------------------------+----------------------------------------------------------------
                                     sigma_u |  23.805255
                                     sigma_e |  1.2167894
                                         rho |  .99739413   (fraction of variance due to u_i)
            --------------------------------------------------------------------------------------------------
            My questions are:
            1. Why do some of the variables that were significant and positive in the paper appear negative and insignificant in my model? [e.g., "edu", "renew"]
            2. Why are most variables insignificant in my model when they are not in the paper? [e.g., "edu", "rents", "renew"]

            Please help me.
            Thank you.


            • #7
              Hi,
              The correlation matrix is as follows:
              Code:
              correlate loans renew gdp ruralpopulationoftotalpopulation rents edu
              (obs=48)
              
                           |    loans    renew      gdp ruralp~n    rents      edu
              -------------+------------------------------------------------------
                     loans |   1.0000
                     renew |  -0.1633   1.0000
                       gdp |  -0.1132  -0.1107   1.0000
              ruralpopul~n |  -0.3237   0.0203   0.0609   1.0000
                     rents |   0.1605   0.4301  -0.1405  -0.1554   1.0000
                       edu |  -0.0786   0.1859  -0.1416  -0.0886  -0.0518   1.0000


              • #8
                Can someone please help me with this?

                Thank you


                • #9
                  Drau:
                  most of the contents of my previous reply still hold.
                  You're seemingly hunting for the "best fitting" model, an approach that should be discouraged.
                  You are still worrying about immaterial issues such as the sign of non-significant coefficients, which are simply telling you that your data do not provide evidence of a statistically significant effect in explaining the variation in the -depvar- when adjusted for the remaining predictors.
                  As usual, the absence of evidence is not the evidence of absence (my favourite reference on this topic follows: http://www.bmj.com/content/311/7003/485): it may well be that an effect exists, but your handful of observations simply does not allow you to detect it.
                  The limited sample size indeed plagues your analysis altogether.
                  Last but not least, as Christos wisely said, I would turn my attention to a more methodologically sound paper: all that glitters is not gold.
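
                  To make the point about signs concrete, here is the 95% confidence interval for -gdp- rebuilt from the coefficient and standard error reported in #1 (using 1.96 as the normal critical value):

                  ```latex
                  \hat{\beta}_{gdp} \pm 1.96 \times \mathrm{SE}(\hat{\beta}_{gdp})
                    = -0.0009196 \pm 1.96 \times 0.0005703
                    \approx [-0.0020374,\ 0.0001982]
                  ```

                  The interval covers both negative and positive values, so the data are compatible with a small positive effect as well as a small negative one.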
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Hi Carlo,
                    Thank you for your reply.
                    Yes, it is high time I used a wider data set for this and moved on from this paper.
                    Thank you so much for the valuable comments throughout the discussion.

                    Cheers,
                    Drau
