Why variables are insignificant

Drau Nee

Join Date: May 2016
Posts: 8

28 May 2016, 06:43

I am trying to identify the factors that most affect 'Access to electricity' using 24 countries over a two-year span. Even though this is a short panel data set, the article I am replicating used a similar approach.
My results are:

Code:

 xtreg accesstoelectricityofpopulatione loans renew gdp rents edu var24, re

Random-effects GLS regression                   Number of obs      =        48
Group variable: country                         Number of groups   =        24

R-sq:  within  = 0.7520                         Obs per group: min =         2
       between = 0.0001                                        avg =       2.0
       overall = 0.0011                                        max =         2

                                                Wald chi2(6)       =     58.62
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
accesstoel~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       loans |   .0287223   .0651532     0.44   0.659    -.0989756    .1564202
       renew |  -.0032136   .1823182    -0.02   0.986    -.3605507    .3541234
         gdp |  -.0009196   .0005703    -1.61   0.107    -.0020374    .0001982
       rents |  -.0590546   .0385032    -1.53   0.125    -.1345195    .0164103
         edu |  -.0051925   .0076635    -0.68   0.498    -.0202127    .0098277
       var24 |   2.309331   .3686467     6.26   0.000     1.586796    3.031865
       _cons |   72.37429    5.85379    12.36   0.000     60.90107     83.8475
-------------+----------------------------------------------------------------
     sigma_u |  28.521862
     sigma_e |  1.1818464
         rho |  .99828596   (fraction of variance due to u_i)
------------------------------------------------------------------------------

My problem is that not only are the variables insignificant, but the coefficients also do not have the expected signs. For example, GDP per capita (gdp in the model) should be positively correlated with the dependent variable, but that is not the case here.
Can someone please suggest a solution?
The data look like this:

time	country	access	loans	renew	gdp	rents	edu
1	1	41	3.9	0	1629	2.37	45
2	1	43	4.3	0	1933	1.75	48
1	2	52.2	66	0	2401	4.5	53
2	2	59.6	85	0	2763	3.8	60


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

28 May 2016, 07:12

Hello Drau,

Welcome to the Stata Forum. At first glance, it seems you have too many predictors for such a small sample size. Also, var24 (not explained by you in #1) seems to carry most of the effect in the model. Since each coefficient is "adjusted" for the remaining predictors, the sign of the non-significant coefficients, IMHO, may not be a core issue in certain situations. By the way, the coefficient for gdp is almost zero, as are the remaining ones, with the exception of var24. Finally, the very high ICC (rho) caught my attention.
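
For readers unfamiliar with the ICC: the rho that -xtreg- reports is the share of the total variance attributed to the country-level effect u_i, computed from sigma_u and sigma_e. A minimal check, assuming nothing beyond the two values printed in the output in #1 (nothing is re-estimated here):

```stata
* rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)
* sigma_u and sigma_e are copied from the -xtreg, re- output in #1
display 28.521862^2 / (28.521862^2 + 1.1818464^2)
```

This should reproduce the reported rho of about .998, i.e. nearly all of the unexplained variance is attributed to stable differences between countries rather than to the idiosyncratic error.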

Best,

Marcos

Last edited by Marcos Almeida; 28 May 2016, 07:15.

Best regards,

Marcos
Drau Nee

Join Date: May 2016

Posts: 8
#3

28 May 2016, 07:41

Hello Marcos,

Thank you very much for the comment.
var24 is the time dummy (dummytime).
My issue, though, is how the authors of the paper I am replicating obtained such significant results.
The paper I am referring to is http://www.sciencedirect.com/science...60544216301682 They too have few observations but many variables.
Please help me with this.

Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#4

28 May 2016, 08:57

Drau:
echoing Marcos' wise remarks, your results are not that surprising given the handful of observations in your data set.
Anyway, I would offer some comments:
- -gdp- explains no interesting variation in the -depvar- when adjusted for the other predictors, as (possibly for the reason mentioned above) it does not reach statistical significance; the same remark would hold true even if the coefficient for -gdp- were positive but not significant;
- your -re- specification should be carefully revised, given your very low between and overall R-sq;
- residuals seem to play too large a role in your model;
- I've read the working paper version of the article you quoted (available free of charge at http://dse.univr.it/workingpapers/wp2014n15.pdf) and I was surprised by the huge number of predictors given the limited sample size.
Moreover, given the high R2 (page 35), I suspect that their model may suffer from overfitting.
It also seems worth noting that the authors report that results from a (pooled?) OLS were not so different from the ones obtained via panel data regression (page 36); were the individual effects negligible (i.e., was the F-test at the foot of the outcome table after -xtreg, fe- non-significant)?

Kind regards,
Carlo
(Stata 19.0)
Christos Makridis

Join Date: Nov 2014

Posts: 157
#5

28 May 2016, 14:22

Given that there's so little within-country variation from only two years of data, there's just a ton of noise in predicting access to electricity that you are not explaining. Noise blows up standard errors. If you want descriptive statistics on the correlations between access to electricity and other factors, gather more years of data. You should also use logged gdp or logged gdp per capita; taking logs helps reduce the effects of outliers. I'd also recommend against replicating papers that are as weak as the one you cited; it's a lot harder, but also more fruitful, to try to replicate papers in solid field journals at least!
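
The log transformation and the within/between split mentioned above can be inspected directly. A minimal sketch, assuming only the four sample rows posted in #1 (two countries, so purely illustrative and not a substitute for the full 24-country data); lgdp is the name used later in the thread for logged gdp:

```stata
* Enter the four sample rows shown in #1 (illustration only)
clear
input time country access loans renew gdp rents edu
1 1 41   3.9 0 1629 2.37 45
2 1 43   4.3 0 1933 1.75 48
1 2 52.2 66  0 2401 4.5  53
2 2 59.6 85  0 2763 3.8  60
end

* Declare the panel structure: country is the group, time the period
xtset country time

* Natural log of gdp, as suggested above
generate lgdp = ln(gdp)

* Overall, between-country and within-country standard deviations
xtsum access gdp lgdp
```

-xtsum- splits each variable's variation into a between and a within component; with only two periods, the within component rests on a single change per country.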

Drau Nee

Join Date: May 2016
Posts: 8
#6

28 May 2016, 19:03

Thank you everyone for your valuable comments.
I chose this paper to replicate because it has some great results. It was published in the journal 'Energy'.
http://ac.els-cdn.com/S0360544216301...72734bf5720f09

I tried different models to see what the issue could be. They are as follows.
Pooled OLS. The results are:

Code:

reg accesstoelectricityofpopulatione loans renew gdp ruralpopulationoftotalpopulation rents edu dummytime

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(  7,    40) =    3.41
       Model |  12292.3596     7  1756.05138           Prob > F      =  0.0060
    Residual |  20613.4948    40  515.337371           R-squared     =  0.3736
-------------+------------------------------           Adj R-squared =  0.2639
       Total |  32905.8545    47  700.124563           Root MSE      =  22.701

--------------------------------------------------------------------------------------------------
accesstoelectricityofpopulatione |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
                           loans |  -.2215006   .4381079    -0.51   0.616     -1.10695    .6639486
                           renew |   .5811815   .5952781     0.98   0.335    -.6219203    1.784283
                             gdp |  -.0038491   .0055828    -0.69   0.495    -.0151323    .0074342
ruralpopulationoftotalpopulation |  -.9185718   .2081521    -4.41   0.000    -1.339263   -.4978806
                           rents |  -.1312647   .2819288    -0.47   0.644    -.7010641    .4385347
                             edu |   .0651776   .1034794     0.63   0.532    -.1439621    .2743173
                       dummytime |   .3721357   6.686289     0.06   0.956    -13.14136    13.88563
                           _cons |   126.7854   17.32538     7.32   0.000     91.76954    161.8013
--------------------------------------------------------------------------------------------------

When I ran the between regression, the results are:

Code:

Between regression (regression on group means)  Number of obs      =        48
Group variable: country                         Number of groups   =        24

R-sq:  within  = 0.0386                         Obs per group: min =         2
       between = 0.4119                                        avg =       2.0
       overall = 0.3947                                        max =         2

                                                F(6,17)            =      1.98
sd(u_i + avg(e_i.))=   23.8208                  Prob > F           =    0.1246

--------------------------------------------------------------------------------------------------
accesstoelectricityofpopulatione |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
                           loans |  -.4677628   .7692754    -0.61   0.551    -2.090792    1.155266
                           renew |   .3680803   .9854063     0.37   0.713    -1.710945    2.447106
                            lgdp |  -5.617209    5.41582    -1.04   0.314    -17.04359    5.809173
ruralpopulationoftotalpopulation |  -.9802427    .324824    -3.02   0.008    -1.665561   -.2949238
                           rents |   .0561992   .5190417     0.11   0.915    -1.038883    1.151281
                             edu |   .0753235   .2274486     0.33   0.745    -.4045512    .5551981
                       dummytime |          0  (omitted)
                           _cons |   146.6539   37.23083     3.94   0.001     68.10367     225.204
--------------------------------------------------------------------------------------------------

Finally, the model suggested by the paper, random effects with robust standard errors, gives:

Code:

xtreg accesstoelectricityofpopulatione loans renew lgdp ruralpopulationoftotalpopulation rents edu dummytime, re vce(robust)

Random-effects GLS regression                   Number of obs      =        48
Group variable: country                         Number of groups   =        24

R-sq:  within  = 0.6744                         Obs per group: min =         2
       between = 0.3148                                        avg =       2.0
       overall = 0.3148                                        max =         2

                                                Wald chi2(7)       =     46.86
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                                                   (Std. Err. adjusted for 24 clusters in country)
--------------------------------------------------------------------------------------------------
                                 |               Robust
accesstoelectricityofpopulatione |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
                           loans |  -.0110571   .0949275    -0.12   0.907    -.1971116    .1749973
                           renew |  -.1077398   .2495429    -0.43   0.666    -.5968348    .3813553
                            lgdp |  -.3808155   .4217631    -0.90   0.367    -1.207456    .4458249
ruralpopulationoftotalpopulation |  -.5285755   .2495516    -2.12   0.034    -1.017688   -.0394634
                           rents |   -.034656   .0285442    -1.21   0.225    -.0906016    .0212896
                             edu |  -.0053836   .0082997    -0.65   0.517    -.0216507    .0108834
                       dummytime |   1.929858   .4812587     4.01   0.000     .9866082    2.873108
                           _cons |   106.4823   14.47746     7.36   0.000     78.10695    134.8576
---------------------------------+----------------------------------------------------------------
                         sigma_u |  23.805255
                         sigma_e |  1.2167894
                             rho |  .99739413   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------------------

My questions are:
1. Why do some of the variables that were significant and positive in the paper appear negative and insignificant in my model? [e.g. "edu", "renew"]
2. Why are most variables insignificant in my model when they are significant in the paper? [e.g. "edu", "rents", "renew"]

Please help me.
Thank you.


Drau Nee

Join Date: May 2016
Posts: 8
#7

28 May 2016, 19:19

Hi,
The correlation matrix is as follows:

Code:

correlate loans renew gdp ruralpopulationoftotalpopulation rents edu
(obs=48)

             |    loans    renew      gdp ruralp~n    rents      edu
-------------+------------------------------------------------------
       loans |   1.0000
       renew |  -0.1633   1.0000
         gdp |  -0.1132  -0.1107   1.0000
ruralpopul~n |  -0.3237   0.0203   0.0609   1.0000
       rents |   0.1605   0.4301  -0.1405  -0.1554   1.0000
         edu |  -0.0786   0.1859  -0.1416  -0.0886  -0.0518   1.0000


Drau Nee

Join Date: May 2016

Posts: 8
#8

28 May 2016, 22:56

Can someone please help me with this?

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#9

29 May 2016, 01:59

Drau:
most of the content of my previous reply still holds.
You're seemingly hunting for the "best fitting" model, an approach that should be discouraged.
You are still worrying about immaterial issues such as the sign of non-significant coefficients, which are simply telling you that your data do not provide evidence of a statistically significant effect in explaining the variation in the -depvar- when adjusted for the remaining predictors.
As usual, absence of evidence is not evidence of absence (my favourite reference on this topic: http://www.bmj.com/content/311/7003/485): it may well be that an effect exists, but your scarce handful of data simply does not allow you to detect it.
Indeed, the limited sample size plagues your analysis altogether.
Last but not least, as Christos wisely said, I would turn my attention to a more methodologically sound paper: all that glitters is not gold.
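
To make the point about non-significant coefficients concrete, the z statistic, p-value and 95% confidence interval for -gdp- can be recomputed by hand from the coefficient and standard error printed in #1. A small sketch, assuming nothing beyond those two numbers and Stata's built-in normal() and invnormal() functions:

```stata
* gdp row of the -xtreg, re- output in #1:
* Coef. = -.0009196, Std. Err. = .0005703

* z statistic = coefficient / standard error
display -.0009196 / .0005703

* two-sided p-value from the standard normal distribution
display 2 * normal(-abs(-.0009196 / .0005703))

* lower and upper limits of the 95% confidence interval
display -.0009196 - invnormal(.975) * .0005703
display -.0009196 + invnormal(.975) * .0005703
```

The results should agree with the table up to rounding (z about -1.61, p about 0.107, interval from about -.0020 to .0002). Because the interval includes zero, the data are compatible with both a negative and a small positive effect, so the sign of the point estimate carries little information.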

Kind regards,
Carlo
(Stata 19.0)
Drau Nee

Join Date: May 2016

Posts: 8
#10

29 May 2016, 05:54

Hi Carlo,
Thank you for your reply.
Yes, it is high time I used a larger data set for this question and moved on from this paper.
Thank you so much for the valuable comments throughout the discussion.

Cheers,
Drau