Correlation vs causation

Rose Simmons

Join Date: Feb 2017
Posts: 114

17 Apr 2017, 11:54

Hi,

I have an unbalanced panel dataset (N=2976, T=13), using survey responses.
My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise), and I intend to use -xtprobit, re- to run my model.
hhid is the household's unique identifier, and the data are yearly.

Code:

. xtset hhid year
       panel variable:  hhid (unbalanced)
        time variable:  year, 2004 to 2016, but with gaps
                delta:  1 unit

. xtdes

    hhid:  6, 21, ..., 89972                                 n =       2976
    year:  2004, 2005, ..., 2016                             T =         13
           Delta(year) = 1 unit
           Span(year)  = 13 periods
           (hhid*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         3         7      13      13

I run my regression as follows:

Code:

. xtprobit saving $xlist $controllist i.year, re vce(cluster hhid) nolog

Calculating robust standard errors:

Random-effects probit regression                Number of obs     =      5,248
Group variable: hhid                            Number of groups  =      1,721

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =         13

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(32)     =    1015.46
Log pseudolikelihood  = -2326.9353              Prob > chi2       =     0.0000

                               (Std. Err. adjusted for 1,721 clusters in hhid)
------------------------------------------------------------------------------
             |               Robust
      saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prec |   .0051596    .010072     0.51   0.608    -.0145812    .0249004
    purchase |  -.0063193   .0096012    -0.66   0.510    -.0251374    .0124987
      retire |   .0077005   .0077755     0.99   0.322    -.0075392    .0229401
     bequest |   .0040619   .0062252     0.65   0.514    -.0081392     .016263
     mediumh |   .2634191   .0520474     5.06   0.000     .1614082    .3654301
       longh |   .2016656   .1390572     1.45   0.147    -.0708814    .4742127
        male |   .2665245   .0934311     2.85   0.004     .0834029    .4496461
         age |  -.0152692   .0155381    -0.98   0.326    -.0457233    .0151849
             |
 c.age#c.age |   .0001046   .0001458     0.72   0.473    -.0001812    .0003903
             |
    employed |   -.015734   .1085609    -0.14   0.885    -.2285093    .1970414
     retired |   .0340827   .1105945     0.31   0.758    -.1826785    .2508438
      health |   .0917022   .0439204     2.09   0.037     .0056198    .1777845
      income |   4.70e-06   1.37e-06     3.44   0.001     2.03e-06    7.38e-06
        risk |  -.0016178    .004678    -0.35   0.729    -.0107865    .0075508
 selfcontrol |   .2846024    .022067    12.90   0.000     .2413519    .3278529
       child |  -.1379202   .0371526    -3.71   0.000    -.2107378   -.0651025
  saving1exp |   1.636882   .0604239    27.09   0.000     1.518453    1.755311
     partner |  -.1470189   .0840967    -1.75   0.080    -.3118455    .0178077
         uni |   .1797381   .0828612     2.17   0.030     .0173332    .3421431
       owner |   .1961271   .0746949     2.63   0.009     .0497278    .3425264
             |
        year |
       2005  |  -.9836225   .1001202    -9.82   0.000    -1.179854   -.7873906
       2006  |  -1.076696   .1112256    -9.68   0.000    -1.294694   -.8586978
       2007  |   -1.03664   .1086714    -9.54   0.000    -1.249632   -.8236479
       2008  |  -.9681669   .1064297    -9.10   0.000    -1.176765   -.7595685
       2009  |  -.8765504   .1051705    -8.33   0.000    -1.082681   -.6704199
       2010  |  -1.093482   .1064268   -10.27   0.000    -1.302075   -.8848896
       2011  |  -1.023239   .1335747    -7.66   0.000     -1.28504    -.761437
       2012  |  -.9060006    .131999    -6.86   0.000    -1.164714   -.6472873
       2013  |  -1.018888   .1422776    -7.16   0.000    -1.297747   -.7400292
       2014  |  -1.023546   .1235291    -8.29   0.000    -1.265658   -.7814332
       2015  |  -.9400109   .1363781    -6.89   0.000    -1.207307   -.6727149
       2016  |  -1.099619   .1318772    -8.34   0.000    -1.358094   -.8411444
             |
       _cons |  -2.061224   .4942043    -4.17   0.000    -3.029846   -1.092601
-------------+----------------------------------------------------------------
    /lnsig2u |  -.8696584   .1389346                     -1.141965   -.5973516
-------------+----------------------------------------------------------------
     sigma_u |   .6473752   .0449714                        .56497    .7417999
         rho |   .2953254   .0289135                      .2419597    .3549498
------------------------------------------------------------------------------

I then compute average marginal effects (AMEs):

Code:

. sum income

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      income |      7,458    32699.04    32851.71          0    1370179


. margins, dydx(income)

Average marginal effects                        Number of obs     =      5,248
Model VCE    : Robust

Expression   : Pr(saving=1), predict(pr)
dy/dx w.r.t. : income

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   1.09e-06   3.28e-07     3.32   0.001     4.45e-07    1.73e-06
------------------------------------------------------------------------------

. di 1.09*exp(-6)
.00270184

Question 1: Could you please tell me if you notice anything incorrect in the AME calculation?
Question 2: For income, how should I interpret the AME? Would it be that a 1 unit increase in income is associated with an increase in the probability of saving of 0.27 percentage points?
Question 3: Is there a way to establish whether this is merely an association (i.e. correlation) or whether it may reflect direct causation? For example, I have heard of Granger causality tests in time series and wondered whether a similar concept can be applied to panel data, or whether you could recommend any tests.

Thanks in advance


Tiago Pereira

Join Date: Jan 2016

Posts: 415
#2

17 Apr 2017, 12:07

I will go directly to Question 3, which I judge to be the most important here. In short, there is no single statistical test or model that can confirm whether an association reflects a direct cause-effect relationship. Traditionally, a strong association (large effect size plus tiny p-value) increases our confidence that there might be a causal effect, but this type of inference is rarely useful in practice. Search for the Bradford Hill criteria.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30193
#3

17 Apr 2017, 14:08

I'll address your interpretation of the -margins- output.

The first thing is the correct reading of the numbers in the output themselves. 1.09e-06 means 1.09 X 10^-6. It is the computer-ese version of scientific notation. It is not 1.09*exp(-6) (which is 1.09*e^-6, a much larger number, because e is approximately 2.718.) So a better interpretation here would be that a unit increase in income is associated, on average, with an increase of 0.00000109 in the probability of saving, or, in percentage points, 0.000109.
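
The difference between the two readings is easy to verify with -display-. The lines below are only an illustrative check, not part of the original output; the last line converts the marginal effect to percentage points.

```stata
* 1.09e-06 is scientific notation: 1.09 times 10 to the power -6
display 1.09e-06
display 1.09 * 10^(-6)

* exp(-6) is e^(-6), which is why this returns .00270184 instead
display 1.09 * exp(-6)

* the marginal effect per euro, expressed in percentage points
display 100 * 1.09e-06
```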

That is a very, very small number. But if your income variable is measured in ordinary currency units such as dollars, euros, pounds, yuan, yen, etc., it isn't surprising. Earning one more dollar would have only a microscopic impact on my ability to save. So I don't think a marginal effect denominated in those units is very meaningful. You might want to consider rescaling the income variable. I think it would be more sensible to talk about the impact of an additional $1,000 or $10,000 (pick a comparable scaling factor for other currencies) on probability of saving. Then re-do the regression with the new variable -gen income_scaled = income/1000- (or 10,000, or maybe even larger numbers for a currency like the yen) and re-run margins.

I will spare you my rant about why I don't think average marginal effects in logit/probit models are particularly useful--but that is a matter of taste anyway.
Comment

Rose Simmons

Join Date: Feb 2017
Posts: 114

17 Apr 2017, 14:49

Tiago Pereira thanks, I will look into the Bradford Hill criteria to justify why my results may be more due to causation rather than correlation

Clyde Schechter
Thank you for the correction of 1.09e-06
The income variable is the net household income over the past year, measured in euros

Apologies I did not spot this sooner, but I have just thought to create a variable for log(income):

Code:

. gen lnincome=ln(income)
(5,803 missing values generated)

[Figure: histogram of income]

[Figure: histogram of lnincome]

The first image above is income, and the second image is ln(income).
Question 4: Looking at the histograms, would it be better for me to include lnincome rather than income in my regression? Or would you still recommend income_scaled=income/1000?
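
A practical point to check before comparing the two specifications: ln() is undefined at zero, so any household-year with zero income becomes missing in lnincome and drops out of the estimation sample (the -summarize- output for income above shows a minimum of 0). A minimal sketch of how one might count those cases, using only variables already in this thread:

```stata
* household-years reporting zero (or negative) income
count if income <= 0 & !missing(income)

* observations that have income but lost lnincome to the log transform
count if missing(lnincome) & !missing(income)
```

If those counts are not trivial, the income and lnincome models are being fitted to different samples, which matters for any comparison of fit.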

Code:

. xtprobit saving $xlist $controllist i.year, re vce(cluster hhid) nolog

Calculating robust standard errors:

Random-effects probit regression                Number of obs     =      5,229
Group variable: hhid                            Number of groups  =      1,717

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =         13

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(32)     =    1006.73
Log pseudolikelihood  = -2323.1832              Prob > chi2       =     0.0000

                               (Std. Err. adjusted for 1,717 clusters in hhid)
------------------------------------------------------------------------------
             |               Robust
      saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prec |   .0045298   .0100804     0.45   0.653    -.0152275     .024287
    purchase |  -.0067768   .0096368    -0.70   0.482    -.0256645    .0121109
      retire |   .0076771   .0077868     0.99   0.324    -.0075847    .0229389
     bequest |   .0045411   .0062097     0.73   0.465    -.0076297    .0167118
     mediumh |   .2642363   .0519589     5.09   0.000     .1623987    .3660739
       longh |   .2290869   .1391222     1.65   0.100    -.0435877    .5017614
        male |   .2459332   .0932037     2.64   0.008     .0632573    .4286092
         age |  -.0140615   .0155576    -0.90   0.366    -.0445538    .0164308
             |
 c.age#c.age |   .0000969   .0001461     0.66   0.507    -.0001894    .0003832
             |
    employed |   -.022424   .1083645    -0.21   0.836    -.2348146    .1899665
     retired |   .0291244   .1105525     0.26   0.792    -.1875545    .2458033
      health |   .0880439   .0438645     2.01   0.045     .0020711    .1740167
    lnincome |   .1062808    .034314     3.10   0.002     .0390266     .173535
        risk |  -.0013808   .0046762    -0.30   0.768     -.010546    .0077843
 selfcontrol |   .2842433    .021958    12.94   0.000     .2412065    .3272802
       child |  -.1314346   .0370083    -3.55   0.000    -.2039696   -.0588996
   savingexp |   1.643237   .0604975    27.16   0.000     1.524664     1.76181
     partner |  -.1280734    .084093    -1.52   0.128    -.2928926    .0367458
         uni |   .1979592   .0827689     2.39   0.017     .0357352    .3601831
       owner |   .1975978   .0748395     2.64   0.008     .0509151    .3442805
             |
        year |
       2005  |  -.9878203   .1001985    -9.86   0.000    -1.184206   -.7914348
       2006  |  -1.075814   .1112789    -9.67   0.000    -1.293917   -.8577115
       2007  |  -1.033597   .1085819    -9.52   0.000    -1.246414   -.8207804
       2008  |   -.954156   .1066032    -8.95   0.000    -1.163094   -.7452177
       2009  |  -.8569364   .1050763    -8.16   0.000    -1.062882   -.6509907
       2010  |   -1.06336   .1061656   -10.02   0.000    -1.271441    -.855279
       2011  |  -1.007659   .1338136    -7.53   0.000    -1.269929   -.7453892
       2012  |   -.894026   .1311794    -6.82   0.000    -1.151133   -.6369191
       2013  |  -1.002616   .1419595    -7.06   0.000    -1.280851   -.7243801
       2014  |  -1.001264   .1236705    -8.10   0.000    -1.243654   -.7588746
       2015  |  -.9257081   .1360592    -6.80   0.000    -1.192379   -.6590369
       2016  |  -1.079159   .1316689    -8.20   0.000    -1.337225   -.8210926
             |
       _cons |  -3.016636   .5839036    -5.17   0.000    -4.161066   -1.872206
-------------+----------------------------------------------------------------
    /lnsig2u |  -.8763844   .1397156                     -1.150222   -.6025468
-------------+----------------------------------------------------------------
     sigma_u |   .6452018   .0450724                      .5626424    .7398755
         rho |   .2939276   .0289958                      .2404485    .3537612
------------------------------------------------------------------------------

I now run margins again:

Code:

. sum lnincome

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    lnincome |      7,414    10.12951    .8965137          0   14.13045

. margins, dydx(lnincome)

Average marginal effects                        Number of obs     =      5,229
Model VCE    : Robust

Expression   : Pr(saving=1), predict(pr)
dy/dx w.r.t. : lnincome

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnincome |   .0246547   .0079288     3.11   0.002     .0091145     .040195
------------------------------------------------------------------------------

Question 5: If I should use lnincome instead of income, would the AME above be interpreted as:
A 1% increase in income is associated with an increase in the probability of saving of 0.025 percentage points?
A 10% increase in income is associated with an increase in the probability of saving of 0.0025 percentage points?

Many thanks

Last edited by Rose Simmons; 17 Apr 2017, 14:52. Reason: Graphs added for clarity

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30193
#5

17 Apr 2017, 15:05

The choice between income and log income as a predictor is not a matter of the distribution of the variable. It is a matter of which model better fits the data. In fitting a probit model to savings as a function of log income you are saying that constant multiples of income are associated with the same difference in invnormal(probability of savings). By contrast, if you use income directly, you are saying that constant additive increments of income are associated with the same difference in invnormal(probability of savings). I have no idea which of those is true (or closer to true)--this is way out of my area. But this may be something that has been well studied in the economic literature and you might be able to find an answer there, or by consulting a colleague with expertise in this area. If there is no prior information about this issue, then you should look at which model produces a better fit to your data.
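
One way to carry out that last suggestion is to fit both specifications on the same observations and compare information criteria. This is only a sketch under stated assumptions: the explicit variable list is copied from the command in #6 (with c.age##c.age as shorthand for the two age terms), the stored-estimate names m_income and m_lnincome are made up for illustration, $xlist must already be defined, and vce(cluster hhid) is left out on the assumption that likelihood-based criteria should come from the plain maximum-likelihood fit. The /// line continuations require running this from a do-file.

```stata
* income in levels, restricted to observations usable in both models
xtprobit saving $xlist employed retired health income risk selfcontrol child ///
    savingexp partner uni owner male c.age##c.age i.year                     ///
    if !missing(income, lnincome), re nolog
estimates store m_income

* same model and same observations, with log income instead
xtprobit saving $xlist employed retired health lnincome risk selfcontrol child ///
    savingexp partner uni owner male c.age##c.age i.year                       ///
    if !missing(income, lnincome), re nolog
estimates store m_lnincome

* log likelihood, AIC and BIC side by side (lower AIC/BIC suggests better fit)
estimates stats m_income m_lnincome
```

Restricting both fits to the common sample matters because the two models in this thread were estimated on different numbers of observations.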

As for question 5, if lnincome has a marginal effect of 0.025 (approx.), then a one-unit increase in ln(income) is associated, on average, with an increase in predicted probability of .025 (2.5 percentage points). A 1% increase in income raises ln(income) by ln(1.01), which is very close to 0.01, so it is associated with an increase of about .00025 (or, equivalently, 0.025 percentage points). That is an approximation, and it begins to fail when you try to apply the same rule of thumb to a 10% increase in income: ln(1.10) = .095, so that would be an increase in predicted probability of about 0.2375 (= 0.025*9.5) percentage points, not 0.25. Actually, it's worse than that. Probit is a nonlinear model, and the marginal effect changes as the value of income (or ln(income)) changes. So the rule-of-thumb interpretation of marginal effects really breaks down very badly when you get beyond very small changes like 1 or 2%, because the marginal effect is likely to be substantially different when income increases by 10% from any given baseline value.
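
The arithmetic behind the rule of thumb can be checked directly. The AME is the change in probability per one-unit change in ln(income), so it has to be multiplied by the change in ln(income) that a given percentage increase implies. The sketch below uses the AME reported in #4 (.0246547); the -margins- line assumes the lnincome model is the most recent estimation, and the grid of values from 8 to 12 is an arbitrary illustrative choice.

```stata
* change in ln(income) implied by a 1% and a 10% increase in income
display ln(1.01)
display ln(1.10)

* approximate change in Pr(saving=1), in percentage points
display 100 * .0246547 * ln(1.01)
display 100 * .0246547 * ln(1.10)

* how the marginal effect itself varies across values of ln(income)
margins, dydx(lnincome) at(lnincome=(8(1)12))
```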
Comment

Rose Simmons

Join Date: Feb 2017
Posts: 114

17 Apr 2017, 15:37

I have not come across the use of lnincome in the economic literature, so perhaps it is best to stick to income.
Also, I have conducted a Wald test which I think indicates income should be used instead of lnincome:

Code:

. xtprobit saving $xlist employed retired health income lnincome risk selfcontrol child savingexp
> partner uni owner male c.age c.age#c.age  i.year, re vce(cluster hhid) nolog

Calculating robust standard errors:

Random-effects probit regression                Number of obs     =      5,229
Group variable: hhid                            Number of groups  =      1,717

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =         13

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(33)     =    1012.83
Log pseudolikelihood  = -2318.0444              Prob > chi2       =     0.0000

                               (Std. Err. adjusted for 1,717 clusters in hhid)
------------------------------------------------------------------------------
             |               Robust
      saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prec |    .004333   .0100784     0.43   0.667    -.0154204    .0240863
    purchase |  -.0066828   .0096283    -0.69   0.488    -.0255539    .0121884
      retire |   .0083274   .0077809     1.07   0.285    -.0069228    .0235777
     bequest |   .0047083   .0062142     0.76   0.449    -.0074714     .016888
     mediumh |   .2650664   .0520052     5.10   0.000      .163138    .3669948
       longh |   .2237275   .1395066     1.60   0.109    -.0497004    .4971553
    employed |  -.0241512   .1083568    -0.22   0.824    -.2365266    .1882242
     retired |   .0320783   .1106649     0.29   0.772    -.1848209    .2489775
      health |   .0908535   .0439819     2.07   0.039     .0046506    .1770564
      income |   4.01e-06   1.76e-06     2.28   0.023     5.61e-07    7.47e-06
    lnincome |   .0312861   .0449681     0.70   0.487    -.0568497     .119422
        risk |  -.0019036   .0046757    -0.41   0.684    -.0110677    .0072606
 selfcontrol |   .2862064   .0221227    12.94   0.000     .2428468    .3295661
       child |  -.1343374   .0371685    -3.61   0.000    -.2071864   -.0614884
   savingexp |   1.637051   .0604185    27.10   0.000     1.518633    1.755469
     partner |  -.1466265   .0839361    -1.75   0.081    -.3111383    .0178852
         uni |   .1773929   .0827722     2.14   0.032     .0151623    .3396235
       owner |   .1907355   .0747272     2.55   0.011     .0442729    .3371981
        male |   .2550749   .0932809     2.73   0.006     .0722476    .4379022
         age |  -.0149071   .0155204    -0.96   0.337    -.0453265    .0155123
             |
 c.age#c.age |   .0001021   .0001457     0.70   0.483    -.0001834    .0003876
             |
        year |
       2005  |  -.9855769   .1001328    -9.84   0.000    -1.181833   -.7893203
       2006  |  -1.075177   .1111691    -9.67   0.000    -1.293064   -.8572895
       2007  |   -1.03513   .1085451    -9.54   0.000    -1.247875   -.8223859
       2008  |  -.9656881    .106539    -9.06   0.000    -1.174501   -.7568755
       2009  |  -.8685176    .105183    -8.26   0.000    -1.074673   -.6623627
       2010  |  -1.097256   .1062568   -10.33   0.000    -1.305515   -.8889961
       2011  |  -1.023308   .1335108    -7.66   0.000    -1.284985    -.761632
       2012  |  -.9060072   .1318514    -6.87   0.000    -1.164431   -.6475833
       2013  |  -1.015178   .1426392    -7.12   0.000    -1.294746   -.7356104
       2014  |   -1.01549   .1236541    -8.21   0.000    -1.257848   -.7731325
       2015  |  -.9385416   .1362736    -6.89   0.000    -1.205633   -.6714503
       2016  |  -1.097808   .1319412    -8.32   0.000    -1.356408    -.839208
             |
       _cons |  -2.351527   .6392282    -3.68   0.000    -3.604392   -1.098663
-------------+----------------------------------------------------------------
    /lnsig2u |  -.8843267   .1401559                     -1.159027   -.6096262
-------------+----------------------------------------------------------------
     sigma_u |   .6426446   .0450352                      .5601708    .7372611
         rho |    .292282   .0289917                      .2388441    .3521445
------------------------------------------------------------------------------

. test income lnincome

 ( 1)  [saving]income = 0
 ( 2)  [saving]lnincome = 0

           chi2(  2) =   15.18
         Prob > chi2 =    0.0005

. test income

 ( 1)  [saving]income = 0

           chi2(  1) =    5.19
         Prob > chi2 =    0.0227

. test lnincome

 ( 1)  [saving]lnincome = 0

           chi2(  1) =    0.48
         Prob > chi2 =    0.4866

Thank you for the clear numerical explanation, I see that the AME interpretations become more unstable as we go above the small % increases in income.

Returning to your suggestion in #3, I have generated incomescaled:

Code:

. gen incomescaled = income/1000
(5,759 missing values generated)

. xtprobit saving $xlist employed retired health incomescaled risk selfcontrol child savingexp par
> tner uni owner male c.age c.age#c.age  i.year, re vce(cluster hhid) nolog

Calculating robust standard errors:

Random-effects probit regression                Number of obs     =      5,248
Group variable: hhid                            Number of groups  =      1,721

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =         13

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(32)     =    1015.46
Log pseudolikelihood  = -2326.9353              Prob > chi2       =     0.0000

                               (Std. Err. adjusted for 1,721 clusters in hhid)
------------------------------------------------------------------------------
             |               Robust
      saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prec |   .0051596    .010072     0.51   0.608    -.0145812    .0249004
    purchase |  -.0063193   .0096012    -0.66   0.510    -.0251374    .0124987
      retire |   .0077005   .0077755     0.99   0.322    -.0075392    .0229401
     bequest |   .0040619   .0062252     0.65   0.514    -.0081392     .016263
     mediumh |   .2634191   .0520474     5.06   0.000     .1614082    .3654301
       longh |   .2016656   .1390572     1.45   0.147    -.0708814    .4742127
    employed |   -.015734   .1085609    -0.14   0.885    -.2285093    .1970414
     retired |   .0340827   .1105945     0.31   0.758    -.1826785    .2508438
      health |   .0917022   .0439204     2.09   0.037     .0056198    .1777845
incomescaled |   .0047015   .0013652     3.44   0.001     .0020257    .0073772
        risk |  -.0016178    .004678    -0.35   0.729    -.0107865    .0075508
 selfcontrol |   .2846024    .022067    12.90   0.000     .2413519    .3278529
       child |  -.1379202   .0371526    -3.71   0.000    -.2107378   -.0651025
   savingexp |   1.636882   .0604239    27.09   0.000     1.518453    1.755311
     partner |  -.1470189   .0840967    -1.75   0.080    -.3118455    .0178077
         uni |   .1797381   .0828612     2.17   0.030     .0173332    .3421431
       owner |   .1961271   .0746949     2.63   0.009     .0497278    .3425264
        male |   .2665245   .0934311     2.85   0.004     .0834029    .4496461
         age |  -.0152692   .0155381    -0.98   0.326    -.0457233    .0151849
             |
 c.age#c.age |   .0001046   .0001458     0.72   0.473    -.0001812    .0003903
             |
        year |
       2005  |  -.9836225   .1001202    -9.82   0.000    -1.179854   -.7873906
       2006  |  -1.076696   .1112256    -9.68   0.000    -1.294694   -.8586978
       2007  |   -1.03664   .1086714    -9.54   0.000    -1.249632   -.8236479
       2008  |  -.9681669   .1064297    -9.10   0.000    -1.176765   -.7595685
       2009  |  -.8765504   .1051705    -8.33   0.000    -1.082681   -.6704199
       2010  |  -1.093482   .1064268   -10.27   0.000    -1.302075   -.8848896
       2011  |  -1.023239   .1335747    -7.66   0.000     -1.28504    -.761437
       2012  |  -.9060006    .131999    -6.86   0.000    -1.164714   -.6472873
       2013  |  -1.018888   .1422776    -7.16   0.000    -1.297747   -.7400292
       2014  |  -1.023546   .1235291    -8.29   0.000    -1.265658   -.7814332
       2015  |  -.9400109   .1363781    -6.89   0.000    -1.207307   -.6727149
       2016  |  -1.099619   .1318772    -8.34   0.000    -1.358094   -.8411444
             |
       _cons |  -2.061224   .4942043    -4.17   0.000    -3.029846   -1.092601
-------------+----------------------------------------------------------------
    /lnsig2u |  -.8696584   .1389346                     -1.141965   -.5973516
-------------+----------------------------------------------------------------
     sigma_u |   .6473752   .0449714                        .56497    .7417999
         rho |   .2953254   .0289135                      .2419597    .3549498
------------------------------------------------------------------------------

. sum incomescaled

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
incomescaled |      7,458    32.69904    32.85171          0   1370.179

. margins, dydx(incomescaled)

Average marginal effects                        Number of obs     =      5,248
Model VCE    : Robust

Expression   : Pr(saving=1), predict(pr)
dy/dx w.r.t. : incomescaled

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
incomescaled |   .0010873   .0003144     3.46   0.001      .000471    .0017036
------------------------------------------------------------------------------

Would the interpretation be that a 1000 euro unit increase in income is associated with an increase in the probability of saving by 0.11 percentage points?

Thanks

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30193
#7

17 Apr 2017, 16:00

Would the interpretation be that a 1000 euro unit increase in income is associated with an increase in the probability of saving by 0.11 percentage points?

Yes.
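
A quick consistency check on the rescaling, using only numbers already reported in this thread: dividing income by 1,000 should multiply both the coefficient and the AME by 1,000 and leave everything else in the output unchanged.

```stata
* coefficient: 4.70e-06 per euro versus .0047015 per 1,000 euros
display 1000 * 4.70e-06

* AME: 1.09e-06 per euro versus .0010873 per 1,000 euros
display 1000 * 1.09e-06

* AME per 1,000 euros, expressed in percentage points
display 100 * .0010873
```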
Comment
Rose Simmons

Join Date: Feb 2017

Posts: 114
#8

17 Apr 2017, 17:21

Thank you for your help
Comment
Rose Simmons

Join Date: Feb 2017

Posts: 114
#9

18 Apr 2017, 03:00

To test model fit and decide statistically whether I should have included -income- or -lnincome- in my model, I conducted a Wald test in #6.
The results are as follows:

Code:

. xtprobit saving $xlist employed retired health income lnincome risk selfcontrol child savi
> ngexp partner uni owner male c.age c.age#c.age i.year, re vce(cluster hhid) nolog

. test income lnincome

 ( 1)  [saving]income = 0
 ( 2)  [saving]lnincome = 0

           chi2(  2) =   15.18
         Prob > chi2 =    0.0005

. test income

 ( 1)  [saving]income = 0

           chi2(  1) =    5.19
         Prob > chi2 =    0.0227

. test lnincome

 ( 1)  [saving]lnincome = 0

           chi2(  1) =    0.48
         Prob > chi2 =    0.4866

Question 1: Please could you let me know if I conducted the test of model fit correctly?

Question 2: How should I interpret the three test outcomes? Would it be:

-test income lnincome-
p-value=0.0005
Inclusion of both -income- and -lnincome- improves model fit. So next step is to test -income- and -lnincome- separately.

-test income-
p-value=0.0227
Inclusion of -income- alone is significant and improves model fit.

-test lnincome-
p-value=0.4866
Inclusion of -lnincome- alone is insignificant and does not improve model fit.

As -income- alone improves model fit, there is no need to include -lnincome-?

Thanks
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#10

18 Apr 2017, 03:04

Rose:
At face value, I find it hard to justify including both a predictor and its logged version.

Kind regards,
Carlo
(Stata 19.0)
Comment
Rose Simmons

Join Date: Feb 2017

Posts: 114
#11

18 Apr 2017, 03:14

Thank you for your reply Carlo Lazzaro

I agree that I should either use -income- or -lnincome- and not both. To decide which one of these to use, would the Wald test in #9 suggest that I should use -income- and not -lnincome-, because the p-value for -test income- is significant?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#12

18 Apr 2017, 03:55

Rose:
I would not endorse the p-value approach.
The main goal of a regression model is to give an acceptable approximation of the data-generating process.
As an aside, including -income- or -lnincome- implies a different way of interpreting how this predictor contributes to explaining variation in the dependent variable when adjusted for the other independent variables. What does the literature in your research field suggest?
Finally, why code:

Code:

c.age c.age#c.age

when you can type it more efficiently:

Code:

c.age##c.age

?
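
For readers unfamiliar with the ## operator: it expands to the main effect plus the interaction, so both forms request the same terms. A small illustration on Stata's built-in auto dataset (chosen only because it ships with Stata; its variables are unrelated to this thread):

```stata
sysuse auto, clear

* long form: linear term plus squared term
regress price c.mpg c.mpg#c.mpg

* short form: ## expands to c.mpg and c.mpg#c.mpg
regress price c.mpg##c.mpg
```

The two commands should report identical coefficients.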

Kind regards,
Carlo
(Stata 19.0)
Comment
Rose Simmons

Join Date: Feb 2017

Posts: 114
#13

18 Apr 2017, 04:06

Ah ok so I should not use the p-value approach - why so? Is it because the inclusion of variables should not be dictated merely by a statistical test?

I had been wondering whether I should conduct a statistical test to test for functional form.
This would determine if ln(income) would be better, because in #4 when I plotted histograms, I thought ln(income) was better as it is not skewed and follows more of a normal distribution.
On the other hand, given that the literature uses income, not ln(income), perhaps I should stick to income.

Code:

c.age##c.age

Thank you for the suggested coding, I will use that from now on - much more efficient!

Last edited by Rose Simmons; 18 Apr 2017, 04:09.
Comment

Rose Simmons

Join Date: Feb 2017
Posts: 114

#14

22 Apr 2017, 05:21

Originally posted by Clyde Schechter

"Would the interpretation be that a 1000 euro unit increase in income is associated with an increase in the probability of saving by 0.11 percentage points?"
Yes.

Clyde Schechter would the interpretation of AMEs in #6 be similar for discrete variables? For example:

Code:

. margins, dydx(male) 

Average marginal effects                        Number of obs     =     12,951
Model VCE    : OIM

Expression   : Pr(saving=1), predict(pr)
dy/dx w.r.t. : male

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1167403   .0166912     6.99   0.000      .084026    .1494545
------------------------------------------------------------------------------

The figure above is made-up, but say the AME for male was 0.1167403, would this be interpreted as:
For males, compared to females, the probability of saving is 11.67 percentage points higher?

Many thanks

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30193
#15

22 Apr 2017, 10:57

The figure above is made-up, but say the AME for male was 0.1167403, would this be interpreted as:
For males, compared to females, the probability of saving is 11.67 percentage points higher?

Yes.
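
One caveat worth adding, as a sketch and not a statement about the model actually fitted in this thread: -margins, dydx()- reports the discrete difference between the two groups only when the binary variable is entered with the i. prefix. Entered as plain male, as in the commands shown earlier, it is treated as continuous and -margins- reports an average derivative instead. The example below uses Stata's lbw example dataset (downloaded by -webuse-), not the poster's data.

```stata
webuse lbw, clear

* smoke entered as a factor variable: dydx() is the difference
* in predicted probability between smoke = 1 and smoke = 0
probit low age lwt i.smoke
margins, dydx(smoke)

* smoke entered as if continuous: dydx() is the average derivative
probit low age lwt smoke
margins, dydx(smoke)
```

The two results are often close, but only the first matches the "males compared to females" reading.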
Comment
