  • Correlation vs causation

    Hi,

    I have an unbalanced panel dataset (N=2976, T=13) built from survey responses.
    My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise), and I intend to use -xtprobit, re- to run my model.
    hhid is the household's unique identifier, and the data are yearly.

    Code:
    . xtset hhid year
           panel variable:  hhid (unbalanced)
            time variable:  year, 2004 to 2016, but with gaps
                    delta:  1 unit
    
    . xtdes
    
        hhid:  6, 21, ..., 89972                                 n =       2976
        year:  2004, 2005, ..., 2016                             T =         13
               Delta(year) = 1 unit
               Span(year)  = 13 periods
               (hhid*year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         3         7      13      13
    I run my regression as follows:
    Code:
    . xtprobit saving $xlist $controllist i.year, re vce(cluster hhid) nolog
    
    Calculating robust standard errors:
    
    Random-effects probit regression                Number of obs     =      5,248
    Group variable: hhid                            Number of groups  =      1,721
    
    Random effects u_i ~ Gaussian                   Obs per group:
                                                                  min =          1
                                                                  avg =        3.0
                                                                  max =         13
    
    Integration method: mvaghermite                 Integration pts.  =         12
    
                                                    Wald chi2(32)     =    1015.46
    Log pseudolikelihood  = -2326.9353              Prob > chi2       =     0.0000
    
                                   (Std. Err. adjusted for 1,721 clusters in hhid)
    ------------------------------------------------------------------------------
                 |               Robust
          saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            prec |   .0051596    .010072     0.51   0.608    -.0145812    .0249004
        purchase |  -.0063193   .0096012    -0.66   0.510    -.0251374    .0124987
          retire |   .0077005   .0077755     0.99   0.322    -.0075392    .0229401
         bequest |   .0040619   .0062252     0.65   0.514    -.0081392     .016263
         mediumh |   .2634191   .0520474     5.06   0.000     .1614082    .3654301
           longh |   .2016656   .1390572     1.45   0.147    -.0708814    .4742127
            male |   .2665245   .0934311     2.85   0.004     .0834029    .4496461
             age |  -.0152692   .0155381    -0.98   0.326    -.0457233    .0151849
                 |
     c.age#c.age |   .0001046   .0001458     0.72   0.473    -.0001812    .0003903
                 |
        employed |   -.015734   .1085609    -0.14   0.885    -.2285093    .1970414
         retired |   .0340827   .1105945     0.31   0.758    -.1826785    .2508438
          health |   .0917022   .0439204     2.09   0.037     .0056198    .1777845
          income |   4.70e-06   1.37e-06     3.44   0.001     2.03e-06    7.38e-06
            risk |  -.0016178    .004678    -0.35   0.729    -.0107865    .0075508
     selfcontrol |   .2846024    .022067    12.90   0.000     .2413519    .3278529
           child |  -.1379202   .0371526    -3.71   0.000    -.2107378   -.0651025
      saving1exp |   1.636882   .0604239    27.09   0.000     1.518453    1.755311
         partner |  -.1470189   .0840967    -1.75   0.080    -.3118455    .0178077
             uni |   .1797381   .0828612     2.17   0.030     .0173332    .3421431
           owner |   .1961271   .0746949     2.63   0.009     .0497278    .3425264
                 |
            year |
           2005  |  -.9836225   .1001202    -9.82   0.000    -1.179854   -.7873906
           2006  |  -1.076696   .1112256    -9.68   0.000    -1.294694   -.8586978
           2007  |   -1.03664   .1086714    -9.54   0.000    -1.249632   -.8236479
           2008  |  -.9681669   .1064297    -9.10   0.000    -1.176765   -.7595685
           2009  |  -.8765504   .1051705    -8.33   0.000    -1.082681   -.6704199
           2010  |  -1.093482   .1064268   -10.27   0.000    -1.302075   -.8848896
           2011  |  -1.023239   .1335747    -7.66   0.000     -1.28504    -.761437
           2012  |  -.9060006    .131999    -6.86   0.000    -1.164714   -.6472873
           2013  |  -1.018888   .1422776    -7.16   0.000    -1.297747   -.7400292
           2014  |  -1.023546   .1235291    -8.29   0.000    -1.265658   -.7814332
           2015  |  -.9400109   .1363781    -6.89   0.000    -1.207307   -.6727149
           2016  |  -1.099619   .1318772    -8.34   0.000    -1.358094   -.8411444
                 |
           _cons |  -2.061224   .4942043    -4.17   0.000    -3.029846   -1.092601
    -------------+----------------------------------------------------------------
        /lnsig2u |  -.8696584   .1389346                     -1.141965   -.5973516
    -------------+----------------------------------------------------------------
         sigma_u |   .6473752   .0449714                        .56497    .7417999
             rho |   .2953254   .0289135                      .2419597    .3549498
    ------------------------------------------------------------------------------
    I then compute average marginal effects (AMEs):

    Code:
    . sum income
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          income |      7,458    32699.04    32851.71          0    1370179
    
    
    . margins, dydx(income)
    
    Average marginal effects                        Number of obs     =      5,248
    Model VCE    : Robust
    
    Expression   : Pr(saving=1), predict(pr)
    dy/dx w.r.t. : income
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |   1.09e-06   3.28e-07     3.32   0.001     4.45e-07    1.73e-06
    ------------------------------------------------------------------------------
    
    . di 1.09*exp(-6)
    .00270184
    Question 1: Please could you advise me if you notice anything incorrect in the AME calculation?
    Question 2: For income, how should I interpret the AME? Would it be that a 1-unit increase in income is associated with an increase in the probability of saving of 0.27 percentage points?
    Question 3: Is there a way to establish whether this association is merely an association (i.e. correlation), or whether it may reflect direct causation? For example, I have heard of Granger causality tests for time series and wondered whether a similar concept could be applied to panel data, or whether you could recommend any tests.

    Thanks in advance

  • #2
    I will go directly to Question 3, which I judge to be the most important here. In short, there is no single statistical test or model that can confirm whether an association reflects a direct cause-effect relationship. Traditionally, a strong association (large effect size plus a tiny p-value) increases our confidence that there might be a cause-effect relationship, but this type of inference is rarely useful in practice. Search for the Bradford Hill criteria.



    • #3
      I'll address your interpretation of the -margins- output.

      The first thing is the correct reading of the numbers in the output themselves. 1.09e-06 means 1.09 × 10^-6; it is the computer-ese version of scientific notation. It is not 1.09*exp(-6), which is 1.09 × e^-6, a much larger number because e is only about 2.718. So a better interpretation here would be that a one-unit increase in income is associated, on average, with an increase of 0.00000109 in the probability of saving, or, in percentage points, 0.000109.
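A quick way to see the difference between the two readings is to evaluate both. The sketch below uses Python purely as a calculator, which is an assumption of convenience since the thread's own commands are Stata. The 1.09e-06 value is taken from the -margins- output above.

```python
import math

# -margins- prints small numbers in scientific ("computer-ese") notation:
# 1.09e-06 means 1.09 x 10^-6, not 1.09 * exp(-6).
ame_income = 1.09e-06           # AME of income as printed by -margins-
misread = 1.09 * math.exp(-6)   # what -di 1.09*exp(-6)- evaluates

print(f"1.09e-06     = {ame_income:.8f}")
print(f"1.09*exp(-6) = {misread:.8f}")
print(f"ratio        = {misread / ame_income:.0f}")
# The AME is a change in probability; multiply by 100 for percentage points.
print(f"AME in percentage points per euro = {ame_income * 100:.6f}")
```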

      That is a very, very small number. But if your income variable is measured in ordinary currency units such as dollars, euros, pounds, yuan, or yen, it isn't surprising. Earning one more dollar would have only a microscopic impact on my ability to save. So I don't think a marginal effect denominated in those units is very meaningful. You might want to consider rescaling the income variable. I think it would be more sensible to talk about the impact of an additional $1,000 or $10,000 (pick a comparable scaling factor for other currencies) on the probability of saving. So create the new variable with -gen income_scaled = income/1000- (or 10,000, or maybe even larger divisors for a currency like the yen), re-run the regression with it, and then re-run -margins-.
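Here is a minimal sketch of why rescaling only changes the units. It assumes a plain probit index a + b*income, ignores the other covariates and the random effect u_i, and uses made-up incomes and intercept. Only the 4.70e-06 slope echoes the -xtprobit- output. Dividing income by 1,000 multiplies the slope by 1,000 and leaves every predicted probability unchanged, so the AME is also multiplied by 1,000. The same argument should carry over to the random-effects model, since the random effect does not involve income.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, sd 1

# Hypothetical data: incomes in euros and a probit index a + b*income,
# with all other covariates and the random effect u_i ignored.
incomes = [12_000.0, 25_000.0, 40_000.0, 75_000.0]
a, b = -0.5, 4.70e-06       # illustrative intercept; slope per euro


def probit_ame(xs, intercept, slope):
    """AME of x in a probit P = Phi(a + b*x): the mean of phi(a + b*x) * b."""
    return slope * sum(std_normal.pdf(intercept + slope * x) for x in xs) / len(xs)


# Rescale: income_scaled = income / 1000, so the slope becomes b * 1000.
scaled = [x / 1000 for x in incomes]
b_scaled = b * 1000

probs = [std_normal.cdf(a + b * x) for x in incomes]
probs_scaled = [std_normal.cdf(a + b_scaled * x) for x in scaled]

ame_per_euro = probit_ame(incomes, a, b)
ame_per_thousand = probit_ame(scaled, a, b_scaled)

print("same predicted probabilities:",
      all(math.isclose(p, q) for p, q in zip(probs, probs_scaled)))
print(f"AME per euro:        {ame_per_euro:.3e}")
print(f"AME per 1,000 euros: {ame_per_thousand:.3e}")
print(f"ratio:               {ame_per_thousand / ame_per_euro:.1f}")
```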

      I will spare you my rant about why I don't think average marginal effects in logit/probit models are particularly useful--but that is a matter of taste anyway.



      • #4
        Tiago Pereira, thanks. I will look into the Bradford Hill criteria to justify why my results may reflect causation rather than mere correlation.

        Clyde Schechter, thank you for the correction regarding 1.09e-06.
        The income variable is net household income over the past year, measured in euros.

        Apologies for not thinking of this sooner, but I have now created a variable for log(income):

        Code:
        . gen lnincome=ln(income)
        (5,803 missing values generated)
        -income-
        [Figure: histogram of income (Graph income.png)]

        -lnincome-
        [Figure: histogram of lnincome (Graph lnincome.png)]

        The first image above is income, and the second image is ln(income).
        Question 4: Looking at the histograms, would it be better for me to include lnincome rather than income in my regression? Or would you still recommend income_scaled=income/1000?

        Code:
        . xtprobit saving $xlist $controllist i.year, re vce(cluster hhid) nolog
        
        Calculating robust standard errors:
        
        Random-effects probit regression                Number of obs     =      5,229
        Group variable: hhid                            Number of groups  =      1,717
        
        Random effects u_i ~ Gaussian                   Obs per group:
                                                                      min =          1
                                                                      avg =        3.0
                                                                      max =         13
        
        Integration method: mvaghermite                 Integration pts.  =         12
        
                                                        Wald chi2(32)     =    1006.73
        Log pseudolikelihood  = -2323.1832              Prob > chi2       =     0.0000
        
                                       (Std. Err. adjusted for 1,717 clusters in hhid)
        ------------------------------------------------------------------------------
                     |               Robust
              saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                prec |   .0045298   .0100804     0.45   0.653    -.0152275     .024287
            purchase |  -.0067768   .0096368    -0.70   0.482    -.0256645    .0121109
              retire |   .0076771   .0077868     0.99   0.324    -.0075847    .0229389
             bequest |   .0045411   .0062097     0.73   0.465    -.0076297    .0167118
             mediumh |   .2642363   .0519589     5.09   0.000     .1623987    .3660739
               longh |   .2290869   .1391222     1.65   0.100    -.0435877    .5017614
                male |   .2459332   .0932037     2.64   0.008     .0632573    .4286092
                 age |  -.0140615   .0155576    -0.90   0.366    -.0445538    .0164308
                     |
         c.age#c.age |   .0000969   .0001461     0.66   0.507    -.0001894    .0003832
                     |
            employed |   -.022424   .1083645    -0.21   0.836    -.2348146    .1899665
             retired |   .0291244   .1105525     0.26   0.792    -.1875545    .2458033
              health |   .0880439   .0438645     2.01   0.045     .0020711    .1740167
            lnincome |   .1062808    .034314     3.10   0.002     .0390266     .173535
                risk |  -.0013808   .0046762    -0.30   0.768     -.010546    .0077843
         selfcontrol |   .2842433    .021958    12.94   0.000     .2412065    .3272802
               child |  -.1314346   .0370083    -3.55   0.000    -.2039696   -.0588996
           savingexp |   1.643237   .0604975    27.16   0.000     1.524664     1.76181
             partner |  -.1280734    .084093    -1.52   0.128    -.2928926    .0367458
                 uni |   .1979592   .0827689     2.39   0.017     .0357352    .3601831
               owner |   .1975978   .0748395     2.64   0.008     .0509151    .3442805
                     |
                year |
               2005  |  -.9878203   .1001985    -9.86   0.000    -1.184206   -.7914348
               2006  |  -1.075814   .1112789    -9.67   0.000    -1.293917   -.8577115
               2007  |  -1.033597   .1085819    -9.52   0.000    -1.246414   -.8207804
               2008  |   -.954156   .1066032    -8.95   0.000    -1.163094   -.7452177
               2009  |  -.8569364   .1050763    -8.16   0.000    -1.062882   -.6509907
               2010  |   -1.06336   .1061656   -10.02   0.000    -1.271441    -.855279
               2011  |  -1.007659   .1338136    -7.53   0.000    -1.269929   -.7453892
               2012  |   -.894026   .1311794    -6.82   0.000    -1.151133   -.6369191
               2013  |  -1.002616   .1419595    -7.06   0.000    -1.280851   -.7243801
               2014  |  -1.001264   .1236705    -8.10   0.000    -1.243654   -.7588746
               2015  |  -.9257081   .1360592    -6.80   0.000    -1.192379   -.6590369
               2016  |  -1.079159   .1316689    -8.20   0.000    -1.337225   -.8210926
                     |
               _cons |  -3.016636   .5839036    -5.17   0.000    -4.161066   -1.872206
        -------------+----------------------------------------------------------------
            /lnsig2u |  -.8763844   .1397156                     -1.150222   -.6025468
        -------------+----------------------------------------------------------------
             sigma_u |   .6452018   .0450724                      .5626424    .7398755
                 rho |   .2939276   .0289958                      .2404485    .3537612
        ------------------------------------------------------------------------------
        I now run margins again:

        Code:
        . sum lnincome
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
            lnincome |      7,414    10.12951    .8965137          0   14.13045
        
        . margins, dydx(lnincome)
        
        Average marginal effects                        Number of obs     =      5,229
        Model VCE    : Robust
        
        Expression   : Pr(saving=1), predict(pr)
        dy/dx w.r.t. : lnincome
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
            lnincome |   .0246547   .0079288     3.11   0.002     .0091145     .040195
        ------------------------------------------------------------------------------
        Question 5: If I should use lnincome instead of income, would the AME above be interpreted as:
        A 1% increase in income is associated with an increase in the probability of saving by 0.025% points
        A 10% increase in income is associated with an increase in the probability of saving by 0.0025% points.

        Many thanks
        Last edited by Rose Simmons; 17 Apr 2017, 14:52. Reason: Graphs added for clarity



        • #5
          The choice between income and log income as a predictor is not a matter of the distribution of the variable. It is a matter of which model better fits the data. In fitting a probit model of saving as a function of log income, you are saying that constant multiples of income are associated with the same difference in invnormal(probability of saving). By contrast, if you use income directly, you are saying that constant additive increments of income are associated with the same difference in invnormal(probability of saving). I have no idea which of those is true (or closer to true); this is way out of my area. But this may be something that has been well studied in the economics literature, and you might be able to find an answer there or by consulting a colleague with expertise in this area. If there is no prior information about this issue, then you should look at which model produces a better fit to your data.
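The two functional forms can be illustrated with a short Python sketch. The coefficients are hypothetical, chosen only for illustration, and `statistics.NormalDist().inv_cdf` plays the role of Stata's invnormal(). Under the log specification, doubling income shifts invnormal(p) by the same amount at any baseline. Under the linear specification, the same is true of adding a fixed number of euros instead.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
invnormal = std_normal.inv_cdf   # plays the role of Stata's invnormal()


def p_log(income, a=-3.0, b=0.1):
    """Hypothetical probit with an index linear in ln(income)."""
    return std_normal.cdf(a + b * math.log(income))


def p_lin(income, a=-0.5, b=4.7e-06):
    """Hypothetical probit with an index linear in income (euros)."""
    return std_normal.cdf(a + b * income)


# Log model: doubling income moves invnormal(p) by b*ln(2) at any baseline.
log_double_low = invnormal(p_log(20_000)) - invnormal(p_log(10_000))
log_double_high = invnormal(p_log(80_000)) - invnormal(p_log(40_000))

# Linear model: adding 10,000 euros moves invnormal(p) by b*10,000 at any
# baseline, whereas doubling moves it by more at higher baselines.
lin_add_low = invnormal(p_lin(20_000)) - invnormal(p_lin(10_000))
lin_add_high = invnormal(p_lin(50_000)) - invnormal(p_lin(40_000))
lin_double_high = invnormal(p_lin(80_000)) - invnormal(p_lin(40_000))

print("log model, doubling:", round(log_double_low, 4), round(log_double_high, 4))
print("linear model, +10,000:", round(lin_add_low, 4), round(lin_add_high, 4))
print("linear model, doubling from 40,000:", round(lin_double_high, 4))
```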

          As for question 5, an average marginal effect of about 0.025 for lnincome means that a one-unit increase in ln(income) is associated, on average, with an increase in predicted probability of about 0.025 (2.5 percentage points). A 1% increase in income raises ln(income) by only about 0.01, so it would be associated, on average, with an increase in predicted probability of about 0.01 × 0.025 = 0.00025 (0.025 percentage points). This is, by the way, an approximation based on the fact that ln(1.01) is very close to 0.01. It is an approximation that begins to fail when you try to apply the same rule of thumb to a 10% increase in income: ln(1.10) = .095, so the increase in predicted probability would be about 0.095 × 0.025 ≈ 0.0024 (about 0.24 percentage points), not 0.0025. Actually, it's worse than that. Probit is a nonlinear model, and the marginal effect changes as the value of income (or ln(income)) changes. So the rule-of-thumb interpretation of marginal effects breaks down very badly when you go beyond very small changes like 1 or 2%, because the marginal effect is likely to be substantially different once income has risen by 10% from any given baseline value.
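The size of the approximation error can be checked numerically with the AME reported above for lnincome (0.0246547). The Python sketch below compares the rule of thumb (AME × proportional change in income) with AME × ln(1 + proportional change). The 50% row is included only to show the gap widening. Neither column accounts for the probit curvature, so both remain approximations. The actual change would require re-computing the predicted probabilities at the higher income.

```python
import math

ame_lnincome = 0.0246547   # dy/dx w.r.t. lnincome from the -margins- output

for pct in (1, 10, 50):
    g = pct / 100
    # Rule of thumb: treat a pct% rise in income as a rise of pct/100 in ln(income).
    rule_of_thumb = ame_lnincome * g
    # Actual rise in ln(income) is ln(1 + g); still a linear (AME-based) approximation.
    log_based = ame_lnincome * math.log1p(g)
    print(f"{pct:>2}% more income: rule of thumb {100 * rule_of_thumb:.4f} pp, "
          f"log-based {100 * log_based:.4f} pp")
```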



          • #6
            I have not come across the use of lnincome in the economic literature, so perhaps it is best to stick to income.
            Also, I have conducted a Wald test which I think indicates income should be used instead of lnincome:

            Code:
            . xtprobit saving $xlist employed retired health income lnincome risk selfcontrol child savingexp
            > partner uni owner male c.age c.age#c.age  i.year, re vce(cluster hhid) nolog
            
            Calculating robust standard errors:
            
            Random-effects probit regression                Number of obs     =      5,229
            Group variable: hhid                            Number of groups  =      1,717
            
            Random effects u_i ~ Gaussian                   Obs per group:
                                                                          min =          1
                                                                          avg =        3.0
                                                                          max =         13
            
            Integration method: mvaghermite                 Integration pts.  =         12
            
                                                            Wald chi2(33)     =    1012.83
            Log pseudolikelihood  = -2318.0444              Prob > chi2       =     0.0000
            
                                           (Std. Err. adjusted for 1,717 clusters in hhid)
            ------------------------------------------------------------------------------
                         |               Robust
                  saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    prec |    .004333   .0100784     0.43   0.667    -.0154204    .0240863
                purchase |  -.0066828   .0096283    -0.69   0.488    -.0255539    .0121884
                  retire |   .0083274   .0077809     1.07   0.285    -.0069228    .0235777
                 bequest |   .0047083   .0062142     0.76   0.449    -.0074714     .016888
                 mediumh |   .2650664   .0520052     5.10   0.000      .163138    .3669948
                   longh |   .2237275   .1395066     1.60   0.109    -.0497004    .4971553
                employed |  -.0241512   .1083568    -0.22   0.824    -.2365266    .1882242
                 retired |   .0320783   .1106649     0.29   0.772    -.1848209    .2489775
                  health |   .0908535   .0439819     2.07   0.039     .0046506    .1770564
                  income |   4.01e-06   1.76e-06     2.28   0.023     5.61e-07    7.47e-06
                lnincome |   .0312861   .0449681     0.70   0.487    -.0568497     .119422
                    risk |  -.0019036   .0046757    -0.41   0.684    -.0110677    .0072606
             selfcontrol |   .2862064   .0221227    12.94   0.000     .2428468    .3295661
                   child |  -.1343374   .0371685    -3.61   0.000    -.2071864   -.0614884
               savingexp |   1.637051   .0604185    27.10   0.000     1.518633    1.755469
                 partner |  -.1466265   .0839361    -1.75   0.081    -.3111383    .0178852
                     uni |   .1773929   .0827722     2.14   0.032     .0151623    .3396235
                   owner |   .1907355   .0747272     2.55   0.011     .0442729    .3371981
                    male |   .2550749   .0932809     2.73   0.006     .0722476    .4379022
                     age |  -.0149071   .0155204    -0.96   0.337    -.0453265    .0155123
                         |
             c.age#c.age |   .0001021   .0001457     0.70   0.483    -.0001834    .0003876
                         |
                    year |
                   2005  |  -.9855769   .1001328    -9.84   0.000    -1.181833   -.7893203
                   2006  |  -1.075177   .1111691    -9.67   0.000    -1.293064   -.8572895
                   2007  |   -1.03513   .1085451    -9.54   0.000    -1.247875   -.8223859
                   2008  |  -.9656881    .106539    -9.06   0.000    -1.174501   -.7568755
                   2009  |  -.8685176    .105183    -8.26   0.000    -1.074673   -.6623627
                   2010  |  -1.097256   .1062568   -10.33   0.000    -1.305515   -.8889961
                   2011  |  -1.023308   .1335108    -7.66   0.000    -1.284985    -.761632
                   2012  |  -.9060072   .1318514    -6.87   0.000    -1.164431   -.6475833
                   2013  |  -1.015178   .1426392    -7.12   0.000    -1.294746   -.7356104
                   2014  |   -1.01549   .1236541    -8.21   0.000    -1.257848   -.7731325
                   2015  |  -.9385416   .1362736    -6.89   0.000    -1.205633   -.6714503
                   2016  |  -1.097808   .1319412    -8.32   0.000    -1.356408    -.839208
                         |
                   _cons |  -2.351527   .6392282    -3.68   0.000    -3.604392   -1.098663
            -------------+----------------------------------------------------------------
                /lnsig2u |  -.8843267   .1401559                     -1.159027   -.6096262
            -------------+----------------------------------------------------------------
                 sigma_u |   .6426446   .0450352                      .5601708    .7372611
                     rho |    .292282   .0289917                      .2388441    .3521445
            ------------------------------------------------------------------------------
            
            . test income lnincome
            
             ( 1)  [saving]income = 0
             ( 2)  [saving]lnincome = 0
            
                       chi2(  2) =   15.18
                     Prob > chi2 =    0.0005
            
            . test income
            
             ( 1)  [saving]income = 0
            
                       chi2(  1) =    5.19
                     Prob > chi2 =    0.0227
            
            . test lnincome
            
             ( 1)  [saving]lnincome = 0
            
                       chi2(  1) =    0.48
                     Prob > chi2 =    0.4866
            Thank you for the clear numerical explanation. I see that the AME-based interpretations become less reliable beyond small percentage increases in income.
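For the single-coefficient tests, the Wald statistic is simply the square of the z statistic in the regression table. Those two results can therefore be reproduced from the reported coefficients and robust standard errors, as the Python sketch below does. The joint test (chi2(2) = 15.18) cannot be checked the same way. It also needs the covariance between the two estimates, which the output does not show.

```python
from statistics import NormalDist

# Coefficient and robust standard error from the -xtprobit- run with both terms.
estimates = {
    "income": (4.01e-06, 1.76e-06),
    "lnincome": (0.0312861, 0.0449681),
}

results = {}
for name, (coef, se) in estimates.items():
    z = coef / se
    chi2 = z ** 2                            # Wald chi2 with 1 restriction
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # P(chi2(1) > z^2) = two-sided z p-value
    results[name] = (z, chi2, p)
    print(f"test {name}: chi2(1) = {chi2:.2f}, Prob > chi2 = {p:.4f}")
```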


            Returning to your suggestion in #3, I have generated incomescaled:
            Code:
            . gen incomescaled = income/1000
            (5,759 missing values generated)
            
            . xtprobit saving $xlist employed retired health incomescaled risk selfcontrol child savingexp par
            > tner uni owner male c.age c.age#c.age  i.year, re vce(cluster hhid) nolog
            
            Calculating robust standard errors:
            
            Random-effects probit regression                Number of obs     =      5,248
            Group variable: hhid                            Number of groups  =      1,721
            
            Random effects u_i ~ Gaussian                   Obs per group:
                                                                          min =          1
                                                                          avg =        3.0
                                                                          max =         13
            
            Integration method: mvaghermite                 Integration pts.  =         12
            
                                                            Wald chi2(32)     =    1015.46
            Log pseudolikelihood  = -2326.9353              Prob > chi2       =     0.0000
            
                                           (Std. Err. adjusted for 1,721 clusters in hhid)
            ------------------------------------------------------------------------------
                         |               Robust
                  saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    prec |   .0051596    .010072     0.51   0.608    -.0145812    .0249004
                purchase |  -.0063193   .0096012    -0.66   0.510    -.0251374    .0124987
                  retire |   .0077005   .0077755     0.99   0.322    -.0075392    .0229401
                 bequest |   .0040619   .0062252     0.65   0.514    -.0081392     .016263
                 mediumh |   .2634191   .0520474     5.06   0.000     .1614082    .3654301
                   longh |   .2016656   .1390572     1.45   0.147    -.0708814    .4742127
                employed |   -.015734   .1085609    -0.14   0.885    -.2285093    .1970414
                 retired |   .0340827   .1105945     0.31   0.758    -.1826785    .2508438
                  health |   .0917022   .0439204     2.09   0.037     .0056198    .1777845
            incomescaled |   .0047015   .0013652     3.44   0.001     .0020257    .0073772
                    risk |  -.0016178    .004678    -0.35   0.729    -.0107865    .0075508
             selfcontrol |   .2846024    .022067    12.90   0.000     .2413519    .3278529
                   child |  -.1379202   .0371526    -3.71   0.000    -.2107378   -.0651025
               savingexp |   1.636882   .0604239    27.09   0.000     1.518453    1.755311
                 partner |  -.1470189   .0840967    -1.75   0.080    -.3118455    .0178077
                     uni |   .1797381   .0828612     2.17   0.030     .0173332    .3421431
                   owner |   .1961271   .0746949     2.63   0.009     .0497278    .3425264
                    male |   .2665245   .0934311     2.85   0.004     .0834029    .4496461
                     age |  -.0152692   .0155381    -0.98   0.326    -.0457233    .0151849
                         |
             c.age#c.age |   .0001046   .0001458     0.72   0.473    -.0001812    .0003903
                         |
                    year |
                   2005  |  -.9836225   .1001202    -9.82   0.000    -1.179854   -.7873906
                   2006  |  -1.076696   .1112256    -9.68   0.000    -1.294694   -.8586978
                   2007  |   -1.03664   .1086714    -9.54   0.000    -1.249632   -.8236479
                   2008  |  -.9681669   .1064297    -9.10   0.000    -1.176765   -.7595685
                   2009  |  -.8765504   .1051705    -8.33   0.000    -1.082681   -.6704199
                   2010  |  -1.093482   .1064268   -10.27   0.000    -1.302075   -.8848896
                   2011  |  -1.023239   .1335747    -7.66   0.000     -1.28504    -.761437
                   2012  |  -.9060006    .131999    -6.86   0.000    -1.164714   -.6472873
                   2013  |  -1.018888   .1422776    -7.16   0.000    -1.297747   -.7400292
                   2014  |  -1.023546   .1235291    -8.29   0.000    -1.265658   -.7814332
                   2015  |  -.9400109   .1363781    -6.89   0.000    -1.207307   -.6727149
                   2016  |  -1.099619   .1318772    -8.34   0.000    -1.358094   -.8411444
                         |
                   _cons |  -2.061224   .4942043    -4.17   0.000    -3.029846   -1.092601
            -------------+----------------------------------------------------------------
                /lnsig2u |  -.8696584   .1389346                     -1.141965   -.5973516
            -------------+----------------------------------------------------------------
                 sigma_u |   .6473752   .0449714                        .56497    .7417999
                     rho |   .2953254   .0289135                      .2419597    .3549498
            ------------------------------------------------------------------------------
            
            . sum incomescaled
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            incomescaled |      7,458    32.69904    32.85171          0   1370.179
            
            . margins, dydx(incomescaled)
            
            Average marginal effects                        Number of obs     =      5,248
            Model VCE    : Robust
            
            Expression   : Pr(saving=1), predict(pr)
            dy/dx w.r.t. : incomescaled
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            incomescaled |   .0010873   .0003144     3.46   0.001      .000471    .0017036
            ------------------------------------------------------------------------------
            Would the interpretation be that a 1,000-euro increase in income is associated with an increase in the probability of saving of 0.11 percentage points?

            Thanks



            • #7
              Would the interpretation be that a 1,000-euro increase in income is associated with an increase in the probability of saving of 0.11 percentage points?
              Yes.
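              The arithmetic behind this answer, assuming (as the question states) that one unit of -incomescaled- corresponds to 1,000 euro. The AME is an average change in probability per unit of the regressor, so multiplying by 100 converts it to percentage points. Because the AME is a derivative, reading it as the effect of a full one-unit change is a linear approximation. That approximation should be reasonable here, since one unit is small relative to the standard deviation of about 32.85 units reported by -sum incomescaled-.

              ```latex
              \begin{aligned}
              \widehat{\mathrm{AME}} &= \frac{1}{N}\sum_{i=1}^{N}\frac{\partial\,\widehat{\Pr}(\text{saving}_i = 1)}{\partial\,\text{incomescaled}_i} = 0.0010873, \qquad N = 5{,}248\\
              0.0010873 \times 100 &= 0.10873 \approx 0.11\ \text{percentage points per 1,000 euro}\\
              [0.000471,\ 0.0017036] \times 100 &= [0.047,\ 0.170]\ \text{percentage points (95\% CI)}
              \end{aligned}
              ```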



              • #8
                Thank you for your help



                • #9
                  To test model fit and decide statistically whether I should have included -income- or -lnincome- in my model, I conducted a Wald test in #6.
                  The results are as follows:

                  Code:
                  . xtprobit saving $xlist employed retired health income lnincome risk selfcontrol child savi
                  > ngexp partner uni owner male c.age c.age#c.age i.year, re vce(cluster hhid) nolog
                  
                  . test income lnincome
                  
                   ( 1)  [saving]income = 0
                   ( 2)  [saving]lnincome = 0
                  
                             chi2(  2) =   15.18
                           Prob > chi2 =    0.0005
                  
                  . 
                  . test income
                  
                   ( 1)  [saving]income = 0
                  
                             chi2(  1) =    5.19
                           Prob > chi2 =    0.0227
                  
                  . 
                  . test lnincome
                  
                   ( 1)  [saving]lnincome = 0
                  
                             chi2(  1) =    0.48
                           Prob > chi2 =    0.4866
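                  For reference, the following sketch shows what each -test- call computes, using the standard Wald formula. Here \hat\beta is the vector of estimated coefficients, \hat V is their covariance matrix (cluster-robust here because of vce(cluster hhid)), R selects the coefficients under test, and q is the number of restrictions. Each single-coefficient test asks whether that coefficient is zero while the other income term stays in the model, so the joint test and the two individual tests answer different questions. The last two lines reproduce the reported p-values from the displayed statistics, up to rounding.

                  ```latex
                  \begin{aligned}
                  H_0:\ R\beta &= 0, \qquad W = (R\hat\beta)^{\top}\,(R\hat V R^{\top})^{-1}\,(R\hat\beta) \;\sim\; \chi^2_q \ \text{under } H_0\\
                  q = 1:\quad W &= \left(\hat\beta_j / \widehat{\mathrm{se}}_j\right)^2 = z_j^2, \qquad p = 2\left[1-\Phi\!\left(\sqrt{W}\right)\right]\\
                  \text{test income lnincome } (q = 2):\quad p &= \Pr(\chi^2_2 > W) = e^{-W/2} = e^{-15.18/2} \approx 0.0005\\
                  \text{test income } (q = 1):\quad p &= 2\left[1-\Phi\!\left(\sqrt{5.19}\right)\right] \approx 2\,[1-\Phi(2.278)] \approx 0.0227
                  \end{aligned}
                  ```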
                  Question 1: Please could you let me know if I conducted the test of model fit correctly?

                  Question 2: How should I interpret the three test outcomes? Would it be:

                  -test income lnincome-
                  p-value=0.0005
                  Inclusion of both -income- and -lnincome- improves model fit. So next step is to test -income- and -lnincome- separately.

                  -test income-
                  p-value=0.0227
                  Inclusion of -income- alone is significant and improves model fit.

                  -test lnincome-
                  p-value=0.4866
                  Inclusion of -lnincome- alone is insignificant and does not improve model fit.

                  As -income- alone improves model fit, there is no need to include -lnincome-?

                  Thanks



                  • #10
                    Rose:
                    at face value, I find it hard to justify including both a predictor and its logged version.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Thank you for your reply, Carlo Lazzaro.

                      I agree that I should either use -income- or -lnincome- and not both. To decide which one of these to use, would the Wald test in #9 suggest that I should use -income- and not -lnincome-, because the p-value for -test income- is significant?



                      • #12
                        Rose:
                        I would not endorse the p-value approach.
                        The main goal of a regression model is to give an acceptable proxy for the data-generating process.
                        As an aside, including -income- or -lnincome- brings about a different way of interpreting how this predictor contributes to explaining variation in the dependent variable when adjusted for the other independent variables. What does the literature in your research field suggest?
                        Finally, why code:
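                        To make the interpretive difference concrete, here is a sketch for a probit in which the predicted probability is \Phi(\eta), with linear index \eta and coefficient \beta_1 on the income term. Any constant rescaling of the index, as can arise in a random-effects model, multiplies both derivatives by the same factor, so the comparison is unaffected. In levels, the effect per euro is the same at every income, apart from the density term. In logs, it shrinks as income rises, and \beta_1 is instead read in terms of proportional changes. One practical note: -sum incomescaled- earlier in the thread reports a minimum of 0. If -lnincome- is built from the same variable, ln(0) is missing in Stata, so those observations would drop out of a log specification.

                        ```latex
                        \begin{aligned}
                        \text{levels: } \eta = \beta_1\,\text{income} + \dots &\;\Rightarrow\; \frac{\partial \Pr}{\partial\,\text{income}} = \phi(\eta)\,\beta_1\\
                        \text{logs: } \eta = \beta_1 \ln(\text{income}) + \dots &\;\Rightarrow\; \frac{\partial \Pr}{\partial\,\text{income}} = \phi(\eta)\,\frac{\beta_1}{\text{income}}\\
                        \text{logs, 1\% rise in income: } \Delta \Pr &\approx \phi(\eta)\,\beta_1 \ln(1.01) \approx 0.01\,\phi(\eta)\,\beta_1
                        \end{aligned}
                        ```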
                        Code:
                        c.age c.age#c.age
                        when you can type it more efficiently:
                        Code:
                        c.age##c.age
                        ?
                        Kind regards,
                        Carlo
                        (Stata 19.0)
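                        What the factor-variable notation expands to, and why it matters for -margins-. Both spellings enter age and its square as linked terms, so -margins, dydx(age)- differentiates through both of them. By contrast, a squared variable created by hand with -generate- would be treated as an unrelated regressor, and its contribution would be left out of dydx(age). The sketch below writes the predicted probability as \Phi(\eta), with \beta_1 and \beta_2 as illustrative names for the coefficients on c.age and c.age#c.age.

                        ```latex
                        \begin{aligned}
                        \texttt{c.age\#\#c.age} &\equiv \texttt{c.age} + \texttt{c.age\#c.age} \;\Rightarrow\; \eta = \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \dots\\
                        \frac{\partial \Pr}{\partial\,\text{age}} &= \phi(\eta)\,(\beta_1 + 2\beta_2\,\text{age})\\
                        \text{age}^{*} &= -\frac{\beta_1}{2\beta_2} \quad \text{(age at which the index reaches its peak or trough)}
                        \end{aligned}
                        ```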



                        • #13
                          Ah, OK, so I should not use the p-value approach. Why is that? Is it because the inclusion of variables should not be dictated merely by a statistical test?

                          I had been wondering whether I should conduct a statistical test for functional form.
                          This would determine whether ln(income) would be better, because when I plotted histograms in #4, I thought ln(income) was better as it is not skewed and is closer to a normal distribution.
                          On the other hand, given that the literature uses income, not ln(income), perhaps I should stick to income.


                          Code:
                           c.age##c.age
                          Thank you for the suggested coding, I will use that from now on - much more efficient!
                          Last edited by Rose Simmons; 18 Apr 2017, 04:09.



                          • #14
                            Originally posted by Clyde Schechter
                            "Would the interpretation be that a 1000 euro unit increase in income is associated with an increase in the probability of saving by 0.11 percentage points?"
                            Yes.
                            Clyde Schechter, would the interpretation of the AMEs in #6 be similar for discrete variables? For example:

                            Code:
                            . margins, dydx(male) 
                            
                            Average marginal effects                        Number of obs     =     12,951
                            Model VCE    : OIM
                            
                            Expression   : Pr(saving=1), predict(pr)
                            dy/dx w.r.t. : male
                            
                            ------------------------------------------------------------------------------
                                         |            Delta-method
                                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                    male |   .1167403   .0166912     6.99   0.000      .084026    .1494545
                            ------------------------------------------------------------------------------
                            The figure above is made-up, but say the AME for male was 0.1167403, would this be interpreted as:
                            For males, compared to females, the probability of saving is 11.67 percentage points higher?

                            Many thanks



                            • #15
                              The figure above is made-up, but say the AME for male was 0.1167403, would this be interpreted as:
                              For males, compared to females, the probability of saving is 11.67 percentage points higher?


                              Yes.
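                              For a binary regressor, the AME that matches this reading is the average discrete change. The sketch below assumes male is coded 0/1. \widehat{\Pr}_i(\text{male}=m) is observation i's predicted probability with male set to m and all other covariates left at their observed values. -margins- reports this discrete change when the variable enters the model in factor-variable form as i.male. If it enters as plain male, dydx(male) returns the average derivative instead, which is usually close but not identical.

                              ```latex
                              \begin{aligned}
                              \widehat{\mathrm{AME}}_{\text{male}} &= \frac{1}{N}\sum_{i=1}^{N}\left[\widehat{\Pr}_i(\text{male}=1) - \widehat{\Pr}_i(\text{male}=0)\right]\\
                              0.1167403 \times 100 &\approx 11.67\ \text{percentage points}
                              \end{aligned}
                              ```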

