
  • "Weird-looking" average marginal effects of a two-part model (twopm)

    Dear Stata group,

    I am struggling with the marginal effects of the very helpful package twopm by Belotti and Deb (https://econpapers.repec.org/softwar...de/s457538.htm), which seems to be the right kind of model for our type of data: the dependent variable Y has a large proportion of zeros (the overall median is 0) and is otherwise continuous and extremely right-skewed. I am trying to estimate the effects of an experiment with two treatment arms and pre- and post-treatment observations.


    In the example below, I show one cross-sectional regression with a logit first part and a linear regression of log(Y) on the positive observations (Y > 0) as the second part. I ran the margins command with the “duan” option, which uses Duan's smearing retransformation to obtain fitted values on the raw scale, as recommended in the twopm help file.

    What puzzles me is why the average marginal effects are all insignificant, especially since one of the predictors (l1_x1) is significant in both the logit part and the linear part. Would someone be able to explain this? When I run the same regression after first transforming the Y-variable into a ratio, the average marginal effects look much more “sensible”: l1_x1, for example, is then highly significant. My interest is mostly in the treatment effects, but the insignificance of l1_x1 makes me suspicious of the reported average marginal effects in general.

    For the example below, I ran the following commands:

    twopm y treatment2 treatment3 l1_x1 l1_x2 if t == 210, firstpart(logit) secondpart(regress, log) vce(r)

    margins, predict(duan) dydx(*)
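
    If I understand the help file correctly, predict(duan) is roughly equivalent to the by-hand computation below (a sketch only, not twopm's internal code; d_y, ln_y, and the other generated variables are names I am making up here). It reproduces the point prediction, not margins' delta-method standard errors:

    Code:
    * By-hand version of the duan prediction (sketch, not twopm's internal code)
    gen byte   d_y  = (y > 0) if !missing(y)     // indicator for the binary part
    gen double ln_y = ln(y)   if y > 0           // log of the positive outcomes

    logit d_y treatment2 treatment3 l1_x1 l1_x2 if t == 210, vce(robust)
    predict double p_pos if t == 210, pr         // Pr(y > 0 | x)

    regress ln_y treatment2 treatment3 l1_x1 l1_x2 if t == 210, vce(robust)
    predict double xb2 if t == 210, xb           // linear index on the log scale
    predict double res if e(sample), residuals
    gen double eres = exp(res)
    summarize eres if e(sample), meanonly
    scalar smear = r(mean)                       // Duan's smearing factor

    * combined expected value on the raw scale: Pr(y>0|x) * exp(x'b) * smear
    gen double yhat_duan = p_pos * exp(xb2) * smear if t == 210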

    On related notes: if I transform the Y-variable myself by taking logs and leaving the zeros at 0 before running twopm, how do I calculate the correct average marginal effects? Can someone point me to other methods that might be appropriate for our type of data? What is unfortunately still missing from twopm is an option to account for individual-level variance, i.e., to make use of the panel structure. I am pondering a square- or cubic-root transformation of Y and of the continuous predictors, followed by xtreg, fe (a sketch follows below), but I hesitate because the interpretation of the coefficients then becomes less straightforward, and this approach would give me little insight into the processes that generate the binary part and the continuous part.
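
    Concretely, the transformation idea I am pondering would look something like this (just a sketch; "id" stands for my panel identifier, and Y is nonnegative here, so the cube root leaves the zeros at 0):

    Code:
    * Cube-root transform of the skewed outcome, then fixed effects (sketch)
    xtset id t
    gen double y_cbrt = y^(1/3)                  // zeros stay 0; compresses the right tail
    xtreg y_cbrt treatment2 treatment3 l1_x1 l1_x2, fe vce(cluster id)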

    Any advice would be highly appreciated. My apologies for the formatting of the Stata output. I would have included a picture, had the posting guidelines (this is my first post) not warned me off doing that.

    Kind regards,
    Arne



    twopm y treatment2 treatment3 l1_x1 l1_x2 if t == 210, firstpart(logit) secondpart(regress, log) vce(r)

    Fitting logit regression for first part:

    Iteration 0: log pseudolikelihood = -4898.9361
    Iteration 1: log pseudolikelihood = -4828.0511
    Iteration 2: log pseudolikelihood = -4825.3601
    Iteration 3: log pseudolikelihood = -4825.3592
    Iteration 4: log pseudolikelihood = -4825.3592

    Fitting OLS regression for second part:

    Two-part model
    ------------------------------------------------------------------------------
    Log pseudolikelihood = -10340.873               Number of obs     =      7141

    Part 1: logit
    ------------------------------------------------------------------------------
                                                    Number of obs     =      7141
                                                    Wald chi2(4)      =     72.80
                                                    Prob > chi2       =    0.0000
    Log pseudolikelihood = -4825.3592               Pseudo R2         =    0.0150

    Part 2: regress_log
    ------------------------------------------------------------------------------
                                                    Number of obs     =      3996
                                                    F( 4, 3991)       =      7.56
                                                    Prob > F          =    0.0000
                                                    R-squared         =    0.2062
                                                    Adj R-squared     =    0.2054
    Log likelihood = -5515.5138                     Root MSE          =    0.9627
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    logit        |
      treatment2 |   .1120826   .0583787     1.92   0.055    -.0023375    .2265027
      treatment3 |   .3461758   .0591399     5.85   0.000     .2302636    .4620879
           l1_x1 |   .0014139   .0002617     5.40   0.000     .0009011    .0019268
           l1_x2 |  -.0001279   .0000254    -5.04   0.000    -.0001777   -.0000781
           _cons |  -.0154538   .0490313    -0.32   0.753    -.1115534    .0806458
    -------------+----------------------------------------------------------------
    regress_log  |
      treatment2 |   .0162864   .0394817     0.41   0.680    -.0610962     .093669
      treatment3 |   .0113214   .0378215     0.30   0.765    -.0628074    .0854502
           l1_x1 |   .0011357   .0003257     3.49   0.000     .0004973    .0017742
           l1_x2 |   .0001861   .0000668     2.79   0.005     .0000552    .0003169
           _cons |   4.921944   .0564761    87.15   0.000     4.811253    5.032635
    ------------------------------------------------------------------------------

    . margins, predict(duan) dydx(*)

    Average marginal effects                        Number of obs     =     7,141
    Model VCE    : Robust

    Expression   : twopm combined expected values, predict(duan)
    dy/dx w.r.t. : treatment2 treatment3 l1_x1 l1_x2

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      treatment2 |    872.558   3405.784     0.26   0.798    -5802.657    7547.773
      treatment3 |   1604.371   7080.782     0.23   0.821    -12273.71    15482.45
           l1_x1 |   37.03503   101.7666     0.36   0.716    -162.4238    236.4939
           l1_x2 |   4.729553   12.28636     0.38   0.700    -19.35126    28.81037
    ------------------------------------------------------------------------------




  • #2
    Arne: A lot of things could be going on here, so it is hard to diagnose from a distance. My first instinct would be to respecify the second part as something like
    Code:
    ...secondpart(glm, family(gamma) link(log))
    and see how the results might differ from what you've obtained.
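
    Spelled out against the call in #1, that would be something like the code below. With a log link, the predictions are already on the raw scale, so no smearing retransformation is involved in the margins afterwards:

    Code:
    twopm y treatment2 treatment3 l1_x1 l1_x2 if t == 210, ///
        firstpart(logit) secondpart(glm, family(gamma) link(log)) vce(robust)
    margins, dydx(*)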
