  • margins coeflegend difficult to understand with 2-way interaction and at()

    Dear Statalisters,

    I am trying to do a number of things with this regression:
    1. run a diff-in-diff
    2. in a logit model
    3. in a survival analysis framework (discrete time models)
    I used Jenkins' lectures (particularly no. 6) as a starting point and have read all the relevant threads and Stata articles that I know of.
    This is my code:

    Code:
    logit event treat##after x1 x2 period period2 , cluster(id)    
    margins treat, dydx(after) pwcompare(cimargins effects) at(period==(1(1)10)) at(bcoh==1980) coeflegend post
    where:
    split: dummy variable ==1 if the relationship ends after t periods (years)
    treat: dummy ==1 if the individual lives in a state affected by the policy, 0 otherwise
    after: dummy==1 if the individual got married after the policy started
    x1 & x2: other independent variables
    period: marriage duration
    period2: period squared

    It's an unbalanced panel in which each marriage has as many rows as the years of marriage. Split==0 except on the last year. Other than split, period and period2, the variables are constant.
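
The person-period layout described above (one row per year of marriage, with the event indicator equal to 1 only in the final year of a marriage that ends) can be sketched on toy data. This is a minimal illustration, not the poster's actual data preparation: the variables duration and divorced and the three example marriages are hypothetical.

```stata
* Toy spell-level data: one row per marriage (hypothetical values)
clear
input id duration divorced
1 3 1
2 2 0
3 4 1
end

* Expand to one row per year of marriage and number the years
expand duration
bysort id: generate period = _n

* The event is 1 only in the last observed year, and only if the
* marriage actually ended (censored marriages stay at 0 throughout)
generate byte event = (divorced == 1) & (period == duration)

list id period event, sepby(id)
```

A marriage that is censored (id 2 here) contributes only zeros, which is why the panel is unbalanced by construction.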

    I have two issues:

    I. I am not managing to plot this with marginsplot (there's an error)
    II. When I do it manually, adding -coeflegend post-, I don't understand the labels


    Part of the puzzling output is:
    Code:
    -----------------------------------------------------------------------------------
                      |   Contrast                 Unadjusted
                      |      dy/dx  Legend
    ------------------+----------------------------------------------------------------
    1.after           |
            _at#state |
    ( 1 1) vs ( 1 0)  |  -.0527861  _b[1.after:1vs1bn._at#1vs0bn.state]
    ( 2 0) vs ( 1 0)  |  -.0123686  _b[1.after:2vs1._at#0vs0.state]
    ( 2 1) vs ( 1 0)  |  -.0624848  _b[1.after:2vs1._at#1vs0bn.state]
    ( 3 0) vs ( 1 0)  |  -.0248322  _b[1.after:3vs1._at#0vs0.state]
    ( 3 1) vs ( 1 0)  |  -.0720985  _b[1.after:3vs1._at#1vs0bn.state]
    ( 4 0) vs ( 1 0)  |  -.0371947  _b[1.after:4vs1._at#0vs0.state]
    Thanks to anyone who can provide help.
    Last edited by Fabio Martinenghi; 22 Mar 2019, 05:14.

  • #2
    Your -logit- command is not properly set up to work with -margins- and the estimates you are getting from it are incorrect, so it is just as well you cannot plot them. The problem arises from the quadratic term of period: by constructing it as a separate variable, period2, -margins- has no way to know that period2 is the square of period, so it is calculating things as if the two variables had nothing to do with each other. Specify the quadratic with factor-variable notation instead:

    Code:
    logit event treat##after x1 x2 c.period##c.period , cluster(id)  
    margins treat, dydx(after) pwcompare(cimargins effects) at(period==(1(1)10)) at(bcoh==1980) coeflegend post
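
The difference between the two specifications can be seen on any dataset. The sketch below uses the nlsw88 example dataset that ships with Stata, not the poster's data; union and tenure merely stand in for event and period, and tenure2 is a hypothetical hand-made square.

```stata
sysuse nlsw88, clear

* Hand-made square: -margins- treats tenure2 as an unrelated covariate,
* so the marginal effect of tenure ignores the quadratic term
generate tenure2 = tenure^2
logit union tenure tenure2
margins, dydx(tenure)

* Factor-variable notation: -margins- knows the second term is
* tenure squared and differentiates through both terms
logit union c.tenure##c.tenure
margins, dydx(tenure)
```

The two -logit- fits have the same coefficients; only the -margins- results differ, because in the second fit the derivative includes the contribution of the squared term.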
    By the way, your description is a bit confusing. You do not explain the outcome variable event, but you do explain a variable, split, which does not appear in the model. Is split another name for event? Also, if, as I am inferring from your use of -cluster(id)-, you have repeated observations on the same people over time, it is probably not appropriate to use -logit-. You would need to use an analysis that properly takes into account the repeated observations (-cluster(id)- gets you part of the way there, but is not usually sufficient) such as -xtlogit- or -melogit-.

    It is also unclear to me what you are trying to calculate with your -margins- command. Are you sure you want both -dydx(after)- and -pwcompare()-? It is legal, but it isn't something people often seek to do.

    It's not clear to me why you want the -coeflegend- option. What are you going to do with that information? It isn't very often that one needs that after -margins-.
    Last edited by Clyde Schechter; 22 Mar 2019, 13:00.


    • #3
      Thank you Clyde for your thorough response.
      1. I am sorry, split is event. I was trying to make things clearer and I ended up making a mess;
      2. I will investigate the xtlogit command. So far, in survival analysis, I have seen both Jenkins and Hernan use logit. It might have to do with the fact that the panel is very unbalanced, since each marriage has only as many rows as its duration in years (period). But this is just my guess;
      3. Re , I actually read that in this thread here on statalist, but I'm happy to change it if redundant;
      4. Coming to -coeflegend-, I was trying to obtain manually what I could not get automatically through -marginsplot-. I couldn't find any useful information on that error (something to do with labels), so I tried this manual route, which required me to locate the names under which the estimates are stored. I'm also happy to use -marginsplot-, if possible;
      5. Most importantly, my aim is to estimate the Diff-in-Diff parameter for each period. Say I am only interested in the first 10 years of marriage (as in the code above) and how the policy impacts those, then I want to estimate for each year (period) what is the marginal effect of the policy on the divorce probability. From my understanding, this should result in only one curve; is that correct?
      Now, I tried to amend it following your instructions
      Code:
      qui logit split state##after x1 x2 c.period##c.period , cluster(id)    
      margins state, dydx(after)  at(period==(1(1)10)) at(x1==1980) pwcompare(cimargins effects)
      but when running -marginsplot- I still get the error:
      ``_term not labelled''
      Furthermore, I am not able to interpret the output of -margins-:
      Code:
      -----------------------------------------------------------------------------------
                        |   Contrast Delta-method    Unadjusted           Unadjusted
                        |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------+----------------------------------------------------------------
      1.after           |
              _at#state |
      ( 1 1) vs ( 1 0)  |   .0423558   .0306932     1.38   0.168    -.0178018    .1025134
      ( 2 0) vs ( 1 0)  |   .0121108   .0027203     4.45   0.000      .006779    .0174425
      ( 2 1) vs ( 1 0)  |   .0512409   .0302538     1.69   0.090    -.0080554    .1105372
      ( 3 0) vs ( 1 0)  |   .0224256   .0050808     4.41   0.000     .0124674    .0323838
      ( 3 1) vs ( 1 0)  |   .0587271   .0299599     1.96   0.050     6.82e-06    .1174474
      ( 4 0) vs ( 1 0)  |    .031071   .0070888     4.38   0.000     .0171773    .0449647
      ( 4 1) vs ( 1 0)  |   .0649458   .0297715     2.18   0.029     .0065947    .1232968
      I also tried to not include the -, pwcompare(cimargins effects)- options.

      Code:
      qui logit split state##after x1 x2 c.period##c.period , cluster(id)    
      margins state, dydx(after)  at(period==(1(1)10)) at(x1==1980)
      marginsplot
      -marginsplot- worked this time, but displayed two plots, one corresponding to (state==1 & after==1) and one to (state==0 & after==1). The output is also readable:
      Code:
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      1.after      |
         _at#state |
              1 0  |  -.1376984   .0292226    -4.71   0.000    -.1949736   -.0804231
              1 1  |  -.0953426   .0173397    -5.50   0.000    -.1293277   -.0613574
              2 0  |  -.1255876    .027068    -4.64   0.000    -.1786399   -.0725352
              2 1  |  -.0864574    .015897    -5.44   0.000     -.117615   -.0552998
              3 0  |  -.1152728   .0252481    -4.57   0.000    -.1647581   -.0657874
      1. Do these estimates correspond to the ones I am aiming at? (without accounting for the improvements I may be able to get by using xtlogit or perfecting my model)
      2. Since there is a single parameter of interest (the DiD), shouldn't I want (and get) a single connected line plotted?
      Thank you again, this thread will resolve many of my concerns, so that I can be confident the code is actually doing what I want.


      • #4
        Based on your explanation, I think I would revise the underlying model. While I am usually advising people not to generate their own interaction terms to reflect DID, in this case I think it will simplify the use of -margins-.

        I am not 100% sure I understand your data structure and variables, so please think this through before you use it. My understanding is that each observation represents a year in the life of a person (couple?) The data span several states. Some states have the policy of interest and others don't during some or all of the study. The variable treat encodes (1/0) those observations where the person's state of residence has the policy in effect that year. You don't mention it, but I assume there is some variable that identifies whether or not a person is married in that year. This is relevant because, presumably, policy can only affect people if they are married. I will call that variable married, coded 1 for married, 0 for not married. So the DID variable has to distinguish observations where a person is married and lives in a state that has the policy (1) from all other observations (0). Note that this will be a generalized diff-in-diff analysis because not everybody in your study, I assume, gets married in the same year. I assume you also have a person identifier variable, which I will call id.


        Code:
        gen did = 1.treat#1.married
        xtset id period
        xtlogit split i.did##c.period##c.period x1 x2, fe vce(cluster id)
        margins, dydx(did) at(period = (1(1)10))
        marginsplot, xdimension(period)
        That enables you to estimate the policy effect as a quadratic function of period, and the plot you get will graph that.


        • #5
          Thank you Clyde, I have followed your recommendation on how to model the interaction and it now works. I am still cautious about using a full-blown panel data approach but am looking into it. The econometric model is far from optimal, and I am aware of that. But at least I am more confident on the programming side.

          For the sake of clarity, my data structure is:
          • There are several states;
          • Some states are affected by a policy at time T, while the rest are not;
          • I have data on the year of start and year of end of marriages (hence duration in years too) of a representative sample;
          • These marriages start both before and after the policy;
          • In survival analysis you only use positive durations (here those which last at least 1 year), hence duration>=1 for any marriage i;
          • Yes, there is a marriage dummy, ==1 if i is married in year t, which is constructed from the year-of-start and year-of-end of marriages;
          • Marriage duration is censored due to either end of panel (administrative censoring) or death of one of the partners;
          • The data is by individual, although I could match the couples in the future.
          • Correct, not everyone is married in the same year.
          PS: following a discussion with my supervisor,
          I ended up specifying the model this way:
          Code:
          logit event after##c.period##c.period  state##c.period##c.period i.did##c.period##c.period  x1 x2, cluster(id)
          margins, dydx(did) at(period = (1(1)30)) at(x1=1980)
          marginsplot
          This is to truly use the diff-in-diff identification strategy. Otherwise (that is, without including the components of the interaction, after and state), the effect would be identified via "selection on observables".
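
A specification of this shape can be tried end to end on simulated data before running it on real data. The following is only a sketch under stated assumptions: the data-generating process, the sample size, the seed, and the helper variables xb and nev are all invented for illustration, and x1 and x2 are left out. It is meant to be run from a do-file (the /// line continuation does not work interactively).

```stata
* Simulated marriages: time-constant state, after, and their product did
clear
set seed 2019
set obs 3000
generate id = _n
generate byte state = runiform() < 0.5
generate byte after = runiform() < 0.5
generate byte did   = state * after

* Up to 10 yearly rows per marriage
expand 10
bysort id: generate period = _n

* Hypothetical discrete-time hazard on the logit scale
generate xb = -2.5 + 0.15*period - 0.01*period^2 ///
              + 0.3*state + 0.2*after - 0.6*did
generate byte event = runiform() < invlogit(xb)

* Keep each marriage only up to and including its first event
bysort id (period): generate nev = sum(event)
drop if nev > 1 | (nev == 1 & event == 0)

* Same structure as the model above, without x1 and x2
logit event i.after##c.period##c.period i.state##c.period##c.period ///
      i.did##c.period##c.period, vce(cluster id)

* Effect of did on the event probability at each marriage duration
margins, dydx(did) at(period = (1(1)10))
marginsplot
```

Because did enters as a single indicator, -marginsplot- draws one curve of the estimated effect against period.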
