  • Testing average marginal effects across samples

    Hello everyone,

    For my Master's thesis, I would like to compare average marginal effects across samples. I am looking at the association between stunting in children and parental education, with my dependent variable being stunting (a dummy) and my independent variable being the number of years the most educated parent went to school. The effect should be compared for boys and girls. In addition, I report odds ratios for each regression separately, without testing for differences (such a test would suffer from both conceptual and econometric problems). To this end, I use the following code:

    *******************************************
    svy, subpop(male_D): logit stunting highest_education_years, or
    margins, dydx(*) post
    estimates store a1


    svy, subpop(female_D): logit stunting highest_education_years, or
    margins, dydx(*) post
    estimates store a2

    suest a1 a2

    test [a1_stunting]highest_education_years = [a2_stunting]highest_education_years
    ******************************************

    However, this only works if I exclude the margins commands, i.e. if I compare odds ratios across samples. As I mentioned, I would prefer comparing average marginal effects (see Mood 2010 for an explanation of why AMEs are preferable). With the margins commands, the code stops at the suest command because e(b) and e(V) cannot be retrieved. Essentially, I only need suest because I don't really know how Stata would label the equations without it (it's not simply "a1" and "a2", although I use estimates store). Does anyone know how to solve this? (Please note that I would like to avoid using interaction terms because I find them difficult to interpret in a non-linear setting. I am therefore looking specifically for a solution based on separate regressions.)
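
    For what it is worth, one thing I did to at least see how Stata labels the stored coefficients is to restore the stored results and list the coefficient vector; a minimal check, assuming a1 has been stored as above:

    *******************************************
    * inspect the equation and column names of the stored margins results
    estimates restore a1
    matrix list e(b)    // the column stripe shows how the coefficients are labelled
    *******************************************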


    Alternatively, I tried to do the testing by hand using z = (b1 - b2)/sqrt(Var(b1) + Var(b2)), with b1 being the education effect (AME) from model a1 and b2 the one from model a2:

    ******************************************
    svy, subpop(male_D): logit stunting highest_education_years, or
    margins, dydx(*) post
    mat A = J(2,2,.)                   // row 1: AMEs, row 2: standard errors
    mat A[1,1] = el(e(b),1,1)
    mat A[2,1] = sqrt(el(e(V),1,1))

    svy, subpop(female_D): logit stunting highest_education_years, or
    margins, dydx(*) post
    mat A[1,2] = el(e(b),1,1)
    mat A[2,2] = sqrt(el(e(V),1,1))

    *test statistic
    mat z = J(1,1,.)
    mat z[1,1] = (A[1,1]-A[1,2])/sqrt(A[2,1]^2+A[2,2]^2)
    matrix list z
    display 2*(1-normal(abs(z[1,1]))) //This line should give me the p-value
    ****************************************
    The code works, but I assume here that z follows a standard normal distribution (however, the sample size is very large, so this might be a minor problem). In addition, I am not completely sure whether the formula for the test statistic is correct. You have probably noticed that I use the svy prefix, and Stata would usually calculate an adjusted Wald test in such a case. Does anyone know what the respective formula would have to look like?
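
    For reference, my reading of the svy documentation (which may well be wrong, which is partly why I am asking) is that the default adjusted Wald statistic is F = ((d - k + 1)/(k*d))*W, referred to an F(k, d - k + 1) distribution, with W the Wald statistic, k the number of constraints, and d the design degrees of freedom (number of PSUs minus number of strata). For a single constraint (k = 1) this would reduce to comparing z^2 with F(1, d), i.e. comparing z with a t distribution on d degrees of freedom. A sketch of what that would look like, assuming the matrix z from the code above is still in memory:

    ******************************************
    * refit one of the svy models just to recover the design information
    svy, subpop(male_D): logit stunting highest_education_years
    local d = e(N_psu) - e(N_strata)    // design degrees of freedom

    * design-based p-value for a single constraint: t with d df instead of the normal
    display 2*ttail(`d', abs(z[1,1]))
    ******************************************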


    Thank you very much in advance!

    Kind regards,
    Christian

  • #2
    If you cite an article you need to give the full reference. We are an interdisciplinary list, so citations you think are common are in all likelihood completely obscure in other (sub-(sub-))disciplines. I assume you mean Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67-82. In that case I disagree with the author.

    Her main claim is that you cannot compare odds ratios across groups because the scale is not identified. This is simply not true. An odds is in your case the expected number of stunted children per non-stunted child. An odds ratio is the ratio of the odds of being stunted for a child whose most educated parent has x+1 years of education over the odds of being stunted for a child whose most educated parent has x years of education. So the odds ratio has a known scale and interpretation, which you can compare across groups. A bigger question is whether you can give such a comparison a causal interpretation. Neither odds ratios nor AMEs solve that problem.

    Another beef I have with AMEs is that if you only report those, you are actually estimating a linear probability model in disguise. Disguising your actual model is bad practice. If you want to estimate a linear probability model, then just be up front about it and do it in one go: estimate a simple linear regression (with robust standard errors) on your binary dependent variable. This would not be my choice, but I think it is a lot more honest than estimating a logit and then computing AMEs.

    Below is an example of how to perform the different types of comparisons discussed above in Stata.

    Code:
    // load and prepare some example data
    sysuse nlsw88, clear
    
    // compare odds ratios
    logit union c.grade##i.south, or
    
    // compare AMEs
    margins, dydx(grade) over(south) post
    test 0.south = 1.south
    
    // linear probability model
    reg union c.grade##i.south, vce(robust)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Hello Maarten,

      Thank you very much for your quick reply and the code you provided. My apologies for omitting the full reference. I am curious why exactly you disagree with the Mood paper. The way I understand her (and Allison (1999, see full reference below)) is that unobserved heterogeneity could cause misleading results because it influences the logit coefficients (and their antilog, i.e. the odds ratios). If the unobserved heterogeneity differs between samples, you cannot know whether observed differences are in fact true differences in coefficients. Please correct me if I misunderstood the concept.

      I do agree with your criticism regarding AMEs. The trouble I have with comparing odds ratios (apart from what is claimed by Mood (2010)) is that the interpretation of the difference between odds ratios is not clear to me. Suppose I find that the coefficient for schooling is positive in both samples and the corresponding odds ratio larger in the sample for boys than in the sample for girls. What would that imply? If you want to translate an odds ratio into an effect on the probability of being stunted, you need to know the baseline odds. In my sample, those tend to be larger for boys than for girls.
      My strategy would, therefore, be to report both odds ratios and AMEs but only use the AMEs for testing. That way one does not hide interesting information from the reader but can also report a value (the AME) that is easier to understand in terms of cross-sample comparisons. Do you find such an approach sensible?
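
      To make the baseline-odds point concrete, here is a purely hypothetical calculation using p = odds/(1 + odds); the baseline odds of 0.40 (boys) and 0.25 (girls) and the odds ratio of 1.05 are made up for illustration only. The same odds ratio translates into different changes in the predicted probability depending on the baseline odds:

      ******************************************
      * made-up baseline odds, odds ratio of 1.05 for one more year of education
      display "boys:  p0 = " 0.40/(1+0.40) "  p1 = " (1.05*0.40)/(1+1.05*0.40)
      display "girls: p0 = " 0.25/(1+0.25) "  p1 = " (1.05*0.25)/(1+1.05*0.25)
      ******************************************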


      Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods and Research, 28, 186–208.
      Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67-82.

      • #4
        Originally posted by Christian Bommer View Post
        The way I understand her (and Allison (1999, see full reference below)) is that unobserved heterogeneity could cause misleading results because it influences the logit coefficients (and their antilog, i.e. the odds ratios). If the unobserved heterogeneity differs between samples, you cannot know whether observed differences are in fact true differences in coefficients. Please correct me if I misunderstood the concept.
        The fact that unobserved heterogeneity influences something is not enough for it to be a problem. You also need to show that, because of that influence, you no longer measure what you want to measure. They want to interpret their results in terms of an effect on a latent variable, and the scale of the latent variable is defined by the variance of the error term. So if that variance differs across groups, your dependent variable will be measured on different scales, and the effects cannot be compared. The easy solution is to interpret things in terms of odds ratios. As I showed, the scale is then defined and comparable across groups. You should be careful not to give them a causal interpretation, but other than that there are no further problems.
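
        In the usual latent-variable notation (my shorthand, not anything taken from Mood or Allison), the scale issue can be written as

        \[
        y^{*} = x\beta + \sigma\varepsilon, \qquad \varepsilon \sim \text{standard logistic}, \qquad y = \mathbf{1}\{y^{*} > 0\}
        \;\Longrightarrow\; \Pr(y = 1 \mid x) = \Lambda\!\left(\tfrac{x\beta}{\sigma}\right).
        \]

        The logit therefore only identifies \(\beta/\sigma\). If \(\sigma\) (which absorbs the unobserved heterogeneity) differs between boys and girls, then \(\beta/\sigma\) can differ even when \(\beta\) is identical; that is the sense in which latent-variable coefficients are not comparable. The odds and odds ratios implied by \(\Lambda(x\beta/\sigma)\), on the other hand, remain well-defined quantities in their own right.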

        Originally posted by Christian Bommer View Post
        The trouble I have with comparing odds ratios (apart from what is claimed by Mood (2010)) is that the interpretation of the difference between odds ratios is not clear to me. Suppose I find that the coefficient for schooling is positive in both samples and the corresponding odds ratio larger in the sample for boys than in the sample for girls. What would that imply?
        It means that the odds ratio is larger for boys than for girls. I suppose the bigger problem is that you have trouble with odds and odds ratios. An odds is just the expected number of stunted children per non-stunted child. So instead of dividing the number of stunted children by the total number of children (to get the probability), you divide them by the number of non-stunted children (to get the odds). The odds ratio is just the ratio of the odds, so an odds ratio of 1.05 means that the odds for someone with one more year of education are 1.05 times as large (or, equivalently, 5% larger).
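
        A tiny numerical illustration (the counts are made up, purely to show the arithmetic):

        Code:
        // 20 stunted and 80 non-stunted children, 100 children in total
        display "probability = " 20/100       // stunted / all children  = .2
        display "odds        = " 20/80        // stunted / non-stunted   = .25
        display "odds * 1.05 = " 1.05*20/80   // odds after one more year of education = .2625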

        Originally posted by Christian Bommer View Post
        If you want to translate an odds ratio into an effect on the probability of being stunted
        The trick with interpreting odds ratios is to not translate them to probabilities, but to interpret them in terms of odds. So you should not want to translate them to probabilities.

        Originally posted by Christian Bommer View Post
        you need to know the baseline odds. In my sample, those tend to be larger for boys than for girls.
        The baseline odds is indeed useful, but not because we deal with odds; it is because we measure the effect in terms of ratios.

        Originally posted by Christian Bommer View Post
        My strategy would, therefore, be to report both odds ratios and AMEs but only use the AMEs for testing. That way one does not hide interesting information from the reader but can also report a value (the AME) that is easier to understand in terms of cross-sample comparisons. Do you find such an approach sensible?
        If you show the odds ratios and interpret them correctly, then there is no need to waste space on also showing AMEs. In practice, the problem is more often that the odds ratios are shown but never discussed in the text (or just mentioned but not really discussed). That obviously does not help. I would either stick to odds ratios only or use a linear probability model.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          Ok, thanks for the clarification, Maarten!

          • #6
            Maarten has expressed the odds ratio interpretation approach very clearly and helpfully. I would also note that there is a diversity of opinion about the usefulness of interpretations in terms of odds ratios. For a view at the other extreme, have a look at "Log odds and ends" by Edward C. Norton, NBER Working Paper 18252, http://www.nber.org/papers/w18252

            • #7
              Thanks for the hint!
