
  • Margin after logit

    I would like to understand how to interpret results from -margins- and to get your opinion on my interpretation of them.

    About my dataset:
    Dependent variable: carown -- binary, 1 if the person owns a car, 0 if not

    Independent variables:
    p_educ0 -- continuous variable showing a person's probability of having less than high school education, so this can take any value between 0 and 1
    p_educ1 -- continuous variable showing a person's probability of having a high school degree, so this can take any value between 0 and 1
    p_educ2 -- continuous variable showing a person's probability of having some college but no degree, so this can take any value between 0 and 1
    p_educ3 -- continuous variable showing a person's probability of having a college degree, so this can take any value between 0 and 1
    p_educ4 -- continuous variable showing a person's probability of having some grad level education, so this can take any value between 0 and 1
    p_educ5 -- continuous variable showing a person's probability of having a grad degree or higher, so this can take any value between 0 and 1

    p_educ0+p_educ1+p_educ2+p_educ3+p_educ4+p_educ5=1 for each observation
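    (For what it's worth, I check this constraint with something along these lines; this assumes the p_educ* variables are stored contiguously in the dataset:)

    Code:
    * sanity check: the six education probabilities sum to 1 for every observation
    egen double psum = rowtotal(p_educ0-p_educ5)
    assert abs(psum - 1) < 1e-6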

    n = 1,285

    I am interested in the interpretation of the results, not in the meaningfulness of the model:

    logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4

    How do I interpret the results from the following three lines of code?

    1. margins

    2. margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)

    3. margins, at(p_educ0=0 p_educ1=1 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)


    Can I interpret #1 as the predicted probability of car ownership in my dataset?
    Can I interpret #2 as the predicted probability of car ownership if everyone in my dataset had less than high school education or no schooling at all?
    Similarly, can I interpret #3 as the predicted probability of car ownership if everyone in my dataset had a high school degree?

    Further, what does it mean if the result in #2 or #3 is not statistically significant?

    Lastly, how can I test whether the predicted probability in #2 is statistically significantly different from the predicted probability in #3? My goal ultimately with this exercise is to answer the question: does level of education impact car ownership?

    I understand that the model as presented is not the correct way to model the relationship between car ownership and education. I request that you ignore this fact in answering my questions.

    Thank you!



  • #2
    Added: I don't understand how you can run -margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)- because p_educ5 is not included in the regression model. -margins- will reject that. You have to omit p_educ5 here. Now, this means that, in theory, you are looking at predicted car ownership probabilities with p_educ5 unconstrained. But as the model you show has no other variables, and p_educ5 does not figure in the calculation of the predicted probabilities, this is OK. However, if the model included variables other than the p_educ* variables, and if any of those other variables were correlated with p_educ5, then you would be getting incorrect results because the effect of the observed values of p_educ5 would "infiltrate" the -margins- results through those other variables.


    Can I interpret #1 as the predicted probability of car ownership in my dataset?
    Yes.
    Can I interpret #2 as the predicted probability of car ownership if everyone in my dataset had less than high school education or no schooling at all?
    Similarly, can I interpret #3 as the predicted probability of car ownership if everyone in my dataset had a high school degree?
    Well, no, not exactly. You can interpret #2 as the predicted probability of car ownership if everyone in your data set had a probability of 1 of having less than high school education or no schooling at all. You don't explain much about how these p_educ* variables were assessed. I'm more accustomed to seeing data sets in which each person is deterministically classified as having a given level of education. Perhaps these p_educ* variables are themselves predictions of some other model, or represent prevalence of the education levels for participants in the same postal code found in some population-level data set. In any case, there is the probability (rather, the near-certainty) that some of these education predictions are wrong. Not everybody with probability 1 of having a given level of education will actually have that level of education. So your inference can only refer to the probability of the given level of education--you cannot treat it as if it were a deterministic assessment of education.

    But there is a bigger problem with this: see "Added:" above.

    Further, what does it mean if the result in #2 or #3 is not statistically significant?
    The same thing it would mean if the result were statistically significant: nothing at all. These are predicted probabilities of car ownership. Unless your study population is such that you can, with a straight face, posit the hypothesis that a person with a certain probability of having a certain education level has a zero probability of owning a car, this kind of null hypothesis test is just a straw man. These p-values should be ignored. The confidence intervals, however, may be of some interest in describing the range of probabilities of car ownership that are compatible with the data. Or doing pairwise contrasts of these predictive margins might be useful: does the probability of car ownership vary with different probabilities of having given levels of education? But testing a hypothesis that the probability of car ownership is zero will make sense only in some very exotic context. I mean, seriously, no matter what educational stratum of society we look at, it would be absurd to think that there would be zero probability of car ownership.

    Lastly, how can I test whether the predicted probability in #2 is statistically significantly different from the predicted probability in #3?
    Code:
    margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0) at(p_educ0=0 p_educ1=1 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0) pwcompare
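    A minor variation, if you also want a z statistic and p-value for the pairwise difference rather than just its confidence interval, is the effects suboption:
    Code:
    margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0) at(p_educ0=0 p_educ1=1 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0) pwcompare(effects)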
    My goal ultimately with this exercise is to answer the question: does level of education impact car ownership?
    You can't do that because you do not have level of education. You have only a probability distribution over education levels for each observation (which may represent a person, or household, or community, or some other group of people). So you can only ask and answer questions about how that education probability distribution impacts car ownership. It is a subtly different question. Moreover, I would avoid the word impact, as it implies causality, unless this is experimental data. If it is observational data, you have no basis for asserting causality from this kind of analysis (and only limited ability to do so, under strong assumptions, using more complicated analyses that might require data you don't have).
    Last edited by Clyde Schechter; 05 Aug 2024, 14:51.



    • #3
      Hi Emily,

      My goal ultimately with this exercise is to answer the question: does level of education impact car ownership?
      The most straightforward way to do this is to have a single categorical education variable that you use as the independent variable. Then you can test whether each level of education is significantly different from a reference category. You don't even need margins to answer this research question, just the regression output. What you are doing here with the predicted probabilities makes the interpretation much, much harder.
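      As a sketch, if you did observe education directly as a single variable (say, a hypothetical educ coded 0 for less than high school through 5 for a grad degree or higher), it would look something like this:

      Code:
      * hypothetical: educ is one categorical variable coded 0-5
      logit carown i.educ
      * each i.educ coefficient compares a level to the reference category;
      * testparm gives the joint test that the education levels do not differ
      testparm i.educ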

      Do you not actually observe education levels directly? How did you generate these predicted probabilities? Are they from a different regression model?

      Can I interpret #1 as the predicted probability of car ownership in my dataset?
      Close. The phrase "in my dataset" doesn't have a clear meaning here. I believe it is holding all predictor variables at 0. You can think of this as being at the model intercept. Use the atmeans option to hold everything at the mean.

      Can I interpret #2 as the predicted probability of car ownership if everyone in my dataset had less than high school education or no schooling at all?
      No, not really. This would be something like the predicted probability of owning a car for a hypothetical person who has a 100% probability of having less than a high school education and a 0% probability of every other education level. Notice as well that this is a situation that describes essentially no one. Basically everyone will have some probability of dropping out of high school (and therefore will not be 100% likely to graduate), and basically everyone will have some nonzero probability of having higher levels of education. This is a case that is probably not within the bounds of your data and is essentially nonsensical.

      Similarly, can I interpret #3 as the predicted probability of car ownership if everyone in my dataset had a high school degree?
      Again, no, for the same reasons. I'm surprised the line doesn't give you an error message since you don't include p_educ5 in the logit.

      Further, what does it mean if the result in #2 or #3 is not statistically significant?
      It is the probability of observing a predicted probability as large as the given one under repeated sampling if the true population probability were 0. Basically, if the result is not statistically significant, the predicted probability is not statistically distinguishable from zero.

      Edit: I didn't see #2 before I posted, but I think we largely agree.
      Last edited by Daniel Schaefer; 05 Aug 2024, 14:53.



      • #4
        Thank you so much for your responses!

        Originally posted by Clyde Schechter View Post
        Added: I don't understand how you can run -margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)- because p_educ5 is not included in the regression model. -margins- will reject that. You have to omit p_educ5 here. Now, this means that, in theory, you are looking at predicted car ownership probabilities with p_educ5 unconstrained. But as the model you show has no other variables, and p_educ5 does not figure in the calculation of the predicted probabilities, this is OK. However, if the model included variables other than the p_educ* variables, and if any of those other variables were correlated with p_educ5, then you would be getting incorrect results because the effect of the observed values of p_educ5 would "infiltrate" the -margins- results through those other variables.
        I apologize for the error in my post; I did run

        logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4 p_educ5

        and Stata omitted p_educ5 because of collinearity.

        I am assuming that's why margins doesn't give me an error.

        Originally posted by Clyde Schechter View Post
        Perhaps these p_educ* variables are themselves predictions of some other model, or represent prevalence of the education levels for participants in the same postal code found in some population-level data set. In any case, there is the probability (rather, the near-certainty) that some of these education predictions are wrong. Not everybody with probability 1 of having a given level of education will actually have that level of education. So your inference can only refer to the probability of the given level of education--you cannot treat it as if it were a deterministic assessment of education.
        The p_educ* variables are a mix of self-reports and predictions: if a variable takes the value 0 or 1, it is a self-report; otherwise it is a prediction from another model. So the series are riddled with all the limitations that you have mentioned.



        • #5
          Thank you for your responses!

          I am a little confused with this statement relating to #1:

          Originally posted by Daniel Schaefer View Post
          I believe it is holding all predictor variables at 0. You can think of this as being at the model intercept. Use the atmeans option to hold everything at the mean.
          If it is holding all the predictors at 0, then why would the output from margins (as in #1) be different from margins, at(p_educ0=0 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)? My understanding was that margins takes the predictors at the values they have in the dataset and then averages the predictions from the model. Is that not correct?



          • #6

            Originally posted by Clyde Schechter View Post
            Added: I don't understand how you can run -margins, at(p_educ0=1 p_educ1=0 p_educ2=0 p_educ3=0 p_educ4=0 p_educ5=0)- because p_educ5 is not included in the regression model. -margins- will reject that. You have to omit p_educ5 here. Now, this means that, in theory, you are looking at predicted car ownership probabilities with p_educ5 unconstrained. But as the model you show has no other variables, and p_educ5 does not figure in the calculation of the predicted probabilities, this is OK. However, if the model included variables other than the p_educ* variables, and if any of those other variables were correlated with p_educ5, then you would be getting incorrect results because the effect of the observed values of p_educ5 would "infiltrate" the -margins- results through those other variables.
            I apologize for the error in my post; I did run

            logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4 p_educ5

            and Stata omitted p_educ5 because of collinearity.

            I am assuming that's why margins doesn't give me an error.

            Edited to add -- does this change my interpretation of margins?

            If instead of

            logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4 p_educ5

            I did run a model excluding say p_educ3 (probability of college degree) such as --

            logit carown p_educ0 p_educ1 p_educ2 p_educ4 p_educ5

            Is margins still giving me the predicted probability of car ownership in my full dataset based on the limited predictors? The output doesn't change, so I think that is correct. It's only the omitted variable that has changed.

            Then, margins, at(p_educ0=0 p_educ1=0 p_educ2=0 p_educ4=0 p_educ5=0) is giving me the predicted probability of car ownership if everyone in my sample had the probability of all the education levels in the model set to 0. Can this then be interpreted as the predicted probability of car ownership assuming everyone in my sample has the probability of college degree set to 1 (since college degree is excluded from the model and p_educ* should sum to 1)?

            Also, then margins, at(p_educ0=0 p_educ1=0 p_educ2=0 p_educ4=0 p_educ5=0) at(p_educ0=0 p_educ1=1 p_educ2=0 p_educ4=0 p_educ5=0) pwcompare
            compares the predicted probability of car ownership between a and b,
            where a is if everyone in my dataset had the probability of a college degree set to 1, and
            b is if everyone in my dataset had the probability of a high school degree set to 1. Is this correct?

            Thank you so much for your time!



            • #7
              Oh, yep, you are right, my mistake. Margins on its own will produce the average predicted probability, i.e. the predictive margin (see #4 in this thread). So margins will calculate the predicted probability by fixing the regression equation at the values given by each observation in the dataset, iteratively generating a predicted probability for each observation, and then take the average over the predicted probabilities. That procedure yields slightly different results than atmeans, and wildly different results than you would get fixing everything at 0. The atmeans option will average the independent variables first, then find a predicted probability by fixing the regression equation at those averages. For what it's worth, I still think the phrase "in my dataset" isn't very meaningful here, since margins with and without the atmeans option will use all of the relevant information in the dataset, but it is okay if you think I'm being pedantic.
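              To see the averaging concretely, here is a sketch using your model (phat is just a hypothetical variable name): the mean of the observation-level predictions should reproduce the -margins- result.

              Code:
              logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4 p_educ5
              margins
              * equivalent by hand: predict each observation's probability, then average
              predict phat, pr
              summarize phat if e(sample)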

              Here is a little toy example based on your problem.

              Code:
              clear
              set obs 1000
              * x1-x5 mimic the p_educ* shares: each is nonnegative and they sum to 1
              gen x1 = runiform(0, 1)
              gen x2 = runiform(0, 1 - x1)
              gen x3 = runiform(0, 1 - (x1 + x2))
              gen x4 = runiform(0, 1 - (x1 + x2 + x3))
              gen x5 = 1 - (x1 + x2 + x3 + x4)
              gen sum = x1 + x2 + x3 + x4 + x5
              * y is a coin flip, unrelated to the x's
              gen y = runiform() > 0.5
              assert sum == 1
              
              logit y x1 x2 x3 x4 x5
              margins
              margins, atmeans
              margins, at(x1=0 x2=0 x3=0 x4=0)
              margins, at(x1=0 x2=0 x3=0 x4=0 x5=0)
              Code:
              . logit y x1 x2 x3 x4 x5
              
              note: x5 omitted because of collinearity.
              Iteration 0:  Log likelihood = -693.07518  
              Iteration 1:  Log likelihood = -691.97152  
              Iteration 2:  Log likelihood = -691.97149  
              
              Logistic regression                                     Number of obs =  1,000
                                                                      LR chi2(4)    =   2.21
                                                                      Prob > chi2   = 0.6977
              Log likelihood = -691.97149                             Pseudo R2     = 0.0016
              
              ------------------------------------------------------------------------------
                         y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                        x1 |  -.6923297   .7415292    -0.93   0.350      -2.1457    .7610408
                        x2 |  -.5061452   .7691078    -0.66   0.510    -2.013569    1.001278
                        x3 |  -.5516368   .8780041    -0.63   0.530    -2.272493     1.16922
                        x4 |  -1.560927   1.199194    -1.30   0.193    -3.911303    .7894487
                        x5 |          0  (omitted)
                     _cons |     .66252   .7172691     0.92   0.356    -.7433016    2.068342
              ------------------------------------------------------------------------------
              
              . margins
              
              Predictive margins                                       Number of obs = 1,000
              Model VCE: OIM
              
              Expression: Pr(y), predict()
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _cons |       .506   .0157928    32.04   0.000     .4750466    .5369534
              ------------------------------------------------------------------------------
              
              . margins, atmeans
              
              Adjusted predictions                                     Number of obs = 1,000
              Model VCE: OIM
              
              Expression: Pr(y), predict()
              At: x1 = .4909171 (mean)
                  x2 = .2613227 (mean)
                  x3 = .1254111 (mean)
                  x4 = .0622657 (mean)
                  x5 = .0600834 (mean)
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _cons |   .5060004   .0158278    31.97   0.000     .4749785    .5370222
              ------------------------------------------------------------------------------
              
              . margins, at(x1=0 x2=0 x3=0 x4=0)
              
              Predictive margins                                       Number of obs = 1,000
              Model VCE: OIM
              
              Expression: Pr(y), predict()
              At: x1 = 0
                  x2 = 0
                  x3 = 0
                  x4 = 0
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _cons |   .6598262    .160995     4.10   0.000     .3442818    .9753707
              ------------------------------------------------------------------------------
              
              . margins, at(x1=0 x2=0 x3=0 x4=0 x5=0)
              
              Adjusted predictions                                     Number of obs = 1,000
              Model VCE: OIM
              
              Expression: Pr(y), predict()
              At: x1 = 0
                  x2 = 0
                  x3 = 0
                  x4 = 0
                  x5 = 0
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _cons |   .6598262    .160995     4.10   0.000     .3442818    .9753707
              ------------------------------------------------------------------------------
              
              . 
              end of do-file



              • #8

                Thank you for the clarification, Daniel, and for the toy example. If I am interested in the differences in predicted y when the x's change, do you think it matters that the coefficients in the logit model of your example are not statistically different from 0, suggesting that the x's have no impact on y? Does it still make sense to compare and talk about the differences in predicted y at different values of the x's? For example, the difference between margins, at(x1=0 x2=0 x3=0 x4=0) and margins, at(x1=1 x2=0 x3=0 x4=0) gives us the difference in predicted y when all observations are set to x5=1 versus when all are set to x1=1; can we still ask whether this difference is statistically significant? I really appreciate your time!



                • #9
                  If instead of

                  logit carown p_educ0 p_educ1 p_educ2 p_educ3 p_educ4 p_educ5

                  I did run a model excluding say p_educ3 (probability of college degree) such as --

                  logit carown p_educ0 p_educ1 p_educ2 p_educ4 p_educ5

                  Is margins still giving me the predicted probability of car ownership in my full dataset based on the limited predictors? The output doesn't change, so I think that is correct. It's only the omitted variable that has changed.
                  I can see why you might think that should be the case here because of the perfect multicollinearity between the predictors. If you exclude any one of the predictors, since they are perfectly collinear, you effectively still have all of the same information. Just to be clear, this would not work in general; it only works because you are changing the excluded category for a perfectly collinear set of predictors.

                  I did an experiment to see if the results match up. Notice that there are sometimes very small differences in the results that you can't necessarily see from the rounded output. However, after 1000 iterations, I did not find a single difference in the predicted probability greater than or equal to 0.00001.

                  Code:
                  local trials = 1000
                  forv i = 1/`trials' {
                      clear
                      qui set obs 1000
                      gen x1 = runiform(0, 1)
                      gen x2 = runiform(0, 1 - x1)
                      gen x3 = runiform(0, 1 - (x1 + x2))
                      gen x4 = runiform(0, 1 - (x1 + x2 + x3))
                      gen x5 = 1 - (x1 + x2 + x3 + x4)
                      gen sum = x1 + x2 + x3 + x4 + x5
                      gen y = runiform() > 0.5
                      assert sum == 1
                      * In this model, Stata drops x5 automatically because of collinearity
                      qui logit y x1 x2 x3 x4 x5
                      qui margins
                      local model_1_const = r(b)[1,1]
                      * In this model, x3 is excluded
                      qui logit y x1 x2 x4 x5
                      qui margins
                      local model_2_const = r(b)[1,1]
                      display as text "Iteration: " `i' " Model 1: " `model_1_const' " Model 2: " `model_2_const'
                      if !(`model_1_const' == `model_2_const') {
                          * use abs() so differences in either direction trigger the check
                          display as error "Difference of " abs(`model_1_const' - `model_2_const') " detected on iteration " `i' "."
                          assert abs(`model_1_const' - `model_2_const') < 0.00001
                      }
                  }
                  Here are the first 20 iterations.

                  Code:
                  Iteration: 1 Model 1: .501 Model 2: .501
                  Iteration: 2 Model 1: .486 Model 2: .486
                  Iteration: 3 Model 1: .472 Model 2: .472
                  Difference of 5.551e-17 detected on iteration 3.
                  Iteration: 4 Model 1: .48500001 Model 2: .48500001
                  Difference of 1.110e-16 detected on iteration 4.
                  Iteration: 5 Model 1: .51 Model 2: .51
                  Iteration: 6 Model 1: .516 Model 2: .516
                  Iteration: 7 Model 1: .507 Model 2: .507
                  Iteration: 8 Model 1: .497 Model 2: .497
                  Difference of 1.110e-16 detected on iteration 8.
                  Iteration: 9 Model 1: .466 Model 2: .466
                  Iteration: 10 Model 1: .514 Model 2: .514
                  Iteration: 11 Model 1: .5 Model 2: .5
                  Iteration: 12 Model 1: .523 Model 2: .523
                  Iteration: 13 Model 1: .492 Model 2: .492
                  Iteration: 14 Model 1: .539 Model 2: .539
                  Iteration: 15 Model 1: .518 Model 2: .518
                  Iteration: 16 Model 1: .483 Model 2: .483
                  Iteration: 17 Model 1: .492 Model 2: .492
                  Iteration: 18 Model 1: .48 Model 2: .48
                  Difference of 2.776e-16 detected on iteration 18.
                  Iteration: 19 Model 1: .482 Model 2: .482
                  Iteration: 20 Model 1: .496 Model 2: .496



                  • #10
                    Emily, regarding #8, I don't think it matters that the model terms are unrelated to the outcome, at least insofar as we want to show that the predicted probabilities under the model are different/the same depending on the parameters we give margins. It is possible to randomly generate these variables with partial correlations, but it is a bit of a pain to do in practice.
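                    For what it's worth, here is one way it could be done, as a rough sketch (the gamma-draw construction and the coefficient values are just illustrative assumptions): normalized gamma draws give Dirichlet-style shares that sum to 1, and letting y depend on x1 gives the model terms a real relationship with the outcome.

                    Code:
                    clear
                    set obs 1000
                    * Dirichlet-style shares: normalized gamma draws are positive and sum to 1
                    forvalues j = 1/5 {
                        gen g`j' = rgamma(2, 1)
                    }
                    egen gsum = rowtotal(g1-g5)
                    forvalues j = 1/5 {
                        gen x`j' = g`j'/gsum
                    }
                    * let y depend on x1 so the coefficients are nonzero in expectation
                    gen y = runiform() < invlogit(-1 + 2*x1)
                    drop g1-g5 gsum
                    logit y x1 x2 x3 x4 x5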



                    • #11
                      By the way, this is why we typically ask posters to provide example data taken from their datasets using the -dataex- command. It makes it much easier to test things out on our end and to see what is going on with the data. When I randomly generate data, I have to make a series of assumptions about what your data look like that may or may not be valid. Data examples with -dataex- solve many of the problems associated with random data.
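                      For example, something like this run on your data would give us a reproducible 20-observation excerpt (assuming the p_educ* variables are stored contiguously):

                      Code:
                      * -dataex- ships with recent Stata; on older versions: ssc install dataex
                      dataex carown p_educ0-p_educ5, count(20)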



                      • #12
                        Regarding your other questions from #6:

                        Can this then be interpreted as the predicted probability of car ownership assuming everyone in my sample has the probability of college degree set to 1 (since college degree is excluded from the model and p_educ* should sum to 1)?
                        I definitely see your reasoning here. Suppose someone had p_educ0, p_educ1, p_educ2, p_educ4, and p_educ5 all equal to 0. Since the p_educ* probabilities must sum to 1, p_educ3 must equal 1. Makes sense to me.

                        compares the predicted probability of car ownership between a and b,
                        where a is if everyone in my dataset had the probability of a college degree set to 1, and
                        b is if everyone in my dataset had the probability of a high school degree set to 1. Is this correct?
                        I still have to object to the "everyone in my dataset" language, especially in this context where you are using the at() option. Remember, to get predicted probabilities you ultimately just take your logit regression equation, plug in a number for every x term, get a linear prediction, then transform the prediction to the probability scale using the appropriate link function.

                        When you calculate margins without the at() option, you go through every valid observation, plug each x value for that specific observation into the logit equation, calculate a prediction, then take the average prediction over the whole dataset. When you use the at() option and say such-and-such variable equals such-and-such value, you are setting the corresponding x equal to whatever constant value you give margins. If you set a value for every independent variable in your model, there isn't any reason to average over your whole dataset: you just plug the specific x values you gave the command into the logit equation and calculate the predicted probability once.

                        Of course, the model coefficients are going to be based on all of the relevant information in your dataset every time, but fundamentally, the margins command plugs numbers into the regression equation (or sometimes an estimate of its derivative) to get predicted values. If you give it all of the numbers to plug in, it doesn't have to do any averaging over your entire dataset.
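                        As a concrete sketch of that last point, using the toy example from #7: once at() fixes every predictor (x1-x4 at 0 there, with x5 omitted), the linear prediction is just the intercept, so the margin is invlogit() of one number, with no averaging over observations.

                        Code:
                        logit y x1 x2 x3 x4 x5
                        margins, at(x1=0 x2=0 x3=0 x4=0)
                        * by hand: the margin is the intercept pushed through the inverse link
                        display invlogit(_b[_cons])   // matches the .6598262 shown in #7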

