  • Margins "asobserved" vs. "atmeans" resulting in around 1000X difference in predicted probability

    While learning how to use the margins command, I could not make sense of the following even after reading documentation (Help files, Stata Journal articles) and past forum entries containing explanations from experts including Richard A. Williams and Clyde Schechter, and I seek your help.

    I have a regression model similar to this one, using real data in the field of innovation:

    Code:
logit y x xsquared controlvar1 controlvar2, cluster(id) robust
    where x is a continuous variable bounded between zero and five.

After running that logit model, I ran two margins commands to see what the predicted probability of "y" was at alternative values of x. To calculate adjusted predictions at representative values (the term used by Richard A. Williams at this link: http://www3.nd.edu/~rwilliam/xsoc73994/Margins01.pdf ), I used the following code:

    Code:
margins, at(x=(0(0.1)5))
    I was able to obtain results, but was puzzled: the values in the "Margin" column at given levels of "x" turned out to be more than 100 times, and in some cases more than 1000 times, larger than the results I obtained at the same levels of "x" after running the following code (after the same regression model shown earlier):

    Code:
margins, at(x=(0(0.1)5)) atmeans
    I'm aware that the former margins command uses the default "asobserved" whereas the latter uses "atmeans", but I could not make sense of how the results from the former, at any given level of x, could be 100 to 1000 times greater than the results from the latter. Before going any further, I wanted to ask whether comparing the results from the two margins commands above makes sense at all, and whether the differences could really be as large as the ones I'm observing.

    Just to give you a sense of the magnitude of differences, the sample-wide average for “y” in my data is 0.10. That is the same result that I get when I do:
    Code:
    margins
    But I get a result of 0.00006 after the following:
    Code:
    margins, atmeans
    Thank you for helping me to make sense of this.

  • #2
    Code:
    logit y x xsquared controlvar1 controlvar2, cluster(id) robust
    This model is "dead on arrival." I imagine from your choice of variable name that your intention is the xsquared = x2. But -margins- has no way of knowing that: it does not speak English. -margins- thinks this is just any other variable, so when it evaluates the margins at your various values of x, or atmeans, it treats x and xsquared as completely unrelated variables. So none of its results are correct. You need to redo this model using factor variable notation. -margins- only works correctly when the original regression is coded with correct factor variable notation.

    Code:
    logit y c.x##c.x controlvar1 controlvar2, cluster(id) robust
    As an aside, you don't need to say -robust- when you have already said -cluster(id)-. -cluster()- implies -robust-. Actually, if you are using current Stata, the preferred way to say you want id-cluster robust standard errors is -vce(cluster id)-.
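
    Here is a minimal sketch of the difference this makes, using the auto dataset shipped with Stata rather than your data (weight plays the role of x and foreign the role of y; these are illustrative assumptions only):

    Code:
    sysuse auto, clear
    generate weightsq = weight^2
    logit foreign weight weightsq                 // homebrew squared term
    margins, at(weight=(2000 3000 4000))          // weightsq is NOT updated when weight changes
    logit foreign c.weight##c.weight              // factor-variable notation
    margins, at(weight=(2000 3000 4000))          // squared term recomputed at each at() value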



    • #3
      Let me add that Clyde's explanation in post #2 is remarkably similar to Richard Williams's discussion of Factor Variables around Example 2 in the PDF referred to in post #1. For example:

      ... Because it does not know that age and age2 are related, it uses the mean value of age2 in its calculations, rather than the correct value of 70 squared.
      It appears you focused on the part of that PDF about the various forms of the command without having grasped the basics. Since you are new to the margins command, let me say that many members here will agree that you cannot find a better place to start on margins than a careful reading of that PDF. I'll also note that Margins01.pdf is followed by Margins02.pdf ... Margins05.pdf, covering more specialized topics.



      • #4
        Thank you for your guidance. After following Clyde Schechter's and William Lisowski's advice, and reading R. A. Williams's "Margins" files up to and including Margins05, I'm still facing the same question of whether margins with "asobserved" can really yield a result hundreds of times greater than margins with "atmeans."

        I have used factor notation in the regression, in two ways:

        Code:
        logit y c.x##c.x controlvar1 controlvar2, vce(cluster id)
        and
        Code:
        logit y c.x c.x##c.x controlvar1 controlvar2, vce(cluster id)
        In both cases, the result I get from

        Code:
        margins
        is hundreds of times greater than the result I get from:

        Code:
        margins, atmeans
        I cannot make sense of this.

        Code (using the same regression command recommended by Clyde Schechter) and results (with asobserved vs. with atmeans) are as follows:

        Code:
        logit y c.x##c.x controlvar1 controlvar2, vce(cluster id)
        Code:
        margins
        [Screenshot of the -margins- output]


        Code:
        margins, atmeans
        [Screenshot of the -margins, atmeans- output]



        • #5
          We need to know more about the data. The asobserved result may be hundreds of times greater, but then again the absolute difference is less than 0.1. In any event, I would probably trust asobserved more than atmeans.

          Maybe you could show the summary stats for your variables.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Thank you for helping me to choose between margins "asobserved" and "atmeans." I intend to trust "asobserved" more, as Richard Williams suggested.

            The summary stats of my regression sample (with _N = 300) are as follows, in case they help to explain the differences between the results I shared in my previous post:

            Variable        Mean   Std. Dev.      Min       Max
            y               0.10        0.30     0.00      1.00
            c.x             1.97        1.45     0.00      4.44
            c.x#c.x         5.97        5.63     0.00     19.74
            control1        0.17        0.21     0.00      0.90
            control2        3.62        1.37     0.00      5.74
            control3        0.87        1.33     0.00      7.00
            control4        2.55        0.28     1.79      3.52
            control5       93.10       86.24     2.00    303.00
            control6       18.34       14.01     3.00     28.00
            control7        1.09        1.46     0.00      6.00
            control8        3.84        3.70     0.00     16.00
            control9       11.44        4.72     3.00     29.00
            control10       0.76        0.89     0.00      3.00
            control11       2.17        1.26     1.00      6.00
            control12    1994.76        3.22  1986.00   2000.00
            control13       0.42        0.73     0.00      5.70
            control14       7.60        3.76     1.00     15.00
            control15       4.36        1.54     0.81      7.34
            control16       6.97        9.57     0.00     52.00
            control17       5.11        0.75     3.21      6.79
            control18      39.88       28.35     0.40    168.43
            dummy1          0.28        0.45     0         1
            dummy2          0.58        0.49     0         1
            dummy3          0.94        0.23     0         1
            dummy4          0.06        0.23     0         1
            dummy5          0.06        0.24     0         1
            dummy6          0.07        0.25     0         1
            dummy7          0.13        0.33     0         1
            dummy8          0.20        0.40     0         1
            dummy9          0.08        0.26     0         1
            dummy10         0.11        0.31     0         1
            dummy11         0.04        0.19     0         1



            • #7
              Well, I see two variables, control5 and control18, where the standard deviation is close to the mean and the maximum value is several times the mean. So these variables have very skewed distributions. It may well be that the observations with large values of these variables are pulling the -margins, asobserved- results way up, particularly if they have appreciably large positive coefficients.



              • #8
                I can't 100% swear to this, but if all you do is type -margins- after logit I think you get the mean of y. For example,

                Code:
                webuse nhanes2f, clear
                logit diabetes i.female weight height i.race
                margins
                estat sum
                That seems to be true in your case; any slight difference between what margins and sum gave you may be due to missing data cases getting dropped in the logit.

                I don't have your data or your logit results. Clyde may be on the right track with his idea.

                In any event, it is not unusual for the asobserved and atmeans approaches to yield somewhat different estimates. I probably wouldn't worry about it, unless you have other reasons for thinking your model or data may be off.

                As a sidelight, you have all these dummy variables. If you computed these from some multi-category variable, you should drop them and use factor variable notation with the original variable.
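
                For illustration, here is a minimal sketch with the nhanes2f data used above; race stands in for whatever multi-category variable your dummies may have come from, which is only an assumption of the sketch:

                Code:
                webuse nhanes2f, clear
                tabulate race, generate(race_)                 // homebrew indicators race_1-race_3
                logit diabetes i.female weight height race_2 race_3
                margins, dydx(race_2 race_3)                   // treats the indicators as unrelated continuous variables
                logit diabetes i.female weight height i.race   // factor-variable notation
                margins race                                   // adjusted predictions for each race category
                margins, dydx(race)                            // discrete-change marginal effects, computed correctly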
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  Richard is, essentially, correct when he says that -margins- by itself will just produce the mean of y. What you actually get is the mean of yhat, where yhat = invlogit(xb). But the way maximum likelihood estimation works, the mean of yhat will be equal to the mean of y except possibly for some numerical error.

                  But -margins, atmeans- will get you the mean of yhat*, where yhat* = invlogit(xb) with xb calculated at the means of all the x's. That will, in general, be a different number, and in the presence of skewed distributions for some of the x's, the difference can be quite large.

                  And I agree with Richard that it would be better to use factor variable notation here, rather than homebrew indicator variables. For the commands shown so far, the results will be the same, but if we move on to, say, marginal effects, the current approach will produce incorrect results that are remedied by the use of factor-variable notation.
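
                  To make the two quantities concrete, here is a minimal sketch, again using the nhanes2f example from post #8 rather than the poster's data, that computes both by hand:

                  Code:
                  webuse nhanes2f, clear
                  logit diabetes i.female weight height i.race
                  predict double yhat if e(sample), pr     // yhat_i = invlogit(x_i*b)
                  summarize diabetes yhat if e(sample)     // mean(yhat) equals mean(diabetes), up to numerical error
                  margins                                  // the same number: yhat averaged over the observed x's
                  margins, atmeans                         // invlogit(xb) with every x set to its estimation-sample mean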



                  • #10
                    Thank you all for your input. I hope this thread will be helpful for other researchers, too.



                    • #11
                      I hope someone can answer this question: I am slightly confused as to what exactly "asobserved" means for those variables whose values are not specified in the at() option. I understand that with "atmeans" each of the remaining independent variables is set to its mean. Anyone?



                      • #12
                        So let's review what -margins- does. -margins- applies -predict- to calculate some statistic (a predicted value of xb or mu, or whatever, or perhaps a marginal effect) for each observation in your sample, and then averages those values and reports the average. The differences between -at()-, -atmeans-, and -asobserved- have to do with what Stata does to your data sample before applying -predict-. With variables specified in an -at()- option (not command), those variables are first reset to the particular value(s) specified in -at()-. With -atmeans-, the variables are first reset to their mean values. And with -asobserved-, the variables are not reset at all: they are left as they were in the original data sample.
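
                        Here is a minimal sketch (auto dataset, an arbitrary two-covariate model, not the poster's) that reproduces -margins, atmeans- by hand to show exactly that data manipulation:

                        Code:
                        sysuse auto, clear
                        logit foreign c.weight c.mpg
                        preserve
                        summarize weight, meanonly
                        replace weight = r(mean)           // every observation now carries the mean of weight
                        summarize mpg, meanonly
                        replace mpg = r(mean)              // ... and the mean of mpg
                        predict double phat, pr            // -predict- applied to the altered data
                        summarize phat                     // the average of these predictions ...
                        restore
                        margins, atmeans                   // ... is what -margins, atmeans- reports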



                        • #13
                          It is also worth asking: for what population are the margins calculated? The original estimator from Korn and Graubard estimates the predicted value over the entire sample (population) at the specified value, so it is a simulation of a hypothetical state of being. Some options in Stata (such as over(), perhaps) might change the population used in the estimation. I've noticed that people seem to miss this aspect and present margins as though they were adjusted observed values, rather than hypothetical ones. Since the various values are generally very close (when done properly, e.g. with factor notation), it can be easy to miss the subtle difference between the margin for factor X=x and the predicted value for factor X=x.

                          In fact, I am thinking about writing something along these lines, and consequently I'm searching for what has already been written. Beyond the fine documentation, is there anything I should review that discusses marginal estimation, and particularly how it should be described in words?



                          • #14
                            Dear Clyde Schechter, thanks a lot for your explanation in post #12 - 4.5 years later it is still extremely useful. I understand your point when it relates to the variable of interest in the analysis (i.e., the one whose effect we want to estimate) - "tempjan" in my example below. However, I am struggling to understand the difference between predicting "at mean" and "as observed" when it comes to the controls - "tempjuly" in the example below. Let me use the following example to clarify my question:

                            Code:
                            graph drop _all
                            sysuse citytemp.dta, clear
                            
                            reghdfe heatdd c.tempjan##c.tempjan c.tempjuly##c.tempjuly, absorb(region) cluster(region)
                            
                            * 1) Predictions "asobserved"
                            margins, at(tempjan = (0(1)70) (asobserved) _all) level(95)
                            marginsplot, name(resp_marg) ytitle("")
                            
                            * 2) Predictions "at mean"
                            margins, at(tempjan = (0(1)70) (mean) _all) level(95)
                            marginsplot, name(resp_marg_mean) ytitle("")
                            
                            sum tempjuly, d // mean tempjuly is 75.05377
                             From my understanding, syntax 2 (at mean) calculates predictions for each level of tempjan = (0(1)70), fixing tempjuly at its mean (75.05377). That is, adding to the main prediction of tempjan the following terms:

                            _b[c.tempjuly](mean(c.tempjuly)) + _b[c.tempjuly#c.tempjuly](mean(c.tempjuly#c.tempjuly))

                            Am I correct?

                            What I am struggling to understand though is what do we multiply the estimates by in syntax 1. Basically, what replaces the question mark:

                            _b[c.tempjuly](?) + _b[c.tempjuly#c.tempjuly](?)

                            Thank you in advance.



                            • #15
                              To understand the difference you have to think of -margins- as a wrapper for -predict-, which, in fact, it is. In syntax 2, where tempjuly is constrained to its mean value, Stata will -predict- heatdd in the usual way, except that first it will go through the data set and replace the observed values of tempjuly by the mean value (75.05377) in every observation. Then the -predict-ed values are averaged, and that average is reported. (Stata then restores the original values of tempjuly once that is all done.)

                              In syntax 1, Stata will -predict- heatdd in the usual way without changing anything in the data set. And then the -predict-ed values are averaged, and that average is reported.

                              So the answer to your question as you phrased it is that in each individual observation, your ? is replaced by the actual value of tempjuly in that observation in the data set.
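
                              Here is a minimal sketch that reproduces both syntaxes by hand for a single value, tempjan = 30. It uses -regress- without the absorbed region effects so that plain -predict- suffices (a simplification, so the numbers will differ from the reghdfe model); the point is only which values of tempjuly enter each calculation:

                              Code:
                              sysuse citytemp, clear
                              regress heatdd c.tempjan##c.tempjan c.tempjuly##c.tempjuly
                              
                              * syntax 2 by hand: tempjan set to 30, tempjuly fixed at its mean in every observation
                              preserve
                              replace tempjan = 30
                              summarize tempjuly if e(sample), meanonly
                              replace tempjuly = r(mean)
                              predict double yhat_mean, xb
                              summarize yhat_mean if e(sample)    // compare with: margins, at(tempjan=30 (mean) _all)
                              restore
                              
                              * syntax 1 by hand: tempjan set to 30, tempjuly left as observed
                              preserve
                              replace tempjan = 30
                              predict double yhat_asobs, xb
                              summarize yhat_asobs if e(sample)   // compare with: margins, at(tempjan=30 (asobserved) _all)
                              restore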

