using margins and if in model with many interactions creates different predicted outcome?

Jerome Lyons

Join Date: Mar 2023

Posts: 13
#1

17 Jul 2024, 16:18

Hi all, sorry if this has been asked before; I couldn't find any reference to this issue. Also, I can't share any of my data for confidentiality reasons (sorry!). I can tell you that I have just over 7 million observations of individuals from a census. I am using the most recent update of Stata 18.

THE SETUP: I am running a large Mincer regression with many interactions. The idea is to control for race, immigration status, sex, and their interaction terms.

Code:

#delimit ;
reg log_income
    c.log_yrs_school##i.black##i.immigrant##i.female
    c.log_experience##i.black##i.immigrant##i.female
    c.log__experience_sqrd##i.black##i.immigrant##i.female
    if age > 15 ;

I then use margins and marginsplot to display predicted log_income on the y-axis and either log_yrs_school or log_experience on the x-axis. I create multiple figures that each have only two series: immigrants vs. non-immigrants for each race-sex cohort (i.e., black female immigrants vs. black female non-immigrants, or white male immigrants vs. white male non-immigrants). Currently, I've done this using "if" qualifiers in a loop:

Code:

foreach VAR in yrs_school experience {
    forvalues FEMALE = 0/1 {
        forvalues BLACK = 0/1 {
            margins black#female#immigrant if black == `BLACK' & female == `FEMALE', at(log_`VAR'=(``VAR'logs'))
            marginsplot, name(`VAR'_f`FEMALE'_b`BLACK', replace)
        }
    }
}

THE ISSUE: If I run the same regressions and then run margins without the if command, I get different predicted values. Any thoughts/advice would be very much appreciated!

Thanks in advance!
Jerome
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

17 Jul 2024, 17:07

Yes, of course you get different results. The commands operate on different data samples. Without the -if- conditions, the entire estimation sample is used to calculate the predicted values. For a given combination of BLACK and FEMALE, this approach (temporarily) sets black to BLACK and female to FEMALE in every observation, computes observation-level predictions, and averages them.

By contrast, when you impose the -if- conditions, the analysis is performed using only the subset of the data that consists of BLACK FEMALEs.

Because the distributions of the other variables in the model likely differ across values of black and female, the results obtained with the -if- conditions are not fully adjusted. The method without the -if- conditions provides fully adjusted results.
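The difference can be reproduced on a small shipped dataset. The following is a minimal sketch, not the original poster's model: the nlsw88 data, the variables wage, tenure, union, collgrad, and age, and the helper variable in_cell are all stand-ins chosen for illustration.

```stata
* Illustration only: nlsw88 and this model are stand-ins for the census data
sysuse nlsw88, clear
regress wage c.tenure##i.union##i.collgrad c.age

* (a) No -if-: union and collgrad are set counterfactually in every
*     observation of the estimation sample, then predictions are averaged
margins union#collgrad

* (b) With -if-: same counterfactual setting, but averaged only over
*     observations that actually are union college graduates
margins union#collgrad if union == 1 & collgrad == 1

* Manual check of the 1.union#1.collgrad cell in (a) and (b)
generate byte in_cell = union == 1 & collgrad == 1 if e(sample)
preserve
replace union = 1 if e(sample)
replace collgrad = 1 if e(sample)
predict double xb if e(sample)
summarize xb                    // mean should match the 1#1 cell in (a)
summarize xb if in_cell == 1    // mean should match the 1#1 cell in (b)
restore
```

The two means differ whenever tenure and age are distributed differently among union college graduates than in the estimation sample as a whole, which is the sense in which the -if- results are not fully adjusted.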
Jerome Lyons

Join Date: Mar 2023

Posts: 13
#3

17 Jul 2024, 18:04

Clyde Schechter this makes a lot of sense. Thank you so much for your help!
Manije Darooghegi

Join Date: Jul 2024

Posts: 1
#4

27 Jul 2024, 05:58

Hi,
I want to exclude female participants (coded as 2 in the gender variable) who have an average energy intake higher than 14,644 kJ or lower than 2,092 kJ. I attempted to use the following code:
drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
However, this code excludes all females, which indicates that the second part of the condition (related to energy intake) might not be implemented correctly. Could you please check the code and guide me?
Andrew Musau

Join Date: Oct 2014

Posts: 10180
#5

27 Jul 2024, 08:45

Originally posted by Manije Darooghegi

Hi,
I want to exclude female participants (coded as 2 in the gender variable) who have an average energy intake higher than 14,644 kJ or lower than 2,092 kJ. I attempted to use the following code:
drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
However, this code excludes all females, which indicates that the second part of the condition (related to energy intake) might not be implemented correctly.

Your question has little relation to the topic addressed in this thread. In future, please start a new thread. There does not appear to be anything wrong with your code. It should do what you ask for, as the example below illustrates. If this is not helpful, start a new thread and provide a data example that replicates the problem. See FAQ Advice #12 on how to do so, or

Code:

help dataex

Code:

clear
input float(gender average_energy)
1 15000
2 20000
1 3500
2 5000
2 1500
end
list
drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
list

Res.:

Code:

. list

     +-------------------+
     | gender   averag~y |
     |-------------------|
  1. |      1      15000 |
  2. |      2      20000 |
  3. |      1       3500 |
  4. |      2       5000 |
  5. |      2       1500 |
     +-------------------+

.
. drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
(2 observations deleted)

.
. list

     +-------------------+
     | gender   averag~y |
     |-------------------|
  1. |      1      15000 |
  2. |      1       3500 |
  3. |      2       5000 |
     +-------------------+
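One thing that may be worth checking, purely as a guess since no data example was posted: Stata treats numeric missing values as larger than any number, so a condition such as average_energy > 14644 is also true when average_energy is missing. The sketch below uses made-up values to show the effect and one way to exclude missings explicitly.

```stata
* Hypothetical data: the third female has a missing energy value
clear
input float(gender average_energy)
2 5000
2 20000
2 .
1 .
end

* Missing counts as larger than any number, so the female with a
* missing value also satisfies this condition
count if gender == 2 & average_energy > 14644

* Excluding missing values explicitly
count if gender == 2 & average_energy > 14644 & !missing(average_energy)
```

If average_energy were missing for most or all females, the original drop command would remove them for this reason rather than because of their actual intake.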