How to recode a variable to include as an interaction

Sherine Maui

Join Date: Apr 2018

Posts: 90
#1


07 Feb 2023, 08:36

Hello.

I am trying to code one of my variables in a way that would allow me to include an interaction term. I have two variables: race and the proportion of the population for each specific race. For example, if I am white in the dataset, the proportion variable would be the % of white population in my area.

The race variable has four categories: white, black, asian, mixed. The proportion measure was recoded from continuous to categorical (<30% or >30%).

My problem is that for some race-prop combinations there are no observations. For the whites, the prop measure ranges from 33%-97%; for the blacks it ranges from 0%-30%. I am not sure how to recode the variable so that I have some observations in each category and, when I include the interaction in the model, actually get an estimate for each group.

I have tried keeping the prop variable as continuous, but the plot produced using margins does not look quite right (I think because some race categories don’t have observations). I realise one way would be to combine race categories, but if I could avoid doing that I would rather keep the three distinct race categories separate.
Nick Cox

Join Date: Mar 2014

Posts: 35667
#2

07 Feb 2023, 09:27

For many if not most statistical purposes going from continuous to categorical was a backward step. My advice is that to solve this problem you don't do that in the first place.

It could be that there is gain in working with e.g. logit of each proportion.
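A minimal sketch of keeping the proportion continuous, assuming hypothetical variable names y (outcome), race (numeric categorical variable) and prop (proportion on 0 to 1). The linear model is only a placeholder for whatever model is actually being fitted:

```stata
* Hypothetical variable names: y, race, prop

* Observed range of prop within each race category
tabstat prop, by(race) statistics(min max n)

* Interaction of categorical race with continuous proportion
regress y i.race##c.prop

* Estimated slope of prop within each race category
margins race, dydx(prop)
```

With a continuous proportion there are no empty race-by-category cells to worry about, but predictions for a race group at values of prop outside that group's observed range are extrapolations, so the ranges reported by tabstat are worth checking before plotting.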
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#3

07 Feb 2023, 09:27

Sherine:
I'm probably missing something, but the proportion of each level of -i.race- is already embedded in this variable, as it works as 0/1 for each level.

Kind regards,
Carlo
(Stata 19.0)
Sherine Maui

Join Date: Apr 2018

Posts: 90
#4

07 Feb 2023, 09:41

Thank you for the advice, Nick. Would you mind explaining what you mean by logit of each proportion? The proportion variable is continuous from 0 to 1; do you mean log transforming the variable?

Thank you Carlo. The proportion variable is based on census data whereas my data (including race) is based on admin data (subsample of the population), so I don't think it would be embedded since I'm using two different data sources?
Nick Cox

Join Date: Mar 2014

Posts: 35667
#5

07 Feb 2023, 09:52

No; I mean logit of a proportion. The logit is log [p / (1 - p)] for proportion p.

Code:

twoway function logit(x)

The function is indeterminate for arguments 0 and 1. Exact 1s seem unlikely in your context, but perhaps not exact 0s.
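As a sketch of how the transformed variable might be created, assuming the proportion is stored in a hypothetical variable named prop on the 0 to 1 scale:

```stata
* Hypothetical variable name: prop (proportion on 0 to 1)

* How many exact 0s or 1s are there?
count if prop == 0 | prop == 1

* logit(p) = ln(p / (1 - p)); left missing where prop is 0, 1 or missing
generate logit_prop = logit(prop) if prop > 0 & prop < 1
```

Observations with exact 0s or 1s would drop out of any model using logit_prop, so the count is worth inspecting first.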
Sherine Maui

Join Date: Apr 2018

Posts: 90
#6

07 Feb 2023, 10:22

I see, thanks for the clarification, Nick.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#7

08 Feb 2023, 02:18

Originally posted by Sherine Maui

<snip>

Thank you Carlo. The proportion variable is based on census data whereas my data (including race) is based on admin data (subsample of the population), so I don't think it would be embedded since I'm using two different data sources?

Sherine:
I see the issue.
Thanks for clarifying.

Kind regards,
Carlo
(Stata 19.0)