
  • How to recode a variable to include as an interaction

    Hello.

    I am trying to code one of my variables in a way that would allow me to include an interaction term. I have two variables: race and the proportion of the local population belonging to that race. For example, if I am white in the dataset, the proportion variable would be the % of the white population in my area.

    The race variable has four categories: white, black, asian, mixed. The proportion measure was recoded from continuous to categorical (<30% or >30%).

    My problem is that for some race-prop combinations there are no observations. For the whites, the prop measure ranges from 33%-97%; for the blacks it ranges from 0%-30%. I am not sure how to recode the variable so that I have some observations in each category and, when I include the interaction in the model, actually get an estimate for each group.

    I have tried keeping the prop variable as continuous, but the plot produced using margins does not look quite right (I think because some race categories don’t have observations). I realise one way would be to combine race categories, but if I could avoid doing that I would rather keep the three distinct race categories separate.

  • #2
    For many if not most statistical purposes going from continuous to categorical was a backward step. My advice is that to solve this problem you don't do that in the first place.

    It could be that there is gain in working with e.g. logit of each proportion.
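
    A minimal sketch of keeping the proportion continuous in the interaction. The variable names y (outcome), race, and prop (proportion from 0 to 1) are hypothetical, as are the linear model and the numeric values, which are there purely for illustration:

    Code:
    * interaction of categorical race with continuous prop
    regress y i.race##c.prop
    * check the range of prop actually observed within each race group
    tabstat prop, by(race) statistics(min max n)
    * predict only over values of prop observed for a given group,
    * e.g. race level 1 over 0.35 to 0.95 (illustrative values)
    margins, at(race=1 prop=(0.35(0.1)0.95))
    marginsplot

    Restricting each group's predictions to its own observed range avoids plotting extrapolations into regions where that group has no data.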



    • #3
      Sherine:
      I'm probably missing something, but the proportion of each level of -i.race- is already embedded in this variable, as it works as 0/1 for each level.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thank you for the advice, Nick. Would you mind explaining what you mean by logit of each proportion? The proportion variable is continuous from 0 to 1; do you mean log transforming the variable?

        Thank you Carlo. The proportion variable is based on census data whereas my data (including race) is based on admin data (subsample of the population), so I don't think it would be embedded since I'm using two different data sources?



        • #5
          No; I mean logit of a proportion. The logit is log [p / (1 - p)] for proportion p.

          Code:
          twoway function logit(x)
          The function is indeterminate for arguments 0 and 1. Exact 1s seem unlikely in your context, but perhaps not exact 0s.
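
          A minimal sketch of applying the transform to data, assuming a hypothetical variable prop that holds the proportion on a 0 to 1 scale:

          Code:
          * logit() returns missing where prop is exactly 0 or 1
          generate logit_prop = logit(prop)
          * count observations lost to exact 0s or 1s
          count if missing(logit_prop) & !missing(prop)

          If the count is nonzero, those observations would drop out of any model that uses logit_prop, so it is worth checking before fitting.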



          • #6
            I see, thanks for the clarification, Nick.



            • #7
              Originally posted by Sherine Maui
              <snip>

              Thank you Carlo. The proportion variable is based on census data whereas my data (including race) is based on admin data (subsample of the population), so I don't think it would be embedded since I'm using two different data sources?
              Sherine:
              I see the issue.
              Thanks for clarifying.

              Kind regards,
              Carlo
              (Stata 19.0)
