
  • Using margins with if in a model with many interactions gives different predicted outcomes?

    Hi all, sorry if this has been asked before; I couldn't find any reference to this issue. Also, I can't share any of my data for confidentiality reasons (sorry!). I can tell you I have just over 7 million observations of individuals from a census. I am using the most recently updated version of Stata 18.

    THE SETUP: I am running a large Mincer regression with many interactions. The idea is to control for race, immigration status, sex, and their interaction terms
    Code:
    #delimit ;
    reg log_income
    c.log_yrs_school##i.black##i.immigrant##i.female
    c.log_experience##i.black##i.immigrant##i.female  
    c.log__experience_sqrd##i.black##i.immigrant##i.female
    if age > 15 ;
    I then use margins and marginsplot to display predicted log_income on the y-axis and either log_yrs_school or log_experience on the x-axis. I create multiple figures that have only two series: immigrants vs. non-immigrants for each race-sex cohort (i.e., black female immigrants vs. black female non-immigrants, or white male immigrants vs. white male non-immigrants). Currently, I've done this using if conditions in a loop:
    Code:
    foreach VAR in yrs_school experience {
      forvalues FEMALE = 0/1 {
        forvalues BLACK = 0/1 {
          // ``VAR'logs' is assumed to be a local macro, defined earlier, holding the values for at()
          margins black#female#immigrant if black == `BLACK' & female == `FEMALE', at(log_`VAR' = (``VAR'logs'))
          marginsplot, name(`VAR'_f`FEMALE'_b`BLACK', replace)
        }
      }
    }
    THE ISSUE: If I run the same regression and then run margins without the if conditions, I get different predicted values. Any thoughts/advice would be very much appreciated!

    Thanks in advance!
    Jerome

  • #2
    Yes, of course you get different results. The commands operate on different data samples. Without the -if- conditions, the entire dataset is used to calculate the predicted values. For example, the predicted value for a given combination of BLACK and FEMALE is calculated by (temporarily) setting black to BLACK and female to FEMALE in every observation in the dataset, getting observation-level predictions, and averaging them.

    By contrast, when you impose the -if- conditions, the analysis is performed using only the subset of the data that consists of BLACK FEMALEs.

    Because it is likely that the distributions of the other variables in the model differ based on the values of black and female, this means that in the -if- condition, your results are not fully adjusted. The method without the -if- conditions provides fully adjusted results.
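
    A minimal sketch of the difference, using Stata's bundled auto dataset rather than the census data from the question. The model and variable names here are stand-ins chosen for illustration, not the specification from the original post:

    ```stata
    sysuse auto, clear
    regress price c.mpg##i.foreign c.weight

    * Whole sample: foreign is set to 0, then to 1, in every observation,
    * and the predictions are averaged over all cars' observed mpg and weight
    margins foreign
    matrix b_all = r(b)

    * Subsample: the same calculation, but averaged over the foreign cars only
    margins foreign if foreign == 1
    matrix b_sub = r(b)

    * The second column of each matrix is the prediction at foreign == 1
    matrix list b_all
    matrix list b_sub
    ```

    The two predictions at foreign == 1 should differ because mpg and weight are distributed differently among domestic and foreign cars, which is the same mechanism at work in the census model.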


    • #3
      Clyde Schechter, this makes a lot of sense. Thank you so much for your help!


      • #4
        Hi,
        I want to exclude female participants (coded as 2 in the gender variable) who have an average energy intake higher than 14,644 kJ or lower than 2,092 kJ. I attempted to use the following code:
        drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
        However, this code excludes all females, which indicates that the second part of the condition (related to energy intake) might not be implemented correctly. Could you please check the code and guide me?


        • #5
          Originally posted by Manije Darooghegi
          Hi,
          I want to exclude female participants (coded as 2 in the gender variable) who have an average energy intake higher than 14,644 kJ or lower than 2,092 kJ. I attempted to use the following code:
          drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
          However, this code excludes all females, which indicates that the second part of the condition (related to energy intake) might not be implemented correctly.

          Your question has little relation to the topic addressed in this thread. In future, please start a new thread. There does not appear to be anything wrong with your code. It should do what you ask for, as the example below illustrates. If this is not helpful, start a new thread and provide a data example that replicates the problem. See FAQ Advice #12 on how to do so, or

          Code:
          help dataex

          Code:
          clear
          input float(gender average_energy)
          1   15000
          2   20000
          1   3500
          2   5000
          2   1500
          end
          
          list
          drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
          list
          Result:

          Code:
          . list
          
               +-------------------+
               | gender   averag~y |
               |-------------------|
            1. |      1      15000 |
            2. |      2      20000 |
            3. |      1       3500 |
            4. |      2       5000 |
            5. |      2       1500 |
               +-------------------+
          
          . 
          . drop if (gender == 2 & average_energy > 14644) | (gender == 2 & average_energy < 2092)
          (2 observations deleted)
          
          . 
          . list
          
               +-------------------+
               | gender   averag~y |
               |-------------------|
            1. |      1      15000 |
            2. |      1       3500 |
            3. |      2       5000 |
               +-------------------+
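
          The thread does not establish why all females were dropped in the original data. One possibility worth ruling out, offered here only as an assumption, is missing values: Stata treats a numeric missing value (.) as larger than any number, so average_energy > 14644 is true whenever average_energy is missing. A small sketch with made-up data:

          ```stata
          clear
          input float(gender average_energy)
          1 15000
          2 20000
          2 5000
          2 1500
          2 .
          end

          * Missing (.) is treated as larger than any number, so this counts
          * the 20000 row and the row with missing energy
          count if gender == 2 & average_energy > 14644

          * Drop out-of-range females but keep those with missing energy
          drop if gender == 2 & (average_energy > 14644 | average_energy < 2092) & !missing(average_energy)
          list
          ```

          Whether this applies depends on the actual data. Running count if gender == 2 & missing(average_energy) on the real dataset would show how many females have no recorded energy intake.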
