  • Levels of a 3-level categorical variable not showing in regression model

    Hello

    I'm using Stata 14. I have data from experiments that I am analyzing. I have the following variables:

    1. DV: math score
    2. IV1: treatment group. Variable name = newtreat: 1 control, 2 experimental groups (standard priming = trt1; enhanced priming = trt2)
    3. IV2: gender
    4. Moderator variable (continuous) = regfocus_ctr
    5. Covariate: pretest (tims), also a continuous variable

    The data were from 2 schools. The experimental group in each school was different (sch1 had the standard prime (ST); sch2 had the enhanced prime (STfit)), so I had to create IV1, which I did as follows:

    Code:
    gen ctr = 0 if treat==0 
    tab ctr
    
    gen ST = 1 if treat==1 & school==3
    tab ST
    
    gen STfit = 2 if treat==1 & school==4
    egen newtreat = rowmax(ctr ST STfit)
    tab newtreat
    I checked everything, and the numbers within each condition match. The new variable, newtreat, is coded (0, 1, 2).

    But when I use this variable in a basic anova or regression, where you would expect to see values for treat1 and treat2, I only get values for treat2 (enhanced). I'm not sure what is going on.

    I ran a nested regression, but before doing so I created dummy variables for newtreat to indicate that it has 3 levels. I also created 2- and 3-way interaction terms between the 2 categorical IVs (gender and treatment) and the one continuous variable. I did that as follows:

    Code:
    generate newtrt1 = (newtreat==1)
    generate newtrt2 = (newtreat==2)
    
    generate newtrt1_gender = newtrt1*gender
    generate newtrt2_gender = newtrt2*gender
    
    generate newtrt1_regfoc = newtrt1*regfocus_ctr
    generate newtrt2_regfoc = newtrt2*regfocus_ctr
    
    generate gender_regfoc = gender*regfocus_ctr
    Then I ran the nested regression as follows:

    Code:
     nestreg : regress psatscore (tscore) (newtrt1 newtrt2 gender regfocus_ctr) ///
              (newtrt1_gender newtrt2_gender newtrt1_regfoc newtrt2_regfoc gender_regfoc) ///
              (newtrt1_gender_regf newtrt2_gender_regf), beta
    The regression model gives me values for the covariate and main effects for gender, newtrt2, and the moderator (regfocus_ctr). Notice that I don't get any main effect for newtrt1 here.
    It also gives me interaction effects that involve newtrt2 but not newtrt1, and the same holds for the 3-way interaction. What am I doing wrong? Here is the output for the final model:

    HTML Code:
     Block  4: newtrt2_gender_regf
    
          Source |       SS           df       MS      Number of obs   =       142
    -------------+----------------------------------   F(9, 132)       =     16.63
           Model |  1025.35296         9  113.928106   Prob > F        =    0.0000
        Residual |  904.196339       132  6.84997226   R-squared       =    0.5314
    -------------+----------------------------------   Adj R-squared   =    0.4994
           Total |   1929.5493       141  13.6847468   Root MSE        =    2.6172
    
    -------------------------------------------------------------------------------------
              psatscore |      Coef.   Std. Err.      t    P>|t|                     Beta
    --------------------+----------------------------------------------------------------
                 tscore |    .702169   .1021284     6.88   0.000                 .5511338
                newtrt2 |  -.9413409   1.717568    -0.55   0.585                -.1266531
              timztreat |   .2881621   .1619988     1.78   0.078                 .4076952
                 gender |   .2058176   .6248194     0.33   0.742                 .0275799
           regfocus_ctr |   .4366071    .557428     0.78   0.435                 .1169767
         newtrt2_gender |   .2685316    .945092     0.28   0.777                 .0307676
         newtrt2_regfoc |  -.7833509   .7785595    -1.01   0.316                -.1403718
          gender_regfoc |  -.7271432   .6678049    -1.09   0.278                -.1561166
    newtrt2_gender_regf |   1.754673   .9740478     1.80   0.074                 .2431445
                  _cons |   4.444787   .9941791     4.47   0.000                        .
    -------------------------------------------------------------------------------------
    Even a basic anova doesn't partition the newtreat variable (I would expect to see newtrt1 and newtrt2); it just gives me an overall value for newtreat:

    Code:
     anova psat i.newtreat##i.gender
    I've done similar analyses before (where I had to combine 2 variables, gender and school type, to give me boys only, girls in coed schools, and girls in same-sex schools), and everything worked without a hitch.


    I'm not sure what is going on here and I hope to get some insight by posting here. Any help would be greatly appreciated. Thanks!

    Katherine Picho

  • #2
    Well, let's take a look at how you created these variables:

    Code:
    gen ctr = 0 if treat==0
    tab ctr
    
    gen ST = 1 if treat==1 & school==3
    tab ST
    
    gen STfit = 2 if treat==1 & school==4
    egen newtreat = rowmax(ctr ST STfit)
    
    tab newtreat
    So, in those observations where school == 4 & treat == 1, newtreat will be 2. In those observations where school == 3 & treat == 1, newtreat will be 1. And if neither of the preceding conditions holds, and if treat == 0, then newtreat will be 0. But if, for example, you have an observation that has treat == 1 and school is something other than 3 or 4, then newtreat will just have a missing value. Is that what you intend? It sounds like most of your values of newtreat will be missing, since only under some fairly restrictive conditions (that are not logically exhaustive of the possibilities) do you get a 0/1/2 value of newtreat.

    Remember, too, that observations with missing values on any of the regression variables (including psat) are excluded from the analysis. So it is plausible that the (few, I'm guessing) observations with newtreat == 1 (or perhaps those with newtreat == 0) all have missing values on some other variable in the regression.
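
    If the goal is a single 0/1/2 variable, it can also be built directly, without the three helper variables and rowmax(). A minimal sketch on made-up toy data; the value labels and the label name newtreat_lbl are only illustrative:

    ```stata
    * Toy data standing in for the real file (values are invented)
    clear
    input treat school
    0 3
    0 4
    1 3
    1 4
    end

    * Build the three-level treatment variable in one pass
    generate newtreat = 0 if treat == 0
    replace  newtreat = 1 if treat == 1 & school == 3
    replace  newtreat = 2 if treat == 1 & school == 4

    * Attach value labels so the output is readable
    label define newtreat_lbl 0 "Control" 1 "Standard" 2 "Enhanced"
    label values newtreat newtreat_lbl

    * The missing option shows any observation that matched none of the rules
    tabulate newtreat, missing
    ```

    That only checks the variable itself; it says nothing yet about which observations actually make it into the regression.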

    So I'd run:

    Code:
    tab newtreat if e(sample)
    to see what's going on here. I think you'll find that one of your anticipated values of newtreat doesn't actually occur when missingness of other variables is taken into account.
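
    To see the mechanism on a small scale, here is a sketch with invented data (y, x, and g are hypothetical names, not variables from this thread). g has three levels overall, but x is missing for every observation with g == 2, so that level never reaches the estimation sample:

    ```stata
    * Invented data: x is missing whenever g == 2
    clear
    input y x g
    1 2 0
    2 . 0
    3 4 1
    5 5 1
    5 . 2
    6 . 2
    end

    * All three levels are present in the data as a whole
    tabulate g

    * regress drops every observation with a missing y or x
    regress y x

    * Only the levels that survived into the estimation sample are listed
    tabulate g if e(sample)
    ```

    Comparing the two tabulations shows which level, if any, was lost to missing values in the other model variables.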


    • #3
      Hello Clyde

      Thank you for responding. There are only 2 schools in the dataset, and they are coded 3 and 4. There are no missing values for school. newtreat also has no missing values. When I tab newtreat, I get what I expect, and the numbers correspond to the exact number of participants in each condition. See below:
      HTML Code:
      . tab newtreat
      
          3 level |
        indicator |
        treatment |
         variable |      Freq.     Percent        Cum.
      ------------+-----------------------------------
          Control |        145       50.88       50.88
      Standard ST |         74       25.96       76.84
      ST Enhanced |         66       23.16      100.00
      ------------+-----------------------------------
            Total |        285      100.00
      without labels:
      HTML Code:
      . tab newtreat, nolabel
      
          3 level |
        indicator |
        treatment |
         variable |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        145       50.88       50.88
                1 |         74       25.96       76.84
                2 |         66       23.16      100.00
      ------------+-----------------------------------
            Total |        285      100.00
      But you're on to something, because when I run the code you provided I get 0 observations! What does this imply, and how do I fix it? Thank you!

      HTML Code:
      . tab newtreat if e(sample)
      no observations


      • #4
        Hi Clyde

        I re-read your message and cross-checked: school 3 had very few observations for the continuous moderator variable (regfocus), maybe n = 16 (out of a total N of 146 participants). So your statement about missing data makes sense. That means I can only run the analysis with the school that has pretty much all the data available for the variables (yes, some are missing, but only 3-4 observations, nothing as substantial as the missingness on the moderator for school 3!).
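
        A cross-check of this kind can be sketched as a two-way table of school against a missingness flag. The data below are invented so the block runs on its own, and has_regfocus is a hypothetical helper name:

        ```stata
        * Invented data: the moderator is mostly missing in school 3
        clear
        input school regfocus_ctr
        3 .
        3 .
        3 0.5
        4 -0.2
        4 0.1
        4 .
        end

        * Flag observations that have a usable moderator value (1 = present)
        generate byte has_regfocus = !missing(regfocus_ctr)

        * Rows are schools, columns show missing (0) versus present (1)
        tabulate school has_regfocus
        ```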

        Thank you for your help. If you have any more useful information to share, I'd be happy to soak it all in.

        Best,
        Katherine.

        p.s. I'd be interested to know (for future purposes) what that response (i.e. no observations) means....when i ran the code: tab newtreat if e(sample). Thank you!!


        • #5
          I'd be interested to know (for future purposes) what that response (i.e. no observations) means....when i ran the code: tab newtreat if e(sample).
          It means that newtreat has a missing value in every observation that managed to get included in the estimation sample.
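
          One way to check this is to count the estimation sample first, and then count how many of those observations have a nonmissing value of the tabulated variable. A sketch on invented data (y, x, and g are hypothetical names); it assumes the model was fit with regress immediately beforehand, so that e(sample) refers to that model:

          ```stata
          * Invented data: g is missing wherever y and x are both observed
          clear
          input y x g
          1 2 .
          3 3 .
          4 5 .
          6 . 1
          7 . 2
          end

          regress y x

          * How many observations did the model use?
          count if e(sample)

          * How many of those have a usable value of g?
          count if e(sample) & !missing(g)
          ```

          If the first count is already 0, the problem lies with the estimation sample itself rather than with the tabulated variable.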

          By the way, an alternative to restricting your analysis to a single school is to include both schools but not use the variable regfocus in your model--since that seems to be the source of the bulk of the missing values. You would have to think about your research goals to decide which approach would be better.
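
          A sketch of that alternative, using factor-variable notation in place of hand-made dummies and product terms. The data are simulated purely so the block runs on its own; the variable names follow the thread, but the values and the coefficient 0.7 are invented:

          ```stata
          * Simulated stand-in data (not the real experiment)
          clear
          set seed 12345
          set obs 90
          generate newtreat  = mod(_n, 3)
          generate gender    = mod(_n, 2)
          generate tscore    = rnormal(10, 2)
          generate psatscore = 0.7*tscore + rnormal()

          * Covariate plus treatment, gender, and their interaction; no regfocus
          regress psatscore tscore i.newtreat##i.gender

          * Confirm that all three treatment levels are in the estimation sample
          tabulate newtreat if e(sample)
          ```

          With i.newtreat, Stata creates the indicators itself and omits the base level (0 by default), so coefficients for levels 1 and 2 appear whenever both levels are present in the estimation sample.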


          • #6
            Hi Clyde

            Yes I like your idea....I have to review my research goals...but am glad this was sorted out. Thank you very much for your help!!!
