Levels of a 3-level categorical variable not showing in regression model

Katherine Picho

Join Date: Apr 2014
Posts: 32

01 May 2018, 14:33

Hello

I'm using Stata 14. I have data from experiments that I am analyzing. I have the following variables:

1. DV: math score
2. IV1: treatment group. Variable name = newtreat: 1 control, 2 experimental groups (standard priming = trt1; enhanced priming = trt2)
3. IV2: gender
4. Moderator variable (continuous) = regfocus_ctr
5. Covariate: pretest (tims), also a continuous variable

The data were from 2 schools. The experimental group in each school was different (sch1 had the standard prime (ST); sch2 had the enhanced prime (STfit)), so I had to create IV1, which I did as follows:

Code:

gen ctr = 0 if treat==0 
tab ctr

gen ST = 1 if treat==1 & school==3
tab ST

gen STfit = 2 if treat==1 & school==4
egen newtreat = rowmax(ctr ST STfit)
tab newtreat

I checked everything out and the numbers within each condition match. The new variable, newtreat, is also coded (0, 1, 2).

But when I use this variable in a basic anova or regression, where you would expect to see values for treat1 and treat2, I only get values for treat2 (enhanced). I'm not sure what is going on.

I ran a nested regression, but before doing so I created dummy variables for newtreat to indicate that it had 3 levels. I also created 2- and 3-way interaction terms between the 2 categorical IVs (gender and treatment) and the one continuous variable. I did that as follows:

Code:

generate newtrt1 = (newtreat==1)
generate newtrt2 = (newtreat==2)

generate newtrt1_gender = newtrt1*gender
generate newtrt2_gender = newtrt2*gender

generate newtrt1_regfoc = newtrt1*regfocus_ctr
generate newtrt2_regfoc = newtrt2*regfocus_ctr

generate gender_regfoc = gender*regfocus_ctr

Then I ran nestreg as follows:

Code:

 nestreg : regress psatscore (tscore) (newtrt1 newtrt2 gender regfocus_ctr) ///
          (newtrt1_gender newtrt2_gender newtrt1_regfoc newtrt2_regfoc gender_regfoc) ///
          (newtrt1_gender_regf newtrt2_gender_regf), beta

The regression model gives me values for the covariate and main effects for gender, newtrt2 and the moderator (regfocus_ctr). Notice that I don't get any main effect for newtrt1 here.
It also gives me interaction effects that involve newtrt2 (but not newtrt1), and the same is true for the 3-way interaction. What am I doing wrong? Here is the output for the final model:

HTML Code:

 Block  4: newtrt2_gender_regf

      Source |       SS           df       MS      Number of obs   =       142
-------------+----------------------------------   F(9, 132)       =     16.63
       Model |  1025.35296         9  113.928106   Prob > F        =    0.0000
    Residual |  904.196339       132  6.84997226   R-squared       =    0.5314
-------------+----------------------------------   Adj R-squared   =    0.4994
       Total |   1929.5493       141  13.6847468   Root MSE        =    2.6172

-------------------------------------------------------------------------------------
          psatscore |      Coef.   Std. Err.      t    P>|t|                     Beta
--------------------+----------------------------------------------------------------
             tscore |    .702169   .1021284     6.88   0.000                 .5511338
            newtrt2 |  -.9413409   1.717568    -0.55   0.585                -.1266531
          timztreat |   .2881621   .1619988     1.78   0.078                 .4076952
             gender |   .2058176   .6248194     0.33   0.742                 .0275799
       regfocus_ctr |   .4366071    .557428     0.78   0.435                 .1169767
     newtrt2_gender |   .2685316    .945092     0.28   0.777                 .0307676
     newtrt2_regfoc |  -.7833509   .7785595    -1.01   0.316                -.1403718
      gender_regfoc |  -.7271432   .6678049    -1.09   0.278                -.1561166
newtrt2_gender_regf |   1.754673   .9740478     1.80   0.074                 .2431445
              _cons |   4.444787   .9941791     4.47   0.000                        .
-------------------------------------------------------------------------------------

Even a basic anova doesn't partition the newtreat variable (I would expect to see newtrt1 and newtrt2); it just gives me an overall value for newtreat:

Code:

 anova psat i.newtreat##i.gender

I've done similar analyses before (where I had to combine 2 variables, gender and school type, to give me boys only, girls in coed schools, and girls in same-sex schools), and everything worked without a hitch.

I'm not sure what is going on here and I hope to get some insight by posting here. Any help would be greatly appreciated. Thanks!

Katherine Picho

Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

01 May 2018, 14:51

Well, let's take a look at how you created these variables:

Code:

gen ctr = 0 if treat==0
tab ctr

gen ST = 1 if treat==1 & school==3
tab ST

gen STfit = 2 if treat==1 & school==4
egen newtreat = rowmax(ctr ST STfit)
tab newtreat

So, in those observations where school == 4 & treat == 1, newtreat will be 2. In those observations where school == 3 & treat == 1, newtreat will be 1. And if neither of the preceding conditions holds, and treat == 0, then newtreat will be 0. But if, for example, you have an observation that has treat == 1 and school is something other than 3 or 4, then newtreat will just have a missing value. Is that what you intend? It sounds like most of your values of newtreat will be missing, since only under some fairly restrictive conditions (that are not logically exhaustive of the possibilities) do you get a 0/1/2 value of newtreat.

Remember also that observations with missing values on any of the regression variables (including psat) are excluded from the analysis. So it is plausible that the (few, I'm guessing) observations with newtreat == 1 (or perhaps those with newtreat == 0) all have missing values on some other variable in the regression.

So I'd run:

Code:

tab newtreat if e(sample)

to see what's going on here. I think you'll find that one of your anticipated values of newtreat doesn't actually occur when missingness of other variables is taken into account.
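
A complementary check, sketched here on the assumption that the model variables are the ones named in your nestreg command (psatscore, tscore, newtreat, gender, regfocus_ctr), is to count missing values per variable and then see how many complete cases remain in each treatment level. The names nmiss and complete are placeholders I made up for illustration:

```stata
* Count missing values for each variable used in the model
misstable summarize psatscore tscore newtreat gender regfocus_ctr

* Number of model variables that are missing in each observation
* (nmiss and complete are hypothetical helper names)
egen byte nmiss = rowmiss(psatscore tscore newtreat gender regfocus_ctr)
generate byte complete = (nmiss == 0)

* How many complete cases are left in each treatment level
tabulate newtreat complete, missing
```

If one row of that last table has a zero in the complete == 1 column, that level cannot contribute to the regression, and its dummy and interaction terms will be dropped.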

Katherine Picho

Join Date: Apr 2014
Posts: 32

02 May 2018, 08:29

Hello Clyde

Thank you for responding. There are only 2 schools in the dataset, and they are coded 3 and 4. There are no missing values for school. newtreat also has no missing values. When I tab newtreat, I get what I expect, and the numbers correspond to the exact number of participants in each condition. See below:

HTML Code:

. tab newtreat

    3 level |
  indicator |
  treatment |
   variable |      Freq.     Percent        Cum.
------------+-----------------------------------
    Control |        145       50.88       50.88
Standard ST |         74       25.96       76.84
ST Enhanced |         66       23.16      100.00
------------+-----------------------------------
      Total |        285      100.00

without labels:

HTML Code:

. tab newtreat, nolabel

    3 level |
  indicator |
  treatment |
   variable |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        145       50.88       50.88
          1 |         74       25.96       76.84
          2 |         66       23.16      100.00
------------+-----------------------------------
      Total |        285      100.00

But you're on to something, because when I run the code you suggested I get 0 observations! What does this imply, and how do I fix it? Thank you!

HTML Code:

. tab newtreat if e(sample)
no observations


Katherine Picho

Join Date: Apr 2014

Posts: 32
#4

02 May 2018, 08:37

Hi Clyde

I re-read your message and cross-checked. School 3 had very few observations for the continuous moderator variable (regfocus): maybe n = 16 (out of a total N of 146 participants). So your statement on missing data makes sense. That means I can only run the analysis with the school that has pretty much all the data available for the variables (yes, some are missing, but it's about 3-4 observations, nothing as substantial as the missingness on the moderator for school 3!).

Thank you for your help. If you have any more useful information to share, I'd be happy to soak it all in.

Best,
Katherine.

P.S. I'd be interested to know (for future purposes) what that response (i.e. "no observations") means when I ran the code: tab newtreat if e(sample). Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#5

02 May 2018, 09:12

I'd be interested to know (for future purposes) what that response (i.e. no observations) means....when i ran the code: tab newtreat if e(sample).

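It means that no observation flagged by e(sample) has a nonmissing value of newtreat. Either newtreat is missing in every observation that managed to get included in the estimation sample, or e(sample) itself is empty, which is what happens when no estimation results are in memory at the time the command is run.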

By the way, an alternative to restricting your analysis to a single school is to include both schools but not use the variable regfocus in your model--since that seems to be the source of the bulk of the missing values. You would have to think about your research goals to decide which approach would be better.
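
As a rough sketch of that alternative, assuming the same variable names as in your nestreg call (psatscore, tscore, newtreat, gender), factor-variable notation lets Stata build the indicators and the interaction itself, so each non-reference level of newtreat gets its own coefficient. Because e(sample) refers to the most recent estimation command, the checks need to be run while those results are still in memory:

```stata
* Both schools, moderator left out; i. and ## create the indicators and interaction
regress psatscore tscore i.newtreat##i.gender

* Number of observations actually used, and their spread over treatment levels
count if e(sample)
tabulate newtreat if e(sample)
```

If all three levels of newtreat show up in that tabulation, the missing data on regfocus were indeed what removed one treatment group from the earlier model.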
Katherine Picho

Join Date: Apr 2014

Posts: 32
#6

02 May 2018, 19:08

Hi Clyde

Yes, I like your idea. I have to review my research goals, but I am glad this was sorted out. Thank you very much for your help!