No. observations in each group with interaction terms

Chiara Brouns

Join Date: Dec 2016

Posts: 29
#1

10 Jan 2017, 02:21

Hello,

I created interaction terms between a group of dummy variables, each taking the value 1 if an individual has a given disease, and employment status in 3 categories, like this:

. reg satis educ educsq marital_dum age agesq male i.diabetes##i.empl i.asthma##i.empl i.heart##i.empl 1.cancer##i.empl i.stroke##i.empl i.migraene##i.empl i.dementia
> ##i.empl i.depression##i.empl i.otherilln##i.empl i.hypertension##i.empl if age>25 & age<59 & svyyear==2009, robust

I would like to know how many observations I have in each group, that is, diabetes = 1 & empl = 1, diabetes = 1 & empl = 2, and so on. How can I do this in Stata?

Thanx a lot!
Chiara

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17711

10 Jan 2017, 02:36

Chiara:
perhaps what follows can help you out:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. reg price i.foreign##i.rep78
note: 1.foreign#1b.rep78 identifies no observations in the sample
note: 1.foreign#2.rep78 identifies no observations in the sample
note: 1.foreign#5.rep78 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(7, 61)        =      0.39
       Model |    24684607         7  3526372.43   Prob > F        =    0.9049
    Residual |   552112352        61  9051022.16   R-squared       =    0.0428
-------------+----------------------------------   Adj R-squared   =   -0.0670
       Total |   576796959        68  8482308.22   Root MSE        =    3008.5

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      foreign |
     Foreign  |   2088.167   2351.846     0.89   0.378     -2614.64    6790.974
              |
        rep78 |
           2  |   1403.125   2378.422     0.59   0.557    -3352.823    6159.073
           3  |   2042.574   2204.707     0.93   0.358    -2366.011    6451.159
           4  |   1317.056   2351.846     0.56   0.578    -3385.751    6019.863
           5  |       -360   3008.492    -0.12   0.905    -6375.851    5655.851
              |
foreign#rep78 |
   Foreign#1  |          0  (empty)
   Foreign#2  |          0  (empty)
   Foreign#3  |  -3866.574   2980.505    -1.30   0.199    -9826.462    2093.314
   Foreign#4  |  -1708.278   2746.365    -0.62   0.536    -7199.973    3783.418
   Foreign#5  |          0  (omitted)
              |
        _cons |     4564.5   2127.325     2.15   0.036      310.651    8818.349
-------------------------------------------------------------------------------

. egen check=group(foreign rep78)
(5 missing values generated)

. bysort check : list foreign rep78 check if _n==1

----------------------------------------------------------------------------------------------------------
-> check = 1

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       1       1 |
     +--------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 2

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       2       2 |
     +--------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 3

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       3       3 |
     +--------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 4

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       4       4 |
     +--------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 5

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       5       5 |
     +--------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 6

     +-------------------------+
     | foreign   rep78   check |
     |-------------------------|
  1. | Foreign       3       6 |
     +-------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 7

     +-------------------------+
     | foreign   rep78   check |
     |-------------------------|
  1. | Foreign       4       7 |
     +-------------------------+

----------------------------------------------------------------------------------------------------------
-> check = 8

     +-------------------------+
     | foreign   rep78   check |
     |-------------------------|
  1. | Foreign       5       8 |
     +-------------------------+

----------------------------------------------------------------------------------------------------------
-> check = .

     +--------------------------+
     |  foreign   rep78   check |
     |--------------------------|
  1. | Domestic       .       . |
     +--------------------------+


. label define check 1 "Domestic_1rep" 2 "Domestic_2rep" 3 "Domestic_3rep" 4 "Domestic_4rep" 5 "Domestic_5
> rep" 6 "Foreign_3rep" 7"Foreign_4rep" 8"Foreign_5rep"

. label val check check

. tab check

group(foreign |
       rep78) |      Freq.     Percent        Cum.
--------------+-----------------------------------
Domestic_1rep |          2        2.90        2.90
Domestic_2rep |          8       11.59       14.49
Domestic_3rep |         27       39.13       53.62
Domestic_4rep |          9       13.04       66.67
Domestic_5rep |          2        2.90       69.57
 Foreign_3rep |          3        4.35       73.91
 Foreign_4rep |          9       13.04       86.96
 Foreign_5rep |          9       13.04      100.00
--------------+-----------------------------------
        Total |         69      100.00

.
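A possible shortcut for the labelling step above, sketched on the same auto data: the label option of egen's group() function builds the value labels from the values and value labels of the grouping variables, so -label define- need not be typed by hand. The variable name check2 is only an illustration.

```stata
sysuse auto, clear
* check2 is a hypothetical name; the label option attaches value labels
* built from the values and value labels of foreign and rep78
egen check2 = group(foreign rep78), label
* observations with missing rep78 get a missing check2 and are left out
tabulate check2
```

The resulting frequencies should match those in the table above; only the label text differs from the hand-written labels.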

Kind regards,
Carlo
(Stata 19.0)

Chiara Brouns

Join Date: Dec 2016

Posts: 29
#3

10 Jan 2017, 02:53

Hi Carlo,

Yes, this works! Thanx a lot!

Ciao,
Chiara
Nick Cox

Join Date: Mar 2014
Posts: 35699

10 Jan 2017, 03:31

I don't quite see any need to produce a new variable here or any gain from doing so. Any two-way tabulation command would show the frequencies of cross-combinations. groups (SSC) does so in a way that extends to three-way, four-way, ... interactions.

Code:

. sysuse auto
(1978 Automobile Data)

. groups foreign rep78

  +------------------------------------+
  |  foreign   rep78   Freq.   Percent |
  |------------------------------------|
  | Domestic       1       2      2.90 |
  | Domestic       2       8     11.59 |
  | Domestic       3      27     39.13 |
  | Domestic       4       9     13.04 |
  | Domestic       5       2      2.90 |
  |------------------------------------|
  |  Foreign       3       3      4.35 |
  |  Foreign       4       9     13.04 |
  |  Foreign       5       9     13.04 |
  +------------------------------------+

. groups foreign rep78, nolabel

  +-----------------------------------+
  | foreign   rep78   Freq.   Percent |
  |-----------------------------------|
  |       0       1       2      2.90 |
  |       0       2       8     11.59 |
  |       0       3      27     39.13 |
  |       0       4       9     13.04 |
  |       0       5       2      2.90 |
  |-----------------------------------|
  |       1       3       3      4.35 |
  |       1       4       9     13.04 |
  |       1       5       9     13.04 |
  +-----------------------------------+

.
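If the counts should refer only to the observations actually used in the regression (for instance when the model carries an -if- condition or drops observations with missing values), one option is to tabulate right after estimation and restrict to e(sample). A minimal sketch on the auto data, using only official commands:

```stata
sysuse auto, clear
regress price i.foreign##i.rep78
* cross-tabulate only the observations used by the regression above
tabulate foreign rep78 if e(sample)
* size of the estimation sample; should agree with "Number of obs"
count if e(sample)
```

The same -if e(sample)- qualifier can be applied to one two-way table per disease dummy against empl after the regression in #1.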

Katherine Adams

Join Date: Jan 2019

Posts: 52
#5

15 Feb 2019, 11:06

Hello,

One of the terms in my regression is the interaction term i.treatgr#i.tp,

where

1. ‘treatgr’ identifies one of the 3 treatment groups (variable ‘randomgr’, with randomgr=1/2/3, and randomgr=0 if an observation is in a control group) as follows:
gen treatgr = randomgrp if calday >= td(05may2017)
replace treat = 0 if treat ==.

(the treatment starts on May 5, 2017)

2. ‘tp’ is a treatment period dummy:
gen tp = (calday >= td(05may2017))

While running the regression, Stata reports that

note: 1.treat#0b.tp identifies no observations in the sample
note: 2.treat#0b.tp identifies no observations in the sample
note: 3.treat#0b.tp identifies no observations in the sample

which is OK by definition of the variable ‘treatgr’.

However, is this situation normal in general? Should I correct it in some way in order to avoid such messages?

Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#6

15 Feb 2019, 11:17

Katherine:
- how can interested listers reply positively without an example/excerpt of your data (which you can easily share via -dataex-)?
Moreover, your interaction code should probably be:

Code:

i.treatgr##i.tp

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#7

15 Feb 2019, 11:24

Carlo gave excellent advice.

Now, just as a side note, I'm wondering why you would wish to add an interaction term between a binary variable (time period) and a categorical variable, given that all nonzero categories are placed under a single period.

In other words, and giving a reply to your question ("is this situation normal in general?"), I believe there is no advantage in adding this interaction term. This shall be the best way to "avoid" the "no observations in the sample" message.
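The point can be checked with a cross-tabulation. The sketch below uses made-up data: the variable names follow the description in #5, but the sample size, the assignment rule, and the cut-off are assumptions for illustration only.

```stata
clear
set obs 400
* hypothetical stand-ins for the variables described in #5
gen byte randomgrp = mod(_n, 4)                  // 0 = control, 1/2/3 = treatment arms
gen byte tp = _n > 200                           // 1 in the treatment period
gen byte treatgr = cond(tp == 1, randomgrp, 0)   // nonzero only when tp == 1
* the cells with treatgr = 1, 2, 3 and tp = 0 are empty by construction
tabulate treatgr tp
count if treatgr > 0 & tp == 0
```

The empty cells in the tp = 0 column are what the "identifies no observations in the sample" notes refer to.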

Best regards,

Marcos
Katherine Adams

Join Date: Jan 2019

Posts: 52
#8

15 Feb 2019, 13:13

Carlo, Marcos, thank you for your help! And yes, I will try to generate an example of my data.
Katherine Adams

Join Date: Jan 2019

Posts: 52
#9

15 Feb 2019, 16:16

Here is the example of my data; the data in the example is sorted by location, so, in fact, it is related to only one household with location id 600001 (the original data is for many households over 2017-2018).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long location float(lconsum tp) byte randomgrp float(calday treatgr treat_numb_of_days)
600001  4.342219 0 0 20820 0 -403
600001  4.396476 0 0 20821 0 -402
600001 4.4473995 0 0 20822 0 -401
600001 4.4349075 0 0 20823 0 -400
600001 4.3400753 0 0 20824 0 -399
600001  3.974716 0 0 20825 0 -398
600001 4.2170517 0 0 20826 0 -397
600001 4.4074755 0 0 20827 0 -396
600001 4.2367565 0 0 20828 0 -395
600001 4.3245976 0 0 20829 0 -394
600001 4.2221044 0 0 20830 0 -393
600001 4.4336513 0 0 20831 0 -392
600001  4.122668 0 0 20832 0 -391
600001  4.443582 0 0 20833 0 -390
600001 4.1282955 0 0 20834 0 -389
600001  4.153299 0 0 20835 0 -388
600001  4.019543 0 0 20836 0 -387
600001 3.8549176 0 0 20837 0 -386
600001  3.745078 0 0 20838 0 -385
600001  3.776974 0 0 20839 0 -384
end
format %td calday

location; household’s location id
lconsum; log of energy consumption
tp; post-treatment variable; gen tp = (calday >= td(08feb2018)). [Sorry, I used the wrong treatment date in my previous post]
randomgr; one of three treatment groups (can be 1,2,3, as well as 0 if it is a control group)
calday; day and year 01jan2017
treatgr; treatment indicator;
gen treatgr = randomgrp if calday >= td(08feb2018)
replace treat = 0 if treat ==.

I will also try to be more specific about my model.
The problem is that I need to do an event-study.
First, I generate a variable showing the number of days before/after the date when the treatment starts:
gen treat_numb_of_days = calday-td(08feb2018)
Then, I do an event study regression (as I understand it):
areg lconsum treatgr tp i.treatgr#c.treat_numb_of_days#i.tp, absorb(location) vce(cluster location)

As I said, while running the regression above, Stata reports that
note: 1.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample
note: 2.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample
note: 3.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample

which should be the case because of the definition of the variable ‘treatgr’.

Carlo,
I have tried to do a regression using the full interaction ##
areg lconsum i.treatgr##c.treat_numb_of_days##i.tp, absorb(location) vce(cluster location)

But got the following messages:
note: 1.treat#0b.tp identifies no observations in the sample
note: 1.treat#1.tp omitted because of collinearity
note: 2.treat#0b.tp identifies no observations in the sample
note: 2.treat#1.tp omitted because of collinearity
note: 3.treat#0b.tp identifies no observations in the sample
note: 3.treat#1.tp omitted because of collinearity
note: 1.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
note: 1.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
note: 2.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
note: 2.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
note: 3.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
note: 3.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
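A likely reason for the "omitted because of collinearity" notes, sketched on made-up data: if the treatment indicator is nonzero only when tp == 1, then the indicator for "arm 1" and the indicator for "arm 1 in the treatment period" are the same column, so one of the two has to be dropped. The data-generating lines and the names arm1 and arm1_post are assumptions for illustration.

```stata
clear
set obs 400
* hypothetical stand-ins for the variables described above
gen byte randomgrp = mod(_n, 4)                  // 0 = control, 1/2/3 = treatment arms
gen byte tp = _n > 200                           // 1 from the treatment date onward
gen byte treatgr = cond(tp == 1, randomgrp, 0)   // nonzero only when tp == 1
* indicator for arm 1, and indicator for arm 1 in the treatment period
gen byte arm1 = treatgr == 1
gen byte arm1_post = treatgr == 1 & tp == 1
* the two columns are identical, hence perfectly collinear
assert arm1 == arm1_post
```

The same argument applies to arms 2 and 3 and to their products with treat_numb_of_days.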

In addition, in this case, I am not sure how to plot a figure for my event study showing point estimates from the event study regression of energy consumption before and after the treatment.

Marcos,
I am afraid if I omit my interaction term, I will not be able to conduct the event study (however, I may be wrong in my understanding of an event-study regression).

Thank you.