
  • No. observations in each group with interaction terms

    Hello,

    I created interaction terms between a set of dummy variables, each taking the value 1 if an individual has a given disease, and employment status (3 categories), like this:

    . reg satis educ educsq marital_dum age agesq male i.diabetes##i.empl i.asthma##i.empl i.heart##i.empl i.cancer##i.empl i.stroke##i.empl i.migraene##i.empl i.dementia##i.empl i.depression##i.empl i.otherilln##i.empl i.hypertension##i.empl if age>25 & age<59 & svyyear==2009, robust

    I would like to know how many observations I have in each group, that is, diabetes = 1 & empl = 1, diabetes = 1 & empl = 2, and so on. How can I do this in Stata?

    Thanks a lot!
    Chiara


  • #2
    Chiara:
    perhaps what follows can help you out:
    Code:
    . sysuse auto.dta
    (1978 Automobile Data)
    
    . reg price i.foreign##i.rep78
    note: 1.foreign#1b.rep78 identifies no observations in the sample
    note: 1.foreign#2.rep78 identifies no observations in the sample
    note: 1.foreign#5.rep78 omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(7, 61)        =      0.39
           Model |    24684607         7  3526372.43   Prob > F        =    0.9049
        Residual |   552112352        61  9051022.16   R-squared       =    0.0428
    -------------+----------------------------------   Adj R-squared   =   -0.0670
           Total |   576796959        68  8482308.22   Root MSE        =    3008.5
    
    -------------------------------------------------------------------------------
            price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
          foreign |
         Foreign  |   2088.167   2351.846     0.89   0.378     -2614.64    6790.974
                  |
            rep78 |
               2  |   1403.125   2378.422     0.59   0.557    -3352.823    6159.073
               3  |   2042.574   2204.707     0.93   0.358    -2366.011    6451.159
               4  |   1317.056   2351.846     0.56   0.578    -3385.751    6019.863
               5  |       -360   3008.492    -0.12   0.905    -6375.851    5655.851
                  |
    foreign#rep78 |
       Foreign#1  |          0  (empty)
       Foreign#2  |          0  (empty)
       Foreign#3  |  -3866.574   2980.505    -1.30   0.199    -9826.462    2093.314
       Foreign#4  |  -1708.278   2746.365    -0.62   0.536    -7199.973    3783.418
       Foreign#5  |          0  (omitted)
                  |
            _cons |     4564.5   2127.325     2.15   0.036      310.651    8818.349
    -------------------------------------------------------------------------------
    
    . egen check=group(foreign rep78)
    (5 missing values generated)
    
    . bysort check : list foreign rep78 check if _n==1
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 1
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       1       1 |
         +--------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 2
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       2       2 |
         +--------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 3
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       3       3 |
         +--------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 4
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       4       4 |
         +--------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 5
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       5       5 |
         +--------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 6
    
         +-------------------------+
         | foreign   rep78   check |
         |-------------------------|
      1. | Foreign       3       6 |
         +-------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 7
    
         +-------------------------+
         | foreign   rep78   check |
         |-------------------------|
      1. | Foreign       4       7 |
         +-------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = 8
    
         +-------------------------+
         | foreign   rep78   check |
         |-------------------------|
      1. | Foreign       5       8 |
         +-------------------------+
    
    ----------------------------------------------------------------------------------------------------------
    -> check = .
    
         +--------------------------+
         |  foreign   rep78   check |
         |--------------------------|
      1. | Domestic       .       . |
         +--------------------------+
    
    
    . label define check 1 "Domestic_1rep" 2 "Domestic_2rep" 3 "Domestic_3rep" 4 "Domestic_4rep" 5 "Domestic_5rep" 6 "Foreign_3rep" 7 "Foreign_4rep" 8 "Foreign_5rep"
    
    . label val check check
    
    . tab check
    
    group(foreign |
           rep78) |      Freq.     Percent        Cum.
    --------------+-----------------------------------
    Domestic_1rep |          2        2.90        2.90
    Domestic_2rep |          8       11.59       14.49
    Domestic_3rep |         27       39.13       53.62
    Domestic_4rep |          9       13.04       66.67
    Domestic_5rep |          2        2.90       69.57
     Foreign_3rep |          3        4.35       73.91
     Foreign_4rep |          9       13.04       86.96
     Foreign_5rep |          9       13.04      100.00
    --------------+-----------------------------------
            Total |         69      100.00
    
    .
    Kind regards,
    Carlo
    (Stata 18.0 SE)
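A possible shortcut to the approach above, sketched on the same auto data: the `group()` function of `egen` accepts a `label` option that builds value labels from the grouping variables, so the manual `label define` step can be skipped, and restricting to `e(sample)` counts only the observations the regression actually used. The variable names `insample` and `cell` are purely illustrative.

```stata
sysuse auto, clear
* fit the model first so that e(sample) marks the estimation sample
regress price i.foreign##i.rep78
gen byte insample = e(sample)
* one group per observed foreign x rep78 combination within the sample;
* the label option creates value labels from the two variables
egen cell = group(foreign rep78) if insample, label
* observations per cell (the 5 cars with missing rep78 are excluded)
tabulate cell
```

For the model in #1, the analogous check would be something like `tabulate diabetes empl if e(sample)`, typed right after `reg` and before any other estimation command replaces `e(sample)`.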



    • #3
      Hi Carlo,

      Yes, this works! Thanks a lot!

      Ciao,
      Chiara



      • #4
        I don't quite see any need to produce a new variable here or any gain from doing so. Any two-way tabulation command would show the frequencies of cross-combinations. groups (SSC) does so in a way that extends to three-way, four-way, ... interactions.

        Code:
        . sysuse auto
        (1978 Automobile Data)
        
        . groups foreign rep78
        
          +------------------------------------+
          |  foreign   rep78   Freq.   Percent |
          |------------------------------------|
          | Domestic       1       2      2.90 |
          | Domestic       2       8     11.59 |
          | Domestic       3      27     39.13 |
          | Domestic       4       9     13.04 |
          | Domestic       5       2      2.90 |
          |------------------------------------|
          |  Foreign       3       3      4.35 |
          |  Foreign       4       9     13.04 |
          |  Foreign       5       9     13.04 |
          +------------------------------------+
        
        . groups foreign rep78, nolabel
        
          +-----------------------------------+
          | foreign   rep78   Freq.   Percent |
          |-----------------------------------|
          |       0       1       2      2.90 |
          |       0       2       8     11.59 |
          |       0       3      27     39.13 |
          |       0       4       9     13.04 |
          |       0       5       2      2.90 |
          |-----------------------------------|
          |       1       3       3      4.35 |
          |       1       4       9     13.04 |
          |       1       5       9     13.04 |
          +-----------------------------------+
        
        .
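For readers who prefer built-in commands to -groups-, a sketch along the same lines: `tabulate` gives the two-way frequencies directly, and `contract` (here with `nomiss`, so that combinations involving missing values are dropped) reduces the data to one row per observed combination, which extends to any number of factors. The frequency variable name `n` is arbitrary.

```stata
sysuse auto, clear
* two-way frequency table of the cross-combinations; no new variable needed
tabulate foreign rep78
* for three or more factors, contract keeps one row per observed combination
* plus its frequency; preserve/restore brings the original data back
preserve
contract foreign rep78, freq(n) nomiss
list, sepby(foreign) noobs
restore
```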



        • #5
          Hello,

          One of the terms in my regression is the interaction term i.treatgr#i.tp, where:

          1. ‘treatgr’ identifies which of the 3 treatment groups an observation belongs to once the treatment has started (variable ‘randomgrp’ takes the values 1, 2, or 3, and 0 if the observation is in the control group). It is created as follows:
          gen treatgr = randomgrp if calday >= td(05may2017)
          replace treatgr = 0 if treatgr == .

          (the treatment starts on May 5, 2017)

          2. ‘tp’ is a treatment-period dummy:
          gen tp = (calday >= td(05may2017))

          While running the regression, Stata reports that

          note: 1.treat#0b.tp identifies no observations in the sample
          note: 2.treat#0b.tp identifies no observations in the sample
          note: 3.treat#0b.tp identifies no observations in the sample

          which is expected, given how ‘treatgr’ is defined.
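One way to confirm that these notes come only from the definition of ‘treatgr’, sketched on hypothetical data (40 made-up households, with the group assignment and dates chosen purely for illustration): cross-tabulating `treatgr` against `tp` shows that groups 1 to 3 never occur with `tp = 0`, which is exactly the set of cells Stata flags.

```stata
clear
* hypothetical panel: 40 households observed daily from 01apr2017 to 09jul2017
set obs 40
gen long location = 600000 + _n
* hypothetical assignment: groups 0 (control), 1, 2, 3, ten households each
gen byte randomgrp = mod(_n, 4)
expand 100
bysort location: gen int calday = td(01apr2017) + _n - 1
format %td calday
gen byte tp = (calday >= td(05may2017))
gen byte treatgr = randomgrp if calday >= td(05may2017)
replace treatgr = 0 if missing(treatgr)
* the cells treatgr = 1, 2, 3 with tp = 0 are empty by construction
tabulate treatgr tp
```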


          However, is this situation normal in general? Should I correct it in some way in order to avoid such messages?

          Thank you.



          • #6
            Katherine:
            - how can interested listers reply helpfully without an example/excerpt of your data (which you can easily share via -dataex-)?
            Moreover, your interaction code should probably be:
            Code:
            i.treatgr##i.tp
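To see what this suggestion changes, a sketch on the auto data (not Katherine's): `A##B` is shorthand for `A B A#B`, that is, both main effects plus the interaction, while `A#B` on its own enters only the interaction cells. The two regressions below therefore fit the same model and should report the same R-squared; the scalar names are arbitrary.

```stata
sysuse auto, clear
* full factorial: main effects of foreign and rep78 plus their interaction
regress price i.foreign##i.rep78
scalar r2_factorial = e(r2)
* the same model with the main effects and the interaction written out
regress price i.foreign i.rep78 i.foreign#i.rep78
scalar r2_written = e(r2)
display "R-squared: " r2_factorial " vs " r2_written
```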
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Carlo gave excellent advice.

              Now, just as a side note, I'm wondering why you would wish to add an interaction term between a binary variable (time period) and a categorical variable, given that all the treated categories fall within a single period.

              In other words, to answer your question ("is this situation normal in general?"): I believe there is no advantage in adding this interaction term, and leaving it out is the best way to "avoid" the "no observations in the sample" message.

              Best regards,

              Marcos



              • #8
                Carlo, Marcos, thank you for your help! And yes, I will try to generate an example of my data.



                • #9
                  Here is an example of my data. The excerpt is sorted by location, so it covers only one household, with location id 600001 (the original data cover many households over 2017-2018).

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input long location float(lconsum tp) byte randomgrp float(calday treatgr treat_numb_of_days)
                  600001  4.342219 0 0 20820 0 -403
                  600001  4.396476 0 0 20821 0 -402
                  600001 4.4473995 0 0 20822 0 -401
                  600001 4.4349075 0 0 20823 0 -400
                  600001 4.3400753 0 0 20824 0 -399
                  600001  3.974716 0 0 20825 0 -398
                  600001 4.2170517 0 0 20826 0 -397
                  600001 4.4074755 0 0 20827 0 -396
                  600001 4.2367565 0 0 20828 0 -395
                  600001 4.3245976 0 0 20829 0 -394
                  600001 4.2221044 0 0 20830 0 -393
                  600001 4.4336513 0 0 20831 0 -392
                  600001  4.122668 0 0 20832 0 -391
                  600001  4.443582 0 0 20833 0 -390
                  600001 4.1282955 0 0 20834 0 -389
                  600001  4.153299 0 0 20835 0 -388
                  600001  4.019543 0 0 20836 0 -387
                  600001 3.8549176 0 0 20837 0 -386
                  600001  3.745078 0 0 20838 0 -385
                  600001  3.776974 0 0 20839 0 -384
                  end
                  format %td calday

                  location: household’s location id
                  lconsum: log of energy consumption
                  tp: post-treatment dummy, gen tp = (calday >= td(08feb2018)) [Sorry, I used the wrong treatment date in my previous post.]
                  randomgrp: treatment group (1, 2, or 3; 0 for the control group)
                  calday: daily date (%td format), starting on 01jan2017
                  treatgr: treatment indicator, created as
                  gen treatgr = randomgrp if calday >= td(08feb2018)
                  replace treatgr = 0 if treatgr == .


                  I will also try to be more specific about my model.
                  The problem is that I need to run an event study.
                  First, I generate a variable showing the number of days before/after the date when the treatment starts:
                  gen treat_numb_of_days = calday-td(08feb2018)
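A quick check of this step on the first three rows of the excerpt, reading `calday` as a Stata daily date (20820 is 01jan2017): event time comes out as -403, -402, -401, matching the `treat_numb_of_days` column above. As a hedged aside, if event time is later entered as a factor variable (event-study specifications often use one indicator per period), it has to be shifted first, because Stata factor variables cannot contain negative values; `event_k` is a hypothetical shifted copy, not something from the original post.

```stata
clear
* first three rows of the -dataex- excerpt (location and calday only)
input long location int calday
600001 20820
600001 20821
600001 20822
end
format %td calday
* days relative to the treatment start; negative before 08feb2018
gen int treat_numb_of_days = calday - td(08feb2018)
* hypothetical shifted copy: factor variables must be nonnegative integers,
* so this (not treat_numb_of_days) could be written as i.event_k
summarize treat_numb_of_days, meanonly
gen int event_k = treat_numb_of_days - r(min)
list
```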
                  Then I run an event-study regression (as I understand it):
                  areg lconsum treatgr tp i.treatgr#c.treat_numb_of_days#i.tp, absorb(location) vce(cluster location)

                  As I said, while running the regression above, Stata reports that
                  note: 1.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample
                  note: 2.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample
                  note: 3.treatgr#0b.tp#c.treat_numb_of_days identifies no observations in the sample

                  which should be the case because of the definition of the variable ‘treatgr’.

                  Carlo,
                  I have tried running the regression with the full factorial interaction (##):
                  areg lconsum i.treatgr##c.treat_numb_of_days##i.tp, absorb(location) vce(cluster location)

                  but got the following messages:
                  note: 1.treat#0b.tp identifies no observations in the sample
                  note: 1.treat#1.tp omitted because of collinearity
                  note: 2.treat#0b.tp identifies no observations in the sample
                  note: 2.treat#1.tp omitted because of collinearity
                  note: 3.treat#0b.tp identifies no observations in the sample
                  note: 3.treat#1.tp omitted because of collinearity
                  note: 1.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
                  note: 1.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
                  note: 2.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
                  note: 2.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
                  note: 3.treat#0b.tp#c.treat_numb_of_days identifies no observations in the sample
                  note: 3.treat#1.tp#c.treat_numb_of_days omitted because of collinearity
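The pattern in these notes can be reproduced with a tiny hypothetical dataset (one row per group before and after the start, values invented for illustration). Because `treatgr` is 0 whenever `tp` is 0, the indicator for `treatgr == k` is identical to its product with `tp == 1`, so Stata drops `k.treatgr#1.tp` as collinear, while the product with `tp == 0` is always zero, hence "identifies no observations". Multiplying both sides by `treat_numb_of_days` does not change this, which is why the three-way terms are flagged the same way. The `cond()` line is assumed equivalent to the two-line definition of `treatgr` above, and the indicator names `g1`, `g1_tp1`, ... are hypothetical.

```stata
clear
* tiny hypothetical excerpt: each group observed once before (tp = 0)
* and once after (tp = 1) the treatment start
input byte(randomgrp tp)
0 0
0 1
1 0
1 1
2 0
2 1
3 0
3 1
end
* treatgr: the group after the start, 0 before (same logic as in the post)
gen byte treatgr = cond(tp == 1, randomgrp, 0)
* build the indicators that i.treatgr##i.tp would create for group k
forvalues k = 1/3 {
    gen byte g`k' = (treatgr == `k')
    gen byte g`k'_tp1 = (treatgr == `k') * (tp == 1)
    gen byte g`k'_tp0 = (treatgr == `k') * (tp == 0)
}
* g`k'_tp1 duplicates g`k' (collinear) and g`k'_tp0 is always 0 (empty)
list
```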

                  In addition, I am not sure how, in this case, to plot a figure for my event study showing the point estimates from the event-study regression of energy consumption before and after the treatment.


                  Marcos,
                  I am afraid that if I omit the interaction term, I will not be able to conduct the event study (though I may be wrong in my understanding of an event-study regression).


                  Thank you.
