
  • DID (difference-in-differences) for count data using nbreg (negative binomial)

    Dear statalists,

    I have a data set of treated (employees purchasing stocks through a firm's stock option scheme) and non-treated (employees not purchasing stocks through a firm's stock option scheme) individuals with two periods (before and after treatment) and several controls.

    Only those employees who purchased stock options for the first time are considered in the treatment group. Hence, the data set is very unbalanced, as the control group (non-treatment) is several times larger than the treatment group.

    My dependent variable is a count of ideas submitted to an idea suggestion scheme, so we are interested in whether employees owning stocks submit more ideas than employees not owning stocks in the firm.

    Variables are:
    DV: newidea_a_did_1
    treatment dummy: did_eso_treatment
    period dummy: period
    interaction period x treatment: treatment_X_period
    + several controls

    I have attached an excerpt of my data below.

    The question is: can I run a difference-in-differences regression using nbreg just as I would with the common reg command? I think nbreg is more appropriate because the count data are extremely right-skewed (most values are zero).

    reg command: reg newidea_a_did_1 period did_eso_treatment treatment_X_period year fulltime_did_1 size_did_1 dummy_function_1_did_1 dummy_level_1_did_1, vce(robust)
    nbreg command: nbreg newidea_a_did_1 period did_eso_treatment treatment_X_period year fulltime_did_1 size_did_1 dummy_function_1_did_1 dummy_level_1_did_1


    Thanks for your help!
    Felix


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte period long newid int(year newidea_a_did_1) byte(fulltime_did_1 dummy_function_1_did_1 dummy_level_1_did_1) int size_did_1 byte did_eso_treatment float treatment_X_period
    0 164876 2014 0 1 0 0 1 0 0
    1  12837 2014 0 1 0 0 1 0 0
    1 136451 2015 0 1 0 0 1 0 0
    1  95503 2013 0 1 0 0 1 0 0
    0 148296 2013 0 1 0 0 1 0 0
    0 164616 2014 0 1 0 0 1 0 0
    1  79008 2011 0 1 0 0 1 0 0
    1 113462 2015 0 1 0 0 1 0 0
    1 104390 2012 0 1 0 0 1 0 0
    0   5472 2012 4 1 0 0 1 0 0
    0 129275 2015 0 1 0 0 1 0 0
    1  47902 2015 0 1 0 0 1 0 0
    1  89282 2013 0 1 0 0 1 0 0
    1 154119 2013 0 1 0 0 1 0 0
    0  80340 2013 0 1 0 0 1 0 0
    0   5542 2014 3 1 0 0 1 0 0
    1 159958 2014 0 1 0 0 1 0 0
    1  30037 2015 0 1 0 0 1 0 0
    1  68050 2015 0 1 0 0 1 0 0
    0  26429 2014 0 1 0 0 1 0 0
    0  18680 2013 0 1 0 0 1 0 0
    0 127988 2015 1 1 0 0 1 0 0
    1  55030 2013 0 1 0 0 1 0 0
    0   1123 2014 1 1 0 0 1 0 0
    0   7311 2012 0 1 0 0 1 0 0
    1 132880 2012 0 1 0 1 1 0 0
    0 114821 2015 0 0 0 0 1 0 0
    0  12697 2015 1 1 0 0 1 0 0
    1  22619 2011 0 1 0 0 1 0 0
    1  13878 2014 0 1 0 0 1 0 0
    1  21819 2014 0 1 0 0 1 0 0
    0 108467 2013 0 1 0 0 1 0 0
    0  23320 2013 0 1 0 0 1 0 0
    0  38465 2015 0 1 0 0 1 0 0
    1  67225 2011 0 1 0 0 1 0 0
    1 108023 2013 0 1 0 0 1 0 0
    1  78626 2015 1 1 0 0 1 0 0
    1 162525 2015 1 1 0 0 1 0 0
    0  88884 2014 0 1 0 0 1 0 0
    1  21763 2013 0 1 0 0 1 0 0
    0  13552 2011 0 1 0 0 1 0 0
    1  68124 2015 0 1 0 0 1 0 0
    0  13595 2011 0 1 0 0 1 0 0
    0 140693 2012 0 1 0 1 1 0 0
    1  68069 2014 0 1 0 0 1 0 0
    0  69566 2013 0 1 0 0 1 0 0
    1 116535 2012 0 1 0 0 1 0 0
    0   5935 2011 0 1 0 0 1 0 0
    0  37895 2012 0 1 1 0 1 0 0
    1 124789 2011 0 1 0 0 1 0 0
    1  53398 2013 0 1 0 0 1 0 0
    1 145305 2015 3 1 0 0 1 0 0
    0   5975 2013 0 1 0 0 1 0 0
    0   5991 2011 0 1 0 0 1 0 0
    0   5991 2013 0 1 0 0 1 0 0
    1  13616 2015 0 1 0 0 1 0 0
    0   5999 2015 0 1 0 0 1 0 0
    0 150473 2012 0 1 0 0 1 0 0
    1 164520 2011 0 1 0 0 1 0 0
    0 147783 2015 0 1 0 0 1 0 0
    0  79014 2015 0 1 0 0 1 0 0
    0 154112 2011 0 1 0 0 1 0 0
    0  32056 2015 0 1 0 0 1 0 0
    1  77614 2012 0 1 0 1 1 0 0
    0  78915 2015 0 1 0 0 1 0 0
    1 125923 2011 0 1 1 0 1 0 0
    0  22439 2014 2 1 0 0 1 0 0
    1 127811 2015 0 1 0 0 1 0 0
    0   5542 2015 0 1 0 0 1 0 0
    1  79014 2012 0 1 0 0 1 0 0
    0 160757 2013 0 1 0 0 1 0 0
    0 133963 2015 0 1 0 0 1 0 0
    0  69247 2011 0 1 0 0 1 0 0
    0 108467 2014 1 1 0 0 1 0 0
    0   6297 2014 0 1 0 0 1 0 0
    1 109064 2015 0 1 0 0 1 0 0
    0   5944 2015 0 1 0 0 1 0 0
    1  13399 2014 0 1 0 0 1 0 0
    1  67948 2013 0 1 0 0 1 0 0
    0  26745 2013 0 1 0 0 1 0 0
    1 124741 2014 0 1 0 0 1 0 0
    1  80122 2014 3 1 0 0 1 0 0
    1 131175 2012 0 1 0 0 1 0 0
    0 164807 2014 0 1 0 0 1 0 0
    1  33422 2011 0 1 1 0 1 0 0
    1 115241 2011 0 1 0 0 1 0 0
    0 127199 2013 0 1 0 0 1 0 0
    0  13878 2015 0 1 0 0 1 0 0
    0 147779 2015 0 1 0 0 1 0 0
    0  87807 2011 0 1 0 0 1 0 0
    0   6627 2015 0 0 0 0 1 0 0
    0  96166 2013 0 1 0 0 1 0 0
    0   6661 2011 0 1 0 0 1 0 0
    0   6664 2014 0 1 0 0 1 0 0
    1 161085 2012 0 1 0 0 1 0 0
    0  21961 2015 0 1 0 0 1 0 0
    1 160757 2012 0 1 0 0 1 0 0
    1  22469 2011 0 1 0 0 1 0 0
    0   6804 2011 0 1 0 0 1 0 0
    0   6804 2013 0 1 0 0 1 0 0
    end

    Last edited by Felix Hofmann; 07 Apr 2019, 14:42. Reason: dataex edited

  • #2
    To answer your general question, yes you can use -nbreg- instead of -reg- if you feel it is suitable. But it is not clear to me if either of these is suitable. It appears from your example data that at least some of the people (newid) in the data set appear more than once. In fact, in this kind of setting, it is usually the case that all or most will recur at least once in the pre- period and once in the post-period. If that is the case, you do not have the independence of observations required to use either -reg- or -nbreg-. You will need to use -xtreg- or -xtnbreg-. For that matter, take a look at -xtpoisson-. It's suitable for count data, simpler conceptually than -xtnbreg-, and has fewer difficulties with convergence. Perhaps you worry about overdispersion, but, at least in your example data, your outcome variable only ranges between 0 and 4: it's pretty hard to be overdispersed in that circumstance. Anyway, you need an analysis that accounts for the repeated observations on the same persons.
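
    A minimal sketch of the panel approach described above, run on simulated data so that it is self-contained. The variable names follow the original post, but every value, the sample size, and the effect sizes are invented for illustration, and the model omits the controls.

```stata
* Simulated two-period panel (all values invented for illustration)
clear
set seed 2019
set obs 300
generate long newid = _n
generate byte did_eso_treatment = (_n <= 60)      // assume 20% treated
expand 2                                          // two rows per person
bysort newid: generate byte period = _n - 1       // 0 = pre, 1 = post
generate byte treatment_X_period = did_eso_treatment * period
generate double mu = exp(-0.5 + 0.2*period + 0.4*treatment_X_period)
generate int newidea_a_did_1 = rpoisson(mu)

* Declare the person identifier so the xt commands know which rows belong together
xtset newid

* Fixed-effects Poisson; the time-invariant treatment dummy is absorbed
* by the person effects, so only period and the interaction are entered
xtpoisson newidea_a_did_1 period treatment_X_period, fe vce(robust)
```

    Note that the fixed-effects estimator drops persons whose outcome is zero in both periods, so the reported number of groups can be smaller than the number of persons in the data.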

    I also was scanning your data a bit. The example you show does not have any observations with your treatment = 1. I wanted to verify that you have observations in both the pre- and post- periods in both treated and untreated cases. Just check that yourself before you try to run any analysis.
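
    One way to run that check, sketched on a tiny made-up data set (the six rows below are hypothetical and only reuse the variable names from the post):

```stata
* Tiny invented data set: person 3 appears twice in the pre-period
clear
input long newid byte(period did_eso_treatment)
1 0 0
1 1 0
2 0 1
2 1 1
3 0 0
3 0 0
end

* Cross-tabulate treatment by period: all four cells should be non-empty
tabulate did_eso_treatment period

* Report how many person-period combinations occur more than once
duplicates report newid period

* Stop with an error if any treatment-by-period cell is empty
forvalues t = 0/1 {
    forvalues p = 0/1 {
        quietly count if did_eso_treatment == `t' & period == `p'
        assert r(N) > 0
    }
}
```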

    I also noted you are including year as a variable, but as a continuous variable. That means you are modeling a continuous linear trend in the outcome variable over time. Is that what you intended? Perhaps you meant to include a series of indicator variables for year so as to adjust for year-by-year shocks in the outcome? If so, it needs to be i.year. (If you are not familiar with this, learn about factor-variable notation by reading -help fvvarlist-. When you do that you will also see that you can simplify the code by using factor variable notation to handle the main and interaction effects of the DID.)
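
    As a rough illustration of the factor-variable version, here is a sketch on simulated, overdispersed counts. The variable names come from the original post; the data-generating process, the helper variables year0, nu, and mu, and the choice of person-clustered standard errors are assumptions made only for this example.

```stata
* Simulated two-period data with gamma heterogeneity (all values invented)
clear
set seed 42
set obs 500
generate long newid = _n
generate byte did_eso_treatment = (_n <= 100)
generate int year0 = 2011 + floor(4*runiform())   // hypothetical first year
expand 2
bysort newid: generate byte period = _n - 1
generate int year = year0 + period
generate double nu = rgamma(2, 0.5)               // mean 1; induces overdispersion
generate double mu = exp(-0.5 + 0.2*period + 0.4*did_eso_treatment*period)
generate int newidea_a_did_1 = rpoisson(mu*nu)

* The ## operator expands to both main effects plus their interaction;
* i.year enters year as a set of indicator variables
nbreg newidea_a_did_1 i.did_eso_treatment##i.period i.year, vce(cluster newid)

* DID estimate expressed as an incidence-rate ratio
lincom 1.did_eso_treatment#1.period, irr
```

    With this notation the hand-built treatment_X_period variable is no longer needed, and -margins- can be used after the model because Stata knows which terms form the interaction.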



    • #3
      Dear Clyde, thank you very much for your reply, I will apply your recommendations!

      Code:
      It appears from your example data that at least some of the people (newid) in the data set appear more than once
      I also discovered that some individuals appeared more than once (all in the control group). I excluded them using 1:1 exact matching without replacement between treated and non-treated observations, because we have only one observation for treated individuals.

      Code:
      I also was scanning your data a bit. The example you show does not have any observations with your treatment = 1. I wanted to verify that you have observations in both the pre- and post- periods in both treated and untreated cases. Just check that yourself before you try to run any Analysis.
      Yes, the data has treated observations, they do not appear in the dataex example.

      Code:
      If so, it needs to be i.year.
      Yes, I wanted to include a series of indicator variables and will use i.year.


      Thanks!
      Felix
