
  • Difference-in-difference method

    Dear Members of Statalist

    I am using the difference-in-differences method; here is the context. We had companies that were paying their executives with shares from 2005-2010; from 2011 to 2017 they stopped. I want to examine how this stoppage of share compensation affects firm performance, i.e., how the change from paying shares to not paying shares to executives affects firm performance. My variable of interest is ShareCompensation*Post, where Post is a dummy equal to 1 for the period after the removal of share compensation. I have two options for representing the ShareCompensation variable. (1) Option 1: a dummy variable equal to 1 for companies that were paying share compensation to executives and then stopped from 2011-2017, and 0 for companies that never paid share compensation in either period. (2) Option 2: a continuous variable, measured as the number of shares paid to executives (e.g., 500 shares, 260 shares, 123,000 shares) or as the sensitivity of the shares to performance.

    Questions
    • When I run separate regressions based on option 1 (dummy variable) and option 2 (number of shares, a continuous variable), I get conflicting or contradictory results. Between these two ways of measuring ShareCompensation, which is better suited for the difference-in-differences regression in my case? Is it normal for the two options to give contradictory results?
    • If you were running regressions using the two options, would you use the same command? In other words, does either of them require a different treatment?
    • Any references would also be appreciated.
    Your assistance will be greatly appreciated.

  • #2
    Well, it is possible that, viewed as a continuous variable, there is a different result, including an opposite sign, from what you would see treating it as a discrete variable. Run this code to see an example, and study the graph to get an understanding of why.

    Code:
    clear *
    set obs 20
    gen id = _n
    gen treatment = (_n <= 10)                   // units 1-10 are treated
    gen treatment2 = treatment/_n if treatment   // continuous "dose" = 1/id
    replace treatment2 = 0 if !treatment
    expand 10
    by id, sort: gen time = _n
    gen byte post = (time > 5)
    
    gen y = 0 if treatment == 0 | post == 0
    replace y = 2 - 5*treatment2 if treatment & post   // outcome declines in the dose
    
    xtset id time
    xtline y, overlay
    
    xtreg y i.treatment##i.post, fe     // dummy treatment
    xtreg y c.treatment2##i.post, fe    // continuous treatment
    But before we jump to that conclusion, it would be best if you showed the exact code you ran and the exact output you got from Stata. It is possible that you are not coding your models correctly, or are misinterpreting your outputs. In showing these things, please be sure to do it by copy/pasting directly from your Results window or log file into the screen editor, and surrounding that material by code delimiters. (Please read the Forum FAQ, with attention to #12, to learn about how to use code delimiters.)
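    For readers without Stata at hand, here is a hypothetical numpy-only translation of the toy example above (the unit fixed effects are handled by demeaning within id, which is what -xtreg, fe- does internally); the point is that the dummy and continuous interaction coefficients come out with opposite signs on the same data:

    ```python
    # Hypothetical replication of the Stata toy example: 20 units, 10 periods;
    # treated units 1-10 get a continuous "dose" of 1/id, and the post-period
    # outcome is y = 2 - 5*dose, so the average treated unit improves even
    # though the outcome falls as the dose rises.
    import numpy as np

    ids = np.repeat(np.arange(1, 21), 10)
    time = np.tile(np.arange(1, 11), 20)
    treat = (ids <= 10).astype(float)
    dose = np.where(treat == 1, 1.0 / ids, 0.0)   # continuous treatment intensity
    post = (time > 5).astype(float)
    y = np.where((treat == 1) & (post == 1), 2 - 5 * dose, 0.0)

    def fe_did(x):
        """Within (unit-demeaned) OLS of y on post and x*post; returns the
        interaction coefficient, as a fixed-effects DID regression would."""
        def demean(v):
            out = v.astype(float).copy()
            for i in np.unique(ids):
                m = ids == i
                out[m] -= out[m].mean()
            return out
        X = np.column_stack([demean(post), demean(x * post)])
        coefs, *_ = np.linalg.lstsq(X, demean(y), rcond=None)
        return coefs[1]

    print(fe_did(treat))  # dummy DID interaction:      ~ +0.54
    print(fe_did(dose))   # continuous DID interaction: ~ -2.39
    ```

    The dummy version averages the treatment effect across treated units (positive on average), while the continuous version picks up the slope of the effect in the dose (negative by construction), which is exactly how opposite signs can arise.
    
    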



    • #3
      Thank you, Clyde. I have pasted the code below. I am also interested in understanding which of the two options is better (continuous vs. discrete). Here is the code that I used:
      Code:
      qui reg `vars' `vari' c.`vari'#c.K3 `s4Varlist' Ind1-Ind13 Yr* if s3 != ., vce(cluster CompId)
      local p = Ftail(e(df_m), e(df_r), e(F))
      Here is the output I am getting:
                                   (1)              (2)
      VARIABLES             Continuous   Dummy Variable
                             ShareComp        ShareComp
      Bsize                     0.0430           0.0261
                               (0.0878)         (0.0890)
      NEDs                       1.567            1.277
                               (1.566)          (1.589)
      ln(Fsize)                0.426**          0.431**
                               (0.203)          (0.202)
      Frsk                      -0.799          -0.961*
                               (0.590)          (0.564)
      CAge                     -0.0346          -0.0220
                               (0.0366)         (0.0378)
      Block                    -1.720*          -1.847*
                               (0.884)          (0.940)
      CoAct                     -0.131           -0.332
                               (0.655)          (0.640)
      K3                      -1.820**          -1.427*
                               (0.728)          (0.819)
      ROA                       -0.363           -0.437
                               (0.633)          (0.635)
      StkRtrn                   0.342*           0.342*
                               (0.174)          (0.176)
      InstOwn                  3.336**           2.873*
                               (1.618)          (1.713)
      Indp                       0.258            0.421
                               (1.354)          (1.378)
      ShareComp                0.0981*
                               (0.0577)
      ShareComp*Post          0.363***
                               (0.116)
      ShareCompDummy                             1.347*
                                                (0.681)
      ShareCompDummy*Post                        -0.879
                                                (0.789)
      Constant                  -4.785           -5.457
                               (4.574)          (4.588)
      Observations               1,159            1,159
      R-squared                  0.184            0.184
      Adjusted R-squared         0.158            0.158
      AIC                         6967             6966



      • #4
        OK. This is a bit hard for me to interpret. I had intended for you to post the direct output of -regress-, not the version that has been laundered through -esttab- or -estout-. Also, the code contains some local macros, but not their definitions, so I can't quite tell exactly what the regression was. So my next comment may be incorrect:

        The regression command appears to contain an interaction term c.`vari'#c.K3, without a term for K3 alone and without a term for `vari' alone. Of course, it may be that these are contained somewhere in `vars' or `s4Varlist', so I have just missed them. But if, in fact, those terms are not in the regression, then the model is mis-specified and cannot be interpreted (unless `vari' and K3 are collinear with some other terms in the model, in which case this is not a problem).

        I strongly recommend that people always use the ## notation when coding interaction models, because it is much too easy to mistakenly omit the constituents of the interaction when using just #. The use of ## is foolproof: Stata never forgets to include the constituents. And even if they end up being omitted due to collinearity, there's no harm done. In fact, it's a plus: the omission gives you confirmation that the constituents were, in fact, collinear. If they weren't supposed to be, then you know there is something wrong with the data that needs fixing. If they were supposed to be collinear (say, with a fixed effect, as the treatment variable in a DID typically is) and one doesn't get omitted, again, you have a warning that there is something wrong with the data. ## is safer than # and has no downsides whatsoever. I don't really grasp why people use # instead when coding regressions, unless they are being charged by the character for the privilege of writing code. :-)
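        As a hypothetical illustration of the difference (variable names invented):

        Code:
        * ## expands to both main effects plus the interaction:
        regress y i.treatment##i.post
        * equivalent to: regress y i.treatment i.post i.treatment#i.post

        * # alone includes only the interaction; the main effects must be typed by hand:
        regress y i.treatment#i.post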

        That said, the outputs you show do include a term for K3. But the "variable names" you show in the output don't have any obvious correspondence to the variable names in the regression command, so I really can't be sure what's going on here.

        Could you repost the commands with the definitions of all the local macros that are mentioned in regress, along with the immediate, unmodified output of the -regress- commands? Thanks.


