
  • Would like assistance with some of my regressions

    Hello!

    I have a dataset of grocery store transactions in Washington, DC, Arlington County, VA, and Montgomery County, MD, around January 1, 2012, when Montgomery County imposed a tax of 5 cents per bag. I'm trying to run a difference-in-differences model to estimate the effect of the tax on plastic bag consumption in Montgomery County, MD, before and after it was implemented. I'm currently running this regression:

    code:
    regress plastic md post postXmd

    but my results seem to be off and I cannot understand why.

    I'm still fairly new to Stata, so any help would be appreciated. Thanks!

    I've attached a sample of my dataset with what I believe are the key variables for my analysis. The plastic variable is the number of plastic bags used and reuse is the number of reusable bags used. post = 1 after January 1, 2012 (when the tax took effect).

    clear
    input byte(plastic reuse post) float(dc va md postXmd postXdc postXva)
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 4 0 0 0 1 0 0 0
    3 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    0 5 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 6 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    6 0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    3 0 0 0 0 1 0 0 0
    6 0 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    7 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 1 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 5 0 0 0 1 0 0 0
    7 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 2 0 0 0 1 0 0 0
    2 2 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    14 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    3 0 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    1 3 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 4 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    6 0 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    0 3 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    12 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    1 0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 5 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0
    3 3 0 0 0 1 0 0 0
    6 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    4 0 0 0 0 1 0 0 0
    3 0 0 0 0 1 0 0 0
    7 0 0 0 0 1 0 0 0
    5 0 0 0 0 1 0 0 0
    3 0 0 0 0 1 0 0 0
    10 0 0 0 0 1 0 0 0
    2 0 0 0 0 1 0 0 0
    8 0 0 0 0 1 0 0 0
    end
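    For reference, regress plastic md post postXmd fits the standard two-group, two-period difference-in-differences specification. In the sketch below, β0 to β3 are labels introduced here for the coefficients on _cons, md, post and postXmd, and mu_gt is the expected value of plastic for md = g and post = t. Because only md enters the model, md = 0 pools the DC and VA transactions into a single comparison group. Under the usual parallel-trends assumption, β3 is the difference-in-differences estimate of the tax effect.

    ```latex
    \begin{aligned}
    \text{plastic}_i &= \beta_0 + \beta_1\,\text{md}_i + \beta_2\,\text{post}_i + \beta_3\,(\text{post}_i \times \text{md}_i) + \varepsilon_i \\
    \mu_{gt} &\equiv E[\text{plastic}_i \mid \text{md}_i = g,\ \text{post}_i = t] \\
    \mu_{00} &= \beta_0, \quad \mu_{01} = \beta_0 + \beta_2, \quad \mu_{10} = \beta_0 + \beta_1, \quad \mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
    \beta_3 &= (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})
    \end{aligned}
    ```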

  • #2
    In the data example you provided, there is no variation in any of the predictors.
    Code:
    tab1 post md postXmd
    That is, every row has the same value on each of those variables; only the outcome varies. To estimate a multiple regression model, you need variation not only in the outcome but also in the predictors. A Google search for the phrase "regression with no variability in predictors" will turn up a number of good explanations.
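    The linear-algebra reason fits in a few lines. In the posted sample every row has md = 1, post = 0 and postXmd = 0. This means every row of the design matrix X (columns: constant, md, post, postXmd) is the same vector (1, 1, 0, 0). Here β0 to β3 are labels introduced for illustration for the coefficients on _cons, md, post and postXmd.

    ```latex
    X = \begin{pmatrix}
    1 & 1 & 0 & 0 \\
    1 & 1 & 0 & 0 \\
    \vdots & \vdots & \vdots & \vdots \\
    1 & 1 & 0 & 0
    \end{pmatrix},
    \qquad \operatorname{rank}(X) = 1 < 4
    \;\Longrightarrow\; X^\top X \text{ is singular and } \hat\beta = (X^\top X)^{-1} X^\top y \text{ does not exist;}
    \quad \text{only } \beta_0 + \beta_1 \text{ is identified, with } \widehat{\beta_0 + \beta_1} = \bar{y}.
    ```

    Stata's regress handles this by omitting the collinear terms rather than stopping with an error.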



    • #3
      Erik Ruzek

      Thank you for the response and for pointing out the problem with my sample.

      My full dataset has around 16,000 observations, and the predictors are all dummy variables: md = 1 if the store is in Montgomery County, MD, and postXmd = post*md, so it equals 1 only when both post and md are 1. I therefore believe there is variability in the predictors (as dummies, they take only the values 0 and 1); the sample I posted just didn't show it.

      These are my results when I run my regression:

      Code:
      regress plastic md post postXmd

      ------------------------------------------------------------------------------
           plastic | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
                md |   .3111623   .0421711     7.38   0.000     .2285022    .3938223
              post |  -.1848998   .0402348    -4.60   0.000    -.2637645   -.1060352
           postXmd |  -1.020558   .0558515   -18.27   0.000    -1.130033    -.911083
             _cons |    1.59197   .0310405    51.29   0.000     1.531127    1.652813
      ------------------------------------------------------------------------------

      I believe my interaction term coefficient is not correct, but I can't work out why. I wonder whether it has something to do with my code.
      Last edited by Gary Hammersmite; 15 Dec 2023, 09:17.
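      Plugging the reported estimates into the cell-mean formulas shows what the postXmd coefficient measures. Here β0 to β3 are labels introduced for illustration for the coefficients on _cons, md, post and postXmd. mu_gt is the fitted mean of plastic for md = g and post = t, so mu_11 - mu_10 is the before/after change in Montgomery County and mu_01 - mu_00 is the change in the pooled DC/VA group. Values are rounded to four decimals.

      ```latex
      \begin{aligned}
      \Delta_{\text{MD}} &= \hat\mu_{11} - \hat\mu_{10} = \hat\beta_2 + \hat\beta_3 = -0.1849 + (-1.0206) = -1.2055 \\
      \Delta_{\text{DC/VA}} &= \hat\mu_{01} - \hat\mu_{00} = \hat\beta_2 = -0.1849 \\
      \widehat{\text{DiD}} &= \Delta_{\text{MD}} - \Delta_{\text{DC/VA}} = -1.2055 - (-0.1849) = -1.0206 = \hat\beta_3
      \end{aligned}
      ```

      The arithmetic is internally consistent. The postXmd coefficient is exactly the Montgomery County before/after change minus the DC/VA before/after change: about one fewer plastic bag per observation. Whether that is a credible causal effect depends on the comparison group and on model specification, not on the arithmetic.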



      • #4
        Use margins to graph the model-predicted means for plastic.
        Code:
        regress plastic i.md##i.post
        margins md#post
        marginsplot
        Before you draw any conclusions, please check the residuals from your model. The plastic variable in the data you shared looks unusual: it is a count with a limited range, concentrated at small values (mostly 0 to 2), with a few large ones. The residuals are supposed to be normally distributed; if they are not, you may need to consider a model with a different link function.
        Code:
        help regress postestimation plots
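        If the residual checks do point away from OLS, one common option for a count outcome such as plastic is a log-link (Poisson-type) model. This is offered only as an illustration of what changes, not as a recommendation based on the full data. γ0 to γ3 are labels introduced here, and mu_gt is the expected value of plastic for md = g and post = t.

        ```latex
        \begin{aligned}
        \log E[\text{plastic}_i \mid \text{md}_i, \text{post}_i] &= \gamma_0 + \gamma_1\,\text{md}_i + \gamma_2\,\text{post}_i + \gamma_3\,(\text{post}_i \times \text{md}_i) \\
        e^{\gamma_3} &= \frac{\mu_{11} / \mu_{10}}{\mu_{01} / \mu_{00}}
        \end{aligned}
        ```

        Under a log link the interaction is a ratio of ratios (a proportional effect), so it is not directly comparable to the OLS coefficient on postXmd, which is a difference in differences of means. In Stata, such a model can be fit with -poisson- or with -glm- using family(poisson) link(log).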



        • #5
          Gary:
          as an aside to Erik's helpful replies, please note:
          1) you can rely on the wonderful capabilities of -fvvarlist- notation to create interactions and categorical variables:
          Code:
          . regress plastic i.md##i.post
          note: 1.md omitted because of collinearity.
          note: 0.post omitted because of collinearity.
          note: 1.md#0.post omitted because of collinearity.
          
                Source |       SS           df       MS      Number of obs   =       100
          -------------+----------------------------------   F(0, 99)        =      0.00
                 Model |           0         0           .   Prob > F        =         .
              Residual |      616.75        99  6.22979798   R-squared       =    0.0000
          -------------+----------------------------------   Adj R-squared   =    0.0000
                 Total |      616.75        99  6.22979798   Root MSE        =     2.496
          
          ------------------------------------------------------------------------------
               plastic | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                  1.md |          0  (omitted)
                0.post |          0  (omitted)
                       |
               md#post |
                  1 0  |          0  (omitted)
                       |
                 _cons |       2.25   .2495956     9.01   0.000     1.754748    2.745252
          ------------------------------------------------------------------------------
          2) it is difficult to believe that, with only two interacted predictors, your regression is correctly specified (see -linktest-);
          3) if you're using Stata 17 or 18, you can rely on -didregress- for DID (as per the FAQ, if you're not using the latest version of Stata, that is 18, you should say so in your posts. Thanks).
          Kind regards,
          Carlo
          (Stata 19.0)

