Would like assistance for some of my regressions

Gary Hammersmite

Join Date: Sep 2023

Posts: 6
#1

14 Dec 2023, 20:22

Hello!

I have a dataset of grocery store transactions in Washington, DC; Arlington County, VA; and Montgomery County, MD from around Jan. 1, 2012, when Montgomery County imposed a bag tax of 5 cents per bag. I'm trying to run a difference-in-differences model to estimate the effect of the tax on plastic bag consumption in Montgomery County, MD. I'm currently running this regression:

code:
regress plastic md post postXmd

but my results seem to be off and I cannot understand why.

I'm still fairly new to Stata, so any help would be appreciated. Thanks!

I've attached a sample of my dataset with the variables I believe are key for my analysis. The plastic variable is the number of plastic bags used, and reuse is the number of reusable bags used. post = 1 if the transaction occurred after Jan. 1, 2012 (when the tax took effect).

input byte(plastic reuse post) float(dc va md postXmd postXdc postXva)
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 4 0 0 0 1 0 0 0
3 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
0 5 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 6 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
3 0 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
7 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 1 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 5 0 0 0 1 0 0 0
7 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 2 0 0 0 1 0 0 0
2 2 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
14 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
3 0 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
1 3 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 4 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
0 3 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
12 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 5 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0
3 3 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
3 0 0 0 0 1 0 0 0
7 0 0 0 0 1 0 0 0
5 0 0 0 0 1 0 0 0
3 0 0 0 0 1 0 0 0
10 0 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0 0
8 0 0 0 0 1 0 0 0
end
Erik Ruzek

Join Date: Oct 2017

Posts: 441
#2

15 Dec 2023, 07:20

In the data example you provided, there is no variation in any of the predictors.

Code:

tab1 post md postXmd

That is, all rows have the same values on those variables. You only have variation in the outcome. To run a multiple regression model, you need variation not only in the outcome but also in the predictors. A Google search for the phrase "regression with no variability in predictors" will give you a number of links with good explanations.
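A quick way to see the point above is to check the spread of each predictor before fitting anything. The sketch below is only an illustration: it re-enters the first six rows of the posted excerpt (plastic, post, and md only) and rebuilds postXmd from them.

```stata
* First six rows of the posted excerpt, restricted to three variables.
clear
input byte(plastic post md)
2 0 1
1 0 1
2 0 1
2 0 1
0 0 1
3 0 1
end
generate byte postXmd = post*md

* A standard deviation of zero means the variable is constant in this sample.
summarize post md postXmd

* With constant predictors, regress omits them and fits only the intercept.
regress plastic md post postXmd

* A difference-in-differences design needs all four md-by-post cells populated.
tabulate md post
```

Here the cross-tabulation has a single cell (md = 1, post = 0), which is why nothing but the constant can be estimated from the excerpt.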
Gary Hammersmite

Join Date: Sep 2023

Posts: 6
#3

15 Dec 2023, 09:13

Erik Ruzek

Thank you for the response.

Thank you for pointing out the problem with my sample. In my full dataset there are around 16,000 observations, and the predictors are all dummy variables: md = 1 if the store is in Montgomery County, MD, and postXmd = post*md, so it equals 1 only if both values are 1. So I believe there is variability; the values only change between 0 and 1 because they are dummy variables, but this was not shown in my sample.
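To rule out a coding problem with the hand-built interaction, one option is to assert that postXmd matches its definition and compare it with factor-variable notation. This is a minimal sketch on simulated data (hypothetical values, not the poster's transactions); only the variable names come from the thread.

```stata
* Hypothetical simulated data; the numbers are made up for illustration.
clear
set seed 2012
set obs 400
generate byte md   = mod(_n, 2)      // 1 = Montgomery County store
generate byte post = _n > 200        // 1 = after the tax took effect
generate plastic   = rpoisson(2 - 0.2*post - md*post)

* Hand-built interaction, as described in the post.
generate byte postXmd = post*md
assert postXmd == (md == 1 & post == 1)

* Both specifications fit the same model, so the interaction
* coefficients should agree.
regress plastic md post postXmd
scalar b_manual = _b[postXmd]
regress plastic i.md##i.post
display "hand-built: " b_manual "   factor-variable: " _b[1.md#1.post]
```

If the assert passes on the real data and the two coefficients match, the code is doing what was intended and any surprise lies in the data or the design rather than in the syntax.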

------------------------------------------------------------------------------
     plastic | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          md |   .3111623   .0421711     7.38   0.000     .2285022    .3938223
        post |  -.1848998   .0402348    -4.60   0.000    -.2637645   -.1060352
     postXmd |  -1.020558   .0558515   -18.27   0.000    -1.130033    -.911083
       _cons |    1.59197   .0310405    51.29   0.000     1.531127    1.652813
------------------------------------------------------------------------------

These are my results when I run my regression:

regress plastic md post postXmd

I believe my interaction term coefficient is not correct, but I'm unable to decipher why it is wrong. Wondering if it has something to do with my code.
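One way to sanity-check the interaction coefficient is to convert the reported coefficients into the four group means the model implies. The arithmetic below assumes the fitted model is exactly the one shown (no other covariates) and rounds the reported estimates to four decimals; "md = 0" here simply means every store outside Montgomery County.

```latex
\begin{align*}
\text{plastic}_i &= \beta_0 + \beta_1\,\text{md}_i + \beta_2\,\text{post}_i
                  + \beta_3\,(\text{post}_i \times \text{md}_i) + \varepsilon_i \\[4pt]
\text{md}=0,\ \text{post}=0: &\quad \hat\beta_0 = 1.5920 \\
\text{md}=0,\ \text{post}=1: &\quad \hat\beta_0 + \hat\beta_2 = 1.5920 - 0.1849 = 1.4071 \\
\text{md}=1,\ \text{post}=0: &\quad \hat\beta_0 + \hat\beta_1 = 1.5920 + 0.3112 = 1.9032 \\
\text{md}=1,\ \text{post}=1: &\quad \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3
                               = 1.5920 + 0.3112 - 0.1849 - 1.0206 = 0.6977 \\[4pt]
\widehat{\text{DID}} &= (0.6977 - 1.9032) - (1.4071 - 1.5920) = -1.2055 + 0.1849 = -1.0206 = \hat\beta_3
\end{align*}
```

Read this way, the coefficient on postXmd is the change in mean plastic bags per transaction in Montgomery County minus the change elsewhere, so comparing these four numbers with the raw cell means is a direct check on whether the estimate is plausible.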

Last edited by Gary Hammersmite; 15 Dec 2023, 09:17.
Erik Ruzek

Join Date: Oct 2017

Posts: 441
#4

15 Dec 2023, 09:27

Use margins to graph the model-predicted means for plastic.

Code:

regress plastic i.md##i.post
margins md#post
marginsplot

Before you draw any conclusions, please check the residuals from your model. The data you shared for your plastic variable looked strange. It has a limited range and the most likely values were 0 or 1. The residuals are supposed to be normally distributed. If they are not, then you may need to consider a different link function for the model.

Code:

help regress postestimation plots
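The residual check suggested above might look like the sketch below. It runs on simulated count data (hypothetical, not the poster's dataset) and, as one possible "different link function", refits the same specification as a Poisson model with robust standard errors; that choice is an assumption for illustration, not something prescribed in the thread.

```stata
* Hypothetical simulated count outcome; the numbers are made up.
clear
set seed 12345
set obs 400
generate byte md   = mod(_n, 2)      // 1 = Montgomery County store
generate byte post = _n > 200        // 1 = after the tax took effect
generate plastic   = rpoisson(exp(0.5 + 0.2*md - 0.1*post - 0.8*md*post))

* Linear model and its residuals.
regress plastic i.md##i.post
predict double resid, residuals

* Residual-versus-fitted plot and a normal quantile plot of the residuals.
rvfplot
qnorm resid

* Possible alternative for a count outcome: Poisson with a log link.
poisson plastic i.md##i.post, vce(robust)

* Predicted mean counts for each md-by-post cell.
margins md#post
```

With a log link, the interaction coefficient is a ratio-type effect rather than a difference in bag counts, so the margins output is usually the easier thing to compare with the linear model.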
Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

15 Dec 2023, 10:19

Gary:
as an aside to Erik's helpful replies, please note:
1) you can rely on the wonderful capabilities of -fvvarlist- notation to create interactions and categorical variables:

Code:

. regress plastic i.md##i.post
note: 1.md omitted because of collinearity.
note: 0.post omitted because of collinearity.
note: 1.md#0.post omitted because of collinearity.

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(0, 99)        =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |      616.75        99  6.22979798   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |      616.75        99  6.22979798   Root MSE        =     2.496

------------------------------------------------------------------------------
     plastic | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        1.md |          0  (omitted)
      0.post |          0  (omitted)
             |
     md#post |
        1 0  |          0  (omitted)
             |
       _cons |       2.25   .2495956     9.01   0.000     1.754748    2.745252
------------------------------------------------------------------------------

2) it is difficult to believe that, with only two interacted predictors, your regression is correctly specified (see -linktest-);
3) if you're using Stata 17 or 18, you can rely on -didregress- for DID. (As per the FAQ, if you're not using the latest version of Stata, that is 18, you should highlight that in your posts. Thanks.)
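For reference, a -didregress- call for this kind of design could be sketched as follows. The data are simulated (hypothetical), and the county identifier and its coding are assumptions introduced for the example; in the poster's data a similar variable could presumably be built from the dc, va, and md dummies.

```stata
* Hypothetical simulated data using the thread's variable names.
clear
set seed 2012
set obs 600
generate byte county  = 1 + mod(_n, 3)   // assumed coding: 1 = DC, 2 = Arlington, 3 = Montgomery
generate byte md      = county == 3
generate byte post    = _n > 300
generate byte postXmd = post*md          // 1 only for treated stores after the tax
generate plastic      = rpoisson(2 - 0.2*post - md*post)

* Outcome in the first parentheses, treatment indicator in the second;
* group() and time() supply the group and period fixed effects.
didregress (plastic) (postXmd), group(county) time(post)
```

By default -didregress- clusters standard errors on the group variable, and with only three jurisdictions those standard errors should be treated with a good deal of caution.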

Kind regards,
Carlo
(Stata 19.0)
