Can we use weights or do svy for didregress?

Lars Pete

Join Date: Nov 2020

Posts: 118
#1

21 Nov 2023, 23:00

Dear all,

I have svyset. I am trying to run:

Code:

didregress (wjnilf i.racecat i.sex age c.age#c.age i.educat fullpart incwage) (treat14), group(statefips) time(year) aggregate(dlang)

wjnilf is binary. I am also estimating a probit and a glm logit, but I want to run didregress as well. Unlike probit and logit, didregress does not work with the svy prefix. How can I use weights in DID? Thanks.
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#2

22 Nov 2023, 01:54

Code:

didregress (wjnilf i.racecat i.sex age c.age#c.age i.educat fullpart incwage) (treat14) [pweight=weightvar], group(statefips) time(year) aggregate(dlang)

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
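A minimal sketch of the weighted call on simulated data, since the CPS extract from #1 is not available here. All variable names (state, year, treat, w, y) and the data-generating process are assumptions made for illustration, and didregress requires Stata 17 or later. The second command writes out the linear model with group and time fixed effects by hand, so its coefficient on treat is expected to line up with the reported ATET.

```stata
* Simulated repeated cross-section; every name below is hypothetical.
clear
set seed 12345
set obs 4000
generate int state = runiformint(1, 10)
generate int year = runiformint(2011, 2019)
generate byte treat = (state <= 4) & (year >= 2014)   // treated state x post-2014
generate double w = 0.5 + runiform()                  // stand-in for a sampling weight
generate byte y = runiform() < 0.30 + 0.05*treat + 0.01*state

* The weight expression goes before the comma, as in other estimation commands.
didregress (y) (treat) [pweight = w], group(state) time(year)

* Same model written out by hand: state and year dummies, SEs clustered by state.
regress y treat i.state i.year [pweight = w], vce(cluster state)
```

Comparing the two outputs side by side is a quick way to check that the weights were picked up as intended.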
Lars Pete

Join Date: Nov 2020

Posts: 118
#3

23 Nov 2023, 10:37

Thanks a lot, Maarten Buis. So, since we can't use svy, we can't use the replicate weights.
A follow-up question: is this model all right if wjnilf is a binary variable?
Here treat14 is treated#post14. What would be the difference between this model and:

Code:

reg wjnilf treat14 i.racecat i.sex age c.age#c.age i.educat fullpart incwage i.statefips i.year [pw=weightvar]

OR

Code:

probit wjnilf treat14 i.racecat i.sex age c.age#c.age i.educat fullpart incwage i.statefips i.year [pw=weightvar]

I get different results. Which would be better?
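On the replicate-weights point: svy cannot prefix didregress, but it can prefix regress, so one possible workaround is to write the DID as a regression with state and year dummies and run that under svy. The sketch below uses simulated data and a design with only a sampling weight; all names are hypothetical. With real replicate weights, the svyset line would instead carry whatever option the survey documentation prescribes, for example jkrweight() or sdrweight() together with the matching vce().

```stata
* Simulated data; names and design are assumptions for illustration.
clear
set seed 2023
set obs 4000
generate int state = runiformint(1, 10)
generate int year = runiformint(2011, 2019)
generate byte treat = (state <= 4) & (year >= 2014)
generate double w = 0.5 + runiform()
generate byte y = runiform() < 0.30 + 0.05*treat + 0.01*state

* Declare the design once; here only a sampling weight.
svyset [pweight = w]

* DID as a linear model with state and year dummies, using the declared design.
svy: regress y treat i.state i.year
```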

Last edited by Lars Pete; 23 Nov 2023, 10:41. Reason: Added probit with LPM
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#4

24 Nov 2023, 07:03

Lars: What do you mean by "different results"? Do you mean that, after you compute the average partial effect in the probit, it's fairly different from the coefficient in the linear regression?
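The comparison being asked about can be sketched as follows on simulated data (all names are hypothetical). The probit coefficient is on the latent index scale, so it is the average partial effect from margins, not the raw coefficient, that is comparable to the linear probability model coefficient. Entering the treatment as i.treat is a choice made here so that margins reports a discrete 0-to-1 change rather than a derivative.

```stata
* Simulated data; every name below is hypothetical.
clear
set seed 42
set obs 4000
generate int state = runiformint(1, 10)
generate int year = runiformint(2011, 2019)
generate byte treat = (state <= 4) & (year >= 2014)
generate double w = 0.5 + runiform()
generate byte y = runiform() < 0.30 + 0.05*treat + 0.01*state

* Linear probability model: the coefficient on 1.treat is a probability difference.
regress y i.treat i.state i.year [pweight = w], vce(cluster state)

* Probit: the coefficient on 1.treat is on the index scale ...
probit y i.treat i.state i.year [pweight = w], vce(cluster state)

* ... so average the implied change in Pr(y = 1) over the sample before comparing.
margins, dydx(treat)
```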

Lars Pete

Join Date: Nov 2020

Posts: 118
#5

24 Nov 2023, 16:09

Dear Jeff Wooldridge: Thank you for your comment.

#A. Actually, the results differed when I was using i.statefips#i.year. Without the interaction of the FEs, they stay the same. So can I conclude that all 3 models give the same results when state and year FEs enter separately? The treatment is a state-level policy in 2014. I will refer to these (health i.racecat i.sex age c.age#c.age i.educat fullpart ln_incwage ln_incnonwage disabwrk) as `controls'. treat14 = treated_state * post14.

Code:

reg wjnilf treat14 `controls'  i.statefips i.year [pw=asecwt] //(coeff =  .0102327   P value = 0.404 )
probit wjnilf treat14  `controls' i.statefips i.year [pw=asecwt]
margins, dydx(treat14) // (dy/dx =   .0106622  P value 0.371 )
didregress (wjnilf `controls')(treat14) [pw=asecwt], group(statefips) time(year) // (coeff .01 Pvalue .318)

With interactions, however, they differ. (I am confused about whether I should use an interaction at all and, if so, whether to use #, which gives just i.state#i.year, or ##, which gives i.state i.year i.state#i.year. I am also not able to use the interaction in didregress.)

Code:

reg wjnilf treat14 `controls' i.statefips#i.year [pw=asecwt] // (coeff = -.003 P value = 0.962)
probit wjnilf treat14 `controls' i.statefips#i.year [pw=asecwt]
margins, dydx(treat14) // (coeff = -.01 Pvalue = .898)
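On # versus ##: a##b is shorthand for a b a#b, and with a full set of state-by-year cells both forms span the same dummies. A hedged sketch on simulated data (hypothetical names) shows the practical consequence. Because treat is constant within each state-year cell, it is a linear combination of the cell dummies, so adding it does not change the rank of the design matrix, and any coefficient reported for it reflects which term Stata omitted for collinearity rather than a DID contrast.

```stata
* Simulated data; every name below is hypothetical.
clear
set seed 7
set obs 4000
generate int state = runiformint(1, 10)
generate int year = runiformint(2011, 2019)
generate byte treat = (state <= 4) & (year >= 2014)
generate byte y = runiform() < 0.30 + 0.05*treat + 0.01*state

* Cell dummies only, no treatment indicator.
regress y i.state#i.year
scalar rank_cells = e(rank)

* Adding treat: a term is flagged as omitted and the rank is unchanged.
regress y treat i.state#i.year
scalar rank_hash = e(rank)

* ## adds the main effects, which the cell dummies already span.
regress y treat i.state##i.year
scalar rank_hash2 = e(rank)

display scalar(rank_cells) " " scalar(rank_hash) " " scalar(rank_hash2)
```

The same reasoning applies when individual-level controls are added, since treat still does not vary within a state-year cell.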

#B. It's pooled cross-sectional data, i.e. the CPS March supplement, so I cannot xtset. However, if I do this:

Code:

isid cpsidp year   // variables cpsidp and year do not uniquely identify the observations
bysort cpsidp year: assert _N == 1   // 9 contradictions in 791,864 by-groups; assertion is false
duplicates report cpsidp year
* --------------------------------------
*    Copies | Observations       Surplus
* ----------+---------------------------
*         1 |       791855             0
*     39984 |        39984         39983
*     42299 |        42299         42298
*     42614 |        42614         42613
*     42814 |        42814         42813
*     43905 |        43905         43904
*     43963 |        43963         43962
*     44077 |        44077         44076
*     45435 |        45435         45434
*     45441 |        45441         45440
* --------------------------------------

duplicates tag cpsidp year, gen(isdup)
edit if isdup
drop isdup

xtset cpsidp year   // repeated time values within panel r(451);


duplicates drop cpsidp year, force
duplicates tag cpsidp year, gen(isdup)
drop if isdup == 1
xtset cpsidp year   // Panel variable: cpsidp (unbalanced), Time variable: year, 2011 to 2019, Delta: 1 unit
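In the code above, duplicates drop with force keeps the first record of each (cpsidp, year) group, so the duplicates tag that follows finds nothing left to tag and drop if isdup == 1 removes no observations. One arbitrary record from each of the nine large groups therefore stays in the panel. If the intent is to discard every record whose identifier is not unique within a year, a sketch along these lines may be closer. The toy data and the idea that 0 acts as a placeholder identifier are assumptions for illustration; what the nine groups actually are should be checked against the data documentation.

```stata
* Toy data: two real persons plus several records sharing a placeholder id.
clear
input long cpsidp int year
101 2011
101 2012
102 2011
0   2011
0   2011
0   2012
0   2012
0   2012
end

* Tag first, so every member of a duplicated group is marked (ndup = copies - 1).
duplicates tag cpsidp year, generate(ndup)

* Drop all records from non-unique groups, not just the surplus copies.
drop if ndup > 0
drop ndup

* Now the pair identifies observations and xtset goes through.
isid cpsidp year
xtset cpsidp year
```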

I could get around it. Then the results are:

Code:

*xtreg wjnilf treat14 `controls', fe // (coeff =  -.1826311 P value =0.023 )
*eststo fix1
*xtreg wjnilf treat14 `controls', re // (coeff = .0229734, P value =  0.001)
*eststo ran1
hausman fix1 ran1 // H0: difference in coefficients not systematic. chi2(16) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 21.49, Prob > chi2 = 0.1605


xtprobit wjnilf treat14 `controls', fe // fixed-effects model not available r(198);
xtprobit wjnilf treat14 `controls', re // (coeff = .075845, P value = 0.215 )  
margins, dydx(treat14)   // (coeff = .1173941 , P value = 0.002)  
xtdidregress (wjnilf `controls')(treat14), group(statefips) time(year) // (coeff -.1988181  Pvalue .002)

Note: Here probit RE and reg RE differ.
Note: I can use probit with RE, which is not possible without xtset in #A, BUT none of the models in #B allows me to use weights. The errors are "pweight not allowed in random-effects case r(101);" and, for xtreg, "weight must be constant within cpsidp r(199);".

#C. Another approach, taken in a QJE labor paper doing DID with the same data, is to collapse to the state level (I suppose they took population-weighted averages) and then run the DID, where wjnilf becomes the weighted average proportion by state and year.
The same paper then used DDD to get the ATET for a subsample. There they did (without collapsing or taking state-level weighted averages) something like:

Code:

 reg wjnilf treat14*1(subsample indicator) `controls' i.cpsidp#i.state i.cpsidp#i.year i.state#i.year [pw=asecwt]

I believe this can't be done in probit because of the incidental parameter problem?
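The state-level approach in #C could be sketched roughly as below, on simulated data with hypothetical names. Using the summed weights as second-stage analytic weights is one common choice and an assumption here, not necessarily what the cited paper did.

```stata
* Simulated individual-level data; every name below is hypothetical.
clear
set seed 99
set obs 4000
generate int state = runiformint(1, 10)
generate int year = runiformint(2011, 2019)
generate byte treat = (state <= 4) & (year >= 2014)
generate double w = 0.5 + runiform()
generate byte y = runiform() < 0.30 + 0.05*treat + 0.01*state

* One row per state-year: weighted mean of the outcome, the treatment flag,
* and the cell's total weight (rawsum ignores the aweight when summing w).
collapse (mean) y (max) treat (rawsum) pop = w [aweight = w], by(state year)

* DID on the cell means, weighting cells by their estimated population.
regress y treat i.state i.year [aweight = pop], vce(cluster state)
```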

It is hard to decide which one to go with: #A, #B or #C? In #A, probit and LPM give the same results without i.state#i.year but different results with it. In #B, probit and LPM give different results, but the major problem is that I can't use weights. This is just one dependent variable among many, so I have to stick with a minimalist approach. I will also need to use DDD for subsample analysis. I'd really appreciate any feedback. Thank you.

Last edited by Lars Pete; 24 Nov 2023, 16:44. Reason: Added weights in #C and a note in #B


Lars Pete

Join Date: Nov 2020

Posts: 118
#6

26 Nov 2023, 21:14

Jeff Wooldridge what do you think?