Wishlist for Stata 18

Sebastian Kripfganz replied

05 Jan 2022, 09:38
Originally posted by Jeff Wooldridge

I think it's really important to update the differencing operator to make it easy for researchers to estimate panel data models by first differencing. Not allowing factor notation with D.() and replacing the difference of the interaction with the interaction of the differences are both shortcomings that are easy to fix. I think the current behavior contributes to the confusion between a model and an estimating equation. In panel data applications especially, differencing is used to eliminate heterogeneity in the levels equation. That is, FD is an alternative to FE, and so any model that can be estimated using xtreg, fe should be estimable by differencing the entire equation. It's cumbersome to have to create interactions "by hand," and doing so means that one cannot use the margins options. Not allowing something like i.year is also inconvenient. This is fundamental stuff, and it should be allowed for both OLS and instrumental variables commands.

I agree that the way Stata deals with situations such as D.(c.x1#c.x2), which is expanded to cD.x1#cD.x2, is unfortunate. As I said in this other thread, I doubt that StataCorp will do anything about it.

A particular problem arises in this context in combination with macros: even though it feels natural to do so, you should never code D.`var' or D.(`var') if `var' might contain interaction terms such as the one above. Avoiding unintended consequences often requires replacing the variable list in `var' with temporary variables.
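A minimal sketch of that workaround follows. It assumes the Grunfeld example data (webuse grunfeld, which needs an internet connection), with mvalue and kstock standing in for x1 and x2; the variable names mv_ks, d_of_prod, and prod_of_d are made up for illustration. The interaction is built in levels first, so that D. differences the product itself, and the macro holds only plain variable names. Run it as one block so that the local macro and temporary variable stay in scope.

```stata
* Sketch only: Grunfeld data; mvalue and kstock stand in for x1 and x2
webuse grunfeld, clear
xtset company year

* Interaction built in levels, so that D. differences the product itself
generate double mv_ks = mvalue*kstock

* Difference of the product versus product of the differences
generate double d_of_prod = D.mv_ks
generate double prod_of_d = D.mvalue*D.kstock
summarize d_of_prod prod_of_d

* First-difference regression with the intended regressor D.(x1*x2)
regress D.(invest mvalue kstock mv_ks), vce(cluster company)

* Same idea when the regressor list travels in a local macro:
* store only plain variable names (here a temporary variable),
* never an interaction such as c.x1#c.x2
tempvar inter
generate double `inter' = mvalue*kstock
local var mvalue kstock `inter'
regress D.invest D.(`var'), vce(cluster company)
```

The price of this approach is the one noted in the quoted post: because the interaction is an ordinary variable, margins no longer knows that it is a function of mvalue and kstock.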

Last edited by Sebastian Kripfganz; 05 Jan 2022, 09:44.
Jeff Wooldridge replied

05 Jan 2022, 09:31
I think it's really important to update the differencing operator to make it easy for researchers to estimate panel data models by first differencing. Not allowing factor notation with D.() and replacing the difference of the interaction with the interaction of the differences are both shortcomings that are easy to fix. I think the current behavior contributes to the confusion between a model and an estimating equation. In panel data applications especially, differencing is used to eliminate heterogeneity in the levels equation. That is, FD is an alternative to FE, and so any model that can be estimated using xtreg, fe should be estimable by differencing the entire equation. It's cumbersome to have to create interactions "by hand," and doing so means that one cannot use the margins options. Not allowing something like i.year is also inconvenient. This is fundamental stuff, and it should be allowed for both OLS and instrumental variables commands.
Anna Volkert replied

03 Jan 2022, 13:48
Multilevel Zero-One Inflated Beta Regression Model
Jared Greathouse replied

30 Dec 2021, 19:18
I know I usually say stuff like this, but I think it has to be said for those interested in treatment effects: another interesting algorithm we might find useful would be the Bayesian Structural Time Series approach. It's a method that's related to synthetic controls, but of course differs radically in the underlying theoretical framework. It's already been applied in a few interesting contexts, and I'd hate to see R beat Stata further (as much as it currently does, anyhow) in the treatment effects department.

I know StataCorp did quite a lot with teffects in 16 and 17, but I essentially think we should want to see official StataCorp implementations of synthetic controls, RD, and other treatment effect estimators, just as we have with difference-in-differences in (I think?) Stata 17.
William Lisowski replied

27 Dec 2021, 10:57
Euslaner (post #223) -

This is off-topic here, which is the Wishlist for Stata 18.

Please post your question to a new topic, preferably using code delimiters [CODE] and [/CODE] to render the example output more readable.
Euslaner replied

27 Dec 2021, 10:51
This should be simple, but I can't get it to work. I have a data set attached in which I want to create a variable for US states. The state variables in the World Values Survey are X048ISO and X048WVS. I created a variable called region from X048WVS. I then wanted to create a state-level variable when X048WVS is not equal to a region value. So I typed and received:

gen str24 state = "."

. des state

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------------------------------
state           str24   %24s

. replace state = X048WVS if region1 == "."
type mismatch
r(109);

where region ==

. fre X048WVS if S003 == 840

X048WVS -- Region where the interview was conducted (WVS)
---------------------------------------------------------------------------------
                                           |  Freq.   Percent     Valid      Cum.
-------------------------------------------+-------------------------------------
Valid   -2     No answer                   |     55      0.62      0.62      0.62
        840001 US: New England             |    364      4.13      4.13      4.75
        840002 US: Middle Atlantic States  |    962     10.91     10.91     15.66
        840003 US: South Atlantic          |    971     11.01     11.01     26.67
        840004 US: East South Central      |    462      5.24      5.24     31.91
        840005 US: West South Central      |    670      7.60      7.60     39.51
        840006 US: East North Central      |    996     11.29     11.29     50.80
        840007 US: West North Central      |    418      4.74      4.74     55.54
        840008 US: Rocky Mountain state    |    369      4.18      4.18     59.72
        840009 US: Northwest               |    156      1.77      1.77     61.49
        840010 US: California              |    445      5.05      5.05     66.54
        840011 US: Alaska                  |      3      0.03      0.03     66.57
        840012 US: Hawai                   |      4      0.05      0.05     66.62
        840013 US: Pacific                 |    348      3.95      3.95     70.56
        840201 US: AL Alabama              |     28      0.32      0.32     70.88
        840202 US: AR Arkansas             |     11      0.12      0.12     71.01
        840203 US: AZ Arizona              |     73      0.83      0.83     71.83
        840204 US: CA California           |    282      3.20      3.20     75.03
        840205 US: CO Colorado             |     73      0.83      0.83     75.86
        840206 US: CT Connecticut          |     26      0.29      0.29     76.15
        :                                  |      :         :         :         :
        840231 US: NJ New Jersey           |     56      0.63      0.63     89.36
        840232 US: NM New Mexico           |     27      0.31      0.31     89.67
        840233 US: NV Nevada               |     22      0.25      0.25     89.92
        840234 US: NY New York             |    118      1.34      1.34     91.26
        840235 US: OH Ohio                 |    101      1.15      1.15     92.40
        840236 US: OK Oklahoma             |     28      0.32      0.32     92.72
        840237 US: OR Oregon               |     33      0.37      0.37     93.09
        840238 US: PA Pennsylvania         |     83      0.94      0.94     94.04
        840239 US: RI Rhode Island         |      8      0.09      0.09     94.13
        840240 US: SC South Carolina       |     12      0.14      0.14     94.26
        840241 US: SD South Dakota         |     24      0.27      0.27     94.53
        840242 US: TN Tennessee            |     55      0.62      0.62     95.16
        840243 US: TX Texas                |    177      2.01      2.01     97.17
        840244 US: UT Utah                 |     22      0.25      0.25     97.41
        840245 US: VA Virginia             |     68      0.77      0.77     98.19
        840246 US: VT Vermont              |      6      0.07      0.07     98.25
        840247 US: WA Washington           |     65      0.74      0.74     98.99
        840248 US: WI Wisconsin            |     68      0.77      0.77     99.76
        840249 US: WV West Virginia        |     19      0.22      0.22     99.98
        840250 US: WY Wyoming              |      2      0.02      0.02    100.00
        Total                              |   8819    100.00    100.00
---------------------------------------------------------------------------------

and I wanted to attach values to state from X048WVS when X048WVS is a state, not a region. But my command didn't work (see above). I tried to upload the data set (very small), but it wouldn't load. But this should be obvious to people who can do this better than I can.

Thanks,

Ric Uslaner
Joao Santos Silva replied

27 Dec 2021, 06:41
As discussed here, margins should not be available after nonlinear models with fixed effects are estimated. The explanation for that is simple: any interesting quantity that we may want to compute will depend on the value of the fixed effects, which are not estimated by these commands. Therefore, margins computes something that is meaningless most of the time. This could be done in a future update, but at least it would be good to have this looked into in the next version.
Ali Atia replied

25 Dec 2021, 14:24
Currently, the shell command is ignored when running Stata in batch mode in Windows (leaving the notice "request ignored because of batch mode" in the log-file). It would be nice if this were rectified in a future update.

Bruce Weaver replied

24 Dec 2021, 07:16

In 2019, Goncalo Cotovio asked if there is an immediate form of the paired t-test. The answer then, as now, is that there is not an immediate form of the paired t-test. But why not? The paired t-test can easily be computed from the following summary data:

#obs = number of paired scores
#r = correlation between the paired scores
#mean1 = mean of 1st sample
#sd1 = SD of 1st sample
#mean2 = mean of 2nd sample
#sd2 = SD of 2nd sample

This is exactly the same number of arguments needed for the immediate form of the unpaired t-test:

Code:

    Immediate form of two-sample t test

        ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]

For an immediate form of the paired t-test, the syntax might be something like this:

Code:

    Immediate form of two-sample paired t test

        ttesti #obs #r #mean1 #sd1 #mean2 #sd2, paired [options]

The following code shows the needed computations and compares the results to those from -ttest- using the raw data.

Cheers,
Bruce

Code:

clear
webuse fuel
summarize  
* Write the needed summary measures to the dataset
quietly summarize mpg1
generate byte n = r(N) in 1
generate m1 = r(mean) in 1
generate sd1 = r(sd) in 1
quietly summarize mpg2
generate m2 = r(mean) in 1
generate sd2 = r(sd) in 1
quietly pwcorr mpg1 mpg2
generate r = r(rho) in 1
* Now compute the paired t-test from the summary data
generate mdiff = m1-m2
generate sddiff = sqrt(sd1^2+sd2^2-2*r*sd1*sd2)
generate sediff = sddiff/sqrt(n)
generate tobs = mdiff/sediff
generate byte df = n-1
generate pval = ttail(df,abs(tobs))*2
list mdiff-pval in 1
* Compare results to those from -ttest-
ttest mpg1==mpg2
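
The same computations could be wrapped in a small user-written program, sketched below. The name pttesti and its argument order (#obs #r #mean1 #sd1 #mean2 #sd2) are hypothetical; this is not an official Stata command, only an illustration of what an immediate paired t-test would need to do. The returned names r(t), r(df_t), and r(p) were chosen to mirror those stored by -ttest-.

```stata
* Hypothetical helper: pttesti is NOT an official Stata command
capture program drop pttesti
program define pttesti, rclass
    version 16
    args n r m1 sd1 m2 sd2
    tempname sddiff se t df p
    * SD of the paired differences from the two SDs and their correlation
    scalar `sddiff' = sqrt((`sd1')^2 + (`sd2')^2 - 2*(`r')*(`sd1')*(`sd2'))
    scalar `se' = `sddiff'/sqrt(`n')
    scalar `t'  = ((`m1') - (`m2'))/`se'
    scalar `df' = `n' - 1
    scalar `p'  = 2*ttail(`df', abs(`t'))
    display as text "Paired t test: t(" `df' ") = " as result %9.4f `t' ///
        as text ", two-sided p = " as result %6.4f `p'
    return scalar se   = `se'
    return scalar t    = `t'
    return scalar df_t = `df'
    return scalar p    = `p'
end

* Usage: feed in summary statistics from the fuel data, then compare
webuse fuel, clear
quietly correlate mpg1 mpg2
local r = r(rho)
quietly summarize mpg1
local n   = r(N)
local m1  = r(mean)
local sd1 = r(sd)
quietly summarize mpg2
pttesti `n' `r' `m1' `sd1' `r(mean)' `r(sd)'
ttest mpg1 == mpg2
```

Run the usage lines as one block so that the local macros stay in scope. The t statistic and two-sided p-value displayed by pttesti should agree with the -ttest- output up to rounding.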
