Wishlist for Stata 18

Anna Volkert

Join Date: Jan 2022

Posts: 1
#226

03 Jan 2022, 13:48

Multilevel Zero-One Inflated Beta Regression Model
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#227

05 Jan 2022, 09:31

I think it's really important to update the differencing operator to make it easy for researchers to estimate panel data models by first differencing. Not allowing factor-variable notation with D.(), and replacing the difference of an interaction with the interaction of the differences, are both shortcomings that are easy to fix. I think they contribute to the confusion about what is a model and what is an estimating equation.

In panel data applications especially, differencing is used to eliminate heterogeneity in the levels equation. That is, FD is an alternative to FE, and so any model that can be estimated using xtreg, fe should be estimable by differencing the entire equation. It's cumbersome to have to create interactions "by hand", and doing so means that one cannot use the margins command. Not allowing something like i.year is also inconvenient. This is fundamental stuff, and it should be allowed for both the OLS and the instrumental variables commands.
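A minimal sketch of the "by hand" workaround described above, using Stata's example Grunfeld data (`webuse grunfeld`). The dataset and the choice of mvalue × kstock as the interaction are illustrative assumptions only. In levels, FE accepts factor-variable notation; for FD, the product has to be generated first so that D. differences the product itself rather than multiplying the differences.

```stata
* Illustrative data only: Grunfeld investment panel from Stata's example datasets
webuse grunfeld, clear
xtset company year

* FE in levels: factor-variable notation is accepted, so margins knows about the interaction
xtreg invest mvalue kstock c.mvalue#c.kstock i.year, fe vce(cluster company)

* FD: generate the product "by hand" so that D. differences the product itself
generate double mk = mvalue*kstock

* i.year enters in levels; the first year drops out because D.invest is missing there
regress D.invest D.mvalue D.kstock D.mk i.year, vce(cluster company)
```

Because mk is just another variable to Stata, margins after the FD regression cannot know that it is built from mvalue and kstock, which is the limitation pointed out above.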
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#228

05 Jan 2022, 09:38

Replying to Jeff Wooldridge (#227):

I agree that the way Stata deals with situations such as D.(c.x1#c.x2), which is expanded to cD.x1#cD.x2, is unfortunate. As I said in another thread, I doubt that StataCorp will do anything about it.
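The algebra behind the complaint, written out for two regressors $x_1$ and $x_2$ (notation ours): the first difference of a product is not the product of the first differences.

```latex
\Delta(x_{1t}x_{2t}) = x_{1t}x_{2t} - x_{1,t-1}x_{2,t-1}
  = x_{2,t-1}\,\Delta x_{1t} + x_{1,t-1}\,\Delta x_{2t} + \Delta x_{1t}\,\Delta x_{2t}
```

The expansion cD.x1#cD.x2 keeps only the last term. For example, with $x_1$ going from 1 to 2 and $x_2$ from 3 to 5, $\Delta(x_1x_2) = 10 - 3 = 7$, whereas $\Delta x_1\,\Delta x_2 = 1 \cdot 2 = 2$.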

A particular problem arises in this context in combination with local macros: even though it feels natural to do so, you should never code D.`var' or D.(`var') if `var' might contain interaction terms such as the above. This often requires replacing the variable list in `var' with temporary variables to avoid unintended consequences.
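A minimal sketch of the temporary-variable workaround, again assuming the Grunfeld example data; the macro names rhs and rhs_fd are hypothetical. The interaction is materialized as a tempvar before the list is passed to D.().

```stata
webuse grunfeld, clear
xtset company year

* A covariate list as it might be passed around in a local macro
local rhs "mvalue kstock c.mvalue#c.kstock"
* Avoid D.(`rhs'): the interaction would be expanded to cD.mvalue#cD.kstock

* Replace the interaction with a temporary variable holding the product
tempvar mk
generate double `mk' = mvalue*kstock
local rhs_fd "mvalue kstock `mk'"

* D.() now differences the product itself
regress D.invest D.(`rhs_fd')
```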

Last edited by Sebastian Kripfganz; 05 Jan 2022, 09:44.

https://www.kripfganz.de/stata/
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#229

06 Jan 2022, 05:11

I would like to suggest a few changes to the way the RESET test (estat ovtest) is implemented:

1 - The most important one is that the test should be based on the same type of covariance matrix used in the estimation of the main model. It does not make sense to fit a model with some form of robust standard errors and then perform the RESET with plain-vanilla standard errors.

2 - This is more a question of taste, but personally I would prefer that the number of powers included by default be reduced to just one or two (or that we have an option to choose the number of powers to include).

3 - Finally, it would be great if the misleading name of the command could be changed. I know that this may be asking too much, but perhaps we could at least have estat reset as a synonym for estat ovtest.
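A do-it-yourself sketch of what points 1 and 2 ask for, assuming (as the manual describes for estat ovtest without the rhs option) that RESET adds powers of the fitted values and tests them jointly. The auto data, the regressors mpg and weight, and the local name maxpow are arbitrary illustrative choices. The auxiliary regression reuses the main model's vce, and the number of powers is set by the user.

```stata
sysuse auto, clear

* Main model with robust standard errors
regress price mpg weight, vce(robust)
predict double yhat, xb

* Standardize the fitted values to keep the powers well scaled; since yhat is
* linear in the regressors, powers of z span the same extra space as powers of yhat
quietly summarize yhat
generate double z = (yhat - r(mean))/r(sd)

* Number of powers to add (2 = squares only, 3 = squares and cubes, ...)
local maxpow 3
forvalues p = 2/`maxpow' {
    generate double z`p' = z^`p'
}

* Auxiliary regression with the SAME vce as the main model, then a joint Wald test
regress price mpg weight z2-z`maxpow', vce(robust)
testparm z2-z`maxpow'
```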
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#230

06 Jan 2022, 13:47

Replying to Joao Santos Silva (#229):

I agree with Joao about RESET. In fact, I had a Twitter thread on this back in March: https://twitter.com/jmwooldridge/sta...12169036201985
George Ford

Join Date: Aug 2014

Posts: 3153
#231

06 Jan 2022, 14:05

I would like the missing-values tables to store all of their results in a table, not just the results for the last variable (as happens with mdesc).
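Until something like this exists, a minimal workaround sketch collects the counts for several variables in one matrix instead of relying on r() results that describe only the last variable processed. The auto data, the variable list, and the matrix name M are illustrative assumptions.

```stata
sysuse auto, clear

local vars "make price rep78 foreign"
local k : word count `vars'

* One row per variable: number and percentage of missing values
matrix M = J(`k', 2, .)
matrix rownames M = `vars'
matrix colnames M = n_missing pct_missing

local i = 0
foreach v of local vars {
    local ++i
    quietly count if missing(`v')
    matrix M[`i', 1] = r(N)
    matrix M[`i', 2] = 100*r(N)/_N
}
matrix list M
```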
William Lisowski

Join Date: Dec 2014

Posts: 10150
#232

07 Jan 2022, 08:07

I expect that an earlier post in this topic has repeated the continual request that Stata address the problems created by the merge m:m command. In answering a question today I happened to review the help merge documentation and saw that it includes no reference to the problems with merge m:m.

If the underlying issues cannot be addressed directly, I suggest that the output of help merge be expanded to include a warning derived from the warning that appears in the PDF documentation, since it's often an uphill battle to get new users to read more than the help output, if even that.

At the same time, there's a particular problem in both the PDF and the help output: the introduction includes

merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many), which are often called 'joins' by database people.

No database person anywhere ever used the SQL join command to accomplish what is produced by Stata's "many-to-many" merge command. But a database person might interpret the quoted statement as equating the Stata "many-to-many" merge with the SQL m-by-m join (I did when I was new to Stata). The m-by-m join is the equivalent not of Stata's merge m:m command but of Stata's joinby command, which is mentioned nowhere in the output of help merge.

If this pothole can't be fixed, at least make a better effort to steer new users around it.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#233

07 Jan 2022, 08:47

I completely agree with the spirit of William's post #232. Users should be steered away from -merge m:m- since what they wish to accomplish is better handled by -joinby-.

To put it in SQL terms, -merge m:m- is precisely an SQL full join on a common, coalesced ID. That is, observations with unmatched identifiers in either dataset are retained by default with -merge m:m-. In contrast, -joinby- by default drops those unmatched observations, which is usually what is desired; this behaviour can be changed by adding the -unmatched(both)- option to -joinby-. That said, if you know you don't have unmatched identifiers, -merge m:m- wouldn't necessarily be wrong, but it would be comparatively inefficient, which is easily avoided.

Toy example

Code:

* temporary files for the two toy datasets
tempfile a b

clear
input byte(a b)
1 4
1 6
2 9
3 3
5 .
end
sort a
save `a', replace
list

drop _all
input byte(a c)
1 2
2 8
2 3
3 5
3 6
4 .
end
sort a
save `b', replace
list

* many-to-many merge
use `a', clear
merge m:m a using `b'
sort a b c
list

* joinby, keeping unmatched observations from both datasets
use `a', clear
joinby a using `b', unmatched(both)
sort a b c
list a b c _merge

Equivalent join using SQL (with a SAS accent)

Code:

select coalesce(a.a, b.a) as a, a.b, b.c
from one as a
full join two as b
  on a.a = b.a
order by a, b, c;
William Lisowski

Join Date: Dec 2014

Posts: 10150
#234

07 Jan 2022, 10:07

@Leonardo Guizzetti #233 -

I will start by confessing that the last SQL I wrote was in 2014, so my memories of SQL have passed their half-life several times over by now.

You write

-merge m:m- is precisely an SQL full join on a common, coalesced ID

but that does not agree with my understanding, which is why I wrote the snarky comment beginning "No database person ..."

From the PDF documentation for merge we read

In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group.

I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced I*J rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.
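The counting argument can be stated compactly. Writing $I_k$ and $J_k$ for the number of rows with key value $k$ in the left (master) and right (using) tables (notation ours), and considering keys present in both:

```latex
\text{rows}_{\text{SQL join / joinby}} = \sum_k I_k J_k,
\qquad
\text{rows}_{\texttt{merge m:m}} = \sum_k \max(I_k, J_k).
```

Since $I_k J_k = \max(I_k, J_k)$ exactly when $\min(I_k, J_k) = 1$, the two agree only for keys with a one-to-one or one-to-many relationship. In the example that follows, $I = 2$ and $J = 3$ give 6 rows from joinby but only 3 from merge m:m.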

Starting with your example, if we replace datasets `a' and `b' with

Code:

* dataset `a'
input byte(a b)
1 1
1 2
end

* dataset `b'
input byte(a c)
1 3
1 4
1 5
end

we achieve the following result from merge m:m

Code:

. list, clean

       a   b   c        _merge  
  1.   1   1   3   Matched (3)  
  2.   1   2   4   Matched (3)  
  3.   1   2   5   Matched (3)  

and from joinby

Code:

. list a b c _merge, clean

       a   b   c                          _merge  
  1.   1   1   3   both in master and using data  
  2.   1   1   4   both in master and using data  
  3.   1   1   5   both in master and using data  
  4.   1   2   3   both in master and using data  
  5.   1   2   4   both in master and using data  
  6.   1   2   5   both in master and using data  

Last edited by William Lisowski; 07 Jan 2022, 10:56. Reason: Corrected J*K to I*J
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#235

07 Jan 2022, 10:26

Originally posted by William Lisowski (#234):

I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced I*J rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.

Ah, well spotted; this clearly demonstrates the deeper issue with -merge m:m-. I too would have expected the result to be I*J in size, so I evidently picked a toy problem that didn't properly test the two programs. My previous post can be disregarded, and thank you for the clear elucidation of the issue.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#236

08 Jan 2022, 09:59

A final follow-up to #235: for those familiar with SAS, the -merge m:m- behaviour is akin to a DATA step match-merge. It can only be made to coincide with a true join (an SQL join or -joinby-) when, within values of the id variables, the relationship is one-to-many or one-to-one (in either direction).
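A small check of that claim, using hypothetical toy data in which the key a is unique in the master, so the relationship is one-to-many within a. It runs merge m:m (for illustration only) and joinby on the same data and compares the results with cf, which stops with an error if any values differ.

```stata
* Master: key a is unique
clear
input byte(a b)
1 4
2 9
end
tempfile mfile ufile viamerge
save `mfile'

* Using: several rows per key
clear
input byte(a c)
1 2
1 5
2 8
end
save `ufile'

* merge m:m on a one-to-many relationship (illustration only)
use `mfile', clear
merge m:m a using `ufile', nogenerate
sort a c
save `viamerge'

* joinby on the same data
use `mfile', clear
joinby a using `ufile'
sort a c

* Compare every variable with the merge m:m result
cf _all using `viamerge'
```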
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#237

09 Jan 2022, 11:39

Pay no mind to #225; apparently state space models are (essentially) the same thing as Bayesian structural time series models.

I really should take a class about Bayesian stats one of these days.
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#238

10 Jan 2022, 01:23

I would like putdocx to be able to include hyperlinks. I use putdocx to automatically generate a codebook from a dataset. Since a codebook is not intended to be read cover to cover (unless you suffer from a really bad case of insomnia), you want to allow the user to jump back and forth between a list of variables and the detailed descriptions of the individual variables, which is what I want to use hyperlinks for.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Roger Newson

Join Date: Apr 2014

Posts: 317
#239

10 Jan 2022, 03:22

Still on the subject of putdocx, I would like to see Scalable Vector Graphics (.svg) added to the list of file formats that can be embedded by putdocx image. It might be even better if this were added in an update to Stata 17.
Hua Peng (StataCorp)

StataCorp Employee

Join Date: Jun 2014

Posts: 346
#240

11 Jan 2022, 11:14

Maarten Buis,

Code:

help putdocx paragraph

and see the hyperlink(link) option in the text_options table. Does this meet your needs?

Last edited by Hua Peng (StataCorp); 11 Jan 2022, 11:22.
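A minimal sketch of the option pointed to above, assuming the putdocx text syntax documented under help putdocx paragraph. The output file name is hypothetical, and the URL is the one from Maarten's signature. Jumping between a variable list and its descriptions inside the same document would additionally need link targets within the document, which this sketch does not attempt.

```stata
* Start from a clean state (capture in case no document is active)
capture putdocx clear
putdocx begin

putdocx paragraph
putdocx text ("Codebook for the example data. ")
* hyperlink() turns this text run into a clickable link
putdocx text ("Project website"), hyperlink("http://www.maartenbuis.nl")

* Hypothetical output file name
putdocx save codebook_demo.docx, replace
```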