Wishlist for Stata 18

Roman Mostazir replied

13 Jan 2022, 03:54
Latent class growth curve models. If we have a k-class solution for a model estimated by gsem (where k is the number of classes, e.g., 2 or 3), Stata does not allow the categorical latent classes to have varying slopes. In Mplus that is possible, as implemented in this paper. I attached the main figure for context, where k = number of latent classes, age/vas etc. are potential covariates for class membership, groups are randomised (0=control, 1=treatment), u = observed compliance with treatment (treatment group: 0=no, 1=yes; control group: missing), and F_D* are the outcomes observed over six time periods.
3 likes

Maarten Buis replied

12 Jan 2022, 03:50

Originally posted by Hua Peng (StataCorp)

Code:

help putdocx paragraph

in the text_options table, see hyperlink(link) option. Does this meet your need?

Hua Peng (StataCorp),

Thank you for the suggestion. If I understand the helpfile correctly, then this enables one to include a link to an external webpage. What I was looking for was the possibility to link to another spot in the same document.
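
For reference, a minimal sketch of the external-link case that the hyperlink() option covers. The file name hyperlink_demo and the URL are placeholders chosen for illustration; this does not create a link to another spot in the same document.

```stata
// Minimal sketch: a paragraph containing a link to an external web page.
// "hyperlink_demo" and the URL are illustrative placeholders.
putdocx clear
putdocx begin

putdocx paragraph
putdocx text ("More information is available on the ")
putdocx text ("Stata website"), hyperlink("https://www.stata.com/")
putdocx text (".")

putdocx save hyperlink_demo, replace
```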

To give you an idea of what I am looking for, here is a .do file for creating a codebook for the auto dataset. It starts with a list of variables on the first page, followed by page after page of more detailed descriptions of each variable. Someone using the codebook looks at the list of variables, sees whether some variable looks interesting based on its name or label, then goes to the page with the detailed description of that variable, decides whether that variable is really interesting for her/his purposes, goes back to the list of variables, finds the next potentially useful variable, etc. So there is a lot of back and forth between the list of variables and the detailed descriptions of the variables (all in the same document). With only 12 variables that is manageable; in a dataset with 100s of variables, repeatedly having to find the right page gets really annoying really quickly. What I would like is to have the name of the variable in the list of variables (the entries in the table vars) be a link to the page that contains the descriptives for that variable.

As an aside, I know I could have used the new table command for some of the tables in the code below, but I am preparing a course for an organization that has Stata 16, not 17.

Code:

clear all
cd "c:\temp"

sysuse auto, clear

putdocx begin, footer(footer_pn) pagesize(A4)

//footer
putdocx paragraph, tofooter(footer_pn)
putdocx pagenumber
putdocx text ("/")
putdocx pagenumber, totalpages

//title
putdocx paragraph, style(Heading1)
mata: st_local("fn",pathbasename(`"`c(filename)'"'))
local fl : data label
putdocx text ("Codebook for `fn'")
if "`fl'" != "" {
    putdocx text (": `fl'"), italic
}

//data properties
putdocx paragraph, style(Heading2)
putdocx text ("Properties of file")

putdocx table file = (3,2), layout(autofitcontents)
putdocx table file(1,1) = ("no. of variables"), bold
putdocx table file(1,2) = ("`c(k)'")
putdocx table file(2,1) = ("no. of observations"), bold
putdocx table file(2,2) = ("`=_N'"), nformat(%9.0gc) trim
putdocx table file(3,1) = ("last saved"), bold
putdocx table file(3,2) = ("`c(filedate)'")

putdocx table file(.,.), border(all,nil)
putdocx table file(1,.), border(top,single)
putdocx table file(3,.), border(bottom, single)

//list of variables
putdocx paragraph, style(Heading2)
putdocx text ("List of variables")

putdocx table vars = (`=`c(k)'+1',2), layout(autofitcontents)
putdocx table vars(1,1) = ("variable name"), bold
putdocx table vars(1,2) = ("label"), bold
local i = 2
foreach var of varlist * {
    putdocx table vars(`i'  , 1) = ("`var'") ,
    putdocx table vars(`i++', 2) = (`"`: variable label `var''"'),
}
putdocx table vars(.,.), border(all, nil)
putdocx table vars(1,.) , border(bottom, single) border(top, single)
putdocx table vars(`=`c(k)'+1',.) , border(bottom, single)

//properties of variables

foreach var of varlist * {
    // title
    putdocx pagebreak
    putdocx paragraph, style(Heading2)
    putdocx text ("`var'")
    if `"`: variable label `var''"' != "" {
        putdocx text (`": `: variable label `var''"'), italic
    }
    
    // collect info and frequency table or summary statistics
    capture confirm string variable `var'
    local tab = _rc == 0
    local rawtype = cond(`tab', "string", "numeric")
    local type : type `var'
    tempvar mark
    bys `var' : gen byte `mark' = _n == 1 if !missing(`var')
    count if `mark' == 1
    local tab = (r(N) <= 10) | `tab'
    local n_distinct = r(N)
    
    // variable properties
    putdocx paragraph, style(Heading3)
    putdocx text ("Properties of variable")

    putdocx table desc_`var' = (4,2), layout(autofitcontents)
    putdocx table desc_`var'(1,1) = ("type"), bold

    putdocx table desc_`var'(1,2) = ("`rawtype' (`type')")

    putdocx table desc_`var'(2,1) = ("missing values"), bold
    count if missing(`var')
    putdocx table desc_`var'(2,2) = (r(N)), nformat(%9.0gc) trim
 
    putdocx table desc_`var'(3,1) = ("non-missing values"), bold
    putdocx table desc_`var'(3,2) = (_N-r(N)), nformat(%9.0gc) trim
    
    putdocx table desc_`var'(4,1) = ("distinct non-missing values"), bold
    putdocx table desc_`var'(4,2) = (`n_distinct'), nformat(%9.0gc) trim

    putdocx table desc_`var'(.,.), border(all, nil)
    putdocx table desc_`var'(1,.), border(top, single)
    putdocx table desc_`var'(4,.), border(bottom, single)

    if `tab' {
        putdocx paragraph , style(Heading3)
        putdocx text ("Table")
        
        frame
        local data = r(currentframe)
        frame copy `data' table, replace
        frame change table

        tempvar freq
        bysort `var' : gen `freq' = _N
        by     `var' : keep if _n == 1

        local val_lab : value label `var'
        if "`val_lab'" != "" {
            tempvar lab
            decode `var', gen(`lab')
            local label "label"
        }

        keep `lab' `var' `freq'
        order `var' `lab' `freq'
        rename `var' value
        if "`val_lab'" != "" {
            rename `lab' label
            label values value .
        }
        rename `freq' frequency
    
        putdocx table tab_`var' = data(value `label' frequency), varnames layout(autofitcontents)
        putdocx table tab_`var'(.,.), border(all, nil)
        putdocx table tab_`var'(1,.), border(top, single) bold
        putdocx table tab_`var'(1,.), border(bottom, single)
        putdocx describe tab_`var'
        putdocx table tab_`var'(`r(nrows)',.), border(bottom, single)

        frame change `data'        
    }
    else {
        putdocx paragraph , style(Heading3)
        putdocx text ("Summary")
    
        sum `var', detail

        putdocx table tab_`var' = (5,2), layout(autofitcontents)
        putdocx table tab_`var'(1,1) = ("minimum"), bold
        putdocx table tab_`var'(2,1) = ("25th percentile"), bold
        putdocx table tab_`var'(3,1) = ("50th percentile"), bold
        putdocx table tab_`var'(4,1) = ("75th percentile"), bold
        putdocx table tab_`var'(5,1) = ("maximum"), bold
        putdocx table tab_`var'(1,2) = (r(min))
        putdocx table tab_`var'(2,2) = (r(p25))
        putdocx table tab_`var'(3,2) = (r(p50))
        putdocx table tab_`var'(4,2) = (r(p75))
        putdocx table tab_`var'(5,2) = (r(max))

        putdocx table tab_`var'(.,.), border(all, nil)
        putdocx table tab_`var'(1,.), border(top, single)
        putdocx table tab_`var'(5,.), border(bottom, single)        
    }
}

//close
putdocx save cb, replace

Last edited by Maarten Buis; 12 Jan 2022, 04:15.


Leonardo Guizzetti replied

11 Jan 2022, 11:42
Originally posted by Hua Peng (StataCorp)

Maarten Buis,

Code:

help putdocx paragraph

in the text_options table, see hyperlink(link) option. Does this meet your need?

I read this request as wanting to embed hyperlinks to locations within the document, e.g., to bookmarks. This option appears to be only for external URLs.
1 like
Hua Peng (StataCorp) replied

11 Jan 2022, 11:14
Maarten Buis,

Code:

help putdocx paragraph

in the text_options table, see hyperlink(link) option. Does this meet your need?
Last edited by Hua Peng (StataCorp); 11 Jan 2022, 11:22.
Roger Newson replied

10 Jan 2022, 03:22
Still on the subject of putdocx, I would like to see Scalable Vector Graphics (.svg) added to the list of file formats that can be embedded by putdocx image. It would be even better if this were done in a later update to Stata 17.
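
Until then, one possible workaround is to export the graph in a raster format and embed that instead. The sketch below assumes PNG export is available in your flavor of Stata; the file names scatter.png and image_demo are placeholders. It does not embed SVG.

```stata
// Workaround sketch: embed a PNG export of the graph rather than an SVG.
// "scatter.png" and "image_demo" are illustrative file names.
sysuse auto, clear
scatter price mpg
graph export scatter.png, replace width(2000)

putdocx clear
putdocx begin
putdocx paragraph
putdocx image scatter.png
putdocx save image_demo, replace
```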
4 likes
Maarten Buis replied

10 Jan 2022, 01:23
I would like putdocx to be able to include hyperlinks. I use putdocx to automatically generate a codebook from a dataset. Since a codebook is not intended to be read cover to cover (unless you suffer from a really bad case of insomnia), you want to allow the user to jump back and forth between a list of variables and the detailed descriptions of the individual variables, which is what I want to use hyperlinks for.
4 likes
Jared Greathouse replied

09 Jan 2022, 11:39
Pay no mind to #225; apparently state space models are essentially the same thing as Bayesian structural time series models.

I really should take a class about Bayesian stats one of these days.
Leonardo Guizzetti replied

08 Jan 2022, 09:59
A final followup to #236, the -merge m:m- behaviour is akin to a data step match-merge join in SAS, for those familiar. The two can only be made to coincide when, within id variables, the join operation involves a one-to-many or one-to-one (either direction) relationship.
1 like
Leonardo Guizzetti replied

07 Jan 2022, 10:26
Originally posted by William Lisowski

I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced J*K rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.

Ah, well spotted, and this clearly demonstrates the deeper issue with -merge m:m-. I too would have expected the result to be J*K in size, so I evidently picked a toy problem that didn't properly test the two programs. My previous post can be disregarded, and thank you for the clear elucidation of the issue.
5 likes
William Lisowski replied

07 Jan 2022, 10:07
@Leonardo Guizzetti #233 -

I will start by confessing that the last SQL I wrote was in 2014, so my memories of SQL have passed their half-life several times over by now.

You write

-merge m:m- is precisely an SQL full join on a common, coalesced ID

but that does not agree with my understanding, which is why I wrote the snarky comment beginning "No database person ..."

From the PDF documentation for merge we read

In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group.

I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced I*J rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.

Starting with your example, if we replace datasets `a' and `b' with

Code:

input byte(a b)
1 1
1 2
end

input byte(a c)
1 3
1 4
1 5
end

we achieve the following result from merge m:m

Code:

. list, clean

       a   b   c        _merge
  1.   1   1   3   Matched (3)
  2.   1   2   4   Matched (3)
  3.   1   2   5   Matched (3)

and from joinby

Code:

. list a b c _merge, clean

       a   b   c                          _merge
  1.   1   1   3   both in master and using data
  2.   1   1   4   both in master and using data
  3.   1   1   5   both in master and using data
  4.   1   2   3   both in master and using data
  5.   1   2   4   both in master and using data
  6.   1   2   5   both in master and using data
Last edited by William Lisowski; 07 Jan 2022, 10:56. Reason: Corrected J*K to I*J
4 likes
Leonardo Guizzetti replied

07 Jan 2022, 08:47
I completely agree with the spirit of William's post #232. Users should be steered away from -merge m:m- since what they wish to accomplish is better handled by -joinby-.

To make things equivalent, -merge m:m- is precisely an SQL full join on a common, coalesced ID. That is, observations with unmatched identifiers in either dataset are retained by default with -merge m:m-. In contrast, -joinby- defaults to removing those unmatched observations, which is usually what is desired. This behaviour can be counteracted by adding the -unmatched(both)- option to -joinby-. That said, if you know you don't have unmatched identifiers, -merge m:m- wouldn't necessarily be wrong, but it would be comparatively inefficient, and that is easily avoided nevertheless.

Toy example

Code:

tempname a b

input byte(a b)
1 4
1 6
2 9
3 3
5 .
end
sort a
save `a', replace
list

drop _all
input byte(a c)
1 2
2 8
2 3
3 5
3 6
4 .
end
sort a
save `b', replace
list

use `a', clear
merge m:m a using `b'
sort a b c
list

use `a', clear
joinby a using `b', unmatched(both)
sort a b c
list a b c _merge

Equivalent join using SQL (with a SAS accent)

Code:

select coalesce(a.a, b.a) as a, a.b, b.c
  from one as a
  full join two as b
    on a.a=b.a
  order by a,b,c;
1 like
William Lisowski replied

07 Jan 2022, 08:07
I expect that an earlier post in this topic has repeated the continual request that Stata address the problems created by the merge m:m command. In answering a question today I happened to review the help merge documentation and saw that it includes no reference to the problems with merge m:m.

If the underlying issues cannot be addressed directly, I suggest that the output of help merge be expanded to include a warning derived from the warning that appears in the PDF documentation, since it's often an uphill battle to get new users to read more than the help output, if even that.

At the same time, there's a particular problem in both the PDF and the help output: the introduction includes

merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many), which are often called 'joins' by database people.

No database person anywhere ever used the SQL join command to accomplish what is produced by Stata's "many-to-many" merge command. But a database person might interpret the quoted statement as equating the Stata "many-to-many" merge with the SQL m-by-m join (I did when I was new to Stata). The m-by-m join is the equivalent not of Stata's merge m:m command but of Stata's joinby command, which is mentioned nowhere in the output of help merge.

If this pothole can't be fixed, at least make a better effort to steer new users around it.
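
A small self-checking sketch of the difference, for anyone who wants to verify it in a do-file. The two toy datasets (two master rows and three using rows sharing one key value) and the tempfile name right are chosen purely for illustration.

```stata
// Sketch: merge m:m pairs rows sequentially within key; joinby forms all pairs.
clear
input byte(a c)
1 3
1 4
1 5
end
tempfile right            // illustrative name for the using dataset
save `right'

clear
input byte(a b)
1 1
1 2
end

// merge m:m: max(2,3) = 3 observations for key a == 1
preserve
merge m:m a using `right'
assert _N == 3
restore

// joinby: 2*3 = 6 observations, the usual SQL-style join result
joinby a using `right'
assert _N == 6
```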
3 likes
George Ford replied

06 Jan 2022, 14:05
Have the missing values tables store all results (not just the last one, as with mdesc) in a table.
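
In the meantime, something along these lines can be done by hand. This is only a sketch under my own assumptions: the matrix name M and its two columns are arbitrary choices, not output of any existing command.

```stata
// Sketch: collect missing / non-missing counts for every variable in a matrix.
// The matrix name M and its column layout are illustrative choices.
sysuse auto, clear
ds
local vars `r(varlist)'
local k : word count `vars'

matrix M = J(`k', 2, .)
matrix rownames M = `vars'
matrix colnames M = missing nonmissing

local i = 0
foreach v of local vars {
    local ++i
    quietly count if missing(`v')
    matrix M[`i', 1] = r(N)
    matrix M[`i', 2] = _N - r(N)
}

matrix list M
```

The matrix then holds the results for all variables at once and can be passed on to, for example, putdocx table or putexcel.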
Jeff Wooldridge replied

06 Jan 2022, 13:47
Originally posted by Joao Santos Silva

I would like to suggest a few changes to the way the RESET test (estat ovtest) is implemented:

1 - The most important one is that the test should be based on the same type of covariance matrix used in the estimation of the main model. It does not make sense to run a model with some form of robust standard errors, and then perform the RESET with plain-vanilla standard errors.

2 - This is more a question of taste, but personally I would prefer if the number of powers included by default could be reduced to just one or two (or that we have an option to choose the number of powers to include).

3 - Finally, it would be great if the misleading name of the command could be changed. I know that this may be asking too much, but at least perhaps we could have estat reset as synonymous to estat ovtest.

I agree with Joao about RESET. In fact, I had a Twitter thread on this back in March: https://twitter.com/jmwooldridge/sta...12169036201985
Joao Santos Silva replied

06 Jan 2022, 05:11
I would like to suggest a few changes to the way the RESET test (estat ovtest) is implemented:

1 - The most important one is that the test should be based on the same type of covariance matrix used in the estimation of the main model. It does not make sense to run a model with some form of robust standard errors, and then perform the RESET with plain-vanilla standard errors.

2 - This is more a question of taste, but personally I would prefer if the number of powers included by default could be reduced to just one or two (or that we have an option to choose the number of powers to include).

3 - Finally, it would be great if the misleading name of the command could be changed. I know that this may be asking too much, but at least perhaps we could have estat reset as synonymous to estat ovtest.
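
To illustrate points 1 and 2, here is a rough sketch of a RESET done by hand with the same robust covariance matrix as the main model and a single added power. The model (price on mpg and weight in the auto data) and the variable names yhat and yhat2 are assumptions made only for this example.

```stata
// Sketch of a manual RESET with robust standard errors and one power.
// The model and the names yhat / yhat2 are illustrative only.
sysuse auto, clear

regress price mpg weight, vce(robust)
predict double yhat, xb
generate double yhat2 = yhat^2

// Re-estimate with the squared fitted values, keeping the same VCE type
regress price mpg weight yhat2, vce(robust)

// Wald test of the added term, based on the robust covariance matrix
test yhat2
```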
4 likes