
  • Roman Mostazir
    replied
    Latent class growth curve models: if we have a k-class solution for a model estimated by gsem (where k is any number of classes, e.g., 2 or 3), Stata does not allow the categorical latent classes to have varying slopes. In Mplus this is possible, as implemented in this paper. I attached the main figure for context, where k = number of latent classes, age/vas etc. are potential covariates for class membership, groups are randomised (0 = control, 1 = treatment), u = observed treatment compliance (treatment group: 0 = no, 1 = yes; control group: missing), and F_D* are the outcomes observed over six time periods.

    [Attached figure: lc.jpg]


  • Maarten Buis
    replied
    Originally posted by Hua Peng (StataCorp)
    Code:
    help putdocx paragraph
    In the text_options table, see the hyperlink(link) option. Does this meet your needs?
    Hua Peng (StataCorp),

    Thank you for the suggestion. If I understand the helpfile correctly, then this enables one to include a link to an external webpage. What I was looking for was the possibility to link to another spot in the same document.

    To give you an idea of what I am looking for, here is a .do file for creating a codebook for the auto dataset. It starts with a list of variables on the first page, and then page after page of more detailed descriptions of each variable. The way someone would use the codebook is that she/he looks at the list of variables, sees whether some variable looks interesting based on its name or label, then goes to the page with the detailed description of that variable, decides whether that variable is really interesting for her/his purposes, goes back to the list of variables, finds the next potentially useful variable, and so on. So there is a lot of back and forth between the list of variables and the detailed descriptions of the variables (all in the same document). With only 12 variables that is manageable, but in a dataset with hundreds of variables, repeatedly having to find the right page gets really annoying really quickly. What I would like is for the name of each variable in the list of variables (the entries in the table vars) to be a link to the page that contains the descriptives for that variable.

    As an aside, I know I could have used the new table command for some of the tables in the code below, but I am preparing a course for an organization that has Stata 16, not 17.

    Code:
    clear all
    cd "c:\temp"
    
    sysuse auto, clear
    
    putdocx begin, footer(footer_pn) pagesize(A4)
    
    //footer
    putdocx paragraph, tofooter(footer_pn)
    putdocx pagenumber
    putdocx text ("/")
    putdocx pagenumber, totalpages
    
    //title
    putdocx paragraph, style(Heading1)
    mata: st_local("fn",pathbasename(`"`c(filename)'"'))
    local fl : data label
    putdocx text ("Codebook for `fn'")
    if "`fl'" != "" {
        putdocx text (": `fl'"), italic
    }
    
    //data properties
    putdocx paragraph, style(Heading2)
    putdocx text ("Properties of file")
    
    putdocx table file = (3,2), layout(autofitcontents)
    putdocx table file(1,1) = ("no. of variables"), bold
    putdocx table file(1,2) = ("`c(k)'")
    putdocx table file(2,1) = ("no. of observations"), bold
    putdocx table file(2,2) = ("`=_N'"), nformat(%9.0gc) trim
    putdocx table file(3,1) = ("last saved"), bold
    putdocx table file(3,2) = ("`c(filedate)'")
    
    putdocx table file(.,.), border(all,nil)
    putdocx table file(1,.), border(top,single)
    putdocx table file(3,.), border(bottom, single)
    
    //list of variables
    putdocx paragraph, style(Heading2)
    putdocx text ("List of variables")
    
    putdocx table vars = (`=`c(k)'+1',2), layout(autofitcontents)
    putdocx table vars(1,1) = ("variable name"), bold
    putdocx table vars(1,2) = ("label"), bold
    local i = 2
    foreach var of varlist * {
        putdocx table vars(`i'  , 1) = ("`var'")
        putdocx table vars(`i++', 2) = (`"`: variable label `var''"')
    }
    putdocx table vars(.,.), border(all, nil)
    putdocx table vars(1,.) , border(bottom, single) border(top, single)
    putdocx table vars(`=`c(k)'+1',.) , border(bottom, single)
    
    //properties of variables
    
    foreach var of varlist * {
        // title
        putdocx pagebreak
        putdocx paragraph, style(Heading2)
        putdocx text ("`var'")
        if `"`: variable label `var''"' != "" {
            putdocx text (`": `: variable label `var''"'), italic
        }
        
        // collect info and frequency table or summary statistics
        capture confirm string variable `var'
        local tab = _rc == 0
        local rawtype = cond(`tab', "string", "numeric")
        local type : type `var'
        tempvar mark
        bys `var' : gen byte `mark' = _n == 1 if !missing(`var')
        count if `mark' == 1
        local tab = (r(N) <= 10) | `tab'
        local n_distinct = r(N)
        
        // variable properties
        putdocx paragraph, style(Heading3)
        putdocx text ("Properties of variable")
    
        putdocx table desc_`var' = (4,2), layout(autofitcontents)
        putdocx table desc_`var'(1,1) = ("type"), bold
    
        putdocx table desc_`var'(1,2) = ("`rawtype' (`type')")
    
        putdocx table desc_`var'(2,1) = ("missing values"), bold
        count if missing(`var')
        putdocx table desc_`var'(2,2) = (r(N)), nformat(%9.0gc) trim
     
        putdocx table desc_`var'(3,1) = ("non-missing values"), bold
        putdocx table desc_`var'(3,2) = (_N-r(N)), nformat(%9.0gc) trim
        
        putdocx table desc_`var'(4,1) = ("distinct non-missing values"), bold
        putdocx table desc_`var'(4,2) = (`n_distinct'), nformat(%9.0gc) trim
    
        putdocx table desc_`var'(.,.), border(all, nil)
        putdocx table desc_`var'(1,.), border(top, single)
        putdocx table desc_`var'(4,.), border(bottom, single)
    
        if `tab' {
            putdocx paragraph, style(Heading3)
            putdocx text ("Table")
            
            frame
            local data = r(currentframe)
            frame copy `data' table, replace
            frame change table
    
            tempvar freq
            bysort `var' : gen `freq' = _N
            by     `var' : keep if _n == 1
    
            local val_lab : value label `var'
            // reset locals that would otherwise linger from a previous variable
            local lab
            local label
            if "`val_lab'" != "" {
                tempvar lab
                decode `var', gen(`lab')
                local label "label"
            }
    
            keep `lab' `var' `freq'
            order `var' `lab' `freq'
            rename `var' value
            if "`val_lab'" != "" {
                rename `lab' label
                label values value .
            }
            rename `freq' frequency
        
            putdocx table tab_`var' = data(value `label' frequency), varnames layout(autofitcontents)
            putdocx table tab_`var'(.,.), border(all, nil)
            putdocx table tab_`var'(1,.), border(top, single) bold
            putdocx table tab_`var'(1,.), border(bottom, single)
            putdocx describe tab_`var'
            putdocx table tab_`var'(`r(nrows)',.), border(bottom, single)
    
            frame change `data'        
        }
        else {
            putdocx paragraph, style(Heading3)
            putdocx text ("Summary")
        
            sum `var', detail
    
            putdocx table tab_`var' = (5,2), layout(autofitcontents)
            putdocx table tab_`var'(1,1) = ("minimum"), bold
            putdocx table tab_`var'(2,1) = ("25th percentile"), bold
            putdocx table tab_`var'(3,1) = ("50th percentile"), bold
            putdocx table tab_`var'(4,1) = ("75th percentile"), bold
            putdocx table tab_`var'(5,1) = ("maximum"), bold
            putdocx table tab_`var'(1,2) = (r(min))
            putdocx table tab_`var'(2,2) = (r(p25))
            putdocx table tab_`var'(3,2) = (r(p50))
            putdocx table tab_`var'(4,2) = (r(p75))
            putdocx table tab_`var'(5,2) = (r(max))
    
            putdocx table tab_`var'(.,.), border(all, nil)
            putdocx table tab_`var'(1,.), border(top, single)
            putdocx table tab_`var'(5,.), border(bottom, single)        
        }
    }
    
    //close
    putdocx save cb, replace
    Last edited by Maarten Buis; 12 Jan 2022, 04:15.


  • Leonardo Guizzetti
    replied
    Originally posted by Hua Peng (StataCorp)
    Maarten Buis,

    Code:
    help putdocx paragraph
    In the text_options table, see the hyperlink(link) option. Does this meet your needs?
    I read this request as wanting to embed hyperlinks to locations within the document, e.g., to bookmarks. This option appears to be only for external URLs.


  • Hua Peng (StataCorp)
    replied
    Maarten Buis,

    Code:
    help putdocx paragraph
    In the text_options table, see the hyperlink(link) option. Does this meet your needs?
    Last edited by Hua Peng (StataCorp); 11 Jan 2022, 11:22.
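For reference, a minimal sketch of the hyperlink() text option mentioned above, assuming a Stata version whose putdocx text supports it as documented in help putdocx paragraph. The output file name hyperlink_demo and the URL are illustrative choices, not taken from the thread. As Leonardo Guizzetti observes, the option appears to target external URLs rather than locations within the same document, so it does not by itself solve the codebook problem.

Code:
```stata
* Illustrative file name and URL; not taken from the thread
putdocx clear
putdocx begin
putdocx paragraph
putdocx text ("Documentation is available on the ")
* hyperlink() makes this run of text a link to an external URL
putdocx text ("Stata website"), hyperlink("https://www.stata.com")
putdocx text (".")
putdocx save hyperlink_demo, replace
```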


  • Roger Newson
    replied
    Still on the subject of putdocx, I would like to see Scalable Vector Graphics (.svg) added to the list of file formats that can be embedded by putdocx image. It would be even better if this could be done in a later update to Stata 17.


  • Maarten Buis
    replied
    I would like putdocx to be able to include hyperlinks. I use putdocx to automatically generate a codebook from a dataset. Since a codebook is not intended to be read cover to cover (unless you suffer from a really bad case of insomnia), you want to allow the user to jump back and forth between a list of variables and the detailed descriptions of the individual variables, which is what I want to use hyperlinks for.


  • Jared Greathouse
    replied
    Pay no mind to #225; apparently state space models are (essentially) the same thing as Bayesian structural time series models.

    I really should take a class about Bayesian stats one of these days.


  • Leonardo Guizzetti
    replied
    A final follow-up to #236: the -merge m:m- behaviour is akin to a match-merge in a SAS data step, for those familiar with it. The two can only be made to coincide when, within ID variables, the join involves a one-to-many or one-to-one relationship (in either direction).


  • Leonardo Guizzetti
    replied
    Originally posted by William Lisowski
    I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced I*J rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.
    Ah, well spotted; this clearly demonstrates the deeper issue with -merge m:m-. I too would have expected the result to be I*J in size, so I evidently picked a toy problem that didn't properly test the two programs. My previous post can be disregarded, and thank you for the clear elucidation of the issue.


  • William Lisowski
    replied
    @Leonardo Guizzetti #233 -

    I will start by confessing that the last SQL I wrote was in 2014, so my memories of SQL have passed their half-life several times over by now.

    You write

    -merge m:m- is precisely an SQL full join on a common, coalesced ID
    but that does not agree with my understanding, which is why I wrote the snarky comment beginning "No database person ..."

    From the PDF documentation for merge we read

    In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group.
    I do not recall any SQL that does matching in that fashion. To the limits of my memory it seems to me that every SQL join I did where key K appeared in I rows in the left table and J rows in the right table produced I*J rows in the resulting table with key K. Stata merge m:m describes a procedure that produces max(I,J) observations in the resulting dataset.

    Starting with your example, if we replace datasets `a' and `b' with
    Code:
    input byte(a b)
    1 1
    1 2
    end
    
    input byte(a c)
    1 3
    1 4
    1 5
    end

    we obtain the following result from merge m:m
    Code:
    . list, clean
    
           a   b   c        _merge  
      1.   1   1   3   Matched (3)  
      2.   1   2   4   Matched (3)  
      3.   1   2   5   Matched (3)
    and from joinby
    Code:
    . list a b c _merge, clean
    
           a   b   c                          _merge  
      1.   1   1   3   both in master and using data  
      2.   1   1   4   both in master and using data  
      3.   1   1   5   both in master and using data  
      4.   1   2   3   both in master and using data  
      5.   1   2   4   both in master and using data  
      6.   1   2   5   both in master and using data
    Last edited by William Lisowski; 07 Jan 2022, 10:56. Reason: Corrected J*K to I*J
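A self-contained version of the comparison above, assuming only official Stata; the tempfile names master and usingdata are arbitrary. It builds the two small datasets and uses assert to check the row counts described: max(I,J) = 3 for merge m:m and I*J = 6 for joinby.

Code:
```stata
* Master: key value a = 1 appears I = 2 times
clear
input byte(a b)
1 1
1 2
end
tempfile master usingdata
save `master'

* Using: key value a = 1 appears J = 3 times
clear
input byte(a c)
1 3
1 4
1 5
end
save `usingdata'

* merge m:m pairs observations sequentially within a: max(I,J) = 3 rows
use `master', clear
merge m:m a using `usingdata'
assert _N == 3

* joinby forms every pairwise combination within a: I*J = 6 rows
use `master', clear
joinby a using `usingdata'
assert _N == 6
```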


  • Leonardo Guizzetti
    replied
    I completely agree with the spirit of William's post #232. Users should be steered away from -merge m:m- since what they wish to accomplish is better handled by -joinby-.

    To state the equivalence: -merge m:m- is precisely an SQL full join on a common, coalesced ID. That is, observations with unmatched identifiers in either dataset are retained by default with -merge m:m-. In contrast, -joinby- by default drops those unmatched observations, which is usually what is desired; this can be changed by adding the -unmatched(both)- option to -joinby-. That said, if you know you have no unmatched identifiers, -merge m:m- wouldn't necessarily be wrong, but it would be comparatively inefficient, which is easily avoided.

    Toy example

    Code:
    clear
    tempfile a b
    input byte(a b)
    1 4
    1 6
    2 9
    3 3
    5 .
    end
    sort a
    save `a', replace
    list
    
    drop _all
    input byte(a c)
    1 2
    2 8
    2 3
    3 5
    3 6
    4 .
    end
    sort a
    save `b', replace
    list
    
    use `a', clear
    merge m:m a using `b'
    sort a b c
    list
    
    use `a', clear
    joinby a using `b', unmatched(both)
    sort a b c
    list a b c _merge
    Equivalent join using SQL (with a SAS accent)

    Code:
    select coalesce(a.a, b.a) as a,
            a.b, b.c
      from one as a full join two as b
      on a.a=b.a
      order by a,b,c;


  • William Lisowski
    replied
    I expect that an earlier post in this topic has repeated the continual request that Stata address the problems created by the merge m:m command. In answering a question today I happened to review the help merge documentation and saw that it includes no reference to the problems with merge m:m.

    If the underlying issues cannot be addressed directly, I suggest that the output of help merge be expanded to include a warning derived from the warning that appears in the PDF documentation, since it's often an uphill battle to get new users to read more than the help output, if even that.

    At the same time, there's a particular problem in both the PDF and the help output: the introduction includes

    merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many), which are often called 'joins' by database people.
    No database person anywhere ever used an SQL join to accomplish what Stata's "many-to-many" merge produces. But a database person might interpret the quoted statement as equating the Stata "many-to-many" merge with the SQL m-by-m join (I did when I was new to Stata). The m-by-m join is equivalent not to Stata's merge m:m command but to Stata's joinby command, which is mentioned nowhere in the output of help merge.

    If this pothole can't be fixed, at least make a better effort to steer new users around it.


  • George Ford
    replied
    Have missing-value tables store all results in a table (not just the last one, as mdesc does).
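As a stopgap, a hand-rolled loop can collect missing-value counts for every variable in one matrix. A minimal sketch using the auto data; the matrix name miss and its column names are hypothetical choices, and the community-contributed mdesc is not required.

Code:
```stata
sysuse auto, clear
unab vars : _all
local k : word count `vars'

* One row per variable: number missing and percent missing
matrix miss = J(`k', 2, .)
matrix rownames miss = `vars'
matrix colnames miss = missing percent

local i = 1
foreach v of local vars {
    * missing() works for both string and numeric variables
    quietly count if missing(`v')
    local nmiss = r(N)
    local pct = 100 * `nmiss' / _N
    matrix miss[`i', 1] = `nmiss'
    matrix miss[`i', 2] = `pct'
    local ++i
}
matrix list miss
```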


  • Jeff Wooldridge
    replied
    Originally posted by Joao Santos Silva
    I would like to suggest a few changes to the way the RESET test (estat ovtest) is implemented:

    1 - The most important one is that the test should be based on the same type of covariance matrix used in the estimation of the main model. It does not make sense to run a model with some form of robust standard errors and then perform the RESET with plain-vanilla standard errors.

    2 - This is more a question of taste, but personally I would prefer if the number of powers included by default could be reduced to just one or two (or that we had an option to choose the number of powers to include).

    3 - Finally, it would be great if the misleading name of the command could be changed. I know that this may be asking too much, but at least perhaps we could have estat reset as a synonym for estat ovtest.
    I agree with Joao about RESET. In fact, I had a Twitter thread on this back in March: https://twitter.com/jmwooldridge/sta...12169036201985


  • Joao Santos Silva
    replied
    I would like to suggest a few changes to the way the RESET test (estat ovtest) is implemented:

    1 - The most important one is that the test should be based on the same type of covariance matrix used in the estimation of the main model. It does not make sense to run a model with some form of robust standard errors and then perform the RESET with plain-vanilla standard errors.

    2 - This is more a question of taste, but personally I would prefer if the number of powers included by default could be reduced to just one or two (or that we had an option to choose the number of powers to include).

    3 - Finally, it would be great if the misleading name of the command could be changed. I know that this may be asking too much, but at least perhaps we could have estat reset as a synonym for estat ovtest.
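Until estat ovtest offers this, a manual workaround in the spirit of points 1 and 2 is to add a power of the fitted values to the model yourself and test it with the same robust VCE. A minimal sketch, assuming the auto data and an arbitrary specification (price on mpg and weight) chosen only for illustration; it uses a single squared term and is not a reproduction of what estat ovtest computes.

Code:
```stata
sysuse auto, clear
* Main model with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
predict double yhat, xb
* RESET with a single added term: the squared fitted values
generate double yhat2 = yhat^2
regress price mpg weight yhat2, vce(robust)
* Wald test of the added term, using the same robust VCE as the main model
test yhat2
```

Adding yhat^3 (and testing both terms jointly) would give a two-power version.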
