
  • Formatting bug in ds?

    Dear All,

    I am getting rather messy output from Stata's ds command. After some investigation, the following minimal example demonstrates the issue:

    Code:
    clear all
    set more off
    version 14.0
    
    forval i=1/28 {
      tempfile tmp
      clear
      set obs 1
      generate t__id=1
      generate `="x"*`i''__id = 1
      describe
      save `"`tmp'"'
      local tmplist `"`tmplist' "`tmp'""'
    }
    
    foreach f in `tmplist' {
      use `"`f'"', clear
      ds *__id, varwidth(32)
    }
    This results in the following output:



    The problem seems to be triggered by the presence of a variable in front of the long-named variable, whose name would otherwise be abbreviated under the default value of the variable width (twelve).

    If the line with the ds command is replaced with:
    Code:
    ds *__id
    then the output becomes:



    I investigated further: the output becomes messier with more variables of short and long names, but I couldn't identify a clear pattern.

    Perhaps adding a strtrim() to the varlist before displaying it would be a quick and easy fix for the issue (if it hasn't been fixed yet in more recent versions of Stata). Or, if this behavior is intended, what is the explanation?

    Thank you. Sergiy Radyakin

  • #2
    I deleted my original response: the code posted in #1 was not what produced the first set of output. The ds command used for that output was in fact
    Code:
    ds
    while the second set of output was what was produced by the code posted.

    TL;DR: It's not a bug, it's a feature.

    Reviewing the code shown by
    Code:
    viewsource ds_util.ado
    shows that ds is functioning as intended. It divides the linesize into columns of identical size based on the value of the varwidth option, whose default value is 12, and lists the variables down the columns rather than across the lines.
    Code:
    . clear
    
    . set obs 1
    number of observations (_N) was 0, now 1
    
    . foreach l in `c(alpha)' {
      2. generate `="`l'"*6' = 1
      3. }
    
    . 
    . set linesize 40 
    
    . ds
    aaaaaa  gggggg  mmmmmm  ssssss  yyyyyy
    bbbbbb  hhhhhh  nnnnnn  tttttt  zzzzzz
    cccccc  iiiiii  oooooo  uuuuuu
    dddddd  jjjjjj  pppppp  vvvvvv
    eeeeee  kkkkkk  qqqqqq  wwwwww
    ffffff  llllll  rrrrrr  xxxxxx
    
    .
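
    As a minimal sketch of the role linesize plays in this layout, the same 26 variables can be listed under two line widths. The value 80 is an arbitrary choice for comparison and does not come from the thread; a wider line should leave room for more columns and therefore fewer rows.

    ```stata
    * Rebuild the 26 six-letter variables from the log above
    clear
    quietly set obs 1
    foreach l in `c(alpha)' {
        generate `="`l'"*6' = 1
    }

    * Narrow line, as in the log above
    set linesize 40
    ds

    * Wider line (80 is an assumed value, chosen only for comparison)
    set linesize 80
    ds
    ```

    In both listings the names run down the columns rather than across the lines.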
    Last edited by William Lisowski; 08 Jan 2020, 16:15.



    • #3
      Originally posted by William Lisowski
      ... It divides the linesize into columns of identical size based on the value of the varwidth option...
      I agree, but I fail to see why the table gets messy across multiple outputs. Your output is a single one, albeit covering multiple lines.
      Mine comes from a genuine loop over multiple datasets (similar to what my example illustrates).

      Holding the line width and the varwidth option constant, shouldn't the multiple outputs still form a nicely aligned table when stacked in the output window?

      Here is the actual output (fragment, with obfuscated varnames):

      [Image: formatting.png]


      I didn't retain the width specified for the output in the loop, since I was only interested in the returned varlist anyway, but I am pretty sure it was constant throughout all iterations, since I wanted to see the full variable names. The output was just obtained during debugging, but I took a snapshot of the weird alignment. I guess replication is just a matter of creating these 7 datasets with these specific varnames.

      My expectation in the above is that the values in the second column are aligned (right-aligned or left-aligned) to a certain margin, that is defined by the varwidth, kind of what William Lisowski illustrates in his message.
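
      One way to get that kind of alignment, independent of how ds sizes its columns, is to take the returned varlist and print each name in a fixed-width field. This is only a sketch under assumptions: the three datasets and their variable names are made up for illustration, and the field width of 32 simply mirrors the varwidth(32) used in post #1.

      ```stata
      * Three hypothetical datasets, each with one short and one long name
      forvalues i = 1/3 {
          clear
          quietly set obs 1
          generate t__id = 1
          generate `="x"*(8*`i')'__id = 1

          * Suppress the ds listing and keep only the returned varlist
          quietly ds *__id
          local vars `r(varlist)'

          * Left-align every name in a 32-character field (assumed width)
          foreach v of local vars {
              display %-32s "`v'" " " _continue
          }
          display ""
      }
      ```

      Because the field width is fixed by the display format rather than derived from the names in each dataset, the second column starts at the same position on every line.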

      Thank you, Sergiy



      • #4
        I wrote in post #2 that

        [ds] divides the linesize into columns of identical size based on the value of the varwidth option
        and you followed up in post #3 with

        My expectation in the above is that the values in the second column are aligned (right-aligned or left-aligned) to a certain margin, that is defined by the varwidth
        In post #2 I should have written

        [ds] divides the linesize into columns of identical size based on the minimum of the value of the varwidth option and the length of the longest variable name
        Again using our ability to examine the code with
        Code:
        viewsource ds_util.ado
        we see that the list of (possibly abbreviated) variable names fed to the local DisplayInCols program is constructed by
        Code:
        local vlist 
        foreach v of local varlist {
            local vlist `"`vlist' `= abbrev("`v'",`varwidth')'"' 
            }
        and thus if the length of the longest variable name does not exceed varwidth, the varwidth option has no effect.
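
        The effect of that abbrev() call can be checked directly. The sketch below reuses the t__id name from post #1; the 20-character prefix of the long name is an arbitrary choice made for illustration, giving a 24-character name that fits within 32 but not within 12.

        ```stata
        * Names no longer than the requested width should come back unchanged
        display abbrev("t__id", 32)
        display abbrev("xxxxxxxxxxxxxxxxxxxx__id", 32)

        * The same long name against the default varwidth of 12 is shortened
        display abbrev("xxxxxxxxxxxxxxxxxxxx__id", 12)

        * The two ds calls from post #1 on a dataset with these two variables
        clear
        quietly set obs 1
        generate t__id = 1
        generate xxxxxxxxxxxxxxxxxxxx__id = 1
        ds *__id, varwidth(32)
        ds *__id
        ```

        Comparing the two ds listings shows whether the long name is abbreviated, and how the column for the second variable shifts accordingly.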

