Formatting bug in ds?

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#1


08 Jan 2020, 15:02

Dear All,

I am getting rather messy output from Stata's ds command. After some investigation, the following minimal example demonstrates the issue:

Code:

clear all
set more off
version 14.0

forval i=1/28 {
    tempfile tmp
    clear
    set obs 1
    generate t__id=1
    generate `="x"*`i''__id = 1
    describe
    save `"`tmp'"'
    local tmplist `"`tmplist' "`tmp'""'
}

foreach f in `tmplist' {
    use `"`f'"', clear
    ds *__id, varwidth(32)
}

This results in the following output:

The problem seems to be triggered by a short-named variable preceding the long-named one, which would otherwise be abbreviated under the default variable width (12).

If the line with the ds command is replaced with:

Code:

ds *__id

then the output becomes:

I investigated further: the output becomes messier as more variables with short and long names are added, but I couldn't figure out a clear pattern.

Perhaps applying strtrim() to the varlist before displaying it would be a quick and easy fix (if the issue hasn't already been fixed in more recent versions of Stata). Or, if this behavior is intended, what is the explanation?

Thank you. Sergiy Radyakin
Tags: bug, ds, output formatting
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

08 Jan 2020, 15:56

I deleted my original response: the code posted in #1 was not what produced the first set of output. The ds command used for that output was in fact

Code:

ds

while the second set of output was what was produced by the code posted.

TL;DR: It's not a bug, it's a feature.

Reviewing the code shown by

Code:

viewsource ds_util.ado

shows that ds is functioning as intended. It divides the linesize into columns of identical size based on the value of the varwidth option, whose default value is 12, and lists the variables down the columns rather than across the lines.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. foreach l in `c(alpha)' {
  2.     generate `="`l'"*6' = 1
  3. }

.
. set linesize 40

. ds
aaaaaa  gggggg  mmmmmm  ssssss  yyyyyy
bbbbbb  hhhhhh  nnnnnn  tttttt  zzzzzz
cccccc  iiiiii  oooooo  uuuuuu
dddddd  jjjjjj  pppppp  vvvvvv
eeeeee  kkkkkk  qqqqqq  wwwwww
ffffff  llllll  rrrrrr  xxxxxx

.

Last edited by William Lisowski; 08 Jan 2020, 16:15.
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#3

08 Jan 2020, 18:09

Originally posted by William Lisowski

... It divides the linesize into columns of identical size based on the value of the varwidth option...

I agree, but I fail to see why the table gets messy with multiple outputs. Your output comes from a single call, albeit covering multiple lines. Mine comes from a genuine loop over multiple datasets (as my example illustrates).

Holding the line width and the varwidth option constant, shouldn't the multiple outputs still form a nicely aligned table when stacked in the output window?

Here is the actual output (fragment, with obfuscated varnames):

I didn't record the width specified for the output in the loop, since I was only interested in the returned varlist, but I am pretty sure it was constant across all iterations because I wanted to see the full variable names. The output was produced during debugging, and I took a snapshot of the odd alignment. Replicating it should just be a matter of creating these 7 datasets with these specific variable names.
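When only the returned list matters, as described above, the display can be suppressed altogether. This is a minimal sketch, assuming the names are read from r(varlist), which ds stores; the dataset and variable names are made up for illustration and are not the obfuscated ones from the snapshot.

```stata
* Illustrative one-observation dataset; names are hypothetical.
clear
set obs 1
generate t__id = 1
generate xxxxxxxxxxxxxxxx__id = 1

* quietly suppresses the column display; the full names remain in r(varlist).
quietly ds *__id
display "`r(varlist)'"
```

The returned list holds the unabbreviated names, so it does not depend on varwidth() or on the line size.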

My expectation in the above is that the values in the second column are aligned (right-aligned or left-aligned) to a certain margin, that is defined by the varwidth, kind of what William Lisowski illustrates in his message.

Thank you, Sergiy
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

09 Jan 2020, 06:28

I wrote in post #2 that

[ds] divides the linesize into columns of identical size based on the value of the varwidth option

and you followed up in post #3 with

My expectation in the above is that the values in the second column are aligned (right-aligned or left-aligned) to a certain margin, that is defined by the varwidth

In post #2 I should have written

[ds] divides the linesize into columns of identical size based on the minimum of the value of the varwidth option and the length of the longest variable name

Again using our ability to examine the code with

Code:

viewsource ds_util.ado

we see that the list of (possibly abbreviated) variable names fed to the local DisplayInCols program is constructed by

Code:

local vlist
foreach v of local varlist {
    local vlist `"`vlist' `= abbrev("`v'",`varwidth')'"'
}

and thus if the length of the longest variable name does not exceed varwidth, the varwidth option has no effect.
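The behavior of abbrev() can be checked directly. The following is a minimal sketch rather than code from ds_util.ado; the macro names shortname and longname and the variable names they hold are assumptions chosen for illustration.

```stata
* Hypothetical names: one short, one 24 characters long.
local shortname "t__id"
local longname  "xxxxxxxxxxxxxxxxxxxx__id"

* With a width of 32 both names already fit, so abbrev() returns them unchanged.
display abbrev("`shortname'", 32)
display abbrev("`longname'", 32)

* With the default width of 12 only the long name is shortened.
display abbrev("`shortname'", 12)
display abbrev("`longname'", 12)
```

Because the names are passed through unchanged whenever they fit, the column width that follows depends on the longest name in each dataset, which would differ from one dataset to the next in a loop.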