Wishlist for Stata 18

Leonardo Guizzetti replied

19 Jan 2022, 06:54
Originally posted by Henry Strawforrd

... I beg you to search this forum and find out for yourself how often these kinds of bysorts are suggested by very experienced users.

I have been a user here for many years and am familiar with the technique and with how often it is suggested. Those bysort strategies are usually explained as intended to work (or not) with data containing missing values, depending on the details shared by the person asking for help and the data example. Nevertheless, two strategies for missing values are often useful: (i) drop them before performing a manipulation or summary, or (ii) ignore them after such an operation.
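A minimal sketch of how this plays out with the kind of bysort quoted in this thread, assuming toy data; the variable names last_any, nonmiss, last_nonmiss, and last_egen are made up for illustration:

```stata
clear
input id time
1 1
1 2
1 .
2 1
2 3
end

* Default: missing sorts last within each id, so time[_N] is missing for id 1
bysort id (time): gen last_any = time[_N]

* Keep missings out of the last position by sorting on a nonmissing flag first
gen byte nonmiss = !missing(time)
bysort id (nonmiss time): gen last_nonmiss = time[_N]

* egen's max() ignores missing values by default
egen last_egen = max(time), by(id)

list, sepby(id)
```

Running drop if missing(time) before the first bysort would be the literal version of strategy (i), at the cost of losing those observations.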

Originally posted by Henry Strawforrd

I would say missing should be neither min nor max of the sort. I think other languages' NA is always false, so it would not be counted for these kinds of bysort: gen exercises.

Missing values are part of the data and must go somewhere when sorted. Where, then, should Stata put them if it treats them as neither the minimum nor the maximum value? Stata treats missing as large, R does the same with "NA", and SAS treats them as small. How to work with groups defined in part or in whole by such values is up to the programmer to decide.
2 likes
Henry Strawforrd replied

19 Jan 2022, 03:59
Originally posted by Leonardo Guizzetti

I read the posted link and many of the stated "pitfalls" boil down to William's point about not being proficient in Stata (or failing to read the documentation). The list, I think, is intended for people with little or no experience coming to Stata from other languages, who may not have the same expectations about what code ought or ought not to do.

If you think this is a beginner's problem, I beg you to search this forum and find out for yourself how often these kinds of bysorts are suggested by very experienced users.

Code:

bysort id (time): gen temp = time[_N]

I would say missing should be neither min nor max of the sort. I think other languages' NA is always false, so it would not be counted for these kinds of bysort: gen exercises.
Last edited by Henry Strawforrd; 19 Jan 2022, 04:04.
Esben Eriksen replied

19 Jan 2022, 03:00
Opportunity to estimate multilevel/hierarchical beta regressions (rather than just simple beta regressions)
Leonardo Guizzetti replied

18 Jan 2022, 09:48
Originally posted by Henry Strawforrd

Finally introducing a proper missing number that is NA. Missing = infinity is probably the most common cause of coding errors.
For instance, I just found out that the sorts I've been using are wrong: https://github.com/matthieugomez/stata-pitfalls/wiki

Originally posted by William Lisowski

... my reading suggests that it is just one of many manifestations of the underlying cause, lack of proficiency with Stata ...

I read the posted link and many of the stated "pitfalls" boil down to William's point about not being proficient in Stata (or failing to read the documentation). The list, I think, is intended for people with little or no experience coming to Stata from other languages, who may not have the same expectations about what code ought or ought not to do.

For example, the -egen- functions min() and max() mirror their namesake functions min() and max() in ignoring missing values. They do, however, accept an option that makes them *not* ignore missing values. You have to be deliberate in asking for that behaviour, because it would not universally make sense in applications, and those proficient with Stata may expect the default behaviour to be consistent with that of the namesakes.

Likewise, the statement "egen sum, and rowsum transform missing values to zero" refers to outdated -egen- functions. -egen rowtotal()- also has an option to handle the case when all values are missing. Those references are more than 15 years old, and even then, were addressed here.
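As a small sketch of the rowtotal() behaviour mentioned above, assuming toy data with made-up variable names (a, b, tot_default, tot_missing):

```stata
clear
input a b
1 2
. 3
. .
end

* rowtotal() treats missing values as zero, so the all-missing row gets 0
egen tot_default = rowtotal(a b)

* With the missing option, a row whose values are all missing stays missing
egen tot_missing = rowtotal(a b), missing

list
```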

The choice to treat missing values as infinitely large is just that, a convention. Some Alternative Software defaults to making them infinitely small, and many similar issues bite inexperienced programmers in the opposite direction, yet you don't see their clients tossing the baby out with the bathwater.

The advice on comparing floats vs doubles is just wrong. Either value can be compared with implicit or explicit casting to float, or else created as doubles and compared directly. Conversion to strings should never happen in any context.
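A short sketch of the comparisons described above, assuming a one-observation toy dataset; the variable names x and y are illustrative:

```stata
clear
set obs 1
gen float  x = 0.1     // stored with float precision
gen double y = 0.1     // stored with double precision

* The literal 0.1 is a double, so comparing it with a float fails
count if x == 0.1

* Explicitly casting the literal to float makes the comparison succeed
count if x == float(0.1)

* Creating the variable as a double allows a direct comparison
count if y == 0.1
```

The first count should report 0 and the other two should report 1.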
1 like
Richard Williams replied

18 Jan 2022, 09:21
I haven't read it carefully, but this article

https://www.academia.edu/s/79074d3221

criticizes Stata's handling of missing data and champions the use of a user-written routine called validly instead. validly has been around since 2013 and I don't remember seeing it mentioned before, but if you don't like how Stata handles missing data see if you like validly any better.
William Lisowski replied

18 Jan 2022, 08:28
Henry Strawforrd #252 -

1) If sorting Stata missing values is a problem that NA solves, given this dataset

Code:

     +--------------+
     |  ID   profit |
     |--------------|
  1. | 101        0 |
  2. | 102       42 |
  3. | 103       NA |
  4. | 104     -666 |
     +--------------+

what is the expected output of

Code:

sort profit

2) My point in (1) is that different users will have different expectations, and so the behavior of NA, whatever it might be, will need to be documented. But the output of help sort begins with

Code:

Description

    sort arranges the observations of the current data into ascending order
    based on the values of the variables in varlist.  There is no limit to the
    number of variables in varlist.  Missing numeric values (see missing) are
    interpreted as being larger than any other number, so they are placed last
    with . < .a < .b < ... < .z.  When you sort on a string variable, however,
    null strings are placed first and uppercase letters come before lowercase
    letters.

If that is somehow insufficient, it is difficult to see what could be improved in the description of NA to avoid similar problems.
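For comparison, here is a sketch of what Stata does today with the same data, assuming the NA is entered as the system missing value (.) and adding one hypothetical row, ID 105, with the extended missing value .a to show the documented ordering:

```stata
clear
input id profit
101 0
102 42
103 .
104 -666
105 .a
end

* Missing values sort last, with . placed before .a
sort profit
list, noobs
```

The listing should show IDs in the order 104, 101, 102, 103, 105.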

3) My point in (2) is that while the behavior of Stata missing values may be a common cause of coding errors, having followed Statalist for a few years, my reading suggests that it is just one of many manifestations of the underlying cause, lack of proficiency with Stata. Using Stata is like using any unfamiliar language. Reference to similar languages you already know will only take you so far. Ultimately, the grammar, syntax, and idioms of a language matter, and the effort to systematically learn those features, rather than relying on a mix of guesswork, assumption, and reference to Google, is what leads to proficiency.

4) As someone familiar with Stata's missing values, I would be reluctant to give up on the current implementation, which includes 26 "special" missing values that can be used to indicate the reason a value is missing, knowledge of which can be crucial in survey-based data, where we may need to distinguish "not applicable" from "refused to answer" from ... . Losing the ability to encode those outcomes within a missing value would require creating, for every variable with NA values, a second variable to track the reason for the NA.
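A minimal sketch of point (4), assuming a made-up survey variable income and a hypothetical value label named whymiss:

```stata
clear
input id income
1 52000
2 .a
3 .r
4 .
end

* Value labels can be attached to extended missing values
label define whymiss .a "not applicable" .r "refused to answer"
label values income whymiss

* ., .a, and .r all count as missing
count if missing(income)

* The reason for missingness is still recoverable
count if income == .r

* Estimation and summary commands use only the nonmissing value
summarize income
```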
4 likes
Henry Strawforrd replied

18 Jan 2022, 01:18
Finally introducing a proper missing number that is NA. Missing = infinity is probably the most common cause of coding errors.
For instance, I just found out that the sorts I've been using are wrong: https://github.com/matthieugomez/stata-pitfalls/wiki
Clyde Schechter replied

16 Jan 2022, 22:28
Fairly recently, Stata acquired the ability to access the value of a matrix cell by identifying the cell with the names of its row and column. So, for example, it became possible to say things like -local cell = M["rowname", "colname"]-. Apparently this works only when reading a matrix. If you try -matrix M["rowname", "colname"] = expression-, you get a type mismatch error. It would be helpful to be able to use this syntax to write to a matrix as well, indexing it by row and column names instead of row and column numbers.
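Until something like that exists, one possible workaround is to look up the numeric positions with the matrix functions rownumb() and colnumb(). A sketch, assuming a toy 2 x 2 matrix with made-up row and column names:

```stata
matrix M = (1, 2 \ 3, 4)
matrix rownames M = r1 r2
matrix colnames M = c1 c2

* Reading by name, as described above
local cell = M["r2", "c1"]
display `cell'

* Writing: translate the names to positions, then assign by number
matrix M[rownumb(M, "r2"), colnumb(M, "c1")] = 99
matrix list M
```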
5 likes
Rich Goldstein replied

14 Jan 2022, 14:18
re: #249, see

Code:

help trace
2 likes
Ivan Manhique replied

14 Jan 2022, 13:26
I do not know whether Stata 17 has this feature: is there any way for Stata to report the error code together with the number of the line in the do-file where the error occurred? If not, I think this would be useful, especially for very long do-files.
1 like
Leonardo Guizzetti replied

13 Jan 2022, 11:39
Ali Atia got me halfway to a solution. If one creates a hyperlink starting with a hash mark (#), Word will automatically recognize it as a hyperlink to a bookmark inside the document. Note that valid bookmark names must start with a letter and may then contain a mix of numbers, letters, or underscores. E.g.,

Code:

putdocx text ("link to bookmark") , hyperlink("#Name")

Then one would have to insert/create those bookmarks in the final document, but the links remain and will be respected upon export to PDF.
3 likes
Leonardo Guizzetti replied

13 Jan 2022, 09:29
Maarten Buis I'm not sure either, but I think it may be something that could be handled by Stata. At least, Word represents links using field codes, and there is an option on that field code that directs Word to point the link at an internal location instead of the external world. I'll keep my fingers crossed in the hope that this is something StataCorp is willing and able to tackle.
Maarten Buis replied

13 Jan 2022, 09:20
Ali Atia thank you very much

Leonardo Guizzetti I see the problem. I don't know if this is a Stata problem or a Word problem. In essence, Word needs to know that the link refers to the document itself, and carry that over to the PDF document it creates. I am not saying you are wrong; I really mean that I don't know where the solution should be.
Leonardo Guizzetti replied

13 Jan 2022, 08:37
Originally posted by Ali Atia

Maarten Buis

You can use the hyperlink function to insert bookmark links. Replace your line 48 with:

Code:

putdocx table vars(`i' , 1) = ("`var'"),hyperlink(`"cb.docx#`var': `: variable label `var''"') ,

That may be a good solution for a document intended to remain a docx file. Many times I will convert these from Word to PDF afterwards, in which case those hard-coded links would fail. Compare, for example, making a hyperlink in Word that points to an existing location in the document, which will be converted appropriately in a PDF to a link within that new file. This latter functionality is what I hope Stata could provide a solution to.
1 like
Ali Atia replied

13 Jan 2022, 08:06
Maarten Buis

You can use the hyperlink function to insert bookmark links. Replace your line 48 with:

Code:

putdocx table vars(`i' , 1) = ("`var'"),hyperlink(`"cb.docx#`var': `: variable label `var''"') ,