  • Leonardo Guizzetti
    replied
    Originally posted by Henry Strawforrd
    ... I beg you to search this forum and find out for yourself how often these kinds of bysorts are suggested by very experienced users.
    I have been a user here for many years and am familiar with the technique and with how often it is suggested. Those bysort strategies are usually explained as intended to work (or not) with data containing missing values, depending on the details shared by the person asking for help and on the data example. Nevertheless, two strategies for missing values are often useful: (i) drop them before performing a manipulation or summary, or (ii) ignore them after such an operation.


    Originally posted by Henry Strawforrd
    I would say missing should be neither min nor max of the sort. I think other languages' NA is always false, so it would not be counted for these kinds of bysort: gen exercises.
    Missing values are part of the data and must go somewhere when sorted. Where then should Stata put them, if not at either the minimum or the maximum? Stata treats missing as large, R's order() likewise places NA last by default, and SAS treats them as small. How to work with groups defined in part or in whole by such values is up to the programmer to decide.
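
    A minimal sketch of the pattern under discussion, using made-up data and hypothetical variable names (last_raw, last_max, ok, last_ok). Because missing sorts last, time[_N] returns missing for any id that has a missing time; the two alternatives shown are only some of the ways to keep missing values out of the result.
    Code:
    clear
    input id time
    1 1
    1 2
    1 .
    2 1
    2 3
    end

    * pattern under discussion: id 1 gets missing because . sorts last
    bysort id (time): gen last_raw = time[_N]

    * alternative 1: egen's max() ignores missing values
    bysort id: egen last_max = max(time)

    * alternative 2: sort nonmissing values to the end of each group
    gen byte ok = !missing(time)
    bysort id (ok time): gen last_ok = time[_N]

    list, sepby(id)
    For id 1 this gives last_raw = . while last_max and last_ok are both 2; for id 2 all three are 3.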


  • Henry Strawforrd
    replied
    Originally posted by Leonardo Guizzetti

    I read the posted link and many of the stated "pitfalls" boil down to William's point about not being proficient in Stata (or failing to read the documentation). The list, I think, is intended for people with little or no experience coming to Stata from other languages, who may not have the same expectations about what code ought or ought not to do.

    If you think this is a beginner's problem, I beg you to search this forum and find out for yourself how often these kinds of bysorts are suggested by very experienced users.
    Code:
     bysort id (time): gen temp = time[_N]
    I would say missing should be neither min nor max of the sort. I think other languages' NA is always false, so it would not be counted for these kinds of bysort: gen exercises.
    Last edited by Henry Strawforrd; 19 Jan 2022, 04:04.


  • Esben Eriksen
    replied
    The opportunity to estimate multilevel/hierarchical beta regressions (rather than just simple beta regressions).


  • Leonardo Guizzetti
    replied
    Originally posted by Henry Strawforrd
    Finally introducing a proper missing number that is NA. Missing = infinity is probably the most common cause of coding errors.
    For instance, I just found out that the sorts I've been using are wrong: https://github.com/matthieugomez/stata-pitfalls/wiki
    Originally posted by William Lisowski
    ... my reading suggests that it is just one of many manifestations of the underlying cause, lack of proficiency with Stata ...
    I read the posted link and many of the stated "pitfalls" boil down to William's point about not being proficient in Stata (or failing to read the documentation). The list, I think, is intended for people with little or no experience coming to Stata from other languages, who may not have the same expectations about what code ought or ought not to do.

    For example, the egen functions min() and max() mirror their namesake functions min() and max() in ignoring missing values. They do allow a switch in that behaviour so that missing values are *not* ignored. However, you have to ask for that behaviour deliberately, because it would not make sense in every application, and those proficient with Stata may expect the default to be consistent with the namesake functions.

    Likewise, the claim that "egen sum, and rowsum transform missing values to zero" refers to outdated -egen- functions. -egen rowtotal()- also has a switch to handle the case when all values are missing. Those references are roughly 15 years old or more, and even then, were addressed here.

    The choice to treat missing values as infinitely large is just that: a convention. Some Alternative Software defaults to making them infinitely small, and many similar issues bite inexperienced programmers in the opposite direction, yet you don't see their clients tossing the baby out with the bathwater.

    The advice on comparing floats vs doubles is just wrong. Either value can be compared with implicit or explicit casting to float, or else created as doubles and compared directly. Conversion to strings should never happen in any context.
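
    A minimal sketch of the float/double point, assuming the default storage type is float (that is, -set type- has not been changed); the variable names x and y are hypothetical.
    Code:
    clear
    set obs 1
    generate x = 0.1
    generate double y = 0.1

    * x holds 0.1 rounded to float precision; the literal 0.1 is a double
    count if x == 0.1

    * round the literal to float precision before comparing
    count if x == float(0.1)

    * or store the variable as a double and compare directly
    count if y == 0.1
    The first count is 0 and the other two are 1, because 0.1 has no exact binary representation and its float and double roundings differ.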


  • Richard Williams
    replied
    I haven't read it carefully, but this article

    https://www.academia.edu/s/79074d3221

    criticizes Stata's handling of missing data and champions the use of a user-written routine called validly instead. validly has been around since 2013 and I don't remember seeing it mentioned before, but if you don't like how Stata handles missing data see if you like validly any better.


  • William Lisowski
    replied
    Henry Strawforrd #252 -

    1) If sorting Stata missing values is a problem that NA solves, given this dataset
    Code:
         +--------------+
         |  ID   profit |
         |--------------|
      1. | 101        0 |
      2. | 102       42 |
      3. | 103       NA |
      4. | 104     -666 |
         +--------------+
    what is the expected output of
    Code:
    sort profit
    2) My point in (1) is that different users will have different expectations, and so the behavior of NA, whatever it might be, will need to be documented. But the output of help sort begins with
    Code:
    Description
    
        sort arranges the observations of the current data into ascending order
        based on the values of the variables in varlist.  There is no limit to the
        number of variables in varlist.  Missing numeric values (see missing) are
        interpreted as being larger than any other number, so they are placed last
        with . < .a < .b < ... < .z.  When you sort on a string variable, however,
        null strings are placed first and uppercase letters come before lowercase
        letters.
    If that is somehow insufficient, it is difficult to see what could be improved in the description of NA to avoid similar problems.

    3) My point in (2) is that while the behavior of Stata missing values may be a common cause of coding errors, having followed Statalist for a few years, my reading suggests that it is just one of many manifestations of the underlying cause, lack of proficiency with Stata. Using Stata is like using any unfamiliar language. Reference to similar languages you already know will only take you so far. Ultimately, the grammar, syntax, and idioms of a language matter, and the effort to systematically learn those features, rather than relying on a mix of guesswork, assumption, and reference to Google, is what leads to proficiency.

    4) As someone familiar with Stata's missing values, I would be reluctant to give up on the current implementation, which includes 26 "special" missing values that can be used to indicate the reason a value is missing, knowledge of which can be crucial in survey-based data, where we may need to distinguish "not applicable" from "refused to answer" from ... . Losing the ability to encode those outcomes within a missing value would require creating, for every variable with NA values, a second variable to track the reason for the NA.
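
    A minimal sketch of points (2) and (4), reusing the dataset from (1) but with the NA replaced by the extended missing value .a, which is an assumption made purely for illustration. The value label name whymiss and its texts are hypothetical.
    Code:
    clear
    input id profit
    101 0
    102 42
    103 .a
    104 -666
    end

    * record the reason for missingness inside the missing value itself
    label define whymiss .a "not applicable" .b "refused to answer"
    label values profit whymiss

    * missing values sort last: -666, 0, 42, then .a
    sort profit
    list, noobs

    * missing counts as larger than any number unless it is excluded
    count if profit > 100
    count if profit > 100 & !missing(profit)

    * the reason remains available for selection
    count if profit == .a
    The three counts are 1, 0 and 1: the only observation with profit > 100 is the missing one.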


  • Henry Strawforrd
    replied
    Finally introducing a proper missing number that is NA. Missing = infinity is probably the most common cause of coding errors.
    For instance, I just found out that the sorts I've been using are wrong: https://github.com/matthieugomez/stata-pitfalls/wiki


  • Clyde Schechter
    replied
    Fairly recently, Stata acquired the ability to access the value of a matrix cell by identifying the cell with the names of its row and column. So, for example, it became possible to say things like -local cell = M["rowname", "colname"]-. Apparently this works only when reading a matrix. If you try -matrix M["rowname", "colname"] = expression-, you get a type mismatch error. It would be helpful to be able to use this syntax to write to a matrix as well, indexing it by row and column names instead of row and column numbers.
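
    Until such syntax exists, one possible workaround is to translate the names into numbers with the matrix functions rownumb() and colnumb(). This is only a sketch; the matrix M and the names r1, r2, c1, c2 are made up.
    Code:
    matrix M = (1, 2 \ 3, 4)
    matrix rownames M = r1 r2
    matrix colnames M = c1 c2

    * reading by name, as described above
    local cell = M["r1", "c2"]
    display `cell'

    * writing: look up the row and column numbers from the names
    matrix M[rownumb(M, "r1"), colnumb(M, "c2")] = 99
    matrix list M
    The display shows 2, and after the assignment the cell in row r1, column c2 holds 99.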


  • Rich Goldstein
    replied
    re: #249, see
    Code:
    help trace


  • Ivan Manhique
    replied
    I do not know whether Stata 17 has this feature: is there any way for Stata to report, along with the error code, the number of the line in the do-file where the error occurred? If not, I think this would be useful, especially for very long do-files.


  • Leonardo Guizzetti
    replied
    Ali Atia got me halfway to a solution. If one creates a hyperlink starting with a hash mark (#), Word will automatically recognize it as a hyperlink to a bookmark inside the document. Note that valid bookmark names must start with a letter and may then contain a mix of letters, numbers, or underscores. E.g.,

    Code:
    putdocx text ("link to bookmark") , hyperlink("#Name")
    Then one would have to insert/create those bookmarks in the final document, but the links remain and will be respected upon export to PDF.
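
    A fuller sketch of the same idea, with a check of the bookmark naming rule described above. The local macro bm and the file name bookmark_demo.docx are hypothetical, and the regular expression is only one way to express the rule.
    Code:
    local bm "Name"

    * a letter first, then any mix of letters, numbers, or underscores
    if !regexm("`bm'", "^[A-Za-z][A-Za-z0-9_]*$") {
        display as error "invalid bookmark name: `bm'"
        exit 198
    }

    putdocx begin
    putdocx paragraph
    putdocx text ("link to bookmark"), hyperlink("#`bm'")
    putdocx save bookmark_demo.docx, replace
    The bookmark itself still has to be created in Word afterwards, as noted above.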


  • Leonardo Guizzetti
    replied
    Maarten Buis I'm not sure either, but I think it may be something that could be handled by Stata. At least, Word represents links using field codes, and there is an option on that field code that directs Word to point the link at an internal location rather than at the external world. I'll keep my fingers crossed in hopes that this is something StataCorp is willing and able to tackle.


  • Maarten Buis
    replied
    Ali Atia thank you very much

    Leonardo Guizzetti I see the problem. I don't know if this is a Stata problem or a Word problem. In essence, Word needs to know that the link refers to the document itself, and to translate that to the PDF document it creates. I am not saying you are wrong; I really mean that I don't know where the solution should be.


  • Leonardo Guizzetti
    replied
    Originally posted by Ali Atia
    Maarten Buis

    You can use the hyperlink function to insert bookmark links. Replace your line 48 with:

    Code:
    putdocx table vars(`i' , 1) = ("`var'"),hyperlink(`"cb.docx#`var': `: variable label `var''"') ,
    That may be a good solution for a document intended to remain a docx file. Many times I will convert these from Word to PDF afterwards, in which case those hard-coded links would fail. Compare, for example, making a hyperlink in Word that points to an existing location in the document: it will be converted appropriately in the PDF to a link within that new file. This latter functionality is what I hope Stata could provide a solution to.


  • Ali Atia
    replied
    Maarten Buis

    You can use the hyperlink function to insert bookmark links. Replace your line 48 with:

    Code:
    putdocx table vars(`i'  , 1) = ("`var'"),hyperlink(`"cb.docx#`var': `: variable label `var''"') ,
