
  • Common pitfalls in Stata

    I've uploaded to GitHub a list of mistakes commonly made in Stata, focusing on situations where Stata silently returns something other than what the user has in mind.
    While most of these situations have been discussed at some point on the Statalist, I wanted to gather all these "gotcha" problems in one place.
    This list can be edited directly on GitHub, so feel free to add your own.
    stata-pitfalls - A list of common pitfalls in Stata

  • #2
    Hi Matthieu,

    Thanks for the list. There are a lot of gotchas I would have liked to know about when I was first learning Stata.
    By the way, have you thought about setting it up as a wiki? ( https://github.com/matthieugomez/stata-pitfalls/wiki )
    It may lower the editing costs, especially for people not familiar with GitHub.

    Best,
    Sergio



    • #3
      Matthieu,

      This looks like a useful list. I'm not familiar with editing in Git, so I will suggest an addition here. In your discussion of data types, you might want to mention that date/time variables need to be created as double. When they are imported from elsewhere they are usually already stored as double, but a newly created variable defaults to float, and when you assign a date/time value (held as a double) to it, some precision is lost.
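To see how much is lost, here is a small Python sketch (not Stata code) that round-trips a date/time value through a 4-byte IEEE float, the size of Stata's float type, and through an 8-byte double. The value 1262304000123 is assumed to be 01jan2000 00:00:00.123 counted in milliseconds since 01jan1960; the helper names are made up for illustration.

```python
import struct

def as_float(x: float) -> float:
    """Round-trip x through a 4-byte IEEE float (the size of Stata's float type)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def as_double(x: float) -> float:
    """Round-trip x through an 8-byte IEEE double (the size of Stata's double type)."""
    return struct.unpack("<d", struct.pack("<d", float(x)))[0]

# Assumed example value: 01jan2000 00:00:00.123 as milliseconds since 01jan1960.
tc = 1_262_304_000_123

print("double:", as_double(tc), "error (ms):", as_double(tc) - tc)
print("float: ", as_float(tc), "error (ms):", as_float(tc) - tc)
```

A float carries only 24 bits of precision, so at this magnitude neighbouring representable values are 131,072 ms apart and the stored time can be off by up to about 65 seconds. In Stata the remedy is to declare the type when creating the variable, as in generate double.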

      Regards,
      Joe



      • #4
        Joe is clearly right. Indeed, more generally, dates are one large area where people can be bitten hard if they neglect to read the help.

        The if command (NB, not the if qualifier) not working as people expect is high on any list: the command evaluates its condition just once, and a variable named in that condition is taken from the first observation. That seems to bite people who come from SAS. I didn't find I needed to unlearn anything I'd picked up in programming languages or statistical environments I had encountered previously. That is, I have never used SAS.
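The difference can be simulated in a toy Python sketch (not Stata code; the data and function names are hypothetical). The qualifier version tests the condition row by row; the command version tests only the first row and then applies the action to every row or to none.

```python
# Hypothetical toy dataset: one dict per observation.
data = [{"x": -1}, {"x": 5}, {"x": 3}]

def replace_if_qualifier(rows, cond, var, value):
    """Mimics `replace var = value if cond`: cond is tested for every observation."""
    for row in rows:
        if cond(row):
            row[var] = value

def replace_if_command(rows, cond, var, value):
    """Mimics
           if cond {
               replace var = value
           }
    cond is evaluated once, on the first observation; the replace then
    applies to every observation or to none."""
    if cond(rows[0]):
        for row in rows:
            row[var] = value

def is_positive(row):
    return row["x"] > 0

rows_q = [dict(r, y=0) for r in data]
replace_if_qualifier(rows_q, is_positive, "y", 1)

rows_c = [dict(r, y=0) for r in data]
replace_if_command(rows_c, is_positive, "y", 1)

print([r["y"] for r in rows_q])  # [0, 1, 1]
print([r["y"] for r in rows_c])  # [0, 0, 0]: the first x is -1, so nothing is replaced
```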



        • #5
          Thanks for the feedback. I have turned the page into a wiki, so that it can be edited simply by clicking the "Edit" button, as on Wikipedia. I've added a paragraph about dates. Just to confirm, you are talking about variables such as 19990103 denoting 01/03/1999, right?



          • #6
            Well, no; the key point is that 19990103 is just a large integer and cannot possibly mean 1 March 1999 or 3 January 1999 in Stata.
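A Python sketch (not Stata code; the function names are made up) of what has to happen before such a number means anything as a date: split it into year, month and day, then count days since 01jan1960, which is what a Stata daily date holds. It also shows a related trap: an integer of this size may not survive storage in a 4-byte float, because floats hold integers exactly only up to 16,777,216.

```python
import struct
from datetime import date

def yyyymmdd_to_stata_daily(n: int) -> int:
    """Read an integer such as 19990103 as year/month/day and return the
    Stata-style daily date (days since 01jan1960). Raises ValueError for
    impossible dates such as 19991332."""
    year, rest = divmod(n, 10_000)
    month, day = divmod(rest, 100)
    return (date(year, month, day) - date(1960, 1, 1)).days

def as_float(x: float) -> float:
    """Round-trip x through a 4-byte IEEE float, like Stata's float type."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

print(yyyymmdd_to_stata_daily(19990103))  # 14247, i.e. 03jan1999
print(as_float(19990103))                 # 19990104.0, not 19990103
```

In Stata itself, one common route is something like generate ddate = date(string(ymd, "%8.0f"), "YMD") followed by format ddate %td, where ymd is a hypothetical variable name.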



            • #7
              Some people either assume that -sort- is stable (even though the -help- file makes clear that it is not) or are not aware of what this implies for reproducible results. I'll try to add it myself, but I mention it here in case anyone else is willing and has time. A few cases have come up on Statalist.
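To make the reproducibility problem concrete, here is a small Python sketch (not Stata code; the data and helper names are hypothetical). It lists every order a non-stable sort on id is allowed to return when two records tie on id, and shows that "keep the first record within each id" then has more than one valid answer.

```python
from itertools import permutations

# Hypothetical records (label, id, value); two of them tie on id == 1.
records = [("b", 1, 10.0), ("a", 1, 20.0), ("c", 2, 30.0)]

def by_id(rec):
    return rec[1]

def valid_unstable_orders(rows, key):
    """Every ordering a sort on `key` may legitimately return if it is not stable."""
    target = sorted(key(r) for r in rows)
    return [list(p) for p in permutations(rows) if [key(r) for r in p] == target]

def first_value_per_id(rows):
    """Value of the first record within each id, like `by id: keep if _n == 1`."""
    first = {}
    for _label, id_, value in rows:
        first.setdefault(id_, value)
    return first

for order in valid_unstable_orders(records, by_id):
    print(first_value_per_id(order))

# Deterministic alternatives: a stable sort keeps the original order among ties
# (Python's sorted() is stable), or a tiebreaking variable makes the key unique.
print(first_value_per_id(sorted(records, key=by_id)))                  # {1: 10.0, 2: 30.0}
print(first_value_per_id(sorted(records, key=lambda r: (r[1], r[0]))))  # {1: 20.0, 2: 30.0}
```

The loop prints two different answers for id 1. In Stata the corresponding remedies are the stable option of sort or adding enough variables to the sort key to make it unique.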
              You should:

              1. Read the FAQ carefully.

              2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

              3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

              4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



              • #8
                Matthieu,

                Dates in Stata are stored as the number of days since January 1, 1960. These can be stored as integers with no problem. Date/time variables in Stata are stored as the number of milliseconds since midnight, January 1, 1960. These require the double data type to hold the 12 to 13 digits they usually have.
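A quick Python check of these two encodings (not Stata code; the helper names are made up, and leap seconds are ignored, as with Stata's %tc format): counting days since 01jan1960 gives small integers, while counting milliseconds gives 13-digit numbers, well beyond the 16,777,216 up to which a float holds integers exactly.

```python
from datetime import date, datetime

# Every integer from 0 up to 2**24 (16,777,216) is exactly representable in a 4-byte float.
FLOAT_EXACT_INT_LIMIT = 2 ** 24

def stata_daily(d: date) -> int:
    """Days since 01jan1960."""
    return (d - date(1960, 1, 1)).days

def stata_tc(dt: datetime) -> int:
    """Milliseconds since 01jan1960 00:00:00.000, ignoring leap seconds."""
    delta = dt - datetime(1960, 1, 1)
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000

d = stata_daily(date(2000, 1, 1))
t = stata_tc(datetime(2000, 1, 1))
print(d, d < FLOAT_EXACT_INT_LIMIT)                # 14610 True
print(t, len(str(t)), t < FLOAT_EXACT_INT_LIMIT)   # 1262304000000 13 False
```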

                Regards,
                Joe



                • #9
                  OK. I have actually never used datetimes (I thought you were referring to users storing dates as numbers that only look like dates, such as 19990103). I have added your point to the list. Thanks!



                  • #10
                    Joe's point is not quite right either. Daily dates in Stata are stored as the number of days since January 1, 1960, but dates can also be yearly, half-yearly, quarterly, monthly and weekly. Then, as Joe says, there are date-times too.
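For the other frequencies the same idea applies: each is a count of periods since the corresponding period of 1960, except yearly dates, which are just the year. A Python sketch (not Stata code; the function names are hypothetical), assuming Stata's weekly convention that week 1 starts on 1 January and week 52 absorbs the last 8 or 9 days of the year:

```python
from datetime import date

def stata_monthly(year: int, month: int) -> int:
    """%tm: months since 1960m1."""
    return (year - 1960) * 12 + (month - 1)

def stata_quarterly(year: int, quarter: int) -> int:
    """%tq: quarters since 1960q1."""
    return (year - 1960) * 4 + (quarter - 1)

def stata_halfyearly(year: int, half: int) -> int:
    """%th: half-years since 1960h1."""
    return (year - 1960) * 2 + (half - 1)

def stata_yearly(year: int) -> int:
    """%ty: the calendar year itself, not a count from 1960."""
    return year

def stata_weekly(d: date) -> int:
    """%tw: weeks since 1960w1, with 52 weeks per year (week 52 is 8 or 9 days long)."""
    day_of_year = (d - date(d.year, 1, 1)).days  # 0 for 1 January
    return (d.year - 1960) * 52 + min(day_of_year // 7, 51)

print(stata_monthly(1999, 1), stata_quarterly(2000, 1), stata_halfyearly(2000, 2))  # 468 160 81
print(stata_yearly(1999), stata_weekly(date(1960, 12, 31)))                          # 1999 51
```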



                    • #11
                      Jonathan Shaw wrote a paper on Stata "Gotchas" which is available here: https://www.ifs.org.uk/docs/stata_gotchasJan2014.pdf
                      Recommended.



                      • #12
                        The link in #11 seems broken. Try https://journals.sagepub.com/doi/pdf...867X1501500209 otherwise.



                        • #13
                          From a quick skim, that paper seems to have a pretty good list.

                          I would add missing and extraneous spaces. For example:

                          Code:
                          . mat sigma1 = r (Sigma)
                          r not found
                          r(111);
                          The real problem is that there is a space between r and (Sigma). It should be:

                          Code:
                          . mat sigma1 = r(Sigma)
                          What especially complicates things is that the error messages tend to be way off the mark: the problem isn't that r doesn't exist, it is the extra space. Also, I don't think Stata is consistent about when extraneous spaces cause trouble and when they don't.

                          Missing spaces can screw up things like continuation lines:

                          Code:
                          webuse nhanes2f, clear
                          regress health i.race, vce(robust)///
                              level(99)
                          Code:
                          . regress health i.race, vce(robust)///
                          option / not allowed
                          r(198);
                          You need a space before the ///. I can't come up with an example now, but I think I have had cases where a missing space caused two options to be combined into a single misspelled option.
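Since these spacing slips are easy to miss by eye, a rough Python checker can flag the two cases above in a do-file before it is run. This is a hypothetical heuristic, not how Stata itself parses code: it assumes that /// should always follow whitespace and that r ( or e ( with a space is almost always a typo for r() or e(). It will also flag comment banners such as //////.

```python
import re

# Heuristics (assumptions, not Stata's parser):
#   \S///      a non-space character directly before ///
#   \b[re] \(  r or e as a separate word, followed by a space and "("
MISSING_SPACE_BEFORE_CONTINUATION = re.compile(r"\S///")
SPACE_IN_RESULT_REFERENCE = re.compile(r"\b[re] \(")

def lint_do_file(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for suspicious spacing in do-file text."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if MISSING_SPACE_BEFORE_CONTINUATION.search(line):
            problems.append((lineno, "no space before ///"))
        if SPACE_IN_RESULT_REFERENCE.search(line):
            problems.append((lineno, "space between r/e and ("))
    return problems

example = """webuse nhanes2f, clear
regress health i.race, vce(robust)///
    level(99)
mat sigma1 = r (Sigma)
mat sigma1 = r(Sigma)
"""

for lineno, message in lint_do_file(example):
    print(lineno, message)
```

On the sample text this reports line 2 (no space before ///) and line 4 (space between r and the parenthesis).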
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          WWW: https://www3.nd.edu/~rwilliam

