  • I am shocked that Stata may not understand variable names literally

    I have this very simple dataset.

    var new
    0 1

    Then I type

    tab va

    then it should throw me an error saying there is no such variable. But instead it gives me

    var | Freq. Percent Cum.
    ------------+-----------------------------------
    0 | 1 100.00 100.00
    ------------+-----------------------------------
    Total | 1 100.00


    as if I meant var when I wrote va.


    It also does actual work. When I type

    replace new =3 if va~=.

    then it actually replaces new.



    I am shocked. I created this mini example after realizing that my code was doing the extremely strange thing of not failing where it should have failed.

    This is an extremely dangerous thing. If I had believed that my regression output was correct while not realizing this strange behavior of Stata, my research would have been totally wrong, and if someone else found it out, people would think I did it intentionally, I would be regarded as a fraud, and my career would be over.

    Why does this happen? And how can I 100% make sure Stata doesn't behave this way? And there likely is a name for this behavior. If yes, what is it called? I need to understand this thoroughly.
    Last edited by James Park; 20 Mar 2019, 01:12.

  • #2
    Another similar instance when I was shocked to learn that Stata may not be literal was when I learned this. If I type

    gen X=Y

    it might create X which is actually different from Y when X is created as float.

    This is extremely dangerous.

    I fixed this problem by writing "set type double, permanently" in every do file I have, because absolutely nothing comes before accuracy of research.

    I am terrified by the possibility that I might not recognize these technicalities, submit a wrong result, and get my reputation tainted as fraudulent for my entire life, even when I always try to be honest.

    What are other similar non-literal things Stata may do? I want to know all of them comprehensively.
    Last edited by James Park; 20 Mar 2019, 01:15.


    • #3
      See
      Code:
      help set varabbrev
      and
      Code:
      help clonevar
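      A minimal sketch of what those two help files point to, using the dataset from #1 (the data-building lines and the name var_copy are assumed for illustration). With variable abbreviation switched off, tab va stops with r(111) instead of silently resolving to var, and clonevar makes a copy that keeps the original's storage type:
      Code:
      clear
      set obs 1
      generate var = 0
      generate new = 1
      * turn off variable-name abbreviation for this session
      set varabbrev off
      * va is no longer expanded to var; capture lets the do-file continue
      capture noisily tabulate va
      * the saved return code is 111
      display _rc
      * clonevar copies values, storage type, display format, and labels
      clonevar var_copy = var
      describe var var_copy

      Adding the permanently option (set varabbrev off, permanently) keeps the setting across sessions.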


      • #4
        Code:
        clear
        set obs 2
        gen var1 = 2
        replace va = 1
        
        (2 real changes made)
        Code:
        clear
        set obs 2
        gen var1 = 2
        gen var2 = 3
        replace va = 1
        
        va ambiguous abbreviation
        r(111);


        • #5
          Amazing. Thank you very much.

          Also, are you aware of any other example where a novice user may write something that doesn't mean what it looks like it means?

          I want to know all of them comprehensively because I am really terrified by the possibility of being mistaken as a fraud.

          Originally posted by Andrea Discacciati
          See
          Code:
          help set varabbrev
          and
          Code:
          help clonevar


          • #6
            Originally posted by James Park
            For this problem I fixed by writing "set type double, permanently" in every do file I have because absolutely nothing comes before accuracy of research.
            I think that is incorrect. When storing data you need to be realistic about the accuracy of your measurement. You can be certain that your measurements are not accurate up to seven or eight decimal digits. For most variables I work with I trust 2 or maybe 3 significant digits, but no more. Maybe you work in a field that is easy to measure and you can get 4 significant digits, but typically the cost of an extra significant digit increases extremely fast. 7 significant digits is just not feasible. So for most variables a float is already monumental overkill. So I would consider setting double as a default a bad practice, as it does not take your measurement seriously.

            Exceptions are id variables and intermediate steps in a computation. But even there you need to be aware of the fact that a double is still an approximation, and it won't always work as expected. So there too it is good to know how computers work. Here are some resources:

            https://blog.stata.com/2012/04/02/th...-to-precision/

            https://www.youtube.com/watch?v=PZRI1IfStY0
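
            A small sketch of the point about precision (the variable names x and y are made up for illustration). Stata evaluates expressions in double precision, so a value stored as float will not compare equal to the literal 0.1, and even doubles are only binary approximations of most decimal fractions:
            Code:
            clear
            set obs 1
            generate float x = 0.1
            generate double y = 0.1
            * x was rounded to float precision, so nothing matches: counts 0
            count if x == 0.1
            * rounding the literal the same way restores the match: counts 1
            count if x == float(0.1)
            * the double matches the double-precision literal: counts 1
            count if y == 0.1
            * but doubles are approximate too: displays 0
            display 0.1 + 0.2 == 0.3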

            Originally posted by James Park
            I am terrified by the possibility that I might not recognize these technicalities, submit wrong result, get my reputation tainted as fraudulent for my entire life, even when I always try to be honest.
            Don't worry, people can recognize honest mistakes. Real research projects are large and complex and done by humans, and humans make mistakes. That is a fact of life. What makes work scientific is not the absence of mistakes (that is impossible), but the fact that we document what we do, so other people can understand how we arrived at a certain conclusion and make up their own mind on whether they find that convincing or not.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------


            • #7
              I agree with everything others have said in reply. (My own contribution in one corner was writing clonevar.)

              In essence you are surprised by variable name abbreviation. My own view on that: I have strong sympathy for those who think allowing this was a design error and sternly switch set varabbrev off -- but in practice I got used to exploiting it when I first started using Stata and don't want to give it up. And certainly, every now and again it bites, just like any small error.

              Naturally the forum is all about asking questions at any level, but the real surprise for me is that you make so many strong statements about what's wrong and have evidently yet to read all of the first half of the User's Guide, where this and much else is explained, and which really is essential as a start on a serious understanding of Stata.

              Sorry, but there really can't be a complete list of ways in which you can be burned by using Stata. Using encode on string dates is one, as is destring (usually), but there users just aren't looking carefully enough at the principle and the results.
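
              To illustrate the encode case with a made-up two-row dataset (all variable names here are hypothetical): encode assigns integer codes 1, 2, ... in alphabetical order of the strings, so the result is not a date, whereas daily() parses the string into a Stata daily date. A sketch:
              Code:
              clear
              input str10 sdate
              "2019-03-20"
              "2018-12-31"
              end
              * encode yields category codes (1 and 2), not dates
              encode sdate, generate(coded)
              * daily() converts using the stated year-month-day order
              generate ddate = daily(sdate, "YMD")
              format ddate %td
              * nolabel shows the underlying numeric codes for coded
              list, nolabel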


              • #8
                Originally posted by James Park
                If I had believed that my regression output was correct while not realizing this strange behavior of Stata, my research would have been totally wrong
                I believe that in science there usually is no "correct" or "totally wrong" way of doing things. Admittedly, technical errors, such as addressing the wrong variable, are one exception; but I seriously doubt that anyone would come to the conclusion that researchers do this on purpose.

                Concerning the differences between float and double precision approximation (that you have brought up before), I would like to add to Maarten's comment. Even if you are working in a field where you have measures that are accurate up to 6 or more digits, typing regress and choosing a linear model, thus modelling conditional (arithmetic) means, assuming linearity, exogeneity, and independent identically distributed errors, over another model, will have a much larger impact on your results than any differences caused by storage type of your variables.

                Best
                Daniel
                Last edited by daniel klein; 20 Mar 2019, 04:58.


                • #9
                  First things first. It is so wise to abide by this tenet that it should become a mantra in our times.

                  Basic knowledge and minimum dexterity when dealing with any statistical software is surely a hit or miss.

                  The information considered as "shocking" is not at all concealed.

                  Rather, it is part and parcel of any introductory delving into basic Stata commands.

                  I gather I learned about this aspect in my first lesson, quite a while ago.

                  I assume this shall not be the case but, generally speaking, what appalls me most is the very decision to fiddle with more complex deeds without having a good grasp of trivial chores.

                  Sadly enough.
                  Last edited by Marcos Almeida; 20 Mar 2019, 05:31.
                  Best regards,

                  Marcos


                  • #10
                    Thank you everyone for wonderful advice!
