  • built-in variables

    Hi,
    I have a variable named idgroup
    idgroup
    1
    2
    3
    I want to replace the second observation by 1:

    . replace idgroup[_n]=1 if _n==2

    I get the error: "weights not allowed"

    What is wrong with my code?

    Thanks,
    Navid


  • #2
    The following code will avoid the error message and accomplish what is needed.
    Code:
    replace idgroup=1 if _n==2
    This doesn't explain what the error message is trying to tell us, but it appears that Stata requires a bare variable name to the left of the equal sign, not a subscripted variable. help subscript implies that subscripting is used to return a value of a variable, as opposed to referencing a variable for assignment, I guess.
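    A minimal sketch of that distinction, assuming a three-observation dataset recreated from the original question (not anyone's actual data): a subscript can be read as a value, but the left-hand side of replace stays a bare variable name.

    ```stata
    * recreate the example data: idgroup is 1, 2, 3
    clear
    set obs 3
    generate idgroup = _n

    * a subscript returns a value; this displays 2
    display idgroup[2]

    * bare variable name on the left; the if qualifier picks the observation
    replace idgroup = 1 if _n == 2
    list idgroup
    ```

    After the replace, list should show idgroup as 1, 1, 3.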


    • #3
      Code:
      replace idgroup = 1 in 2
      Type the following in Stata to see the syntax for replace:

      Code:
      help replace


      • #4
        Your longer, larger goals for doing this are opaque, but for the immediate goal of making observation 2 have the value 2, it would be:
        Code:
        replace idgroup=2 if _n==2
        or, if you want whatever value came before it, then

        Code:
        replace idgroup=idgroup[_n-1] if _n==2
        Does this clarify anything?


        • #5
          There is an important advantage of

          Code:
          replace idgroup = 2 in 2
          over

          Code:
          replace idgroup = 2 if _n == 2
          With the first, Stata does what you would expect: it goes straight to observation 2, works on it, and then stops.

          With the second, Stata tests every observation to see if the specified condition is true.
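          A rough way to see the difference, sketched under the assumption of an arbitrary one-million-observation dataset (the size and the timer numbers 1 and 2 are illustrative choices, and timings will vary by machine):

          ```stata
          * build a large dataset so the difference is measurable
          clear
          set obs 1000000
          generate idgroup = _n

          timer clear

          * in 2: Stata goes directly to observation 2
          timer on 1
          replace idgroup = 2 in 2
          timer off 1

          * if _n == 2: the condition is evaluated for every observation
          timer on 2
          replace idgroup = 2 if _n == 2
          timer off 2

          * report elapsed seconds for timers 1 and 2
          timer list
          ```

          Both commands leave the data in the same state; only the amount of work Stata does to get there differs.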



          • #6
            Thanks for clarification


            • #7
              My thanks to Nick for pointing out the efficiency gains when you know a particular observation number to replace. Now if someone could just explain the error message that Navid and I received when using a subscripted expression on the left of the equal sign.
              Code:
              . about
              Stata/SE 13.1 for Mac (64-bit Intel)
              Revision 19 Dec 2014
              Copyright 1985-2013 StataCorp LP
              . clear
              . set obs 1
              obs was 0, now 1
              . generate x = 41
              . replace x[n] = 42
              weights not allowed
              r(101);
              I was going to suggest that help replace be more explicit about the limitation to a bare variable name on the left, but on reflection an error message that better reflected the problem would suffice. It's an easy error to make, coming to Stata from other programming languages.


              • #8
                The error message is best understood with two ground rules in mind.

                1. Stata (if you like, the command parser in Stata, but this will be long-winded enough) here is a robot. It knows rules of syntax but is clueless on meaning.

                2. Stata works from left to right, tokenizing as it goes. Tokens are defined by white space, quotation marks, etc.

                Stata looks at

                Code:
                replace
                and then

                Code:
                x
                and is happy insofar as that is robotically possible. The first token was replace (a command name) and the second was x (a variable name); so far, so good.

                The next token was (fixing a typo)

                Code:
                [_n]
                and Stata here has only one thought in mind: you are trying to specify weights where they don't belong. The square brackets are taken by Stata as a cue that you are trying to specify weights.
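                To make the bracket cue concrete, here is a small sketch; the variables x and w are made up for illustration. It contrasts a place where square brackets legitimately carry weights with the subscript attempt discussed in this thread:

                ```stata
                * toy data: x is 1..5, w is a constant weight
                clear
                set obs 5
                generate x = _n
                generate w = 1

                * after a varlist, square brackets are where weights go
                summarize x [aweight = w]

                * after replace x, the same brackets are read as a weight attempt
                capture noisily replace x[_n] = 42

                * return code of the failed command (the log in #7 shows r(101))
                display _rc
                ```

                The second command fails before Stata ever looks at what is inside the brackets, which is why the message talks about weights rather than subscripts.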

                Naturally, you are getting an error message here. That is right, because the syntax is wrong, but the message gives the wrong reason. You put the subscript there in the misguided belief either that subscripts are required there (wrong) or that subscripts are allowed there (also wrong).

                By and large, Stata has a complicated syntax and one that is fairly consistent, with emphasis on "complicated" and "fairly". The net consequence is that Stata is almost always right that there is an error, but the reason given is often wrong because the parser is not indefinitely subtle at catching the idiosyncrasies of individual commands.

                The cost of living with a very general parser is some inefficiency of diagnosis. One of the upsides (is that a word?) of having such a parser is that it usually does most of the work for a user-programmer. The alternative, having to write intricate parsers to handle user syntax, would cut user-written software by a fair proportion.


                • #9
                  Thanks, Nick. I have no particular insight into how the parser is processing the statement, but having not yet reached the equal sign, I'd think it could display a different error message ("weights or subscripts not allowed here") at that point than it would later. But that's a decision of StataCorp. Given that, though, I'm going to revert to my initial suggestion that help replace be more explicit, in the description section, about the limitation to a bare variable name on the left for either newvar or oldvar.


                  • #10
                    The first wish is natural enough, but I have a strong sense that that's a much more difficult problem than you might think.

                    It's one thing for a parser to detect incorrect syntax; it's another thing to analyse the user's intent or mental error.

                    Code that is based on semantics or attempts to decode what the user intended is not only more difficult to write but also potentially highly buggy. I can't give chapter and verse for that impression; it's a kind of personal distillation of comments by Stata developers over several years.

                    Independently of that, specific code for parsing specific commands is a long-term nightmare. Every now and again Stata changes its syntax and it's a major job for developers to do that between versions. It's like a major transplant. Doing that if the parsing code were spread across numerous individual routines could lead to far more problems than it solved -- or take developer time from the innovations most wanted by users.

                    On your last wish, I beg to disagree. The syntax diagram for replace spells out that only a variable name is allowed before the equals sign. It's a tough call for many users to learn that the syntax diagram means what it says and says what it means, but both seem reasonable principles to me.

                    Replace contents of existing variable.

                    replace oldvar =exp [if] [in] [, nopromote]

                    However, this is a difficult call either way. Several of us have written FAQs or pieces in the Stata Journal on common Stata "gotchas" that often get overlooked, sometimes despite explicit and repeated documentation, and it's a judgment call whether this error deserves that treatment, although I incline to not.


                    • #11
                      Actually, the example we're discussing makes your point that analyzing the user's intent or mental error is difficult. The problem with the existing error message is that the parser has attempted to analyze the user's intent, by assuming the left bracket introduces an impermissible weight specification rather than an equally impermissible subscript. So perhaps the error message should be corrected to restrict itself to the indisputable statement "a left bracket appears in a place where it is not allowed, whatever the purpose was you intended it for".

                      I agree that this doesn't deserve a FAQ. "Regular Expressions" deserves an up-to-date and reasonably comprehensive FAQ, or alternatively, the existing FAQs from 2005 and 2008 should be taken down or annotated as out-of-date. (I can't pass up an opportunity to raise that issue.)


                      • #12
                        I too would like to see more detailed documentation on regular expressions.


                        • #13
                          FWIW, I think the problem of writing error messages is just a no-win situation. If the message is as precise and informative as "a left bracket appears in a place where it is not allowed,..." some will find the message too concrete and obscure, particularly if there are other left brackets in the command that are allowed. On the other hand, warning people that weights aren't allowed with a command (attempting to guess what they were trying to do), when the brackets in question are actually an equally illegal attempt to introduce subscripts, can be confusing. Nick's narrative about the complexities of building semantics into parsers is quite true. I know because it is something I (briefly) lived through earlier in my career (and am very grateful I will never have to relive it!). There is just so far you can go with syntactic analysis of errors, particularly if the parser has limited look-ahead capabilities. No matter what you do, it will prove inept in some situations.

                          So I think the standard to judge Stata's error messages by is how they compare to those of other actual programming language processors, not some ideal that is not achievable in practice. On that metric, I'd say Stata does rather well.


                          • #14
                            Leaving aside clarity of error messages, is there any particular reason for not allowing subscripts on the left-hand side?
                            What could be another interpretation of the user's intent in a command:
                            Code:
                            replace x[5]=99
                            and how do weights come into play here (something that Stata complains about in the error message)?

                            Or why not take it one step further to
                            Code:
                            replace .DATA[5,7]=99
                            to do exactly what
                            Code:
                            st_store(5,7,99)
                            does?

                            I am surprised there is still no Stata class interfacing the Stata data itself.
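                            For comparison, a minimal sketch of what is already possible through Mata, assuming a toy dataset with a single numeric variable named x (the name and sizes are illustrative). Mata's st_store() accepts a variable name as well as a variable index for its second argument:

                            ```stata
                            * toy data: x is 1..10
                            clear
                            set obs 10
                            generate x = _n

                            * store 99 in observation 5 of variable x via Mata
                            mata: st_store(5, "x", 99)

                            * check the result; the plain Stata equivalent is: replace x = 99 in 5
                            list x in 5
                            ```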

                            Best, Sergiy


                            • #15
                              Originally posted by William Lisowski
                              "Regular Expressions" deserves an up-to-date and reasonably comprehensive FAQ, or alternatively, the existing FAQs from 2005 and 2008 should be taken down or annotated as out-of-date. (I can't pass up an opportunity to raise that issue.)
                              I just read the FAQ on regular expressions (2005) again (I guess the 2008 reference is from here) and I can't find anything inaccurate in the FAQ and I am not aware of new capabilities that would make the FAQ out-of-date. I think that the FAQ goes over the basics of what's supported and makes an effort to indicate what's not supported:

                              Other popular regular-expression syntaxes include the POSIX standard and Perl’s standard. Both expand on these basic operators by including counting operators (use of curly braces), metacharacters (usually of the form :alpha:, etc.), and other syntax-specific additions.
                              [...]

                              Most of these extra syntax elements, however, are not critical and can be represented, albeit in longer form, with Stata’s current parser.
                              Don't get me wrong, I think that Stata users could benefit from better online help documentation (e.g. help regexm()) and some additional tutorials that are especially tuned to the idiosyncrasies of using regular expression searches with Stata string variables (data management) and macros (programming).

                              I also would like to request support for lazy quantifiers.



