  • Indexing string scalar odd behaviour. Intended or bug?

    Hopefully this is not a repeat; could not find references to this issue in a quick search.

    In Mata (Stata 14.2), when I index a string scalar, I get the following:

    . mata: s = "hello"; s[1]
    hello

    . mata: s = "hello"; s[2]
    <istmt>: 3301 subscript invalid
    r(3301);
    It seems like s[1] is just an alias for s. I was expecting to get the byte at position 1. Is this the intended behaviour? If so, why?
    If 2 is an invalid subscript, as per the error message above, what is the range of valid subscripts in this case?

    How do I get the byte at a specific index (other than using substr)?

    Thanks,

    /salah

  • #2
    Originally posted by Salah Mahmud
    It seems like s[1] is just an alias for s. I was expecting to get the byte at position 1. Is this the intended behaviour? If so, why?
    Yes, it is intended behavior. [1] is the element index of the string matrix (vector), and not a byte position of a string scalar.

    If 2 is an invalid subscript, as per the error message above, what is the range of valid subscripts in this case?
    You've created a string matrix (vector) with a single element, so the only valid subscript is 1.

    How do I get the byte at a specific index (other than using substr)?
    Wouldn't you be using substr() with 1 as the third argument?
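
    To make that concrete, here is a minimal sketch in interactive Mata, assuming s = "hello" as in the original post. In Stata 14 and later, substr() and strlen() count bytes, which is what was asked for; usubstr() counts Unicode characters instead.

    Code:
    : s = "hello"

    : substr(s, 2, 1)
      e

    : strlen(s)
      5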



    • #3
      Thanks Joseph,
      For me, the ability to subscript a variable explicitly declared as a scalar is counterintuitive, except in the case of a string, which in every programming language I have used is either implicitly or explicitly a vector. I guess I will just have to get used to it.



      • #4
        Scalars in Mata have one row and one column. See, e.g., help m6_glossary:

        scalar
        A special case of a matrix with one row and one column. A scalar may be substituted anywhere a matrix,
        vector, column vector, or row vector is required, but not vice versa.
        That's intuitive once it's understood. Further, all of these work with your example:

        Code:
        : s[1,1]
          hello
        
        : s[1]
          hello
        
        : s[,1]
          hello
        
        : s[1,]
          hello
        but no subscript 2 or more will work.
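
        A quick way to confirm this (a sketch, assuming the same s = "hello" as above) is to ask Mata for the dimensions directly:

        Code:
        : rows(s)
          1

        : cols(s)
          1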



        • #5
          Originally posted by Salah Mahmud
          . . . the case of a string which in every programming language I used is either implicitly or explicitly a vector.
          I believe that Stata and Mata are written in such a programming language, and so string variables in them are implemented as such vectors (underneath). It's just that you access their elements (underneath) using a function, substr().
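
          If you do want something you can subscript byte by byte, one possibility beyond substr() is ascii(), which returns a row vector of byte codes, with char() to convert a code back to a string. A minimal sketch, assuming s = "hello" and a hypothetical variable name b:

          Code:
          : s = "hello"

          : b = ascii(s)

          : b[2]
            101

          : char(b[2])
            e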
