  • Why inlist is limited to 250 arguments

    I don't have a coding issue so much as a query about the inner workings of Stata. I'm using a synthetic controls approach to evaluate the causal impact of COVID-19 vaccine mandates on case rates. I have daily panel data for New York City (and in fact, all counties in NY state). I put the syntax in and Stata spits out error 130 "expression too long" at me. So, I figured I'd use trace to investigate.

    ```
    qui replace __00000J=1 if inlist(date,22128...22508)
    ```

    I looked it up, and apparently the inlist function can only have 250 arguments. I don't necessarily have a problem with this, since I've never used (and don't plan on manually using) more than 10ish arguments in a function; the obvious fix is to just limit my total time periods, which I've done. I also suppose not many people will be using daily data with that many time periods either, so in common practice, not many people would need to use that many arguments in inlist.

    But at the same time, I've never encountered an issue like this before. It just seems arbitrary. I guess my question is: why is inlist limited to 250? Do other functions such as inrange have similar properties? Perhaps this is a query for someone from StataCorp?

  • #2
    It just seems arbitrary. I guess my question is why is inlist limited to 250?
    I will speculate here and say that I don't think limits on arguments are so much an issue of the number of arguments allowed internally (though it may be, and 250 is certainly a lot by any standard). Rather, I think it has more to do with speed and the expected use case. Comparing reals is faster than comparing strings, so this may be the reason for allowing more real values than strings. Certainly the choice to make -inlist()- accept fewer string arguments than real arguments has some degree of subjectivity.

    To me, -inlist()- is best suited to convenience comparisons among a small set of elements. Logically, -inlist()- must proceed by making comparisons to each listed value until it either reaches a match or exhausts the list, and that requires increasing passes through the data as the list grows. With a longer list, alternative approaches are likely to be more efficient, such as a reshape & merge. I also view -inlist()- as a last-resort function, for when I cannot take advantage of range comparisons or pattern matching.
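
    A minimal sketch of the merge alternative, under assumptions: the toy panel, the variable names `id` and `flagged`, and the tempfile name `wanted` are all made up for illustration, and the date endpoints are borrowed from post #1. The idea is to hold the list of values in a small lookup dataset and match against it, so that no function-argument limit is involved. Run it from a do-file so the tempfile macro stays in scope.

    ```stata
    * Lookup dataset: one row per wanted date (here 22128 to 22508;
    * in practice this could be any list, contiguous or not)
    clear
    set obs 381
    generate date = 22128 + _n - 1
    tempfile wanted
    save `wanted'

    * Toy panel: 3 units observed on the same 500 consecutive days
    clear
    set obs 1500
    generate id = ceil(_n / 500)
    bysort id: generate date = 22100 + _n - 1

    * Flag observations whose date appears in the lookup dataset
    merge m:1 date using `wanted', keep(master match)
    generate byte flagged = (_merge == 3)
    drop _merge
    ```

    Whether this is actually faster than a long -inlist()- call would need timing on the real data; the main gain shown here is that the list can be of any length.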

    As an aside, -inlist()- allows different numbers of arguments, depending on whether the arguments are strings or numbers. The number of arguments is between 2 and 250 for real numbers and between 2 and 10 for strings, and all arguments must be of the same type.

    In your own case, when working with (contiguous) time periods, time-series operations may be useful, or -inrange()- seems more appropriate.
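
    For the contiguous case, a sketch of what that could look like, assuming every date from 22128 to 22508 (the endpoints shown in post #1) is meant to be included. The toy data and the variable name `treated` are hypothetical.

    ```stata
    * Toy data standing in for the real panel (illustrative only)
    clear
    set obs 500
    generate date = 22100 + _n - 1
    format date %td

    * inrange(z, a, b) is true when a <= z <= b,
    * so two bounds replace the long list of values
    generate byte treated = inrange(date, 22128, 22508)

    * Check against the explicit comparison
    assert treated == (date >= 22128 & date <= 22508)
    ```

    If the dates of interest have gaps, this would flag the gap days too, so it only fits when the whole interval is wanted.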

    Do other functions such as inrange have similar properties?
    This question doesn't make sense as -inrange()- defines exactly 3 required arguments.



    • #3
      Not the answer, as I have no access to Stata's proprietary code and have heard no story on the precise details, but I started using Stata before either function was added and still appreciate the way that they can help. Sometimes.

      I still remember the frisson of realising -- belatedly -- that not only can you do


      Code:
      if inlist(z, 1, 2, 3, 4, 5)
      as a way of saying that z can be any of 1 to 5 but also you can do

      Code:
      if inlist(1, a, b, c, d, e)
      as a way of saying that a can be 1 or b can be 1 or .... e can be 1. That one saves typing!
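
      A small check of that second form, using made-up data; the variable names `any1` and `any1_long` are only for illustration. It compares the reversed -inlist()- call with the chain of == comparisons it stands in for.

      ```stata
      clear
      input a b c d e
      1 0 0 0 0
      0 0 0 0 0
      2 3 1 4 5
      end

      * Is the value 1 found in any of a, b, c, d, e?
      generate byte any1 = inlist(1, a, b, c, d, e)

      * The longhand version that the call above replaces
      generate byte any1_long = (a == 1) | (b == 1) | (c == 1) | (d == 1) | (e == 1)

      * The two indicators agree on every row
      assert any1 == any1_long
      ```

      Here `any1` is 1 in the first and third rows (where a and c are 1, respectively) and 0 in the second.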

      My speculation is that the belatedness of seeing that arises from our (or at least my) early mathematics in which people almost always wrote

      z = 42

      rather than say

      42 = z

      Why is that, if the equals sign = means what it says? Regard = as meaning "is" rather than "is equal to" and then there is a definite nuance whereby

      z is 42 (meaning, in this context and for our purpose)

      is not quite equivalent to

      42 is z. The asymmetry is, I think, that 42 means what it means beyond this context while z is just arbitrary notation. I suppose no mathematician ever explained this to me at 11 or so (that I can remember) because they regarded it as too obvious to mention or as unnecessary to explain.

      Algol in the 1950s introduced := as the assignment operator because = was used for testing equality, if I understand correctly. That notation has been folded into mathematics too, although not very often perhaps, but I also see =: used. Both are nice ways of emphasising the small asymmetry here.

      As is well known, C and later Stata jumped the other way and used = for assignment and == for testing equality. (Was C the first language to do this?)



      • #4
        Consider
        Code:
        . local list = 1
        
        . forvalues i = 2/250 {
          2.     local list `list',`i'
          3. }
        
        . generate x = max(`list')
        
        . generate y = max(`list',666)
        expression too long
        r(130);
        I surmise that there is an inherent limit of 250 to the number of arguments allowed by a built-in function that accepts an arbitrary number of arguments. This seems to be documented for inlist() but not documented for max() and min().



        • #5
          In lesser contexts -- community-contributed commands I have written -- I have allowed a maximum of, say, 20 different bar specifications for a user's bar chart, 20 being a limit to what is supported, on the grounds that no sensible user should ever want more -- and if they really do, then sorry, but they need to write their own code or use some other command.
