Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • When I use rangestat, there are missing variables returned even though it seems like the interval conditions established are met?

    I have a data set where I want sum the variable epsfiq for two consecutive calendar years (ie 2020 and 2019) by a subgroup variable, cusip.

    When I executed the code, it reflects missing variables even though the codition is met.

    Code:
    describe
    sort cusip datadate
    gen calyear=year(datadate)
    rangestat (sum) eps8qtr=epsfiq, int(calyear 2020 2019) by(cusip)
    Here is the output of the data ( I included the dataex command below and you can see the variable missing. My question is why is rangestat returning missing variables when it seems like the condition is met?

    . dataex cusip calyear epsfiq eps8qtr


    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 cusip float calyear double(epsfiq eps8qtr)
    "62945V109" 2011 . .
    "62945V109" 2011 . .
    "62945V109" 2011 . .
    "62945V109" 2011 .36 .
    "62945V109" 2012 -.09 .
    "62945V109" 2012 .11 .
    "62945V109" 2012 .17 .
    "62945V109" 2012 .21 .
    "62945V109" 2013 .23 .
    "62945V109" 2013 .18 .
    "62945V109" 2013 .23 .
    "62945V109" 2013 .1 .
    "62945V109" 2014 .13 .
    "62945V109" 2014 .19 .
    "62945V109" 2014 .31 .
    "62945V109" 2014 .25 .
    "62945V109" 2015 .18 .
    "62945V109" 2015 .25 .
    "62945V109" 2015 .38 .
    "62945V109" 2015 .33 .
    "62945V109" 2016 .25 .
    "62945V109" 2016 .31 .
    "62945V109" 2016 .33 .
    "62945V109" 2016 .31 .
    "62945V109" 2017 .21 .
    "62945V109" 2017 .4 .
    "62945V109" 2017 .55 .
    "62945V109" 2017 1.06 .
    "62945V109" 2018 .39 .
    "62945V109" 2018 .69 .
    "62945V109" 2018 .62 .
    "62945V109" 2018 .62 .
    "62945V109" 2019 .44 .
    "62945V109" 2019 .7 .
    "62945V109" 2019 .46 .
    "62945V109" 2019 .28 .
    "62945V109" 2020 .33 .
    "62945V109" 2020 .36 .
    "62945V109" 2020 .61 .
    "62945V109" 2020 .35 .
    "62945V109" 2021 .41 .
    "62945V109" 2021 .91 .
    "62945V109" 2021 .83 .
    "62945V109" 2021 1.02 .
    "62945V109" 2022 .57 .
    "62945V109" 2022 1.13 .
    "62945V109" 2022 1.05 .







  • #2
    rangestat (sum) eps8qtr=epsfiq, int(calyear 2020 2019) by(cusip)
    is not the correct command for your purpose. There are two problems. The interpretation of the -interval()- option in -rangestat- is a bit complicated. When the second and third arguments are specified as constants (not variables), they do not mean that the range is with observations falling between those numbers. Rather, the range is being defined as those observations whose value of calyear falls between the current observation's value of calyear + 2020 and the current observation's value of calyear + 2019. So in your data you would be looking to match to observations having calyear something like 4035! The other problem is that the range is always defined as second argument <= value <= third argument. But since 2020 > 2019, there will never be any such observations (even if you had observations with calyear around 4035). The arguments need to be in the other order.

    Putting that all together, the correct use of -rangestat- for your purpose would be:
    Code:
    gen lower = 2019
    gen upper = 2020
    rangestat (sum) eps8qtr = epsfiq, int(calyear lower upper) by(cusip)
    All of that said, you don't need -rangestat- for this You can get the result you want with:
    Code:
    by cusip, sort: egen eps8qtr = total(cond(inrange(calyear, 2019, 2020), epsfiq, .))
    Last edited by Clyde Schechter; 04 Dec 2022, 07:22.

    Comment


    • #3
      rangestat is from SSC, as you are asked to explain (FAQ Advice #12).

      Note that in the question the reported problem is strictly one of missing values. Variables themselves can be missing in none, some or even all observations, but the variables here are present; the issue is that their values are (all) missing.

      That is, if you wish, a piece of pedantry about terminology, tedious or trivial even.

      More importantly, this is first to confirm @Clyde Schechter's point about rangestat syntax. The syntax may be surprising but it is really well chosen -- and I can say that because the choice in this important detail is entirely to the credit of the first and main author, Robert Picard.

      The interval syntax with numbers # # -- interval(varname # #) -- specifies offsets, not absolute values, precisely because this is what rangestat is here designed to do, to support moving windows.

      There are other commands supporting moving windows -- such as tssmooth -- but rangestat and its sibling rangerun are fairly general, especially insofar as they are moderately programmable. See an example from earlier today at https://www.statalist.org/forums/for...-maximum-value

      Conversely, there are indeed many problems in which you want to summarize within a fixed window, but syntax to do this has been present in Stata since very early days, at the simplest by using an if condition and otherwise by using cond().

      For Clyde's device in his last example see also Section 9 in https://www.stata-journal.com/articl...article=dm0055

      Comment


      • #4
        Thanks Clyde! I used the suggested code with the objective of summing 8 quarter for the past two years and then summing 8 quarters for two years after lagging 4 quarters. In other words summing all of calendar year 2020 and 2019 and then summing all of 2019 and 2018 and keep repeating this process until I have summed the two years of earnings back to 2012. Here is the code I used:

        Code:
        describe
        
        sort cusip datadate
        
        gen calyear=year(datadate)
        
        by cusip, sort: egen eps2020t8qtr = total(cond(inrange(calyear, 2019, 2020), epsfiq, .))
        
        by cusip, sort: egen eps2019t8qtr = total(cond(inrange(calyear, 2018, 2019), epsfiq, .))
        
        by cusip, sort: egen eps2018t8qtr = total(cond(inrange(calyear, 2018, 2017), epsfiq, .))
        
        by cusip, sort: egen eps2017t8qtr = total(cond(inrange(calyear, 2017, 2016), epsfiq, .))
        However the results show missing sums for 2018, and 2017. What am I missing?


        ----------------------- copy starting from the next line -----------------------
        [CODE]
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str10 cusip float calyear double epsfiq float(eps2020t8qtr eps2019t8qtr eps2018t8qtr eps2017t8qtr)
        "62945V109" 2011 . 3.53 4.2 0 0
        "62945V109" 2011 . 3.53 4.2 0 0
        "62945V109" 2011 . 3.53 4.2 0 0
        "62945V109" 2011 .36 3.53 4.2 0 0
        "62945V109" 2012 -.09 3.53 4.2 0 0
        "62945V109" 2012 .11 3.53 4.2 0 0
        "62945V109" 2012 .17 3.53 4.2 0 0
        "62945V109" 2012 .21 3.53 4.2 0 0
        "62945V109" 2013 .23 3.53 4.2 0 0
        "62945V109" 2013 .18 3.53 4.2 0 0
        "62945V109" 2013 .23 3.53 4.2 0 0
        "62945V109" 2013 .1 3.53 4.2 0 0
        "62945V109" 2014 .13 3.53 4.2 0 0
        "62945V109" 2014 .19 3.53 4.2 0 0
        "62945V109" 2014 .31 3.53 4.2 0 0
        "62945V109" 2014 .25 3.53 4.2 0 0
        "62945V109" 2015 .18 3.53 4.2 0 0
        "62945V109" 2015 .25 3.53 4.2 0 0
        "62945V109" 2015 .38 3.53 4.2 0 0
        "62945V109" 2015 .33 3.53 4.2 0 0
        "62945V109" 2016 .25 3.53 4.2 0 0
        "62945V109" 2016 .31 3.53 4.2 0 0
        "62945V109" 2016 .33 3.53 4.2 0 0
        "62945V109" 2016 .31 3.53 4.2 0 0
        "62945V109" 2017 .21 3.53 4.2 0 0
        "62945V109" 2017 .4 3.53 4.2 0 0
        "62945V109" 2017 .55 3.53 4.2 0 0
        "62945V109" 2017 1.06 3.53 4.2 0 0
        "62945V109" 2018 .39 3.53 4.2 0 0
        "62945V109" 2018 .69 3.53 4.2 0 0
        "62945V109" 2018 .62 3.53 4.2 0 0
        "62945V109" 2018 .62 3.53 4.2 0 0
        "62945V109" 2019 .44 3.53 4.2 0 0
        "62945V109" 2019 .7 3.53 4.2 0 0
        "62945V109" 2019 .46 3.53 4.2 0 0
        "62945V109" 2019 .28 3.53 4.2 0 0
        "62945V109" 2020 .33 3.53 4.2 0 0
        "62945V109" 2020 .36 3.53 4.2 0 0
        "62945V109" 2020 .61 3.53 4.2 0 0
        "62945V109" 2020 .35 3.53 4.2 0 0
        "62945V109" 2021 .41 3.53 4.2 0 0
        "62945V109" 2021 .91 3.53 4.2 0 0
        "62945V109" 2021 .83 3.53 4.2 0 0
        "62945V109" 2021 1.02 3.53 4.2 0 0
        "62945V109" 2022 .57 3.53 4.2 0 0
        "62945V109" 2022 1.13 3.53 4.2 0 0
        "62945V109" 2022 1.05 3.53 4.2 0 0



        Comment


        • #5
          The problem with #4 is one identified in #2. inrange(varname, #1, #2) is only useful if #1 <= #2. It is not a syntax error to look for years >= 2018 and also <= 2017 but there weren't any, so the total is returned as 0.

          In other words the dates in your last two commands are the wrong way round as is also shown by comparing the syntax of the previous two commands.

          Comment

          Working...
          X