  • Issues with Rangestat

    Hello,

    I am currently having some issues with the "rangestat" command. I am attempting to compute a rolling mean of variable X, which has one value for each state and year. I would like the rolling average to use all values of X since 1960. For example, for the entry "Alabama" "1990" I would like to generate the average of X over all years from 1960 through 1990 within Alabama.

    In order to do this, I constructed the variable "years_from_1960" as -(year - 1960). This command works fine. Then I do the following:

    rangestat (mean) rm_X = X, interval(years_from_1960 years_from_1960 0) by(statename)

    Initially, this command correctly generated rolling averages. However, the last few times I have run the command (without changing anything), rm_X is just generated as the value of X for that entry. Is this a common error with rangestat? And is there a quick fix?

    Thanks,

    Erik

  • #2
    It isn't clear to me what is going wrong, and as you don't give example data, I can't verify your problem or troubleshoot it. Suffice it to say, your approach is unnecessarily complicated. Why not do:
    Code:
    gen first_year = 1960
    rangestat (mean) rm_X = X, interval(year first_year 0) by(statename)
    Added: If this doesn't solve your problem, when you post back please provide example data that illustrates the problem you are having. Use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
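
    A minimal sketch of what a -dataex- call looks like, using the auto dataset that ships with Stata so that it runs anywhere. The choice of variables and the observation range are only illustrative; with your own data you would list the variables relevant to your question.

    ```stata
    * Load a dataset that ships with Stata (stand-in for your own data)
    sysuse auto, clear

    * Show the first 5 observations of three variables as a ready-to-paste
    * -input- block; copy the Results window output into your post
    dataex make price mpg in 1/5
    ```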

    • #3
      Thanks for your reply Clyde.

      Regarding your suggestion, it was my impression that the arguments of the "interval" option all had to refer to the same variable. That is, in interval(year x y), both x and y would need to be values of the "year" variable. Is this incorrect? If so, your approach should work for my purposes, though I am still not sure why I am getting the underlying error.

      • #4
        -interval(vble lb ub)- has three pieces. vble must be a variable in your data set. lb and ub can each be a variable, a number, or a missing value (.). If lb and ub are variables, Stata uses observations in which lb <= vble <= ub, with lb and ub taken from the current observation. If they are numbers, Stata uses observations in which current obs' vble + lb <= vble <= current obs' vble + ub. Missing values simply serve as negative or positive infinity in this regard when specified as lb or ub, respectively.

        You can "mix and match" variable and number (or missing value) specifications of lb and ub. When you have -interval(year first_year 0)- Stata will use observations in which year >= first_year and year <= current obs' year + 0 (which = current obs' year).
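
        A minimal sketch of the three ways of specifying lb and ub described above, assuming the community-contributed -rangestat- is installed (-ssc install rangestat-). The toy data and the result names cum_mean, two_yr_mean and open_mean are made up for illustration.

        ```stata
        * Toy data: two states, a few years each (values are invented)
        clear
        input str2 state int year double X
        "AL" 1960 10
        "AL" 1961 20
        "AL" 1962 30
        "AK" 1960 4
        "AK" 1961 8
        end

        * Variable lb, numeric ub: first_year <= year <= current obs' year + 0
        gen first_year = 1960
        rangestat (mean) cum_mean = X, interval(year first_year 0) by(state)

        * Numeric lb and ub: current obs' year - 1 <= year <= current obs' year
        rangestat (mean) two_yr_mean = X, interval(year -1 0) by(state)

        * Missing lb acts as negative infinity: all years up to the current one
        rangestat (mean) open_mean = X, interval(year . 0) by(state)

        list, sepby(state)
        ```

        With no data before 1960, cum_mean and open_mean should coincide; for AL both should come out as 10, 15 and 20.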

        • #5
          I see, that makes sense, thank you.

          Regarding example data, my dataset is structured as follows:

          state  yr    total_amount  yrs_from_1960
          AL     1960  9999999         0
          AL     1961  894735927      -1
          AL     1962  317409257      -2
          ....
          AL     2020  39284092      -60
          AK     1960  398025          0
          AK     1961  3258940        -1
          ....

          And so forth for all states and years 1960-2020. Does this provide enough information to help troubleshoot my issue?

          Edited: changed 60 --> -60 in example data (-60 is how yrs_from_1960 for 2020 entries are displayed)
          Last edited by Erik Tiersten-Nyman; 10 Aug 2022, 10:51.

          • #6
            OK, I probably should have seen this earlier just looking at the syntax, but having run it with the data in front of me it is now clear what the problem was.

            -interval(yrs_from_1960 yrs_from_1960 0)- means that the current observation will have its statistics calculated from observations where the value of yrs_from_1960 lies between (inclusive) the value of yrs_from_1960 in the current observation and yrs_from_1960 in the current observation + 0. So that boils down to only observations having the same value of yrs_from_1960 as the current observation. Since you don't, at least in the example, have any other observations (in the same state) with the same year, that means just the current observation itself.
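
            A minimal sketch of the difference, assuming -rangestat- is installed from SSC. The three total_amount values are invented and the result names rm_orig and rm_fixed are hypothetical; first_year is the helper variable suggested in #2.

            ```stata
            * Toy data for one state (values are invented)
            clear
            input str2 state int yr double total_amount
            "AL" 1960 10
            "AL" 1961 20
            "AL" 1962 30
            end
            gen yrs_from_1960 = -(yr - 1960)

            * Original specification: the window runs from the current value to
            * the current value + 0, so each observation matches only itself
            rangestat (mean) rm_orig = total_amount, ///
                interval(yrs_from_1960 yrs_from_1960 0) by(state)

            * Lower bound fixed at 1960, upper bound at the current year
            gen first_year = 1960
            rangestat (mean) rm_fixed = total_amount, ///
                interval(yr first_year 0) by(state)

            list yr total_amount rm_orig rm_fixed
            ```

            Here rm_orig should simply reproduce total_amount (10, 20, 30), while rm_fixed should give the running means 10, 15 and 20.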

            • #7
              Thank you for the help Clyde!
