  • Moving average invalid syntax

    Hi all,

    I'm writing my own moving-average code because, for my variable max_yield, I want the average of only the top four values from the previous seven years. I'm getting the "invalid syntax" error in the forvalues loop. I've included the rest of the code in case it helps provide context.

    Thanks in advance.

    sort nrate_year // sort the data by year before running "by nrate_year"
    by nrate_year, sort: egen max_yield = max(cornyieldbuac)
    egen j = seq(), f(1) t(47) //generates repeating sequence to isolate each year's max yield
    xtset j nrate_year
    sort j max_yield

    forvalues y = 2016/2020 {
    // generate a variable that is equal to 1 for the years that are within the previous seven years of the current year
    gen in_range = (nrate_year >= `y'-7 & nrate_year <= `y'-1)

    // tag the top four observations in each group
    by j: egen top_four = tag(max_yield, tag(1 2 3 4) group)

    // generate the moving average of the best four years for the current year
    by j: egen ma = mean(max_yield) if nrate_year == `y' & top_four == 1

    // drop the intermediate variables
    drop in_range top_four
    }


  • #2
    I can't follow your code as there is no example data, and it is unclear what the variables you are working with are. However, it is clear that -by j: egen top_four = tag(max_yield, tag(1 2 3 4) group)- is a syntax error.
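
    For comparison, here is a minimal sketch of the syntax that -egen, tag()- does accept. It assumes the grunfeld example data and a made-up variable name (one_per_company). The function takes a varlist and flags exactly one observation per distinct group; it has no way to ask for ranks such as "1 2 3 4", so it cannot pick out a top four by itself.

    ```stata
    clear*
    webuse grunfeld

    * tag() takes a varlist (plus an optional -missing- option) and nothing else
    egen byte one_per_company = tag(company)

    * tagged observations are 1, all others are 0 (never missing)
    tabulate one_per_company
    ```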

    That said, here is code that will give you a running moving average of the top 4 out of the preceding 7 years. I illustrate it with the online grunfeld data set, calculating the moving average of the top 4 for the variable mvalue. You will need to adapt it to the actual variables in your data set.

    Code:
    clear*
    webuse grunfeld
    
    capture program drop one_ma
    program define one_ma
        sort mvalue
        summ mvalue in 4/7, meanonly
        gen moving_average_top_4 = r(mean)
        keep in L
        foreach v of varlist year invest mvalue kstock time {
            replace `v' = rr_`v'
        }
        exit
    end
    
    rangerun one_ma, by(company) interval(year -7 -1) sprefix(rr_)
    -rangerun- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
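
    Both packages can be installed from within Stata. A short sketch, assuming a working internet connection and that neither package is installed yet:

    ```stata
    * -rangerun- depends on -rangestat-, so install both from SSC
    ssc install rangestat
    ssc install rangerun

    * the help file documents by(), interval(), use() and sprefix()
    help rangerun
    ```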

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
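
    As an illustration only, this is roughly what a -dataex- call looks like. The grunfeld data stand in for your own data set, and the variable list and observation range are arbitrary choices for the example:

    ```stata
    clear*
    webuse grunfeld

    * show the first 10 observations of the variables relevant to the question
    dataex company year mvalue in 1/10
    ```

    The output is a block of Stata code that can be pasted into a post, and that others can run to recreate exactly those observations.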
    Last edited by Clyde Schechter; 20 Apr 2023, 18:59.



    • #3
      Clyde's program may behave surprisingly with missing values or incomplete sequences. Depending on what you want, these tweaks may get you closer.

      Code:
      clear*
      webuse grunfeld
      
      gen test = _n if mod(_n, 2)
      
      * for -test- substitute your own variable name 
      capture program drop one_ma
      program define one_ma
          gsort -test 
          summ test in 1/4, meanonly
          gen moving_average_top_4 = r(mean)
          gen count = r(N)
          keep in L
          exit
      end
      
      rangerun one_ma, by(company) interval(year -7 -1)
      As the original author of egen, tag() I confirm that the syntax used in #1 is fantastic in one sense or another.

      This code was written without any benefit or detriment from AI.



      • #4
        Nick is right. My code in #2 will give incorrect results when the variable being averaged has missing values or incomplete sequences. I believe the following revision is robust to that problem, in that it will only attempt to calculate the moving average if 1) observations for the preceding 7 years all exist in the data set, and 2) at least four of those observations have non-missing values for the variable being averaged.
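
        A toy illustration of the problem, using a made-up seven-observation data set (the variable x and its values are assumptions for the example, not anyone's real data). An ascending -sort- puts missing values last, so -in 4/7- can land on them and the mean is then taken over fewer than four values. -gsort- with a minus sign sorts descending and, by default, still leaves missing values last.

        ```stata
        clear
        set obs 7
        generate x = _n
        replace x = . in 6/7          // x is now 1 2 3 4 5 . .

        * ascending order: rows 4/7 hold 4, 5, ., . so only two values are averaged
        sort x
        summarize x in 4/7, meanonly
        display "sort,  in 4/7: N = " r(N) ", mean = " r(mean)

        * descending order with missings last: rows 1/4 hold 5, 4, 3, 2
        gsort -x
        summarize x in 1/4, meanonly
        display "gsort, in 1/4: N = " r(N) ", mean = " r(mean)
        ```

        The first display reports N = 2 and a mean of 4.5; the second reports N = 4 and a mean of 3.5, which is the intended top-4 average.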

        Code:
        clear*
        webuse grunfeld
        
        set seed 1234
        replace mvalue = . if runiform() < 0.1
        
        * for -mvalue- substitute your own variable name
        capture program drop one_ma
        program define one_ma
            gen byte populated = !missing(mvalue)
            summ populated, meanonly
            if r(N) == 7 & r(sum) >= 4 {
                sort populated mvalue
                summ mvalue in 4/7, meanonly
                gen moving_average_top_4 = r(mean)
            }
            * drop the helper unconditionally so it is never returned as a result variable
            drop populated
            keep in L
            exit
        end
        
        rangerun one_ma, by(company) interval(year -7 -1) use(mvalue)
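
        One way to sanity-check the logic is to work a single window by hand. The sketch below assumes the unmodified grunfeld data (so without the randomly injected missing values above) and picks company 1 in 1942 arbitrarily; the local macro names are made up for the example. Run it as one block from a do-file so the local macros stay defined.

        ```stata
        clear*
        webuse grunfeld

        * the window for company 1 in 1942 is the seven preceding years
        keep if company == 1 & inrange(year, 1935, 1941)

        count
        local n_obs = r(N)
        count if !missing(mvalue)
        local n_ok = r(N)

        * the same two conditions: all 7 years present, at least 4 non-missing values
        if `n_obs' == 7 & `n_ok' >= 4 {
            gsort -mvalue                       // largest first, missing values last
            summarize mvalue in 1/4, meanonly
            display "top-4 mean for company 1, 1942: " r(mean)
        }
        ```

        The displayed value should match moving_average_top_4 for that observation when the -rangerun- code is run on the same, unmodified data.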



        • #5
          Hi Clyde and Nick, thank you so much for your help. I will, in the future, be sure to include the example data. While Clyde's code didn't quite work with my data (because I didn't provide any of the details about my data), Nick's did once it was adapted.

          Additionally, I was trying to run this in a loop that was appending similar files from simulations I'm running, and I actually referred to a post by Clyde from 2014 to check how to do this properly. And so, truly, thank you both for this public service! We all appreciate you.
