  • Building the mean (by id) of interval values around an arbitrary year

    Hello,

    I am trying to visualise an aggregate trend.

    The variables I use are: year, expend, id, year_event

    Now I want to show the expenditures of every id from 10 years before to 10 years after year_event, which is a single year specific to each id.
    So, in the end there should be one line that runs over the timeline:
    year_event -10 yrs; year_event -9 yrs; year_event -8 yrs; ... year_event; year_event +1 yr; ...

    Can someone help, please?
    Thank you!

    Rebecca

  • #2
    I'm not sure I understand what you have in your data or what you want to do with it. Moreover, what you describe in the post is rather different from the title of the post. Perhaps you are trying to build up these 21 variables for the purpose of taking their mean? If so, it is a waste of time, memory, and energy to do that when you can instead just do this:

    Code:
    gen low = year_event - 10
    gen high = year_event + 10
    rangestat (mean) expend, by(id) interval(year low high)
    Notes:
    1. -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
    2. Code not tested, as no example data was provided.
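
    To check the result, here is a minimal sketch. It assumes -rangestat- uses its default naming and stores the mean in a variable called expend_mean; the variable tag is a hypothetical helper. Like the code above, it is untested.

    ```stata
    * flag one observation per id so each id is listed once
    egen byte tag = tag(id)
    * show each id's event year and its mean of expend over the 21-year window
    list id year_event expend_mean if tag, noobs clean
    ```

    Because low and high are constant within id, every observation of an id should carry the same mean.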

    If this is not what you want, please, when posting back, show example data using the -dataex- command. Also show what you want the results to look like.

    In general, in Stata, creating a series of separate variables like what you describe in #1 is seldom useful. Most things that you would accomplish with those can be better done in other ways, and references to observations separated in time are often best accomplished using the lag and lead operators (-help tsvarlist-).
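
    As a minimal sketch of the lag and lead operators, assuming that id and year together uniquely identify observations (not confirmed, since no example data was shown) and using hypothetical names for the new variables:

    ```stata
    * declare the panel structure; requires one observation per id-year
    xtset id year
    * expend one year earlier and one year later within the same id
    gen expend_lag1  = L1.expend
    gen expend_lead1 = F1.expend
    * change in expend relative to ten years earlier
    gen expend_chg10 = expend - L10.expend
    ```

    The operators return missing values wherever the referenced year does not exist for that id.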

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
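
    For instance, a sketch of a typical call using the variable names from #1 (the choice of the first 20 observations is arbitrary):

    ```stata
    * only needed if -dataex- is not already part of your installation
    ssc install dataex
    * show the first 20 observations of the relevant variables
    dataex id year expend year_event in 1/20
    ```

    Copy the full output from the Results window, including the [CODE] delimiters, and paste it into your post.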



    • #3
      Well, your definitions of high and low do not, I think, do what you intend. When -rangestat- encounters an observation where, say, high is a missing value (which with your code will be the case if yftrans >= 2007), it interprets that as meaning you want to include all observations with year >= low and no upper limit. I think what you intend here is not that, but rather that you don't want to go looking at years beyond 2017. So your -gen low- and -gen high- statements should be:
      Code:
      gen low = max(yftrans-10, 1960)
      gen high = min(yftrans+10, 2017)
      I also am unclear what purpose the -by(year)- option in your -rangestat- command serves here. I think you want, in each observation, grand means for all observations of any country from years that fall between yftrans-10 and yftrans+10. If that is correct, the -by(year)- option is wrong. Perhaps I am misunderstanding what you want, however.
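
      One point to check with the -max()- and -min()- versions of low and high: both functions ignore missing arguments, so an observation with missing yftrans gets low = 1960 and high = 2017 rather than missing bounds. A quick sketch with made-up values:

      ```stata
      display max(1955 - 10, 1960)   // 1960: lower bound is clamped
      display min(2012 + 10, 2017)   // 2017: upper bound is clamped
      display min(. + 10, 2017)      // 2017: the missing argument is ignored
      ```

      Whether that is what you want depends on how observations with missing yftrans should be treated.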

      Finally, there is no need to loop -rangestat- over your list of variables. You can just write
      Code:
      rangestat (mean) bcrude dcrude grpop, interval(year low high)
      with no loop at all.

      Also, again I may be misunderstanding what you want to do, but it appears to me that in your -twoway line- command the variable you want on the horizontal axis would be yftrans, not year.

      Added: With regard to attaching data sets, that is not mentioned in the FAQ because it is discouraged. Many Forum members, myself included, would be reluctant to download anything from a stranger. It's too risky. That's why we have -dataex- to show example data.



      • #4
        OK. I think I completely misunderstood what you wanted before. Your illustration made it much clearer. I think you want to do this:

        Code:
        //    CREATE A FILE CONTAINING AVERAGE RATES ACROSS
        //    COUNTRIES FOR ALL YEARS
        preserve
        collapse (mean) bcrude dcrude grpop, by(year)
        rename (bcrude dcrude grpop) =_mean
        tempfile means
        save `means'
        list, noobs clean
        
        //    NOW FOR EACH OBSERVATION WITH A VALUE OF YFTRANS
        //    MATCH THAT OBSERVATION TO THE AVERAGES OF ALL YEARS
        //    BETWEEN 10 YEARS BEFORE AND 10 YEARS AFTER
        restore
        keep id yftrans
        drop if missing(yftrans)
        gen low = yftrans - 10
        gen high = yftrans + 10
        rangejoin year low high using `means'
        //    MAKE A GRAPH
        gen delta = year - yftrans
        graph twoway line *_mean delta, sort
        This code is not testable in the example data you provided earlier because that data contains only one year, 1960, that has any matching data in the example, and in both instances where that happens yftrans = 1965, so the "graph" is only a single point on each curve. Nevertheless, I think this is correct.

        This requires the -rangejoin- command, by Robert Picard, available from SSC. -rangejoin- requires the -rangestat- command, but I believe you already have that based on what you show. (If you don't have -rangestat-, you can get it from SSC.)
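
        In case it helps, both can be installed as follows (skip the first line if -rangestat- is already installed):

        ```stata
        * -rangejoin- depends on -rangestat-, so install that first
        ssc install rangestat
        ssc install rangejoin
        ```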

