Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • generate average over the previous 12 months with missing

    Hi all,
    Using a monthly panel dataset I'm trying to build a variable containing the average earnings of a worker over the last 12 months. There are workers with missing observations in my data so for those workers I would like to use the available information. For example, if in the last year the worker was observed for 10 months I want the average earnings over those months. Does anyone have an idea on how to generate this variable? I'm including data for a worker so you can see what I mean. I basically want what I would obtain if I were to use the average function in excel. I need something like rowtotal but for columns.
    Thanks for your help.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float cal_month double renta_all float avg_earning_prev_12m_2
    514 124670         .
    515 249340         .
    516 228790         .
    517 287700         .
    518 401410         .
    519 313730         .
    520 275370         .
    521 205500         .
    522 221940         .
    523 197280         .
    524 267150         .
    525 216460         .
    526 256190 249111.67
    527 221940 260071.67
    528 219200 257788.33
    529 267150 256989.17
    530 256190 255276.67
    531 302770    243175
    532 228790 242261.67
    533 226050    238380
    534 184950  240092.5
    535 306880    237010
    536 234270 246143.33
    537 206870 243403.33
    538 219200 242604.17
    539 246600 239521.67
    540 210980 241576.67
    541 213720 240891.67
    542 150700 236439.17
    543      . 227648.33
    544  16440         .
    545      .         .
    546      .         .
    547      .         .
    548      .         .
    549      .         .
    550      .         .
    551      .         .
    552      .         .
    553      .         .
    554      .         .
    555      .         .
    556  97270         .
    557 197280         .
    558  80830         .
    559      .         .
    560      .         .
    561      .         .
    562      .         .
    563      .         .
    564      .         .
    565      .         .
    566      .         .
    567   1370         .
    568      .         .
    569      .         .
    570      .         .
    571      .         .
    572      .         .
    573      .         .
    574      .         .
    575      .         .
    576      .         .
    577      .         .
    578      .         .
    579      .         .
    580      .         .
    581      .         .
    582      .         .
    583 363050         .
    584 374010         .
    585 374010         .
    586 374010         .
    587  49320         .
    588      .         .
    589      .         .
    590      .         .
    591      .         .
    592      .         .
    593      .         .
    594      .         .
    595      .         .
    596      .         .
    597      .         .
    598      .         .
    599      .         .
    600      .         .
    601      .         .
    602      .         .
    603      .         .
    604      .         .
    605      .         .
    606      .         .
    607      .         .
    608      .         .
    609      .         .
    610      .         .
    611      .         .
    612      .         .
    613 172620         .
    end
    format %tm cal_month

  • #2
    Code:
    rangestat (mean) wanted = renta_all, interval(cal_month -12 -1)
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    I assume that your real data set has more than one worker. So you will have to also specify the -by(worker_id_variable)- option when you run it there.

    Comment


    • #3
      Clyde Schechter

      Hi Clyde,

      i have a similar problem.

      I am trying to do the average of the past 12 months of the variable abs_PRC (format%9.0g) if the variable CUSIP(format %8s) is the same.

      sort CUSIP
      by CUSIP: gen avg_12mo_abs_PRC = (sum(abs_PRC[_n-11.._n]) / 12) if _n >= 12

      I found this formula online, but it gives me the error "variable _n not found".

      PS: i have no empty spaces when the CUSIP changes, that is why i want the CUSIP to remain the same, to avoid using different companies

      Any solutions?

      Comment


      • #4
        The syntax in your -by CUSIP: gen...- command is imaginative. But Stata's syntax parser has no imagination and the error message you got illustrates how you have confused it. It has no idea what to make of [_n-11.._n]. It is a construct which, perhaps, you borrowed from the way Stata accepts ranges of subscripts in some matrix commands. But Stata does not recognize it when subscripting a variable in a data set.

        The simplest solution for this problem is quite analogous to what you can see in #2.
        Code:
        rangestat (mean) avg_12mo_abs_PRC = abs_PRC (count) abs_PRC, by(CUSIP) interval(month -11 0)
        replace avg_12mo_abs_PRC = . if abs_PRC_count < 12
        In the -rangestat- command replace month by the actual name of your monthly date variable. (If you do not have a Stata internal format monthly date variable, you must create one first.)

        #2 also tells you where to get the -rangestat- command.

        In the future, when asking for help with code, it is best to include some example data. My solution here is based on some assumptions about your data which are likely correct, just because they are fairly typical for this kind of data. But they may not be correct, and in that case, the code will either not run or give incorrect results--in which case we have both wasted our time.

        The best way to show example data is by using the -dataex- command, as was done in #1. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use -dataex-.

        Comment

        Working...
        X