Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • egen conditional based on summing values with criteria relative to each row

    I'm probably not using the correct language to describe what I want, so apologies for that. Here is what I am trying to do:

    I have students across terms and a variable giving how many credits they earned per term. I would like, for each term, to sum the credits for all prior terms, to give the credits earned at the beginning of the term. I can think of a way to do this with egen and total in theory, but I get stuck because the conditional that I need to sum only those credits for those terms that are less than the current term is beyond what I can remember seeing in these kinds of conditionals before, my searching online isn't quite returning what I am looking for, and I can't figure out how to implement this condition in Stata. Here is an example of what I want:
    student_id term credsbyterm priorcreds
    1 1 10 0
    1 2 6 10
    1 3 7 16
    2 2 12 0
    2 3 16 12
    2 5 15 28
    The priorcreds variable is what I want to generate. Essentially I want to sum all the credsbyterm for the same student_id as long as the term number is lower than the current term number (or return a zero if there are no prior terms), but it is that last conditional that I am struggling with. I'm hoping someone has a suggestion? Thanks in advance for any help!

  • #2
    Thanks for the clear explanation and data example. I've recast the latter here to make it slightly easier to use.

    egen doesn't spring to mind here as a solution for precisely the reasons you mention. There is a tsegen on SSC but (despite being an author) I tend not to use it much given later commands.

    Here are two ways to do it. The first is so simple once understood -- you just want the cumulative sum of previous values, and sum() is a built-in function for cumulative sums -- that the second may seem pointless. But the second requiring rangestat from SSC is more easily extended to related but more challenging problems.

    Code:
    clear 
    input student_id    term    credsbyterm    priorcreds
    1    1    10    0
    1    2    6    10
    1    3    7    16
    2    2    12    0
    2    3    16    12
    2    5    15    28
    end 
    
    bysort student_id (term) : gen wanted = sum(credsbyterm[_n-1])
    
    rangestat (sum) WANTED=credsbyterm, int(term . -1) by(student_id)
    replace WANTED = 0 if WANTED == . 
    
    list, sepby(student_id)
    
         +---------------------------------------------------------+
         | studen~d   term   credsb~m   priorc~s   wanted   WANTED |
         |---------------------------------------------------------|
      1. |        1      1         10          0        0        0 |
      2. |        1      2          6         10       10       10 |
      3. |        1      3          7         16       16       16 |
         |---------------------------------------------------------|
      4. |        2      2         12          0        0        0 |
      5. |        2      3         16         12       12       12 |
      6. |        2      5         15         28       28       28 |
         +---------------------------------------------------------+

    Comment

    Working...
    X