Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Capture entry and exit from a panel dataset

    I have a huge dataset organized like this:

    RECDATE ID
    910 3
    910 4
    910 5
    912 4
    912 5
    912 6
    915 5
    915 7


    RECDATE is sorted, so it has gaps. ID could represent 'items' in a basket of user 01 recorded at each point in time.
    I want to record every movement done by the user. That is, from 910 to 912, item 3 dissapeared and item 6 appeared, so date 912 would have two movements. From 912 to 915, item 4 and 6 dissapeared and item 6 appeared, so date 915 would have 3 movements.
    My goal is to get something like this.

    RECDATE MOVES
    910 --
    912 2
    915 3
    ...

    I'm very unfamiliar with stata, so apologize if this is a silly question, but I couldn't find it on previous posts.

    Thanks!

  • #2
    This is a bit tricky because you want to compare items with items from the previous period but there are multiple observations per period.The intuition is to first note the previous period date (last_date) and then use merge to match current date items with those from the previous period. This requires some variable name gymnastics but it would look something like:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(user RECDATE ID)
    1 910 2
    1 910 3
    1 910 4
    1 910 5
    1 912 4
    1 912 5
    1 912 6
    1 915 5
    1 915 7
    2 912 11
    2 912 22
    2 912 33
    2 915 22
    2 915 33
    end
    
    * assumption about the data
    isid user RECDATE ID, sort
    
    * number of items per RECDATE
    by user RECDATE: gen items = _N
    save "statalist_example.dta", replace
    
    * record date change and it carry forward
    by user: gen long last_date = RECDATE[_n-1] if RECDATE != RECDATE[_n-1]
    by user: replace last_date = last_date[_n-1] if mi(last_date)
    
    * rename variables and merge with orginal data on last_date
    rename RECDATE RECDATE0
    rename last_date RECDATE
    rename items items0
    merge 1:1 user RECDATE ID using "statalist_example.dta", keep(master match) gen(match)
    
    * the number of items matched
    bysort user RECDATE0: egen nmatched = total(match == 3)
    
    * the number of new items
    gen newitems = items0 - nmatched
    
    * the number of items removed
    bysort user RECDATE0 (items): gen lost_items = items[1] - nmatched
    
    * return to original order and variable names
    rename RECDATE RECDATE_prev
    rename RECDATE0 RECDATE
    rename items items_prev
    rename items0 items
    isid user RECDATE ID, sort
    list, sepby(user RECDATE)

    Comment

    Working...
    X