
  • Identifying observations within a panel when it is required to relate them to other observations within the panel

    I am working with a panel dataset in which spells of events are grouped within individuals. I want to identify specific observations (spells) within each panel that meet a condition defined by comparing their value on a variable with the value of earlier or later observations on that same variable. In essence, I want to tell Stata: "identify the first observation with X=1 that comes after an observation with X=2". The key is how to tell Stata which earlier observation to use. Importantly, the observations to be related are not necessarily consecutive, nor are they always in the same position within the panel, so (as far as I can tell) the problem cannot be solved simply with explicit subscripts such as [_n] and [_n+1].

    I hope the following example clarifies my question. The panel variable that groups observations is a person ID. Within each person, I have chronologically ordered spells of events. I am working with intimate partner violence (IPV) aggressors, and the events are the crimes, prison periods, supervision periods, etc. recorded for them during a certain time window. For each aggressor there is an IPV selection crime, that is, the crime used to select the aggressor into the analysis sample. During the observation period, the aggressor may have committed further IPV crimes after the selection crime. I want to identify the first IPV crime that occurs after the selection crime. How can this be done easily or efficiently in Stata? Note that the next IPV crime may not be the event immediately after the selection crime: other spells (prison periods, etc.) can fall between them. The sample dataset below illustrates what I mean:

    Code:
        clear
        input  float(aggressor_id event_date) str20 event_type
        1 16071 "selection crime"
        1 16095 "other crime IPV"
        1 16121 "Prison entry"
        1 16450 "Prison exit"
        1 16055 "End observation period"
        2 19573 "Other crime no IPV"
        2 19590 "Prison entry"
        2 20100 "Prison exit"
        2 20300 "Selection crime"
        2 20350 "Supervision period entry"
        2 20450 "Supervision period exit"
        2 20520 "Other crime IPV"
        end
        format event_date %td
        list aggressor_id event_date event_type

    In this example, the selection crime of aggressor 1 (aggressor_id=1) occurs on 1 January 2004 and the "other crime IPV" on 25 January 2004, so the two events I need to relate are consecutive. For aggressor 2, however, the selection crime occurs on 31 July 2015 and the first other IPV crime on 7 March 2016, with other spells in between. My question, then, is: how can I tell Stata to flag the first IPV crime after the selection crime when the two are not necessarily consecutive and when their positions in the event sequence vary across aggressors?

    Thank you very much for your attention.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(aggressor_id event_date) str20 event_type
    1 16071 "selection crime"    
    1 16095 "other crime IPV"    
    1 16121 "Prison entry"        
    1 16450 "Prison exit"        
    1 16055 "End observation peri"
    2 19573 "Other crime no IPV"  
    2 19590 "Prison entry"        
    2 20100 "Prison exit"        
    2 20300 "Selection crime"    
    2 20350 "Supervision period e"
    2 20450 "Supervision period e"
    2 20520 "Other crime IPV"    
    end
    format %td event_date
    
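    * select = 1 from the selection crime onward (running count of "select" matches > 0)
    bys aggressor_id (event_date): gen select= sum(ustrregexm(lower(event_type), "select"))>0
    * wanted = 1 only at the first "other crime ipv" after selection: the sum of the running count is 0 before it, 1 there, and at least 2 afterwards
    by aggressor_id: gen wanted= sum(sum(ustrregexm(trim(itrim(lower(event_type))), "other crime ipv") & select))==1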
    Result:

    Code:
    . l, sepby(aggressor)
    
         +---------------------------------------------------------------+
         | aggres~d   event_d~e             event_type   select   wanted |
         |---------------------------------------------------------------|
      1. |        1   16dec2003   End observation peri        0        0 |
      2. |        1   01jan2004        selection crime        1        0 |
      3. |        1   25jan2004        other crime IPV        1        1 |
      4. |        1   20feb2004           Prison entry        1        0 |
      5. |        1   14jan2005            Prison exit        1        0 |
         |---------------------------------------------------------------|
      6. |        2   03aug2013     Other crime no IPV        0        0 |
      7. |        2   20aug2013           Prison entry        0        0 |
      8. |        2   12jan2015            Prison exit        0        0 |
      9. |        2   31jul2015        Selection crime        1        0 |
     10. |        2   19sep2015   Supervision period e        1        0 |
     11. |        2   28dec2015   Supervision period e        1        0 |
     12. |        2   07mar2016        Other crime IPV        1        1 |
         +---------------------------------------------------------------+
    Last edited by Andrew Musau; 28 Jul 2023, 07:00.
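
    For comparison, a different route to the same flag is sketched below. It is not the running-sum technique used above: it looks up the selection date with -egen- and then takes the earliest IPV crime dated strictly after it. The variable names is_ipv, sel_date, next_ipv_date and wanted2 are made up for illustration, and the sketch assumes one selection crime per aggressor. If two IPV crimes shared the earliest date, both would be flagged.

```stata
* Example data copied from the thread
clear
input float(aggressor_id event_date) str20 event_type
1 16071 "selection crime"
1 16095 "other crime IPV"
1 16121 "Prison entry"
1 16450 "Prison exit"
1 16055 "End observation peri"
2 19573 "Other crime no IPV"
2 19590 "Prison entry"
2 20100 "Prison exit"
2 20300 "Selection crime"
2 20350 "Supervision period e"
2 20450 "Supervision period e"
2 20520 "Other crime IPV"
end
format %td event_date

* 1 on rows that are an IPV crime (illustrative helper variable)
gen byte is_ipv = ustrregexm(strtrim(stritrim(lower(event_type))), "other crime ipv")

* Date of the selection crime, copied to every row of the aggressor
egen sel_date = min(cond(ustrregexm(lower(event_type), "select"), event_date, .)), by(aggressor_id)

* Earliest IPV crime dated strictly after the selection crime
egen next_ipv_date = min(cond(is_ipv & (event_date > sel_date), event_date, .)), by(aggressor_id)

* Flag the row(s) holding that date
gen byte wanted2 = is_ipv & (event_date == next_ipv_date)

format %td sel_date next_ipv_date
list aggressor_id event_date event_type wanted2, sepby(aggressor_id)
```

    This version does not depend on the sort order within aggressor, at the cost of two extra variables.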



    • #3
      Thank you very much, Andrew Musau.

      I am unfamiliar with the -ustrregexm()- function and with the -trim()- and -itrim()- functions. I checked the Stata help files, but I am afraid I did not understand them very well, although I can see that the code does what I want. Could you please explain what it is doing? Thanks.



      • #4
        The functions -trim()- and -itrim()- relate respectively to

        Code:
        help strtrim()
        and

        Code:
        help stritrim()
        and are used to remove blank spaces: -trim()- strips leading and trailing blanks, and -itrim()- collapses multiple consecutive internal blanks into one. -ustrregexm()-, on the other hand, is a string function that matches a regular expression, evaluating to 1 if a match is found and 0 otherwise. Here, the regular expressions are exactly the keywords that you specified.
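
        To see what each function returns, the calls can be tried directly with -display-. This is a small sketch; the messy input string is made up for illustration, while the patterns are the ones used in the code in #2.

```stata
* lower() lowercases, stritrim() collapses repeated internal blanks,
* strtrim() strips leading and trailing blanks
display "[" strtrim(stritrim(lower("  Other   crime IPV  "))) "]"

* ustrregexm(string, pattern) is 1 if the pattern is found, 0 otherwise
display ustrregexm("other crime ipv", "other crime ipv")
display ustrregexm("other crime no ipv", "other crime ipv")
display ustrregexm("selection crime", "select")
```

        The first line should show [other crime ipv], and the three matches should show 1, 0 and 1: "other crime no ipv" does not contain the pattern because "no" sits between "crime" and "ipv".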

        "I want to identify the first IPV crime that occurs after the selection crime."
        However, the main technique in the code is explained in the following Stata FAQ on first and last occurrences: https://www.stata.com/support/faqs/d...t-occurrences/.
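
        A step-by-step sketch of the running-sum idea may help. It uses the rows for aggressor 2 from the example and splits the logic into intermediate variables whose names (after_select, is_ipv, n_ipv_after, first_ipv_after) are made up for illustration. The last step flags the IPV row where the running count equals 1, which is a more explicit variant of the sum(sum())==1 expression in #2, not a copy of it.

```stata
* Rows for aggressor 2, copied from the example data
clear
input float(aggressor_id event_date) str20 event_type
2 19573 "Other crime no IPV"
2 19590 "Prison entry"
2 20100 "Prison exit"
2 20300 "Selection crime"
2 20350 "Supervision period e"
2 20450 "Supervision period e"
2 20520 "Other crime IPV"
end
format %td event_date

* Step 1: 1 from the selection crime onward
* (the running count of "select" matches is positive)
bysort aggressor_id (event_date): gen byte after_select = sum(ustrregexm(lower(event_type), "select")) > 0

* Step 2: 1 on rows that are an IPV crime
gen byte is_ipv = ustrregexm(strtrim(stritrim(lower(event_type))), "other crime ipv")

* Step 3: running count of IPV crimes that follow the selection crime
by aggressor_id: gen n_ipv_after = sum(is_ipv & after_select)

* Step 4: the first such crime is the IPV row where the count equals 1
gen byte first_ipv_after = is_ipv & after_select & (n_ipv_after == 1)

list event_date event_type after_select is_ipv n_ipv_after first_ipv_after
```

        Listing the intermediate variables side by side shows where the count switches from 0 to 1, which is the observation being flagged.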

