  • Identifying first occurrence in unbalanced panel data set

    Hi everyone

    I have a problem identifying the first time a respondent occurs in my data. I only want to keep the first occurrence for each individual in my further analysis.

    For this purpose I have a year variable, an identifier, and a variable recording whether a happening occurred.

    An individual can appear multiple times in the same year. The identifier is just a unique number. Lastly, the happening is a binary variable (0 or 1), but I'm only interested in the respondents that have 1 in this case.

    I haven't quite been able to find something that does the trick in past threads.

    Here's an example of how my data looks.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(year identifier occurence)
    2011 111 1
    2012 111 1
    2013 111 1
    2010 211 1
    2015 211 1
    2009 311 1
    2011 311 1
    2011 311 1
    2014 311 1
    2009 411 1
    2010 411 1
    2011 411 1
    2012 411 1
    2013 411 1
    2015 511 1
    2016 511 1
    2001 611 1
    2011 611 1
    2000 711 1
    2001 811 1
    2001 811 1
    2008 811 1
    2011 911 1
    2012 911 1
    2013 911 1
    end
    I've tried
    Code:
     bysort identifier year: gen counter=_N
    but that just gives me a variable containing only 1's.

    What I'm looking for is a counting variable that tells me the nth time this identifier is occurring. From there I could go on and only keep if counter==1.

    The end result could look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(year identifier occurence counter)
    2011 111 1 1
    2012 111 1 2
    2013 111 1 3
    2010 211 1 1
    2015 211 1 2
    2009 311 1 1
    2011 311 1 2
    2011 311 1 3
    2014 311 1 4
    2009 411 1 1
    2010 411 1 2
    2011 411 1 3
    2012 411 1 4
    2013 411 1 5
    2015 511 1 1
    2016 511 1 2
    2001 611 1 1
    2011 611 1 2
    2000 711 1 1
    2001 811 1 1
    2001 811 1 2
    2008 811 1 3
    2011 911 1 1
    2012 911 1 2
    2013 911 1 3
    end
    So, in short: how do I construct a counting variable that tells me which time it is that a specific person/identifier occurs? It is important to note that "first" is meant chronologically, that is, the earliest observation. That's why I have tried to sort by year.


    Kind regards
    Mads

  • #2
    Mads:
    I would try:
    Code:
    bysort identifier (year) : gen flag=sum( occurence)
    Kind regards,
    Carlo
    (Stata 19.0)
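
    For readers following along, here is a minimal sketch of the counter asked for in #1, using a few rows from the example data. It assumes the variable names from the original post (including the spelling occurence); counter and n_obs are names chosen only for illustration. Within a by-group, _N is the group size and _n is the running observation number, so _n is what gives 1, 2, 3, ... in chronological order. When occurence is always 1, as in the example, sum(occurence) yields the same values as _n.

    ```stata
    clear
    input float(year identifier occurence)
    2009 311 1
    2011 311 1
    2011 311 1
    2014 311 1
    2000 711 1
    end

    * Sort by identifier, then by year within identifier;
    * _n is the position of each observation within its identifier
    bysort identifier (year): gen counter = _n

    * For comparison: _N is the total number of observations per identifier
    bysort identifier: gen n_obs = _N

    list, sepby(identifier)

    * Keep only the chronologically first observation per identifier
    keep if counter == 1
    ```

    Note that when an identifier appears twice in the same year, the order of those tied observations is not determined by this sort, so which of them gets the lower counter value is arbitrary.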



    • #3
      You don't say what you want to happen when occurrence is 0 not 1. If Carlo's code is fine, that's fine; otherwise it may be that you want

      Code:
       bysort identifier (year) : gen flag= cond(occurrence == 0, 0, sum(occurrence))
      or

      Code:
       bysort identifier (year) : gen flag= (occurrence == 0) * sum(occurrence))
      either of which maps 0s to 0 but counts the 1s.

      NB This is an FAQ https://www.stata.com/support/faqs/d...t-occurrences/



      • #4
        Hi Nick

        Thank you for your answer.

        I'm unsure what your syntax provides and what the individual parts do. In any case, the two versions don't give me the same answer.


        You could say that my data is divided between people who had an occurrence happen to them and people who didn't. The people who did can appear more than once. I want a variable that tells me which of those times is the (chronologically) first.

        When occurrence is 0 rather than 1, I don't really want anything to happen; I'm not interested in those observations. In the end I will compare the people who didn't have an occurrence with the people who did, but using only their first time.

        I have a feeling that the code you provided is close, but when inspecting the results, it seems that it sometimes restarts the count for the same person.



        • #5
          Sorry, there are two typos in my code. This should make it clearer:


          Code:
          clear
          input float(year identifier occurrence)
          2010 211 1
          2011 211 0 
          2012 211 0 
          2013 211 0 
          2014 211 0 
          2015 211 1
          end 
          
          bysort identifier (year) : gen flag1 = cond(occurrence == 0, 0, sum(occurrence))
          
          bysort identifier (year) : gen flag2 = (occurrence == 1) * sum(occurrence)
          
          list, sep(0) 
          
               +--------------------------------------------+
               | year   identi~r   occurr~e   flag1   flag2 |
               |--------------------------------------------|
            1. | 2010        211          1       1       1 |
            2. | 2011        211          0       0       0 |
            3. | 2012        211          0       0       0 |
            4. | 2013        211          0       0       0 |
            5. | 2014        211          0       0       0 |
            6. | 2015        211          1       2       2 |
               +--------------------------------------------+
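
          As a possible next step, here is a hedged sketch of how the flag could be used to build the comparison sample described in #4: the first occurrence for people who had one, set against people who never had one. The second individual (311), the variable names flag and ever, and the choice to keep every observation of people who never had an occurrence are assumptions made for illustration, not something stated in the thread.

          ```stata
          clear
          input float(year identifier occurrence)
          2010 211 1
          2011 211 0
          2015 211 1
          2009 311 0
          2012 311 0
          end

          * Running count of 1s within identifier, set to 0 on rows where occurrence is 0
          bysort identifier (year): gen flag = (occurrence == 1) * sum(occurrence)

          * ever is 1 if the identifier has at least one occurrence, 0 otherwise
          bysort identifier: egen ever = max(occurrence)

          * Keep the first occurrence of those who had one,
          * plus all observations of those who never had one (an assumption)
          keep if flag == 1 | ever == 0

          list, sepby(identifier)
          ```

          If only one observation per never-occurrence person is wanted, a further rule for choosing it would be needed.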



          • #6
            Hi Nick

            They now provide me with the same answers, and I feel confident that I have what I'm looking for.
            Thank you!

            Kind regards
            Mads
