  • Deleting all years of an individual in panel data when they do not meet a certain condition in any one of the years.

    Hello,

    I'm new to Stata and have to admit that I am struggling with its programming capabilities. The image attached displays a chunk of my dataset. I have employment records for individuals (pid) by year (syear). The variable pgjobch monitors whether these individuals have switched jobs or not.

    What I would like to do: if, for a given pid, pgjobch never takes the value "first job" (code 5) in any year, ALL records of countjobs2 and tenuresum2 for that pid should be replaced with missing.

    My problem is that I don't know how to check across all syear values for a pid and then set all of that pid's records to missing.

    I have tried this

    bysort pgjobch (pid): gen keep = 1 if pgjobch[_N] == 5

    But that only sets the last observation to 1, not all of them.
    [Attached images. Before: the dataset, unsorted. After: sorted, but with only one value entered.]

  • #2
    So you want to keep only people who have pgjobch==5 for at least one time period, or only people who have pgjobch==5 for every time period?

    Here is the code for the first option.
    Code:
    * mark records that meet the relevant criterion (first job, code 5)
    gen firstjob = 0
    replace firstjob = 1 if pgjobch == 5
    
    * for each person, mark every observation with 1 if any firstjob==1, i.e. max()
    bysort pid: egen anyfirstjob = max(firstjob)
    
    * set the other variables to missing for people who never have a first job
    replace countjobs2 = . if anyfirstjob == 0
    replace tenuresum2 = . if anyfirstjob == 0
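    If you actually want the second option, just change the criterion for the initial 0/1 marker: flag pgjobch!=5 instead, and set the variables to missing where the per-person max() of that marker is 1.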
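
    As a rough illustration of both options, here is a minimal sketch on a small made-up dataset. The input values and the variable name allfirstjob are assumptions for the example and are not from the original data. It relies on the fact that a logical expression such as pgjobch == 5 evaluates to 1 or 0, so it can be passed to egen's max() or min() directly. A missing pgjobch evaluates to 0 here, so under min() a missing year counts as "not first job".

    ```stata
    * Toy data (hypothetical values) with the variable names used in this thread
    clear
    input pid syear pgjobch countjobs2 tenuresum2
    1 2001 5 1 1
    1 2002 2 1 2
    2 2001 2 1 1
    2 2002 4 2 1
    3 2001 5 1 1
    3 2002 5 1 2
    end

    * Option 1: 1 on every row of a pid that has pgjobch==5 in at least one year
    bysort pid: egen anyfirstjob = max(pgjobch == 5)

    * Option 2: 1 only if pgjobch==5 in every year for that pid
    bysort pid: egen allfirstjob = min(pgjobch == 5)

    * Inspect the flags before changing anything
    list pid syear pgjobch anyfirstjob allfirstjob, sepby(pid)

    * Option 1 applied: blank out people who never report a first job
    replace countjobs2 = . if anyfirstjob == 0
    replace tenuresum2 = . if anyfirstjob == 0
    ```

    For the second option, the same two replace lines would test allfirstjob == 0 instead. In the toy data, pid 2 is the only person blanked under option 1, while pids 1 and 2 would both be blanked under option 2.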

    Mike


    • #3
      Thanks Mike! I did not think about using max.
