Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel Data set up with no unique KEY

    Dear all,

    I am trying to set up a panel data and I have to reshape my dataset from wide to long. I have managed to setup panels for many other datasets but this one is different. This is the employment history for each individual. What makes this file different is that the variable ID does not uniquely identify the observations since many individuals can have multiple employment spell in a single year (or at least I think this is the problem).

    see a small example of my dataset for the first 3 waves a, b and c and for few variables:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pidp byte(a_jspno b_jspno c_jspno a_jhstpy b_jhstpy c_jhstpy) float(a_jhgpay b_jhgpay c_jhgpay)
    30615 . . .  .  .  .         .         .         .
    34007 . . .  .  .  .         .         .         .
    34084 . . .  .  .  .         .         .         .
    34084 . . .  .  .  .         .         .         .
    34084 . . .  .  .  .         .         .         .
    34122 . . .  .  .  .         .         .         .
    34691 . . .  .  .  .         .         .         .
    35375 . . 1  .  .  1         .         . 1186.6395
    37411 . . 1  .  .  2         .         .  1365.933
    38771 . 1 .  . -8  .         .        -8         .
    40924 . . .  .  .  .         .         .         .
    40924 . . .  .  .  .         .         .         .
    40924 . . .  .  .  .         .         .         .
    41487 . 1 1  .  1 11         .  948.1022  1301.034
    42171 1 1 1  1  1  2  915.9615 1084.2134 1207.4038
    42847 . . .  .  .  .         .         .         .
    46247 . 1 .  . -8  .         .        -8         .
    46927 . 1 1  .  7  7         .       866       866
    46927 . 1 1  .  7  7         .       866       866
    47619 . . .  .  .  .         .         .         .
    47619 . . .  .  .  .         .         .         .
    49651 . 3 1  .  5  5         .     649.5     649.5
    49651 . 4 2  . -8  5         .        -8     649.5
    49651 . 1 2  .  5  5         .     649.5     649.5
    49651 . 5 2  . -8  5         .        -8     649.5
    49651 . 2 2  . -8  5         .        -8     649.5
    50335 . . .  .  .  .         .         .         .
    50335 . . .  .  .  .         .         .         .
    54419 2 . .  2  .  .  492.4366         .         .
    54419 1 . . -8  .  .        -8         .         .
    54482 . . .  .  .  .         .         .         .
    65287 1 . .  3  .  . 1338.6503         .         .
    65967 . 1 1  .  1  1         . 1124.1346 1065.9331
    66651 . . .  .  .  .         .         .         .
    67335 1 1 . -8 -9  .        -8  777.4554         .
    67335 2 1 .  2 -9  . 1665.3846  777.4554         .
    68007 2 . . 11  .  .     606.2         .         .
    68007 1 . .  3  .  .     562.9         .         .
    68687 . . .  .  .  .         .         .         .
    69375 . 2 1  .  3 -8         . 1118.6073        -8
    69375 . 1 1  . -8 -8         .        -8        -8
    70047 1 . . 11  .  . 1457.2115         .         .
    74811 . . .  .  .  .         .         .         .
    76165 . . .  .  .  .         .         .         .
    76165 . . .  .  .  .         .         .         .
    76165 . . .  .  .  .         .         .         .
    78887 1 1 . 11  7  .  537.2866 508.47095         .
    79571 1 . . 11  .  . 1096.9501         .         .
    80247 . . 1  .  .  5         .         .   789.124
    80931 . . 1  .  . -8         .         .        -8
    80931 . . 1  .  . -8         .         .        -8
    end
    label values a_jhstpy a_jhstpy
    label def a_jhstpy -8 "inapplicable", modify
    label def a_jhstpy 1 "promoted", modify
    label def a_jhstpy 2 "left for better job", modify
    label def a_jhstpy 3 "made redundant", modify
    label def a_jhstpy 11 "other reason", modify
    label values b_jhstpy b_jhstpy
    label def b_jhstpy -9 "missing", modify
    label def b_jhstpy -8 "inapplicable", modify
    label def b_jhstpy 1 "promoted", modify
    label def b_jhstpy 3 "made redundant", modify
    label def b_jhstpy 5 "temporary job ended", modify
    label def b_jhstpy 7 "stopped health reas", modify
    label values c_jhstpy bc_jhstpy
    label def c_jhstpy -8 "inapplicable", modify
    label def c_jhstpy 1 "promoted", modify
    label def c_jhstpy 2 "left for better job", modify
    label def c_jhstpy 5 "temporary job ended", modify
    label def c_jhstpy 7 "stopped health reas", modify
    label def c_jhstpy 11 "other reason", modify
    label values a_jhgpay a_jhgpay
    label def a_jhgpay -8 "inapplicable", modify
    label values b_jhgpay b_jhgpay
    label def b_jhgpay -8 "inapplicable", modify
    label values c_jhgpay c_jhgpay
    label def c_jhgpay -8 "inapplicable", modify

    For example pidp 67335 has 2 employment spell for wave a, or pidp 49651 has 5 employment spell for wave b. I think that this makes impossible for STATA to uniquely identify the unique observation ID(i) wave(j). Do you have any thoughts?

    Thanks

  • #2
    So, what you have is simply not panel data. Or, at least, not in the sense of each pidp being a "panel" that occurs once in each time period (wave). Depending on how you will be working with this data, I see three approaches to "taming" the data and making it suitable for analysis in Stata. The crucial thing is to get it into long (or at least partially long) layout. So the start is:
    Code:
    gen long obs_no = _n
    ds a_*
    local to_reshape `r(varlist)'
    local to_reshape: subinstr local to_reshape "a_" "@", all
    reshape long `to_reshape', i(obs_no) j(_wave) string
    drop if missing(jspno) & missing(jhstpy) & missing(jhgpay)
    replace _wave = substr(_wave, 1, 1)
    encode _wave, gen(wave)
    drop _wave
    At this point, if you want, you can even make this look like real panel data by running -egen long panel_id = group(pidp wave obs_no)-. You would have to remember that these different "panels" are not, however, independent entities as would normally be the case. So using the usual suite of -xt- commands on them is going to get you some wrong results.

    Probably more reasonable approach is to keep the pidp as the panel identifier but break up the waves into pseudo-waves that mark the episode level.
    Code:
    by pidp wave (obs_no), sort: replace obs_no = _n
    egen pseudo_wave = group(wave obs_no), label
    drop obs_no
    Here you can -xtset pidp pseudo_wave- and you can legitimately use the regular -xt- commands. You just have to remember that your wave numbers no longer represent equal (or even approximately equal) spacing in time, so you would use lags and leads and the like with great caution, or not at all. The panel is also then strongly unbalanced as many (most) pidp's will have no observations in a lot of the pseudo-waves.

    A third alternative is to use a half-long half-wide layout in which the wave dimension has been put long, but the repeated observations within wave are spread out wide:
    Code:
    by pidp wave (obs_no), sort: gen seq = _n
    drop obs_no
    reshape wide jspno jhstpy jhgpay, i(pidp wave) j(seq)
    Depending on what you do, this may prove more or less suitable for subsequent analysis than the other options proposed.

    Just to be clear: the three options set out here are mutually exclusive. You can't do one after another. If you do one and then decide you want to do something else with one of the other options, you have to go back to the beginning because each of these code sequences destroys information needed by the others.

    Comment


    • #3
      Dear Clyde, thanks for your message.

      I have decided not to set working history as panel data but to summarise information at individual level and then add these variables to my master dataset (a panel data with a unique pidp and wave).

      Now, I am finding some challenges in building work experience variables for each pidp. Let me discuss the full-time work experience construction (as I can apply the same technique to the part-time experience).

      I have created between1 that shows months of work experience between each different spells of employment and ftexp_aggregate that sums up all the full time work experiences for each individual spells for each pidp. However, what I properly need for my research is an accumulated work experiences variable for each wave such as ftworkexp_1, ftworkexp_2, ..., ftworkexp_28.

      Some individuals change their status during the time of the study and start to accumulate experience from 0 (i.e. wave=10 Start_MY=403 Status=7.Full-time student, wave=11 Start_MY=414 Status=2.Paid employment work).
      Many individuals have work experiences before the first interview, I was thinking to start with a variable that aggregates past work experience prior the first appearance in the sample, then accumulate work experience as the individual progress over each wave, till he/she changes status and eventually disappears from the analysis.

      Each individual has been interviewed in different date IntDate_MY starting in 1991m9 (380), I do have the start and end period of each employment spell however most of the time these cross-over multiple waves and it is hard to capture the right amount of experience accumulated in each each wave.

      See an example of the dataset (I have kept the only observations that have Status as 1.Self-employed, 2.Paid Employment, as these are the only variables that generate work experience):

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long pidp byte Spell int(Start_MY End_MY Status) byte(Start_Flag End_Flag) int(Birth_MY IntDate_MY) byte(Job_Hours Job_Change End_Ind Status_Spells) int Last_IntDate_MY byte Source_Type float(between1 ftexp)
       2727  2 -51  81 2  0  0 -319 393  1  3 1 1 501 3 132 183
       2727  4 363 404 2  0  1 -319 404  1 .m 0 1 501 1  41 183
       2727  7 478 488 2  0  1 -319 488  1  3 0 1 501 1  10 183
       2727  8 488 501 2  1  1 -319 501  2 .m 0 1 501 1  13   .
       3407  2 161 216 2  0  0  -55 392  1  3 1 1 541 3  55 250
       3407  3 216 228 1  0  0  -55 392 .m  4 1 1 541 3  12   .
       3407  5 233 266 2  0  0  -55 392  1  3 1 1 541 3  33 250
       3407  6 266 281 1 12 12  -55 392 .m  4 0 1 541 3  15   .
       3407  7 281 392 1  0  1  -55 392  1 .m 0 1 541 1 111 250
       3407  9 396 411 1  1  0  -55 416 .m .m 1 1 541 1  15   .
       3407 11 413 416 1  0  1  -55 416  1  3 0 1 541 1   3 250
       3407 12 416 456 1  1  1  -55 456  1  3 0 1 541 1  40 250
       3407 14 470 478 2  1  1  -55 478  1 .m 0 1 541 1   8 250
       4091  4  95 107 1  0  0 -175 392 .m  3 1 1 416 3  12   .
       4091  5 107 144 2  0  4 -175 392  1  3 1 1 416 3  37 121
       4091  7 155 324 1  4  4 -175 392 .m  3 1 1 416 3 169   .
       4091  9 332 416 2  0  1 -175 416  1 .m 0 1 416 1  84 121
       4767  2 -54 -48 2  0  0 -247 393  1  3 1 1 405 3   6 311
       4767  4 -36  72 2  0  4 -247 393  2  3 1 1 405 3 108   .
       4767  6  83 120 2  4  4 -247 393  1  3 1 1 405 3  37 311
       4767  8 131 137 1 12 12 -247 393 .m  4 0 1 405 3   6   .
       4767  9 137 405 1  0  1 -247 405  1 .m 0 1 405 1 268 311
       5451  2 -43 -42 2  0  0 -223 392  1  3 1 1 420 3   1 133
       5451  4  81 129 1  6  0 -223 392 .m  3 1 1 420 3  48   .
       5451  5 129 147 2  6  0 -223 392  2  4 1 1 420 3  18   .
       5451  7 153 201 1  6  0 -223 392 .m  3 1 1 420 3  48   .
       5451  8 201 273 2  6  0 -223 392  2  4 1 1 420 3  72   .
       5451  9 273 341 2 12 12 -223 392  1  4 0 1 420 3  68 133
       5451 10 341 405 2  0  1 -223 405  1 .m 0 1 420 1  64 133
       6135  3 355 362 2  0  0  149 392  1  3 1 1 405 3   7  35
       6135  6 366 373 2 12 12  149 392  1  3 1 1 405 3   7  35
       6135  8 380 388 2  1  0  149 392  1  3 1 1 405 1   8  35
       6135 11 392 405 2  1  1  149 405  1  3 0 1 405 1  13  35
       6807  2 246 321 2  6  0   65 392  1  3 1 1 405 3  75 135
       6807  4 345 347 2 12 12   65 392  1  3 0 1 405 3   2 135
       6807  5 347 405 2  0  1   65 405  1 .m 0 1 405 1  58 135
       6882  4 441 453 1  0  1  161 453  1  3 0 1 464 1  12  21
       6882  5 453 455 1  1  0  161 464 .m  3 1 1 464 1   2   .
       6882  6 455 464 2  0  1  161 464  1  3 0 1 464 1   9  21
       7487  4 354 362 2  0  0   89 393  1  3 1 1 432 3   8  67
       7487  5 362 367 1 12 12   89 393 .m  4 0 1 432 3   5   .
       7487  6 367 393 1  0  1   89 393  1 .m 0 1 432 1  26  67
       7487  8 399 432 1  0  1   89 432  1 .m 0 1 432 1  33  67
       8167  2 305 360 2  0 12   89 392  1  3 0 1 432 3  55 100
       8167  3 360 380 2  0  1   89 380  1 .m 0 1 432 1  20 100
       8167  4 380 389 2  1  0   89 392 .m .m 1 1 432 1   9   .
       8167  5 389 392 2  0  1   89 392  1  2 0 1 432 1   3 100
       8167  6 392 404 2  0  0   89 405 .m .m 1 1 432 1  12   .
       8167  7 404 405 2  0  1   89 405  1  2 0 1 432 1   1 100
       8167  8 405 411 2  1  0   89 421  1  3 1 1 432 1   6 100
       8167  9 411 421 2  0  1   89 421  1  3 0 1 432 1  10 100
       8167 10 421 427 2  1  0   89 432 .m .m 1 1 432 1   6   .
       8167 11 427 432 2  0  1   89 432  1  2 0 1 432 1   5 100
       8847  3 367 380 2  0  1  113 380  1 .m 0 1 380 1  13  13
       9527  2 269 273 2  0  0   17 394  2  3 1 1 516 3   4   .
       9527  3 273 305 2  0  0   17 394  1  4 1 1 516 3  32 171
       9527  6 320 356 2 12 12   17 394  1  3 0 1 516 3  36 171
       9527  7 356 409 2  0  0   17 417  1 .m 1 1 516 1  53 171
       9527 10 454 504 2  0  1   17 504  1  3 0 1 516 1  50 171
      12251  2 346 354 2  0  0  137 393  1  3 1 1 537 3   8  10
      12251  5 388 390 2  0  0  137 393  1  3 1 1 537 1   2  10
      12935  5 400 420 2  0  0  161 428  1  3 1 1 482 1  20  27
      12935 10 458 465 2  0  1  161 465  1  3 0 1 482 1   7  27
      13607  3 -97 380 2  4  1 -463 380  2 .m 0 1 380 1 477   .
      14287  3 236 241 2  0  0   17 393  1  3 1 1 489 3   5  29
      14287  4 241 259 1  0  0   17 393 .m  3 1 1 489 3  18   .
      14287  5 259 272 2  0  0   17 393  1  3 1 1 489 3  13  29
      14287  6 272 289 1  0  0   17 393 .m  3 1 1 489 3  17   .
      14287  7 289 293 2  0  0   17 393  1  4 1 1 489 3   4  29
      14287  9 302 309 2  0  0   17 393  1  3 1 1 489 3   7  29
      14287 13 332 358 1  0  0   17 393 .m  3 1 1 489 3  26   .
      14287 15 361 370 2  0  0   17 382  2 .m 1 1 489 1   9   .
      14971  2 245 340 2  0  0   29 393  1  3 1 1 548 3  95  95
      15645  2 174 188 2  0  0  -43 393  1  3 1 1 560 3  14 209
      15645  4 234 311 2  0  0  -43 393  1  3 1 1 560 3  77 209
      15645  8 406 408 2  0  0  -43 416 .m  3 1 1 560 1   2   .
      15645  9 408 416 2  0  0  -43 416 .m  2 1 1 560 1   8   .
      15645 10 416 420 2  1  0  -43 429 .m .m 1 1 560 1   4   .
      15645 11 420 442 2  0  1  -43 442  1  2 0 1 560 1  22 209
      15645 12 442 452 2  1  0  -43 453 .m .m 1 1 560 1  10   .
      15645 13 452 453 2  0  1  -43 453  1  2 0 1 560 1   1 209
      15645 14 453 464 2  1  0  -43 465 .m .m 1 1 560 1  11   .
      15645 15 464 465 2  0  1  -43 465  2  2 0 1 560 1   1   .
      15645 16 465 489 2  1  1  -43 489  1 .m 0 1 560 1  24 209
      15645 17 489 560 2  1  1  -43 560  1  2 0 1 560 1  71 209
      16339  2 393 404 2  1  1  149 404  2  3 0 1 564 1  11   .
      16339  3 404 408 2  1  0  149 418 .m .m 1 1 564 1   4   .
      16339  4 408 418 2  0  1  149 418  1  2 0 1 564 1  10 100
      16339  5 418 427 2  1  0  149 430 .m .m 1 1 564 1   9   .
      16339  6 427 464 2  0  1  149 464  1  2 0 1 564 1  37 100
      16339  8 511 564 2  0  1  149 564  1 .m 0 1 564 1  53 100
      17015  3 485 488 2  1  1  257 488  2  3 0 1 548 1   3   .
      17015  8 524 548 2  0  1  257 548  1  3 0 1 548 1  24  24
      17687  2 306 318 2  0  0  113 395  1  3 1 1 478 3  12  12
      17687  7 446 453 1  0  1  113 453  2  3 0 1 478 1   7   .
      20482  3 454 455 2  0  1  197 455  1  3 0 1 455 1   1   1
      21087  3 267 336 2 12 12   53 393  1  3 0 1 482 3  69 215
      21087  4 336 482 2  0  1   53 482  1 .m 0 1 482 1 146 215
      21767  2  90 105 2  0  0 -175 393  1  3 1 1 504 3  15 297
      21767  4 114 201 2  0  0 -175 393  1  3 1 1 504 3  87 297
      end
      format %tm Start_MY
      format %tm End_MY
      format %tm Birth_MY
      format %tm IntDate_MY
      format %tm Last_IntDate_MY
      label values Status status
      label def status 1 "1. Self-Employed", modify
      label def status 2 "2. Paid Employment", modify
      label values Start_Flag flag
      label values End_Flag flag
      label def flag 0 "0. No imputation", modify
      label def flag 1 "1. Truncated: Seam Spell", modify
      label def flag 4 "4. Truncated: Missing month", modify
      label def flag 6 "6. Imputed: Month in Season", modify
      label def flag 12 "12. Truncated, Life History", modify
      label values Job_Hours job_hours
      label def job_hours 1 "1. Full-Time", modify
      label def job_hours 2 "2. Part-Time", modify
      label def job_hours .m ".m. Missing/Refused/Don't Know", modify
      label values Job_Change job_change
      label values End_Ind activity_ended_indicator
      label def activity_ended_indicator 0 "0. No", modify
      label def activity_ended_indicator 1 "1. Yes", modify
      label values Source_Type source_type
      label def source_type 1 "1. Annual History", modify
      label def source_type 3 "3. Life History", modify

      I would like to ask your opinion on the matter and what do you think is the most suitable way to face the variable construction.

      Thanks

      Comment


      • #4
        I'm sorry, but I don't understand the data. You speak of survey waves, but no variable is called that. Is spell the same as wave? And what do Start_MY and End_MY represent? Are they the start and end of a "wave" or of some employment episode. And whichever they are, where do we find the other? Are you trying to total up months of employment or hours of employment? And how are those things related to betwen1 and ftexp?

        Comment


        • #5
          The survey waves are in my master dataset, firs wave is 1 collected between 1991/09 till 1991/12, then wave 2 from 1992/09 till 1992/12, wave 3 is 1993... till wave 28 (see an example below):
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long pidp int(birthy age) byte(sex gor_dv) float wave
          818448689 1949 41 2 7 1
          547060005 1965 26 2 8 1
          817337569 1961 30 2 7 1
          612729645 1932 59 2 2 1
          408403245 1951 40 1 9 1
          680767725 1942 49 1 2 1
          771807 1948 42 2 6 1
          2848531 1931 60 2 2 1
          3638687 1940 51 1 3 1
          341381085 1938 53 2 2 1
          204832329 1931 60 2 3 1
          748282885 1929 61 2 8 1
          2804327 1963 27 1 5 1
          340125805 1961 30 1 8 1
          479663845 1964 27 1 3 1
          3465967 1944 47 2 3 1
          273748285 1956 35 1 2 1
          545665325 1965 26 2 5 1
          204269969 1952 39 1 6 1
          4367647 1930 61 1 11 1
          205104325 1952 38 1 11 1
          411349005 1960 31 2 11 1
          817542249 1959 32 2 10 1
          614370485 1965 26 1 7 1
          4392127 1936 55 2 11 1
          751134125 1957 34 2 2 1
          71472085 1970 21 2 3 1
          816917325 1950 41 2 3 1
          68429089 1942 49 2 9 1
          3406811 1957 34 2 3 1
          751776049 1962 29 2 7 1
          3121207 1960 31 1 2 1
          816416849 1960 31 2 9 1
          478139285 1956 34 2 10 1
          751827045 1962 29 1 7 1
          816666409 1967 24 2 5 1
          615112365 1970 21 2 7 1
          1054011 1951 40 1 8 1
          206326965 1930 61 2 7 1
          682516689 1952 39 2 11 1
          3406127 1952 38 1 3 1
          748687493 1969 22 1 5 1
          4733487 1945 46 2 11 1
          4082727 1934 57 1 1 1
          205768689 1963 28 2 5 1
          4009287 1963 28 1 2 1
          613596645 1945 46 2 1 1
          136295805 1952 39 1 8 1
          2816567 1933 58 1 5 1
          4316651 1934 56 1 10 1
          end
          label values birthy a_birthy
          label values age ba_age
          label values sex a_sex
          label def a_sex 1 "male", modify
          label def a_sex 2 "female", modify
          label values gor_dv a_gor_dv
          label def a_gor_dv 1 "north east", modify
          label def a_gor_dv 2 "north west", modify
          label def a_gor_dv 3 "yorkshire and the humber", modify
          label def a_gor_dv 5 "west midlands", modify
          label def a_gor_dv 6 "east of england", modify
          label def a_gor_dv 7 "london", modify
          label def a_gor_dv 8 "south east", modify
          label def a_gor_dv 9 "south west", modify
          label def a_gor_dv 10 "wales", modify
          label def a_gor_dv 11 "scotland", modify
          This is were I will add my variables for work experience.

          Spell is the number of different "Status" period for each individual i.e. if individual have spell=4 then he/she has 4 periods of a Status 1.spell student 2.spell paid employment 3.spell paid employment 4.spell retired.
          In this case I have considered only "Status" that generate work experience such as self-employed and Paid Employment and not the other spells that do not generate experience, therefore for the individual taken as a example with spell=4, only spell 2 and 3 are shown because these generate work experience.

          Start_MY and End_MY represent the start and end of an employment episode.

          Then there is the IntDate_MY that tells when an individual was interviewed or when the wave was collected, the individual provided current and past information.

          between1 represents the months of employment for each spell (employment episode) but usually these spells run over multiple waves. I am trying to find months of employments accumulated in each wave.

          For instance:

          Spell = 1 Status = student
          Start_MY = 1957m9 End_My = 1960m5
          Spell 1 does not accumulate experience.

          Spell = 2 Status = paid employment
          Start_MY = 1960m9 End_MY = 2000m09
          Spell 2 accumulates experience.
          between1 = 480

          The interview appears in wave 1 IntDate_MY = 1990m10

          He will start in wave 1 with:

          wave = 1 workexp = 361

          Then

          wave = 2 workexp = 373

          wave = 3 workexp = 385
          …

          wave = 11 workexp = 480
          Last edited by Pio Medolla; 15 Nov 2021, 02:24.

          Comment


          • #6
            Well, this is helpful, but I am still unclear about what you mean by the amount of employment experience collected during each "wave." Apparently you waves run from September through December of each calendar year. But I imagine you don't want to only count employment that occurs during those four months of each year. Presumably you want each "wave" to count the employment experience over some full year period. But which full year period? For example, wave 1 is conducted between September 1991 and December 1991. So you want to count the employment between September 1991 and August 1992? Or any time in calendar year 1991? Or between December 1991 and November 1992? Or calendar year 1992? Or something else?

            Comment


            • #7
              The first two waves are collected between Sept to Dec, then all the others are collected from Jan to December of each calendar year. I am accumulating the employment experience for each calendar year a pidp is followed up.

              I have merged the work history with wave, and I think I have managed to almost obtain what I am looking for if it wasn't that sometimes the same spell is reported multiple times throughout the time a pidp is observed. I want to drop these and leave just one so that I don't overcount the spell when I sum the between1

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long pidp float(Wave IntDate_MY) int(Status Start_MY End_MY) byte Job_Hours float between1
              597047  3 381 2 283 381 1 98
              597047  4 381 2 283 381 1 98
              597047  5 381 2 283 381 1 98
              597047  6 381 2 283 381 1 98
              597047  7 381 2 283 381 1 98
              597047  8 381 2 283 381 1 98
              597047  9 381 2 283 381 1 98
              597047 10 381 2 283 381 1 98
              597047 11 381 2 283 381 1 98
              597047 12 381 2 283 381 1 98
              597047 14 381 2 283 381 1 98
              end
              format %tm IntDate_MY
              format %tm Start_MY
              format %tm End_MY
              label values Status status
              label def status 2 "2. Paid Employment", modify
              label values Job_Hours job_hours
              label def job_hours 1 "1. Full-Time", modify

              is it something on the line:
              by pidp Wave: gen count = _n if ??
              drop if count > 1

              Comment


              • #8
                Well, what I think you want is to drop repeated references to the same spell of employment. And while I can't tell how you have merged the data, from the example data you show, it seems that the same spell can show up in multiple waves. I assume a spell is uniquely identified by the combination of Status, Start_MY, End_MY, and Job_Hours. So, I think the code would be more like this:

                Code:
                by pidp Status Start_MY End_MY Job_Hours (Wave), sort: keep if _n == 1
                This will keep the observation from the earliest wave in which the spell occurs, and remove all others. If the same spell actually occurs more than once even within the first wave where it shows up in your merged data, it will keep one of those observations only--which one is selected at random and is not reproducible.

                That said, this will "credit" the wave in which a spell begins with the full duration of employment even if that spell extends over many years. That is not what I understood you to want in #1, it seems odd to me. Are you sure this is what you want?

                If you want to "credit" each wave for the number of months that the spell overlaps the calendar year, with any months that precede the person's first wave under observation, then I think it would be this:

                Code:
                isid pidp Wave, sort
                gen wave_year = 1990 + Wave
                by pidp (Wave): gen byte first_wave = (_n == 1)
                
                gen year_start  = ym(wave_year, 1)
                replace year_start = . if first_wave
                gen year_end = ym(wave_year, 12)
                
                gen months_in_year = max(min(End_MY, year_end) - max(Start_MY, year_start), 0)
                And if you want you can then drop observations where months_in_year == 0.

                Now, your example data contains only one episode and it entirely precedes the person's first wave in the survey. So this code cannot be considered thoroughly tested. I believe it is correct, but if it doesn't work properly in your real data, when you post back please be sure to give an example that shows where it goes wrong, and explain, unless it is blatantly obvious, how it gives a result that differs from what you want.
                Last edited by Clyde Schechter; 26 Nov 2021, 11:43.

                Comment


                • #9
                  The same spell cannot occur more than once in the same wave but different spells occur in a single wave. So I think that pidp Wave Spell identify a unique observation, so maybe isid pidp Wave Spell, sort.

                  Now what I haven't mention so far is that there are some calendar years (1992, 2001, 2002, 2009, 2013) in which job history information are collected, these are available for around 70% of the pidps. Other informations are collected every wave. Because I found difficult to "credit" each wave with the exact amount of month of employment I decided to calculate experience only for only these years (an approach copied on a research paper that have used this dataset) and for the 28 wave (aggregate experience) by doing something along the line:

                  Code:
                  ****** gen work experience for 1992 **********
                  drop if IntDate_MY > 395
                  replace End_MY = 395 if End_MY >395
                  gen between1 = End_MY-Start_MY if Status == 1 |Status == 2
                  drop if between1 ==.
                  gen spell_durationfull = between1 if Job_Hours == 1
                  gen spell_durationpart = between1 if Job_Hours == 2
                  by pidp, sort: egen fulltime_exp_2 = sum(spell_durationfull) if End_MY <= 395
                  by pidp, sort: egen parttime_exp_2 = sum(spell_durationpart) if End_MY <= 395
                  keep pidp fulltime*    parttime*
                  save workexp2, replace
                  
                  *********** gen work experience for 2001 **********
                  drop if IntDate_MY <491 & IntDate_MY>503
                  * then pretty much the same...
                  By repeating this for 2002, 2009, 2013 and 2018 I can obtain the experience credited, although I am still working on it and I will need to deal with the experience already accumulated in previous periods for the pidp that bring a spell forward in the next interview.

                  The code that you provided to me seems interestingly let see if I can "finally" obtaining a variable experience for each wave. Let me come back to you once I have made some progress, thank you.
                  In the mean time find attached a large sample of the update dataset with your code:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input long pidp float Wave byte Spell float IntDate_MY int(Status Start_MY End_MY) byte Job_Hours float(between1 wave_year) byte first_wave float(year_start year_end months_in_year)
                  361130325 10  2 502 2  426  428  2   2 2000 1   . 491   2
                  361130325 12  4 478 2  462  478  1  16 2002 0 504 515   0
                  361130325 13  5 494 2  478  483 .m   5 2003 0 516 527   0
                  361130325 14  6 502 2  483  502  1  19 2004 0 528 539   0
                  361130325 15  7 514 2  502  511 .m   9 2005 0 540 551   0
                  361130325 16  8 514 2  511  514  1   3 2006 0 552 563   0
                  361130325 17  9 527 2  514  522 .m   8 2007 0 564 575   0
                  361130325 18 10 549 2  522  544  1  22 2008 0 576 587   0
                  361130325 22 11 574 2  544  567  1  23 2012 0 624 635   0
                  361130325 23 12 641 2  567  631  1  64 2013 0 636 647   0
                  361130325 25 14 705 2  633  705  1  72 2015 0 660 671  11
                  361130329 12  3 502 2  367  384  1  17 2002 1   . 515  17
                  361130329 14  5 502 2  386  452  1  66 2004 0 528 539   0
                  361130329 15  6 494 2  452  477 .m  25 2005 0 540 551   0
                  361130329 16  7 494 2  477  494  2  17 2006 0 552 563   0
                  361130329 17  8 502 2  494  497 .m   3 2007 0 564 575   0
                  361130329 18  9 527 2  497  519  1  22 2008 0 576 587   0
                  361130329 18 11 527 2  525  527  2   2 2008 0 576 587   0
                  361130329 18 12 550 2  527  541  2  14 2008 0 576 587   0
                  361130329 18 13 550 2  541  548  1   7 2008 0 576 587   0
                  361131005 10  2 502 2  396  444  1  48 2000 1   . 491  48
                  361131005 11  3 502 2  444  456  2  12 2001 0 492 503   0
                  361131005 12  4 494 2  456  494  2  38 2002 0 504 515   0
                  361131005 13  5 502 2  494  502 .m   8 2003 0 516 527   0
                  361131005 15  7 537 2  518  537  2  19 2005 0 540 551   0
                  361131005 16  8 560 2  537  556  1  19 2006 0 552 563   4
                  361131005 17  9 584 2  556  584  2  28 2007 0 564 575  11
                  361131005 18 10 604 2  584  604  2  20 2008 0 576 587   3
                  361131005 20 11 616 1  604  616  2  12 2010 0 600 611   7
                  361131005 21 12 627 2  616  627  2  11 2011 0 612 623   7
                  361131005 22 13 640 2  627  640  1  13 2012 0 624 635   8
                  361141205 11  3 504 2  143  204  1  61 2001 1   . 503  61
                  361141205 14  6 504 2  318  342  1  24 2004 0 528 539   0
                  361141205 20 11 478 2  476  478  2   2 2010 0 600 611   0
                  361141205 21 12 496 2  478  495 .m  17 2011 0 612 623   0
                  361141205 22 13 496 2  495  496  1   1 2012 0 624 635   0
                  361141205 23 14 617 2  496  559  1  63 2013 0 636 647   0
                  361141205 27 16 617 2  560  616  1  56 2017 0 684 695   0
                  361141205 27 17 642 2  616  631  1  15 2017 0 684 695   0
                  361141209 21  3 605 2  593  596  2   3 2011 1   . 623   3
                  361141209 24  6 617 2  606  617  2  11 2014 0 648 659   0
                  361141209 26  7 642 2  617  641  1  24 2016 0 672 683   0
                  361142565 10  2 504 2  353  443  1  90 2000 1   . 491  90
                  361142565 11  3 495 2  443  495  1  52 2001 0 492 503   3
                  361142565 12  4 515 2  495  510  2  15 2002 0 504 515   6
                  361142565 13  5 538 2  510  534  1  24 2003 0 516 527  11
                  361142565 14  6 548 2  534  548  1  14 2004 0 528 539   5
                  361142565 15  7 584 2  548  584  2  36 2005 0 540 551   3
                  361142565 16  8 702 2  584  702  1 118 2006 0 552 563   0
                  361142569 10  2 504 2  336  420  1  84 2000 1   . 491  84
                  361142569 11  3 481 2  420  469  1  49 2001 0 492 503   0
                  361142569 12  4 495 2  469  488  1  19 2002 0 504 515   0
                  361142569 13  5 495 2  488  495  1   7 2003 0 516 527   0
                  361142569 14  6 548 2  495  540  1  45 2004 0 528 539  11
                  361142569 15  7 604 2  540  596  1  56 2005 0 540 551  11
                  361142569 16  8 702 2  596  702  1 106 2006 0 552 563   0
                  361160245 10  2 502 2  365  414  1  49 2000 1   . 491  49
                  361160245 12  4 502 2  416  447  1  31 2002 0 504 515   0
                  361160245 13  5 495 2  447  495  1  48 2003 0 516 527   0
                  361160245 14  6 502 2  495  496  1   1 2004 0 528 539   0
                  361160245 15  7 585 2  496  577  1  81 2005 0 540 551  11
                  361160245 16  8 585 2  577  585  1   8 2006 0 552 563   0
                  361173165 10  2 502 2  407  426  1  19 2000 1   . 491  19
                  361173165 12  4 502 2  439  464  1  25 2002 0 504 515   0
                  361173165 17  9 494 2  486  487  1   1 2007 0 564 575   0
                  361173165 22 13 537 2  524  534  1  10 2012 0 624 635   0
                  361173165 23 14 561 2  534  558  1  24 2013 0 636 647   0
                  361173165 24 15 657 2  558  650  1  92 2014 0 648 659   2
                  361173165 25 16 700 2  650  700  1  50 2015 0 660 671  11
                  361173169 18  3 561 2  548  549 .m   1 2008 1   . 587   1
                  361173169 18  4 561 2  549  561  1  12 2008 0 576 587   0
                  361173169 18  5 585 2  561  574  2  13 2008 0 576 587   0
                  361173169 18  6 585 2  574  585  1  11 2008 0 576 587   9
                  361173173 18  3 586 2  571  586  1  15 2008 1   . 587  15
                  361207165 10  2 501 2   77  361  1 284 2000 1   . 491 284
                  361208525 12  4 539 2  522  534  1  12 2002 1   . 515   0
                  361208525 13  5 574 2  534  567  1  33 2003 0 516 527   0
                  361208525 14  6 605 2  567  604  1  37 2004 0 528 539   0
                  361208525 15  7 617 2  604  609  1   5 2005 0 540 551   0
                  361208525 16  8 703 2  609  703  1  94 2006 0 552 563   0
                  361208529 14  1 562 2  486  562  1  76 2004 1   . 539  53
                  361208529 15  2 574 2  562  573  1  11 2005 0 540 551   0
                  361208529 16  3 574 1  573  574  1   1 2006 0 552 563   0
                  361208529 17  4 617 2  574  617  1  43 2007 0 564 575   1
                  361224165 10  2 502 2 -247 -116  1 131 2000 1   . 491 131
                  361224165 12  4 502 2  -51  307  2 358 2002 0 504 515   0
                  361236405 10  2 502 2   65  425  1 360 2000 1   . 491 360
                  361236409 10  2 502 2   21  129  1 108 2000 1   . 491 108
                  361236409 12  4 502 2  170  180  2  10 2002 0 504 515   0
                  361236409 14  6 502 2  188  220  2  32 2004 0 528 539   0
                  361236409 16  8 502 2  228  237  1   9 2006 0 552 563   0
                  361236409 17  9 494 2  237  494  1 257 2007 0 564 575   0
                  361236409 18 10 502 2  494  496 .m   2 2008 0 576 587   0
                  361236409 20 11 537 2  496  532  1  36 2010 0 600 611   0
                  361236409 22 13 617 2  533  611  2  78 2012 0 624 635   0
                  361238445 10  2 504 2  437  443  1   6 2000 1   . 491   6
                  361238445 11  3 504 2  443  444  2   1 2001 0 492 503   0
                  361238445 12  4 504 2  444  464  1  20 2002 0 504 515   0
                  361238445 13  5 480 2  464  465  1   1 2003 0 516 527   0
                  361238445 14  6 480 2  465  477  1  12 2004 0 528 539   0
                  end
                  format %tm IntDate_MY
                  format %tm Start_MY
                  format %tm End_MY
                  label values Status status
                  label def status 1 "1. Self-Employed", modify
                  label def status 2 "2. Paid Employment", modify
                  label values Job_Hours job_hours
                  label def job_hours 1 "1. Full-Time", modify
                  label def job_hours 2 "2. Part-Time", modify
                  label def job_hours .m ".m. Missing/Refused/Don't Know", modify
                  Last edited by Pio Medolla; 26 Nov 2021, 13:14.

                  Comment


                  • #10
                    So, with the correction of the -isid- command, using your expanded data I see a mistake in the code I showed in #8. The command to generate the months in year variable should be:

                    Code:
                    gen months_in_year = max(min(End_MY, year_end) - max(Start_MY, year_start) + 1, 0)
                    I think this will now appropriately allocate all of a spell across the calendar years.

                    Comment


                    • #11
                      Hi Clyde, I believe I have created the variables experience that I was looking for, this is the code:

                      Code:
                      gen between1 = End_MY-Start_MY if Status == 1 |Status == 2
                      drop if between1 ==.
                      drop if Job_Hours ==.m
                      by pidp Status Start_MY End_MY Job_Hours (Wave), sort: keep if _n == 1
                      
                      isid pidp Wave Spell, sort
                      gen wave_year = 1990 + Wave
                      by pidp (Wave): gen byte first_wave = (_n == 1)
                      
                      gen year_end = ym(wave_year, 12)
                      
                      gen months_in_year = max(min(End_MY, year_end) - Start_MY, 0)
                      
                      gen ft_months_in_year= months_in_year if Job_Hours ==1
                      gen pt_months_in_year= months_in_year if Job_Hours ==2
                      
                      *generate experience var fulltime
                      local nwave 28
                      forvalues  i = 1/`nwave'{
                          by pidp, sort: egen ftexp_`i' = total(ft_months_in_year) if Wave <=`i'
                          }
                          
                      *generate experience var parttime
                      local nwave 28
                      forvalues  i = 1/`nwave'{
                          by pidp, sort: egen ptexp_`i' = total(pt_months_in_year) if Wave <=`i'
                          }
                          
                      keep pidp ftexp* ptexp* Job_Hours
                      by pidp, sort: keep if _n==1
                      reshape long ftexp_ ptexp_, i(pidp) j(wave)
                      by pidp, sort: drop if ftexp ==. & ptexp ==.
                      because I need work experience accumulated from the beginning of the working life till each wave ends I don't need the year start, this will be useful I want to calculate the experience for each single wave.

                      This is how the final variables look like:

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input long pidp byte(wave Job_Hours) float(ftexp_ ptexp_)
                      204453573 12 1   2   0
                      204453573 13 1  32   0
                      204453573 14 1  35   0
                      204453573 15 1  61   0
                      204453573 16 1  64   0
                      204453573 17 1  72   0
                      204453573 18 1  80   0
                      204453573 19 1  80   0
                      204453573 20 1  88   0
                      204453573 21 1 113   0
                      204453573 22 1 113   0
                      204453573 23 1 113   0
                      204453573 24 1 113   0
                      204453573 25 1 138   0
                      204453573 26 1 146   0
                      204453573 27 1 156   0
                      204453573 28 1 200   0
                      204454247 20 1  28   0
                      204454247 21 1  28   0
                      204454247 22 1  28   0
                      204454247 23 1  28  31
                      204454247 24 1  28  31
                      204454247 25 1  28  31
                      204454247 26 1  28  31
                      204454247 27 1  28  31
                      204454247 28 1  28  31
                      204454251 20 1  73  77
                      204454251 21 1  73  77
                      204454251 22 1  73  77
                      204454251 23 1  73  77
                      204454251 24 1  73  77
                      204454251 25 1  73  77
                      204454251 26 1  73  77
                      204454251 27 1  73  77
                      204454251 28 1  73  77
                      204454927 21 1 256  48
                      204454927 22 1 256  48
                      204454927 23 1 256  48
                      204454927 24 1 256  48
                      204454927 25 1 256  48
                      204454927 26 1 256  48
                      204454927 27 1 256  48
                      204454927 28 1 256  48
                      204456319 19 1  77   0
                      204456319 20 1  77   0
                      204456319 21 1  77   0
                      204456319 22 1  77   0
                      204456319 23 1  77   0
                      204456319 24 1  77   0
                      204456319 25 1  77   0
                      204456319 26 1  77   0
                      204456319 27 1  77   0
                      204456319 28 1  77   0
                      204456971 21 1 400   0
                      204456971 22 1 400   0
                      204456971 23 1 400   0
                      204456971 24 1 400   0
                      204456971 25 1 400   0
                      204456971 26 1 400   0
                      204456971 27 1 400   0
                      204456971 28 1 400   0
                      204457647 27 1 212   0
                      204457647 28 1 212   0
                      204457655 23 1  25   0
                      204457655 24 1  25   0
                      204457655 25 1  25   0
                      204457655 26 1  25   0
                      204457655 27 1  25   0
                      204457655 28 1  25   0
                      204457659 27 1   4   0
                      204457659 28 1  53  12
                      204457663 26 1   2   0
                      204457663 27 1   2   0
                      204457663 28 1  21   0
                      end
                      label values Job_Hours job_hours
                      label def job_hours 1 "1. Full-Time", modify
                      label def job_hours 2 "2. Part-Time", modify
                      I believe there is an incongruence with the pidp = 204457659

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input long pidp byte Spell float IntDate_MY int(Status Start_MY End_MY) byte Job_Hours float(between1 wave_year) byte first_wave float(year_end months_in_year Wave)
                      204457659 4 641 2 638 641 1  3 2017 1 695  3 27
                      204457659 5 652 1 641 642 1  1 2018 0 707  1 28
                      204457659 6 687 2 642 656 1 14 2018 0 707 14 28
                      204457659 7 687 2 656 687 1 31 2018 0 707 31 28
                      204457659 8 698 2 687 698 2 11 2018 0 707 11 28
                      end
                      format %tm IntDate_MY
                      format %tm Start_MY
                      format %tm End_MY
                      label values Status status
                      label def status 1 "1. Self-Employed", modify
                      label def status 2 "2. Paid Employment", modify
                      label values Job_Hours job_hours
                      label def job_hours 1 "1. Full-Time", modify
                      label def job_hours 2 "2. Part-Time", modify
                      Last edited by Pio Medolla; 01 Dec 2021, 06:55.

                      Comment


                      • #12
                        What specifically is the incongruence you are concerned about?

                        Comment


                        • #13
                          The panel 204457659 appears only for 2 wave, he/she has 4 months of full time experience in wave 27 and then 53 months of working experience in wave 28. If you notice the panel started to gain these experience reported in wave 28 before 2017 (spell 5 starts in 641, spell 6 starts in 642, spell 7 in 656 and spell 8 in 687) and therefore wave 27 is underestimating the actual work experience.

                          Comment


                          • #14
                            Ah, I see your data layout is more complicated than I had understood. You have employment spells in the same observations as interviews when the dates have nothing to do with each other. This represents the complexity of the fact that you sometimes learn about a spell of employment years after it has actually occurred. I think to get this all right, we have to actually separate the interviews from the employment spells, then expand the employment spells into yearly-observations, and then put the employment spells and the interviews back together agreeing on the year. Finally, that precedes an interview (but follows an earlier interview) gets added up and "credited" to the interview that follows. There also appears to be some data on employment spells that extend after the pidp's last interview--I don't understand how that happens. Those remain as "orphan" observations with a missing value for the wave number. That would look something like this:

                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear*
                            input long pidp float Wave byte Spell float IntDate_MY int(Status Start_MY End_MY) byte Job_Hours float(between1 wave_year) byte first_wave float(year_start year_end)
                            361130325 10  2 502 2  426  428  2   2 2000 1   . 491
                            361130325 12  4 478 2  462  478  1  16 2002 0 504 515
                            361130325 13  5 494 2  478  483 .m   5 2003 0 516 527
                            361130325 14  6 502 2  483  502  1  19 2004 0 528 539
                            361130325 15  7 514 2  502  511 .m   9 2005 0 540 551
                            361130325 16  8 514 2  511  514  1   3 2006 0 552 563
                            361130325 17  9 527 2  514  522 .m   8 2007 0 564 575
                            361130325 18 10 549 2  522  544  1  22 2008 0 576 587
                            361130325 22 11 574 2  544  567  1  23 2012 0 624 635
                            361130325 23 12 641 2  567  631  1  64 2013 0 636 647
                            361130325 25 14 705 2  633  705  1  72 2015 0 660 671
                            361130329 12  3 502 2  367  384  1  17 2002 1   . 515
                            361130329 14  5 502 2  386  452  1  66 2004 0 528 539
                            361130329 15  6 494 2  452  477 .m  25 2005 0 540 551
                            361130329 16  7 494 2  477  494  2  17 2006 0 552 563
                            361130329 17  8 502 2  494  497 .m   3 2007 0 564 575
                            361130329 18  9 527 2  497  519  1  22 2008 0 576 587
                            361130329 18 11 527 2  525  527  2   2 2008 0 576 587
                            361130329 18 12 550 2  527  541  2  14 2008 0 576 587
                            361130329 18 13 550 2  541  548  1   7 2008 0 576 587
                            361131005 10  2 502 2  396  444  1  48 2000 1   . 491
                            361131005 11  3 502 2  444  456  2  12 2001 0 492 503
                            361131005 12  4 494 2  456  494  2  38 2002 0 504 515
                            361131005 13  5 502 2  494  502 .m   8 2003 0 516 527
                            361131005 15  7 537 2  518  537  2  19 2005 0 540 551
                            361131005 16  8 560 2  537  556  1  19 2006 0 552 563
                            361131005 17  9 584 2  556  584  2  28 2007 0 564 575
                            361131005 18 10 604 2  584  604  2  20 2008 0 576 587
                            361131005 20 11 616 1  604  616  2  12 2010 0 600 611
                            361131005 21 12 627 2  616  627  2  11 2011 0 612 623
                            361131005 22 13 640 2  627  640  1  13 2012 0 624 635
                            361141205 11  3 504 2  143  204  1  61 2001 1   . 503
                            361141205 14  6 504 2  318  342  1  24 2004 0 528 539
                            361141205 20 11 478 2  476  478  2   2 2010 0 600 611
                            361141205 21 12 496 2  478  495 .m  17 2011 0 612 623
                            361141205 22 13 496 2  495  496  1   1 2012 0 624 635
                            361141205 23 14 617 2  496  559  1  63 2013 0 636 647
                            361141205 27 16 617 2  560  616  1  56 2017 0 684 695
                            361141205 27 17 642 2  616  631  1  15 2017 0 684 695
                            361141209 21  3 605 2  593  596  2   3 2011 1   . 623
                            361141209 24  6 617 2  606  617  2  11 2014 0 648 659
                            361141209 26  7 642 2  617  641  1  24 2016 0 672 683
                            361142565 10  2 504 2  353  443  1  90 2000 1   . 491
                            361142565 11  3 495 2  443  495  1  52 2001 0 492 503
                            361142565 12  4 515 2  495  510  2  15 2002 0 504 515
                            361142565 13  5 538 2  510  534  1  24 2003 0 516 527
                            361142565 14  6 548 2  534  548  1  14 2004 0 528 539
                            361142565 15  7 584 2  548  584  2  36 2005 0 540 551
                            361142565 16  8 702 2  584  702  1 118 2006 0 552 563
                            361142569 10  2 504 2  336  420  1  84 2000 1   . 491
                            361142569 11  3 481 2  420  469  1  49 2001 0 492 503
                            361142569 12  4 495 2  469  488  1  19 2002 0 504 515
                            361142569 13  5 495 2  488  495  1   7 2003 0 516 527
                            361142569 14  6 548 2  495  540  1  45 2004 0 528 539
                            361142569 15  7 604 2  540  596  1  56 2005 0 540 551
                            361142569 16  8 702 2  596  702  1 106 2006 0 552 563
                            361160245 10  2 502 2  365  414  1  49 2000 1   . 491
                            361160245 12  4 502 2  416  447  1  31 2002 0 504 515
                            361160245 13  5 495 2  447  495  1  48 2003 0 516 527
                            361160245 14  6 502 2  495  496  1   1 2004 0 528 539
                            361160245 15  7 585 2  496  577  1  81 2005 0 540 551
                            361160245 16  8 585 2  577  585  1   8 2006 0 552 563
                            361173165 10  2 502 2  407  426  1  19 2000 1   . 491
                            361173165 12  4 502 2  439  464  1  25 2002 0 504 515
                            361173165 17  9 494 2  486  487  1   1 2007 0 564 575
                            361173165 22 13 537 2  524  534  1  10 2012 0 624 635
                            361173165 23 14 561 2  534  558  1  24 2013 0 636 647
                            361173165 24 15 657 2  558  650  1  92 2014 0 648 659
                            361173165 25 16 700 2  650  700  1  50 2015 0 660 671
                            361173169 18  3 561 2  548  549 .m   1 2008 1   . 587
                            361173169 18  4 561 2  549  561  1  12 2008 0 576 587
                            361173169 18  5 585 2  561  574  2  13 2008 0 576 587
                            361173169 18  6 585 2  574  585  1  11 2008 0 576 587
                            361173173 18  3 586 2  571  586  1  15 2008 1   . 587
                            361207165 10  2 501 2   77  361  1 284 2000 1   . 491
                            361208525 12  4 539 2  522  534  1  12 2002 1   . 515
                            361208525 13  5 574 2  534  567  1  33 2003 0 516 527
                            361208525 14  6 605 2  567  604  1  37 2004 0 528 539
                            361208525 15  7 617 2  604  609  1   5 2005 0 540 551
                            361208525 16  8 703 2  609  703  1  94 2006 0 552 563
                            361208529 14  1 562 2  486  562  1  76 2004 1   . 539
                            361208529 15  2 574 2  562  573  1  11 2005 0 540 551
                            361208529 16  3 574 1  573  574  1   1 2006 0 552 563
                            361208529 17  4 617 2  574  617  1  43 2007 0 564 575
                            361224165 10  2 502 2 -247 -116  1 131 2000 1   . 491
                            361224165 12  4 502 2  -51  307  2 358 2002 0 504 515
                            361236405 10  2 502 2   65  425  1 360 2000 1   . 491
                            361236409 10  2 502 2   21  129  1 108 2000 1   . 491
                            361236409 12  4 502 2  170  180  2  10 2002 0 504 515
                            361236409 14  6 502 2  188  220  2  32 2004 0 528 539
                            361236409 16  8 502 2  228  237  1   9 2006 0 552 563
                            361236409 17  9 494 2  237  494  1 257 2007 0 564 575
                            361236409 18 10 502 2  494  496 .m   2 2008 0 576 587
                            361236409 20 11 537 2  496  532  1  36 2010 0 600 611
                            361236409 22 13 617 2  533  611  2  78 2012 0 624 635
                            361238445 10  2 504 2  437  443  1   6 2000 1   . 491
                            361238445 11  3 504 2  443  444  2   1 2001 0 492 503
                            361238445 12  4 504 2  444  464  1  20 2002 0 504 515
                            361238445 13  5 480 2  464  465  1   1 2003 0 516 527
                            361238445 14  6 480 2  465  477  1  12 2004 0 528 539
                            end
                            format %tm IntDate_MY
                            format %tm Start_MY
                            format %tm End_MY
                            format %tm year_start
                            format %tm year_end
                            label values Status status
                            label def status 1 "1. Self-Employed", modify
                            label def status 2 "2. Paid Employment", modify
                            label values Job_Hours job_hours
                            label def job_hours 1 "1. Full-Time", modify
                            label def job_hours 2 "2. Part-Time", modify
                            label def job_hours .m ".m. Missing/Refused/Don't Know", modify
                            
                            
                            //  SEPARATE THE DATA INTO INTERVIEWS AND SPELLS OF EMPLOYMENT
                            capture frame drop interviews
                            frame put pidp Wave wave_year, into(interviews)
                            frame interviews {
                                duplicates drop
                                isid pidp wave_year
                            }
                            
                            //  ALLOCATE THE EMPLOYMENT SPELLS OVER CALENDAR YEARS
                            keep if inlist(Status, 1, 2) // ELIMINATE SPELLS NOT SELF EMPLOYED OR PAID EMPLOYMENT
                            gen Start_Y = year(dofm(Start_MY))
                            gen End_Y = year(dofm(End_MY))
                            expand End_Y - Start_Y + 1
                            by pidp Spell, sort: gen year = Start_Y + _n - 1
                            drop Start_Y End_Y
                            gen January = ym(year, 1)
                            gen December = ym(year, 12)
                            format January December %tm
                            gen months_worked = max(min(End_MY, December) - max(Start_MY, January) + 1, 0)
                            
                            //  AGGREGATE UP TO CALENDAR YEAR LEVEL WITHIN PIDP
                            collapse (sum) months_worked, by (pidp year)
                            
                            //  LINK TO THE INTERVIEW DATA
                            frlink 1:1 pidp year, frame(interviews pidp wave_year)
                            frget Wave, from(interviews)
                            drop interviews
                            frame drop interviews
                            
                            //  COMBINE ALL WORK PRIOR AND UP TO EACH WAVE
                            by pidp (year), sort: gen n_interviews = sum(!missing(Wave[_n-1]))
                            collapse (sum) months_worked (max) year (lastnm) Wave, by(pidp n_interviews)
                            Now, there is one clear problem, that I believe cannot be solved in your data. There are some years where it appears a person worked for 13 months. This arises when there are two spells reported and the end month of the first is the same as the start month of the next. In that case, that month gets double counted. Which, in some cases, might be real--perhaps the person worked both jobs at the same time in that transition month. But I suspect that mostly it will not be real. But there is no clear way to decide which employment spell should take credit for that transition month, at least not on the basis of anything I see in the data. A blunt approach would be to simply cap the number of months worked in a calendar year at 12. But that still wouldn't cover a situation where a person works Jan-Mar and Mar-May and gets credited for 6 months of work instead of, presumably, 5.

                            Comment


                            • #15
                              There also appears to be some data on employment spells that extend after the pidp's last interview--I don't understand how that happens.
                              I believe I have checked the dataset and It doesn't seem to be any employment spells that occur after the Last Interview date. IntDate_MY is the interview during a wave and Last_IntDate_MY is the last interview of the whole dataset.
                              Code:
                               . by pidp, sort: gen count = _n if End_MY > IntDate_MY
                              (349,586 missing values generated)
                              
                              . summ count
                              
                                  Variable |        Obs        Mean    Std. Dev.       Min        Max
                              -------------+---------------------------------------------------------
                                     count |          0
                              
                              . by pidp, sort: gen recount = _n if End_MY > Last_IntDate_MY
                              (349,586 missing values generated)
                              
                              . summ recount
                              
                                  Variable |        Obs        Mean    Std. Dev.       Min        Max
                              -------------+---------------------------------------------------------
                                   recount |          0
                              There are 106 full-time student spells that extend after the interview date but this is due to an approximation in the process of extracting the information and anyway these spells do not account for work experience.

                              For the final problem, I think I just need to get on with it and consider this as a limitation of the dataset. The code is finally what I am looking for, thanks to you now I have a much more understanding of not only STATA but of the dataset.

                              Last thing, regarding the frame command, I have researched it (seems quite complicated as well) and understood that I cannot run it on my Stata 14.2. Will this require me to create two different datasets, save, then merge them and then modify the code here:

                              Code:
                               // SEPARATE THE DATA INTO INTERVIEWS AND SPELLS OF EMPLOYMENT
                              capture frame drop interviews
                              frame put pidp Wave wave_year, into(interviews)
                              frame interviews {
                                  duplicates drop
                                   isid pidp wave_year
                              }
                              
                              //  LINK TO THE INTERVIEW DATA
                              frlink 1:1 pidp year, frame(interviews pidp wave_year)
                              frget Wave, from(interviews)
                              drop interviews frame
                              drop interviews
                              Thanks Clyde
                              Last edited by Pio Medolla; 02 Dec 2021, 16:17.

                              Comment

                              Working...
                              X