Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Spell data - values between two spells, imputation

    Dear community,

    I have spell data on employment with the corresponding firm id if employed.
    I have calculated the tenure duration over time (tenure_cum) and the total tenure duration in a firm (tenure_max) according to the rule I imposed, which is: consider spell of unemployment (with no firm id) as employment spell if the unemployment duration between 2 employment spells in the same firm ist < 6 months.


    I now have the dataset of this form:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long persnr int begepi long endepi float(spell duration employed same_firm tenure_cum tenure_max)
    1 10768 10769 32  2 1 0  2 43
    1 10856 10868 33 13 0 .  .  .
    1 10869 10924 34 56 0 .  .  .
    1 10932 10957 35 26 1 1 28 43
    1 10958 10972 36 15 1 1 43 43
    end
    format %d begepi
    format %d endepi


    And would like to obtain this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long persnr int begepi long endepi float(spell duration employed same_firm tenure_cum tenure_max)
    1 10768 10769 32  2 1 0  2 43
    1 10856 10868 33 13 0 1  . 43
    1 10869 10924 34 56 0 1  . 43
    1 10932 10957 35 26 1 1 28 43
    1 10958 10972 36 15 1 1 43 43
    end
    format %d begepi
    format %d endepi
    This refers to each firm id, keeping in mind that for the same firm id I can have different tenures for the same person over time.
    Do you have any suggestions on this?

    Thanks in advance.
    Last edited by Inna Petrunyk; 16 Mar 2019, 12:21.

  • #2
    I'm not completely sure I understand, but if I do, what you want is:

    Code:
    by persnr (begepi), sort: gen spell2 = sum(employed != employed[_n-1])
    by persnr spell2 (begepi), sort: gen duration2 = endepi[_N] - begepi[1]
    
    by persnr (begepi), sort: replace tenure_max = tenure_max[_n-1] ///
        if missing(tenure_max) & duration2 <= 183 // 6 MONTHS = 183 DAYS

    Comment


    • #3
      Sorry, let me start from the original dataset that I have. I think the auxiliary variables I created may have confused you.

      This is an example of the dataset in which I have spells with labor market status (status), only for employment spells I have also the corresponding establishment identifier (firmid).

      I want to calculate tenure over time, i.e, for each persnr and firmid over time I want to create a variable with information on total employment duration in a firmid, allowing interruptions < 6 months.
      It is here to consider that a person may be employed in the same firmid at differnt points of time with employment relationship in other firms in between. This implies that for a person having been employed in a certain firm I can have different tenures, depending at what time point I observe the person. So the variable tenure will be constant for person-firmid if the interruption of employment in that firm is short, and then there may another value for tenure for the same person-firmid several years later, for example.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(persnr firmid) int begepi long endepi byte status float(spell duration)
      337837 50249512  5479  5843  4 20  365
      337837 50249512  5844  6209  4 21  366
      337837      503  6210  6574  4 22  365
      337837      503  6575  6939  4 23  365
      337837      503  6940  7304  4 24  365
      337837 50249512  7305  7452  4 25  148
      337837        .  7453  7670  7 26  218
      337837 50249512  7881  8035  4 27  155
      337837 50249512  8036  8127  4 28   92
      337837        .  8128  8400  7 29  273
      337837        .  8401  8486 10 30   86
      337837 50249512  8487  8765  4 31  279
      337837      503  8766  9131  4 32  366
      337837      503  9132  9496  4 33  365
      337837 50249512  9497  9861  4 34  365
      337837 50249512  9862 10047  4 35  186
      337837        . 10133 10147 19 36   15
      337837 50249512 10227 10297  4 37   71
      337837        . 10303 10419 19 38  117
      337837        . 10420 10422 19 39    3
      337837        . 10423 10471 19 40   49
      337837 50249512 10472 10513  4 41   42
      337837 50249512 10514 10552  4 42   39
      337837        . 10553 10592 10 43   40
      337837        . 10593 10762 10 44  170
      337837        . 10839 10957 10 45  119
      337837        . 10958 11314 10 46  357
      337837        . 11315 11319 10 47    5
      345476 50250994  5479  5843  4  9  365
      345476 50250994  5844  6008  4 10  165
      345476        .  6009  7302  1 11 1294
      345476        .  7303  8763  1 12 1461
      345476 50509895  9315  9496  4 13  182
      345476 50509895  9497  9695  4 14  199
      345476        .  9696  9703 20 15    8
      345476        .  9704  9852 20 16  149
      345476        .  9853  9861 20 17    9
      345476 50509895  9862 10176  4 18  315
      437068 50484498  5479  5843  4  4  365
      437068 50484498  5844  6209  4  5  366
      437068 50380719  6210  6574  4  6  365
      437068 50380719  6575  6939  4  7  365
      437068 50484498  6940  7304  4  8  365
      437068 50484498  7305  7357  4  9   53
      437068 50303829  7668  7670  4 10    3
      437068      503  7671  8035  4 11  365
      437068      503  8036  8083  4 12   48
      437068 50487075  8084  8200  4 13  117
      437068 50268647  8401  8588  4 14  188
      437068 50487075  8589  8678  4 15   90
      437068 50469889  8681  8696  4 16   16
      437068 50433174  8749  8750  4 17    2
      437068        .  8760  8765 10 18    6
      437068        .  8766  8993 10 19  228
      437068 50298965  8994  9025  4 20   32
      437068 50298965  9132  9496  4 21  365
      437068 50298965  9497  9861  4 22  365
      437068 50488955  9912 10091  4 23  180
      437068        . 10110 10113 10 24  104
      437068        . 10114 10124 10 25   13
      437068        . 10128 10134 10 26   30
      437068 50488955 10154 10330  4 27   74
      437068        . 10436 10492 10 28   57
      437068        . 10493 10592 10 29  100
      437068        . 10593 10702 10 30  110
      437068 50488955 10703 10721  4 31   19
      437068 50380719 10768 10769  4 32    2
      437068        . 10856 10868 10 33   13
      437068        . 10869 10924 10 34   56
      437068 50380719 10932 10957  4 35   26
      437068 50380719 10958 10972  4 36   15
      437068 50247116 11220 11322  4 37  103
      437068 50247116 11323 11687  4 38  365
      437068 50247116 11688 11763  4 39   76
      437068 50452297 12139 12293  4 40  155
      437068        . 12327 12391  7 41   65
      437068 50452297 12618 12671  4 42   54
      437068        . 12672 12783  7 43  112
      437068        . 12991 13112  7 44  122
      end
      format %d begepi
      format %d endepi
      label values firmid betnr_de
      label values status status
      label def status 1 "1 school", modify
      label def status 4 "4 employed", modify
      label def status 7 "7 unemployed", modify
      label def status 10 "7 unemployed", modify
      label def status 19 "19 ill", modify
      label def status 20 "20 maternity", modify

      With the following code I calculated tenure durations and finally date of notice period (you're fired).

      Code:
      sort persnr firmid spell
      
      keep if firmid < .
      
      gen        tenure = duration
      replace    tenure = tenure + tenure[_n-1] if (persnr == persnr[_n-1] & ///
                           firmid == firmid[_n-1] & begepi < endepi[_n-1] + 182)
      
      gen tenure_d = 0
      replace tenure_d = 1 if (persnr == persnr[_n-1] & firmid == firmid[_n-1] & ///
                                begepi < endepi[_n-1] + 182)
      
      
      gen tenure_fine = tenure if (tenure_d[_n+1] == 0)
      gsort persnr firmid -begepi
      replace tenure_fine = tenure_fine[_n-1] if (persnr == persnr[_n-1]) & ///
                             (tenure_d[_n-1] == 1)
      
      sort persnr spell
      save data_small, replace
      
      
      use data, clear
      
      sort persnr spell
      merge 1:1 persnr firmid begepi using data_small
      drop _merge
      sort persnr spell
      
      save data_final, replace
      
      
      gen tenure_y = tenure_fine/365.25
      gen tenure_years = round(tenure_y, .001)
      drop tenure_y
      
      gen        notice_period_years = .
      replace notice_period_years = 0.039 if (tenure_years <= 0.5)
      replace notice_period_years = 0.077 if (tenure_years >= 0.583    &    tenure_years < 2)
      replace notice_period_years = 0.083 if (tenure_years >= 2        &    tenure_years < 5)
      replace notice_period_years = 0.167 if (tenure_years >= 5        &    tenure_years < 8)
      replace notice_period_years = 0.250 if (tenure_years >= 8        &    tenure_years < 10)
      replace notice_period_years = 0.333 if (tenure_years >= 10        &    tenure_years < 12)
      replace notice_period_years = 0.417 if (tenure_years >= 12        &    tenure_years < 15)
      replace notice_period_years = 0.500 if (tenure_years >= 15        &    tenure_years < 20)
      replace notice_period_years = 0.583 if (tenure_years >= 20        &    tenure_years <  .)
      
      gen notice_period_days = notice_period_years*365.25
      
      
      gen notice_period_date= endepi - notice_period_days + 1 if (tenure == tenure_fine)
      format notice_period_date %td
      gsort persnr firmid -begepi
      replace notice_period_date = notice_period_date[_n-1] if (persnr == persnr[_n-1]) & ///
                             (tenure_d[_n-1] == 1)
      
      sort persnr spell
      
      save data_notice, replace
      Now I want to paste this notice date (or tenure, the code will be the same), which is a constant within an employment relationship (including periods of interruptions < 6 months), to all the observations with missings (e.g. due to short-term illness) that refer to that employment relationship.
      Last edited by Inna Petrunyk; 17 Mar 2019, 07:56.

      Comment


      • #4
        I'm sorry, but I don't think I understand. If I am understanding you correctly, there aren't any such short interruptions in your example data. If there are, can you point them out? If not, can you provide an example where there are some, and say where they are in the example?

        As an aside, looking at the code you've used, you can save yourself a lot of keystrokes (and probably a lot of errors and some time) by making effective use of Stata's -by:- prefix. For example:

        Code:
        replace    tenure = tenure + tenure[_n-1] if (persnr == persnr[_n-1] & ///
                             firmid == firmid[_n-1] & begepi < endepi[_n-1] + 182)
        can be much more efficiently coded as:

        Code:
        by persnr firmid, sort: replace tenure = tenure + tenure[_n-1] if begepi < endepi[_n-1]  + 182
        If you are not familiar with -by:- you are missing out on one of Stata's best and most important features. -help by-

        Comment


        • #5
          Thanks. You're right, I can also use the prefix by.

          Here is an example with employment spell interruptions due to illness (status=19):

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long(persnr firmid) int begepi long endepi byte status float(spell duration tenure tenure_d tenure_fine)
          337837 50249512  9497  9861  4 34 365 365 0 703
          337837 50249512  9862 10047  4 35 186 551 1 703
          337837        . 10133 10147 19 36  15   . .   .
          337837 50249512 10227 10297  4 37  71 622 1 703
          337837        . 10303 10419 19 38 117   . .   .
          337837        . 10420 10422 19 39   3   . .   .
          337837        . 10423 10471 19 40  49   . .   .
          337837 50249512 10472 10513  4 41  42 664 1 703
          337837 50249512 10514 10552  4 42  39 703 1 703
          end
          format %d begepi
          format %d endepi
          label values firmid betnr_de
          label values status status
          Then, if I want to include these short interruptions between employment spells in the same firm in the calculation of tenure duration, I suppose I should change the corresponding part of the code from above as follows:

          Code:
          * gen tenure = duration
            by persnr firmid, sort : replace tenure = tenure + tenure[_n-1] + begepi-endepi[_n-1]-1 ///
                                               if (begepi < endepi[_n-1] + 182))
          Last edited by Inna Petrunyk; 17 Mar 2019, 13:47.

          Comment


          • #6
            I don't think your code in the last box will work, because in the observations you want to extend the tenure into, tenure starts out as missing, so it will remain missing. Also, if I understood you correctly, you do not want to increase the tenure by the amount of the (brief) absence--you want to keep it where it was previously, am I right?

            If I have that right, then this code should do it:

            Code:
            gen long last_employer = firmid, after(firmid)
            by persnr (begepi), sort: replace last_employer = last_employer[_n-1] ///
                if missing(last_employer)
            gen low = begepi - 182
            gen high = begepi - 1
            format low high %td
            rangestat (max) last_tenure = tenure, by(persnr last_employer) interval(endepi low high)
            replace tenure = last_tenure if missing(tenure)
            drop low high last_employer
            -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.

            Comment


            • #7
              Your're right, the box works only with non-missing values in tenure, which is the case if I first keep if betnr < . and then merge with the previously saved dta.
              Anyway, your code is perfect if brief interruptions should not be included in the tenure. In fact, I looked for a solution for both cases.
              Thanks a lot for your help, also for showing how rangestat works.

              Could you, please, explain me why I get different results when using this code:

              Code:
              gen        tenure = duration
              replace    tenure = tenure + tenure[_n-1] if (persnr == persnr[_n-1] &  ///
                                   firmid == firmid[_n-1] &  begepi < endepi[_n-1] + 182)

              and the one you suggested:

              Code:
              gen        tenure = duration
              by persnr firmid, sort: replace  tenure = tenure + tenure[_n-1] if ///
                                      begepi < endepi[_n-1] + 182
              starting from this dataset with no missings:


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long(persnr firmid) int begepi long endepi byte status float(spell duration)
              337837      503  6210  6574 4 22 365
              337837      503  6575  6939 4 23 365
              337837      503  6940  7304 4 24 365
              337837      503  8766  9131 4 32 366
              337837      503  9132  9496 4 33 365
              337837 50249512  5479  5843 4 20 365
              337837 50249512  5844  6209 4 21 366
              337837 50249512  7305  7452 4 25 148
              337837 50249512  7881  8035 4 27 155
              337837 50249512  8036  8127 4 28  92
              337837 50249512  8487  8765 4 31 279
              337837 50249512  9497  9861 4 34 365
              337837 50249512  9862 10047 4 35 186
              337837 50249512 10227 10297 4 37  71
              337837 50249512 10472 10513 4 41  42
              337837 50249512 10514 10552 4 42  39
              345476 50250994  5479  5843 4  9 365
              345476 50250994  5844  6008 4 10 165
              345476 50509895  9315  9496 4 13 182
              345476 50509895  9497  9695 4 14 199
              345476 50509895  9862 10176 4 18 315
              437068      503  7671  8035 4 11 365
              437068      503  8036  8083 4 12  48
              437068 50247116 11220 11322 4 37 103
              437068 50247116 11323 11687 4 38 365
              437068 50247116 11688 11763 4 39  76
              437068 50268647  8401  8588 4 14 188
              437068 50298965  8994  9025 4 20  32
              437068 50298965  9132  9496 4 21 365
              437068 50298965  9497  9861 4 22 365
              437068 50303829  7668  7670 4 10   3
              437068 50380719  6210  6574 4  6 365
              437068 50380719  6575  6939 4  7 365
              437068 50380719 10768 10769 4 32   2
              437068 50380719 10932 10957 4 35  26
              437068 50380719 10958 10972 4 36  15
              437068 50433174  8749  8750 4 17   2
              437068 50452297 12139 12293 4 40 155
              437068 50452297 12618 12671 4 42  54
              437068 50469889  8681  8696 4 16  16
              437068 50484498  5479  5843 4  4 365
              437068 50484498  5844  6209 4  5 366
              437068 50484498  6940  7304 4  8 365
              437068 50484498  7305  7357 4  9  53
              437068 50487075  8084  8200 4 13 117
              437068 50487075  8589  8678 4 15  90
              437068 50488955  9912 10091 4 23 180
              437068 50488955 10154 10330 4 27  74
              437068 50488955 10703 10721 4 31  19
              end
              format %d begepi
              format %d endepi
              label values firmid betnr_de
              label values status status
              label def status 4 "4 employed", modify
              Last edited by Inna Petrunyk; 18 Mar 2019, 04:31.

              Comment


              • #8
                There is a difference between them, and both of them are actually wrong in two ways! Sorry about what I wrote in #4--too quick and I didn't test it!

                The difference between them arises because the version with -by- sorts the data before it runs, and because persnr and firmid do not uniquely identify observations, the sort order is indeterminate, and following Stata's general rule, it randomizes the order of observations within combinations of persnr and firmid. So the order of the data is different when these codes run, which implies that varname[_n-1] has different values. In order for either of these codes to be correct, it is important that the sort order also account for the begepi variable.

                The other problem is that the reference to varname[_n-1] will always be missing value when _n == 1. Not only does that get things wrong in the first observation, but the code will then continue to propagate missing values into subsequent observations until the -if- condition finally fails due to a long episode.

                So the correct ways to do this are:

                Code:
                sort persnr firmid begepi
                gen        tenure = duration
                replace    tenure = tenure + tenure[_n-1] if (persnr == persnr[_n-1] &  ///
                                     firmid == firmid[_n-1] &  begepi < endepi[_n-1] + 182 & _n > 1)
                
                gen        tenure2 = duration
                by persnr firmid (begepi), sort: replace  tenure2 = tenure2 + tenure2[_n-1] if ///
                                        begepi < endepi[_n-1] + 182    & _n > 1
                                        
                assert tenure == tenure2

                Comment


                • #9
                  Ok, now I see. I actually have done sort by persnr firmid spell as in #3, that's why I got correct results.
                  I will add the option _n > 1 to my code. Thanks a lot.

                  Comment

                  Working...
                  X