Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Team composition - account for employee having multiple positions (aka double counting)

    Hello everyone,

    I had a question regarding my dataset cleaning which I hope you would be able to help me out
    Disclaimer: I am using Stata/MP version 15.1

    I am calculating team diversity using panel data; looking at diversity at year 0-year x and so calculating team diversity of the team that was present during a specific time period to account for changes in the composition.
    However, sometimes I encounter the following issues:

    1) Same person (based on employee number = DMUci ) has multiple positions within the team (eg. Manager and Administrator), sometimes with a different appointment date. However, I do not want to count them double.
    So, I would like to only count them once, as of the first time they joined the team.

    2) The same person left the team and then came back (based on AppointmentDates and ResignationDates), I only want to count them for the time period they were in the team. (It could even be that the resignation date of the last job position matches the appointment date of the next position)

    I am not sure how to handle both issues simultaneously.
    FYI: DMUci = employee identification number
    BvdIdNumber= company identification number

    For the first issue I would have used
    Code:
    egen distinctuci = tag(BvdIdNumber DMUci)
    However, this is still missing code to differentiate based on AppointmentDate, so the observation with the oldest AppointmentDate is tagged 1 (so I can remove the observations with a 0).

    Additionally, this does not account for the second issue
    Here I would have used the following code if it weren't for the first issue of a person holding multiple positions at once
    Code:
    egen distinctuci = tag(BvdIdNumber DMUci DMAppointmentDate)
    Below an excerpt of my data using dataex, with some of the special cases highlighted. If more info is needed please let me know.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str16 BvdIdNumber str10 DMUci float(AppointmentDate ResignationDate) byte distinctuci
    "BE0518917237"   "P042037775" 22140     . 0
    "BE0518917237"   "P042037775" 19444     . 1
    "BE0518917237"   "P044189380" 19414     . 1
    "CHCHE112126430" "P133073696" 22307 22307 0
    "CHCHE112126430" "P133073695" 16972 22307 0
    "CHCHE112126430" "P133073695" 18743 22307 0
    "CHCHE112126430" "P133073695" 22307 22307 1
    "CHCHE112126430" "P133073696" 16972 22307 1
    "CHCHE112166642" "P072822777" 16500 19220 1
    "CHCHE112166642" "P072822777" 21627 21627 0
    "CHCHE112166642" "P110502195" 16500 19220 1
    "CHCHE112166642" "P072822777" 19220 21627 0
    "CHCHE112180352" "P299671617" 16500 21602 0
    "CHCHE112180352" "P110585610" 21602 21602 0
    "CHCHE112180352" "P110585610" 16500 21602 0
    "GB05752302"     "P048875232" 16882 17209 0
    "GB05752302"     "P005709232"     .     . 0
    "GB05752302"     "P005709232" 16882     . 0
    "GB05752302"     "P048875232" 17302     . 0
    "GB05752555"     "P005709386"     .     . 0
    "GB05752555"     "P005709387" 16882 17871 0
    "GB05752555"     "P005709387" 16882 17871 1
    "GB05752555"     "P005709386" 16882     . 1
    "IT06753520961"  "P204165295" 21732 21914 1
    "IT06753520961"  "P204165296" 22091 22280 1
    "IT06753520961"  "P204165295" 21360 21549 0
    "IT06753520961"  "P204165296" 21360 21549 0
    "IT06753520961"  "P204165295" 22091 22280 0
    "IT06753520961"  "P204165296" 21732 21914 0
    end
    format %td AppointmentDate
    format %td ResignationDate
    Thank you in advance for your help!
    Best regards,
    Laura

  • #2
    Understand the problems but not the solution. I don't think forcing a distinct indicator will be foolproof. Let's assume someone could have a job A from Jan 2022 to Jun 2022, and job B from Mar 2022 to today, then which line will you keep? They are overlapping, but not nested. According to your scheme, either omitting or including the duplicate will be wrong.

    I also do not agree with the distinction coding in the last company (IT06753520961), these two staff seem to be on an annual contract, and yet only one year was counted, wouldn't this make them non-existing in other two years?

    Given the statement below:
    looking at diversity at year 0-year x and so calculating team diversity of the team that was present during a specific time period to account for changes in the composition.
    It'd perform and egen tag every year, so that you can access who is unique in every year:

    Code:
    gsort BvdIdNumber DMUci
    forvalues y = 2005/2021{
    egen in`y' = tag(DMUci BvdIdNumber) if year(AppointmentDate) <= `y' & year(ResignationDate) >= `y'
    }
    Results:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 BvdIdNumber str10 DMUci float(AppointmentDate ResignationDate) byte(in2005 in2006 in2007 in2008 in2009 in2010 in2011 in2012 in2013 in2014 in2015 in2016 in2017 in2018 in2019 in2020 in2021)
    "BE0518917237"   "P042037775" 22140     . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
    "BE0518917237"   "P042037775" 19444     . 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0
    "BE0518917237"   "P044189380" 19414     . 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
    "CHCHE112126430" "P133073695" 18743 22307 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
    "CHCHE112126430" "P133073695" 22307 22307 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    "CHCHE112126430" "P133073695" 16972 22307 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
    "CHCHE112126430" "P133073696" 22307 22307 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    "CHCHE112126430" "P133073696" 16972 22307 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
    "CHCHE112166642" "P072822777" 16500 19220 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
    "CHCHE112166642" "P072822777" 21627 21627 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    "CHCHE112166642" "P072822777" 19220 21627 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0
    "CHCHE112166642" "P110502195" 16500 19220 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
    "CHCHE112180352" "P110585610" 21602 21602 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    "CHCHE112180352" "P110585610" 16500 21602 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
    "CHCHE112180352" "P299671617" 16500 21602 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
    "GB05752302"     "P005709232"     .     . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    "GB05752302"     "P005709232" 16882     . 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    "GB05752302"     "P048875232" 17302     . 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    "GB05752302"     "P048875232" 16882 17209 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    "GB05752555"     "P005709386"     .     . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    "GB05752555"     "P005709386" 16882     . 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    "GB05752555"     "P005709387" 16882 17871 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
    "GB05752555"     "P005709387" 16882 17871 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    "IT06753520961"  "P204165295" 21360 21549 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    "IT06753520961"  "P204165295" 22091 22280 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
    "IT06753520961"  "P204165295" 21732 21914 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    "IT06753520961"  "P204165296" 21360 21549 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    "IT06753520961"  "P204165296" 21732 21914 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    "IT06753520961"  "P204165296" 22091 22280 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
    end
    format %td AppointmentDate
    format %td ResignationDate
    Then, by some looping and an "if inYYYY == 1", it should be easy to compute annual diversity.
    Last edited by Ken Chui; 29 Mar 2023, 07:14.

    Comment


    • #3
      Hi Ken Chui, thank you for your reply! Yes, it might be better to look at it from a year to year basis.
      I will also try to adapt the code as I'm looking not only at years but also at quarters, months and changes.

      For the quarters and months variables I just created a dummy variable telling me if someone is present during that time (see code below, I used forvalues of the max values possible (a company in my dataset can be max 16y old/64 quarters old/ 191 months old)

      Code:
      forvalues i = 0/16 {
      gen year`i'=0
      
      replace year`i'=1 if AppointmentYear <= YearIncorporation & ResignationDate > YearIncorporation + `i'  | AppointmentYear <= YearIncorporation + `i'  & missing(ResignationDate)
      }
      
      forvalues i = 0/64 {
      gen quarter`i'=0
      
      replace quarter`i'=1 if AppointmentQuarter <= IncorporationQuarter & ResignationQuarter > IncorporationQuarter + `i'  | AppointmentQuarter <= IncorporationQuarter + `i'  & missing(ResignationQuarter)
      }
      
      forvalues i = 0/191 {
      gen month`i'=0
      
      replace month`i'=1 if AppointmentMonth <= IncorporationMonth & ResignationMonth > IncorporationMonth + `i'  | AppointmentMonth <= IncorporationMonth + `i'  & missing(ResignationMonth)
      }
      For change I have variables telling me when a team change occurred (thanks to the help of others (https://www.statalist.org/forums/for...ble-a-and-or-b)), as well as whether they were present or not at the time

      Code:
      forvalues i = 1/27 {
      bys BvdIdNumber: gen change_present`i'=0
      
      replace change_present`i'=1 if AppointmentDate <= change_`i' & ResignationDate >=  change_`i' | AppointmentDate <= change_`i'  & missing(ResignationDate)
      }
      
      forvalues i = 1/27 {
      bys BvdIdNumber: replace change_present`i' =. if missing(change_`i')
      }

      However, I did not account for double counting of employees. So I think following your suggestion might be best of looking at it from a year to year (or other time values) basis.
      Rather than my initial idea of removing them upfront before calculating whether they were present or not.

      Thank you very much for your input!
      Best regards,
      Laura

      Comment


      • #4
        Update: I have changed the above code and thought I found a solution, but it isn't giving me the desired result. But in case it would be of use to someone I will update my progress here.

        Code to determine whether a person was present during a certain time period
        Code:
        forvalues i = 0/16 {
        gen year`i'=0
        
        bysort BvdIdNumber: replace year`i'= 1 if yofd(AppointmentDate) <= yofd(DateIncorporation) +  `i' & yofd(ResignationDate) > yofd(DateIncorporation) + `i' | yofd(AppointmentDate) <= yofd(DateIncorporation) +  `i' & missing(ResignationDate)
        
        }
        
        forvalues i = 0/64 {
        gen quarter`i'=0
        
        bysort BvdIdNumber: replace quarter`i'= 1 if qofd(AppointmentDate) <= qofd(DateIncorporation) +  `i' & qofd(ResignationDate) > qofd(DateIncorporation) + `i' | qofd(AppointmentDate) <= qofd(DateIncorporation) +  `i' & missing(ResignationDate)
        
        }
        
        forvalues i = 0/191 {
        gen month`i'=0
        
        bysort BvdIdNumber: replace month`i'= 1 if mofd(AppointmentDate) <= mofd(DateIncorporation) +  `i' & mofd(ResignationDate) > mofd(DateIncorporation) + `i' | mofd(AppointmentDate) <= mofd(DateIncorporation) +  `i' & missing(ResignationDate)
        
        }
        Code to determine whether during a team change a person was present
        Code:
        forvalues i = 1/27 {
        bys BvdIdNumber: gen change_present`i'=0
        
        replace change_present`i'=1 if AppointmentDate <= change_`i' & ResignationDate >=  change_`i' | AppointmentDate <= change_`i'  & missing(ResignationDate)
        }
        
        forvalues i = 1/27 {
        bys BvdIdNumber: replace change_present`i' =. if missing(change_`i')
        }
        Faulty code to check whether I'm not double counting people
        Code:
        forvalues i = 0/16 {
        egen distinctuci_year`i'= tag(BvdIdNumber DMUci) if year`i' == 1
        
        }
        
        forvalues i = 0/64 {
        egen distinctuci_quarter`i'= tag(BvdIdNumber DMUci) if quarter`i' == 1
        
        }
        
        forvalues i = 0/191 {
        egen distinctuci_month`i'= tag(BvdIdNumber DMUci) if month`i' == 1
        
        }
        
        forvalues i = 1/27 {
        egen distinctuci_change`i'= tag(BvdIdNumber DMUci) if change_present`i' == 1
        
        }

        Comment

        Working...
        X