Hello everyone,
I had a question regarding my dataset cleaning which I hope you would be able to help me out
Disclaimer: I am using Stata/MP version 15.1
I am calculating team diversity using panel data; looking at diversity at year 0-year x and so calculating team diversity of the team that was present during a specific time period to account for changes in the composition.
However, sometimes I encounter the following issues:
1) Same person (based on employee number = DMUci ) has multiple positions within the team (eg. Manager and Administrator), sometimes with a different appointment date. However, I do not want to count them double.
So, I would like to only count them once, as of the first time they joined the team.
2) The same person left the team and then came back (based on AppointmentDates and ResignationDates), I only want to count them for the time period they were in the team. (It could even be that the resignation date of the last job position matches the appointment date of the next position)
I am not sure how to handle both issues simultaneously.
FYI: DMUci = employee identification number
BvdIdNumber= company identification number
For the first issue I would have used
However, this is still missing code to differentiate based on AppointmentDate, so the observation with the oldest AppointmentDate is tagged 1 (so I can remove the observations with a 0).
Additionally, this does not account for the second issue
Here I would have used the following code if it weren't for the first issue of a person holding multiple positions at once
Below an excerpt of my data using dataex, with some of the special cases highlighted. If more info is needed please let me know.
Thank you in advance for your help!
Best regards,
Laura
I had a question regarding my dataset cleaning which I hope you would be able to help me out
Disclaimer: I am using Stata/MP version 15.1
I am calculating team diversity using panel data; looking at diversity at year 0-year x and so calculating team diversity of the team that was present during a specific time period to account for changes in the composition.
However, sometimes I encounter the following issues:
1) Same person (based on employee number = DMUci ) has multiple positions within the team (eg. Manager and Administrator), sometimes with a different appointment date. However, I do not want to count them double.
So, I would like to only count them once, as of the first time they joined the team.
2) The same person left the team and then came back (based on AppointmentDates and ResignationDates), I only want to count them for the time period they were in the team. (It could even be that the resignation date of the last job position matches the appointment date of the next position)
I am not sure how to handle both issues simultaneously.
FYI: DMUci = employee identification number
BvdIdNumber= company identification number
For the first issue I would have used
Code:
egen distinctuci = tag(BvdIdNumber DMUci)
Additionally, this does not account for the second issue
Here I would have used the following code if it weren't for the first issue of a person holding multiple positions at once
Code:
egen distinctuci = tag(BvdIdNumber DMUci DMAppointmentDate)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str16 BvdIdNumber str10 DMUci float(AppointmentDate ResignationDate) byte distinctuci "BE0518917237" "P042037775" 22140 . 0 "BE0518917237" "P042037775" 19444 . 1 "BE0518917237" "P044189380" 19414 . 1 "CHCHE112126430" "P133073696" 22307 22307 0 "CHCHE112126430" "P133073695" 16972 22307 0 "CHCHE112126430" "P133073695" 18743 22307 0 "CHCHE112126430" "P133073695" 22307 22307 1 "CHCHE112126430" "P133073696" 16972 22307 1 "CHCHE112166642" "P072822777" 16500 19220 1 "CHCHE112166642" "P072822777" 21627 21627 0 "CHCHE112166642" "P110502195" 16500 19220 1 "CHCHE112166642" "P072822777" 19220 21627 0 "CHCHE112180352" "P299671617" 16500 21602 0 "CHCHE112180352" "P110585610" 21602 21602 0 "CHCHE112180352" "P110585610" 16500 21602 0 "GB05752302" "P048875232" 16882 17209 0 "GB05752302" "P005709232" . . 0 "GB05752302" "P005709232" 16882 . 0 "GB05752302" "P048875232" 17302 . 0 "GB05752555" "P005709386" . . 0 "GB05752555" "P005709387" 16882 17871 0 "GB05752555" "P005709387" 16882 17871 1 "GB05752555" "P005709386" 16882 . 1 "IT06753520961" "P204165295" 21732 21914 1 "IT06753520961" "P204165296" 22091 22280 1 "IT06753520961" "P204165295" 21360 21549 0 "IT06753520961" "P204165296" 21360 21549 0 "IT06753520961" "P204165295" 22091 22280 0 "IT06753520961" "P204165296" 21732 21914 0 end format %td AppointmentDate format %td ResignationDate
Best regards,
Laura
Comment