Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating new variables (change_1,change_n) according to changes in variable A and/or B

    Dear all,

    I was hoping someone would be able to help me figure out the following:

    I want to create new variables (change_1, change_2...change_n), that contains the date of when a change in the team happened (someone left/joined).
    The variables used to track this are 'AppointmentDate' (joined) and 'ResignationDate' (left).
    Whenever a change happens, I want to create a new variable 'change_1', 'change_2'...., and it contains either the appointment or resignation date.

    BvdIdNumber = the company number (used as a grouping variable).
    UCI = employee number


    It could be that some firms only have 1 change vs others having multiple during their lifecycle, therefore that I wanted to not only look at fixed time periods (eg. years) as for some firms this means no change happened, while for others many team changes have occurred.
    Additionally, if possible, I would only like to track changes as of 1 year (potentially 1/2 year) after the company's start (DateofIncorporation) not to count the founding team as a change.
    Small side note: There are still some duplicates in the data (aka same employee number and same appointment/resignation date), do I need to get rid of these first?


    My goal with this is to create a wide dataset that I can then, after removing individual-level variables, turn into a long format to perform a survival analysis (stcox).
    Rather than using only years or months to identify team composition, I would also try my hand at looking at changes and calculating gender/age/nationality diversity after each change (perhaps using stsplit).
    I would then based on the date in the variable calculate a diversity measure for all people present at that time.

    Would anyone know how to start coding this or point me in the right direction?
    I have not been able to find what I am looking for, but it could also be me not using the right vocab.

    Thanks in advance for your help!
    Below is an excerpt of my data using dataex


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str16 BvdIdNumber str10 DateOfIncorporation float(FirmFailure2 DateFailure) str10(DMUci DMAppointmentDate DMResignationDate) float DMGenderDummy str10 DMBirthdate str27 Nationality
    "AT9070350951" "2016"       1     . "P039706490" "04/05/2016" ""           0 "1978"       "Austria"       
    "AT9070350951" "2016"       1     . "P039706490" "04/05/2016" ""           0 "1978"       "Austria"       
    "AT9070350951" "2016"       1 22410 "P039706490" "04/05/2016" ""           0 "1978"       "Austria"       
    "AT9070350951" "2016"       1 22410 "P206368021" "04/05/2016" ""           0 "1961"       "Austria"       
    "AT9070350951" "2016"       1 22410 "P039706490" "04/05/2016" ""           0 "1978"       "Austria"       
    "AT9070350951" "2016"       1     . "P206368021" "04/05/2016" ""           0 "1961"       "Austria"       
    "AT9070422953" "11/12/2017" 1 21844 "P301627755" "18/09/2018" "16/09/2019" 0 "10/03/1995" "Austria"       
    "AT9070422953" "11/12/2017" 1 21844 "P300272480" "24/01/2018" ""           0 "17/04/1993" "Austria"       
    "AT9110939024" "08/01/2016" 1 21825 "P117216063" "04/02/2016" ""           0 "1968"       "Germany"       
    "AT9110939024" "08/01/2016" 1 21825 "P001968694" "04/02/2016" ""           0 "30/03/1962" "Germany"       
    "BE0536436625" "28/06/2013" 1 20438 "P009351640" "31/01/2015" "16/12/2015" 0 "1967"       "Canada"        
    "BE0536436625" "28/06/2013" 1 20438 "P009351700" "31/01/2015" "16/12/2015" 0 "1960"       "United Kingdom"
    "BE0536436625" "28/06/2013" 1     . "P009351640" "31/01/2015" "16/12/2015" 0 "1967"       "Canada"        
    "BE0536436625" "28/06/2013" 1 20438 "P048688796" "31/01/2015" "16/12/2015" 0 "1971"       "United Kingdom"
    "BE0536436625" "28/06/2013" 1     . "P009351700" "31/01/2015" "16/12/2015" 0 "1960"       "United Kingdom"
    "BE0536436625" "28/06/2013" 1     . "P048688796" "31/01/2015" "16/12/2015" 0 "1971"       "United Kingdom"
    "BE0536436625" "28/06/2013" 1 20438 "P014424187" "28/06/2013" ""           0 "1956"       "Canada"        
    "BE0536436625" "28/06/2013" 1     . "P014424187" "28/06/2013" ""           0 "1956"       "Canada"        
    "BE0555980046" "10/07/2014" 1 21236 "P039659380" "05/08/2015" "21/02/2018" 0 "1965"       "United Kingdom"
    "BE0555980046" "10/07/2014" 1 21236 "P076509994" "10/07/2014" ""           0 "11/11/1970" "France"        
    "BE0656918246" "22/06/2016" 1 21081 "P349820174" "22/06/2016" "23/06/2020" 0 "06/11/1980" "Bulgaria"      
    "BE0656918246" "22/06/2016" 1 21081 "P141888237" "24/11/2016" ""           0 "31/07/1989" "Bulgaria"      
    "BE0656918246" "22/06/2016" 1 21081 "P349820174" "22/06/2016" "23/06/2020" 0 "06/11/1980" "Bulgaria"      
    "BE0656918246" "22/06/2016" 1 21081 "P141888237" "24/11/2016" ""           0 "31/07/1989" "Bulgaria"      
    "BE0657890622" "30/06/2016" 1 21720 "P077578372" "30/06/2016" "20/06/2019" 0 "19/11/1964" "France"        
    "BE0657890622" "30/06/2016" 1 21720 "P350651532" "30/06/2016" "20/06/2019" 1 "10/11/1965" "France"        
    "BE0664736644" "17/10/2016" 1 21181 "P365060689" "17/10/2016" ""           0 "22/01/1981" "Netherlands"   
    "BE0664736644" "17/10/2016" 1 21181 "P438431943" "17/10/2016" ""           1 "11/12/1991" "Romania"       
    "BE0664799594" "18/10/2016" 1 21906 "P006533913" "18/10/2016" "23/12/2019" 0 "16/02/1968" "United Kingdom"
    "BE0664799594" "18/10/2016" 1 21906 "P263565204" "18/10/2016" "23/12/2019" 0 "24/08/1939" "Greece"        
    "BE0664799594" "18/10/2016" 1 21906 "P006533911" "18/10/2016" "23/12/2019" 0 "29/07/1969" "Greece"        
    "BE0808687713" "23/12/2008" 1     . "P068611593" "02/01/2011" "03/10/2017" 0 "15/01/1968" "Denmark"       
    "BE0808687713" "23/12/2008" 1 19837 "P068611593" "02/01/2011" "03/10/2017" 0 "15/01/1968" "Denmark"       
    end
    format %td DateFailure

    Best regards,
    Laura

  • #2
    Code:
    //  REPLACE STRING VARIABLES THAT LOOK LIKE DATES WITH STATA NUMERIC DATE VARIABLES
    foreach v of varlist DMAppointmentDate DMResignationDate {
        gen _`v' = daily(`v', "DMY"), after(`v')
        assert missing(_`v') == missing(`v')
        format _`v' %td
        drop `v'
    }
    
    gen _DateOfIncorporation = daily(DateOfIncorporation, "DMY"), after(DateOfIncorporation)
    replace _DateOfIncorporation = mdy(1, 1, real(DateOfIncorporation)) ///
        if missing(_DateOfIncorporation)
    assert missing(_DateOfIncorporation) == missing(DateOfIncorporation)
    format _DateOfIncorporation %td
    drop DateOfIncorporation
    rename _* *
    
    
    frame put BvdIdNumber DMAppointmentDate DMResignationDate DateOfIncorporation, ///
        into(changes)
    frame change changes
    gen `c(obs_t)' obs_no = _n
    reshape long DM@Date, i(obs_no) j(event) string
    drop if missing(DMDate)
    drop obs_no event
    duplicates drop
    drop if DMDate - DateOfIncorporation < 365 // DON'T COUNT CHANGES IN FIRST YEAR
    by BvdIdNumber (DMDate), sort: gen _j = _n
    rename DMDate change_
    reshape wide change_, i(BvdIdNumber) j(_j)
    At the end of this code, the original data will still be in the default frame, except that the useless string dates have been replaced by Stata numeric dates. The contents of frame changes will be a reduced data set containing each firm's BvdIdNumber and a series of change_* variables containing the corresponding appointment and resignation dates. Change events occurring within 365 days of incorporation are excluded.

    Your DateOfIncorporation variable is sometimes given as a complete date, and sometimes as just the year. For those where only the year is given, I have imputed an incorporation date of 1 January of that year. If that doesn't seem appropriate to you, you can change that accordingly.

    Note that if two or more change events occur on the same date, they are counted as a single change

    You do not have to drop the duplicate observations from your data in order for this particular code to work properly. But it is seldom a good idea to use a data set that has purely duplicate observations. In fact, just having such a data set usually means that something went wrong in the data management that created it. So I urge you to carefully review how this data set was created, and fix any errors you find in the process. If you cannot find any errors (other than perhaps forgetting to drop duplicate observations at a point where they were created without error) then -duplicates drop- will get rid of the duplicates for you easily.

    Comment


    • #3
      Clyde Schechter, thank you for your help and advice! Also thanks for pointing out the issues with my date variable!

      I am not yet versed in the use of frames, but will read up about it now, as it seems quite useful for regular use (until now I have been saving each dataset separately in a different file).
      However, trying to run the file, I immediately get the error "command frame is unrecognized".

      I will look into this, but in the meantime, I ran your code on the dataex sample, and it produced the desired outcome!
      Thank you very much!

      Best regards,
      Laura

      Comment


      • #4
        However, trying to run the file, I immediately get the error "command frame is unrecognized".
        That almost certainly means you are using a version of Stata from before version 16, which is when frames were introduced. The Forum FAQ does ask people to explicitly state in their posts what version of Stata they are using if it is not the current one, which is presently 17.

        Here is a revised version of the code that does not rely on frames. As a side effect, the original data is lost. Since the original data set contains those dysfunctional string date variables, what I suggest you do is, at the point indicated below by the comment, -save- the data in a new data set. Then, when you need to go back to the individual level data, but with numeric date variables, you can. At the end of the code, the data set of BvdIdNumbers and associated change variables is in memory.

        Code:
        //  REPLACE STRING VARIABLES THAT LOOK LIKE DATES WITH STATA NUMERIC DATE VARIABLES
        foreach v of varlist DMAppointmentDate DMResignationDate {
            gen _`v' = daily(`v', "DMY"), after(`v')
            assert missing(_`v') == missing(`v')
            format _`v' %td
            drop `v'
        }
        
        gen _DateOfIncorporation = daily(DateOfIncorporation, "DMY"), after(DateOfIncorporation)
        replace _DateOfIncorporation = mdy(1, 1, real(DateOfIncorporation)) ///
            if missing(_DateOfIncorporation)
        assert missing(_DateOfIncorporation) == missing(DateOfIncorporation)
        format _DateOfIncorporation %td
        drop DateOfIncorporation
        rename _* *
        // I SUGGEST YOU STORE THIS DATA IN A FILE NOW
        
        
        keep BvdIdNumber DMAppointmentDate DMResignationDate DateOfIncorporation
        gen `c(obs_t)' obs_no = _n
        reshape long DM@Date, i(obs_no) j(event) string
        drop if missing(DMDate)
        drop obs_no event
        duplicates drop
        drop if DMDate - DateOfIncorporation < 365 // DON'T COUNT CHANGES IN FIRST YEAR
        by BvdIdNumber (DMDate), sort: gen _j = _n
        rename DMDate change_
        reshape wide change_, i(BvdIdNumber) j(_j)


        Comment


        • #5
          Hi Clyde Schechter, sorry for not mentioning it, I completely forgot. I checked and indeed my organization uses Stata 15.1. Will make sure I mention this in the future, thanks for making me aware!

          Thanks for the revised code!

          Comment

          Working...
          X