Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to Identify Individual Observations in Previous years

    Dear all,

    I encounter some challenges in computing a variable in Stata 14.2 from my dataset:

    I want to construct a dummy variable, that tells me if an individual (variable name: person_id), who was working together with other individual(s) in 2007 (the last year of observation) in the same firm (variable name: org_number), has accumulated previous joint working experience in another firm (i.e., prior to 2007 - that would be 2005 and 2006) with at least one of the individuals he/she was working in 2007.

    That means I am interested if an individual has accumulated work experience in the previous years (i.e., 2005 and 2006) with at least one of his/her "2007 coworker/s", in another firm than the one firm they were both working for in 2007 (hence 2007 is the reference year). This should be coded as “1”.

    If the individual had no prior work experience accumulated in the previous years in another firm with his "2007 co-workers", this should be coded as “0.” My dataset has following structure (just an excerpt as confidential):

    CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(person_id org_number2007 org_number2006 org_number2005)
    1 1 1 1
    2 1 1 4
    3 1 1 4
    4 1 1 .
    5 2 4 5
    6 2 7 6
    7 2 7 6
    8 3 4 3
    9 3 4 4
    10 4 8 8
    end
    [/CODE]


    For instance person_id 2 and person_id3 have accumulated joint work experience in year 2005 in a firm that was different to 2007), therefore they should receive a “1” in 2005.

    I encounter difficulties in writing a code that identifies the same individuals that worked together in 2007 and which can track the same individuals to the years before. Is there any command in Stata that could fix this problem?

    Any help is highly appreciated!

    Best,

    Miriam
    Last edited by Miriam Vogel; 07 Jan 2017, 06:39.

  • #2
    Hello Miriam,

    Welcome to the Stata Forum.

    I'm not sure if I understood your query, and, unfortunately, the abridged data did not went correctly under CODE delimiters.

    That said, even though I don't have the time to check it out in Stata now, I believe you may play with

    Code:
    . egen byte younameit = diff(org_number2007 org_number2006 org_number2005)
    The new variable "younameit" will present 1 whenever there is a difference between organization/years, and zero otherwise.

    Shall you wish the opposite, you will just need to recode the variable afterwards.

    Hopefully that helps.

    Best,

    Marcos
    Last edited by Marcos Almeida; 07 Jan 2017, 07:01.
    Best regards,

    Marcos

    Comment


    • #3
      Dear Marcos, thank you for taking the time to reply.

      It seems my question was a bit misleading. I am not interested in the difference between organizations but rather in the individuals that work in the organizations.

      Let me clarify: my Dataset is composed of individuals (which can be identified by Personal IDs). For those individuals it is indicated it in which organization they work for in different years. For instance the individual with the Person_ID == 3 works in 2007 and 2006 in organization 1 (indicated by org_number2007 and org_number2006). In 2005 however, this individual works in organization number 4.

      Now I want to identify all individuals who work together in year 2007. In a second step, I however, I want to know which of the individuals who worked together in 2007 also worked together in a previous year (i.e. 2006 and/or 2005) in another organization (with a different org_number). Thus I am interested if individuals working in the same organization in 2007 previously worked together in another organization.

      So for instance Person ID 1, 2, 3 and 4 worked together in 2007 (indicated by Org_Number2007), then I check in 2006 the Org_Number2006 did not change, so they did not accumulate any work experience together in another organization. When I however check for Org_number2005, I can see that two of them worked in another organization because Org_Number2007 is different from Org_number2005 (because different number).

      Hence I am interested if at least two individuals who worked together in 2007 in the same organization, have acquired joint work experience prior to 2007 in another organization.

      I have coded this variables now manually (named joint_work_experience2006 and jointworkexperience2005) but since I have quite many observations, I would like to know if there is a code which could do that automatically? I hope with the elaborations above it is now easier to understand my question. Thank you!
      Person_ID Org_Number2007 Org_Number2006 Org_Number2005 Joint_work experience2006 Joint_work experience2005
      1 1 1 1 0 0
      2 1 1 4 0 1
      3 1 1 4 0 1
      4 1 1 . 0 .
      5 2 4 5 0 0
      6 2 7 6 1 1
      7 2 7 6 1 1
      8 3 4 3 1 0
      9 3 4 4 1 0
      10 4 8 8 0 0
      Last edited by Miriam Vogel; 07 Jan 2017, 10:27.

      Comment


      • #4
        If I understand what you are looking for, the following code will do it:

        Code:
        input byte(person_id org_number2007 org_number2006 org_number2005)
        1 1 1 1
        2 1 1 4
        3 1 1 4
        4 1 1 .
        5 2 4 5
        6 2 7 6
        7 2 7 6
        8 3 4 3
        9 3 4 4
        10 4 8 8
        end
        
        //    GO TO LONG LAYOUT
        reshape long org_number, i(person_id) j(year)
        
        //    MAKE A COPY OF THE DATA IN ANOTHER FILE
        tempfile copy
        save `copy'
        
        //    JOIN THE DATA TO THE COPY, MATCHING
        //    ORG NUMBER & YEAR TO CREATE A DATA SET
        //    OF PAIRS OF PEOPLE WORKING TOGETHER IN
        //    ANY ORGANIZATION IN ANY YEAR
        rename person_id person_id2
        joinby org_number year using `copy'
        
        //    WORKING WITH YOURSELF DOESN'T COUNT
        drop if person_id == person_id2
        
        //    AND A WITH B IS EQUIVALENT TO B WITH A
        //    SO LOOK ONLY AT PAIRS WITH ID < ID2
        keep if person_id < person_id2
        
        //    NOW IDENTIFY PAIRS OF WORKERS MEETING DESIRED CRITERION,
        //    WHICH HAS TWO COMPONENTS:
        
        //    CRITERION 1: WORKED AT SAME ORGANIZATION IN 2007
        gsort person_id person_id2 -year
        by person_id person_id2: gen byte criterion1 = (year[1] == 2007)
        
        //    CRITERION 2: WORKED AT SAME ORGANIZATION BEFORE 2007,
        //    BUT IT WAS A DIFFERENT ORGANIZATION
        by person_id person_id2: gen byte criterion2 = org_number != org_number[1]
        by person_id person_id2 (criterion2), sort: replace criterion2 = criterion2[_N]
        
        //    FINAL CRITERION: BOTH CRITERION 1 AND CRITERION 2
        gen byte criterion = criterion1 & criterion2
        At the end, the observations with criterion ==1 identify precisely those pairs of person_id's (person_id and person_id2) who worked at the same organization in 2007 and also previously worked together at some different organization.

        By the way, your attempt to use -dataex- in #1 almost worked: it appears that you didn't include the initial [ when you copied from Stata's results window. But the meat of what was needed was there and it was usable. By contrast, the table you show in #3, although easier for human eyes to read, is not readily importable to Stata and is not helpful. So keep using -dataex-. Even if you mess the copy/paste part a tad, it's still more helpful than a human-readable table that's difficult to import. It's especially important for problems like this, which few people, if any, could solve without actually trying it out on the data.

        Added: As is so often the case, this problem is not hard to solve with data in long layout, but difficult or impossible in wide. I think one could stare at the wide layout data originally provided for days and not see the solution. But once you put the data into long, a way forward becomes quickly apparent. Most things in Stata are like that. Wide layout data should only be used for the few specific commands that require it, or after analysis is complete to present the data in a layout with greater human readability.

        NOTE: If your real data set is large, both the -reshape- and the -joinby- commands may take a long time. Be patient. And if it is really large you may run out of memory. If the latter happens, I suggest chopping up the data set into chunk, each of which contains all of the records for one (or a small number) of organizations in 2007. Then apply the above code separately to each chunk, and append the results together.
        Last edited by Clyde Schechter; 07 Jan 2017, 14:44.

        Comment


        • #5
          Dear Clyde, thank you very much for your help - much appreciated. I will try this out.

          Comment

          Working...
          X