How to Identify Individual Observations in Previous years

Miriam Vogel

Join Date: Jan 2017

Posts: 6
#1

How to Identify Individual Observations in Previous years

07 Jan 2017, 06:36

Dear all,

I encounter some challenges in computing a variable in Stata 14.2 from my dataset:

I want to construct a dummy variable, that tells me if an individual (variable name: person_id), who was working together with other individual(s) in 2007 (the last year of observation) in the same firm (variable name: org_number), has accumulated previous joint working experience in another firm (i.e., prior to 2007 - that would be 2005 and 2006) with at least one of the individuals he/she was working in 2007.

That means I am interested if an individual has accumulated work experience in the previous years (i.e., 2005 and 2006) with at least one of his/her "2007 coworker/s", in another firm than the one firm they were both working for in 2007 (hence 2007 is the reference year). This should be coded as “1”.

If the individual had no prior work experience accumulated in the previous years in another firm with his "2007 co-workers", this should be coded as “0.” My dataset has following structure (just an excerpt as confidential):

CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(person_id org_number2007 org_number2006 org_number2005)
1 1 1 1
2 1 1 4
3 1 1 4
4 1 1 .
5 2 4 5
6 2 7 6
7 2 7 6
8 3 4 3
9 3 4 4
10 4 8 8
end
[/CODE]

For instance person_id 2 and person_id3 have accumulated joint work experience in year 2005 in a firm that was different to 2007), therefore they should receive a “1” in 2005.

I encounter difficulties in writing a code that identifies the same individuals that worked together in 2007 and which can track the same individuals to the years before. Is there any command in Stata that could fix this problem?

Any help is highly appreciated!

Best,

Miriam

Last edited by Miriam Vogel; 07 Jan 2017, 06:39.
Tags: None
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

07 Jan 2017, 06:57

Hello Miriam,

Welcome to the Stata Forum.

I'm not sure if I understood your query, and, unfortunately, the abridged data did not went correctly under CODE delimiters.

That said, even though I don't have the time to check it out in Stata now, I believe you may play with

Code:

. egen byte younameit = diff(org_number2007 org_number2006 org_number2005)

The new variable "younameit" will present 1 whenever there is a difference between organization/years, and zero otherwise.

Shall you wish the opposite, you will just need to recode the variable afterwards.

Hopefully that helps.

Best,

Marcos

Last edited by Marcos Almeida; 07 Jan 2017, 07:01.

Best regards,

Marcos
Comment

Miriam Vogel

Join Date: Jan 2017
Posts: 6

07 Jan 2017, 10:22

Dear Marcos, thank you for taking the time to reply.

It seems my question was a bit misleading. I am not interested in the difference between organizations but rather in the individuals that work in the organizations.

Let me clarify: my Dataset is composed of individuals (which can be identified by Personal IDs). For those individuals it is indicated it in which organization they work for in different years. For instance the individual with the Person_ID == 3 works in 2007 and 2006 in organization 1 (indicated by org_number2007 and org_number2006). In 2005 however, this individual works in organization number 4.

Now I want to identify all individuals who work together in year 2007. In a second step, I however, I want to know which of the individuals who worked together in 2007 also worked together in a previous year (i.e. 2006 and/or 2005) in another organization (with a different org_number). Thus I am interested if individuals working in the same organization in 2007 previously worked together in another organization.

So for instance Person ID 1, 2, 3 and 4 worked together in 2007 (indicated by Org_Number2007), then I check in 2006 the Org_Number2006 did not change, so they did not accumulate any work experience together in another organization. When I however check for Org_number2005, I can see that two of them worked in another organization because Org_Number2007 is different from Org_number2005 (because different number).

Hence I am interested if at least two individuals who worked together in 2007 in the same organization, have acquired joint work experience prior to 2007 in another organization.

I have coded this variables now manually (named joint_work_experience2006 and jointworkexperience2005) but since I have quite many observations, I would like to know if there is a code which could do that automatically? I hope with the elaborations above it is now easier to understand my question. Thank you!

Person_ID	Org_Number2007	Org_Number2006	Org_Number2005	Joint_work experience2006	Joint_work experience2005
1	1	1	1	0	0
2	1	1	4	0	1
3	1	1	4	0	1
4	1	1	.	0	.
5	2	4	5	0	0
6	2	7	6	1	1
7	2	7	6	1	1
8	3	4	3	1	0
9	3	4	4	1	0
10	4	8	8	0	0

Last edited by Miriam Vogel; 07 Jan 2017, 10:27.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#4

07 Jan 2017, 14:38

If I understand what you are looking for, the following code will do it:

Code:

input byte(person_id org_number2007 org_number2006 org_number2005) 1 1 1 1 2 1 1 4 3 1 1 4 4 1 1 . 5 2 4 5 6 2 7 6 7 2 7 6 8 3 4 3 9 3 4 4 10 4 8 8 end // GO TO LONG LAYOUT reshape long org_number, i(person_id) j(year) // MAKE A COPY OF THE DATA IN ANOTHER FILE tempfile copy save `copy' // JOIN THE DATA TO THE COPY, MATCHING // ORG NUMBER & YEAR TO CREATE A DATA SET // OF PAIRS OF PEOPLE WORKING TOGETHER IN // ANY ORGANIZATION IN ANY YEAR rename person_id person_id2 joinby org_number year using `copy' // WORKING WITH YOURSELF DOESN'T COUNT drop if person_id == person_id2 // AND A WITH B IS EQUIVALENT TO B WITH A // SO LOOK ONLY AT PAIRS WITH ID < ID2 keep if person_id < person_id2 // NOW IDENTIFY PAIRS OF WORKERS MEETING DESIRED CRITERION, // WHICH HAS TWO COMPONENTS: // CRITERION 1: WORKED AT SAME ORGANIZATION IN 2007 gsort person_id person_id2 -year by person_id person_id2: gen byte criterion1 = (year[1] == 2007) // CRITERION 2: WORKED AT SAME ORGANIZATION BEFORE 2007, // BUT IT WAS A DIFFERENT ORGANIZATION by person_id person_id2: gen byte criterion2 = org_number != org_number[1] by person_id person_id2 (criterion2), sort: replace criterion2 = criterion2[_N] // FINAL CRITERION: BOTH CRITERION 1 AND CRITERION 2 gen byte criterion = criterion1 & criterion2

At the end, the observations with criterion ==1 identify precisely those pairs of person_id's (person_id and person_id2) who worked at the same organization in 2007 and also previously worked together at some different organization.

By the way, your attempt to use -dataex- in #1 almost worked: it appears that you didn't include the initial [ when you copied from Stata's results window. But the meat of what was needed was there and it was usable. By contrast, the table you show in #3, although easier for human eyes to read, is not readily importable to Stata and is not helpful. So keep using -dataex-. Even if you mess the copy/paste part a tad, it's still more helpful than a human-readable table that's difficult to import. It's especially important for problems like this, which few people, if any, could solve without actually trying it out on the data.

Added: As is so often the case, this problem is not hard to solve with data in long layout, but difficult or impossible in wide. I think one could stare at the wide layout data originally provided for days and not see the solution. But once you put the data into long, a way forward becomes quickly apparent. Most things in Stata are like that. Wide layout data should only be used for the few specific commands that require it, or after analysis is complete to present the data in a layout with greater human readability.

NOTE: If your real data set is large, both the -reshape- and the -joinby- commands may take a long time. Be patient. And if it is really large you may run out of memory. If the latter happens, I suggest chopping up the data set into chunk, each of which contains all of the records for one (or a small number) of organizations in 2007. Then apply the above code separately to each chunk, and append the results together.

Last edited by Clyde Schechter; 07 Jan 2017, 14:44.
Comment
Miriam Vogel

Join Date: Jan 2017

Posts: 6
#5

08 Jan 2017, 12:35

Dear Clyde, thank you very much for your help - much appreciated. I will try this out.
Comment

Announcement