count over previous observations that meet a condition

Gioia Mariani

Join Date: May 2017

Posts: 11
#1

count over previous observations that meet a condition

18 May 2017, 06:25

Dear all,
I am quite new to Stata and still struggling with some commands.

My dataset contains information on workers and their workplace: most workers are observed in the years 1990-1997. For each observation, I have a worker_id and a firm_id: some of the workers changed their workplace over the years. I have a dummy variable identifying those workers and the id of the firm they previously worked in. I am trying to build a measure of their network, as given by the number of coworkers in their previous workplace.

Is there a way to count the observations whose firm_id is identical to the previous_firm_id that the job-switching observations have and whose year is antecedent to the one of the switch? It seems to me that some kind of loop must be created, but I haven't been able to find a way yet.

Thank you, any help would be much appreciated.

Gioia
Tags: count, loop
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

18 May 2017, 07:09

Welcome to the Stata Forum / Statalist.

I kindly recommend to follow the FAQ advice, I mean, sharing data under CODE delimiters or by installing the SSC dataex.

Also, you could present "the wishful" variable in the example.

You don't need to share the whole dataset, Just an abridged version or a toy example will do fine.

Best regards,

Marcos
Comment
Gioia Mariani

Join Date: May 2017

Posts: 11
#3

18 May 2017, 08:24

Here is a simplified example of what I have in mind.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(worker_id firm_id) int anno byte(d previous_firm network) 1 1 1990 0 . . 1 1 1991 0 . . 1 1 1992 0 . . 2 1 1990 0 . . 2 1 1991 0 . . 2 2 1992 1 1 2 3 2 1990 0 . . 3 2 1991 0 . . 3 2 1992 0 . . 4 1 1990 0 . . 4 1 1991 0 . . 4 1 1992 0 . . end

Network is what I would like to obtain, i.e. the count of the workers (in this example, worker_id 1 and worker_id 4) which up to the year of the switch (i.e. up to 1991) worked in the previous_firm .

Hope it is clearer in this way!

Best,
Gioia
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#4

18 May 2017, 15:19

This may not be as complicated as you think. However, for the sake of consistency, note the following:

1) The total number of workers varies by year. Therefore, you want a count for the year that the worker changed firms.
2) Since you want the count at the point where the worker changed firms, you can take advantage of the fact that the observation
immediately preceding that defines the firm and year of interest.

Code:

*\\ first generate total no. of workers in a given firm and year. sort worker_id (anno) egen nworkers = count(worker_id), by(firm_id anno) *\\ count no. of coworkers (= total workers - self) gen network2 = cond(d==1, nworkers[_n-1]-1, .)

Beware of duplicate observations. You should eliminate these before running the code.

Last edited by Andrew Musau; 18 May 2017, 15:23.
1 like
Comment
Gioia Mariani

Join Date: May 2017

Posts: 11
#5

19 May 2017, 07:45

This was very helpful, thank you for your answer!
Comment
Gioia Mariani

Join Date: May 2017

Posts: 11
#6

27 May 2017, 08:56

Hello again, this is an excerpt of my data after cleaning and keeping only the observations which have the outcome I am interested in:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long worker_id int year long firm_id byte dummy long previous_firm byte network 3673681 2009 7187167 0 . . 617972 2006 7187167 0 . . 8098419 2010 7187167 0 . . 5159819 2006 7187167 0 . . 7314747 2010 2376 0 . . 5258171 2008 7187167 0 . . 8337358 2009 7187167 0 . . 7774480 2011 7187167 0 . . 878789 2011 7187 0 . . 3898310 2012 7187167 0 . . 67635 2007 7187167 0 . . 8976214 2012 7187167 0 . . 772794 2011 7187167 0 . . 6767379 2006 7187167 0 . . 4447136 2012 7187167 0 . . 476385 2009 7187167 0 . . 3225566 2008 7187167 0 . . 7901186 2006 7187167 0 . . 2059679 2008 7187167 0 . . 5467850 2009 2301 1 7187167 8 end

"Network" is still the variable I want to obtain, i.e. the count of the coworkers before the switch (which is again indexed by the dummy). The code suggested before does not work anymore because now it is not obvious that the previous observations belong to the same firm (sometimes I have only one observation per firm, and the workers who changed workplace are rare in the dataset).

I guess that my question is, more in general, how to count observations conditional of the values of variables of other observations (previous_firm and year), which are "tagged" by the dummy.

I would be very grateful if someone could help!

Best

Gioia
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

27 May 2017, 09:33

I think that you need to calculate the number of workers in the previous firm by reducing the data to one observation per firm and year and then use a cumulative sum to add up the workers for prior years. Once you have this information, you can merge it with the original data.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long worker_id int year long firm_id byte dummy long previous_firm byte network
3673681 2009 7187167 0       . .
 617972 2006 7187167 0       . .
8098419 2010 7187167 0       . .
5159819 2006 7187167 0       . .
7314747 2010    2376 0       . .
5258171 2008 7187167 0       . .
8337358 2009 7187167 0       . .
7774480 2011 7187167 0       . .
 878789 2011    7187 0       . .
3898310 2012 7187167 0       . .
  67635 2007 7187167 0       . .
8976214 2012 7187167 0       . .
 772794 2011 7187167 0       . .
6767379 2006 7187167 0       . .
4447136 2012 7187167 0       . .
 476385 2009 7187167 0       . .
3225566 2008 7187167 0       . .
7901186 2006 7187167 0       . .
2059679 2008 7187167 0       . .
5467850 2009    2301 1 7187167 8
end
gen order = _n
save "statalist_ex.dta", replace

* create a count of number of workers per year per firm
collapse (count) n=worker_id, by(firm_id year)

* make it a cumulative count of prior years
bysort firm_id : gen cumsum = sum(n[_n-1])
drop n

* merge the count of prior years with original data
rename firm_id previous_firm
merge 1:m previous_firm year using "statalist_ex.dta", ///    
    keep(match using) nogen

sort order

Announcement

count over previous observations that meet a condition

Comment

Comment

Comment

Comment

Comment

Comment