Combining Variables Across Observations

Erik Olson

Join Date: Aug 2016
Posts: 3

Combining Variables Across Observations

27 Aug 2016, 15:26

Hello all,

I wish to collapse(sum) all of the variables in 3 company-year observations with 3 company-year observations of another company in my dataset. I would like the second company to absorb the variable values of the first company for the particular years of interest. An example is listed below:

15349	1994	Company 1	2726.6
15349	1995	Company 1	2775
15349	1996	Company 1	2730.4
10787	1994	Company 2	2416.378
10787	1995	Company 2	3381.461
10787	1996	Company 2	3953.936
10787	1997	Company 2	10447
10787	1998	Company 2	16526.6
10787	1999	Company 2	32361.6
10787	2000	Company 2	40404.3

The value in the left hand column is the unique company identifier, which I believe will be needed for coding purposes. The right most column is one particular variable for explanatory purposes.

I would like the observations for company 1 in years 1994, 1995, 1996 to be absorbed by company 2's observations in 1994, 1995, and 1996. Again, although only one variable has been given as a check for anyone willing to help out, I would like all of the variables for each observation of company 1 to be combined with its corresponding company-year observation in company 2. Ideally, I would like the absorbed observations to be removed from the dataset in the process, as I will be doing a nearest neighbor match later in my data analysis.

Should be simple, but I'm new to Stata and want to get this right!

Thanks!

Erik

Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

27 Aug 2016, 16:58

So, before you can do anything, you need to get your data into a Stata data set and have variable names. I've worked your example into a data set and the following code, generated by -dataex- will allow you to import it easily. In the future, please post all example data using the -dataex- command so that those who would help you can also easily replicate your example data with no guesswork and minimal effort.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(company_id year) str9 company_name float whatever long pair_id
15349 1994 "Company 1"   2726.6 1
15349 1995 "Company 1"     2775 1
15349 1996 "Company 1"   2730.4 1
10787 1994 "Company 2" 2416.378 1
10787 1995 "Company 2" 3381.461 1
10787 1996 "Company 2" 3953.936 1
10787 1997 "Company 2"    10447 1
10787 1998 "Company 2"  16526.6 1
10787 1999 "Company 2"  32361.6 1
10787 2000 "Company 2"  40404.3 1
end

Note that I have created a new variable, pair_id. In your example you have only two companies forming one pair. I imagine that in your real data you have several pairs of companies that you would like to combine. You will need to assign a distinct pair_id to each such pair. If there are "singleton" firms that do not get paired with any other firm, then they should each be assigned a distinct pair_id that they do not actually share with any other company.

The following code then does the combining:

Code:

sort pair_id company_id year
by pair_id year, sort: egen combined_whatever = total(whatever)
by pair_id (company_id year), sort: gen cid1 = company_id[1]
by pair_id (company_id year): gen cid2 = company_id[_N]
by pair_id (company_id year): gen cname1 = company_name[1]
by pair_id (company_id year): gen cname2 = company_name[_N]
by pair_id year, sort: keep if _n == 1
drop company_id company_name whatever

list, noobs clean

If I understand your question correctly, you actually have several variables like the one I called whatever that you want to sum in this way. That requires only a slight modification of the code to build a loop over that list of variables.

Code:

sort pair_id company_id year

local to_combine whatever // AND ALSO LIST THE OTHER VARIABLES TO BE COMBINED HERE

foreach v of local to_combine {
    by pair_id year, sort: egen combined_`v' = total(`v')
}

by pair_id (company_id year), sort: gen cid1 = company_id[1]
by pair_id (company_id year): gen cid2 = company_id[_N]
by pair_id (company_id year): gen cname1 = company_name[1]
by pair_id (company_id year): gen cname2 = company_name[_N]
by pair_id year, sort: keep if _n == 1
drop company_id company_name `to_combine'

list, noobs clean

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

27 Aug 2016, 19:26

Clyde gives a flexible robust solution to your problem. I'd like to present an alternative that may meet your needs, especially if indeed only a few pairs of companies need to be combined. It's always good to call attention to the collapse command in any event.

Code:

clear
input int(company_id year) str9 company_name float whatever
15349 1994 "Company 1"   2726.6
15349 1995 "Company 1"     2775
15349 1996 "Company 1"   2730.4
10787 1994 "Company 2" 2416.378
10787 1995 "Company 2" 3381.461
10787 1996 "Company 2" 3953.936
10787 1997 "Company 2"    10447
10787 1998 "Company 2"  16526.6
10787 1999 "Company 2"  32361.6
10787 2000 "Company 2"  40404.3
end
replace company_name = "Company 2" if company_id==15349 & inrange(year,1994,1996)
replace company_id = 10787         if company_id==15349 & inrange(year,1994,1996)
collapse (sum) whatever, by(company_id year company_name)
list, clean

Code:

       compan~d   year   company~e   whatever  
  1.      10787   1994   Company 2   5142.978  
  2.      10787   1995   Company 2   6156.461  
  3.      10787   1996   Company 2   6684.336  
  4.      10787   1997   Company 2      10447  
  5.      10787   1998   Company 2    16526.6  
  6.      10787   1999   Company 2    32361.6  
  7.      10787   2000   Company 2    40404.3

Announcement

Combining Variables Across Observations

Comment

Comment