Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining Variables Across Observations

    Hello all,

    I wish to collapse(sum) all of the variables in 3 company-year observations with 3 company-year observations of another company in my dataset. I would like the second company to absorb the variable values of the first company for the particular years of interest. An example is listed below:
    15349 1994 Company 1 2726.6
    15349 1995 Company 1 2775
    15349 1996 Company 1 2730.4
    10787 1994 Company 2 2416.378
    10787 1995 Company 2 3381.461
    10787 1996 Company 2 3953.936
    10787 1997 Company 2 10447
    10787 1998 Company 2 16526.6
    10787 1999 Company 2 32361.6
    10787 2000 Company 2 40404.3
    The value in the left hand column is the unique company identifier, which I believe will be needed for coding purposes. The right most column is one particular variable for explanatory purposes.

    I would like the observations for company 1 in years 1994, 1995, 1996 to be absorbed by company 2's observations in 1994, 1995, and 1996. Again, although only one variable has been given as a check for anyone willing to help out, I would like all of the variables for each observation of company 1 to be combined with its corresponding company-year observation in company 2. Ideally, I would like the absorbed observations to be removed from the dataset in the process, as I will be doing a nearest neighbor match later in my data analysis.

    Should be simple, but I'm new to Stata and want to get this right!

    Thanks!

    Erik

  • #2
    So, before you can do anything, you need to get your data into a Stata data set and have variable names. I've worked your example into a data set and the following code, generated by -dataex- will allow you to import it easily. In the future, please post all example data using the -dataex- command so that those who would help you can also easily replicate your example data with no guesswork and minimal effort.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(company_id year) str9 company_name float whatever long pair_id
    15349 1994 "Company 1"   2726.6 1
    15349 1995 "Company 1"     2775 1
    15349 1996 "Company 1"   2730.4 1
    10787 1994 "Company 2" 2416.378 1
    10787 1995 "Company 2" 3381.461 1
    10787 1996 "Company 2" 3953.936 1
    10787 1997 "Company 2"    10447 1
    10787 1998 "Company 2"  16526.6 1
    10787 1999 "Company 2"  32361.6 1
    10787 2000 "Company 2"  40404.3 1
    end
    Note that I have created a new variable, pair_id. In your example you have only two companies forming one pair. I imagine that in your real data you have several pairs of companies that you would like to combine. You will need to assign a distinct pair_id to each such pair. If there are "singleton" firms that do not get paired with any other firm, then they should each be assigned a distinct pair_id that they do not actually share with any other company.

    The following code then does the combining:
    Code:
    sort pair_id company_id year
    by pair_id year, sort: egen combined_whatever = total(whatever)
    by pair_id (company_id year), sort: gen cid1 = company_id[1]
    by pair_id (company_id year): gen cid2 = company_id[_N]
    by pair_id (company_id year): gen cname1 = company_name[1]
    by pair_id (company_id year): gen cname2 = company_name[_N]
    by pair_id year, sort: keep if _n == 1
    drop company_id company_name whatever
    
    list, noobs clean
    If I understand your question correctly, you actually have several variables like the one I called whatever that you want to sum in this way. That requires only a slight modification of the code to build a loop over that list of variables.

    Code:
    sort pair_id company_id year
    
    local to_combine whatever // AND ALSO LIST THE OTHER VARIABLES TO BE COMBINED HERE
    
    foreach v of local to_combine {
        by pair_id year, sort: egen combined_`v' = total(`v')
    }
    
    by pair_id (company_id year), sort: gen cid1 = company_id[1]
    by pair_id (company_id year): gen cid2 = company_id[_N]
    by pair_id (company_id year): gen cname1 = company_name[1]
    by pair_id (company_id year): gen cname2 = company_name[_N]
    by pair_id year, sort: keep if _n == 1
    drop company_id company_name `to_combine'
    
    list, noobs clean

    Comment


    • #3
      Clyde gives a flexible robust solution to your problem. I'd like to present an alternative that may meet your needs, especially if indeed only a few pairs of companies need to be combined. It's always good to call attention to the collapse command in any event.
      Code:
      clear
      input int(company_id year) str9 company_name float whatever
      15349 1994 "Company 1"   2726.6
      15349 1995 "Company 1"     2775
      15349 1996 "Company 1"   2730.4
      10787 1994 "Company 2" 2416.378
      10787 1995 "Company 2" 3381.461
      10787 1996 "Company 2" 3953.936
      10787 1997 "Company 2"    10447
      10787 1998 "Company 2"  16526.6
      10787 1999 "Company 2"  32361.6
      10787 2000 "Company 2"  40404.3
      end
      replace company_name = "Company 2" if company_id==15349 & inrange(year,1994,1996)
      replace company_id = 10787         if company_id==15349 & inrange(year,1994,1996)
      collapse (sum) whatever, by(company_id year company_name)
      list, clean
      Code:
             compan~d   year   company~e   whatever  
        1.      10787   1994   Company 2   5142.978  
        2.      10787   1995   Company 2   6156.461  
        3.      10787   1996   Company 2   6684.336  
        4.      10787   1997   Company 2      10447  
        5.      10787   1998   Company 2    16526.6  
        6.      10787   1999   Company 2    32361.6  
        7.      10787   2000   Company 2    40404.3

      Comment

      Working...
      X