Create observations (i.e. adding a row) in the dataset

JungHwan Kim

Join Date: Jun 2020

Posts: 21
#1

Create observations (i.e. adding a row) in the dataset

05 Sep 2020, 21:43

Hello everyone. Thank you for your time to read my question. The below dataset consists of countries and industries. For industries, they are both in columns and rows. This is a sample data that I created following format of the World Input and Output Dataset (please ignore the numerical values: they are randomly assigned). To summarize, I want to add a new row that calculates sum of asian country's values so that the new observation at the last row will show the sum of asian countries's industries' values. I was thinking first to make a dummy variable indicating whether the country listed in country variable is Asian country. After that, I wanted to sum the column by industry, conditioning on country being Asian country(i.e. Japan and China in this sample). However, I was not sure how I could proceed this. Summing the row seems to be much easier which can be done by egen rowtotal if... Could someone please give me adivse to creating the new observation at the last row that calculates the conditional sum of column? Thank you in advance for your help!

Code:

clear input industry country fish forestry manufacture etc 10 1 1294.2 112.5 324.1 1.33 20 1 83.0 3.2 32.1 323.1 30 1 19.1 333.1 5.1 1312.1 10 2 877.4 3.1 3232.1 3.2 20 2 283.1 4.1 3154.2 3.1 30 2 2999.1 2.1 13.2 1.44 10 3 115.9 31.1 0.3 333.1 20 3 1590.8 4.5 1.3 92.1 30 3 42.0 8.2 5.23 0.2 10 4 1321.3 93.1 1.23 33.2 20 4 321.0 134.2 33.2 1.2 30 4 32.4 315.1 3.1 33.2 end label values industry vlindustry label def vlindustry 10 "fishing" 20 "forestry" 30 "manufacture", modify label values country vlcountry label def vlcountry 1 "US" 2 "Japan" 3 "Canada" 4 "China", modify
Tags: None

Joro Kolev

Join Date: Aug 2018
Posts: 3050

05 Sep 2020, 21:58

I am not sure what you are trying to do, but to mechanically achieve what you said you want to achieve:

Code:

. set obs `=_N+1'
number of observations (_N) was 12, now 13

. foreach var of varlist fish forestry manufacture etc {
  2. summ `var' if inlist(country,2,4)
  3. replace `var'=r(sum) in 13
  4. }

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        fish |          6    972.3834     1097.39       32.4     2999.1
(1 real change made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    forestry |          6       91.95    122.6925        2.1      315.1
(1 real change made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 manufacture |          6    1072.838     1642.61       1.23     3232.1
(1 real change made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         etc |          6    12.55667     16.0114        1.2       33.2
(1 real change made)

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10225
#3

06 Sep 2020, 06:38

I want to add a new row that calculates sum of asian country's values so that the new observation at the last row will show the sum of asian countries's industries' values.

You have to get out of the spreadsheet mindset when working with Stata. To get what you want, create a variable holding such information.
3 likes
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

06 Sep 2020, 08:02

I agree with Andrew Musau on this. Stata has variables and observations, which have important differences from arbitrary rows and columns in Excel. Adding a summary observation to a dataset will only lead to problems when you try to do analysis using that data. You could implement Andrew's solution fairly generally with the following code.

Code:

foreach var of varlist fish forestry manufacture etc {
    egen `var'_asia = total(cond(inlist(country,2,4)),`var',0)
    egen `var'_amer = total(cond(inlist(country,1,3)),`var',0) 
}

This creates 2 new variables for each of the original variables: one with the total for the Asian countries, and the other with the total for the North American countries.

Code:

. list industry country fish fish_asia if inlist(country,2,4), sum(fish) abbreviate(12) clean

           industry   country     fish   fish_asia  
   4.       fishing     Japan    877.4      5834.3  
   5.      forestry     Japan    283.1      5834.3  
   6.   manufacture     Japan   2999.1      5834.3  
  10.       fishing     China   1321.3      5834.3  
  11.      forestry     China      321      5834.3  
  12.   manufacture     China     32.4      5834.3  
  Sum                           5834.3

Comment

Mike Lacy

Join Date: Apr 2014

Posts: 2417
#5

06 Sep 2020, 08:51

I wonder if one of the user-written input-output analysis programs for Stata might be of use to JungHwan, as from my limited knowledge of such matters, it sounds like he is trying to calculate quantities that might appear in that field:

Code:

ssc describe iot ssc describe ioic

There might be others, too, I suppose, but that's what I found.
2 likes
Comment
JungHwan Kim

Join Date: Jun 2020

Posts: 21
#6

06 Sep 2020, 09:43

Thank you very much everyone for all the answers. To think of it, I also I fully agree with Andrew Musau that I should not think speadsheet way when I use stata...! Also, I will check out the user code that Mike Lacy found!!
Comment

Announcement