Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create observations (i.e. adding a row) in the dataset

    Hello everyone. Thank you for your time to read my question. The below dataset consists of countries and industries. For industries, they are both in columns and rows. This is a sample data that I created following format of the World Input and Output Dataset (please ignore the numerical values: they are randomly assigned). To summarize, I want to add a new row that calculates sum of asian country's values so that the new observation at the last row will show the sum of asian countries's industries' values. I was thinking first to make a dummy variable indicating whether the country listed in country variable is Asian country. After that, I wanted to sum the column by industry, conditioning on country being Asian country(i.e. Japan and China in this sample). However, I was not sure how I could proceed this. Summing the row seems to be much easier which can be done by egen rowtotal if... Could someone please give me adivse to creating the new observation at the last row that calculates the conditional sum of column? Thank you in advance for your help!

    Code:
    clear
    input industry    country    fish   forestry   manufacture   etc
            10    1    1294.2    112.5    324.1    1.33
            20    1    83.0    3.2    32.1        323.1
            30    1    19.1    333.1    5.1    1312.1
            10    2    877.4    3.1    3232.1    3.2
            20    2    283.1    4.1    3154.2    3.1
            30    2    2999.1    2.1    13.2    1.44
            10    3    115.9    31.1    0.3    333.1
            20    3    1590.8    4.5    1.3    92.1
            30    3    42.0    8.2    5.23    0.2    
            10    4    1321.3    93.1    1.23    33.2
            20    4    321.0    134.2    33.2    1.2
            30    4    32.4    315.1    3.1    33.2
    end
    
    label values industry vlindustry
    label def vlindustry 10 "fishing" 20 "forestry" 30 "manufacture", modify
    label values country vlcountry
    label def vlcountry 1 "US" 2 "Japan" 3 "Canada" 4 "China", modify

  • #2
    I am not sure what you are trying to do, but to mechanically achieve what you said you want to achieve:
    Code:
    . set obs `=_N+1'
    number of observations (_N) was 12, now 13
    
    . foreach var of varlist fish forestry manufacture etc {
      2. summ `var' if inlist(country,2,4)
      3. replace `var'=r(sum) in 13
      4. }
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            fish |          6    972.3834     1097.39       32.4     2999.1
    (1 real change made)
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
        forestry |          6       91.95    122.6925        2.1      315.1
    (1 real change made)
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
     manufacture |          6    1072.838     1642.61       1.23     3232.1
    (1 real change made)
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             etc |          6    12.55667     16.0114        1.2       33.2
    (1 real change made)

    Comment


    • #3
      I want to add a new row that calculates sum of asian country's values so that the new observation at the last row will show the sum of asian countries's industries' values.
      You have to get out of the spreadsheet mindset when working with Stata. To get what you want, create a variable holding such information.

      Comment


      • #4
        I agree with Andrew Musau on this. Stata has variables and observations, which have important differences from arbitrary rows and columns in Excel. Adding a summary observation to a dataset will only lead to problems when you try to do analysis using that data. You could implement Andrew's solution fairly generally with the following code.
        Code:
        foreach var of varlist fish forestry manufacture etc {
            egen `var'_asia = total(cond(inlist(country,2,4)),`var',0)
            egen `var'_amer = total(cond(inlist(country,1,3)),`var',0) 
        }
        This creates 2 new variables for each of the original variables: one with the total for the Asian countries, and the other with the total for the North American countries.
        Code:
        . list industry country fish fish_asia if inlist(country,2,4), sum(fish) abbreviate(12) clean
        
                   industry   country     fish   fish_asia  
           4.       fishing     Japan    877.4      5834.3  
           5.      forestry     Japan    283.1      5834.3  
           6.   manufacture     Japan   2999.1      5834.3  
          10.       fishing     China   1321.3      5834.3  
          11.      forestry     China      321      5834.3  
          12.   manufacture     China     32.4      5834.3  
          Sum                           5834.3

        Comment


        • #5
          I wonder if one of the user-written input-output analysis programs for Stata might be of use to JungHwan, as from my limited knowledge of such matters, it sounds like he is trying to calculate quantities that might appear in that field:
          Code:
          ssc describe iot
          ssc describe ioic
          There might be others, too, I suppose, but that's what I found.

          Comment


          • #6
            Thank you very much everyone for all the answers. To think of it, I also I fully agree with Andrew Musau that I should not think speadsheet way when I use stata...! Also, I will check out the user code that Mike Lacy found!!

            Comment

            Working...
            X