Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating customised summary outputs for filtered subsets of main dataset

    Hi all,

    I'm new here and new to Stata. I'm having trouble filtering and operating on my dataset. Below is an example of my data set:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 Country str3 CountryCode str11 Entity str320 SectorProduct double(Scope1tCO2eqper1US TotalScope3tCO2eqper1US) str4 Year str1 GoodServiceDuplicate str139 ISIC str162 Description str168 Link
    "Argentina"    "ARG" "Commodities" "Wholesale trade services"                                                                                                                       0  1554.578496209093 "1990" "S" "" "" ""
    "Australia"    "AUS" "Industries"  "Wholesale trade"                                                                                                                 815.432619769583                  0 "1990" "S" "" "" ""
    "Australia"    "AUS" "Commodities" "Wholesale trade"                                                                                                                                0  1710.863990448762 "1990" "S" "" "" ""
    "France"       "FRA" "Industries"  "Wholesale trade and commission trade, except of motor vehicles and motorcycles"                                                 247.4499602260693                  0 "1990" "S" "" "" ""
    "France"       "FRA" "Commodities" "Wholesale trade and commission trade services, except of motor vehicles and motorcycles"                                                        0  572.6176214984904 "1990" "S" "" "" ""
    "Germany"      "DEU" "Commodities" "Wholesale trade"                                                                                                                419.9777968538051  724.1831341358961 "1990" "S" "" "" ""
    end
    "
    I want to filter on SectorProduct depending on text. The example dataset above was filtered using "keep if strpos(lower(SectorProduct), "wholesale trade")>0" (as the full dataset is too large). Firstly, I want to filter on multiple key phrases, how would I go about doing this? I've seen examples using "foreach", however, only ever using one word to filter on. The intent of this is to create a few groups on sectors/products, aka industries. For example, I want to use the strpos function on the strings: 'coke', 'oil refining', 'oil refineries', 'fuel'.

    Secondly, I want to manipulate the dataset such that I get a summary for each year and each grouping of sectors/products (industry). Eg. columns would be Industry1_sum Industry1_stddev Industry1sum_min Industry1sum_max, for each industry.

    Currently, I am using the following commands:
    Code:
    use data
    keep if strpos(lower(SectorProduct), "wholesale trade")>0
    collapse (sum) Scope1tCO2eqper1US/20, by(Year)
    This is clearly missing stdev, min and max (I am unsure how to get these outside of the summary function - am I able to call multiple operations within collapse?) Anyway, my current thought is to do this for each industry, saving each as an individual dta file and then combining them into one at the end. However, I'm wondering if there is a nicer way to do this?

    Any help would be very much appreciated.

    Thank you!
    Last edited by Bede Denham; 30 May 2023, 08:23.

  • #2
    I'm not sure I follow what you want to do. But to the extent I do, the following may get you closer to what you want.

    1. If you read -help collapse- you will see an extensive list of statistics that can be calculated by this very flexible command. And you can use as many of them as you like in a single command. Near the end of that file you will find examples of this.

    2. Also, -command-'s -by()- option need not be restricted to a single variable. You can -collapse whatever, by(industry year)- to get summaries for each industry in each year.

    3. I'm not sure what kind of things you want to filter on other than the specific examples you show, but here's a generic approach that will give you complete flexibility:
    Code:
    local categories `""this category" "that category" "another category" "however many you want""'
    foreach c of local categories {
        command(s)_to_do_stuff_with `c'
    }
    Note: Pay careful attention to the use of compound (`" "') and ordinary (" ") double quotes in the -local- command.

    Thank you for using -dataex- on your very first post!

    Comment


    • #3
      Hi Clyde,

      Thanks for your great advice. I now have the results I wanted in a dataset, which looks like the snippet below (I couldn't get dataex to work this time sorry - too many variables!):


      Year realestate1 realestate3 realestatesd1 realestatesd3 realestatemin1 realestatemin3 realestatemax1 realestatemax3 fi1 fi3 fisd1 fisd3 fimin1 fimin3 fimax1 fimax3
      1990 252.9724 4289.404 1220.5845 37930.346 237.57091 0 3145.0765 85006.783 207.4843 463.1624 339.27851 576.02424 0 0 973.29694 1453.2918
      1991 268.2472 294.7603 1393.3905 2243.8894 226.00561 0 3528.6407 5151.4771 194.0804 409.1914 308.32499 487.4053 0 0 819.73495 1266.6358
      1992 272.0349 261.4669 1408.9538 1934.225 210.89028 0 3559.9787 4453.2979 188.6839 382.1858 310.35224 465.94519 0 0 883.16637 1381.8042
      1993 283.9643 279.3468 1503.1205 2085.1597 232.20576 0 3788.0419 4796.891 186.1284 384.4436 293.33907 459.132 0 0 836.9829 1375.2255
      1994 295.5693 288.9831 1522.8133 2157.9417 217.97097 0 3844.1126 4964.1384 199.6521 392.5906 339.70865 494.57019 0 0 1017.7664 1595.0379
      1995 264.77 264.7224 1347.824 1948.4057 194.8104 0 3422.1366 4487.8469 191.0008 377.9309 321.08788 468.12375 0 0 892.34959 1376.6525

      I am now wanting to put my results in a table for each year, but I am struggling to be able to do this in Stata16. I would like to output a table for each year in the following form using 1990 as an example:

      Industry Scope 1 Emissions Stdev min max
      Real Estate 252.9724 1220.58 237.57091 3145.0765
      FI 207.4843 etc etc etc
      x x1 value for 1990
      y y1 value for 1990
      z z1 value for 1990

      Is there a way to do this within stata? Would be best if I can loop through to create the same table for each year.

      Thanks again for your help!

      Comment


      • #4
        I'm sorry, but the results you are showing in #3 are irregularly laid out and I can't really relate the numbers to the corresponding variable names. Nor can I perceive which numbers in those results correspond to what in the example table you show at the end of the results.

        What I can perceive is that the numbers from the *1 variables seem to appear in the example table, but those from the *3 variables do not. Why is that? What do 1 and 3 signify? And why is there no 2?

        Sorry, I just can't figure out what you have, nor what you want. Please try to find a clearer way of exhibiting the data. If the *3 variables are irrelevant to creation of the table, running -dataex year *1- would probably be a good start.

        Comment

        Working...
        X