Hi all,
I'm new here and new to Stata. I'm having trouble filtering and operating on my dataset. Below is an example of my data set:
"
I want to filter on SectorProduct depending on text. The example dataset above was filtered using "keep if strpos(lower(SectorProduct), "wholesale trade")>0" (as the full dataset is too large). Firstly, I want to filter on multiple key phrases, how would I go about doing this? I've seen examples using "foreach", however, only ever using one word to filter on. The intent of this is to create a few groups on sectors/products, aka industries. For example, I want to use the strpos function on the strings: 'coke', 'oil refining', 'oil refineries', 'fuel'.
Secondly, I want to manipulate the dataset such that I get a summary for each year and each grouping of sectors/products (industry). Eg. columns would be Industry1_sum Industry1_stddev Industry1sum_min Industry1sum_max, for each industry.
Currently, I am using the following commands:
This is clearly missing stdev, min and max (I am unsure how to get these outside of the summary function - am I able to call multiple operations within collapse?) Anyway, my current thought is to do this for each industry, saving each as an individual dta file and then combining them into one at the end. However, I'm wondering if there is a nicer way to do this?
Any help would be very much appreciated.
Thank you!
I'm new here and new to Stata. I'm having trouble filtering and operating on my dataset. Below is an example of my data set:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str12 Country str3 CountryCode str11 Entity str320 SectorProduct double(Scope1tCO2eqper1US TotalScope3tCO2eqper1US) str4 Year str1 GoodServiceDuplicate str139 ISIC str162 Description str168 Link "Argentina" "ARG" "Commodities" "Wholesale trade services" 0 1554.578496209093 "1990" "S" "" "" "" "Australia" "AUS" "Industries" "Wholesale trade" 815.432619769583 0 "1990" "S" "" "" "" "Australia" "AUS" "Commodities" "Wholesale trade" 0 1710.863990448762 "1990" "S" "" "" "" "France" "FRA" "Industries" "Wholesale trade and commission trade, except of motor vehicles and motorcycles" 247.4499602260693 0 "1990" "S" "" "" "" "France" "FRA" "Commodities" "Wholesale trade and commission trade services, except of motor vehicles and motorcycles" 0 572.6176214984904 "1990" "S" "" "" "" "Germany" "DEU" "Commodities" "Wholesale trade" 419.9777968538051 724.1831341358961 "1990" "S" "" "" "" end
I want to filter on SectorProduct depending on text. The example dataset above was filtered using "keep if strpos(lower(SectorProduct), "wholesale trade")>0" (as the full dataset is too large). Firstly, I want to filter on multiple key phrases, how would I go about doing this? I've seen examples using "foreach", however, only ever using one word to filter on. The intent of this is to create a few groups on sectors/products, aka industries. For example, I want to use the strpos function on the strings: 'coke', 'oil refining', 'oil refineries', 'fuel'.
Secondly, I want to manipulate the dataset such that I get a summary for each year and each grouping of sectors/products (industry). Eg. columns would be Industry1_sum Industry1_stddev Industry1sum_min Industry1sum_max, for each industry.
Currently, I am using the following commands:
Code:
use data keep if strpos(lower(SectorProduct), "wholesale trade")>0 collapse (sum) Scope1tCO2eqper1US/20, by(Year)
Any help would be very much appreciated.
Thank you!
Comment