  • How to keep only observations whose "industry_vertical" value matches a long list of industry verticals

    I have a list of firms, firm values, and a variable "industry_verticals" that takes thousands of different values. Within my dataset (50,000 firms in total), I have a smaller sample (say, 500 firms) whose industry verticals take around 400 different values. I want to use these 400 values to pick out all the firms in the whole dataset (50,000 firms) whose "industry_vertical" value matches one of the 400 values from the sub-sample. I am not sure whether this is possible in Stata. I would appreciate any help.
    Last edited by Olivia Johns; 07 Dec 2021, 05:13.

  • #2
    Code:
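    use small_dataset.dta, clear
    keep industry_verticals
    * keep one observation per distinct value so the key is unique for merge 1:m
    by industry_verticals, sort: keep if _n==1 //possibly redundant
    * keep(3) retains matched observations only; nogen suppresses the _merge variable
    merge 1:m industry_verticals using whole_dataset.dta, keep(3) nogen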
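
    For anyone who wants to try the approach in #2 before running it on real files, here is a minimal self-contained sketch. The six firms, their vertical names, and the choice of firms A, B and C as the sub-sample are all made up for illustration, and `duplicates drop` is swapped in for the `by ... keep if _n==1` step (it does the same job here, since only the key variable is kept).

    ```stata
    * Toy data, all hypothetical: 6 firms standing in for the whole dataset
    clear
    input str10 firm str12 industry_verticals
    "A" "fintech"
    "B" "biotech"
    "C" "fintech"
    "D" "adtech"
    "E" "edtech"
    "F" "biotech"
    end
    tempfile whole
    save `whole'

    * Hypothetical sub-sample: firms A, B and C
    keep if inlist(firm, "A", "B", "C")

    * Reduce the sub-sample to its distinct vertical values
    keep industry_verticals
    duplicates drop

    * Keep only firms in the whole dataset whose vertical is on that list
    merge 1:m industry_verticals using `whole', keep(match) nogenerate

    list firm industry_verticals
    ```

    With this toy data the result should contain four firms (A, B, C and F): F is not in the sub-sample, but its vertical is, so it is kept, while D and E are dropped.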

    • #3
      See https://www.stata.com/support/faqs/d...s-for-subsets/ for more on the method in #2.
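
      If the 500 sample firms sit inside the same file as the other firms, a single-dataset variant may also work without any merge. This is only a sketch, not necessarily what the FAQ describes: it assumes a 0/1 indicator marking the sub-sample, here given the hypothetical name `in_sample`, and it reuses made-up toy data.

      ```stata
      * Toy data, all hypothetical
      clear
      input str10 firm str12 industry_verticals
      "A" "fintech"
      "B" "biotech"
      "C" "fintech"
      "D" "adtech"
      "E" "edtech"
      "F" "biotech"
      end

      * Hypothetical indicator: 1 if the firm belongs to the sub-sample
      generate byte in_sample = inlist(firm, "A", "B", "C")

      * Flag every firm whose vertical occurs at least once in the sub-sample
      bysort industry_verticals: egen byte in_list = max(in_sample)

      keep if in_list == 1
      list firm industry_verticals
      ```

      The `max()` within each vertical is 1 as soon as any sub-sample firm has that vertical, so the flag spreads to every firm sharing it. On the toy data this should again keep A, B, C and F.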
