Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

    Hi there,
    I have a dataset on patents and their respective subclasses. Over a large number of inventors and years, my sample totals nearly 4 million patents. Each of them can have various subclasses.
    I want to identify with a dummy variable when a patent subclass first appears in an inventor's portfolio.
    Here is an example with two inventors. Each row corresponds to a different patent. The data has been sorted by date (which is in Stata format), showing the first 3 subclasses for each patent.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str11 inventor_id str8 patent_id float(date_stata class_subclass1 class_subclass2 class_subclass3 wanted_1 wanted_2 wanted_3)
    "3930300-1" "4255207"  7038 132.0609 132.1129 132.0469 1 1 1
    "3930300-1" "4272753"  7230 193.0214        .        . 1 . .
    "3930300-1" "4255209"  7294 132.0541 132.0609 132.0529 1 0 1
    "3930300-1" "4670769"  7602 132.0469 132.1401  132.047 0 1 1
    "3930300-1" "4900689" 10569 132.0541 132.0609 132.1393 0 0 1
    "3930309-1" "4211003"  6781 166.0079 104.0072 443.0042 1 1 1
    "3930309-1" "D257612"  6797 448.0214        .        . 1 . .
    "3930309-1" "4203167"  6909 119.0399  81.0251  81.0435 1 1 1
    "3930309-1" "4265017"  7189 166.0079        .        . 0 . .
    "3930309-1" "4442559"  7853  166.007 385.0057        . 1 1 .
    "3930309-1" "4389775"  7873  166.007 119.0399 448.0214 0 0 0
    end

    The last three variables are the ones I am trying to create.
    For example, in the last observation (i.e. most recent patent for the second inventor) the three dummy variables take the value of 0 because:
    - for dummy wanted_1: subclass 166.007 showed up in the inventor's previous patent under class_subclass1
    - for dummy wanted_2: subclass 119.0399 showed up in the inventor's third patent (4203167) under class_subclass1
    - for dummy wanted_3: subclass 448.0214 showed up in the inventor's second patent as its only subclass.

    The complexity here is due to the fact that the code needs to search over the three variables in all of the inventor's previous patents, then assign 0 or 1 in a specific cell.
    I have tried running all sorts of loops and using countmatch/rangestat, to no avail.

    Thanks for your help

  • #2
    Hello All,

    Can anyone help me with how to plot the placebo gaps after running a loop for the placebo in space. Thanks!

    Comment

    Working...
    X