Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate cluster/group using Geonear with two seperate datasets

    Dear stata users,

    I am trying to merge two seperate datasets which are firm-level panel data and university patent-level data and they do not have variables in common (or maybe patent number but they don't share same number).
    They both have geographical information which is longitude and latitude (below are the examples of the two datasets).

    What I am trying is to do is:
    i) append/merge two datas in one Master data (this step is for the second step; but if this step is unnecessary, it's better not to append due to data limit)
    ii) use - geonear - command to generate clusters.
    : In more detail, I would like to generate clusters of firms by referencing a given university (within XX km or mile). And here, firms should be included in only one nearest cluster (referenced by university).
    iii) In the regression part, I would like to make a dummy variable of firms. For instance, 1 if a given firm has university (or universities) in same industry nearby (e.g. in XX km/mile).

    The example of my university patent-level data is as follows:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long patent_id str10 org_id str70 city str7 state_code str30 bea_code str22 county float(lat lng)
    6575965 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4595014 "UN_208905" "Westport"     "CT" "118"       "Fairfield"       41.14343  -73.34958
    4916505 "UN_208752" "Honolulu"     "HI" "074"       "Honolulu"         21.3095   -157.863
    4282965 "UN_209262" "Binghamton"   "NY" "162"       "Broome"          42.14631  -75.88652
    5157702 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5034515 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5049280 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6185469 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    7355081 "UN_209181" "Rochester"    "NY" "139"       "Monroe"           43.1683   -77.6026
    5196396 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    4824359 "UN_208853" "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4581315 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    4946773 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    5754575 "UN_208940" "Scottsdale"   "AZ" "128"       "Maricopa"       33.521767 -111.90492
    6502576 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    7084407 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6130254 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4038143 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    6166295 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    3948213 "UN_200429" "Des Plaines"  "IL" "032"       "Cook"            42.04673   -87.8859
    7378498 "UN_211119" "Baltimore"    "MD" "174"       "Baltimore City"  39.29463  -76.62521
    5888473 "UN_1542"   "Los Angeles"  "CA" "097"       "Los Angeles"     33.97309  -118.2479
    5037746 "UN_208905" "Westport"     "CT" "118"       "Fairfield"       41.14343  -73.34958
    4816125 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5925818 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4974113 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    7219249 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    7560287 "UN_209183" "Eugene"       "OR" "053"       "Lane"            44.07368 -123.07876
    4684891 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    7298820 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    4164214 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    6251322 "UN_209442" "Clemson"      "SC" "068"       "Pickens"         34.68306    -82.825
    5951881 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    7609220 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4654598 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    7638300 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5395521 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    4043934 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4666847 "UN_208986" "Bedford"      "MA" "022"       "Middlesex"       42.48429 -71.276794
    6625135 "UN_999999" "Pittsburgh"   "PA" "129"       "Allegheny"       40.47454  -79.95252
    5452824 "UN_209262" "Binghamton"   "NY" "162"       "Broome"          42.14631  -75.88652
    7188559 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5416115 "UN_999999" "College Park" "MD" "174"       "Prince Georges"   38.9963  -76.92989
    5888789 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5719784 "UN_211213" "Seattle"      "WA" "152"       "King"            47.61143 -122.33046
    5912466 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5820678 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5448513 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4769326 "UN_1542"   ""             "CA" "OTHER_USA" ""                       .          .
    6605202 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    6448012 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5625137 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5362622 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5385541 "UN_209140" "Loma Linda"   "CA" "097"       "San Bernardino"   34.0524  -117.2618
    5032519 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6254865 "UN_208823" "Boulder"      "CO" "045"       "Boulder"         40.04973 -105.21426
    5595761 "UN_1698"   "Norman"       "OK" "119"       "Cleveland"        35.2212   -97.4448
    6664137 "UN_207741" "Ewing"        "NJ" "118"       "Mercer"          40.23769  -74.78206
    4409910 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5530309 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5335598 "UN_209025" "Phoenix"      "AZ" "128"       "Maricopa"          33.451  -112.0685
    5302609 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    5459235 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4550187 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    5142559 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    5212072 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5272429 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    4897444 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    4933639 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    6318146 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    7094205 "UN_999999" "Los Angeles"  "CA" "097"       "Los Angeles"     33.97309  -118.2479
    4431263 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    5141851 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    6562619 "UN_211119" "Baltimore"    "MD" "174"       "Baltimore City"  39.29463  -76.62521
    5306447 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    7476384 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4427306 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    5490169 "UN_209046" "Huntsville"   "AL" "076"       "Madison"        34.726868  -86.56732
    6730686 "UN_209142" "Kansas City"  "KS" "084"       "Wyandotte"       39.11573  -94.62714
    4352864 "UN_209254" "Saugus"       "CA" "097"       ""                       .          .
    4701953 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5416115 "UN_999999" "College Park" "MD" "174"       "Prince Georges"   38.9963  -76.92989
    5849719 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6046925 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6222209 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5856252 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5747469 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5486636 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5238711 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    4043934 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4871252 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4076579 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4874746 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    6689192 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5049673 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5757839 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    6677444 "UN_209122" "Birmingham"   "AL" "019"       "Shelby"         33.401558  -86.70551
    4689399 "UN_209184" "Palo Alto"    "CA" "146"       "Santa Clara"    37.444324  -122.1497
    5650392 "UN_1698"   "Norman"       "OK" "119"       "Cleveland"        35.2212   -97.4448
    4275270 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    end

    In addition, firm-level panel data looks like:
    Code:
     
     * Example generated by -dataex-. For more info, type help dataex clear input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude tag)  0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1  1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1  1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1  1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1  1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1  2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1  2 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1  2 1950 123456 "Naugatuck"       "CT" "New Haven"      41.28052 -72.874146 1  2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1  2 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1  3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1  3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1  3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1  3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1  3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1  4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1  4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1  4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1  4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1  4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1  5 1950 123456 "Dayton"          "OH" "Montgomery"     39.84139  -84.41647 1  5 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1  5 1950 123456 "Schaumburg"      "IL" "Cook"           41.65819  -87.67947 1  5 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1  5 1950 123456 "La Jolla"        "CA" "San Diego"      33.02935 -116.85355 1  6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1  6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1  6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1  6 1950 123456 "Palo Alto"       "CA" "Santa Clara"   37.444324  -122.1497 1  6 1950 123456 "Racine"          "WI" "Racine"         42.69673  -87.90308 1  7 1950 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1  7 1950 123456 "Cleveland"       "OH" "Cuyahoga"       41.45346  -81.92177 1  7 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1  7 1950 123456 "Denver"          "CO" "Denver"          39.7507   -104.989 1  7 1951 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1  8 1950 123456 "Culver City"     "CA" "Los Angeles"    34.26187 -118.45866 1  8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1  8 1950 123456 "Harrisburg"      "PA" "Dauphin"         40.2782  -76.70937 1  8 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1  8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1  9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1  9 1950 123456 "Austin"          "MN" "Freeborn"       43.75284  -93.16799 1  9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1  9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1  9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1 10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 11 1950 123456 ""                "TX" ""                      .          . 1 11 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1 11 1950 123456 "Buffalo"         "NY" "Erie"             42.834  -78.63425 1 11 1950 123456 "Oak Brook"       "IL" "DuPage"          41.8364  -87.95317 1 11 1950 123456 "White Pine"      "MI" "Ontonagon"      46.73806  -89.17944 1 12 1950 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1 12 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1 12 1951 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1 12 1951 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1 12 1952 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1 13 1950 123456 "Parsippany"      "NJ" "Morris"          40.8819  -74.62099 1 13 1950 123456 "Saint Paul"      "MN" "Ramsey"         44.96996  -93.08317 1 13 1950 123456 "Clarks Summit"   "PA" "Lackawanna"     41.34319 -75.530136 1 13 1950 123456 ""                ""   ""                      .          . 1 13 1951 123456 "Clarks Summit"   "PA" "Lackawanna"      41.3731   -75.6841 1 14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1 14 1950 123456 "Atlanta"         "GA" "DeKalb"        33.888504  -84.28954 1 14 1950 123456 "Atlanta"         "GA" "Fulton"        34.040833   -84.3859 1 14 1950 123456 "Carteret"        "NJ" "Middlesex"      40.33243  -74.56883 1 14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1 15 1950 123456 ""                ""   ""                      .          . 1 15 1950 123456 ""                ""   ""                      .          . 1 15 1950 123456 ""                ""   ""                      .          . 1 15 1950 123456 "Dayton"          "OH" "Warren"         39.37145  -84.21078 1 15 1950 123456 ""                ""   ""                      .          . 1 16 1950 123456 "New Britain"     "CT" "Hartford"       41.92608  -72.64576 1 16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1 16 1950 123456 "Columbus"        "OH" "Franklin"       40.03219  -83.13834 1 16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1 16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1 17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1 17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1 17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1 17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1 18 1950 123456 "Upland"          "IN" "Grant"           40.5112  -85.82655 1 18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1 18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1 18 1950 123456 "Mckeesport"      "PA" "Allegheny"      40.59417  -79.97028 1 18 1950 123456 "Oakland"         "CA" ""                      .          . 1 19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1 19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1 19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1 19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1 19 1950 123456 "Cupertino"       "CA" "Santa Clara"   37.444324  -122.1497 1 end

    It should be noted that in the firm-level panel data, there are several observations for a given (gvkey year longitude latitude) since they are merged with patent-level data.

    I have tried to do some steps as aforementioned, but actually I am having so much troubles..
    Could someone please help me with this problems ?


    Thank you in advance,
    AC.

  • #2
    Anne-Claire Jo

    Something like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long patent_id str10 org_id str70 city str7 state_code str30 bea_code str22 county float(lat lng)
    6575965 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4595014 "UN_208905" "Westport"     "CT" "118"       "Fairfield"       41.14343  -73.34958
    4916505 "UN_208752" "Honolulu"     "HI" "074"       "Honolulu"         21.3095   -157.863
    4282965 "UN_209262" "Binghamton"   "NY" "162"       "Broome"          42.14631  -75.88652
    5157702 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5034515 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5049280 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6185469 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    7355081 "UN_209181" "Rochester"    "NY" "139"       "Monroe"           43.1683   -77.6026
    5196396 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    4824359 "UN_208853" "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4581315 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    4946773 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    5754575 "UN_208940" "Scottsdale"   "AZ" "128"       "Maricopa"       33.521767 -111.90492
    6502576 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    7084407 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6130254 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4038143 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    6166295 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    3948213 "UN_200429" "Des Plaines"  "IL" "032"       "Cook"            42.04673   -87.8859
    7378498 "UN_211119" "Baltimore"    "MD" "174"       "Baltimore City"  39.29463  -76.62521
    5888473 "UN_1542"   "Los Angeles"  "CA" "097"       "Los Angeles"     33.97309  -118.2479
    5037746 "UN_208905" "Westport"     "CT" "118"       "Fairfield"       41.14343  -73.34958
    4816125 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5925818 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4974113 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    7219249 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    7560287 "UN_209183" "Eugene"       "OR" "053"       "Lane"            44.07368 -123.07876
    4684891 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    7298820 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    4164214 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    6251322 "UN_209442" "Clemson"      "SC" "068"       "Pickens"         34.68306    -82.825
    5951881 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    7609220 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4654598 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    7638300 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5395521 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    4043934 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4666847 "UN_208986" "Bedford"      "MA" "022"       "Middlesex"       42.48429 -71.276794
    6625135 "UN_999999" "Pittsburgh"   "PA" "129"       "Allegheny"       40.47454  -79.95252
    5452824 "UN_209262" "Binghamton"   "NY" "162"       "Broome"          42.14631  -75.88652
    7188559 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5416115 "UN_999999" "College Park" "MD" "174"       "Prince Georges"   38.9963  -76.92989
    5888789 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5719784 "UN_211213" "Seattle"      "WA" "152"       "King"            47.61143 -122.33046
    5912466 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5820678 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5448513 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4769326 "UN_1542"   ""             "CA" "OTHER_USA" ""                       .          .
    6605202 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    6448012 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5625137 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5362622 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5385541 "UN_209140" "Loma Linda"   "CA" "097"       "San Bernardino"   34.0524  -117.2618
    5032519 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6254865 "UN_208823" "Boulder"      "CO" "045"       "Boulder"         40.04973 -105.21426
    5595761 "UN_1698"   "Norman"       "OK" "119"       "Cleveland"        35.2212   -97.4448
    6664137 "UN_207741" "Ewing"        "NJ" "118"       "Mercer"          40.23769  -74.78206
    4409910 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5530309 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5335598 "UN_209025" "Phoenix"      "AZ" "128"       "Maricopa"          33.451  -112.0685
    5302609 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    5459235 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    4550187 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    5142559 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    5212072 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5272429 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    4897444 "UN_1380"   "Albany"       "NY" "004"       "Albany"           42.6525   -73.7566
    4933639 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    6318146 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    7094205 "UN_999999" "Los Angeles"  "CA" "097"       "Los Angeles"     33.97309  -118.2479
    4431263 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    5141851 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    6562619 "UN_211119" "Baltimore"    "MD" "174"       "Baltimore City"  39.29463  -76.62521
    5306447 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    7476384 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    4427306 "UN_208905" "Norwalk"      "CT" "118"       "Fairfield"       41.12222  -73.43583
    5490169 "UN_209046" "Huntsville"   "AL" "076"       "Madison"        34.726868  -86.56732
    6730686 "UN_209142" "Kansas City"  "KS" "084"       "Wyandotte"       39.11573  -94.62714
    4352864 "UN_209254" "Saugus"       "CA" "097"       ""                       .          .
    4701953 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5416115 "UN_999999" "College Park" "MD" "174"       "Prince Georges"   38.9963  -76.92989
    5849719 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6046925 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    6222209 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5856252 "UN_1542"   "Oakland"      "CA" "146"       "Alameda"        37.780594 -122.21658
    5747469 "UN_1768"   "Austin"       "TX" "013"       "Williamson"      30.50597  -97.74718
    5486636 "UN_1786"   "Madison"      "WI" "101"       "Dane"              43.073   -89.3817
    5238711 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    4043934 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4871252 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4076579 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    4874746 "UN_208904" "Cambridge"    "MA" "022"       "Middlesex"       42.37704  -71.12561
    6689192 "UN_1542"   "Los Alamos"   "NM" "147"       "Los Alamos"      35.86632 -106.26762
    5049673 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    5757839 "UN_1647"   "Ann Arbor"    "MI" "047"       "Washtenaw"       42.27938    -83.784
    6677444 "UN_209122" "Birmingham"   "AL" "019"       "Shelby"         33.401558  -86.70551
    4689399 "UN_209184" "Palo Alto"    "CA" "146"       "Santa Clara"    37.444324  -122.1497
    5650392 "UN_1698"   "Norman"       "OK" "119"       "Cleveland"        35.2212   -97.4448
    4275270 "UN_1542"   "Berkeley"     "CA" "146"       "Alameda"          37.8691  -122.2696
    end
    duplicates drop
    drop if lat == .
    save university_data,replace
    
    * Example generated by -dataex-. For more info, type help dataex 
    clear 
    input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude tag) 
     0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1  
    0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
      0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
      0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
      0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
      1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
      1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1 
      1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
      1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1 
      1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
      2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1 
      2 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1 
      2 1950 123456 "Naugatuck"       "CT" "New Haven"      41.28052 -72.874146 1 
      2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1
      2 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1 
      3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
      3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
      3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1 
      3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1 
      3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
      4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1 
      4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1  
      4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1  
      4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1 
      4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1 
      5 1950 123456 "Dayton"          "OH" "Montgomery"     39.84139  -84.41647 1 
      5 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1 
      5 1950 123456 "Schaumburg"      "IL" "Cook"           41.65819  -87.67947 1 
      5 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1 
      5 1950 123456 "La Jolla"        "CA" "San Diego"      33.02935 -116.85355 1  
      6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1 
      6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1 
      6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
      6 1950 123456 "Palo Alto"       "CA" "Santa Clara"   37.444324  -122.1497 1 
      6 1950 123456 "Racine"          "WI" "Racine"         42.69673  -87.90308 1
      7 1950 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1 
      7 1950 123456 "Cleveland"       "OH" "Cuyahoga"       41.45346  -81.92177 1 
    end
    duplicates drop
    gen firm_id = _n
    rename (patent_num city state_code county) firm_=
    save firm_data, replace
    
    use university_data
    geonear patent_id lat lng using firm_data, n(firm_id latitude longitude) within(100) long near(0)
    merge m:1 patent_id using university_data
    keep if _m == 3
    drop _merge
    merge m:1 firm_id using firm_data
    keep if _m == 3
    drop _merge

    Comment


    • #3
      Scott Merryman Hi Scott, thank you for your help !
      While running the code, I face the following error:

      . geonear patent_num latitude longitude using firm_data, n(firm_id latitude longitude) within(1) long near(0)
      latitude or longitude not constant within patent_num group

      Actually I would like to generate clusters of firms with respect to each reference university that are in same sector (variable name: bea_code) and firms included in only one cluster.

      And for more information, in the firm-level panel data, there are no unique identifier for (gvkey year). In other words, there could be several observations for firm A in year 1999 since the data is merged with patent-level data.

      Comment


      • #4
        For instance, by referencing https://www.statalist.org/forums/for...panel-analysis
        and https://www.statalist.org/forums/for...-with-nearstat , I used the following code:


        sort gvkey year

        *ssc install geonear

        drop if missing(latitude)| missing(longitude)
        duplicates drop gvkey year lat lon, force
        egen newid= group(gvkey lat lon) // unique identifier for each firm with geo info

        tempfile firms names names2
        bys newid (year): keep if _n==1

        preserve
        rename (newid latitude longitude) (nid nlat nlon)
        save `firms', replace
        keep nid org_id
        save `names', replace
        rename (nid org_id) (newid name)
        save `names2', replace
        restore
        *All firms within 1 mile
        geonear newid latitude longitude using "`firms'", n(nid nlat nlon) ign long within(1) near(0)

        merge m:1 nid using `names', keep(master match) nogen
        merge m:1 newid using `names2', keep(master match) nogen
        order newid name nid org_id

        joinby newid using `names2'
        sort newid year
        l gvkey year newid latitude longitude city , sepby(newid)




        But unfortunately, I failed to get what I wanted..

        Comment

        Working...
        X