Dear stata users,
I am trying to merge two seperate datasets which are firm-level panel data and university patent-level data and they do not have variables in common (or maybe patent number but they don't share same number).
They both have geographical information which is longitude and latitude (below are the examples of the two datasets).
What I am trying is to do is:
i) append/merge two datas in one Master data (this step is for the second step; but if this step is unnecessary, it's better not to append due to data limit)
ii) use - geonear - command to generate clusters.
: In more detail, I would like to generate clusters of firms by referencing a given university (within XX km or mile). And here, firms should be included in only one nearest cluster (referenced by university).
iii) In the regression part, I would like to make a dummy variable of firms. For instance, 1 if a given firm has university (or universities) in same industry nearby (e.g. in XX km/mile).
The example of my university patent-level data is as follows:
In addition, firm-level panel data looks like:
It should be noted that in the firm-level panel data, there are several observations for a given (gvkey year longitude latitude) since they are merged with patent-level data.
I have tried to do some steps as aforementioned, but actually I am having so much troubles..
Could someone please help me with this problems ?
Thank you in advance,
AC.
I am trying to merge two seperate datasets which are firm-level panel data and university patent-level data and they do not have variables in common (or maybe patent number but they don't share same number).
They both have geographical information which is longitude and latitude (below are the examples of the two datasets).
What I am trying is to do is:
i) append/merge two datas in one Master data (this step is for the second step; but if this step is unnecessary, it's better not to append due to data limit)
ii) use - geonear - command to generate clusters.
: In more detail, I would like to generate clusters of firms by referencing a given university (within XX km or mile). And here, firms should be included in only one nearest cluster (referenced by university).
iii) In the regression part, I would like to make a dummy variable of firms. For instance, 1 if a given firm has university (or universities) in same industry nearby (e.g. in XX km/mile).
The example of my university patent-level data is as follows:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long patent_id str10 org_id str70 city str7 state_code str30 bea_code str22 county float(lat lng) 6575965 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 4595014 "UN_208905" "Westport" "CT" "118" "Fairfield" 41.14343 -73.34958 4916505 "UN_208752" "Honolulu" "HI" "074" "Honolulu" 21.3095 -157.863 4282965 "UN_209262" "Binghamton" "NY" "162" "Broome" 42.14631 -75.88652 5157702 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 5034515 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 5049280 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 6185469 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 7355081 "UN_209181" "Rochester" "NY" "139" "Monroe" 43.1683 -77.6026 5196396 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 4824359 "UN_208853" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 4581315 "UN_208905" "Norwalk" "CT" "118" "Fairfield" 41.12222 -73.43583 4946773 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 5754575 "UN_208940" "Scottsdale" "AZ" "128" "Maricopa" 33.521767 -111.90492 6502576 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 7084407 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 6130254 "UN_1647" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 4038143 "UN_1647" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 6166295 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 3948213 "UN_200429" "Des Plaines" "IL" "032" "Cook" 42.04673 -87.8859 7378498 "UN_211119" "Baltimore" "MD" "174" "Baltimore City" 39.29463 -76.62521 5888473 "UN_1542" "Los Angeles" "CA" "097" "Los Angeles" 33.97309 -118.2479 5037746 "UN_208905" "Westport" "CT" "118" "Fairfield" 41.14343 -73.34958 4816125 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 5925818 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 4974113 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 7219249 "UN_1380" "Albany" "NY" "004" "Albany" 42.6525 -73.7566 7560287 "UN_209183" "Eugene" "OR" "053" "Lane" 44.07368 -123.07876 4684891 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 7298820 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 4164214 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 6251322 "UN_209442" "Clemson" "SC" "068" "Pickens" 34.68306 -82.825 5951881 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 7609220 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 4654598 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 7638300 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 5395521 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 4043934 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 4666847 "UN_208986" "Bedford" "MA" "022" "Middlesex" 42.48429 -71.276794 6625135 "UN_999999" "Pittsburgh" "PA" "129" "Allegheny" 40.47454 -79.95252 5452824 "UN_209262" "Binghamton" "NY" "162" "Broome" 42.14631 -75.88652 7188559 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 5416115 "UN_999999" "College Park" "MD" "174" "Prince Georges" 38.9963 -76.92989 5888789 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 5719784 "UN_211213" "Seattle" "WA" "152" "King" 47.61143 -122.33046 5912466 "UN_1542" "Los Alamos" "NM" "147" "Los Alamos" 35.86632 -106.26762 5820678 "UN_1542" "Los Alamos" "NM" "147" "Los Alamos" 35.86632 -106.26762 5448513 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 4769326 "UN_1542" "" "CA" "OTHER_USA" "" . . 6605202 "UN_1542" "Los Alamos" "NM" "147" "Los Alamos" 35.86632 -106.26762 6448012 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 5625137 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 5362622 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 5385541 "UN_209140" "Loma Linda" "CA" "097" "San Bernardino" 34.0524 -117.2618 5032519 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 6254865 "UN_208823" "Boulder" "CO" "045" "Boulder" 40.04973 -105.21426 5595761 "UN_1698" "Norman" "OK" "119" "Cleveland" 35.2212 -97.4448 6664137 "UN_207741" "Ewing" "NJ" "118" "Mercer" 40.23769 -74.78206 4409910 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 5530309 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 5335598 "UN_209025" "Phoenix" "AZ" "128" "Maricopa" 33.451 -112.0685 5302609 "UN_1647" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 5459235 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 4550187 "UN_1380" "Albany" "NY" "004" "Albany" 42.6525 -73.7566 5142559 "UN_1380" "Albany" "NY" "004" "Albany" 42.6525 -73.7566 5212072 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 5272429 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 4897444 "UN_1380" "Albany" "NY" "004" "Albany" 42.6525 -73.7566 4933639 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 6318146 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 7094205 "UN_999999" "Los Angeles" "CA" "097" "Los Angeles" 33.97309 -118.2479 4431263 "UN_208905" "Norwalk" "CT" "118" "Fairfield" 41.12222 -73.43583 5141851 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 6562619 "UN_211119" "Baltimore" "MD" "174" "Baltimore City" 39.29463 -76.62521 5306447 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 7476384 "UN_1647" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 4427306 "UN_208905" "Norwalk" "CT" "118" "Fairfield" 41.12222 -73.43583 5490169 "UN_209046" "Huntsville" "AL" "076" "Madison" 34.726868 -86.56732 6730686 "UN_209142" "Kansas City" "KS" "084" "Wyandotte" 39.11573 -94.62714 4352864 "UN_209254" "Saugus" "CA" "097" "" . . 4701953 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 5416115 "UN_999999" "College Park" "MD" "174" "Prince Georges" 38.9963 -76.92989 5849719 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 6046925 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 6222209 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 5856252 "UN_1542" "Oakland" "CA" "146" "Alameda" 37.780594 -122.21658 5747469 "UN_1768" "Austin" "TX" "013" "Williamson" 30.50597 -97.74718 5486636 "UN_1786" "Madison" "WI" "101" "Dane" 43.073 -89.3817 5238711 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 4043934 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 4871252 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 4076579 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 4874746 "UN_208904" "Cambridge" "MA" "022" "Middlesex" 42.37704 -71.12561 6689192 "UN_1542" "Los Alamos" "NM" "147" "Los Alamos" 35.86632 -106.26762 5049673 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 5757839 "UN_1647" "Ann Arbor" "MI" "047" "Washtenaw" 42.27938 -83.784 6677444 "UN_209122" "Birmingham" "AL" "019" "Shelby" 33.401558 -86.70551 4689399 "UN_209184" "Palo Alto" "CA" "146" "Santa Clara" 37.444324 -122.1497 5650392 "UN_1698" "Norman" "OK" "119" "Cleveland" 35.2212 -97.4448 4275270 "UN_1542" "Berkeley" "CA" "146" "Alameda" 37.8691 -122.2696 end
In addition, firm-level panel data looks like:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude tag) 0 1950 123456 "Charlotte" "NC" "Mecklenburg" 35.410557 -80.84306 1 0 1950 123456 "Charlotte" "NC" "Mecklenburg" 35.410557 -80.84306 1 0 1950 123456 "Charlotte" "NC" "Mecklenburg" 35.410557 -80.84306 1 0 1950 123456 "Charlotte" "NC" "Mecklenburg" 35.410557 -80.84306 1 0 1950 123456 "Charlotte" "NC" "Mecklenburg" 35.410557 -80.84306 1 1 1950 123456 "Maumee" "OH" "Lucas" 41.70798 -83.70683 1 1 1950 123456 "Naperville" "IL" "DuPage" 41.79007 -88.20559 1 1 1950 123456 "Maumee" "OH" "Lucas" 41.70798 -83.70683 1 1 1950 123456 "Naperville" "IL" "DuPage" 41.79007 -88.20559 1 1 1950 123456 "Naperville" "IL" "DuPage" 41.79007 -88.20559 1 2 1950 123456 "Waukegan" "IL" "Lake" 42.16139 -88.13834 1 2 1950 123456 "Chicago" "IL" "Cook" 41.65819 -87.67947 1 2 1950 123456 "Naugatuck" "CT" "New Haven" 41.28052 -72.874146 1 2 1950 123456 "Waukegan" "IL" "Lake" 42.16139 -88.13834 1 2 1950 123456 "Downers Grove" "IL" "DuPage" 41.81107 -88.02453 1 3 1950 123456 "Irving" "TX" "Dallas" 32.9027 -96.5636 1 3 1950 123456 "Irving" "TX" "Dallas" 32.9027 -96.5636 1 3 1950 123456 "Irving" "TX" "Dallas" 32.9027 -96.5636 1 3 1950 123456 "Irving" "TX" "Dallas" 32.9027 -96.5636 1 3 1950 123456 "Irving" "TX" "Dallas" 32.9027 -96.5636 1 4 1950 123456 "Memphis" "TN" "Shelby" 35.0337 -89.9343 1 4 1950 123456 "Blue Bell" "PA" "Montgomery" 40.3128 -75.32134 1 4 1950 123456 "Blue Bell" "PA" "Montgomery" 40.3128 -75.32134 1 4 1950 123456 "Blue Bell" "PA" "Montgomery" 40.3128 -75.32134 1 4 1950 123456 "Memphis" "TN" "Shelby" 35.0337 -89.9343 1 5 1950 123456 "Dayton" "OH" "Montgomery" 39.84139 -84.41647 1 5 1950 123456 "New York" "NY" "New York" 40.74838 -73.996704 1 5 1950 123456 "Schaumburg" "IL" "Cook" 41.65819 -87.67947 1 5 1950 123456 "Chicago" "IL" "Cook" 41.65819 -87.67947 1 5 1950 123456 "La Jolla" "CA" "San Diego" 33.02935 -116.85355 1 6 1950 123456 "Stamford" "CT" "Fairfield" 41.25555 -73.43528 1 6 1950 123456 "Stamford" "CT" "Fairfield" 41.25555 -73.43528 1 6 1950 123456 "Stamford" "CT" "Fairfield" 41.25555 -73.43528 1 6 1950 123456 "Palo Alto" "CA" "Santa Clara" 37.444324 -122.1497 1 6 1950 123456 "Racine" "WI" "Racine" 42.69673 -87.90308 1 7 1950 123456 "Hoffman Estates" "IL" "Cook" 41.65819 -87.67947 1 7 1950 123456 "Cleveland" "OH" "Cuyahoga" 41.45346 -81.92177 1 7 1950 123456 "Pittsburgh" "PA" "Allegheny" 40.59417 -79.97028 1 7 1950 123456 "Denver" "CO" "Denver" 39.7507 -104.989 1 7 1951 123456 "Hoffman Estates" "IL" "Cook" 41.65819 -87.67947 1 8 1950 123456 "Culver City" "CA" "Los Angeles" 34.26187 -118.45866 1 8 1950 123456 "Chicago" "IL" "Cook" 41.65819 -87.67947 1 8 1950 123456 "Harrisburg" "PA" "Dauphin" 40.2782 -76.70937 1 8 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 8 1950 123456 "Chicago" "IL" "Cook" 41.65819 -87.67947 1 9 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 9 1950 123456 "Austin" "MN" "Freeborn" 43.75284 -93.16799 1 9 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 9 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 9 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 10 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 10 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 10 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 10 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 10 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 11 1950 123456 "" "TX" "" . . 1 11 1950 123456 "Downers Grove" "IL" "DuPage" 41.81107 -88.02453 1 11 1950 123456 "Buffalo" "NY" "Erie" 42.834 -78.63425 1 11 1950 123456 "Oak Brook" "IL" "DuPage" 41.8364 -87.95317 1 11 1950 123456 "White Pine" "MI" "Ontonagon" 46.73806 -89.17944 1 12 1950 123456 "Brentwood" "TN" "Davidson" 36.260387 -86.70456 1 12 1950 123456 "Pittsburgh" "PA" "Allegheny" 40.59417 -79.97028 1 12 1951 123456 "Brentwood" "TN" "Davidson" 36.260387 -86.70456 1 12 1951 123456 "Pittsburgh" "PA" "Allegheny" 40.59417 -79.97028 1 12 1952 123456 "Pittsburgh" "PA" "Allegheny" 40.59417 -79.97028 1 13 1950 123456 "Parsippany" "NJ" "Morris" 40.8819 -74.62099 1 13 1950 123456 "Saint Paul" "MN" "Ramsey" 44.96996 -93.08317 1 13 1950 123456 "Clarks Summit" "PA" "Lackawanna" 41.34319 -75.530136 1 13 1950 123456 "" "" "" . . 1 13 1951 123456 "Clarks Summit" "PA" "Lackawanna" 41.3731 -75.6841 1 14 1950 123456 "Richmond" "VA" "Richmond city" . . 1 14 1950 123456 "Atlanta" "GA" "DeKalb" 33.888504 -84.28954 1 14 1950 123456 "Atlanta" "GA" "Fulton" 34.040833 -84.3859 1 14 1950 123456 "Carteret" "NJ" "Middlesex" 40.33243 -74.56883 1 14 1950 123456 "Richmond" "VA" "Richmond city" . . 1 15 1950 123456 "" "" "" . . 1 15 1950 123456 "" "" "" . . 1 15 1950 123456 "" "" "" . . 1 15 1950 123456 "Dayton" "OH" "Warren" 39.37145 -84.21078 1 15 1950 123456 "" "" "" . . 1 16 1950 123456 "New Britain" "CT" "Hartford" 41.92608 -72.64576 1 16 1950 123456 "New York" "NY" "New York" 40.74838 -73.996704 1 16 1950 123456 "Columbus" "OH" "Franklin" 40.03219 -83.13834 1 16 1950 123456 "New York" "NY" "New York" 40.74838 -73.996704 1 16 1950 123456 "New York" "NY" "New York" 40.74838 -73.996704 1 17 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 17 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 17 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 17 1950 123456 "Houston" "TX" "Harris" 30.00409 -95.28248 1 17 1950 123456 "Philadelphia" "PA" "Philadelphia" 40.1162 -75.0141 1 18 1950 123456 "Upland" "IN" "Grant" 40.5112 -85.82655 1 18 1950 123456 "El Segundo" "CA" "Los Angeles" 34.26187 -118.45866 1 18 1950 123456 "El Segundo" "CA" "Los Angeles" 34.26187 -118.45866 1 18 1950 123456 "Mckeesport" "PA" "Allegheny" 40.59417 -79.97028 1 18 1950 123456 "Oakland" "CA" "" . . 1 19 1950 123456 "Yardley" "PA" "Bucks" 40.57824 -75.21906 1 19 1950 123456 "Yardley" "PA" "Bucks" 40.57824 -75.21906 1 19 1950 123456 "Yardley" "PA" "Bucks" 40.57824 -75.21906 1 19 1950 123456 "Yardley" "PA" "Bucks" 40.57824 -75.21906 1 19 1950 123456 "Cupertino" "CA" "Santa Clara" 37.444324 -122.1497 1 end
It should be noted that in the firm-level panel data, there are several observations for a given (gvkey year longitude latitude) since they are merged with patent-level data.
I have tried to do some steps as aforementioned, but actually I am having so much troubles..
Could someone please help me with this problems ?
Thank you in advance,
AC.
Comment