Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating cluster(group) with longitude/latitude information in panel analysis

    Dear Stata users,


    I have a question which might be very unclear since even I do not really have much idea how to design my research.
    I have firm-level panel data with geographical information such as longitude/latitude/county/city/state etc.
    And I would like to generate clusters or groups, for example, group A that is clustered by all the firms within 1 mile from reference location.
    However, I do not have clear ideas on: 1) how to decide the reference location and 2) how to generate this kind of cluster by stata or which command should I use.
    I am sorry to ask this unclear questions, however, given the situation that I should conduct my research all by myself (i.e. I have no one to ask), I would like to ask here.
    Is there someone who could kindly recommend me any good reference for this type of spatial analysis with stata command ?


    Thank you very much in advance,
    A.-C






  • #2
    For each firm, you can get a list of firms all within 1 mile using the command geonear.

    Code:
    ssc install geonear, replace
    help geonear

    An illustration is #5 of https://www.statalist.org/forums/for...-with-nearstat.
    The issue that you will have to deal with is what to do if the same firm is in more than one cluster.
    Last edited by Andrew Musau; 17 Jul 2022, 11:08.

    Comment


    • #3
      Andrew Musau Thank you very much for your information and reply !
      I have read the post that you have mentioned, however, I am a bit confused.
      In that case #5, it has two seperate identifier (gas station and brand) therefore, the code has been written with two identifiers (idiempianto and bandier).
      However, in my case, I would like to define clusters according to firm identifier - such as, gvkey (compustat).
      In other words, I would like to group firms (identified as gvkey) within %%km from a reference firm (gvkey).
      Therefore, my question is, when using - geonear - command, I should also define neighbor as well, but is it still possible to use the code in #5 with gvkey as neighbor as well?
      Sorry if my question was unclear, and thank you very much again !

      Comment


      • #4
        In detail,
        I have tried the following code:

        tempfile Master names names2
        preserve
        rename (gvkey latitude longitude) (nid nlat nlon)
        save `Master', replace
        keep nid conm
        save `names', replace
        rename (nid conm) (gvkey name)
        save `names2', replace
        restore

        geonear gvkey latitude longitude using "`Master'", n(nid nlat nlon) ign long within(1) near(0)
        merge m:1 nid using `names', keep(master match) nogen
        merge m:1 gvkey using `names2', keep(master match) nogen
        order gvkey name nid conm

        And I have the following error:
        latitude or longitude not constant within gvkey group

        Comment


        • #5
          It is not unusual for firms to move from one location to another over time, so my guess is that is what the error message is indicating. Therefore, you may need to use both firm and year to uniquely identify the physical address of a firm. If you do so and proceed to create clusters, you will have a situation where panels are not nested within clusters. To confirm whether the issue is firms moving, you can:

          Code:
          bys gvkey (latitude longitude): gen tag= latitude[1]!=latitude[_N]|longitude[1]!=longitude[_N]
          sort gvkey year
          browse if tag

          Comment


          • #6
            Andrew Musau Thanks ! I see that there are lots of firms that have changed their location.
            But in this case, I am not sure how to use the code in #5 with my four geographical variables (longitude_0 and latitude_0 for tag == 0; longitude_1 and latitude_1 for tag == 1)

            Comment


            • #7
              Present a data example using dataex. 3-4 observations of 4-5 firms, a couple of which have moved should do.

              Comment


              • #8
                Actually, I'm afraid that I would not be able to present the example since there is confidential issue here..
                But my dataset is from Compustat + USPTO (patent-level) merged by gvkey (firm identifier) variable.
                So since it's firm-level + patent-level panel data, it does not have unique firm year identifier (i.e. they have sometimes 2 observations for year 1987 for firm A).

                Comment


                • #9
                  The issue of confidential data is covered in FAQ Advice #12: https://www.statalist.org/forums/help#stata

                  Comment


                  • #10
                    Following is the example of my dataset:

                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude)
                    12 1974 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1975 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1975 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1976 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1977 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1978 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1979 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1980 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1981 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1982 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1983 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1984 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1985 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1986 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1987 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1988 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    12 1989 123456 "Lincoln"      "RI" "Providence" 41.96888 -71.47411
                    13 1974 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1975 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1976 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1977 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1978 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1978 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1979 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1980 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1980 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1980 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1981 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1981 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1982 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1982 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1982 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1983 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1983 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1983 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1983 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1984 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1985 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1986 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1987 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1987 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1987 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1987 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1988 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1989 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1989 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1989 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1989 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1989 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1990 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    13 1991 123456 "Eden Prairie" "MN" "Hennepin"   44.95306 -93.57611
                    end

                    Comment


                    • #11
                      Still this is not helpful to me because I need to do a lot of data preparation before demonstration.

                      Code:
                      preserve
                      bys gvkey (latitude longitude): gen tag= latitude[1]!=latitude[_N]|longitude[1]!=longitude[_N]
                      bys gvkey (year): keep if _n<=5 & tag
                      dataex
                      restore

                      Comment


                      • #12
                        I got this result:

                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude tag)
                         0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                         0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                         0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                         0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                         0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                         1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
                         1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                         1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
                         1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                         1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                         2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1
                         2 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                         2 1950 123456 "Naugatuck"       "CT" "New Haven"      41.28052 -72.874146 1
                         2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1
                         2 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1
                         3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                         3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                         3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                         3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                         3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                         4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1
                         4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                         4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                         4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                         4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1
                         5 1950 123456 "Dayton"          "OH" "Montgomery"     39.84139  -84.41647 1
                         5 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                         5 1950 123456 "Schaumburg"      "IL" "Cook"           41.65819  -87.67947 1
                         5 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                         5 1950 123456 "La Jolla"        "CA" "San Diego"      33.02935 -116.85355 1
                         6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                         6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                         6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                         6 1950 123456 "Palo Alto"       "CA" "Santa Clara"   37.444324  -122.1497 1
                         6 1950 123456 "Racine"          "WI" "Racine"         42.69673  -87.90308 1
                         7 1950 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1
                         7 1950 123456 "Cleveland"       "OH" "Cuyahoga"       41.45346  -81.92177 1
                         7 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                         7 1950 123456 "Denver"          "CO" "Denver"          39.7507   -104.989 1
                         7 1951 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1
                         8 1950 123456 "Culver City"     "CA" "Los Angeles"    34.26187 -118.45866 1
                         8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                         8 1950 123456 "Harrisburg"      "PA" "Dauphin"         40.2782  -76.70937 1
                         8 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                         8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                         9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                         9 1950 123456 "Austin"          "MN" "Freeborn"       43.75284  -93.16799 1
                         9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                         9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                         9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                        10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        11 1950 123456 ""                "TX" ""                      .          . 1
                        11 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1
                        11 1950 123456 "Buffalo"         "NY" "Erie"             42.834  -78.63425 1
                        11 1950 123456 "Oak Brook"       "IL" "DuPage"          41.8364  -87.95317 1
                        11 1950 123456 "White Pine"      "MI" "Ontonagon"      46.73806  -89.17944 1
                        12 1950 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1
                        12 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                        12 1951 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1
                        12 1951 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                        12 1952 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                        13 1950 123456 "Parsippany"      "NJ" "Morris"          40.8819  -74.62099 1
                        13 1950 123456 "Saint Paul"      "MN" "Ramsey"         44.96996  -93.08317 1
                        13 1950 123456 "Clarks Summit"   "PA" "Lackawanna"     41.34319 -75.530136 1
                        13 1950 123456 ""                ""   ""                      .          . 1
                        13 1951 123456 "Clarks Summit"   "PA" "Lackawanna"      41.3731   -75.6841 1
                        14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1
                        14 1950 123456 "Atlanta"         "GA" "DeKalb"        33.888504  -84.28954 1
                        14 1950 123456 "Atlanta"         "GA" "Fulton"        34.040833   -84.3859 1
                        14 1950 123456 "Carteret"        "NJ" "Middlesex"      40.33243  -74.56883 1
                        14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1
                        15 1950 123456 ""                ""   ""                      .          . 1
                        15 1950 123456 ""                ""   ""                      .          . 1
                        15 1950 123456 ""                ""   ""                      .          . 1
                        15 1950 123456 "Dayton"          "OH" "Warren"         39.37145  -84.21078 1
                        15 1950 123456 ""                ""   ""                      .          . 1
                        16 1950 123456 "New Britain"     "CT" "Hartford"       41.92608  -72.64576 1
                        16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                        16 1950 123456 "Columbus"        "OH" "Franklin"       40.03219  -83.13834 1
                        16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                        16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                        17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                        17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                        17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                        17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                        18 1950 123456 "Upland"          "IN" "Grant"           40.5112  -85.82655 1
                        18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1
                        18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1
                        18 1950 123456 "Mckeesport"      "PA" "Allegheny"      40.59417  -79.97028 1
                        18 1950 123456 "Oakland"         "CA" ""                      .          . 1
                        19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                        19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                        19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                        19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                        19 1950 123456 "Cupertino"       "CA" "Santa Clara"   37.444324  -122.1497 1
                        end

                        Comment


                        • #13
                          This is fine.

                          Comment


                          • #14
                            You have quite a number of duplicates, but I think that your firms (branches) are identified by gvkey, latitude and longitude. This will give you the list of firms within a mile of each firm. Then you will need some more work to create clusters, including resolving the fact that some firms are in more than one cluster. We treat a firm that has moved as a new firm, and the implication is that when forming clusters, firms that moved will be grouped in more than one cluster.

                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            input int gvkey double year long patent_num str100 city str8 state_code str33 county float(latitude longitude tag)
                             0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                             0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                             0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                             0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                             0 1950 123456 "Charlotte"       "NC" "Mecklenburg"   35.410557  -80.84306 1
                             1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
                             1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                             1 1950 123456 "Maumee"          "OH" "Lucas"          41.70798  -83.70683 1
                             1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                             1 1950 123456 "Naperville"      "IL" "DuPage"         41.79007  -88.20559 1
                             2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1
                             2 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                             2 1950 123456 "Naugatuck"       "CT" "New Haven"      41.28052 -72.874146 1
                             2 1950 123456 "Waukegan"        "IL" "Lake"           42.16139  -88.13834 1
                             2 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1
                             3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                             3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                             3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                             3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                             3 1950 123456 "Irving"          "TX" "Dallas"          32.9027   -96.5636 1
                             4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1
                             4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                             4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                             4 1950 123456 "Blue Bell"       "PA" "Montgomery"      40.3128  -75.32134 1
                             4 1950 123456 "Memphis"         "TN" "Shelby"          35.0337   -89.9343 1
                             5 1950 123456 "Dayton"          "OH" "Montgomery"     39.84139  -84.41647 1
                             5 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                             5 1950 123456 "Schaumburg"      "IL" "Cook"           41.65819  -87.67947 1
                             5 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                             5 1950 123456 "La Jolla"        "CA" "San Diego"      33.02935 -116.85355 1
                             6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                             6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                             6 1950 123456 "Stamford"        "CT" "Fairfield"      41.25555  -73.43528 1
                             6 1950 123456 "Palo Alto"       "CA" "Santa Clara"   37.444324  -122.1497 1
                             6 1950 123456 "Racine"          "WI" "Racine"         42.69673  -87.90308 1
                             7 1950 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1
                             7 1950 123456 "Cleveland"       "OH" "Cuyahoga"       41.45346  -81.92177 1
                             7 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                             7 1950 123456 "Denver"          "CO" "Denver"          39.7507   -104.989 1
                             7 1951 123456 "Hoffman Estates" "IL" "Cook"           41.65819  -87.67947 1
                             8 1950 123456 "Culver City"     "CA" "Los Angeles"    34.26187 -118.45866 1
                             8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                             8 1950 123456 "Harrisburg"      "PA" "Dauphin"         40.2782  -76.70937 1
                             8 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                             8 1950 123456 "Chicago"         "IL" "Cook"           41.65819  -87.67947 1
                             9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                             9 1950 123456 "Austin"          "MN" "Freeborn"       43.75284  -93.16799 1
                             9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                             9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                             9 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                            10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            10 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            11 1950 123456 ""                "TX" ""                      .          . 1
                            11 1950 123456 "Downers Grove"   "IL" "DuPage"         41.81107  -88.02453 1
                            11 1950 123456 "Buffalo"         "NY" "Erie"             42.834  -78.63425 1
                            11 1950 123456 "Oak Brook"       "IL" "DuPage"          41.8364  -87.95317 1
                            11 1950 123456 "White Pine"      "MI" "Ontonagon"      46.73806  -89.17944 1
                            12 1950 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1
                            12 1950 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                            12 1951 123456 "Brentwood"       "TN" "Davidson"      36.260387  -86.70456 1
                            12 1951 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                            12 1952 123456 "Pittsburgh"      "PA" "Allegheny"      40.59417  -79.97028 1
                            13 1950 123456 "Parsippany"      "NJ" "Morris"          40.8819  -74.62099 1
                            13 1950 123456 "Saint Paul"      "MN" "Ramsey"         44.96996  -93.08317 1
                            13 1950 123456 "Clarks Summit"   "PA" "Lackawanna"     41.34319 -75.530136 1
                            13 1950 123456 ""                ""   ""                      .          . 1
                            13 1951 123456 "Clarks Summit"   "PA" "Lackawanna"      41.3731   -75.6841 1
                            14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1
                            14 1950 123456 "Atlanta"         "GA" "DeKalb"        33.888504  -84.28954 1
                            14 1950 123456 "Atlanta"         "GA" "Fulton"        34.040833   -84.3859 1
                            14 1950 123456 "Carteret"        "NJ" "Middlesex"      40.33243  -74.56883 1
                            14 1950 123456 "Richmond"        "VA" "Richmond city"         .          . 1
                            15 1950 123456 ""                ""   ""                      .          . 1
                            15 1950 123456 ""                ""   ""                      .          . 1
                            15 1950 123456 ""                ""   ""                      .          . 1
                            15 1950 123456 "Dayton"          "OH" "Warren"         39.37145  -84.21078 1
                            15 1950 123456 ""                ""   ""                      .          . 1
                            16 1950 123456 "New Britain"     "CT" "Hartford"       41.92608  -72.64576 1
                            16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                            16 1950 123456 "Columbus"        "OH" "Franklin"       40.03219  -83.13834 1
                            16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                            16 1950 123456 "New York"        "NY" "New York"       40.74838 -73.996704 1
                            17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                            17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                            17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            17 1950 123456 "Houston"         "TX" "Harris"         30.00409  -95.28248 1
                            17 1950 123456 "Philadelphia"    "PA" "Philadelphia"    40.1162   -75.0141 1
                            18 1950 123456 "Upland"          "IN" "Grant"           40.5112  -85.82655 1
                            18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1
                            18 1950 123456 "El Segundo"      "CA" "Los Angeles"    34.26187 -118.45866 1
                            18 1950 123456 "Mckeesport"      "PA" "Allegheny"      40.59417  -79.97028 1
                            18 1950 123456 "Oakland"         "CA" ""                      .          . 1
                            19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                            19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                            19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                            19 1950 123456 "Yardley"         "PA" "Bucks"          40.57824  -75.21906 1
                            19 1950 123456 "Cupertino"       "CA" "Santa Clara"   37.444324  -122.1497 1
                            end
                            
                            drop if missing(latitude)| missing(longitude)
                            duplicates drop gvkey year lat lon, force
                            egen newid= group(gvkey lat lon)
                            tempfile firms names names2
                            save `names2', replace
                            bys newid (year): keep if _n==1
                            *ssc install geonear
                            preserve
                            rename * *2
                            save `firms', replace
                            save `names', replace
                            restore
                            geonear newid latitude longitude using "`firms'", n(newid2 latitude2 longitude2) ign long within(1) near(0) miles
                            merge m:1 newid2 using `names', keep(master match) nogen
                            joinby newid using `names2'
                            sort newid2 year
                            l gvkey year newid2 latitude2 longitude2 city2  newid latitude longitude city, sepby(newid2)
                            Res.:

                            Code:
                            . l gvkey year newid2 latitude2 longitude2 city2  newid latitude longitude city, sepby(newid2)
                            
                                 +-----------------------------------------------------------------------------------------------------------------+
                                 | gvkey   year   newid2   latitu~2   longitu~2             city2   newid   latitude   longitude              city |
                                 |-----------------------------------------------------------------------------------------------------------------|
                              1. |     5   1950        5   41.65819   -87.67947           Chicago      14   41.65819   -87.67947        Schaumburg |
                              2. |     7   1950        5   41.65819   -87.67947           Chicago      21   41.65819   -87.67947   Hoffman Estates |
                              3. |     8   1950        5   41.65819   -87.67947           Chicago      25   41.65819   -87.67947           Chicago |
                              4. |     7   1951        5   41.65819   -87.67947           Chicago      21   41.65819   -87.67947   Hoffman Estates |
                                 |-----------------------------------------------------------------------------------------------------------------|
                              5. |    11   1950        6   41.81107   -88.02453     Downers Grove      29   41.81107   -88.02453     Downers Grove |
                                 |-----------------------------------------------------------------------------------------------------------------|
                              6. |    16   1950       13   40.74838    -73.9967          New York      44   40.74838    -73.9967          New York |
                                 |-----------------------------------------------------------------------------------------------------------------|
                              7. |     8   1950       14   41.65819   -87.67947        Schaumburg      25   41.65819   -87.67947           Chicago |
                              8. |     7   1950       14   41.65819   -87.67947        Schaumburg      21   41.65819   -87.67947   Hoffman Estates |
                              9. |     2   1950       14   41.65819   -87.67947        Schaumburg       5   41.65819   -87.67947           Chicago |
                             10. |     7   1951       14   41.65819   -87.67947        Schaumburg      21   41.65819   -87.67947   Hoffman Estates |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             11. |    19   1950       15   37.44432   -122.1497         Palo Alto      51   37.44432   -122.1497         Cupertino |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             12. |    12   1950       19   40.59417   -79.97028        Pittsburgh      34   40.59417   -79.97028        Pittsburgh |
                             13. |    18   1950       19   40.59417   -79.97028        Pittsburgh      50   40.59417   -79.97028        Mckeesport |
                             14. |    12   1951       19   40.59417   -79.97028        Pittsburgh      34   40.59417   -79.97028        Pittsburgh |
                             15. |    12   1952       19   40.59417   -79.97028        Pittsburgh      34   40.59417   -79.97028        Pittsburgh |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             16. |     2   1950       21   41.65819   -87.67947   Hoffman Estates       5   41.65819   -87.67947           Chicago |
                             17. |     5   1950       21   41.65819   -87.67947   Hoffman Estates      14   41.65819   -87.67947        Schaumburg |
                             18. |     8   1950       21   41.65819   -87.67947   Hoffman Estates      25   41.65819   -87.67947           Chicago |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             19. |     9   1950       22   30.00409   -95.28248           Houston      26   30.00409   -95.28248           Houston |
                             20. |    17   1950       22   30.00409   -95.28248           Houston      46   30.00409   -95.28248           Houston |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             21. |    18   1950       23   34.26187   -118.4587       Culver City      48   34.26187   -118.4587        El Segundo |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             22. |     5   1950       25   41.65819   -87.67947           Chicago      14   41.65819   -87.67947        Schaumburg |
                             23. |     2   1950       25   41.65819   -87.67947           Chicago       5   41.65819   -87.67947           Chicago |
                             24. |     7   1950       25   41.65819   -87.67947           Chicago      21   41.65819   -87.67947   Hoffman Estates |
                             25. |     7   1951       25   41.65819   -87.67947           Chicago      21   41.65819   -87.67947   Hoffman Estates |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             26. |    17   1950       26   30.00409   -95.28248           Houston      46   30.00409   -95.28248           Houston |
                             27. |     8   1950       26   30.00409   -95.28248           Houston      22   30.00409   -95.28248           Houston |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             28. |    17   1950       28    40.1162    -75.0141      Philadelphia      47    40.1162    -75.0141      Philadelphia |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             29. |     2   1950       29   41.81107   -88.02453     Downers Grove       6   41.81107   -88.02453     Downers Grove |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             30. |    18   1950       34   40.59417   -79.97028        Pittsburgh      50   40.59417   -79.97028        Mckeesport |
                             31. |     7   1950       34   40.59417   -79.97028        Pittsburgh      19   40.59417   -79.97028        Pittsburgh |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             32. |     5   1950       44   40.74838    -73.9967          New York      13   40.74838    -73.9967          New York |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             33. |     9   1950       46   30.00409   -95.28248           Houston      26   30.00409   -95.28248           Houston |
                             34. |     8   1950       46   30.00409   -95.28248           Houston      22   30.00409   -95.28248           Houston |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             35. |    10   1950       47    40.1162    -75.0141      Philadelphia      28    40.1162    -75.0141      Philadelphia |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             36. |     8   1950       48   34.26187   -118.4587        El Segundo      23   34.26187   -118.4587       Culver City |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             37. |     7   1950       50   40.59417   -79.97028        Mckeesport      19   40.59417   -79.97028        Pittsburgh |
                             38. |    12   1950       50   40.59417   -79.97028        Mckeesport      34   40.59417   -79.97028        Pittsburgh |
                             39. |    12   1951       50   40.59417   -79.97028        Mckeesport      34   40.59417   -79.97028        Pittsburgh |
                             40. |    12   1952       50   40.59417   -79.97028        Mckeesport      34   40.59417   -79.97028        Pittsburgh |
                                 |-----------------------------------------------------------------------------------------------------------------|
                             41. |     6   1950       51   37.44432   -122.1497         Cupertino      15   37.44432   -122.1497         Palo Alto |
                                 +-----------------------------------------------------------------------------------------------------------------+

                            Comment


                            • #15
                              Andrew Musau Thank you very much for the code !
                              However, my dataset is identified as gvkey patent_num (patent number) year latitude longitude, therefore I have tried the following code:

                              drop if missing(latitude)| missing(longitude)
                              duplicates drop gvkey year patent_num lat lon, force
                              egen newid= group(gvkey patent_num lat lon)

                              tempfile firms names names2
                              save `names2', replace
                              bys newid (year): keep if _n==1

                              preserve
                              rename * *2
                              save `firms', replace
                              save `names', replace
                              restore

                              geonear newid latitude longitude using "`firms'", n(newid2 latitude2 longitude2) ign long within(1) near(0) miles
                              merge m:1 newid2 using `names', keep(master match) nogen
                              joinby newid using `names2'
                              sort newid2 year
                              l gvkey patent_num year newid2 latitude2 longitude2 city2 newid latitude longitude city, sepby(newid2)


                              But the error came up during -geonear- command:
                              no room to add more observations
                              Up to 2,147,483,620 observations are allowed if you have enough memory.

                              I am using Stata IC 16.1 and I am not sure if I could run the code.
                              For more information, before running geonear code (until egen newid= group(gvkey patent_num lat lon)), I have 2,062,772 observations.
                              Would there be any solutions for this issue ?

                              Comment

                              Working...
                              X