Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching over multiple rows and criteria

    Dear all,

    I kindly ask for assistance regarding the following problem:

    The main dataset contains the market capitalization for multiple companies in each year.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int date_yr double market_cap
    1990 4256868
    1990   33000
    1990  500232
    1990 2312312
    1990 5939291
    1990   93020
    1990 5026056
    1990 4427716
    1990 3345411
    1990  200210
    1990  598601
    1990 4915502
    1991 6344427
    1991  440192
    1991 8089008
    1991 6542580
    1991 5590932
    1991 4875835
    1991 5433912
    1991 6226970
    end
    The second dataset contains the following:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year long(small_cap large_cap) byte portfolio
    1990 3906029 61106381  1
    1990 1813476  3900492  2
    1990  971589  1809083  3
    1990  603473   968324  4
    1990  360715   600414  5
    1990  211269   360000  6
    1990  130207   210884  7
    1990   71825   129887  8
    1990   33118    71792  9
    1990     199    33116 10
    1991 4748562 74008361  1
    1991 2324613  4747429  2
    1991 1267569  2321976  3
    1991  786590  1259894  4
    1991  493636   784640  5
    1991  296154   492945  6
    1991  169608   295583  7
    1991   90317   169568  8
    1991   40500    90285  9
    1991     166    40480 10
    end
    Now, based on the year and the range of the market capitalization (small_cap <= market_cap <= large_cap) I want to generate a new variable in the first data set with the corresponding portfolio number. For instance, the first company should be assigned portfolio "1", the second "10", etc.

    Thank you in advance! I appreciate any help or suggestions.

    Best,
    Philipp




  • #2
    I believe the community-contributed command -rangejoin- available at ssc fits your problem well.
    Code:
    capture ssc install rangejoin
    // Data file
    clear
    input int date_yr double market_cap
    1990 4256868
    1990   33000
    1990  500232
    1990 2312312
    1990 5939291
    1990   93020
    1990 5026056
    1990 4427716
    1990 3345411
    1990  200210
    1990  598601
    1990 4915502
    1991 6344427
    1991  440192
    1991 8089008
    1991 6542580
    1991 5590932
    1991 4875835
    1991 5433912
    1991 6226970
    end
    rename date_yr year // must be same as in file of portfolio codings
    tempfile temp
    save `temp'
    //
    clear
    // File defining portfolio codings
    clear
    input int year long(small_cap large_cap) byte portfolio
    1990 3906029 61106381  1
    1990 1813476  3900492  2
    1990  971589  1809083  3
    1990  603473   968324  4
    1990  360715   600414  5
    1990  211269   360000  6
    1990  130207   210884  7
    1990   71825   129887  8
    1990   33118    71792  9
    1990     199    33116 10
    1991 4748562 74008361  1
    1991 2324613  4747429  2
    1991 1267569  2321976  3
    1991  786590  1259894  4
    1991  493636   784640  5
    1991  296154   492945  6
    1991  169608   295583  7
    1991   90317   169568  8
    1991   40500    90285  9
    1991     166    40480 10
    end
    //
    rangejoin market_cap small_cap large_cap using `temp', by(year)
    // Some portfolio/year combinations did not occur in the example
    drop if missing(market_cap)

    Comment


    • #3
      Thank you very much! It worked flawlessly

      Comment

      Working...
      X