Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Vlookup with no exact match

    I am trying to identify the point at which a cumulative distribution exceeds a value (for a set of 11 cutoff values I am interested in).
    I have data on how many goods were sold at each price point that looks something like (but with 15000 price points and 50 goods)::
    price Volvo Toyota Ford cumpct_Volvo cumpct_Toyota cumpct_Ford
    21 1745 892 569 2.0893964 .0145218 11.392102
    22 1325 786 1256 3.3054851 .01475554 12.579115
    23 765 543 1408 4.2243122 .01521645 14.298504
    24 2100 120 786 5.266155 .02214319 16.581708
    25 1389 111 1632 6.2830139 .02457282 18.804259


    The columns 2-4 are quantities, and the columns 5-7 are cumulative percentages of total volume sold.
    I need to return the price which accounts for 5% or 10% etc of the quantities, so in the above, if I was looking for 5% for Volvo, I want a return of price=24.
    First I tried installing vlookup, but it will only do exact matches. So I tried rounding down (floor) my cumulative percentages, but then it said that they were not unique.
    (key_percent was the list of percent cutoffs I am interested in.)
    . vlookup key_percent , gen(Volvo_cutoff) key(cumpct_Volvo) value(price)
    price is unique within cumpct_Volvo;
    there are multiple observations with different price within cumpct_Volvo.


    Is there a way that I can return the first instance of a number greater than 5%, for example?
    Or is there a way to manipulate the data so that i can just summarize to get out the numbers I need?

  • #2
    How about this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte price int(Volvo Toyota Ford) double(cumpct_Volvo cumpct_Toyota cumpct_Ford)
    21 1745 892  569 2.0893964  .0145218 11.392102
    22 1325 786 1256 3.3054851 .01475554 12.579115
    23  765 543 1408 4.2243122 .01521645 14.298504
    24 2100 120  786  5.266155 .02214319 16.581708
    25 1389 111 1632 6.2830139 .02457282 18.804259
    end
    
    sort cumpct_Volvo
    summ price if cumpct_Volvo >= 5 & cumpct_Volvo[_n-1] < 5
    Note: Evidently you can embed this code in loops over the percentage threshold or the brand of car.

    In the future, please do not post the kind of data table you used in #1. It took me much longer to wrestle that data into Stata than it did to solve your problem. Please do as I have done here, and use the -dataex- command to show example data. You can install -dataex- by running -ssc install dataex-. Read -help dataex- for the simple instructions for using it. By using -dataex- you make it possible for anyone who wants to test some code on your example data to create a completely faithful replica of your data with just a simple copy/paste operation. Be helpful to those who want to be helpful to you.


    Comment


    • #3
      Thanks Clyde. That was perfect.
      I need to turn them into a table, but I think I can google how to use the return list.
      I've installed dataex for next time.

      Comment

      Working...
      X