  • Finding Values in a Column that are "Close"

    I have panel data that looks like the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long siteid str45 standardized_bank_name float year double it_budget
      1110931 "PNC BANK" 2011 231095
      1612514 "PNC BANK" 2011 121470
    131106296 "PNC BANK" 2011 103732
    139171356 "PNC BANK" 2011 142475
      1610215 "PNC BANK" 2011 259069
      1364277 "PNC BANK" 2011  34054
    139219360 "PNC BANK" 2011  36203
    121102690 "PNC BANK" 2011 147145
       562701 "PNC BANK" 2011 147145
      1201673 "PNC BANK" 2011 219348
    139219285 "PNC BANK" 2011 142475
    121102670 "PNC BANK" 2011 147145
      1991944 "PNC BANK" 2011  34054
    136213797 "PNC BANK" 2011  13139
    134158486 "PNC BANK" 2011  37388
    136213796 "PNC BANK" 2011  13139
    139142583 "PNC BANK" 2011 142486
      1219261 "PNC BANK" 2011 122156
    109015244 "PNC BANK" 2011  37388
       561789 "PNC BANK" 2011 144899
    121077855 "PNC BANK" 2011 130982
    139142920 "PNC BANK" 2011  36203
    131127152 "PNC BANK" 2011  36203
    139136225 "PNC BANK" 2011 143505
    118059480 "PNC BANK" 2011  35659
      1613308 "PNC BANK" 2011 133430
       517440 "PNC BANK" 2011 121846
      2663804 "PNC BANK" 2011 106030
    131101785 "PNC BANK" 2011 467044
    131127149 "PNC BANK" 2011 142486
      1617538 "PNC BANK" 2011 133430
      2174347 "PNC BANK" 2011 142486
      1611753 "PNC BANK" 2011 253711
      1216546 "PNC BANK" 2011 147145
      1112422 "PNC BANK" 2011 140349
    139215908 "PNC BANK" 2011 869056
    131118598 "PNC BANK" 2011 103288
      1213252 "PNC BANK" 2011 101992
    139215033 "PNC BANK" 2011  36203
    139114911 "PNC BANK" 2011 259839
     19397249 "PNC BANK" 2011  57511
    114227847 "PNC BANK" 2011  34054
      2154716 "PNC BANK" 2011 103288
    139155242 "PNC BANK" 2011  36203
    136147273 "PNC BANK" 2011 134029
    147115456 "PNC BANK" 2011  68284
    136185587 "PNC BANK" 2011  34054
    139144128 "PNC BANK" 2011 133430
    139178250 "PNC BANK" 2011 142475
    131080125 "PNC BANK" 2011  36203
      1623866 "PNC BANK" 2011 142475
      2170849 "PNC BANK" 2011 100173
    144338695 "PNC BANK" 2011 101293
      1212811 "PNC BANK" 2011 138095
      2156847 "PNC BANK" 2011 142475
    139169545 "PNC BANK" 2011 274444
    108021329 "PNC BANK" 2011  37388
    131096852 "PNC BANK" 2011 121470
    139144346 "PNC BANK" 2011 218886
    121053431 "PNC BANK" 2011 141111
    139155241 "PNC BANK" 2011 137677
    105538861 "PNC BANK" 2011  38636
    139174377 "PNC BANK" 2011 142486
    105541397 "PNC BANK" 2011  32081
    139190442 "PNC BANK" 2011 142486
    132031769 "PNC BANK" 2011  35761
    131114261 "PNC BANK" 2011 143505
    121065697 "PNC BANK" 2011 100581
      1206246 "PNC BANK" 2011 138095
      1222169 "PNC BANK" 2011 147145
    133180716 "PNC BANK" 2011  36203
    139155240 "PNC BANK" 2011 142486
    131118660 "PNC BANK" 2011 142475
    109019483 "PNC BANK" 2011  37388
    139198228 "PNC BANK" 2011 142517
    131110370 "PNC BANK" 2011 218886
      1113554 "PNC BANK" 2011 140349
      2166461 "PNC BANK" 2011 142475
      2166818 "PNC BANK" 2011 142475
    139169735 "PNC BANK" 2011 142475
      1991896 "PNC BANK" 2011 114848
    118059484 "PNC BANK" 2011  35659
      2188301 "PNC BANK" 2011 142486
    133206889 "PNC BANK" 2011  36203
    139143870 "PNC BANK" 2011 133430
      1630358 "PNC BANK" 2011 142486
    139233439 "PNC BANK" 2011  36203
    139234476 "PNC BANK" 2011 142486
      1634286 "PNC BANK" 2011 142486
    136236249 "PNC BANK" 2011 134029
      2189054 "PNC BANK" 2011 137677
    131124327 "PNC BANK" 2011 137677
    118044046 "PNC BANK" 2011 111828
    118056318 "PNC BANK" 2011 133606
    105308028 "PNC BANK" 2011  32081
      1215045 "PNC BANK" 2011 147145
      2166472 "PNC BANK" 2011 121470
    114252276 "PNC BANK" 2011 170597
      1611747 "PNC BANK" 2011 265016
    139144286 "PNC BANK" 2011 148155
    end
    What I am trying to do is figure out which numbers in the column "it_budget" are "close" to each other. More specifically, I want to set a threshold that would be indicative of two numbers being close enough that I would call them the "same". However, I do not want to loop through the numbers and check the differences one-by-one. Is there a way to find which numbers in that column are within, let's say, 5% of each other without looping through each observation and mark them with a dummy called "close"?

  • #2
    I don't see how creating a variable close will work here. It will tell you that a given site has some other site whose IT budget is close to its own, but it won't tell you how many, nor which one(s). It makes more sense to link each site to all the others that have an IT budget within 5% of its own. So I would think this is your best bet:
    Code:
    tempfile copy
    save `copy'
    
    gen ll = 0.95*it_budget
    gen ul = 1.05*it_budget
    
    rangejoin it_budget ll ul using `copy'
    sort siteid it_budget_U
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
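
    As a minimal sketch of the installation and one possible follow-up (not part of the suggestion above): it assumes siteid uniquely identifies observations, as it appears to in the example data, and n_close is a name introduced here for illustration. Each observation falls inside its own interval, so the joined data include self-matches; dropping them leaves only pairs of different sites.
    Code:
    * one-time installation from SSC
    ssc install rangestat
    ssc install rangejoin
    
    * run after the -rangejoin- call above
    * drop each site's match with itself (assumes siteid is unique)
    drop if siteid == siteid_U
    
    * number of other sites whose budget is within 5% of this site's budget
    bysort siteid: gen n_close = _N
    Sites with no other close budget disappear from the joined data once self-matches are dropped. Also note that the 5% band is relative to each site's own budget, so the relation is not symmetric: 95 lies within 5% of 100, but 100 is just outside 5% of 95 (upper limit 99.75).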

    That said, if you really want a dichotomous indicator with no other information:
    Code:
    gen ll = 0.95*it_budget
    gen ul = 1.05*it_budget
    
    rangestat (count) close = it_budget, interval(it_budget ll ul) excludeself
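    * -replace- needs a single =; also guard against a missing count,
    * which Stata would otherwise treat as true
    replace close = (close > 0 & !missing(close))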
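
    If the number of close neighbours is also of interest, a small variation keeps the count alongside the indicator. This is only a sketch: lo, hi, n_close and any_close are names introduced here for illustration, and it assumes -rangestat- may return a missing count when no other observation falls in the interval, which the -replace- handles either way.
    Code:
    * bounds stored as doubles to match the storage type of it_budget
    gen double lo = 0.95*it_budget
    gen double hi = 1.05*it_budget
    
    * count the other observations whose it_budget lies in [lo, hi]
    rangestat (count) n_close = it_budget, interval(it_budget lo hi) excludeself
    replace n_close = 0 if missing(n_close)
    
    * 0/1 indicator derived from the count
    gen byte any_close = n_close > 0
    tabulate any_close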
