  • How to find the closest value?

    Hi Statalist,

    I have a dataset that looks like the one below, and I am trying to find, for each value of GG, the closest value of G. I tried joinby, but I didn't get what I expected. Any insight would be appreciated! Thanks in advance.
    GG                   rank  B                  Bpdf               bidder  G                     A                   Apdf
    0.0010542965028435   1     0.725714683532715  0.12416495019745   2       0.000522602582350373  0.0764176174998283  0.222250495918054
    0.00210970686748624  2     0.733580112457275  0.146774161154414  2       0.00104547862429172   0.0764176174998284  0.222496900726384
    0.00369491893798113  3     0.750326931476593  0.178342050027757  2       0.00209205248393118   0.0764176174998285  0.305401473548525
    0.00422388361766934  4     0.755616366863251  0.183626594542275  2       0.00523838074877858   0.0764176174998286  0.305401473548525

  • #2
    There is no obvious variable or set of variables to serve as a link between the GG and G data, so I don't see how -joinby- would apply here. I think this requires the use of -cross-, which pairs every observation with every candidate value.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float gg byte rank float(b bpdf) byte bidder float(g a apdf)
    .0010542965 1 .7257147 .12416495 2 .0005226026 .07641762 .2222505
    .0021097069 2 .7335801 .14677416 2 .0010454786 .07641762 .2224969
     .003694919 3 .7503269 .17834204 2 .0020920525 .07641762 .3054015
    .0042238836 4 .7556164  .1836266 2  .005238381 .07641762 .3054015
    end
    
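    * Tag each observation so a row can be told apart from its own copy
    gen long obs_no = _n
    * Save a copy holding only g and obs_no, with every variable suffixed _near
    preserve
    keep g obs_no
    rename * =_near
    tempfile to_join
    save `to_join'
    restore
    
    * Form all pairwise combinations of the original rows and the saved copy
    cross using `to_join'
    * Exclude each row's match with itself
    drop if obs_no == obs_no_near
    * Within each original row, keep the pair with the smallest absolute difference
    gen delta = abs(g-g_near)
    by obs_no (delta), sort: keep if _n == 1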
    Note: I have taken closest value to mean the smallest absolute value of difference. However, sometimes by "closest" we mean a ratio closest to 1: I think the modifications to the code to do that are fairly apparent, so I don't show alternative code for that.
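    A minimal sketch of the ratio variant mentioned in the note, for readers who want it. It is not part of the original reply. It assumes every value of g is nonzero and takes "ratio closest to 1" to mean the smallest abs(g/g_near - 1); that definition is an assumption, and other readings are possible.

    ```stata
    * Sketch only: the four g values from the example above
    clear
    input float g
    .0005226026
    .0010454786
    .0020920525
    .005238381
    end

    gen long obs_no = _n
    preserve
    rename * =_near
    tempfile to_join
    save `to_join'
    restore

    cross using `to_join'
    drop if obs_no == obs_no_near
    * Assumed definition of closeness: distance of the ratio from 1 (needs g_near != 0)
    gen delta = abs(g/g_near - 1)
    by obs_no (delta), sort: keep if _n == 1
    list obs_no g g_near delta, noobs
    ```

    Note that this measure is not symmetric: for the second observation it picks the third g, whereas the absolute difference picks the first. If a symmetric measure is preferred, abs(ln(g/g_near)) is one alternative for strictly positive values.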

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
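    As a quick illustration of the command, here is a minimal sketch using the auto dataset that ships with Stata. The choice of variables and the five-observation range are arbitrary, and it assumes -dataex- is already installed.

    ```stata
    * Load a built-in dataset and print a postable example of a few variables
    sysuse auto, clear
    dataex make price mpg in 1/5
    ```

    The output is a block of code beginning with -clear- and -input- that can be pasted directly into a forum post.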
    Last edited by Clyde Schechter; 20 May 2021, 22:23. Reason: Correct error in code


    • #3
      Hi Clyde,

      Thank you for your help. I really appreciate it! One more question: once I have found the closest value to GG from G, and since I calculate A and Apdf using G, how can I move the corresponding A and Apdf to each G(i.e. delta in your code)?

      Sure, I will use dataex to input data. Thank you so much for your generous help again.

      Have a nice day!

      Best,
      Li
      Last edited by Li Zhang; 21 May 2021, 08:04.


      • #4
        how can I move the corresponding A and Apdf to each G(i.e. delta in your code)?
        I don't understand what this means. Can you show what the results you want might look like in a hand-worked example of a small number of observations?


        • #5
          Hi Clyde,

          Thank you for following up! I appreciate it!

          Also, my apologies for the confusion. For example, in the dataset below, the closest value of G to the GG in the 4th row (rank 4, GG = 0.004860296) is the G in the 6th row (rank 6, G = 0.004779), based on your previous code. The problem is that I would like to move A and Apdf from the 6th row to the 4th row, next to G_near. Any insight would be appreciated! Thank you again for everything!

          Best,
          Li
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(GG rank B) double Bpdf int nbidder float(G A) double Apdf long obs_no float G_near
          .0006062444  1 .56506693 .06985197661697763 2 .0005961253  .5158114 .23338879905282342  1 .0005961253
          .0018198377  2  .5656352 .07048425129395265 2 .0011926063 .51611066  .2342206035120657  2 .0011926063
           .003034908  3  .5714783 .07822072680483416 2 .0029841904 .51905143 .24169206890620565  3 .0029841904
           .004860296  4  .6007454 .09822996903142434 2  .003582101 .52316576  .2500061764658764  4     .004779
           .005469503  5  .6007454 .09822996903142434 2 .0041803704 .52316576  .2500061764658764  5   .00537799
           .008521154  6  .6150995 .09992972645282285 2     .004779  .5290755 .25758313289275747  6  .008378364
           .010969253  7  .6261876 .09217836630603929 2   .00537799  .5290755 .25758313289275747  7  .011387846
           .014037926  8  .6447056 .08317684287858709 2   .00597734 .52924937 .25772810704337557  8  .013802042
           .015883721  9  .6553226 .07270151660423853 2  .006577052 .52924937 .25772810704337557  9  .016828123
           .018350182 10  .6636449 .06532896861053977 2  .007177127  .5304375  .2585995719566514 10   .01925571
          end
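
          The matching described in this post (each GG paired with its nearest G, where the nearest G may sit in the same row) can be sketched as below. This is a hedged illustration, not code posted in the thread: it uses only the first six observations of the example, lower-case variable names, and the hypothetical tempfile name candidates. Ties in delta, if any, are broken arbitrarily.

          ```stata
          * Sketch only: first six observations of the example above (GG, G, A, Apdf)
          clear
          input float(gg g a) double apdf
          .0006062444 .0005961253  .5158114 .23338879905282342
          .0018198377 .0011926063 .51611066  .2342206035120657
           .003034908 .0029841904 .51905143 .24169206890620565
           .004860296  .003582101 .52316576  .2500061764658764
           .005469503 .0041803704 .52316576  .2500061764658764
           .008521154     .004779  .5290755 .25758313289275747
          end

          gen long obs_no = _n

          * Save every g, with its a and apdf, as a candidate match
          preserve
          keep obs_no g a apdf
          rename * =_near
          tempfile candidates
          save `candidates'
          restore

          * Pair each gg with every candidate g; same-row pairs are kept on purpose
          cross using `candidates'
          gen delta = abs(gg - g_near)
          by obs_no (delta), sort: keep if _n == 1
          list obs_no gg g_near a_near apdf_near, noobs
          ```

          With these six observations, the 4th row is matched to the G of the 6th row (0.004779), and a_near and apdf_near carry that row's A and Apdf along.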


          • #6
            I'm still not sure what you mean by "move" the values. But take a look at this:

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float gg byte rank float(b bpdf) byte bidder float(g a apdf)
            .0010542965 1 .7257147 .12416495 2 .0005226026 .07641762 .2222505
            .0021097069 2 .7335801 .14677416 2 .0010454786 .07641762 .2224969
             .003694919 3 .7503269 .17834204 2 .0020920525 .07641762 .3054015
            .0042238836 4 .7556164  .1836266 2  .005238381 .07641762 .3054015
            end
            
            gen long obs_no = _n
            preserve
            * Changed line: also keep a and apdf so they travel with g_near
            keep g obs_no a apdf
            rename * =_near
            tempfile to_join
            save `to_join'
            restore
            
            cross using `to_join'
            drop if obs_no == obs_no_near
            gen delta = abs(g-g_near)
            by obs_no (delta), sort: keep if _n == 1
            It's a minor modification of the earlier code (the -keep- line now also retains a and apdf) and brings the values of a and apdf along with the value of g_near, as a_near and apdf_near. Maybe this is what you are looking for? Or maybe you want to actually replace the original values of a and apdf with the values of a_near and apdf_near, respectively. If so, you can write the two replace statements to do that.
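
            For the second case, the two replace statements might look like the sketch below. The four rows are illustrative values in the shape the code above leaves behind (a, apdf, a_near, apdf_near); they are typed in by hand here only so that the block runs on its own.

            ```stata
            * Sketch only: illustrative rows holding original and nearest-match values
            clear
            input float(a apdf a_near apdf_near)
            .07641762 .2222505 .07641762 .2224969
            .07641762 .2224969 .07641762 .2222505
            .07641762 .3054015 .07641762 .2224969
            .07641762 .3054015 .07641762 .3054015
            end

            * Overwrite the original values with those of the nearest observation
            replace a = a_near
            replace apdf = apdf_near
            * The _near copies are now redundant
            drop a_near apdf_near
            ```

            Overwriting discards the original a and apdf, so it may be worth generating backup copies first if they are needed later.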


            • #7
              Hi Clyde,

              Yes, this is what I am looking for! Thank you so much for your time and generosity. I really appreciate it!

              Have a good day!

              Best,
              Li
