Returning Cell Position Based on Other Variable

Mohana Mondal

Join Date: Oct 2019

Posts: 12
#1

Returning Cell Position Based on Other Variable

24 Oct 2019, 16:56

Hello,
I have the following example data set.

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(case_id audc11 audc12 audc13 audc14 randomnumber)
1 0 .28 .09 .07 .795571
2 .34 .28 .33 .26 .955492
3 .56 .59 .33 .81 .018467158
4 .66 .73 .7 .81 .030028736
5 1 1 1 1 .
end

The last column has the random numbers. I have to compare the random numbers row wise, with the columns audc11- audc14 value, so that Stata returns me the value in the case_id. I have to compare each row number with their respective columns.

For example, when random number is 0.795571 (1st row), I have to compare this value with those in the 1st column of audc11. If the values of the random number is the first near or equal to value to that contained in the columns, I need the corresponding case_id value to be returned.
For example, comparing 1st row of random value (0.799571), to that of those contained in 1st column, the nearest first value to 0.799571 is 0.66. Then the return value I should get is 4, because the corresponding case_id is 4.

Again,comparing 3rd row of random value (0.018467158), to that of those contained in 3rd column, the nearest first value to 0.018467158 is 0.09. Then the return value I should get is 1, because the corresponding case_id is 1.

Any help will be very helpful.

Thank you
Tags: None

David Benson

Join Date: Oct 2018
Posts: 489

24 Oct 2019, 22:36

Hi Mohana,

Others will probably come along with a much more elegant solution. But hopefully this will get you started.

I don't know if this solves your problem, but I ended up transposing the columns (variables) audc11 audc12 audc13 audc14 so that then the randomnumber variable would be on the same row (observation) that it needed to be matched to.

Code:

. list, noobs divider

  +--------------------------------------------------------+
  | case_id | audc11 | audc12 | audc13 | audc14 | random~r |
  |---------+--------+--------+--------+--------+----------|
  |       1 |      0 |    .28 |    .09 |    .07 |  .795571 |
  |       2 |    .34 |    .28 |    .33 |    .26 |  .955492 |
  |       3 |    .56 |    .59 |    .33 |    .81 | .0184672 |
  |       4 |    .66 |    .73 |     .7 |    .81 | .0300287 |
  |       5 |      1 |      1 |      1 |      1 |        . |
  +--------------------------------------------------------+

xpose, clear varname  // transposing the dataset

. list

     +-------------------------------------------------------------+
     |      v1        v2         v3         v4   v5       _varname |
     |-------------------------------------------------------------|
  1. |       1         2          3          4    5        case_id |
  2. |       0       .34        .56        .66    1         audc11 |
  3. |     .28       .28        .59        .73    1         audc12 |
  4. |     .09       .33        .33         .7    1         audc13 |
  5. |     .07       .26        .81        .81    1         audc14 |
     |-------------------------------------------------------------|
  6. | .795571   .955492   .0184672   .0300287    .   randomnumber |
     +-------------------------------------------------------------+

// Cleaning up some things
drop if _varname=="case_id"
gen case_id = _n
order case, first
drop if _varname=="randomnumber"
// At this point, you could merge the randomnumber variable in from your original dataset, matching on case_id

order randomnumber, after(case_id )
list, noobs

  +------------------------------------------------------------+
  | case_id   random~r    v1    v2    v3    v4   v5   _varname |
  |------------------------------------------------------------|
  |       1    .795571     0   .34   .56   .66    1     audc11 |
  |       2    .955492   .28   .28   .59   .73    1     audc12 |
  |       3    .018467   .09   .33   .33    .7    1     audc13 |
  |       4    .030029   .07   .26   .81   .81    1     audc14 |
  |       .          .     .     .     .     .    .            |
  +------------------------------------------------------------+

*** Calculating the absolute value of the differences
forvalues i = 1/5 {
    gen diff`i' = abs(randomnumber - v`i')
    }
    
egen min_diff = rowmin(diff*)

gen which_case = 1  // tells you which case_id matched, or whether it was audc11, audc12, etc
forvalues i = 1/5 {
    replace which_case = `i' if diff`i' == min_diff
    }


. list case_id randomnumber v* diff* min_diff which_case, noobs

  +------------------------------------------------------------------------------------------------------------------------------+
  | case_id   random~r    v1    v2    v3    v4   v5      diff1      diff2      diff3      diff4      diff5   min_diff   which_~e |
  |------------------------------------------------------------------------------------------------------------------------------|
  |       1    .795571     0   .34   .56   .66    1    .795571    .455571    .235571    .135571    .204429    .135571          4 |
  |       2    .955492   .28   .28   .59   .73    1    .675492    .675492    .365492    .225492    .044508    .044508          5 |
  |       3    .018467   .09   .33   .33    .7    1   .0715328   .3115328   .3115328   .6815328   .9815328   .0715328          1 |
  |       4    .030029   .07   .26   .81   .81    1   .0399713   .2299713   .7799713   .7799713   .9699713   .0399713          1 |
  |       .          .     .     .     .     .    .          .          .          .          .          .          .          5 |
  +------------------------------------------------------------------------------------------------------------------------------+

Note that in case_id==2, if the 0.28 had been the closest to the random number, it would chose v2 (it will choose the *last* of the variables that have the tie).

Hope that helps!

Last edited by David Benson; 24 Oct 2019, 22:45.

Comment

Mohana Mondal

Join Date: Oct 2019

Posts: 12
#3

28 Oct 2019, 16:43

Hi David,

Thank you so much for looking into this. Everything works good except for when the value has ties it chooses the last variable. For example, for case_id 2, I would want the return value 4, and not 5. However, I took your idea of taking the differences and trying it. I think this works. I am very new to Statalist. Hence, I do not have a perfect way to show you what happens to the data. If you provide some guidance, about what you used, I would be more than happy to provide you with the results I get after each step. However, I am posting the code that I use here. I am using the original data set, not transposing them for now.

gen id = _n
***Setting AUs as locals
local n 11
***Setting row numbers as locals
local a 1

#foreach i of numlist 1/4 {
summ randomnumber if id ==`i'
scalar k = r(min)
gen check_`a' = k- originau`n'
replace check_`a'= abs(check_`a')
summ check_`a'
di r(min)
gen final_`a' = id if check_`a'==r(min)
local n =`n'+1
local a =`a'+1
}

#capture drop final_fina
gen final_fina=.
foreach b of numlist 1/4 {
summ final_`b'
replace final_fina = r(min) if id ==`b'
}

Last edited by Mohana Mondal; 28 Oct 2019, 16:54.
Comment

Announcement