Use elements of a local as argument for inlist function

Roberto Liebscher

Join Date: Mar 2014

Posts: 92
#1

27 Jul 2014, 06:49

Dear Statalisters,

This might be obvious to a more advanced Stata user, but I still cannot solve it. I would like to check, for each observation, whether the value of a certain variable is an element of a local macro. To make things clearer, here is a hypothetical example:

Code:

clear all
input id var1 var2
1 3 2
1 4 5
2 6 4
2 2 1
2 4 1
3 4 5
4 9 3
end

levelsof var1, local(mylev)
gen var3 = inlist(var2, `mylev')

Here, I want to pass all elements of the local mylev as arguments to the inlist function. Clearly, something is not working as intended, because var3 is 0 for every observation. In practice, I would like to run similar code on a much richer dataset with 7,816 distinct values captured in the local. So an additional question is whether this exceeds the limit on the number of arguments to inlist?

Any help is highly appreciated.

Thanks,
Roberto
Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#2

27 Jul 2014, 09:35

The number of values would not be a problem in your particular example (though it might be in your real application) because, as the on-line help for inlist() says:

The number of arguments is between 2 and 255 for reals and between 2 and 10 for strings.

A more likely problem is that inlist() requires a list of arguments separated by commas, which -levelsof- does not provide. Now you could use the macro function -subinstr- to put those commas in there. But I would think you can do what you want more simply with the egen function anymatch() (as long as the values of var1 are all integers).
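The options mentioned above can be sketched on the toy data from #1. This is only an illustration: the variable names var3a, var3b, var3c and the macro names mylev_commas and mylev_sub are made up for the example, and the inlist() variants are bound by the argument limit quoted from the help, so they would not scale to thousands of values.

```stata
clear all
input id var1 var2
1 3 2
1 4 5
2 6 4
2 2 1
2 4 1
3 4 5
4 9 3
end

* Option 1: have levelsof return the levels separated by commas
levelsof var1, local(mylev_commas) separate(,)
gen byte var3a = inlist(var2, `mylev_commas')

* Option 2: insert the commas afterwards with the subinstr macro function
levelsof var1, local(mylev)
local mylev_sub : subinstr local mylev " " ",", all
gen byte var3b = inlist(var2, `mylev_sub')

* Option 3: egen's anymatch() takes a space-separated integer numlist
egen byte var3c = anymatch(var2), values(`mylev')

list id var1 var2 var3a var3b var3c, noobs
```

All three variables should flag the same observations. The anymatch() route needs no commas, but it only accepts integer values.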
Richard Williams

Join Date: Apr 2014

Posts: 5043
#3

27 Jul 2014, 11:43

Are these 7,816 values always the same across computer runs? If so, maybe you could try an m:1 merge. Create a one variable dataset that has the 7,816 values. Then do something like

Code:

use mydata
merge m:1 var2match using codesdata

If _merge == 3, then the observation's value is one of the 7,816 values.
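A minimal sketch of the merge approach on the toy data from #1, assuming the lookup values are the distinct values of var1. The tempfile name codesdata and the flag var3 are illustrative choices, and the lookup dataset is built on the fly here rather than stored on disk.

```stata
clear all
input id var1 var2
1 3 2
1 4 5
2 6 4
2 2 1
2 4 1
3 4 5
4 9 3
end

* Build the lookup dataset: one row per distinct value of var1,
* renamed to var2 so it can serve as the merge key
preserve
keep var1
duplicates drop
rename var1 var2
tempfile codesdata
save "`codesdata'"
restore

* Match every observation against the lookup values
merge m:1 var2 using "`codesdata'", keep(master match)

* _merge == 3 marks observations whose var2 appears in the lookup
gen byte var3 = _merge == 3
drop _merge
```

Unlike inlist(), this does not depend on the number of lookup values, so it should carry over to a list of several thousand codes.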

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1878
#4

27 Jul 2014, 12:06

Roberto can use the -isknown- command, which is already available here: isknown.ado

The description is available here.
The example demo: do http://radyakin.org/stata/isknown/isknown_demo.do
The demo ends with an error message to illustrate what happens if the variables are of different types. This is normal.

'Like it' if you want it to be posted to SSC.

Best, Sergiy Radyakin

Last edited by Sergiy Radyakin; 27 Jul 2014, 12:38. Reason: updated to reflect the homepage of -isknown- is already online
1 like
Roberto Liebscher

Join Date: Mar 2014

Posts: 92
#5

28 Jul 2014, 02:10

Thanks for all those comments. Clyde's suggestion, egen var3 = anymatch(var2), values(`mylev'), works perfectly in this particular example. So does Sergiy's isknown command, which produces the same result. But the problem in my actual data is, as Richard already pointed out, that the local itself is built within a forvalues loop and changes with every iteration. I was hoping to achieve a computationally more efficient (that is, less time-consuming) solution by using inlist, which I have now replaced with the egen anymatch() function. My actual code looks like this:

Code:

gen nearest = .
sum id0
local a = `r(min)'
local b = `r(max)'
forvalues i = `a'/`b' {
    quietly count if id0 == `i'
    if (`r(N)' != 0) {
        // flag the 3 nearest candidates for this id0
        quietly bysort id0 (distance): replace nearest = _n <= 3 if id0 == `i'
        quietly levelsof id if nearest == 1, local(levels)
        // drop already matched ids from all later id0 groups
        egen helpvar = anymatch(id) if id0 > `i', values(`levels')
        drop if helpvar == 1
        drop helpvar
    }
}

The goal of this exercise is to achieve a matching without replacement -- this is why I delete all matches comprising id's already matched.

The dataset is itself the result of a joinby command which results in a dataset with all pairwise combinations, discussed here: http://www.statalist.org/forums/foru...idean-distance . In short, the dataset looks like this:

id0  id   distance  xvar0  xvar
1    101  0.25      ...    ...
1    102  0.125
1    103  0.7
1    104  0
1    105  0.8
2    101  0.6
2    102  0.9
2    103  0.3
2    104  1.2
2    105  0

Does anyone have an idea how I can achieve the desired result in a more efficient manner?

Thanks again for taking the time for helping me through this issue.

Robert Picard

Join Date: Mar 2014
Posts: 1536

28 Jul 2014, 10:04

I'm travelling so I missed the original joinby thread. While Roberto's solution works, the problem of forming all pairwise combinations can be handled better using cross. There's also no need to loop over each id and then use levelsof to target matching neighbors. The original problem (argument for inlist) in this post is therefore moot.

If a group == 0 id can only be neighbor to a single id from group == 1, then you must iterate until all ids have found 3 nearest neighbors.

Code:

clear
input id group xvar1 xvar2
1 1 0 1
2 1 0.5 1.2
3 0 0 1.9
4 0 0.25 1.3
5 0 0.15 1.1
6 0 0.1 0.7
7 0 0.6 1.7
8 0 0.8 0.5
9 0 0.5 0.8
10 0 0.8 1
end

tempfile main groupzero
save "`main'"

keep if group == 0
rename (id group xvar*) =0
save "`groupzero'"

* form all pairwise combinations of group 1 obs with group 0 obs
use "`main'"
keep if group == 1
cross using "`groupzero'"

gen distance = sqrt((xvar10 - xvar1)^2 + (xvar20 - xvar2)^2)

* iterate to match 3 nearest neighbors
gen nearest = 0
gen done = 0
local more 1
while `more' {
    // tag nearest obs, ignoring previous matches
    bysort id (done distance id0): replace nearest = 1 if _n == 1 & !done
    // allow only one match per id0
    bysort id0 (done distance id): replace nearest = 0 if _n > 1 & nearest
    // mark all obs of id0 as done if we have matched
    by id0: replace done = nearest[1]
    // mark all obs of id if we have found 3 matches
    bysort id: egen n = total(nearest)
    by id: replace done = 1 if n == 3
    // do we need another pass
    count if n < 3
    local more = r(N)
    drop n
}

sort id dist id0
list id id0 distance nearest, sepby(id) noobs


Roberto Liebscher

Join Date: Mar 2014

Posts: 92
#7

29 Jul 2014, 08:48

You're right, Robert -- the name of the thread is somewhat misleading and should be changed. But it seems that once a thread has received a reply, you can no longer rename it.

Concerning your code -- it works like a charm. I only had to make minor adjustments for my particular needs, because in some cases I have two different id's with the same values on the xvars. Your solution is also much faster than the one I had before. I hope that one day code like yours will occur to me too. Thanks for sharing your expertise.