  • Use elements of a local as argument for inlist function

    Dear Statalisters,

    This might be obvious to a more advanced Stata user, but I cannot manage to solve the issue. I would like to check, for each observation, whether the value of a certain variable is an element of a local. To make things clearer, here is a hypothetical example:

    Code:
    clear all
    input id var1 var2
    1 3 2
    1 4 5
    2 6 4
    2 2 1
    2 4 1
    3 4 5
    4 9 3
    end
    
    levelsof var1, local(mylev)
    gen var3 = inlist(var2, `mylev')
    Here, I want to pass all elements of the local mylev as arguments to the inlist function. Clearly, something is not working, as var3 is 0 for every observation. In practice, I would like to run similar code on a much richer dataset with 7,816 distinct values captured in the local. So an additional question is whether this exceeds the maximum number of arguments that inlist accepts.

    Any help is highly appreciated.

    Thanks,
    Roberto

  • #2
    The number of values would not be a problem in your particular example (though it might be in your real application) because, as the on-line help for inlist() says:
    The number of arguments is between 2 and 255 for reals and between 2 and 10 for strings.
    A more likely problem is that inlist() requires a list of arguments separated by commas, which -levelsof- does not provide. Now you could use the macro function -subinstr- to put those commas in there. But I would think you can do what you want more simply with the egen function anymatch() (as long as the values of var1 are all integers).
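
    The two routes mentioned above might look roughly like the following for the example data in post #1. This is only a sketch: the variable names var3 and var4 and the macro name mylev2 are illustrative, and the inlist() route still runs into the 255-argument limit quoted above, so it would not scale to 7,816 values.

    ```stata
    * Route 1: ask levelsof for a comma-separated list, then use inlist().
    * (Equivalent to adding commas afterwards with the subinstr macro function:
    *  local mylev : subinstr local mylev " " ",", all)
    levelsof var1, local(mylev) separate(,)
    gen byte var3 = inlist(var2, `mylev')

    * Route 2: keep the default space-separated list and let egen match it.
    * values() expects an integer numlist, so var1 must hold integers only.
    levelsof var1, local(mylev2)
    egen byte var4 = anymatch(var2), values(`mylev2')
    ```

    Both variables should flag the same observations in the small example; only the second route avoids the argument limit of inlist().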


    • #3
      Are these 7,816 values always the same across computer runs? If so, maybe you could try an m:1 merge. Create a one variable dataset that has the 7,816 values. Then do something like

      Code:
      use mydata
      merge m:1 var2match using codesdata
      If _merge == 3 for an observation, then its value of var2match is one of the 7,816 values.
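
      A minimal sketch of how the merge result could be turned into a 0/1 indicator, assuming the dataset and variable names used above (mydata, codesdata, var2match) and that var2match uniquely identifies observations in codesdata. The name inlistflag is made up for illustration.

      ```stata
      use mydata, clear
      * keep all observations from mydata, whether or not they match
      merge m:1 var2match using codesdata, keep(master match)
      * 1 if the value is among the listed codes, 0 otherwise
      gen byte inlistflag = (_merge == 3)
      drop _merge
      ```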
      -------------------------------------------
      Richard Williams
      Professor Emeritus of Sociology
      University of Notre Dame
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://academicweb.nd.edu/~rwilliam/


      • #4
        Roberto can use the -isknown- command, which is already available here: isknown.ado

        The description is available here.
        The example demo: do http://radyakin.org/stata/isknown/isknown_demo.do
        Demo ends with an error message to illustrate what happens if vars are of different types. This is normal.

        'Like it' if you want it to be posted to SSC.

        Best, Sergiy Radyakin
        Last edited by Sergiy Radyakin; 27 Jul 2014, 12:38. Reason: updated to reflect the homepage of -isknown- is already online


        • #5
          Thanks for all the comments. Clyde's suggestion, egen var3 = anymatch(var2), values(`mylev'), works perfectly in this particular example. So does Sergiy's isknown command, which produces the same result. But the problem in my actual data is, as Richard already hinted, that the local is built inside a forvalues loop and changes with every iteration. I was hoping for a computationally more efficient (that is, less time-consuming) solution using inlist, which I have now replaced with egen's anymatch() function. My actual code looks like this:
          Code:
          gen nearest = .
          summarize id0
          local a = r(min)
          local b = r(max)
          forvalues i = `a'/`b' {
              quietly count if id0 == `i'
              if (r(N) != 0) {
                  // flag the 3 closest candidates for this id0
                  quietly bysort id0 (distance): replace nearest = _n <= 3 if id0 == `i'
                  // collect the ids flagged so far
                  quietly levelsof id if nearest == 1, local(levels)
                  // drop those ids from all later id0 groups
                  egen helpvar = anymatch(id) if id0 > `i', values(`levels')
                  drop if helpvar == 1
                  drop helpvar
              }
          }
          The goal of this exercise is to achieve a matching without replacement -- this is why I delete all matches comprising id's already matched.

          The dataset is itself the result of a joinby command which results in a dataset with all pairwise combinations, discussed here: http://www.statalist.org/forums/foru...idean-distance . In short, the dataset looks like this:
          id0 id distance xvar0 xvar
          1 101 0.25 ... ...
          1 102 0.125
          1 103 0.7
          1 104 0
          1 105 0.8
          2 101 0.6
          2 102 0.9
          2 103 0.3
          2 104 1.2
          2 105 0
          Does anyone have an idea how I can achieve the desired result in a more efficient manner?

          Thanks again for taking the time for helping me through this issue.


          • #6
            I'm travelling so I missed the original joinby thread. While Roberto's solution works, the problem of forming all pairwise combinations can be handled better using cross. There's also no need to loop over each id and then use levelsof to target matching neighbors. The original problem (argument for inlist) in this post is therefore moot.

            If a group == 0 id can only be neighbor to a single id from group == 1, then you must iterate until all ids have found 3 nearest neighbors.

            Code:
            clear
            input id group xvar1 xvar2
            1 1 0 1
            2 1 0.5 1.2
            3 0 0 1.9
            4 0 0.25 1.3
            5 0 0.15 1.1
            6 0 0.1 0.7
            7 0 0.6 1.7
            8 0 0.8 0.5
            9 0 0.5 0.8
            10 0 0.8 1
            end
            
            tempfile main groupzero
            save "`main'"
            
            keep if group == 0
            rename (id group xvar*) =0
            save "`groupzero'"
            
            * form all pairwise combinations of group 1 obs with group 0 obs
            use "`main'"
            keep if group == 1
            cross using "`groupzero'"
            
            gen distance = sqrt((xvar10 - xvar1)^2 + (xvar20 - xvar2)^2)
            
            * iterate to match 3 nearest neighbors
            gen nearest = 0
            gen done = 0
            local more 1
            while `more' {
                // tag nearest obs, ignoring previous matches
                bysort id (done distance id0): replace nearest = 1 if _n == 1 & !done
                // allow only one match per id0
                bysort id0 (done distance id): replace nearest = 0 if _n > 1 & nearest
                // mark all obs of id0 as done if we have matched
                by id0: replace done = nearest[1]
                // mark all obs of id if we have found 3 matches
                bysort id: egen n = total(nearest)
                by id: replace done = 1 if n == 3
                // do we need another pass
                count if n < 3
                local more = r(N)
                drop n
            }
            
            sort id distance id0
            list id id0 distance nearest, sepby(id) noobs
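
            As a hedged follow-up, the result of the loop above could be checked against the two stated requirements (3 neighbors per id, each id0 used at most once). This is only a sketch; the helper variable names nmatch and nused are made up for illustration.

            ```stata
            * every id should end up with exactly 3 matched neighbors
            bysort id: egen nmatch = total(nearest)
            assert nmatch == 3
            * no id0 should be matched to more than one id
            bysort id0: egen nused = total(nearest)
            assert nused <= 1
            drop nmatch nused
            ```

            If either assert fails, Stata stops with an error, which makes the check easy to leave in a do-file.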


            • #7
              You're right, Robert: the title of the thread is somewhat misleading and should be changed. But it seems that once a thread has received a reply, there is no way to rename it.

              As for your code, it works like a charm. I only had to make minor adjustments for my particular needs, because in some cases I have two different ids with the same values on the xvars. Your solution is also much faster than the one I had before. I hope that one day code like yours will come to my mind too. Thanks for sharing your expertise.
