  • Is there a command like -egen rowrandom()-?

    Hi all,

    I have a data set that looks like this:

    Code:
    input str1 id x1 x2 x3 x4 x5
    "A" 1 . 2 . 2
    "B" 5 1 3 5 5
    "C" 2 3 2 . 1
    "D" . . . . .
    "E" 6 6 1 4 1
    end
    Is there a command that creates a new variable x containing a randomly chosen non-missing value from the variables x1 to x5? If all of x1 to x5 are missing (as for D), x should also be missing.

    Is there a simple solution to this problem, something like -egen x = rowrandom(x1 x2 x3 x4 x5)-?

    Thanks
    Go
    Last edited by Gobinda Natak; 01 Nov 2019, 17:53.

  • #2
    I am not aware of any one command to do what you want. I think you'll have to use several commands, but it's not too hard. I would do something like this:
    Code:
    reshape long x, i(id) j(j)
    gen byte missing = missing(x)
    set seed 647896396
    gen double random = runiform()
    bysort id (missing random):  keep if _n == 1
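
If the wide layout has to survive, the same reshape idea can be run on a copy and the pick merged back. This is only a sketch: it assumes id uniquely identifies observations, and the names picked, u, miss and pick are illustrative, not part of the original question.

```stata
clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

* Work on a copy so the wide data can be restored afterwards
tempfile picked
preserve
keep id x1-x5
reshape long x, i(id) j(j)
set seed 647896396
gen double u = runiform()
gen byte miss = missing(x)
* Non-missing values sort first within id, in random order
bysort id (miss u): keep if _n == 1
keep id x
rename x pick
save `picked'
restore

* Attach the randomly chosen value to the original wide data
merge 1:1 id using `picked', nogenerate
list
```

The reshape cost is still paid once, so this does nothing for speed; it only keeps the rest of the do-file working on the original layout.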



    • #3
      Thanks so much; I was afraid that this would be the solution. My data set is very large, so reshaping takes quite some time, and the do-file is quite convoluted already, but I guess that's as good as it gets.

      Have a good weekend
      Go



      • #4
        I agree that reshaping large data sets can be painfully slow. You might want to consider faster, user-written versions of reshape, such as sreshape, fastreshape, and greshape (part of gtools). Perhaps there is a user-written command like your suggested egen x = rowrandom(), but I think it pretty likely that the .ado file for such a command would include a reshape.

        Have a good weekend!



        • #5
          This is a counter-example to the idea that a reshape is needed. It does hinge on an observation that the example data are integers. rowsort is from the Stata Journal.

          Code:
          clear
          
          input str1 id x1 x2 x3 x4 x5
          "A" 1 . 2 . 2
          "B" 5 1 3 5 5
          "C" 2 3 2 . 1
          "D" . . . . .
          "E" 6 6 1 4 1
          end
          
          rowsort x1-x5, gen(X1-X5)
          egen all = concat(X?), p(" ")
          gen wanted = real(word(all, runiformint(1,5))) if X5 < .
          forval j = 4(-1)1 {  
              replace wanted = real(word(all, runiformint(1, `j'))) if X`j' < . & missing(wanted)
          }
          
          list
          
          
               +---------------------------------------------------------------------------+
               | id   x1   x2   x3   x4   x5   X1   X2   X3   X4   X5         all   wanted |
               |---------------------------------------------------------------------------|
            1. |  A    1    .    2    .    2    1    2    2    .    .   1 2 2 . .        2 |
            2. |  B    5    1    3    5    5    1    3    5    5    5   1 3 5 5 5        5 |
            3. |  C    2    3    2    .    1    1    2    2    3    .   1 2 2 3 .        1 |
            4. |  D    .    .    .    .    .    .    .    .    .    .   . . . . .        . |
            5. |  E    6    6    1    4    1    1    1    4    6    6   1 1 4 6 6        1 |
               +---------------------------------------------------------------------------+
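
The no-reshape idea can also be sketched in plain Stata without rowsort and without relying on integer values. This swaps in a different technique, a one-pass "reservoir" draw: the k-th non-missing value seen in a row replaces the current pick with probability 1/k, which leaves every non-missing value equally likely. The names wanted2 and nseen and the seed are illustrative assumptions.

```stata
clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

set seed 2019
gen double wanted2 = .
gen byte nseen = 0          // non-missing values seen so far in each row

foreach v of varlist x1-x5 {
    replace nseen = nseen + !missing(`v')
    * The first non-missing value is always taken (runiform() < 1);
    * the k-th one replaces the current pick with probability 1/k
    replace wanted2 = `v' if !missing(`v') & runiform() < 1/nseen
}
drop nseen

list
```

Rows with no non-missing values are never touched, so they stay missing. The cost is one pass over the data per variable, which may or may not beat a reshape on a very large data set.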



          • #6
            Here's another way to do it, which is just proof of concept rather than fully-fledged. This should be faster than my previous suggestion.

            Code:
            clear
            
            input str1 id x1 x2 x3 x4 x5
            "A" 1 . 2 . 2
            "B" 5 1 3 5 5
            "C" 2 3 2 . 1
            "D" . . . . .
            "E" 6 6 1 4 1
            end
            
            mata mata clear
            
            mata :
            
            void row_random(string scalar varnames, string scalar newname)
            {
                real matrix y
                real colvector work, wanted
                real scalar i, ncols, nuse
            
                st_view(y, ., varnames)
                ncols = cols(y)
                wanted = J(rows(y), 1, .)
                
                for(i = 1; i <= rows(y); i++) {
                    work = sort(y[i,]', 1)  
                    nuse = ncols - missing(work)
                    if (nuse) wanted[i] = work[runiformint(1,1,1, nuse)]
                }
                
                (void) st_addvar("int", newname)
                st_store(., newname, wanted)
            }    
            
            end
            
            mata : row_random("x1 x2 x3 x4 x5", "another")
            
            list
            
                 +---------------------------------------+
                 | id   x1   x2   x3   x4   x5   another |
                 |---------------------------------------|
              1. |  A    1    .    2    .    2         2 |
              2. |  B    5    1    3    5    5         5 |
              3. |  C    2    3    2    .    1         2 |
              4. |  D    .    .    .    .    .         . |
              5. |  E    6    6    1    4    1         6 |
                 +---------------------------------------+
            Better code would support if and in and setting the seed on the fly.
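
A rough sketch of what that if/in and seed support might look like follows. The command name rowrandom, its generate() and seed() options, and the Mata function row_random_sel() are all hypothetical names made up for this illustration; none of this is an existing or published command. It needs Stata 14 or later for runiformint().

```stata
clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

capture program drop rowrandom
program define rowrandom
    version 14
    syntax varlist(numeric) [if] [in], Generate(name) [seed(string)]
    confirm new variable `generate'
    * novarlist: keep rows that have missings in the varlist
    marksample touse, novarlist
    if "`seed'" != "" set seed `seed'
    quietly gen double `generate' = .
    mata: row_random_sel("`varlist'", "`generate'", "`touse'")
end

capture mata: mata drop row_random_sel()
mata:
void row_random_sel(string scalar varnames, string scalar newname,
                    string scalar touse)
{
    real matrix y, wanted
    real colvector work
    real scalar i, nuse

    // views restricted to the observations selected by if/in
    st_view(y, ., tokens(varnames), touse)
    st_view(wanted, ., newname, touse)

    for (i = 1; i <= rows(y); i++) {
        work = sort(y[i, .]', 1)           // missings sort to the end
        nuse = cols(y) - missing(work)
        if (nuse) wanted[i, 1] = work[runiformint(1, 1, 1, nuse)]
    }
}
end

rowrandom x1-x5, generate(x) seed(12345)
rowrandom x1-x5 if id != "E", generate(z)
list
```

The result is stored as a double rather than an int, so non-integer inputs would not be truncated. Observations excluded by if or in are simply left missing.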



            • #7
              Great, thank you so much!

              Go
