Is there a command like -egen rowrandom()-?

Gobinda Natak

Join Date: Sep 2016

Posts: 79
#1

01 Nov 2019, 17:48

Hi all,

I have a data set that looks like this:

Code:

input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

Is there a command that creates a new variable x containing a randomly chosen non-missing value of the variables x1 to x5? (If all of x1 to x5 are missing, as for D, x should also be missing.)

Is there a simple solution to this problem, something like -egen x = rowrandom(x1 x2 x3 x4 x5)-?

Thanks
Go

Last edited by Gobinda Natak; 01 Nov 2019, 17:53.
West Addison

Join Date: Jun 2015

Posts: 13
#2

01 Nov 2019, 19:01

I am not aware of any one command to do what you want. I think you'll have to use several commands, but it's not too hard. I would do something like this:

Code:

reshape long x, i(id) j(j)
gen byte missing = missing(x)
set seed 647896396
gen double random = runiform()
bysort id (missing random): keep if _n == 1
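Note that these commands leave the data in long layout with one observation per id, so x1 to x5 are no longer in memory. If the wide variables are still needed, one possibility is to do the reshape inside preserve/restore and merge the chosen value back. This is only a sketch and not part of the suggestion above; the tempfile name picked is an arbitrary choice.

```stata
clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

preserve
    * long layout: one row per id and original variable
    reshape long x, i(id) j(j)
    gen byte missing = missing(x)
    set seed 647896396
    gen double random = runiform()
    * non-missing values sort first, in random order; keep the first
    bysort id (missing random): keep if _n == 1
    keep id x
    tempfile picked
    save `picked'
restore

* bring the chosen value back alongside x1-x5
merge 1:1 id using `picked', nogenerate
list
```

The reshape cost is still there, so this only restores the wide layout; it does not make the approach any faster.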
Gobinda Natak

Join Date: Sep 2016

Posts: 79
#3

01 Nov 2019, 19:37

Thanks so much; I was afraid that this would be the solution. My data set is very large, so reshaping takes quite some time, and the do-file is quite convoluted already, but I guess that's as good as it gets.

Have a good weekend
Go
West Addison

Join Date: Jun 2015

Posts: 13
#4

01 Nov 2019, 20:48

I agree that reshaping large data sets can be painfully slow. You might want to consider faster, user-written versions of reshape, such as sreshape, fastreshape, and greshape (part of gtools). Perhaps there is a user-written command like your suggested egen x = rowrandom(), but I would think it pretty likely that the .ado file for such a command would include a reshape.

Have a good weekend!

Nick Cox

Join Date: Mar 2014
Posts: 35657
#5

02 Nov 2019, 04:20

This is a counter-example to the idea that a reshape is needed. It does hinge on an observation that the example data are integers. rowsort is from the Stata Journal.

Code:

clear

input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

rowsort x1-x5, gen(X1-X5)
egen all = concat(X?), p(" ")
gen wanted = real(word(all, runiformint(1,5))) if X5 < .
forval j = 4(-1)1 {  
    replace wanted = real(word(all, runiformint(1, `j'))) if X`j' < . & missing(wanted)
}

list


     +---------------------------------------------------------------------------+
     | id   x1   x2   x3   x4   x5   X1   X2   X3   X4   X5         all   wanted |
     |---------------------------------------------------------------------------|
  1. |  A    1    .    2    .    2    1    2    2    .    .   1 2 2 . .        2 |
  2. |  B    5    1    3    5    5    1    3    5    5    5   1 3 5 5 5        5 |
  3. |  C    2    3    2    .    1    1    2    2    3    .   1 2 2 3 .        1 |
  4. |  D    .    .    .    .    .    .    .    .    .    .   . . . . .        . |
  5. |  E    6    6    1    4    1    1    1    4    6    6   1 1 4 6 6        1 |
     +---------------------------------------------------------------------------+
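For data that are not integers, or where installing rowsort is not an option, a similar no-reshape result can be sketched with official commands only: count the non-missing values in each row, draw a position among them, then walk across the variables until that position is reached. This is an alternative technique, not the code above; the names nvalid, pick, seen and wanted2 and the seed are arbitrary choices for illustration.

```stata
clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

set seed 12345

* number of non-missing values in each row
egen byte nvalid = rownonmiss(x1-x5)

* which of the non-missing values to take (missing if there are none)
gen byte pick = runiformint(1, nvalid) if nvalid > 0

gen double wanted2 = .
gen byte seen = 0

foreach v of varlist x1-x5 {
    * running count of non-missing values met so far
    replace seen = seen + !missing(`v')
    * take this value if it is the pick-th non-missing one
    replace wanted2 = `v' if seen == pick & !missing(`v')
}

drop nvalid pick seen
list
```

Each non-missing value in a row is chosen with equal probability, and the loop makes one pass over the data per variable rather than reshaping.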


Nick Cox

Join Date: Mar 2014
Posts: 35657
#6

02 Nov 2019, 04:51

Here's another way to do it, which is just proof of concept rather than fully-fledged. This should be faster than my previous suggestion.

Code:

clear

input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

mata mata clear

mata :

void row_random(string scalar varnames, string scalar newname)
{
    real matrix y
    real colvector work, wanted
    real scalar i, ncols, nuse

    st_view(y, ., varnames)
    ncols = cols(y)
    wanted = J(rows(y), 1, .)
    
    for(i = 1; i <= rows(y); i++) {
        work = sort(y[i,]', 1)  
        nuse = ncols - missing(work)
        if (nuse) wanted[i] = work[runiformint(1,1,1, nuse)]
    }
    
    (void) st_addvar("int", newname)
    st_store(., newname, wanted)
}    

end

mata : row_random("x1 x2 x3 x4 x5", "another")

list

     +---------------------------------------+
     | id   x1   x2   x3   x4   x5   another |
     |---------------------------------------|
  1. |  A    1    .    2    .    2         2 |
  2. |  B    5    1    3    5    5         5 |
  3. |  C    2    3    2    .    1         2 |
  4. |  D    .    .    .    .    .         . |
  5. |  E    6    6    1    4    1         6 |
     +---------------------------------------+

Better code would support if and in and setting the seed on the fly.
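As a rough sketch of what support for if, in and a seed might look like, the Mata function can be wrapped in a small Stata program. The names rowrandom and row_random2 and the seed() option are hypothetical, chosen for illustration only; this is not an existing command. The sketch also stores the result as double rather than int, so it does not assume integer data.

```stata
capture program drop rowrandom
capture mata: mata drop row_random2()

mata:
// pick one non-missing value at random from each selected row
void row_random2(string scalar varnames, string scalar newname, string scalar touse)
{
    real matrix    y
    real colvector work, wanted
    real scalar    i, nuse

    // view only observations with touse != 0
    st_view(y, ., tokens(varnames), touse)
    wanted = J(rows(y), 1, .)

    for (i = 1; i <= rows(y); i++) {
        // keep the non-missing values of this row
        work = select(y[i,]', y[i,]' :< .)
        nuse = rows(work)
        if (nuse) wanted[i] = work[runiformint(1, 1, 1, nuse)]
    }

    st_store(., st_addvar("double", newname), touse, wanted)
}
end

program define rowrandom
    version 14
    syntax varlist(numeric) [if] [in], GENerate(name) [SEED(integer -1)]
    confirm new variable `generate'
    // novarlist: keep rows even if some of the variables are missing
    marksample touse, novarlist
    if `seed' >= 0 set seed `seed'
    mata: row_random2("`varlist'", "`generate'", "`touse'")
end

clear
input str1 id x1 x2 x3 x4 x5
"A" 1 . 2 . 2
"B" 5 1 3 5 5
"C" 2 3 2 . 1
"D" . . . . .
"E" 6 6 1 4 1
end

rowrandom x1-x5 if id != "E", generate(pick) seed(2019)
list
```

Observations excluded by if or in are left missing in the new variable. Setting the seed inside the program changes the state of the random-number generator for the rest of the session, which may or may not be what is wanted.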


Gobinda Natak

Join Date: Sep 2016

Posts: 79
#7

02 Nov 2019, 18:40

Great, thank you so much!

Go