Assigning a new variable the value from a random group of variables.

Richard Stallman

Join Date: Feb 2015

Posts: 8
#1


22 Jan 2018, 09:38

Hello,
I am looking for some guidance.

I have 100 subjects with 20 variables: 10 are labeled X1, X2, X3, X4 ... X10, and 10 are labeled Y1, Y2, Y3, Y4 ... Y10.
For each subject, I want to create another variable Z that is equal to one of X1, X2, X3 ... X10, chosen at random.
Variables X1...X10 and Y1...Y10 can have missing data, coded as ., so I only want to assign Z when the chosen X and its corresponding Y are both nonmissing for that subject.
I also want to create a variable "ran" that records which X was chosen.

I've tried to solve this problem multiple ways but can't quite figure it out. I'm using Stata 13.

gen ran = .
gen Z = .
forvalues x = 0/5 {
    local i = `x' + ceil(5 * uniform())
    replace ran = `i' if Z == .
    replace Z = X`i' if Z == . & X`i' != . & Y`i' != .
}

This loop doesn't work. The local macro `i' is drawn once per pass through the loop, not once per subject, so each pass picks a single random X variable and assigns it to every subject that has nonmissing values for that X and Y. The next pass then picks another random X variable and assigns it to all the subjects that were left with Z missing the first time around, and so on.

Thanks for your suggestions.

Best,
Richard
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

22 Jan 2018, 09:58

The short answer is reshape to long.

If you want details, perhaps you could post some example data?

Best, Sergiy

Nick Cox

Join Date: Mar 2014

Posts: 35612
#3

22 Jan 2018, 10:03

This is one way to approach it. You don't give a data example, so I made one up. Clearly I don't need to use 10 pairs of variables; 3 or even 2 is enough to show the principle.

There are no missing values here, but the code is there to cope with them if they existed.

Your code seems to imply equal probabilities for the candidate variables; that's not an explicit rule in the text.

But you'd need another loop if you want to redraw the choice whenever missing values make an earlier choice inapplicable. You might need a stopping rule too. (What if all the X* or all the Y* were missing?)

Code:

clear  
set obs 10 
set seed 2803 
forval j = 1/3 { 
   gen X`j' = runiformint(1, 10) 
   gen Y`j' = runiformint(-5, -1) 
}


gen Z = . 
gen which = runiformint(1,3) 

forval j = 1/3 { 
   replace Z = X`j' if which == `j' & X`j' < . & Y`j' < . 
} 

list Y* X* which Z , sep(0) 
   
     +-----------------------------------------+
     | Y1   Y2   Y3   X1   X2   X3   which   Z |
     |-----------------------------------------|
  1. | -3   -4   -4    6    8    1       3   1 |
  2. | -5   -3   -4    2    3    8       3   8 |
  3. | -5   -4   -4    6    5    7       1   6 |
  4. | -1   -4   -1    1    2   10       1   1 |
  5. | -4   -4   -4    3    8    8       3   8 |
  6. | -4   -3   -1    7    2    1       1   7 |
  7. | -2   -3   -1    7    5    6       2   5 |
  8. | -2   -2   -1    2    1    4       3   4 |
  9. | -3   -1   -3    1    3    5       1   1 |
 10. | -2   -2   -4    2    3    5       3   5 |
     +-----------------------------------------+
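
The redraw-with-a-stopping-rule idea mentioned above could be sketched roughly as follows. This is only a sketch under assumptions, not tested against the poster's data: it assumes X1-X3 and Y1-Y3 exist as in the example, and the names Z2, which2, anyok, done, draw and the cap maxtries are made up for illustration. It uses floor() and runiform() rather than runiformint() so that it should also run in Stata 13.

```stata
* Sketch only: redraw the choice for observations whose pick hit a missing pair.
local k = 3            // number of (X, Y) pairs in the example
local maxtries = 50    // hypothetical cap on the number of redraws

* Flag observations that have at least one complete (X, Y) pair
gen byte anyok = 0
forval j = 1/`k' {
    replace anyok = 1 if X`j' < . & Y`j' < .
}

gen Z2 = .
gen which2 = .
gen byte done = !anyok      // nothing to draw if no pair is usable

local try = 0
quietly count if !done
while r(N) > 0 & `try' < `maxtries' {
    local ++try
    * One fresh draw per observation on every pass
    gen byte draw = 1 + floor(`k' * runiform())
    forval j = 1/`k' {
        replace Z2 = X`j' if !done & draw == `j' & X`j' < . & Y`j' < .
        replace which2 = `j' if !done & draw == `j' & X`j' < . & Y`j' < .
    }
    replace done = 1 if which2 < .
    drop draw
    quietly count if !done
}
```

Observations with no complete pair are marked done at the outset, so Z2 and which2 stay missing for them. Redrawing until a complete pair comes up gives each complete pair the same chance of being chosen.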


Richard Stallman

Join Date: Feb 2015

Posts: 8
#4

22 Jan 2018, 10:29

Thank you for your quick replies and help!
Nick, your solution helped me solve the problem. Thank you! I've been thinking about this for a while.

Because I'm using Stata 13, which does not have runiformint(), I had to use:
gen which = .
replace which = floor((10-1+1)*runiform()+1)
to create a random integer from 1 to 10.

Best,
Richard
Nick Cox

Join Date: Mar 2014

Posts: 35612
#5

22 Jan 2018, 11:02

Code:

ceil(10 * runiform())
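
For comparison, here is a small sketch that puts the shorter expression next to a floor-based one. The variable names a and b and the sample size are made up for illustration. One caveat, stated from memory of the documentation rather than tested here: in Stata 13 runiform() is defined on [0, 1), so the ceil() form could in principle return 0 on the very rare draw of exactly 0, while the floor() form cannot.

```stata
* Sketch: two ways to draw integers 1..10 without runiformint()
clear
set obs 1000
set seed 2803
gen byte a = ceil(10 * runiform())        // the shorter form
gen byte b = 1 + floor(10 * runiform())   // stays in 1..10 even if runiform() is exactly 0
tabulate a
tabulate b
```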

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#6

22 Jan 2018, 17:26

Building on a slightly modified version of Nick's data generation code:

Code:

clear  
set obs 10 
set seed 2201 
forval j = 1/10 { 
   gen X`j' = runiformint(1, 10)+0.5 
   gen Y`j' = runiformint(-5, -1) 
}
// add some missings
replace X5=. in 3
replace Y7=. in 4

generate sbj=_n


reshape long X Y,  i(sbj) j(idx)

generate r=cond(missing(X), ., runiform())
sort sbj r
by sbj: generate pickX=X[1]
by sbj: generate whichX=idx[1]

replace r=cond(missing(Y), ., runiform())
sort sbj r
by sbj: generate pickY=Y[1]
by sbj: generate whichY=idx[1]

contract sbj whichX pickX whichY pickY
drop _freq


list

The code should work for any number of X's and Y's. When all values (all X's or all Y's) are missing for a subject, it will automatically end up picking one of the missing values at random.

Code:

     +---------------------------------------+
     | sbj   pickX   whichX   pickY   whichY |
     |---------------------------------------|
  1. |   1     6.5        4      -4        7 |
  2. |   2     4.5        5      -2        8 |
  3. |   3     2.5       10      -3        2 |
  4. |   4     3.5        8      -4        2 |
  5. |   5     4.5        2      -1        3 |
     |---------------------------------------|
  6. |   6     1.5        9      -3        4 |
  7. |   7     3.5        4      -3        8 |
  8. |   8     9.5        6      -2        7 |
  9. |   9     6.5        9      -5        7 |
 10. |  10     4.5       10      -5       10 |
     +---------------------------------------+
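
The code above picks X and Y independently. If the pick should instead be restricted to pairs where both X and the corresponding Y are present, as in the original question, one possible variation is sketched below. It is a sketch under assumptions: the variable u is a made-up name, Z and ran follow the names in the question, and the data generation uses floor() and runiform() in place of runiformint() so that it should also run in Stata 13.

```stata
* Sketch only: pick one complete (X, Y) pair per subject at random.
clear
set obs 10
set seed 2201
forval j = 1/10 {
    gen X`j' = 1 + floor(10 * runiform()) + 0.5
    gen Y`j' = -5 + floor(5 * runiform())
}
* add some missings
replace X5 = . in 3
replace Y7 = . in 4

generate sbj = _n
reshape long X Y, i(sbj) j(idx)

* A random number only for rows where both X and Y are present;
* missing(X, Y) is true if either argument is missing.
generate double u = cond(missing(X, Y), ., runiform())

* Missing values sort last, so the first row per subject is a random
* complete pair whenever at least one complete pair exists.
sort sbj u
by sbj: generate Z = X[1] if u[1] < .
by sbj: generate ran = idx[1] if u[1] < .

* Keep one row per subject
by sbj: keep if _n == 1
keep sbj Z ran
list, sep(0)
```

With this variation Z and ran are left missing for a subject that has no complete pair at all.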

Best, Sergiy
