  • Assigning a new variable the value from a random group of variables.

    Hello,
    I am looking for some guidance.

    I have 100 subjects and 20 variables: 10 are labeled X1, X2, X3, X4 ... X10, and 10 are labeled Y1, Y2, Y3, Y4 ... Y10.
    For each subject, I want to create another variable Z that is equal to one of X1, X2, X3 ... X10, chosen at random.
    Variables X1...X10 and Y1...Y10 can have missing data (coded as .), so I only want to assign Z when the X and the corresponding Y are both non-missing for that subject.
    I also want to create a variable "ran" that tracks which X was chosen.

    I've tried to solve this problem multiple ways but can't quite figure it out. I'm using Stata 13.

    gen ran = .
    gen Z = .
    forvalues x = 0/5 {
    local i = `x' + ceil(5 * uniform())
    replace ran = `i' if Z == .
    replace Z = X`i' if Z == . & X`i' != . & Y`i' != .
    }

    This loop doesn't work: it picks one random X variable and assigns it to all subjects that have non-missing values for that X and Y, then picks another random X variable and assigns it to all the subjects that were left without a value the first time around, and so on.

    Thanks for your suggestions.

    Best,
    Richard
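
A small illustration of why the loop above assigns the same X to everyone in a pass: a local macro is evaluated once when it is defined, whereas an expression inside generate is evaluated for each observation. This is only a sketch; the variable names from_local and per_obs are made up for the example.

```stata
clear
set obs 5
set seed 12345

* a local macro is evaluated once, so every observation sees the same value
local i = ceil(3 * runiform())
gen from_local = `i'

* an expression in generate is evaluated separately for each observation
gen per_obs = ceil(3 * runiform())

list from_local per_obs
```

The first column is constant across subjects, while the second varies from row to row, which is the behaviour a per-subject random choice needs.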




  • #2
    The short answer is reshape to long.

    If you want details, perhaps you could post the example data?

    Best, Sergiy



    • #3
      This is one way to approach it. You don't give a data example, so I made one up. Clearly I don't need to use 10 pairs of variables; 3 or even 2 is enough to show the principle.

      There are no missing values here, but the code is there to cope with them if they existed.

      Your code seems to imply equal probabilities for the candidate variables; that's not an explicit rule in the text.

      But you'd need another loop if you want to repeat choices if missings make earlier choices inapplicable. You might need a stopping rule too. (What if all the X* or all the Y* were missing?)

      Code:
      clear  
      set obs 10 
      set seed 2803 
      forval j = 1/3 { 
         gen X`j' = runiformint(1, 10) 
         gen Y`j' = runiformint(-5, -1) 
      }
      
      
      gen Z = . 
      gen which = runiformint(1,3) 
      
      forval j = 1/3 { 
         replace Z = X`j' if which == `j' & X`j' < . & Y`j' < . 
      } 
      
      list Y* X* which Z , sep(0) 
         
           +-----------------------------------------+
           | Y1   Y2   Y3   X1   X2   X3   which   Z |
           |-----------------------------------------|
        1. | -3   -4   -4    6    8    1       3   1 |
        2. | -5   -3   -4    2    3    8       3   8 |
        3. | -5   -4   -4    6    5    7       1   6 |
        4. | -1   -4   -1    1    2   10       1   1 |
        5. | -4   -4   -4    3    8    8       3   8 |
        6. | -4   -3   -1    7    2    1       1   7 |
        7. | -2   -3   -1    7    5    6       2   5 |
        8. | -2   -2   -1    2    1    4       3   4 |
        9. | -3   -1   -3    1    3    5       1   1 |
       10. | -2   -2   -4    2    3    5       3   5 |
           +-----------------------------------------+
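
One possible way to handle the missing-value caveat mentioned above without repeating draws is to count the usable pairs for each subject and then pick the k-th usable pair at random. The following is a minimal sketch under assumptions: the helper variables nvalid, k and seen, and the missing values planted for illustration, are not part of the original example. It uses only runiform(), so it should also run in Stata 13.

```stata
clear
set obs 10
set seed 2803
forval j = 1/3 {
    gen X`j' = floor(10 * runiform()) + 1
    gen Y`j' = -(floor(5 * runiform()) + 1)
}
* plant some missing values for illustration
replace X2 = . in 1/4
replace Y3 = . in 3/6

* number of pairs with both X and Y observed, per subject
gen nvalid = 0
forval j = 1/3 {
    replace nvalid = nvalid + (X`j' < . & Y`j' < .)
}

* draw k uniformly from 1..nvalid; k stays missing if no pair is usable
gen k = floor(nvalid * runiform()) + 1 if nvalid > 0

* walk through the pairs and take the k-th usable one
gen Z = .
gen ran = .
gen seen = 0
forval j = 1/3 {
    replace seen = seen + (X`j' < . & Y`j' < .)
    replace Z = X`j' if seen == k & X`j' < . & Y`j' < .
    replace ran = `j' if seen == k & X`j' < . & Y`j' < .
}

list X* Y* nvalid k ran Z, sep(0)
```

Each usable pair has the same chance of being chosen within a subject, and a subject with no usable pair simply keeps Z and ran missing, which acts as the stopping rule.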



      • #4
        Thank you for your quick replies and help!
        Nick, your solution helped me solve the problem. Thank you! I've been thinking about this for a while.

        Because I'm using Stata 13, I had to use the following to create a random integer from 1 to 10:

        Code:
        gen which = .
        replace which = floor((10-1+1)*runiform()+1)

        Best,
        Richard



        • #5
          Code:
          ceil(10 * runiform())
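
A quick check of the two expressions suggested in this thread, as a sketch only (the variable names a and b are made up). Both should give integers from 1 to 10. One caveat, stated as an assumption about older releases: the documentation for Stata 13 and earlier gives the range of runiform() as 0 to nearly 1, so the ceil() form could in principle return 0 in the very rare case of a draw of exactly 0, while the floor() form cannot.

```stata
clear
set obs 100000
set seed 13

gen a = floor(10 * runiform() + 1)   // always in 1..10
gen b = ceil(10 * runiform())        // 1..10, or 0 if runiform() returns exactly 0

tabulate a
count if !inrange(b, 1, 10)
```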



          • #6

            Building on a slightly modified version of Nick's data generation code:

            Code:
            clear  
            set obs 10 
            set seed 2201 
            forval j = 1/10 { 
               gen X`j' = runiformint(1, 10)+0.5 
               gen Y`j' = runiformint(-5, -1) 
            }
            // add some missings
            replace X5=. in 3
            replace Y7=. in 4
            
            generate sbj=_n
            
            
            reshape long X Y,  i(sbj) j(idx)
            
            generate r=cond(missing(X), ., runiform())
            sort sbj r
            by sbj: generate pickX=X[1]
            by sbj: generate whichX=idx[1]
            
            replace r=cond(missing(Y), ., runiform())
            sort sbj r
            by sbj: generate pickY=Y[1]
            by sbj: generate whichY=idx[1]
            
            contract sbj whichX pickX whichY pickY
            drop _freq
            
            
            list
            The code should work for any number of X's and Y's, and will automatically return a missing value when all values (all X's or all Y's) are missing.

            Code:
                 +---------------------------------------+
                 | sbj   pickX   whichX   pickY   whichY |
                 |---------------------------------------|
              1. |   1     6.5        4      -4        7 |
              2. |   2     4.5        5      -2        8 |
              3. |   3     2.5       10      -3        2 |
              4. |   4     3.5        8      -4        2 |
              5. |   5     4.5        2      -1        3 |
                 |---------------------------------------|
              6. |   6     1.5        9      -3        4 |
              7. |   7     3.5        4      -3        8 |
              8. |   8     9.5        6      -2        7 |
              9. |   9     6.5        9      -5        7 |
             10. |  10     4.5       10      -5       10 |
                 +---------------------------------------+
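
The code above draws X and Y separately. If the aim is the one stated in the original question, a single index chosen among pairs where both X and Y are observed, the same reshape idea can be adapted. This is a sketch under that assumption; the names Z and ran follow the question, and only runiform() is used so that it should also run in Stata 13.

```stata
clear
set obs 10
set seed 2201
forval j = 1/10 {
    gen X`j' = floor(10 * runiform()) + 1.5
    gen Y`j' = -(floor(5 * runiform()) + 1)
}
* add some missings
replace X5 = . in 3
replace Y7 = . in 4

generate sbj = _n
reshape long X Y, i(sbj) j(idx)

* random sort key only for pairs where both X and Y are observed
generate r = runiform() if !missing(X, Y)
sort sbj r idx

* the first row per subject is a randomly chosen usable pair, if any exists
by sbj: generate Z = X[1] if r[1] < .
by sbj: generate ran = idx[1] if r[1] < .

drop r
reshape wide X Y, i(sbj) j(idx)
list sbj ran Z, sep(0)
```

Missing sort keys go last, so a subject with no usable pair ends up with Z and ran missing rather than an arbitrary value.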

            Best, Sergiy
