
  • Problems with reshape

    I have a dataset with 1000 rows and 100 columns. The rows represent y-axis coordinates (y1 to y1000) and the columns x-axis coordinates (x1 to x100). Each data point shows the thickness of tissue at the given x,y coordinate for a single patient. I want to compare tissue thickness between different patients for each location. I am trying to reshape the data so that there is a single row for each patient with 100,000 variables: (x1,y1), (x1,y2), ..., (x1,y1000), and so on. I could then append data from other patients.
    I have tried using reshape to do this but without success. Any advice would be much appreciated.
    I am using Stata 12.1 (Mac).

  • #2
    Even in the biggest versions of Stata 14 you can't have more than 32767 variables. So, you certainly need a different data structure to do what you want in Stata.

    On your machine go

    Code:
    help limits
    to see the limits. The maximum number of variables is listed near the top.
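
    The limit can also be checked from within Stata. This is only a minimal sketch, assuming the c-class values c(maxvar) and c(k) are available in your version (see help creturn); the numbers shown will depend on your flavor of Stata and your current settings.

    Code:
    * maximum number of variables currently allowed
    display c(maxvar)
    
    * number of variables in the dataset in memory
    display c(k)
    
    * variables needed for one row per patient: 1000 y values times 100 x values
    display 1000 * 100

    Comparing the first and last results shows why a wide layout with one variable per (x,y) location cannot fit.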

    I'd reshape to three variables: x, y, and z.

    From your description I would guess at one y variable with 1000 observations and 100 x variables. Here is what I would do with a toy dataset with the same structure. First I just invent some data.

    Code:
    clear
    
    set obs 5
    
    gen y = _n
    
    forval j = 1/10 {
         gen x`j' = ceil(10 * runiform()^2)
    }
    
     
    list
    
         +------------------------------------------------------+
         | y   x1   x2   x3   x4   x5   x6   x7   x8   x9   x10 |
         |------------------------------------------------------|
      1. | 1    2    2    1    2    9    6    8    1    9     9 |
      2. | 2    1    1    8    1    3    1    1    1    1     5 |
      3. | 3    1    2    4    6    1    6    3    6    8     1 |
      4. | 4    1    4    2    5    2    3    8   10    8     7 |
      5. | 5    8    8    8    5   10    6    6    6    6     1 |
         +------------------------------------------------------+
    
    * now I have a sandbox
    
    reshape long x, i(y)
    rename x z
    rename _j x
    
    list
    
         +-------------+
         | y    x    z |
         |-------------|
      1. | 1    1    2 |
      2. | 1    2    2 |
      3. | 1    3    1 |
      4. | 1    4    2 |
      5. | 1    5    9 |
         |-------------|
      6. | 1    6    6 |
      7. | 1    7    8 |
      8. | 1    8    1 |
      9. | 1    9    9 |
     10. | 1   10    9 |
         |-------------|
     11. | 2    1    1 |
     12. | 2    2    1 |
     13. | 2    3    8 |
     14. | 2    4    1 |
     15. | 2    5    3 |
         |-------------|
     16. | 2    6    1 |
     17. | 2    7    1 |
     18. | 2    8    1 |
     19. | 2    9    1 |
     20. | 2   10    5 |
         |-------------|
     21. | 3    1    1 |
     22. | 3    2    2 |
     23. | 3    3    4 |
     24. | 3    4    6 |
     25. | 3    5    1 |
         |-------------|
     26. | 3    6    6 |
     27. | 3    7    3 |
     28. | 3    8    6 |
     29. | 3    9    8 |
     30. | 3   10    1 |
         |-------------|
     31. | 4    1    1 |
     32. | 4    2    4 |
     33. | 4    3    2 |
     34. | 4    4    5 |
     35. | 4    5    2 |
         |-------------|
     36. | 4    6    3 |
     37. | 4    7    8 |
     38. | 4    8   10 |
     39. | 4    9    8 |
     40. | 4   10    7 |
         |-------------|
     41. | 5    1    8 |
     42. | 5    2    8 |
     43. | 5    3    8 |
     44. | 5    4    5 |
     45. | 5    5   10 |
         |-------------|
     46. | 5    6    6 |
     47. | 5    7    6 |
     48. | 5    8    6 |
     49. | 5    9    6 |
     50. | 5   10    1 |
         +-------------+
    In your case:

    You would use a more informative name for the third variable. Tissue thickness measured at different times can't be given the same name in the same dataset, so each set of measurements needs its own variable name.

    You would merge with other datasets that have the same structure, merging on the x and y coordinates.
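
    The merge step might look something like the sketch below. It reuses the toy data from above for two imaginary patients. The names thick1, thick2, p1 and diff are made up for illustration, and real data would be read from your own files rather than invented. Run it as one do-file, because the temporary file is referenced through a local macro.

    Code:
    * patient 1: invent wide data, reshape to long, keep in a temporary file
    clear
    set obs 5
    gen y = _n
    forval j = 1/10 {
         gen x`j' = ceil(10 * runiform()^2)
    }
    reshape long x, i(y)
    rename x thick1
    rename _j x
    tempfile p1
    save `p1'
    
    * patient 2: same structure, different thickness variable name
    clear
    set obs 5
    gen y = _n
    forval j = 1/10 {
         gen x`j' = ceil(10 * runiform()^2)
    }
    reshape long x, i(y)
    rename x thick2
    rename _j x
    
    * one observation per (x,y) location in both datasets
    merge 1:1 x y using `p1'
    
    * every location should be present for both patients
    assert _merge == 3
    drop _merge
    
    * compare thickness location by location
    gen diff = thick2 - thick1
    summarize diff

    The merge 1:1 syntax needs Stata 11 or later, so it should be available in 12.1. Each further patient adds one more thickness variable rather than 100,000 new ones.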

    I've used this structure happily with satellite imagery.

    NB: Although what you want is just not possible in Stata, in general you should display code that you tried and the exact results of that code. "without success" is not a good error report; if you are a medic, it means no more than "I don't feel well". Diagnosis is hard on both sides.
    Last edited by Nick Cox; 19 Jul 2015, 09:09.



    • #3
      Thanks Nick, much appreciated. That's solved the problem.
