Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fadzai Chikwava
    started a topic Reshape long to wide

    Reshape long to wide

    Please advise syntax to reshape long to wide format for variables "fs_b4exit fsY1 fsY2 fsY3 fsY4" each coded 0/1 for the time period "delay". ID is "personid"

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double personid float(delay10 fs_b4exit fsY1 fsY2 fsY3 fsY4)
    2337 3 0 0 0 1 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 3 0 0 0 0 0
    2337 4 0 0 0 0 1
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 4 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    2337 5 0 0 0 0 0
    end

  • Fadzai Chikwava
    replied
    Thanks. I saw the complication and thought there would be a faster way without having to delete the rows I many not need for each variable and data needed.

    For instance for ID "28548". "The value I am interested in for "tot_mh_any" = 11 which is total mental health across all the years. Hence I mentioned I would need syntax which can select ".....if seq_tot_mh_any==1" meaning the other rows wont be needed where "seq_tot_my_any > 1"

    This applies to other variables e.g. if "delay10"=1-2 years. The value I am interested in is =4 i.e. "tot_mh_anyYR = 4 if "seq_tot_mh_anyYR"==1. The other rows "seq_tot_mh_anyYR >1 are not needed.

    It seems I would have to do this for each variable, deleting rows I don't need, but this may take time and I thought there would be a much easier way to do it.

    Thanks

    Leave a comment:


  • Clyde Schechter
    replied
    Your example data is missing some key variable that is necessary. You can't go long on personid and wide on delay10 because there care sometimes multiple observations with the same value of personid and delay10. So there would be no way for Stata to know which of those observations provides the correct values of those *tot_mh* variables for that combination. For example consider all of the observaiotions with personid = 28548 and delay10 = 1-2 years. If you want wide, what would the correct value of seq_tot_mh_any be: as 5, 6, 8, 7, and . are all possibilities in your example.

    There must be some additional variable(s) involved that you have not shown.

    I should add that if the driving factor here is to merge with other data that are in wide layout, a possible solution is to first change those to long layout as well, and then -merge- them, though without knowing what those other data sets are like, I can't be sure this would be viable for your particular situation.

    Leave a comment:


  • Fadzai Chikwava
    replied
    Dear Clyde
    Apologies, I think I am getting confused as to which method I should choose and as you said above, I tend to agree, I will have too many variables. Maybe if possible you may supply two types of syntax for the above data:

    1. long on personid, but wide on the value of delay10.

    2. Fully wide.

    Leave a comment:


  • Fadzai Chikwava
    replied
    Sorry just to add again, I would like to go fully wide.

    Leave a comment:


  • Fadzai Chikwava
    replied
    Thanks. I am at a point where I need to merge some derived variables from various datasets and would like to have the data in wide format showing the data at different time points as well (i.e at "delay10"==1 or 2 or 3............It would be difficult for me to derive new variables where I have to merge some variables in long formats.

    Another example is shown below. I would like to have the following variables in wide format.

    1. One variable: tot_mh_any if seq_tot_mh_any==1

    2. Six variables of tot_mh_anyYR if seq_tot_mh_anyYR==1 for each year i.e. delay10=1,2,....6

    3. Can I merge with multiple other variables with the same format e.g. tot_mh_organic, seq_tot_mh_organic, etc etc


    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double personid float(delay10 tot_mh_any seq_tot_mh_any tot_mh_anyYR seq_tot_mh_anyYR tot_mh_organic seq_tot_mh_organic tot_mh_organicYR seq_tot_mh_organicYR)
     2337 1  .  .  .  . . . . .
     2337 2  .  .  .  . . . . .
     2337 4  .  .  .  . . . . .
     2337 4  .  .  .  . . . . .
     2337 4  .  .  .  . . . . .
     2337 4  .  .  .  . . . . .
     2337 5  .  .  .  . . . . .
     2337 5  .  .  .  . . . . .
     3172 1  .  .  .  . . . . .
     3172 2  4  1  1  1 . . . .
     3172 3  .  .  .  . . . . .
     3172 4  4  2  2  1 . . . .
     3172 4  4  3  2  2 . . . .
     3172 5  4  4  1  1 . . . .
     3653 1  .  .  .  . . . . .
     3653 2  .  .  .  . . . . .
     3653 2  .  .  .  . . . . .
     3653 3  .  .  .  . . . . .
     3653 4  .  .  .  . . . . .
     3653 5  .  .  .  . . . . .
     3653 5  .  .  .  . . . . .
     3653 6  .  .  .  . . . . .
     5625 1  6  1  4  1 . . . .
     5625 1  6  2  4  2 . . . .
     5625 1  6  3  4  3 . . . .
     5625 1  6  4  4  4 . . . .
     5625 3  6  5  1  1 . . . .
     5625 5  6  6  1  1 . . . .
    10045 5  .  .  .  . . . . .
    11758 1 20  1 10  1 . . . .
    11758 1 20  2 10  2 . . . .
    11758 1 20  3 10  3 . . . .
    11758 1 20  5 10  5 . . . .
    11758 1 20  4 10  4 . . . .
    11758 1 20  6 10  6 . . . .
    11758 1 20  7 10  7 . . . .
    11758 1 20  8 10  8 . . . .
    11758 1 20  9 10  9 . . . .
    11758 1 20 10 10 10 . . . .
    11758 2 20 12  9  2 . . . .
    11758 2 20 11  9  1 . . . .
    11758 2 20 13  9  3 . . . .
    11758 2 20 14  9  4 . . . .
    11758 2 20 15  9  5 . . . .
    11758 2 20 16  9  6 . . . .
    11758 2 20 17  9  7 . . . .
    11758 2 20 18  9  8 . . . .
    11758 2 20 19  9  9 . . . .
    11758 3 20 20  1  1 . . . .
    14193 4  .  .  .  . . . . .
    17206 4  .  .  .  . . . . .
    17206 5  .  .  .  . . . . .
    17353 1  .  .  .  . . . . .
    19252 4  1  1  1  1 . . . .
    19949 1  1  1  1  1 . . . .
    19949 1  .  .  .  . . . . .
    19949 2  .  .  .  . . . . .
    19949 3  .  .  .  . . . . .
    19949 4  .  .  .  . . . . .
    22038 3  1  1  1  1 . . . .
    23662 2  .  .  .  . . . . .
    23662 4  1  1  1  1 . . . .
    23662 5  .  .  .  . . . . .
    23662 5  .  .  .  . . . . .
    23662 5  .  .  .  . . . . .
    24787 1  1  1  1  1 . . . .
    24787 1  .  .  .  . . . . .
    24787 1  .  .  .  . . . . .
    24787 1  .  .  .  . . . . .
    24787 2  .  .  .  . . . . .
    24787 4  .  .  .  . . . . .
    24787 4  .  .  .  . . . . .
    24787 4  .  .  .  . . . . .
    24787 4  .  .  .  . . . . .
    26920 5  1  1  1  1 . . . .
    27533 1  .  .  .  . . . . .
    27533 4  .  .  .  . . . . .
    27533 4  .  .  .  . . . . .
    28539 1  .  .  .  . . . . .
    28539 2  1  1  1  1 . . . .
    28539 4  .  .  .  . . . . .
    28548 1 11  1  4  1 . . . .
    28548 1 11  2  4  2 . . . .
    28548 1 11  3  4  3 . . . .
    28548 1 11  4  4  4 . . . .
    28548 2 11  5  4  1 . . . .
    28548 2  .  .  .  . . . . .
    28548 2 11  6  4  2 . . . .
    28548 2 11  7  4  3 . . . .
    28548 2 11  8  4  4 . . . .
    28548 3 11  9  2  1 . . . .
    28548 3  .  .  .  . . . . .
    28548 3 11 10  2  2 . . . .
    28548 4  .  .  .  . . . . .
    28548 4 11 11  1  1 . . . .
    28548 4  .  .  .  . . . . .
    28818 2  .  .  .  . . . . .
    28818 5  .  .  .  . . . . .
    28818 6  .  .  .  . . . . .
    28818 6  .  .  .  . . . . .
    end
    label values delay10 delay10
    label def delay10 1 "0-1 years", modify
    label def delay10 2 "1-2 years", modify
    label def delay10 3 "2-3 years", modify
    label def delay10 4 "3-4 years", modify
    label def delay10 5 "4-5 years", modify
    label def delay10 6 "5-6 years", modify

    Leave a comment:


  • Clyde Schechter
    replied
    Well your existing data is in a doubly long layout. (Which is good, and before you do anything like you ask, you should really stop and consider whether that's a good idea--it probably isn't). Your question does not make clear whether you want it to remain long on personid, but wide on the value of delay, or whether you want to go fully wide. I'll assume the former:

    Code:
    gen long obs_no = _n
    rename fs* =_
    reshape wide fs*_, i(obs_no) j(delay10)
    Again, before doing this, make sure there is a good reason to, as the longer your layout, the easier it is to work with in Stata for most purposes.

    If you want to go double wide, I would say really just don't. The resulting data set would have so many variables as to be almost useless.

    Leave a comment:

Working...
X