
  • for loops

    Hello,

    I'm trying to split a huge data set into about 400 files so that I can run an analysis on them in Stata using collapse and reshape. The more files I can split it into, the faster it will run.

    Since I have panel data with multiple entries for each patient_id, I have to make sure that identical IDs stay together, so I am trying to form groups of IDs.

    My data looks like this:
    patient_id x y z
    1
    1
    1
    2
    3
    3
    3
    4
    4

    I'd like to group the patient_ids like this:
    patient_id group x y z
    1 1
    1 1
    1 1
    2 1
    3 2
    3 2
    3 2
    4 2
    4 2
    I'm looking for a way to automate these commands to group the patient_ids:
    gen group=1 if patient_id<=2
    replace group=2 if patient_id >2 & patient_id<=4
    replace group=3 ....... and so on for 400 different groups.


    I need to make sure that patient_ids are not split across groups (i.e., patient_id=1 is not split at its 2nd observation, which would put patient 1 in both group 1 and group 2).

    Any feedback or alternative methods would be much appreciated.

    thx
    vishal

  • #2
    Code:
    gen group = mod(patient_id, 400) + 1
    This is not quite what you asked for, because the groups will not consist of consecutive IDs, but it sounds like that doesn't matter for your ultimate purpose. All you need is 400 groups of roughly equal numbers of patients, correct?
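
    If consecutive groups are wanted after all, or the IDs are not spread evenly modulo 400, one possible alternative is sketched below. It is not part of the reply above. It assumes the identifier is named patient_id as in the question, pid_seq is a hypothetical helper variable, and the lines are run in place of the mod() line rather than after it.

    ```stata
    * Number the distinct patients 1, 2, ..., N in sorted order of patient_id
    egen long pid_seq = group(patient_id)

    * Cut that sequence into 400 blocks of consecutive patients.
    * Every row of a patient has the same pid_seq, so it gets the same group.
    summarize pid_seq, meanonly
    gen int group = ceil(400 * pid_seq / r(max))
    ```

    Because group is computed from the patient sequence number rather than the observation number, a patient can never straddle two groups. With fewer than 400 distinct patients, some group numbers would simply go unused.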



    • #3
      I believe so. I will sort the data by patient_id so they will be in order anyway, but the main thing is that each unique patient_id will be in one group and not split up.

      thanks!!!
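
      The "one group per patient_id" requirement can be checked directly before the data are split. The sketch below is an illustration only: it assumes a group variable already exists, and the chunk_#.dta file names are hypothetical.

      ```stata
      * Stop with an error if any patient_id appears in more than one group
      bysort patient_id (group): assert group[1] == group[_N]

      * Write each group to its own file (hypothetical names chunk_1.dta, ...)
      summarize group, meanonly
      local ngroups = r(max)
      forvalues g = 1/`ngroups' {
          preserve
          keep if group == `g'
          save "chunk_`g'.dta", replace
          restore
      }
      ```

      The preserve/restore pair reloads the full data on every pass, which may be slow on a very large file, so this is a starting point rather than a tuned solution.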
