for loops

Vishal Sharma

24 May 2019, 15:40

Hello,

I'm trying to split a huge dataset into about 400 files so that I can run an analysis on each of them in Stata using collapse and reshape. The more files I split it into, the faster it will run.

Since I have panel data with multiple observations for each patient_id, I have to make sure that all rows with the same ID stay together, so I am trying to form groups of IDs.

My data looks like this:

patient_id	x	y	z
1
1
1
2
3
3
3
4
4

I'd like to group the patient_ids like this:

patient_id	group	x	y	z
1	1
1	1
1	1
2	1
3	2
3	2
3	2
4	2
4	2

I'm looking for a way to automate these commands that group the patient_ids:
gen group = 1 if patient_id <= 2
replace group = 2 if patient_id > 2 & patient_id <= 4
replace group = 3 ... and so on for 400 different groups.

I need to make sure that a patient_id is never split across groups (i.e., patient_id 1 must not be cut off after its second observation, which would leave patient 1 in both group 1 and group 2).
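One way to automate the gen/replace chain, sketched here under the assumption that patient_id is the identifier and that each group should hold consecutive IDs: number the distinct patients with egen's group() function, then scale that number onto the desired count of groups with ceil(). Because every row of a patient receives the same number, no patient can be split across groups. The names idnum and ngroups are hypothetical, and the toy data reproduce the example above with 2 groups instead of 400.

```stata
* Toy data matching the example (replace with your own dataset)
clear
input patient_id
1
1
1
2
3
3
3
4
4
end

* Number of groups wanted: 2 reproduces the example, use 400 for the real data
local ngroups 2

* Give each distinct patient_id a consecutive number 1, 2, 3, ...
* (all rows of the same patient share one number)
egen long idnum = group(patient_id)

* Map those numbers onto `ngroups' blocks of consecutive patients
summarize idnum, meanonly
gen int group = ceil(idnum * `ngroups' / r(max))

list patient_id group, sepby(group)
```

If there are fewer distinct patients than groups, some group numbers will simply be unused.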

Any feedback or alternative methods would be much appreciated.

Thanks,
Vishal
Clyde Schechter

24 May 2019, 16:17

Code:

gen group = mod(patient_id, 400) + 1

This is not quite what you asked for, because the groups will not consist of consecutive IDs, but it sounds like that doesn't matter for your ultimate purpose. Since group depends only on the ID, all observations for a given patient land in the same group. All you need is 400 groups with roughly equal numbers of patients, correct?
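A sketch of the mod() approach that does not rely on the IDs being evenly spread integers: renumber the patients with egen's group() function first, then take the remainder. This also works if patient_id is a string or has large gaps. It ends with a check that every patient falls in exactly one group. The names idnum and ngroups are illustrative assumptions, and the toy data are the example from the question.

```stata
* Toy data from the question (replace with your own dataset)
clear
input patient_id
1
1
1
2
3
3
3
4
4
end

* Number of groups wanted: 400 in the real application
local ngroups 2

* Renumber patients 1, 2, 3, ... so mod() cycles through them evenly
egen long idnum = group(patient_id)
gen int group = mod(idnum, `ngroups') + 1

* Check: all rows of a patient carry the same group number
bysort patient_id (group): assert group[1] == group[_N]
tabulate group
```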
Vishal Sharma

24 May 2019, 18:10

I believe so. I will sort the data by patient_id, so they will be in order anyway, but the main thing is that each unique patient_id ends up in exactly one group and is not split up.

Thanks!
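Once a group variable exists, a forvalues loop can write one file per group, which is the splitting step described at the start of the thread. This is a minimal sketch: the file names part_1.dta, part_2.dta, and so on are hypothetical and are written to the current working directory, and the toy data already carry the grouping shown in the question.

```stata
* Toy data with a group variable already assigned
clear
input patient_id group
1 1
1 1
1 1
2 1
3 2
3 2
3 2
4 2
4 2
end

* Highest group number present (400 in the real application)
summarize group, meanonly
local ngroups = r(max)

* Write one file per group; part_`g'.dta is a hypothetical naming scheme
forvalues g = 1/`ngroups' {
    preserve
    keep if group == `g'
    save "part_`g'.dta", replace
    restore
}
```

Each part can then be loaded with use and processed with collapse and reshape; since a patient never spans two files, per-patient results are not affected by the split.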