Dear Statalisters,

I generated a dataset and want to split it randomly 70/30. At the beginning of the do-file I use set seed, but the split still changes between runs, so a regression I run on the 70% gives different estimates every time. How do I get a consistent split based on a random variable?

The code is below. ZV2 is the random variable I generated, and OOS should divide the dataset into OOS=0 (70% of the observations) and OOS=1 (30% of the observations). However, when I summarize my dependent variable, the summary statistics of the subgroups differ from run to run. How can I divide the dataset randomly but keep the same observations in the same subgroups every time I run the do-file?

I really appreciate your help.

Kind regards

Steffen

Code:

    set seed 100
    gen ZV2 = runiform()
    label var ZV2 "random variable"
    xtile OOS = ZV2, nquantiles(10)
    replace OOS = 0 if OOS <=7
    replace OOS = 1 if OOS >7
    label var OOS "indicator"
    global y depvar
    sum $y if OOS==1
    sum $y if OOS==0
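
One possible cause of a split that changes despite set seed is that the observations are in a different order each time the random numbers are drawn, so the same sequence of draws lands on different observations. The following is only a minimal sketch of a reproducible split under that assumption. It assumes a variable `id` that uniquely identifies each observation; that name is hypothetical and does not appear in the code above.

```stata
* Assumption: id uniquely identifies each observation (hypothetical name)
isid id                      // stops with an error if id is not unique
sort id                      // fix the data order before drawing random numbers

set seed 100
gen double ZV2 = runiform()
label var ZV2 "random variable"

* Order by the random draw (id breaks any ties) and flag the last 30%
sort ZV2 id
gen byte OOS = _n > 0.7 * _N
label var OOS "1 = out-of-sample (30%), 0 = estimation sample (70%)"

sort id                      // restore the original order
```

If the data are in the same order and the seed is the same on every run, `tab OOS` and the two `sum` commands should then give identical results each time.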

## Comment