Append within a loop

Don Richardson

Join Date: Oct 2019

Posts: 11
#1

19 Oct 2019, 23:15

Hello,

I've used the loop below to generate two data sets that I will use to run a number of analyses.

The idea is to have two random samples of 200 observations each. I then plan to assess how different statistical tests perform by simulation (2000 reps). I've only included 10 here to save computation time. The main issue I'm having is that I have to clear the x and y variables each time to run the loop, but I would like to stack the samples so that I have a complete data set of all the sample variables at the end.

I tried using append in the loop (see below), but it just adds the same sample of 200 obs on top of itself.

local a = 100
local c = 3

local b = 90
local d = 3

local obs = 200
local nsets = 10

set seed 987654321

forvalues i = 1/`nsets' {
    clear
    set obs `obs'
    generate x = rnormal(`a',`c')
    generate y = rnormal(`b',`d')
    ttest x=y, unpaired
    ttest x=y, unpaired unequal
    save x, replace
    save y, replace
    *I tried using append here by adding:
    *append using x
}

Thank you,

Don
Attached Files

Append_in_loop.do (381 Bytes, 1 view)
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

20 Oct 2019, 01:51

Code:

local a = 100
local c = 3
local b = 90
local d = 3
local obs = 200
local nsets = 10
set seed 987654321
forvalues i = 1/`nsets' {
    clear
    set obs `obs'
    generate x = rnormal(`a',`c')
    generate y = rnormal(`b',`d')
    ttest x=y, unpaired
    ttest x=y, unpaired unequal
    preserve
    keep x
    save x, replace
    restore
    keep y
    save y, replace
    ren y x
    append using x
}

doing

Code:

save x, replace

does not save the variable x. It saves the entire dataset in memory under the name x.dta.

Also, for append, assuming you want a single variable/column with all values, you need to make sure variable names are consistent across the different sets. If that is not what you want, you can leave out the rename y x.
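
If the aim is instead to end up with every sample stacked in one dataset, one possible pattern is to keep a running file and append it to each new sample. This is only a sketch, not the code from the attachment: the temporary file `all' and the identifier variable rep are names made up for illustration, and the means and SDs are copied from #1.

```stata
clear all
set seed 987654321
local obs   = 200
local nsets = 10

tempfile all                      // hypothetical running file holding all samples so far

forvalues i = 1/`nsets' {
    clear
    set obs `obs'
    generate rep = `i'            // hypothetical identifier so each sample stays distinguishable
    generate x = rnormal(100, 3)
    generate y = rnormal(90, 3)

    // from the second pass on, add the earlier samples to the new one
    if `i' > 1 {
        append using `all'
    }
    save `all', replace           // the running file now holds samples 1 to `i'
}

sort rep
assert _N == `obs' * `nsets'      // every sample was kept
```

The point of the pattern is that the file being appended holds the earlier samples, not a copy of the sample currently in memory.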
Nick Cox

Join Date: Mar 2014

Posts: 35651
#3

20 Oct 2019, 01:52

I can't follow all of this, but I can't see that you need any machinery for creating separate datasets when you know you want to combine them. Isn't this equivalent?

Code:

clear
local a = 100
local c = 3
local b = 90
local d = 3
local obs = 200
local nsets = 10
set obs `=`obs' * `nsets''
set seed 987654321
generate x = rnormal(`a',`c')
generate y = rnormal(`b',`d')
gen block = ceil(_n/`obs')
su block, meanonly
forval j = 1/`r(max)' {
    ttest x=y if block == `b', unpaired
    ttest x=y if block == `b', unpaired unequal
}

To unravel the confusion in your appending, think through: What is in memory before you try to append? Just the current dataset. What are you trying to append? Another copy of the same. What is in memory at the end of the loop? Just a doubled copy of the last dataset, I think. But -- as above -- just work with one dataset divided into blocks.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#4

20 Oct 2019, 06:10

In #3 the loop should be a loop over b, not over j.
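
A sketch of the single-dataset approach with a consistent loop index might look like this. The index is given the made-up name blk here, which is an assumption chosen only so that it does not overwrite the local b holding the mean of y.

```stata
clear
local a = 100
local c = 3
local b = 90
local d = 3
local obs = 200
local nsets = 10

set obs `=`obs' * `nsets''
set seed 987654321

generate x = rnormal(`a',`c')
generate y = rnormal(`b',`d')
generate block = ceil(_n/`obs')   // observations 1-200 are block 1, 201-400 block 2, ...

summarize block, meanonly
forvalues blk = 1/`r(max)' {
    ttest x == y if block == `blk', unpaired
    ttest x == y if block == `blk', unpaired unequal
}
```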
Don Richardson

Join Date: Oct 2019

Posts: 11
#5

21 Oct 2019, 04:55

Thank you Jorrit and Nick,

I really appreciate your help. I think I might need to add more information (I thought it might be extraneous so I hadn't included it originally) to clear up my intentions.

I'm trying to find out, through simulation, when an ordinary (Student) t test becomes unreliable compared with the Welch test. I plan to do this by creating two random samples, X1 and Y1, each containing 200 observations drawn from a normal distribution. My plan had been to compare X1 and Y1 using a Student and a Welch t test, then alter the variance in one group sequentially and re-run both tests.

So it would look like:

X1 and Y1 analyse with Student/Welch t test

X2 and Y2 analyse with Student/Welch t test

X3 and Y3 analyse with Student/Welch t test

...

X2000 and Y2000 analyse with student/Welch t test

(I only included 10 runs in this code to save computation time)

Where X_n and Y_n are random samples consisting of 200 observations drawn from a normal distribution. From what you've told me so far, I think I need to go back to the drawing board. I had previously tried to use an rclass program, but realised I was comparing 2000 means as opposed to 2000 t tests.

Thanks again,

Don

Attached Files

rclass_loop.do (938 Bytes, 1 view)
Nick Cox

Join Date: Mar 2014

Posts: 35651
#6

21 Oct 2019, 06:26

Thanks for the extra detail.

I glanced at the attachment but it's quite hard work. An excess of blank lines and arbitrary indentation don't help, but more crucially, it's hard to follow what you're trying to do.

Creating a variable with one name, copying it to another and then dropping the first is not needed or helpful. Just use the name you want in the first place.

Jorrit Gosens already explained an apparent confusion about save, which you are ignoring. I think you're still very confused about save and append generally. Again: a save of the same dataset under different names can't help you. Overwriting a previous dataset with a current dataset can't help either, if your aim is to combine them.

Backing up: you are, it seems, running two approaches simultaneously in your code, one in which you (try to) simulate different datasets and combine them, and one in which you use simulate, which doesn't have the same approach. I could keep going with #3 -- which in turn was an attempt to correct and push forward what you showed in #1 -- but using simulate directly seems likely to be more helpful. Note that your notional means and SDs imply microscopic P-values, regardless.

Code:

clear
local a = 100
local c = 3
local b = 90
local d = 3
set seed 987654321

capture program drop wanted
program wanted, rclass
    args a b c d
    drop _all
    set obs 200
    generate x = rnormal(`a',`c')
    generate y = rnormal(`b',`d')
    ttest x=y, unpaired
    return scalar t1 = r(t)
    return scalar Pvalue1 = r(p)
    ttest x=y, unpaired unequal
    return scalar t2 = r(t)
    return scalar Pvalue2 = r(p)
end

simulate, nodots reps(2000): wanted `a' `b' `c' `d'

I have code for keeping all the datasets too, but I have to doubt that you really need it for this kind of exercise. This is unlikely to be exactly what you want, which remains vague, but it may help you see a way forward.
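
As one possible next step, here is a sketch of how the simulated P-values could be turned into rejection rates. Everything specific in it is an assumption for illustration: the program name ttsim, its two SD arguments, the equal means of 100 (so that any rejection is a false positive), the SDs of 3 and 9, and the 0.05 threshold.

```stata
clear all
set seed 987654321

capture program drop ttsim             // hypothetical program name
program define ttsim, rclass
    args sdx sdy                       // assumed arguments: SD of x, SD of y
    drop _all
    set obs 200
    generate x = rnormal(100, `sdx')
    generate y = rnormal(100, `sdy')   // equal means: the null hypothesis is true
    ttest x == y, unpaired
    return scalar p_student = r(p)
    ttest x == y, unpaired unequal
    return scalar p_welch = r(p)
end

// name the results explicitly so the variable names are known in advance
simulate p_student = r(p_student) p_welch = r(p_welch), ///
    reps(2000) nodots: ttsim 3 9

generate byte rej_student = p_student < 0.05
generate byte rej_welch   = p_welch   < 0.05
summarize rej_student rej_welch        // the means are the empirical rejection rates
```

Changing the two arguments passed to ttsim is one way to vary the variances from run to run.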
Don Richardson

Join Date: Oct 2019

Posts: 11
#7

30 Oct 2019, 04:37

Thanks Nick,

That really helped. I managed to get exactly what I wanted after that. I have one question about setting the seed. (I acknowledge that my current seed isn't a particularly random choice, but it's what I started with, so I haven't changed it yet.) I'm using the same seed for both of my samples (x and y). Do you think I would be better off using two separate seeds to generate x and y?
I've attached my code for completeness in case someone else comes across this post.

Kind regards,

Don
Attached Files

postfile .do (1.6 KB, 1 view)
Nick Cox

Join Date: Mar 2014

Posts: 35651
#8

30 Oct 2019, 06:18

I can't imagine any advantage in using two seeds rather than one.
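
A single seed starts one random-number stream, and successive rnormal() calls continue along it, so x and y are different draws. A minimal sketch to check this, with the parameter values taken from #1:

```stata
clear
set seed 987654321
set obs 200

generate x = rnormal(100, 3)   // first 200 draws from the stream
generate y = rnormal(90, 3)    // next 200 draws from the same stream

correlate x y                  // sample correlation of the two sets of draws
```

Re-running this with the same seed reproduces the same x and y, which is the main reason to set a seed at all.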