  • Append within a loop

    Hello,

    I've used the following loop to generate two data sets that I will be using to run a number of analyses:

    The idea is to have two random samples of 200 observations each. I then plan to study how different statistical tests perform by simulation (2000 reps); I've only included 10 here to save computation time. The main issue I'm having is that I have to clear the x and y variables each time the loop runs, but I would like to stack the samples so that I have a complete data set of all the sample variables at the end.

    I tried using append in the loop (see below), but it just stacks the same sample of 200 obs on top of itself:

    local a = 100
    local c = 3

    local b = 90
    local d = 3

    local obs = 200
    local nsets = 10

    set seed 987654321

    forvalues i = 1/`nsets'{

    clear

    set obs `obs'

    generate x = rnormal(`a',`c')

    generate y =rnormal(`b',`d')

    ttest x=y, unpaired

    ttest x=y, unpaired unequal

    save x, replace

    save y, replace

    *I tried using append here by adding:
    *append using x

    }

    Thank you,

    Don

  • #2
    Code:
    local a = 100
    local c = 3
    
    local b = 90
    local d = 3
    
    local obs = 200
    local nsets = 10
    
    set seed 987654321
    
    forvalues i = 1/`nsets'{
    clear
    set obs `obs'
    generate x = rnormal(`a',`c')
    generate y =rnormal(`b',`d')
    
    ttest x=y, unpaired
    ttest x=y, unpaired unequal
    
    preserve
    keep x
    save x, replace
    
    restore
    keep y
    save y, replace
    
    ren y x
    append using x
    
    }
    Note that
    Code:
    save x, replace
    does not save the variable x. It saves the entire dataset in memory as x.dta.

    Also, for append, assuming you want a single variable/column with all values, you need to make sure variable names are consistent across the different sets. If that is not what you want, you can leave out the ren y x line.
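
    If the aim is one stacked dataset that keeps every replication, a minimal sketch along these lines may help. It assumes a temporary file, here called stacked, and an identifier variable, here called rep; both names are illustrative and not from the original post.

    ```stata
    clear
    set seed 987654321

    local obs = 200
    local nsets = 10

    * temporary file that accumulates all replications (illustrative name)
    tempfile stacked

    forvalues i = 1/`nsets' {
        clear
        set obs `obs'
        generate x = rnormal(100, 3)
        generate y = rnormal(90, 3)
        * record which replication each observation belongs to
        generate rep = `i'

        * from the second replication on, add the earlier ones to the new draw
        if `i' > 1 {
            append using `stacked'
        }
        save `stacked', replace
    }

    * the data in memory now hold all `nsets' replications
    sort rep
    count
    ```

    The point of the design is that each pass appends the file of earlier replications to the fresh draw before saving, so nothing is overwritten by an identical copy.
    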



    • #3
      I can't follow all of this, but I can't see that you need any machinery for creating separate datasets when you know you want to combine them. Isn't this equivalent?

      Code:
      clear
      
      local a = 100
      local c = 3
      
      local b = 90
      local d = 3
      
      local obs = 200
      local nsets = 10
      * one dataset holding every block: obs * nsets observations
      set obs `=`obs' * `nsets''
      
      set seed 987654321
      
      generate x = rnormal(`a',`c')
      
      generate y = rnormal(`b',`d')
      
      gen block = ceil(_n/`obs')
      su block, meanonly
      
      forval j = 1/`r(max)' {
          ttest x=y if block == `b', unpaired
          ttest x=y if block == `b', unpaired unequal
      }
      To unravel the confusion in your appending, think through: What is in memory before you try to append? Just the current dataset. What are you trying to append? Another copy of the same. What is in memory at the end of the loop? Just a doubled copy of the last dataset, I think. But -- as above -- just work with one dataset divided into blocks.



      • #4
        In #3 the loop should be a loop over b, not over j, so that the `b' in the if conditions refers to the block number.



        • #5
          Thank you Jorrit and Nick,

          I really appreciate your help. I think I might need to add more information (I thought it might be extraneous so I hadn't included it originally) to clear up my intentions.

          I'm trying to use simulation to find out when an ordinary t test becomes unreliable compared with the Welch test. I plan to do this by creating two random samples (I'll call them X1 and Y1), each containing 200 observations from a normal distribution. My plan had been to compare X1 and Y1 using Student and Welch t tests, then alter the variance in one group sequentially and re-run both tests.

          It would look like this:

          X1 and Y1 analyse with Student/Welch t test

          X2 and Y2 analyse with Student/Welch t test

          X3 and Y3 analyse with Student/Welch t test

          ...

          X2000 and Y2000 analyse with student/Welch t test

          (I only included 10 runs in this code to save computation time)

          Where X_n and Y_n are random samples consisting of 200 observations drawn from a normal distribution. From what you've told me so far, I think I need to go back to the drawing board. I had previously tried to use the rclass command, but realised I was comparing 2000 means as opposed to 2000 t tests.

          Thanks again,

          Don



          • #6
            Thanks for the extra detail.

            I glanced at the attachment but it's quite hard work. An excess of blank lines and arbitrary indentation don't help, but more crucially, it's hard to follow what you're trying to do.

            Creating a variable with one name, copying it to another and then dropping the first is not needed or helpful. Just use the name you want in the first place.

            Jorrit Gosens already explained an apparent confusion about save, which you are ignoring. I think you're still very confused about save and append generally. Again: a save of the same dataset under different names can't help you. Overwriting a previous dataset with a current dataset can't help either, if your aim is to combine them.

            Backing up: you are, it seems, running two approaches simultaneously in your code, one in which you (try to) simulate different datasets and combine them, and one in which you use simulate, which doesn't have the same approach. I could keep going with #3 -- which in turn was an attempt to correct and push forward what you showed in #1 -- but using simulate directly seems likely to be more helpful. Note that your notional means and SDs imply microscopic P-values, regardless.

            Code:
            clear
            
            local a = 100
            local c = 3
            
            local b = 90
            local d = 3
            
            
            set seed 987654321
            
            * capture avoids an error when the program has not been defined yet
            capture program drop wanted
            
            program wanted, rclass 
                args a b c d 
                drop _all 
                set obs 200 
                generate x = rnormal(`a',`c')
                generate y = rnormal(`b',`d')
                ttest x=y, unpaired
                return scalar t1 = r(t)
                return scalar Pvalue1 = r(p)
                
                ttest x=y, unpaired unequal
                return scalar t2 = r(t)
                return scalar Pvalue2 = r(p)
            end 
            
            simulate, nodots reps(2000): wanted `a' `b' `c' `d'
            I have code for keeping all the datasets too, but I have to doubt that you really need it for this kind of exercise. This is unlikely to be exactly what you want, which remains vague, but it may help you see a way forward.
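
            As a possible next step, here is a minimal sketch of turning the saved P-values into rejection rates. It assumes a variant program, here called ttsize, with equal means, so that every rejection is a false positive. The program name, the returned names p_student and p_welch, the SDs of 3 and 6, and the 5% level are all illustrative assumptions, not part of the code above.

            ```stata
            clear all
            set seed 987654321

            * hypothetical variant of the program above: same mean in both groups
            capture program drop ttsize
            program ttsize, rclass
                args mean sd1 sd2
                drop _all
                set obs 200
                generate x = rnormal(`mean', `sd1')
                generate y = rnormal(`mean', `sd2')
                * Student t test (equal variances assumed)
                ttest x = y, unpaired
                return scalar p_student = r(p)
                * Welch t test (unequal variances allowed)
                ttest x = y, unpaired unequal
                return scalar p_welch = r(p)
            end

            simulate p_student = r(p_student) p_welch = r(p_welch), nodots reps(2000): ttsize 100 3 6

            * share of replications rejecting at the 5% level
            generate byte rej_student = p_student < 0.05
            generate byte rej_welch = p_welch < 0.05
            summarize rej_student rej_welch
            ```

            The means of rej_student and rej_welch are the empirical rejection rates. Changing the SD arguments, or the group sizes inside the program, is one way to explore where the two tests diverge.
            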



            • #7
              Thanks Nick,

              That really helped. I managed to get exactly what I wanted after that. I have one question about setting the seed (I acknowledge that my current seed isn't a particularly random choice, but it's what I started with, so I haven't changed it yet). I'm using the same seed for both of my samples (x and y). Do you think I would be better off using two separate seeds to generate x and y?
              I've attached my code purely for completeness in case someone else comes across this post.

              Kind regards,

              Don


              • #8
                I can't imagine any advantage in using two seeds rather than one.
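
                A small sketch of why one seed suffices: successive calls to rnormal() keep drawing from the same pseudorandom stream, so x and y are different draws, and resetting the seed reproduces the stream from the start. The variable name x2 is illustrative.

                ```stata
                clear
                set obs 200

                set seed 987654321
                * x uses the first 200 draws of the stream, y the next 200
                generate x = rnormal(100, 3)
                generate y = rnormal(90, 3)
                correlate x y

                * resetting the same seed restarts the stream, so x2 reproduces x
                set seed 987654321
                generate x2 = rnormal(100, 3)
                assert x == x2
                ```
                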
