Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Macros & loops, Putexcel and crash

    Dear statalist,

    I am working on a survey data from a small group of people (n=281). Among this group of people there are subgroups, identified by characteristics such as race, education level, income source etc. Subgroups can be mutually exclusive binary or categorical groups. For example, race (black, white, others categorical), education (with GED/without GED binary). There are about 35 subgroups.

    All survey question selections are binary yes and no. There are 6 question sets, with 15~20 binary question selections in each question set. For example: question set income source has 15 question selections (q1 source of income from self-employment Y/N; q2 source of income from full-time employment Y/N, q3 source of income from governmental assistance Y/N etc.). Each person can check multiple responses in each question set.

    I want to look at the associations between subgroups, and their answer to each of the question selections. Tests used include plain chi-square, exact, or mtest in mrtab. For instance: subgroup X (X=0 such as without GED, X=1 those with GED), compare the proportion (in a sense, prevalence) who answered yes to Y question (Y=0 no my income is not from self-employment, Y=1 yes I have income that is from self-employment). Basically 2 by 2 table.

    Because of the volume, I exported them to excel files, separated into different sheets by subgroups or sheets by question sets. Originally it worked out fine. But recently after I added a few codes, stata started to crash at times, or stopped suddenly with error r(603) "workbook filename.xlsx could not be loaded". It can occur at different timing after putexcels. But next time it might ran through the same code non-stop. Since it did not happen previously I think there must be problems with my code. I don't think Internet connection is a problem. I use Stata/MP14.2 on a distal server.


    Code:
      
    
    /*locals for question sets QS1 to QS6, each with binary Y/N question selections*/
    /*for clarity, use qx variables to represent original var names which contained no numbers*/
    local QS1 q1 q2 q3 q4 q5 q6 q7 [...] q12 q13 q14 q15  
    local QS2 a1 a2 a3 [...] a18 a19 a20
    local QS3 b1 b2 b3 [...] b17 b18 b19
    local QS4 c1 c2 c3 [...] c17 c18 c19
    local QS5 d1 d2 d3 [...] d17 d18 d19
    local QS6 e1 e2 e3 [...] e17 e18 e19
    
    /*subgroups were replaced by A B C D E, here for example 5 binary groups.
    ex: A==0 means no GED, A==1 means with GED diploma; B==0 no kids B==1 with kids etc.*/
    foreach subgroups in A B C D E {        
            foreach list in QS1 QS2 QS3 QS4 QS5 QS6 {
            putexcel set "date_questions.xlsx", sheet(`subgroups'_`list') modify
    
    /*the mrtab displays table with numbers who answered yes to all question selections in one question set, by subgroups*/
                        mrtab ``list'',by(`subgroups') include col chi2 mtest(bonferroni) title(`list') width(50)    
                        return list
                            matrix Am = r(responses)
                            matrix Bm = r(cases)
                            matrix Cm = r(mchi2)                      
                            putexcel L19= matrix(Am),names
                            putexcel L55= matrix(Bm),names
                            putexcel P19= matrix(Cm),names
                            putexcel L19= "`subgroups'"
                            putexcel L55= "`subgroups'"
                            putexcel P19= "`subgroups' chi square"
    
    /*Then local numbers for each subgroups, and how many of them selected yes to the question selection Q
    in a question set (ex. percentage checked yes to q2 in QS1 question set);
    s0 is the number of people of subgroup==0, s1 is numbers of people in subgroup==1;
    s0x1 how many among subgroup==0 who answered yes to question selection Q*/                  
                        local row=20
                        foreach Q in ``list''{
                            forval s=0/1{
                                count if `subgroups'==`s'
                                local s`s'=r(N)
                                    forval x=0/1{
                                        count if `subgroups'==`s' & `Q'==`x'
                                        local s`s'x`x'= r(N)                            
                                        }
                                }
    
    /*Then local proportion/prevalence separately for subgroup==0 and subgroup==1 who answered yes to question selection Q,
    and how they be displayed in excel, such as 82 (34.00%)*/    
                            local pv0: di `s0x1' " (" %5.2f ((`s0x1')/`s0')*100 "%)"
                            local pv1: di `s1x1' " (" %5.2f ((`s1x1')/`s1')*100 "%)"
                            local prvr= ((`s1x1')/`s1')/((`s0x1')/`s0')
                            local prvrdisp: di %5.2f `prvr'
    
    /*Obtain the chi square statistics, significance, and use macro to represent the format I want them to be in excel.
    Different significance can have different asterisks, such as p<0.001**. */    
                            tab `subgroups' `Q',chi exact
            
                                if r(p)<0.001{
                                    local p001: di "p<0.001**"
                                    putexcel I`row'=("`p001'")
                                    }
                                else if r(p)<0.01{
                                    local p01: di %5.3f r(p) "*"
                                    putexcel I`row'=("`p01'")
                                    }
                                else{
                                    local p: di %5.3f r(p)
                                    putexcel I`row'=("`p'")
                                    }
                                if r(p_exact)<0.001{
                                    local pex001: di %5.3f r(p_exact) "**"
                                    putexcel J`row'=("`pex001'")
                                    }
                                else if r(p_exact)<0.01{
                                    local pex01: di %5.3f r(p_exact) "*"
                                    putexcel J`row'=("`pex01'")
                                    }
                                else{
                                    local pex: di %5.3f r(p_exact)
                                    putexcel J`row'=("`pex'")
                                    }
                            local varlabel: var label `Q'
                            putexcel A`row'=("`Q'")
                            putexcel B`row'=("`varlabel'")
                            putexcel C`row'=("`s0x1'")
                            putexcel D`row'=("`pv0'")
                            putexcel E`row'=("`s1x1'")
                            putexcel F`row'=("`pv1'")
                            putexcel G`row'=("`prvrdisp'")
                            putexcel H`row'=(r(chi2))
                        local ++row
                        }
                            putexcel A19=("`list'")
                            putexcel B19=("`list' Label")
                            putexcel C19=("`subgroups'==0")
                            putexcel D19=("Prevalence Among `subgroups'==0")
                            putexcel E19=("`subgroups'==1")
                            putexcel F19=("Prevalence Among `subgroups'==1")
                            putexcel G19=("Prevalence Ratio")
                            putexcel H19=("Chi-square")
                            putexcel I19=("Chi-square Test")
                            putexcel J19=("Fisher's Exact Test")
                    }
                }
    }

    Each type of the variable (categorical, coded binary 1 and 2 etc.) has one different code set. After I break some of the loops the same thing still happened (crash or could not be loaded...). Is it that I should not use too many macros or putexcels?
    I apologize if these are confusing!!!! I really need advice... Thank you so much for the time and patience!!!



    *Codes I downloaded by searching:
    mrtab (by Ben Jann, Hilde Schaeper)
    *Without this forum, Stata manual or Stata journal I can never put these together..

    Last edited by Teresa Chiang; 07 May 2019, 21:22.

  • #2
    Your code is very heavy with -putexcel- commands and has very little computation. Computation is fast, writing to disk is slow. I'm pretty sure that what is happening is that sometimes, depending on other demands on the computer and the operating system, your code is sending data to the operating system for writing to disk faster than the operating system can keep up with. When that happens, the operating system tells Stata that the request to write to the file failed, and Stata responds to you with the error messages you are getting. They occur at seemingly random times because the pace at which your code sends requests to write and the pace at which the operating system can fulfill them depends on what else is going on in the computer (and, if the disk you are writing to is on a network drive, the traffic on the network.)

    The solution is to force your Stata code to slow down by putting -sleep- commands in. That will give the operating system time to catch up before your program places new demands on it. See -help sleep-. I'm not sure where the best place to put these commands is, nor exactly how long you should sleep, but I would probably start by putting a -sleep 500- command at the end of the loop that begins -foreach list in QS1 QS2 QS3 QS4 QS5 QS6-. If that doesn't get you through without problems, try increasing to -sleep 750- or -sleep 1000-. The idea is to give the operating system some time to catch up after it has received a large number of -putexcel- commands. There are so many in your code that you may need to have more than one -sleep- inside that loop, so if that still doesn't work, add another -sleep 500- somewhere in the middle of that loop, etc.

    This will slow down execution, but, if done enough, that will enable the code to run without breaking.

    Comment


    • #3
      See also Bill Rising's post within https://www.statalist.org/forums/for...-error-message

      Comment


      • #4
        You could reduce the amount of issued -putexcel- commands by changing for example
        Code:
        putexcel A`row'=("`Q'")
                                putexcel B`row'=("`varlabel'")
                                putexcel C`row'=("`s0x1'")
                                putexcel D`row'=("`pv0'")
                                putexcel E`row'=("`s1x1'")
                                putexcel F`row'=("`pv1'")
                                putexcel G`row'=("`prvrdisp'")
                                putexcel H`row'=(r(chi2))
        to
        Code:
        putexcel A`row'=("`Q'")  B`row'=("`varlabel'")  C`row'=("`s0x1'")  D`row'=("`pv0'")  E`row'=("`s1x1'")  F`row'=("`pv1'")  G`row'=("`prvrdisp'")  H`row'=(r(chi2))
        This should reduce the amount of write commands and therefore reduce the likelihood of time outs. You can still issue -sleep- commands, but they should be less needed now.

        Comment


        • #5
          Hi Clyde, hi Nick,

          Thank you both so much! It took hours to run. But it WORKED!

          I saw Bill Rising's post in the link (https://www.statalist.org/forums/for...-error-message). But the open option does not seem to work with MP 14.2 even after I updated it. Is it a 15.1 specific command? It will be great to use it in updated versions.

          Clyde, thank you so much for the code. I eventually adapted it into this, (basically, put it to sleep repetitively) and replaced all putexcel with putexcel_wait:
          Code:
          program define putexcel_wait
              capture putexcel `0'
              if _rc == 603 {
                  sleep 6000
                  putexcel `0'
              }
              if _rc == 603 {
                  sleep 6000
                  putexcel `0'
              }
              if _rc == 603 {
                  sleep 6000
                  putexcel `0'
              }
              else if _rc != 0 {
                  putexcel `0'
              }
          end
          I think Stata is satisfied with this amount of sleep. Wonder how many times it was used. Still I added a few sleep 50 in very few places but probably not needed after this adaptation.

          Comment


          • #6
            You did specify that you are using Stata 14.2, so it's possible that the update doesn't apply to it. See

            Code:
            help whatsnew
            in your Stata and search for references to putexcel.

            Comment


            • #7
              Hi Sven-Kristian,

              Thank you so much for the code!! I did work out non-stop and really is unbelievably fast! Thanks!

              Hi Nick,

              Thanks for the code! Yes I didn't see the open option for putexcel in either updates. But glad to learn how to search for it. Thanks!!


              Comment


              • #8
                Hi,

                I'm a new Stata user and have inherited code with the same issue. Error r(603) *file outputs/test.xlsx could not be saved*. It's very -putexcel- heavy and in a very long loop. I've tried the code below as suggested in a separate post but same issue
                Code:
                 putexcel set ..., open
                 putexcel close
                I would like to try out the code Teresa has specified. My question is where to insert this? My code looks a bit like this:
                Code:
                   forval reg=1(1)4 {
                use "data/webtable.dta", clear
                
                local reg_name="`: label cres `reg''"
                disp ("`region_name'")
                keep if cres==`reg'
                
                putexcel set "outputs/Ctable_test_${date}.xlsx", sheet("`region_name'") modify
                
                if `reg'==1 { 
                tabcount var1 ryar, v1(1/4) v2(2011/2019) matrix(cells)
                putexcel J22=matrix(cells)
                
                tabcount dummy ryar if inrange(s_pp,5,6), v1(1) v2(2011/2019) matrix(cells)
                putexcel J26=matrix(cells)
                
                }
                }
                I'm using Stata 15.4

                Comment


                • #9
                  To use the code in #5, place it up near the top of your do-file, before any -putexcel- commands. Then change all the -putexcel- commands in the rest of the do-file to -putexcel_wait-. So it would be something like this:

                  Code:
                  program define putexcel_wait
                      capture putexcel `0'
                      if _rc == 603 {
                          sleep 6000
                          putexcel `0'
                      }
                      if _rc == 603 {
                          sleep 6000
                          putexcel `0'
                      }
                      if _rc == 603 {
                          sleep 6000
                          putexcel `0'
                      }
                      else if _rc != 0 {
                          putexcel `0'
                      }
                  end
                  
                  //  PERHAPS OTHER STATA CODE INTERVENES HERE.
                  
                  forval reg=1(1)4 {
                      use "data/webtable.dta", clear
                  
                      local reg_name="`: label cres `reg''"
                      disp ("`region_name'")
                      keep if cres==`reg'
                  
                      putexcel_wait set "outputs/Ctable_test_${date}.xlsx", sheet("`region_name'") modify
                  
                      if `reg'==1 {
                          tabcount var1 ryar, v1(1/4) v2(2011/2019) matrix(cells)
                          putexcel_wait J22=matrix(cells)
                  
                          tabcount dummy ryar if inrange(s_pp,5,6), v1(1) v2(2011/2019) matrix(cells)
                          putexcel_wait J26=matrix(cells)
                  
                      }
                  }
                  That said, I have a few concerns:

                  1. Most important, if using the -open- option to -putexcel- is not solving your problem, I am worried that the cause of your difficulty is some other problem and won't be resolved by this either. Maybe that file really is read-only, or in a read-only directory or something like that. Also, this code doesn't look all that heavy on -putexcel-. The individual outputs don't appear to be particularly large,

                  2. I think that a 6000 millisecond sleep, as chosen by Teresa Chiang, is very long and really would only be needed in pretty extreme cases. It will slow you down enormously. May I suggest experimenting with changing all those 6000's down to, say, 500 or 1,000. That should suffice for most purposes.

                  3. Although it probably isn't related to the problem you're having, if webtable.dta is a large data set, reading it in four times followed by dropping most of its observations is wasteful of I/O time. You can do better by reading it in just once and conditioning the -tabcount- commands on cres == `reg'. Thus:

                  Code:
                  use "data/webtable.dta", clear // TAKEN OUTSIDE LOOP
                  forval reg=1(1)4 {
                  
                      local reg_name="`: label cres `reg''"
                      disp ("`region_name'")
                      // -keep if cres == `reg' COMMAND DELETED
                  
                      putexcel_wait set "outputs/Ctable_test_${date}.xlsx", sheet("`region_name'") modify
                  
                      if `reg'==1 {
                          tabcount var1 ryar if cres == `reg', v1(1/4) v2(2011/2019) matrix(cells)
                          putexcel_wait J22=matrix(cells)
                  
                          tabcount dummy ryar if inrange(s_pp,5,6) & cres == `reg', v1(1) v2(2011/2019) matrix(cells)
                          putexcel_wait J26=matrix(cells)
                  
                      }
                  }
                  (If webdata.dta is really massive, then really the best way to do this is with the -runby- command, by Robert Picard and me, from SSC.)

                  Comment


                  • #10
                    Thanks Clyde! I'll try out #5 in the first instance with smaller sleeps as suggested and see how that goes. Sorry I haven't done a great job of exemplifying my code, there is a lot more -putexcel-.

                    And to respond to your 1st concern. The file is definitely not a read-only or in a ready-only directory. Some of the outputs save but then the next output in the code does not. With the -open-, -close- putexcel option, I'd place this as such

                    Code:
                    forval reg=1(1)4 {
                    use "data/webtable.dta", clear
                    
                    local reg_name="`: label cres `reg' ' "
                    disp ("`region_name'")
                    
                    putexcel set "outputs/Ctable_test_${date}.xls", sheet("`region_name'") modify open
                    
                    if `reg'==1{
                    tabcount var1 ryar, v1(1/4) v2(2011/2019) matrix(cells)
                    putexcel J26=matrix(cells)
                    
                    tabcount dummy ryar if inrange(s_pp,5,6), v1(1) v2(2011/2019) matrix(cells)
                    
                    more -putexcel- commands
                    
                    putexcel close
                    Have I used this correctly?

                    I will need to update the code at some point so will also explore the other options you have suggested.

                    Thank you

                    Comment


                    • #11
                      You do appear to be using the -open- option of -putexcel- correctly.

                      Hope this all works out.

                      Comment

                      Working...
                      X