  • Insert values from a cycle into a dataset

    Hello,

    I am trying to run the following cycle in Stata:

    levels clinic_code, local(clinic_code_levels)
    foreach p of local clinic_code_levels {
    levels month, local(month_levels)
    foreach m of local month_levels {
    conindex nr_events if clinic==`"`p'"' & month==`m', rankvar(person_age) truezero generalized
    noi disp "`p' `m': (" r(CI)-1.96*r(CIse) ", " r(CI)+1.96*r(CIse) ")"
    }
    }

    Right now I can see all the output in the main window, but with all the different clinics and months I will end up with far too many lines, close to 30k.

    Does anyone know if it is possible to automatically insert those values into a new dataset like this:
    Clinic   Month   Concentration index   95% confidence interval (lower)   95% confidence interval (upper)
    `p'      `m'     r(CI)                 r(CI)-1.96*r(CIse)                r(CI)+1.96*r(CIse)

    Many thanks for your help.

  • #2
    Try statsby, here is an example:

    Code:
    sysuse nlsw88, clear
    
    * Start with the simplest command without if:
    conindex wage, truezero
    * Check what is returned
    return list
    
    * Set up the statsby command.
    * Married and Collgrad can be replaced by clinic and month
    statsby conint = r(CI) conse = r(CIse), by(married collgrad) clear: conindex wage, truezero
    
    * Check data
    list
    Result
    Code:
         +--------------------------------------------------+
         | married           collgrad     conint      conse |
         |--------------------------------------------------|
      1. |  Single   Not college grad   .3403089   .0132737 |
      2. |  Single       College grad   .3000356   .0132422 |
      3. | Married   Not college grad   .3052576   .0087177 |
      4. | Married       College grad   .2890811   .0097662 |
         +--------------------------------------------------+
    Then, once the roughly 30k-row results dataset is complete, use the generate command to compute the lower and upper bounds.
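
    As a minimal sketch of that last step, the four rows listed above can be typed in directly and the bounds computed with generate. The variable names ll95 and ul95 are illustrative, and 1.96 is the normal-approximation multiplier used in the original question.

    ```stata
    * Recreate the four-row statsby result shown above (values copied from the listing)
    clear
    input str7 married str16 collgrad conint conse
    "Single"  "Not college grad" .3403089 .0132737
    "Single"  "College grad"     .3000356 .0132422
    "Married" "Not college grad" .3052576 .0087177
    "Married" "College grad"     .2890811 .0097662
    end

    * 95% confidence bounds from the index and its standard error
    * (ll95 and ul95 are hypothetical names)
    generate ll95 = conint - 1.96*conse
    generate ul95 = conint + 1.96*conse

    list, noobs
    ```

    With the real data, only the two generate lines are needed, run on the dataset that statsby leaves in memory.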


    • #3
      It works perfectly, thanks very much Ken Chui
      Last edited by Daniela Rodrigues; 22 Oct 2022, 12:40.


      • #4
        Ken Chui,

        Can I ask one last question about this please? I managed to run the code for 6 groups of clinics and months in seconds, but now that I am running it on the whole dataset, it is taking very long. Do you know what the +1+2+3+4+5 ..................... 50 .... that appears in the Stata window while the statsby command runs means? Would this mean that only 50 groups out of 30k were processed in half a day?

        Many thanks.


        • #5
          I believe so. I am not very knowledgeable about CPU time management. It may be worth making a new post about this to see if anyone can help you with that.

          Given what is presented here, I'd perhaps try creating some subset data files so that you can process them in small batches and save the results batch by batch.
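
          A rough sketch of that batching idea, using the nlsw88 example data and the built-in summarize command as a stand-in for conindex. Treating each level of race as one batch is purely an assumption for illustration; with the real data a batch might be one clinic or a handful of clinics.

          ```stata
          sysuse nlsw88, clear
          tempfile full
          save `full'

          * One batch per level of race (an illustrative choice of batching variable)
          levelsof race, local(batches)
          foreach b of local batches {
              * Load only the observations belonging to this batch
              use if race == `b' using `full', clear
              * summarize stands in for conindex here
              statsby mean_wage = r(mean) sd_wage = r(sd), by(race married) clear: summarize wage
              * Save this batch's results before moving on
              tempfile batch`b'
              save `batch`b''
          }

          * Stack the saved batches into a single results dataset
          clear
          foreach b of local batches {
              append using `batch`b''
          }
          list, noobs
          ```

          Saving each batch as it finishes means a crash or interruption loses only the batch in progress. For results that must outlive the Stata session, permanent file names would replace the tempfiles.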


          • #6
            The use of loops over levels of variables containing commands that then use -if- conditions restricting to those levels can be very slow in large data sets. And, internally, -statsby- uses that same approach. By contrast, -runby- speeds up the process considerably and can be used for most of these situations. In your case:
            Code:
            capture program drop one_clinic_month
            program define one_clinic_month
                conindex nr_events, rankvar(person_age) truezero generalized
                gen con_index = r(CI)
                gen ll95 = r(CI) - 1.96*r(CIse)
                gen ul95 = r(CI) + 1.96*r(CIse)
                exit
            end
            
            runby one_clinic_month, by(clinic month) verbose
            should do the trick. I suggest you try it first with a subset of your data containing only a few clinics and a few months to be sure that program one_clinic_month runs without errors and produces sensible results. (I am not familiar with the -conindex- program, which is not an official Stata command, so I can't be sure that my code is completely compatible with the way it works.) If you are satisfied that it is working properly, then eliminate the -verbose- option (so you won't get thousands of pages of output with the full data set) and add the -status- option (which will give periodic progress reports on how much of the data has been processed and an estimate of the time remaining to completion.)
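
            To try the same pattern without the real data, here is a minimal self-contained sketch on the nlsw88 example data. It assumes -runby- has been installed from SSC, uses the built-in summarize command as a stand-in for -conindex-, and the program name one_group and the final keep statements are illustrative additions: they reduce each by-group to a single row, so the result is one line per group rather than one line per original observation.

            ```stata
            * Assumes runby is installed: ssc install runby
            sysuse nlsw88, clear

            capture program drop one_group
            program define one_group
                * Stand-in for conindex: any command that leaves its results in r()
                summarize wage
                generate mean_wage = r(mean)
                generate se_wage   = r(sd)/sqrt(r(N))
                generate ll95 = mean_wage - 1.96*se_wage
                generate ul95 = mean_wage + 1.96*se_wage
                * Whatever is left in memory becomes this group's result:
                * keep one row and only the variables of interest
                keep married collgrad mean_wage se_wage ll95 ul95
                keep in 1
            end

            * status gives periodic progress reports instead of full output
            runby one_group, by(married collgrad) status
            list, noobs
            ```

            Without the two keep statements, the results are simply repeated on every observation of the group, as in the code above.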

            -runby- is written by Robert Picard and me, and is available from SSC.

            Added: Crossed with #5.
            Last edited by Clyde Schechter; 26 Oct 2022, 11:16.


            • #7
              Many thanks both for your input on this. -runby- is now installed in my database and I just checked for a couple of clinics and months and it gives the same results. I will now run this program for the whole dataset.

              Once again - thank you.
              Last edited by Daniela Rodrigues; 31 Oct 2022, 07:26.


              • #8
                Clyde Schechter,

                I just ran your code against my whole dataset and it has already finished. Fantastic, thanks very much again.

                I got some non-zero "by-group errors" from a particular month onwards. Is there any way to inspect what these errors are and what might be causing them?

                Many thanks.


                • #9
                  Those will be combinations of clinic and month that appear in the original data but do not appear in the results. So if you run

                  Code:
                  use original_data, clear
                  keep clinic month
                  duplicates drop
                  merge 1:1 clinic month using results_from_runby, keep(master) keepusing(clinic month) nogenerate
                  list, noobs clean
                  Stata will show them to you. (Replace original_data and results_from_runby with the actual names of the original data set and the data set containing the results from -runby-.)

                  I don't know what -conindex- does or how it works. But what you should find when you delve more deeply into the findings here is that for those combinations of clinic and month, the data were in some way unsuitable for -conindex- to run. This kind of thing comes up commonly when -runby- is used with a program that does a regression: there are often -runby-groups that don't have enough observations to carry out the regression. Perhaps it will be something like that. But you'll have to look to see.
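
                  One quick check along those lines is to count how many usable observations each clinic-month has. The sketch below uses a small made-up dataset with the variable names from this thread; that missing values or tiny groups are what trips up -conindex- is only an assumption, and n_usable and tag are hypothetical names.

                  ```stata
                  * Toy data for illustration only (values are made up)
                  clear
                  input clinic month nr_events person_age
                  1 1 3 40
                  1 1 5 52
                  1 1 2 61
                  2 1 4 35
                  2 2 . 47
                  2 2 . 58
                  end

                  * Observations per clinic-month with both variables nonmissing
                  bysort clinic month: egen n_usable = total(!missing(nr_events, person_age))

                  * Show one row per group, smallest groups first
                  egen tag = tag(clinic month)
                  sort n_usable clinic month
                  list clinic month n_usable if tag, noobs
                  ```

                  Groups with zero or very few usable observations would be the first candidates to examine.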

                  To see what the errors actually are, you can use the original data set, and keep only the clinic-month combinations that produced errors, and then re-do -runby-, adding the -verbose- option. That way you will see the error messages that program one_clinic_month threw. So, following the code above, it would be like this:

                  Code:
                  merge 1:m clinic month using original_data, keep(match) nogenerate
                  runby one_clinic_month, by(clinic month) verbose
                  Last edited by Clyde Schechter; 31 Oct 2022, 10:31.


                  • #10
                    This is very helpful, thank you Clyde Schechter.
