  • Creating a graph with the volume of procedures and an average line

    I would like to create a graph showing the volume of use of a procedure (hysteroscopy) by ID.

    I have an extremely large database of about a million records, and when I tried to plot a histogram, it crashed Stata.

    With these data, I would like to plot a graph (see picture) showing the number of procedures (hysteroscopies) done per individual (identified by ID),
    with a line marking the average number of hysteroscopies per individual across all individuals.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(ID hyseteroscopy)
    1 1
    2 0
    3 1
    2 0
    2 0
    2 1
    3 1
    2 1
    1 1
    end

    [Attached image: Capture.PNG]

  • #2
    I do think you need a histogram rather than the graph you have drawn. Here is some sample code. You can ignore much of the initial part, which is only designed to create a dataset with 100 million observations.

    Code:
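    clear
    set obs 1000000
    set seed 12345
    
    gen long ID = _n              // one observation per ID; long holds values up to about 2.1 billion
    expand 100                    // 100 records per ID: 100 million observations in total
    gen byte hysteroscopy = rbinomial(1,0.1)   // 0/1 indicator, equal to 1 with probability 0.1
    
    * start your code here
    * reduce the data to one observation per ID: the number of hysteroscopies for that ID
    collapse (sum) hysteroscopy, by(ID)
    
    * average number of hysteroscopies per ID, stored in the local macro mean
    sum hysteroscopy, meanonly
    local mean = r(mean)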
    
    #delimit ;
    twoway histogram hysteroscopy,
            discrete frac horizontal ||
            pci `mean' 0 `mean' 0.15,
            legend(off)
            ytitle("Volume of hysteroscopies")
            xtitle("Fraction")
            text(`mean' 0.14 "Average", place(6))
            scheme(s1color)
            ;
    #delimit cr
    [Attached image: Screenshot 2022-10-22 at 7.51.22 AM.png]

    Last edited by Hemanshu Kumar; 21 Oct 2022, 20:51.



    • #3
      I will look into this.

      Having read through your code, I have two questions. What does expand 100 mean? And in gen byte hysteroscopy = rbinomial(1,0.1), what does rbinomial(1,0.1) represent?



      • #4
        Again, those are bits of code that you can ignore, because I just wanted to create a large dataset of the same scale you have.

        But to explain:

        I first create a database with 1 million observations, each with a unique ID. Then I expand it so that there are 100 observations for each ID. Then I create a random variable hysteroscopy by drawing from a Binomial distribution with 1 trial and 10% probability of success (i.e. a binary variable with a 10% chance of being 1, else 0).
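To see what these two lines do, here is a small-scale sketch of the same setup with 5 IDs instead of 1 million (an arbitrary size chosen only so the output is easy to inspect):

```stata
clear
set obs 5
set seed 12345
gen long ID = _n                            // one observation per ID, ID = 1,...,5
expand 100                                  // each observation becomes 100 copies
gen byte hysteroscopy = rbinomial(1, 0.1)   // 1 trial, p = 0.1: equals 1 with probability 0.1, else 0

count                                       // 5 IDs x 100 copies = 500 observations
tabulate hysteroscopy                       // roughly 10% ones and 90% zeros
```

With 100 copies per ID, count reports 500 observations; the exact split of zeros and ones depends on the random draws.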



        • #5
          Thanks for this. I have three more questions about the code below: what do these parts refer to?

          pci `mean' 0 `mean' 0.15, // what does pci mean? Why `mean' 0 and `mean' 0.15?
          text(`mean' 0.14 "Average", place(6)) // I understand this labels the average line, but the average line is at 10, so why is it referenced to place(6)? And why `mean' 0.14?
          LAST QUESTION:
          Step 1:
          collapse (sum) hyseteroscopy, by(ID) // creates the sum of hyseteroscopy for each ID --> all OK

          But I can't get the code below to work: the values remain the same as in step 1.

          STEP 2: CREATING THE AVERAGE

          . sum hyseteroscopy, meanonly


          . dataex in 1/3

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float ID double hyseteroscopy
          1 2
          2 2
          3 2
          end
          ------------------ copy up to and including the previous line ------------------

          Listed 3 out of 3 observations

          . local mean = r(mean)

          . dataex in 1/3

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float ID double hyseteroscopy
          1 2
          2 2
          3 2
          end
          ------------------ copy up to and including the previous line ------------------

          Listed 3 out of 3 observations

          The sum code is working, but Stata is not interpreting the mean code. The values should appear as below, but I am left with the sums only:

          ID 1: sum 2 --> average 1
          ID 2: sum 2 --> average 0.4
          ID 3: sum 2 --> average 1
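A possible explanation for step 2, offered as a sketch: summarize never changes the data. It only stores results such as r(mean), which here is one number for the whole dataset (the average of the per-ID sums), not one average per ID. If per-ID averages like those listed above are wanted, one option is to ask collapse for both statistics at once from the original records. The new variable names total and avg are hypothetical, chosen for illustration:

```stata
* the example data from the first post
clear
input float(ID hyseteroscopy)
1 1
2 0
3 1
2 0
2 0
2 1
3 1
2 1
1 1
end

* one collapse gives both statistics per ID:
* total = number of hysteroscopies, avg = share of that ID's records with a hysteroscopy
collapse (sum) total=hyseteroscopy (mean) avg=hyseteroscopy, by(ID)
list

* summarize only stores results in r() and leaves the data untouched;
* r(mean) is the mean of total across all IDs, a single number
summarize total, meanonly
display r(mean)
```

For the graph in #2, that single r(mean) value is what is needed, since it marks one average line across all IDs.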



          • #6
            Ah, this uses a local macro built from an r() result, so the lines need to be run together as one block, not line by line.

            sum hyseteroscopy, meanonly
            local mean = r(mean)

            I think I found the reason: the code above defines a local macro. summarize with the meanonly option is a quick way to calculate the mean, which Stata returns in r(mean); that value is then stored in the local macro mean.
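A minimal sketch of the run-as-one-block point, using the three collapsed values shown above. As far as I know, when you execute a selection in the Do-file Editor, Stata runs it as a temporary do-file, and any local macro defined there disappears once that run ends, so these lines should be selected and run together:

```stata
clear
input double hyseteroscopy
2
2
2
end

summarize hyseteroscopy, meanonly   // computes the mean quietly and leaves it in r(mean)
local mean = r(mean)                // copies r(mean) into the local macro named mean
display "Average = `mean'"          // works only because it runs in the same block
```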

            The histogram code should be highlighted and run together with the local macro definition.
            However, I suppose the numbers in pci refer to where you would like the y axis to start and end, i.e. 0 is the minimum and 0.15 the maximum, no?

            And I don't know what place(6) refers to.

            Last edited by Martin Imelda Borg; 23 Oct 2022, 06:07.



            • #7
              Some of this is about the way locations are specified in different contexts in Stata graphs. See
              Code:
              help twoway pci
              help added_text_options
              Within text(), you can specify where the label is placed relative to the coordinates provided, using compass- or clock-style directions:

              Code:
              help compassdirstyle
              "6" (six o'clock) is effectively centre-bottom / south, i.e. the text is placed just below the given point.
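A sketch with the pci and text() coordinates annotated, reusing the simulation from #2 at a smaller scale (1,000 IDs instead of 1 million, an arbitrary choice so it runs quickly). The comments are my reading of help twoway pci and help compassdirstyle, not part of the original answer:

```stata
clear
set obs 1000
set seed 12345
gen long ID = _n
expand 100
gen byte hysteroscopy = rbinomial(1, 0.1)
collapse (sum) hysteroscopy, by(ID)

summarize hysteroscopy, meanonly
local mean = r(mean)

* With the horizontal option, the y axis shows the number of hysteroscopies
* per ID and the x axis shows the fraction of IDs.
* pci y1 x1 y2 x2 draws a line segment from point (y1, x1) to point (y2, x2).
* Both y values are the mean, so the segment is horizontal at the average,
* running from x = 0 to x = 0.15 on the fraction axis.
* text(y x label, place(6)) anchors the label at y = mean, x = 0.14, and
* place(6), six o'clock, puts it just below that point.
twoway histogram hysteroscopy, discrete frac horizontal ||  ///
       pci `mean' 0 `mean' 0.15,                           ///
       legend(off) ytitle("Volume of hysteroscopies")      ///
       xtitle("Fraction")                                  ///
       text(`mean' 0.14 "Average", place(6))
```

So 0 and 0.15 are positions on the x (fraction) axis, not limits of the y axis; the line sits at the average value on the y axis.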

