  • Need assistance with histogram or bar graph

    I am having difficulty producing a histogram that was fairly easy to make in Python with pyplot. I have a large data set that looks like this:
    v1   age   gender   test   result   count
    0    50    M        Glu    188      183892
    1    80    F        Cl     102      48907
    2    50    M        Glu    42       13683
    3    30    M        Glu    272      2859
    4    80    M        Urea   68       37053
    5    40    M        Glu    574      467
    6    60    M        Cl     118      50066
    7    100   M        Glu    749      1
    8    50    F        Glu    393      714
    9    60    M        Urea   140      2568

    There are 33,024 rows. The count is the number of times a particular result, such as 188 in row 0, was recorded by a lab. I'd like to plot a histogram for each test (eventually broken down by age and gender) with the count on the y-axis and the result (e.g. 188) on the x-axis. It should look something like the figure below. I want to do this for every analyte: glucose, urea, chloride, etc.


    [Attached image: glucose_m_f.png]



    I'm rather new to Stata - taking a course now - and have tried
    histogram count if test=="Glu", by(result)
    [also the 2way option]
    But I get the error "too many sersets".

    Any advice?

    Thanks,
    Nathan

  • #2
    The graph posted I did with Python, by the way, but I'd like to move my analysis to Stata.



    • #3
      Code:
      histogram result [fw=count] if test == "Glu"
      Cross-posted at https://stackoverflow.com/questions/...ram-assistance

      Please note our policy on cross-posting, which is that you are asked to tell us about it. https://www.statalist.org/forums/help#crossposting
      Last edited by Nick Cox; 08 Jan 2021, 09:18.
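
For readers new to frequency weights: [fw=count] tells histogram to treat each row as if it appeared count times. A minimal sketch of the same idea done by hand is below. It assumes the variable names shown in post #1 and is only meant to illustrate the equivalence; with counts this large, the expanded data could run to millions of observations, so the weighted command is the practical choice.

```stata
* Illustration only: assumes variables test, result and count as in post #1
preserve
keep if test == "Glu"        // keep only the glucose rows
expand count                 // replace each row with count copies of itself
histogram result, frequency  // unweighted histogram of the expanded data
restore                      // return to the original data
```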



      • #4
        Okay, thanks!
        I've played around and now have this:
        histogram result if test=="Glu" [fweight = count], bin(800) frequency xscale(range(0 800)) extend xlabel(, labels) by(, title(Glucose)) by(, legend(on)) by(gender)


        [Attached image: Picture1.jpg]


        But I don't know how to limit the x-axis to the range 0 to 800. It extends all the way out to include what I assume are some outliers (6000). I tried the option noextend, but Stata says the option is not allowed.
        I've attached a figure of my output.
        I appreciate your patience and help!
        Nathan



        • #5
          Code:
          if test=="Glu" & result < 800
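
Spelled out, that condition would slot into the command from #4 roughly as follows. This is a sketch, assuming the variable names from post #1 (Stata variable names are case-sensitive, so it uses lowercase result). The reason for filtering rather than using xscale(range(0 800)) is that range() can only widen an axis, never narrow it, so the outliers have to be excluded from the data being plotted.

```stata
* Sketch only: assumes variables result, count, test and gender as in post #1.
* The /// line continuation works in a do-file; in the Command window,
* type everything on one line instead.
histogram result if test == "Glu" & result < 800 [fweight = count], ///
    frequency by(gender, title("Glucose"))
```

The other options from #4 (bin(), xlabel() and so on) can be added back one at a time once the basic graph looks right.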
