  • Making a scatterplot less busy

    Hello, I've created a scatterplot of two variables, but my data set is extremely large, which makes the graph very hard to read because there are data points everywhere. What I would like to do is have the scatterplot show only 50 quantiles of my variables, so that it is easily readable. Is there any way of doing this in Stata?

    Thanks,
    Matt

  • #2
    What does "extremely large" mean? I have seen people use such adjectives for data sets with a few thousand observations ... or millions and millions.

    Univariate reductions are problematic for seeing bivariate structure. I would try something like

    Code:
    webuse nlswork, clear 
    scatter ln_wage age, ms(Oh) mcolor(stc1%20) || lpoly ln_wage age, degree(1) bwidth(5) legend(off) ytitle(`: var label ln_wage') xtitle(`: var label age')
    That is:

    open circles to cut down on scatter plot ink

    transparency ditto

    a scatter plot smoother to get a sense of overall relationship -- your bandwidth might need to be quite different

    the colour stc1 used here assumes Stata 18, which is the default assumption on this forum, but choose any colour you like that your version of Stata supports
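
    On the original request for 50 quantiles: what follows is only a minimal sketch of a binned scatter plot, assuming it is acceptable to cut one variable into 50 quantile bins and plot the within-bin means of both variables. The variable name expbin is made up for illustration, and ttl_exp is used as the x variable only because it is close to continuous in this dataset. As noted above, a reduction like this can hide bivariate structure, so it is probably best read alongside the full scatter plot rather than instead of it.

    ```stata
    webuse nlswork, clear

    * Cut the x variable into 50 quantile bins (expbin is a hypothetical name)
    xtile expbin = ttl_exp, nquantiles(50)

    * Work on a temporary copy so the full data set is restored afterwards
    preserve

    * Observations with missing x cannot be assigned to a bin
    drop if missing(expbin)

    * One observation per bin: mean of y and mean of x within each bin
    collapse (mean) ln_wage ttl_exp, by(expbin)

    scatter ln_wage ttl_exp, ms(Oh) ///
        ytitle("Mean ln(wage) within bin") ///
        xtitle("Mean total work experience within bin")

    restore
    ```

    With heavily tied x values, xtile may return fewer than 50 distinct bins, so the number of plotted points can be smaller than requested.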

    • #3
      Here are some possibilities:

      Code:
      webuse nhanes2, clear
      
      * Transparency:
      scatter bmi bpsystol, mcolor(%5)
      
      * Smaller symbol:
      scatter bmi bpsystol, msymbol(point)
      
      * A combination:
      scatter bmi bpsystol, msymbol(plus) mcolor(%5)
      
      * Add jittering if one or both are discrete
      scatter bmi houssiz, jitter(4 0) mcolor(%5)
      
      * Heat map
      * Use "search heatplot" to find and install
      heatplot bmi bpsystol, colors(plasma)
      
      * Hex plot (hexplot is installed as part of the heatplot package)
      hexplot bmi bpsystol, colors(plasma)
      
      * Sample 50% then plot (note: sample drops the other observations from memory)
      set seed 894
      sample 50
      scatter bmi bpsystol, mcolor(%5)
      See here to learn more about the heatplot package.
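
      Because sample discards observations from the data in memory, it may be worth keeping the full data set intact while plotting a subset. The following is a sketch of two ways that might be done, not the only ones; the variable name u is made up for illustration.

      ```stata
      webuse nhanes2, clear

      * Option 1: sample inside preserve/restore so the full data come back
      preserve
      set seed 894
      sample 50
      scatter bmi bpsystol, mcolor(%5)
      restore

      * Option 2: keep all observations and plot a random subset
      * (u is a hypothetical helper variable; roughly half the points are shown)
      set seed 894
      generate double u = runiform()
      scatter bmi bpsystol if u < 0.5, mcolor(%5)
      ```

      The second option selects approximately, not exactly, 50% of the observations, which rarely matters for a plot.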
