
  • Binscatter2 - faster, enhanced binned scatterplots in Stata

    Hi all,

    I wanted to call attention to a program I've had available on GitHub for a while now that generates binned scatterplots in Stata, like Michael Stepner's excellent -binscatter- package. Binscatter2 inherits all of the usage and syntax of binscatter, but runs substantially faster on large datasets by leveraging the functionality of -gtools-. Binscatter2 also offers a handful of new bells and whistles: expanded options for saving, fit lines, plotting quantile intervals of the data, etc.
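
    For readers new to binned scatterplots, the core computation can be sketched by hand in a few lines of built-in Stata. This is only an illustration of the general idea, not the code that binscatter or binscatter2 actually runs; the simulated data, the seed, and the choice of 20 bins are assumptions made for the example.

    ```stata
    * Illustrative data (hypothetical; same functional form as a simple quadratic)
    clear all
    set seed 12345
    set obs 100000
    gen x = runiform()
    gen y = 1 + 2*x + 3*x^2 + rnormal()

    * Cut x into 20 equal-sized (quantile) bins
    xtile bin = x, nquantiles(20)

    * Replace the data with the within-bin means of x and y
    collapse (mean) x y, by(bin)

    * Plot one point per bin
    scatter y x
    ```

    The dedicated packages add fit lines, controls, by-groups and so on; the speed differences on large data come largely from how the binning and collapsing steps are carried out.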

    This project is still very much ongoing, and I hope to submit it to SSC very soon.

    You can read more about it here: https://github.com/mdroste/stata-binscatter2

  • #2
    I also want to note that another binscatter alternative has popped up in the last few days, binsreg: https://sites.google.com/site/nppackages/binsreg. Binsreg offers some features that binscatter2 also adds, like more flexible fit lines. Binsreg has more robust options for inference (e.g. overlaying confidence intervals), but less support for looking at the conditional distribution of y given x (e.g. overlaying quantiles). One key difference that favors binscatter2 on big datasets is runtime:

    Code:
    . clear all 
    
    . set obs 25000000
    number of observations (_N) was 0, now 25,000,000
    
    . gen x = runiform()
    
    . gen y = 1 + 2*x + 3*x^2 + rnormal()
    
    . set rmsg on
    r; t=0.00 16:47:33
    
    . binscatter y x
    r; t=93.44 16:49:07
    
    . binscatter2 y x
    r; t=20.70 16:49:28
    
    . binsreg y x, nbins(20)
    
    Binscatter plot
    Bin selection method: User-specified
    Placement: Quantile-spaced
    Derivative: 0
    
    ----------------------------------------------
    # of observations             | 25000000
    # of distinct values          | 16455446
    # of clusters                 |       .
    ------------------------------+---------------
    Bin selection:                | 
             Degree of polynomial |       .
      # of smoothness constraints |       .
                        # of bins |      20
    ----------------------------------------------
    
    ----------------------------------------
             |      p       s       df
    ---------+------------------------------
     dots    |      0       0       20
    ----------------------------------------
    r; t=281.96 16:54:10
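
    In this run on 25 million observations, binscatter2 took about 21 seconds, compared with about 93 seconds for binscatter and about 282 seconds for binsreg.

    Binsreg also offers an alternative method of residualizing a set of controls that the authors argue is more principled than what binscatter and binscatter2 currently incorporate. I will look into adding this alternative residualization in the near future.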
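
    A rough sketch of the two ways of handling controls may help make the distinction concrete. It assumes a single hypothetical control variable w and simulated data, and uses only built-in Stata commands. The first approach follows what the binscatter help file describes for controls(): residualize y and x on the controls, add the sample means back, then bin the residualized x. The second is an approximation of the bin-first idea (bin the raw x, then adjust for the control inside a regression on bin indicators); it is not binsreg's actual implementation.

    ```stata
    * Hypothetical data with one control, w, correlated with x
    clear all
    set seed 12345
    set obs 100000
    gen w = rnormal()
    gen x = 0.5*w + rnormal()
    gen y = 1 + 2*x + w + rnormal()

    * Approach 1: residualize first, then bin the residualized x
    foreach v in y x {
        quietly regress `v' w
        predict double `v'_r, residuals
        quietly summarize `v', meanonly
        quietly replace `v'_r = `v'_r + r(mean)   // add the sample mean back
    }
    xtile bin_r = x_r, nquantiles(20)

    * Approach 2: bin the raw x, then adjust for w within the regression
    xtile bin = x, nquantiles(20)
    quietly regress y ibn.bin w, noconstant
    quietly summarize w, meanonly
    gen double y_adj = _b[w]*r(mean)              // control evaluated at its mean
    forvalues b = 1/20 {
        quietly replace y_adj = y_adj + _b[`b'.bin] if bin == `b'
    }

    * One point per bin under each approach
    preserve
    collapse (mean) y_r x_r, by(bin_r)
    scatter y_r x_r, name(resid_first, replace)
    restore
    collapse (mean) y_adj x, by(bin)
    scatter y_adj x, name(bin_first, replace)
    ```

    The visible difference is in the horizontal axis: the first graph places points along the residualized x, while the second keeps them on the original scale of x.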
