Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can Stata plot a binned scatterplot with LOWESS and linear spline fitted values?


    Hi
    I have a panel dataset with T=46 and N=302
    I want to generate a figure that presents binned scatterplots (100 bins) for variations in two variables (namely rainfall and output). To outline the effect of rainfall on output I want to include Locally Weighted Scatterplot Smoothing (LOWESS) and linear spline fitted values generated using the full dataset.

    The figure that I am trying to replicate is included below. It is included in the study that I am following.

    Click image for larger version

Name:	fig.png
Views:	1
Size:	70.6 KB
ID:	1461222

    Thankyou

  • #2
    Yes, by running each part separately and then combining them in the final graph. I am using Michael Stepner's binscatter (ssc desc binscatter) but you could create your own binning.

    Code:
    clear*
    
    sysuse auto
    binscatter mpg weight, n(10) savedata(binned) replace line(none)
    clear
    qui do binned
    save binned,replace
     
    sysuse auto,clear
    mkspline w1 3000 w2 = weight
    qui reg mpg w1 w2
    local b0 = _b[_cons]
    local b1 = _b[w1]
    local b2 = _b[w2]
    
    sysuse auto,clear
    lowess mpg weight, gen(lowess)
    append using binned, gen(binned)
    line lowess weight,sort lp(dash) lc(black)  /// 
      || scatter mpg weight if binned == 1 , mc(blue) /// 
      || function x*`b1' + `b0' , range(1800 3000)  lc(black)  /// 
      ||  function x*`b2' + `b0' + 3000*`b1' - 3000*`b2', range(3000 4820)  lc(black)  /// 
      ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) )
    Click image for larger version

Name:	Graph.png
Views:	1
Size:	26.7 KB
ID:	1461282

    Comment


    • #3
      Thank you very much for your reply
      I will try it with my dataset

      Comment


      • #4
        Sir I tried to replicate the steps you specified. I am able to get the results for the example file but when I try it on my data the final graph is very different.
        It seems that I am doing something wrong in the final command to combine the figures together

        this is a portion of my data
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte STCODE int(DIST YEAR) float(logvop r)
         6 1 1966 5.677259  -1.1779529
         6 1 1967 6.283907    .9828553
         6 1 1968 6.400692  -.18393935
         6 1 1969 6.247097   -.7743905
         6 1 1970 6.448767   1.7449456
         6 1 1971 6.329062    .7415161
         6 1 1972 6.309865  -.27629706
         6 1 1973 6.399722   1.1444086
         6 1 1974 6.006107  -1.8205005
         6 1 1975  6.37343    .5367606
         6 1 1976 5.932044  -.10219114
         6 1 1977 6.482227   .06214114
         6 1 1978 6.115711    .2488333
         6 1 1979 5.972185  -1.3439512
         6 1 1980 6.416317   1.0611725
         6 1 1981 6.409436   -.1846914
         6 1 1982 6.391391   -.8723552
         6 1 1983 6.670462   -.2342865
         6 1 1984 6.517255   .07086942
         6 1 1985    6.692  -.10699667
         6 1 1986 6.526544    .6282512
         6 1 1987 6.499274  -1.1994483
         6 1 1988 6.564115  -1.3942657
         6 1 1989 6.448647   -.6411176
         6 1 1990 6.862115   1.9549978
         6 1 1991 6.861378   -.5843463
         6 1 1992 6.838742  -.18972655
         6 1 1993 6.934593  -.12059698
         6 1 1994 7.032726    2.449871
         6 1 1995 7.051254  -.03518692
         6 1 1996 6.974533   -.8119386
         6 1 1997 6.807908    .4041088
         6 1 1998 6.685565   -.3942535
         6 1 1999 7.019169   .08690574
        14 1 2000 6.358608  -1.5058796
        14 1 2001 7.025798   1.2669252
        14 1 2002 6.482667   -.9645327
        14 1 2003 7.226005      2.4493
        14 1 2004 7.014306   -.9560327
        14 1 2005 7.011805   1.1271302
        14 1 2006  7.22119 -.070854336
        14 1 2007 7.178465   .19920576
        14 1 2008 6.815814   -.8079501
        14 1 2009 6.778997   -.6651635
        14 1 2010 7.384879   .08527038
        14 1 2011        .   .17337847
         6 2 1966 6.028281   -.7572779
         6 2 1967 6.184847    .4431139
         6 2 1968 6.084245    -.555818
         6 2 1969 6.173935    .4710631
         6 2 1970 6.150467   .54937965
         6 2 1971 6.284922  -.25100642
         6 2 1972 6.103499  -1.0700845
         6 2 1973 6.264443   -.3649358
         6 2 1974 6.098851  -1.9029367
         6 2 1975 6.334315      .76828
         6 2 1976 5.990272    .9626099
         6 2 1977 6.320644    .7810539
         6 2 1978 6.256342    .9499493
         6 2 1979   6.0175   -.8267344
         6 2 1980 6.197495   .54008275
         6 2 1981 6.191246   -.7939219
         6 2 1982 6.133012   -.7449553
         6 2 1983 6.215141    .3787305
         6 2 1984 6.196711   -.3558205
         6 2 1985 6.832435   -.6071193
         6 2 1986 6.183384  .023146177
         6 2 1987 6.145241   -.9593923
         6 2 1988 6.473642   -.6139163
         6 2 1989 6.466487  .017055646
         6 2 1990 6.462101   2.5522425
         6 2 1991 6.557223   -.3533959
         6 2 1992 6.414104    .1258135
         6 2 1993 6.504313   -.3159244
         6 2 1994 6.495754   1.7836902
         6 2 1995 6.477999    .6945031
         6 2 1996 6.606445   -.6948602
         6 2 1997 6.237485   -.7846626
         6 2 1998 6.479868   -.8081186
         6 2 1999 6.649133   .13751909
        14 2 2000 6.373062  -1.0268312
        14 2 2001 6.881413    1.238238
        14 2 2002 5.955156  -1.7278012
        14 2 2003 7.028116    1.121923
        14 2 2004 6.763348   -.4383818
        14 2 2005 6.764399   -.0814719
        14 2 2006 6.815685   2.5424116
        14 2 2007   6.9569   1.0208739
        14 2 2008 6.721241   -.3963105
        14 2 2009 6.675034  -1.4405812
        14 2 2010 7.164248   1.3986948
        14 2 2011        .   -.6281134
         6 3 1966 6.366374   -.4680728
         6 3 1967 6.655726   1.4315002
         6 3 1968 6.644485  -.05696872
         6 3 1969 6.695407   -.8590334
         6 3 1970 6.977482   1.4325273
         6 3 1971 6.732531    .6923737
         6 3 1972 6.708264    -.364983
         6 3 1973 6.671087   1.2204392
        end

        I replicated the steps as below

        Code:
        use 1,clear
        binscatter logvop r, n(100) savedata(binned) replace line(none)
        clear
        qui do binned
        save binned,replace
         
        clear
        use 1,
        mkspline w1 0.5 w2=r
        qui reg logvop w1 w2
        local b0 = _b[_cons]
        local b1 = _b[w1]
        local b2 = _b[w2]
         
        use 1,clear
        lowess logvop r, gen(lowess)
        append using binned, gen(binned)
        
        line lowess r,sort lp(dash) lc(black) ///
          || scatter logvop r if binned == 1 , mc(blue) ///
          || function x*`b1' + `b0' , range(-3 0.5)  lc(black)  ///
          ||  function x*`b2' + `b0' + `b1' - 0.5*`b2', range(0.5 6)  lc(black)  ///
          ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) )

        however my final graph looks as below


        Click image for larger version

Name:	Untitled.png
Views:	1
Size:	5.4 KB
ID:	1461428



        What Am I doing wrong?

        Thankyou







        Comment


        • #5
          The second function call needs to be

          Code:
          ||  function x*`b2' + `b0' + .5*`b1' - 0.5*`b2',

          Comment


          • #6
            Thank you very much for the clarification
            I was able to generate the graph (below)


            Click image for larger version

Name:	Untitled.png
Views:	1
Size:	10.5 KB
ID:	1461884




            Just one more doubt though.
            I try to generate the binned scatterplots for variations in the dependent (output) and independent variables (r) with regards to the different quantiles of another independent variable (input_ir).

            At present I have done this as follows


            Code:
            *(a) input_ir<.151032
             
            use 1
            keep if input_ir<.151032
            save 2, replace
            binscatter output r, n(100) savedata(binneda) replace line(none)
            clear
            qui do binneda
            save binneda,replace
             
            use 2,clear
            mkspline w1 0 w2 = r
            qui reg output w1 w2
            local b0 = _b[_cons]
            local b1 = _b[w1]
            local b2 = _b[w2]
             
            use 2,clear
            lowess output r, gen(lowess)
            append using binneda, gen(binneda)
            line lowess r,sort lp(dash) lc(black)  || scatter output r if binneda == 1 , mc(blue) || function x*`b1' + `b0' , range(-4 0)  lc(black) ||  function x*`b2' + `b0' + 0*`b1' - 0*`b2', range(0 4.5)  lc(black) ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) ) legend(off)  ytitle("Log agricultural output (Rupees/ha.)") xtitle("Rainfall in district s.d.")  title("0-15% Irrigation-Share") name(f2, replace)
             
            *(b) input_ir<.15 & input_ir<.30
             
            clear
            use 1
            keep if input_ir>.15 & input_ir<.30
            save 3, replace
            binscatter output r, n(100) savedata(binnedb) replace line(none)
            clear
            qui do binnedb
            save binnedb,replace
            use 3,clear
            mkspline w1 0 w2 = r
            qui reg output w1 w2
            local b0 = _b[_cons]
            local b1 = _b[w1]
            local b2 = _b[w2]
            use 3,clear
            lowess output r, gen(lowess)
            append using binnedb, gen(binnedb)
            line lowess r,sort lp(dash) lc(black)  || scatter output r if binnedb == 1 , mc(blue) || function x*`b1' + `b0' , range(-4 0)  lc(black) ||  function x*`b2' + `b0' + 0*`b1' - 0*`b2', range(0 4.5)  lc(black) ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) ) legend(off)  ytitle("Log agricultural output (Rupees/ha.)") xtitle("Rainfall in district s.d.")  title("15-30% Irrigation-Share") name(f3, replace)
             
             
            *(c) input_ir>.30 & input_ir<.45
            clear
            use 1
            keep if input_ir>.30 & input_ir<.45
            save 4, replace
            binscatter output r, n(100) savedata(binnedc) replace line(none)
            clear
            qui do binnedc
            save binnedc,replace
            use 4,clear
            mkspline w1 0 w2 = r
            qui reg output w1 w2
            local b0 = _b[_cons]
            local b1 = _b[w1]
            local b2 = _b[w2]
            use 4,clear
            lowess output r, gen(lowess)
            append using binnedc, gen(binnedc)
            line lowess r,sort lp(dash) lc(black)  || scatter output r if binnedc == 1 , mc(blue) || function x*`b1' + `b0' , range(-4 0)  lc(black) ||  function x*`b2' + `b0' + 0*`b1' - 0*`b2', range(0 4.5)  lc(black) ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) ) legend(off) ytitle("Log agricultural output (Rupees/ha.)") xtitle("Rainfall in district s.d.")  title("30-45% Irrigation-Share") name(f4, replace)
             
            *(d) input_ir >45
            clear
            use 1
            keep if input_ir>.45
            save 5, replace
            binscatter output r, n(100) savedata(binnedd) replace line(none)
            clear
            qui do binnedd
            save binnedd,replace
            use 5,clear
            mkspline w1 0 w2 = r
            qui reg output w1 w2
            local b0 = _b[_cons]
            local b1 = _b[w1]
            local b2 = _b[w2]
            use 5,clear
            lowess output r, gen(lowess)
            append using binnedd, gen(binnedd)
            line lowess r,sort lp(dash) lc(black)  || scatter output r if binnedd == 1 , mc(blue) || function x*`b1' + `b0' , range(-3 0)  lc(black) ||  function x*`b2' + `b0' + 0*`b1' - 0*`b2', range(0 4.5)  lc(black) ||, legend(order(1 "Lowess" 2 "Binned scatter" 3 "Linear spline") pos(6) row(1) ) legend(off) ytitle("Log agricultural output (Rupees/ha.)") xtitle("Rainfall in district s.d.")  title("30-45% Irrigation-Share") name(f5, replace)
            graph combine f2 f3 f4 f5
            graph export FIGURE3.tif, replace width(800) height(600)

            So i essentially retain data pertaining to the values of input_ir for which i want to generate the graph and then run the same code four times.



            Is there any other means of achieving the same result or is my procedure correct??
            Thank you

            Comment


            • #7
              Hi, I am facing the same challenge. Still, it is not clear to me how you defined the range (-3 0.5) in function 1?

              Thanks in advance

              Comment

              Working...
              X