  • Plotting the kernel density of two variables

    Dear all,

    I am having trouble plotting the information I want.

    I have re-estimated an effect many times. Each time I have a coefficient and a p-value.
    Now I want to plot these two variables in the same graph. In particular, I want the estimates on the x-axis and the p-values on the y-axis, with each p-value plotted against its corresponding estimate.

    Specifically, I want something like this graph:


    [Figure: kdensity.png]



    I tried the following, but it did not work:

    Code:
    twoway (kdensity bt yaxis(1)) (kdensity p_value yaxis(2)), ytitle(p-values)  xtitle(estimates)

    Reference:
    The graph is from:

    Cai, Xiqian, et al. "Does environmental regulation drive away inbound foreign direct investment? Evidence from a quasi-natural experiment in China." Journal of Development Economics 123 (2016): 73-85.









  • #2

    Start here:

    Code:
    twoway (kdensity bt, yaxis(1)) (kdensity p_value, yaxis(2)), ytitle(p-values) xtitle(estimates)



    • #3
      George Ford Thank you for your reply, but it still did not work. The p-values are not matched to their corresponding estimates.



      • #4
        Both your code and that of George Ford superimpose distributions, thus using the same scale. There won't be any matching at all. The implication of your example is that estimates can be positive and negative, but P-values are probabilities and aren't defined on the same support. I am not clear on exactly what you want, but a scatter plot of P-values versus estimates superimposed on a kernel density plot for estimates may be closer to what you seek.



        • #5
          Nick Cox Thank you Nick for your valuable advice.

          Could you please suggest code for that, i.e. a scatter plot of P-values versus estimates superimposed on a kernel density plot for estimates?
          Here is an example of what I have:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(bt p_value)
              .01336577   .4282892
             .021560743  .21537307
              .03502954  .07857537
              .01371774   .4258528
            -.022858405  .21152744
            -.004581473    .777771
              .00599894   .7083483
          -.00033208705   .9854289
            -.008083826   .6586059
            -.003713943   .8099309
              .02959704  .07729103
             .027934123  .11081823
            -.016084246  .34941545
             .023376163   .1984488
             .003092426    .867321
            -.015827673   .3777966
            -.011847218   .4632585
           -.0041433685   .8046878
            -.016890263  .27887505
             -.02094775  .22842145
             .005139845   .7389457
             .016214073   .3312222
            -.017489368   .3072372
            -.012045377   .4284746
            -.006244012   .7057787
             -.02172245  .15569894
             -.01914644  .23653756
              .02635433  .15090448
             .020387206  .22116593
            -.028617896  .07391637
           -.0016000004   .9229169
            -.002221337   .8974027
             .015534726   .3665278
             -.02292221   .2040977
             .013722046  .43374985
             -.03286881  .05570652
             .026982967  .13234204
             .001887578   .9147738
           .00005161876   .9975573
          -.00009462597   .9953542
               .0236893   .1814558
           -.0038717235   .8194008
            -.007435174   .6649306
             .019785725    .242389
             -.01984618   .2287324
              .02123988  .22247948
            .0002771266    .987209
             .015704906   .4215138
             -.02713231   .0932861
              .02774211  .12623751
            -.009935722   .5166992
             .019293023   .2430474
              .02324441   .1276388
             .007652088   .6540172
             -.00393161   .8261919
             .003095272   .8547215
             -.03703181 .033372622
              .02956998  .09853233
            .0019936757   .9039059
             -.03109947  .03803364
             -.00146657   .9333706
            -.005478648   .7672248
             .005534613   .7525676
            -.004545614   .7922928
              .02646708  .14619012
            .0007896366   .9658796
            .0008054378   .9623679
             -.01113071   .5416935
            -.018252024   .2920395
             .011158213  .52548945
            -.024388663  .12953149
             .016923096   .2992715
             .023845255  .16449974
             -.01237254   .4501697
             .016821215   .3761768
            -.007201594     .63915
             -.00985737  .58971345
             -.02256499   .1730924
            -.004933096   .7724282
             .029514384 .065749034
              .00948432    .565971
            -.024434466   .1635976
             .019852454   .2070752
             .003888288    .826704
             -.02943363 .067501575
             .003766531   .8207598
             -.02496421   .1183079
             .034611292  .05093163
            -.019339005   .2908824
             .005543359    .762997
            -.005688902   .7005327
            -.011471675  .50640994
             .016946334  .28846654
            -.017090684   .3340638
             .014523406   .3961129
          -.00008244145   .9959278
            -.005272575   .7793804
            -.007714209   .6347569
            -.011057503  .51290584
            -.011187763  .51452714
          end
          Last edited by Marry Lee; 10 Dec 2023, 05:38.



          • #6
            Code:
            twoway kdensity estimates || scatter Pvalue estimates
            That said, I don't see any great advantage in superimposition as compared with juxtaposition, two panels or facets vertically aligned.

            A slavish copy of #1 is likely to confuse naive readers and even more experienced readers will have to work at it.

            The y axis is labelled P-value and indeed the values shown for P-values don't exceed 1, as should be true. But the scale goes up to 2, which experienced readers will grasp is to accommodate the probability densities, which are on a different scale. Some text should appear on a vertical axis.

            Either way, the axis labels on the x axis should surely include 0. There was obviously some specific reason for a line at -0.504, but otherwise better labels might have been -0.8(0.2)0.8 or -0.75(0.25)0.75.

            I realise the graph is from a published paper, but similar advice could well apply to your own project.
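
            A minimal sketch of the juxtaposed alternative, two vertically aligned panels sharing an x axis that is labelled at 0. The variable names bt and p_value follow the data example in #5, but the values here are a simulated stand-in (the seed, the standard deviation 0.017 and the standard error 0.018 are arbitrary assumptions), and the graph names dens and pval are illustrative only.

            ```stata
            * Simulated stand-in for the estimates and P-values (NOT the data in #5)
            clear
            set seed 12345
            set obs 100
            generate bt = rnormal(0, 0.017)
            generate p_value = 2 * normal(-abs(bt) / 0.018)

            * Top panel: kernel density of the estimates, held in memory, not drawn
            twoway kdensity bt, ///
                xlabel(-0.06 -0.04 -0.02 0 0.02 0.04 0.06) ///
                xtitle("") ytitle("Probability density") ///
                name(dens, replace) nodraw

            * Bottom panel: P-values against the same estimates
            twoway scatter p_value bt, ms(Oh) ///
                xlabel(-0.06 -0.04 -0.02 0 0.02 0.04 0.06) ylabel(0(0.2)1) ///
                xtitle("Estimate") ytitle("P-value") ///
                name(pval, replace) nodraw

            * Stack the two panels in one column with a common x-axis scale
            graph combine dens pval, cols(1) xcommon
            ```

            The /// line continuations work in a do-file, not when typed into the Command window. With the real data in memory, the first six lines would simply be dropped.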



            • #7
              Thank you Nick Cox, your suggested code gives exactly what I want. But my graph did not turn out well, as the density went up to 20. Is that normal?



              • #8
                Probability density is (here) not probability at all but has units that are the inverse of the units of your estimate. Values can be anything that is zero or positive so long as the density integrates to 1 over the support of the variable.

                This is my point amplified. P-values and probability density have quite different units. Evidently P-value will peak at 1 for zero estimate and decline away in either direction. The distribution of estimates may be different but to show both legibly in the same graph you need two separate scales and two y-axes.

                Here's a silly example with some technique.

                Code:
                sysuse auto, clear

                gen price2 = price/100000

                twoway kdensity mpg || scatter price2 mpg, ms(Oh) yaxis(1 2) yla(.15 "15000" .1 "10000" .05 "5000" 0, axis(1) labcolor(stc2)) yla(, axis(2) labcolor(stc1)) ytitle(Probability density, axis(2) color(stc1)) ytitle(Price (USD), axis(1) color(stc2)) legend(off) xtitle(Miles per gallon)
                The probability density for mpg over a range of about 30 must have an average of the order of 0.03 (per mpg, or gallons/mi); in contrast prices are in thousands of dollars, so are some orders of magnitude different.

                [Figure: twoscales.png]
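
                The claim about the size of the density can be checked numerically. The sketch below stores the estimated density of mpg on a grid and integrates it; the area should come out close to 1 (not exactly, because the grid is finite), and the mean height should be of the order of 0.03. The variable names x and d are illustrative only. The same arithmetic applies to the estimates in #5: they span roughly -0.037 to 0.035, a width of about 0.07, so the density must average roughly 1/0.07, or about 14, and a peak near 20 is unsurprising.

                ```stata
                sysuse auto, clear

                * Evaluate the kernel density of mpg on kdensity's default grid,
                * storing grid points in x and density values in d, with no graph
                kdensity mpg, generate(x d) nograph

                * Integrate the stored density over the grid by the trapezoidal rule
                integ d x, trapezoid
                display "Area under the density: " r(integral)

                * Mean height of the density over the grid points
                summarize d
                ```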


                Your example makes more sense than that, but I still suspect you'd be better off with two graphs. If you do choose one graph with two scales, I strongly recommend colour coding to make clear which scale goes with what. (I could and should have gone further with colouring the axis ticks.)
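
                For estimates and P-values, one graph with two genuinely separate y-axes and colour coding might be sketched as follows. The variable names bt and p_value follow #5, but the values are a simulated stand-in (seed, standard deviation and standard error are arbitrary assumptions). The named colours navy and maroon are used here instead of stc1 and stc2 so that the plot colours are set explicitly and do not depend on the scheme.

                ```stata
                * Simulated stand-in for the estimates and P-values (NOT the data in #5)
                clear
                set seed 12345
                set obs 100
                generate bt = rnormal(0, 0.017)
                generate p_value = 2 * normal(-abs(bt) / 0.018)

                * Density on the left axis (1), P-values on the right axis (2);
                * titles and labels on each axis take the colour of the matching plot
                twoway (kdensity bt, yaxis(1) lcolor(navy))                   ///
                       (scatter p_value bt, yaxis(2) ms(Oh) mcolor(maroon)),  ///
                    ytitle("Probability density", axis(1) color(navy))        ///
                    ytitle("P-value", axis(2) color(maroon))                  ///
                    ylabel(, axis(1) labcolor(navy))                          ///
                    ylabel(0(0.2)1, axis(2) labcolor(maroon))                 ///
                    xtitle("Estimate") legend(off)
                ```

                Because each plot has its own axis, the density can rise to whatever height it needs while the P-value axis stays on 0 to 1.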



                • #9
                  Nick Cox Thank you so much. This is very helpful!



                  • #10
                    I should add that each axis should itself be coloured within a yscale() call.
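
                    A minimal sketch of that extra step on the auto data, colouring each axis line through yscale() and each set of ticks through the tlcolor() suboption of ylabel(). This is a separate two-axis variant rather than the exact command in #8, and the colours navy and maroon are an arbitrary choice.

                    ```stata
                    sysuse auto, clear

                    * Density on axis 1, price on axis 2; axis lines, ticks, labels
                    * and titles all take the colour of the plot they belong to
                    twoway (kdensity mpg, yaxis(1) lcolor(navy))                      ///
                           (scatter price mpg, yaxis(2) ms(Oh) mcolor(maroon)),       ///
                        yscale(lcolor(navy) axis(1)) yscale(lcolor(maroon) axis(2))   ///
                        ylabel(, axis(1) labcolor(navy) tlcolor(navy))                ///
                        ylabel(, axis(2) labcolor(maroon) tlcolor(maroon))            ///
                        ytitle("Probability density", axis(1) color(navy))            ///
                        ytitle("Price (USD)", axis(2) color(maroon))                  ///
                        xtitle("Miles per gallon") legend(off)
                    ```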



                    • #11
                      Thank you Nick Cox, I will do that.
