Plotting the kernel density of two variables

Marry Lee

Join Date: Nov 2020

Posts: 189
#1


08 Dec 2023, 09:29

Dear all,

I am having trouble plotting the information I want.

I have re-estimated an effect many times. Each time I obtain a coefficient estimate and a p-value.
Now I want to plot these two variables in the same graph: the estimates on the x-axis, and each p-value plotted against its corresponding estimate, so that the p-values can be read off the y-axis.

Specifically, I want to produce something like the graph in the paper referenced below.

I tried the following, but it did not work:

Code:

twoway (kdensity bt yaxis(1)) (kdensity p_value yaxis(2)), ytitle(p-values) xtitle(estimates)

Reference:
The graph is from:

Cai, Xiqian, et al. "Does environmental regulation drive away inbound foreign direct investment? Evidence from a quasi-natural experiment in China." Journal of Development Economics 123 (2016): 73-85.
George Ford

Join Date: Aug 2014

Posts: 3152
#2

08 Dec 2023, 12:15

Start here:

Code:

twoway (kdensity bt, yaxis(1)) (kdensity p_value, yaxis(2)), ytitle(p-values) xtitle(estimates)
Marry Lee

Join Date: Nov 2020

Posts: 189
#3

10 Dec 2023, 03:13

George Ford Thank you for your reply, but it still does not work: the p-values do not appear matched to the corresponding estimates.
Nick Cox

Join Date: Mar 2014

Posts: 35709
#4

10 Dec 2023, 03:38

Both your code and that of George Ford superimpose distributions, thus using the same scale. There won't be any matching at all: each kernel density is estimated separately, so nothing pairs a P-value with its own estimate. The implication of your example is that estimates can be positive and negative, but P-values are probabilities, lying between 0 and 1, so the two aren't defined on the same support. I am not clear on exactly what you want, but a scatter plot of P-values versus estimates superimposed on a kernel density plot for estimates may be closer to what you seek.

Marry Lee

Join Date: Nov 2020
Posts: 189

10 Dec 2023, 05:35

Nick Cox Thank you, Nick, for your valuable advice.

Could you please suggest code for how to do this:

a scatter plot of P-values versus estimates superimposed on a kernel density plot for estimates

Here is an example of what I have:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(bt p_value)
    .01336577   .4282892
   .021560743  .21537307
    .03502954  .07857537
    .01371774   .4258528
  -.022858405  .21152744
  -.004581473    .777771
    .00599894   .7083483
-.00033208705   .9854289
  -.008083826   .6586059
  -.003713943   .8099309
    .02959704  .07729103
   .027934123  .11081823
  -.016084246  .34941545
   .023376163   .1984488
   .003092426    .867321
  -.015827673   .3777966
  -.011847218   .4632585
 -.0041433685   .8046878
  -.016890263  .27887505
   -.02094775  .22842145
   .005139845   .7389457
   .016214073   .3312222
  -.017489368   .3072372
  -.012045377   .4284746
  -.006244012   .7057787
   -.02172245  .15569894
   -.01914644  .23653756
    .02635433  .15090448
   .020387206  .22116593
  -.028617896  .07391637
 -.0016000004   .9229169
  -.002221337   .8974027
   .015534726   .3665278
   -.02292221   .2040977
   .013722046  .43374985
   -.03286881  .05570652
   .026982967  .13234204
   .001887578   .9147738
 .00005161876   .9975573
-.00009462597   .9953542
     .0236893   .1814558
 -.0038717235   .8194008
  -.007435174   .6649306
   .019785725    .242389
   -.01984618   .2287324
    .02123988  .22247948
  .0002771266    .987209
   .015704906   .4215138
   -.02713231   .0932861
    .02774211  .12623751
  -.009935722   .5166992
   .019293023   .2430474
    .02324441   .1276388
   .007652088   .6540172
   -.00393161   .8261919
   .003095272   .8547215
   -.03703181 .033372622
    .02956998  .09853233
  .0019936757   .9039059
   -.03109947  .03803364
   -.00146657   .9333706
  -.005478648   .7672248
   .005534613   .7525676
  -.004545614   .7922928
    .02646708  .14619012
  .0007896366   .9658796
  .0008054378   .9623679
   -.01113071   .5416935
  -.018252024   .2920395
   .011158213  .52548945
  -.024388663  .12953149
   .016923096   .2992715
   .023845255  .16449974
   -.01237254   .4501697
   .016821215   .3761768
  -.007201594     .63915
   -.00985737  .58971345
   -.02256499   .1730924
  -.004933096   .7724282
   .029514384 .065749034
    .00948432    .565971
  -.024434466   .1635976
   .019852454   .2070752
   .003888288    .826704
   -.02943363 .067501575
   .003766531   .8207598
   -.02496421   .1183079
   .034611292  .05093163
  -.019339005   .2908824
   .005543359    .762997
  -.005688902   .7005327
  -.011471675  .50640994
   .016946334  .28846654
  -.017090684   .3340638
   .014523406   .3961129
-.00008244145   .9959278
  -.005272575   .7793804
  -.007714209   .6347569
  -.011057503  .51290584
  -.011187763  .51452714
end

Last edited by Marry Lee; 10 Dec 2023, 05:38.


Nick Cox

Join Date: Mar 2014

Posts: 35709
#6

10 Dec 2023, 05:50

Code:

* kernel density of the estimates, with each P-value plotted against its own estimate
twoway kdensity bt || scatter p_value bt

That said, I don't see any great advantage in superimposition as compared with juxtaposition: two panels or facets, vertically aligned.
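A minimal sketch of that juxtaposition in Stata, not code posted in the thread: the density of the estimates in a top panel and the P-values against the estimates in a bottom panel, stacked with -graph combine- using cols(1) and xcommon so that the x axes line up. The data lines are a simulated stand-in (estimates with an assumed standard error of 0.018 and two-sided z-test P-values); the graph names g_density and g_pvalue are hypothetical. The /// continuations work in a do-file, not in the Command window.

```stata
* simulated stand-in for the -dataex- data in #5; run that block instead to use the real data
clear
set seed 2023
set obs 100
generate bt = rnormal(0, 0.018)                // assumed standard error of 0.018
generate p_value = 2*normal(-abs(bt)/0.018)    // two-sided z-test P-value

* top panel: kernel density of the estimates
twoway kdensity bt, ytitle(Probability density) xtitle("") ///
    xlabel(-0.04(0.02)0.04) name(g_density, replace)

* bottom panel: each P-value plotted against its own estimate
twoway scatter p_value bt, msymbol(Oh) ytitle(P-value) xtitle(Estimate) ///
    xlabel(-0.04(0.02)0.04) ylabel(0(0.2)1) name(g_pvalue, replace)

* stack the panels vertically so that the x axes line up
graph combine g_density g_pvalue, cols(1) xcommon
```

Each panel then keeps its own natural y scale, so no reader has to work out which numbers are densities and which are probabilities.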

A slavish copy of the graph in #1 is likely to confuse naive readers, and even more experienced readers will have to work at it.

The y axis is labelled P-value, and indeed the values shown for P-values don't exceed 1, as should be true. But the scale goes up to 2, which experienced readers will grasp is there to accommodate the probability densities, which are on a different scale. Some text saying so should appear on a vertical axis.

Either way, the axis labels on the x axis should surely include 0. There was obviously some specific reason for a line at -0.504, but otherwise better labels might have been -0.8(0.2)0.8 or -0.75(0.25)0.75.

I realise the graph is from a published paper, but similar advice could well apply to your own project.
Marry Lee

Join Date: Nov 2020

Posts: 189
#7

10 Dec 2023, 07:39

Thank you, Nick Cox, your suggested code gives exactly what I want. But my graph did not turn out that well, as the density went up to 20. Is that normal?
Nick Cox

Join Date: Mar 2014

Posts: 35709
#8

10 Dec 2023, 09:10

Probability density is (here) not probability at all but has units that are the inverse of the units of your estimate. Values can be anything that is zero or positive so long as the density integrates to 1 over the support of the variable.
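A quick check of this in Stata, offered as a sketch rather than anything posted in the thread: -kdensity- with generate() saves the evaluation grid and the density, and -integ- with the trapezoid option integrates one over the other. The data lines are a simulated stand-in (estimates with an assumed standard error of 0.018); the variable and scalar names are hypothetical. For the estimates in #5, which run from about -0.037 to 0.035, a range of roughly 0.07, the density must average roughly 1/0.07, or about 14, over that range, so a peak near 20 is no surprise.

```stata
* simulated stand-in for the -dataex- data in #5; run that block instead to use the real data
clear
set seed 2023
set obs 100
generate bt = rnormal(0, 0.018)    // assumed standard error of 0.018

* density of bt at 100 grid points, saved as grid (x_bt) and density (d_bt)
kdensity bt, generate(x_bt d_bt) n(100) nograph

* trapezoidal-rule area under the density: close to 1, a little less if
* some density falls outside the evaluation grid
integ d_bt x_bt, trapezoid
scalar kd_area = r(integral)

* peak density, in units of 1/(units of bt)
summarize d_bt, meanonly
scalar kd_peak = r(max)
display "area = " scalar(kd_area) "   peak density = " scalar(kd_peak)
```

The area stays near 1 whatever the units of bt, while the peak scales inversely with them: multiplying bt by 100 would divide every density value by 100.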

This is my point amplified. P-values and probability density have quite different units. Evidently the P-value peaks at 1 for a zero estimate and declines in either direction. The distribution of estimates may be different, but to show both legibly in the same graph you need two separate scales and two y axes.

Here's a silly example with some technique.

Code:

sysuse auto, clear
gen price2 = price/100000
twoway kdensity mpg || scatter price2 mpg, ms(Oh) yaxis(1 2) yla(.15 "15000" .1 "10000" .05 "5000" 0, axis(1) labcolor(stc2)) yla(, axis(2) labcolor(stc1)) ytitle(Probability density, axis(2) color(stc1)) ytitle(Price (USD), axis(1) color(stc2)) legend(off) xtitle(Miles per gallon)

The probability density for mpg over a range of about 30 must have an average of the order of 0.03 (per mpg, or gallons/mi); in contrast prices are in thousands of dollars, so are some orders of magnitude different.

Your example makes more sense than that, but I still suspect you'd be better off with two graphs. If you do choose one graph with two scales, I strongly recommend colour coding to make clear which scale goes with what. (I could and should have gone further with colouring the axis ticks.)
Marry Lee

Join Date: Nov 2020

Posts: 189
#9

12 Dec 2023, 01:31

Nick Cox Thank you so much. This is very helpful!
Nick Cox

Join Date: Mar 2014

Posts: 35709
#10

12 Dec 2023, 05:10

I should add that each axis should itself be coloured within a yscale() call.
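A sketch of that advice applied to the variables in #5, not code posted in the thread: the density goes on the left axis and the P-values on the right, and -yscale()-, -ylabel()- and -ytitle()- colour each axis line, its ticks and labels, and its title to match the layer that uses it, with x labels that include 0. The data lines are a simulated stand-in (estimates with an assumed standard error of 0.018 and two-sided z-test P-values); navy and maroon are arbitrary colour choices. The /// continuations work in a do-file, not in the Command window.

```stata
* simulated stand-in for the -dataex- data in #5; run that block instead to use the real data
clear
set seed 2023
set obs 100
generate bt = rnormal(0, 0.018)                // assumed standard error of 0.018
generate p_value = 2*normal(-abs(bt)/0.018)    // two-sided z-test P-value

* density on axis 1 (left), P-values on axis 2 (right); each axis line,
* its ticks, labels and title share the colour of the layer it belongs to
twoway (kdensity bt, yaxis(1) lcolor(navy))                        ///
       (scatter p_value bt, yaxis(2) mcolor(maroon) msymbol(Oh)),  ///
       yscale(axis(1) lcolor(navy))                                ///
       ylabel(, axis(1) labcolor(navy) tlcolor(navy))              ///
       ytitle(Probability density, axis(1) color(navy))            ///
       yscale(axis(2) lcolor(maroon))                              ///
       ylabel(0(0.2)1, axis(2) labcolor(maroon) tlcolor(maroon))   ///
       ytitle(P-value, axis(2) color(maroon))                      ///
       xlabel(-0.04(0.02)0.04) xtitle(Estimate) legend(off)
```

Putting the P-values on their own axis with labels 0(0.2)1 avoids the problem noted in #6, where a single axis running up to 2 has to serve both quantities.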
Marry Lee

Join Date: Nov 2020

Posts: 189
#11

13 Dec 2023, 02:20

Thank you, Nick Cox, I will do that.