Graphs with svy

Alina Faruk

Join Date: Oct 2018

Posts: 96
#1

15 Oct 2018, 17:27

Dear all,

I am trying to plot the following line chart with the svy prefix, but I am getting an error message:

Code:

svy: twoway (kdensity lnwage if female == 0) (kdensity lnwage if female == 1)

It says: multiple if conditions not allowed

I couldn't find any helpful resource online. Any idea how to work around this?

Thanks in advance.
Tags: graph, kdensity, svy, twoway, weights
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

15 Oct 2018, 20:09

The only survey information needed for such plots is the weights. Unfortunately, kdensity doesn't take probability weights (pweights), but it does take frequency weights (fweights). A similar question about graphing histograms for survey data came up in 2007. The answer is to create frequency weights proportional to the pweights. See this response to that question and Austin Nichols's improved answer following.

If you call the new frequency weight freqwt, your code will be:

Code:

twoway (kdensity lnwage if female == 0 [fw = freqwt]) ///
       (kdensity lnwage if female == 1 [fw = freqwt])
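
A minimal sketch of how such a frequency weight might be built. It assumes the survey pweight is stored in a variable named wgt (the name used later in this thread); the scaling factor of 100 is purely illustrative and is not taken from the linked posts.

```stata
* wgt: survey probability weight (variable name as used later in the thread)
* freqwt: integer weight proportional to wgt; the factor 100 is an assumption
generate long freqwt = round(100 * wgt)

* fweights must be integers; make sure no observation was rounded
* down to zero or left missing before using the weight in a plot
assert freqwt > 0 & !missing(freqwt)
```

A larger factor preserves more of the relative differences between the original weights, at the cost of a larger apparent sample size.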

Last edited by Steve Samuels; 15 Oct 2018, 20:13.

Steve Samuels
Statistical Consulting

Stata 14.2
Alina Faruk

Join Date: Oct 2018

Posts: 96
#3

16 Oct 2018, 03:46

Thank you so much for your brilliant response. I just saw that kdensity also accepts aweights. Can I use those, or do I need to calculate fweights as you suggested?

This link says it's fine to do so:

https://www.stata.com/statalist/arch.../msg01383.html

Last edited by Alina Faruk; 16 Oct 2018, 04:03.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

16 Oct 2018, 05:24

I think you need to compute the frequency weights. aweights don't give the same result.

Code:

. sysuse auto, clear
. twoway (kdensity weight [fw = mpg]) (kdensity weight [aw = mpg]), saving(gwtry)
(file gwtry.gph saved)
. graph export gwtry.png

Last edited by Steve Samuels; 16 Oct 2018, 05:27.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

16 Oct 2018, 08:10

Correction: you can get identical graphs with aweights and fweights if you set the same bandwidth for each. Below you see that the estimated densities are indistinguishable.

Left to itself, as in the code in my previous post, kdensity chose different bandwidths for the fweights and the aweights. You can see the bandwidths on the graphs if you issue two separate kdensity commands instead of one twoway command. The choices were bw = 140 for the fweight graph and 289.9 for the aweight graph, so the aweight curve was smoothed much more heavily.
This is a bit disturbing. Bandwidth choice is always a bit subjective unless some cross-validation procedure does the choosing.
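
One way to inspect the automatically chosen bandwidths without reading them off the graphs is sketched below. It relies on kdensity storing the bandwidth it used in r(bwidth); the nograph option only suppresses the plot.

```stata
sysuse auto, clear

* Bandwidth kdensity picks when mpg is treated as a frequency weight
kdensity weight [fw = mpg], nograph
display "fweight bandwidth: " r(bwidth)

* Bandwidth kdensity picks when mpg is treated as an analytic weight
kdensity weight [aw = mpg], nograph
display "aweight bandwidth: " r(bwidth)
```

Passing either displayed value to bw() in both plots forces the two curves to use the same amount of smoothing.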

Code:

. twoway (kdensity weight [fw = mpg], bw(140)) (kdensity weight [aw = mpg], bw(140))
. graph export g2.png

Last edited by Steve Samuels; 16 Oct 2018, 08:28.

Alina Faruk

Join Date: Oct 2018

Posts: 96
#6

16 Oct 2018, 08:23

Thanks a lot again.

Following your first reply, should my calculation of the fweight go like this?

codebook wgt
g double wt = round(wgt*(1/units))

and then set fw = wt? Here units is the value reported in the -codebook- output.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

16 Oct 2018, 09:18

That should work, but I now recommend that you go with aweights. The Manual doesn't say how kdensity selects the bandwidths, but it's likely to be related to sample size: larger sample size, narrower bandwidth and less smoothing; smaller samples, wider bandwidths and more smoothing.

Analytic weights are scaled to sum to the original sample size n. Ordinary probability weights, if not rescaled, sum to the population size N. With the formula in your post, the resulting frequency weights will sum to about 100 N. For real populations, this could be in the millions. Ordinarily, frequency weights sum to the sample size. kdensity has no way of knowing that this isn't the case here. It sees a huge "sample size" and innocently chooses a very small bandwidth. As a result, the density shows more detail than the data can support.
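
A short sketch of the aweight approach follows. It assumes the pweight variable is named wgt, as in the earlier post; the legend labels are illustrative. The first two lines simply show how far the sum of the raw weights is from the number of observations.

```stata
* Compare the number of observations with the sum of the raw pweights
summarize wgt, meanonly
display "observations: " r(N) "   sum of weights: " r(sum)

* aweights are rescaled internally, so no integer conversion is needed
twoway (kdensity lnwage if female == 0 [aw = wgt]) ///
       (kdensity lnwage if female == 1 [aw = wgt]), ///
       legend(order(1 "female == 0" 2 "female == 1"))
```

Because the /// line continuations only work in a do-file, run the twoway command from the Do-file Editor rather than typing it in the Command window.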

Last edited by Steve Samuels; 16 Oct 2018, 09:22.

Alina Faruk

Join Date: Oct 2018

Posts: 96
#8

16 Oct 2018, 11:55

Thank you so much for your insightful suggestions! I'll go for aweight then.
Alvaro Gallegos

Join Date: Jul 2016

Posts: 1
#9

07 Aug 2019, 07:15

Hello Steve,

How is it that the formula Alina refers to (based on the post by Austin Nichols) results in frequency weights that sum to about 100 N? Alina does not provide details about her data, so I am left wondering how you know that the sum would be 100 N and not, for instance, some other multiple of N.

Thank you, Alvaro