  • Graphs with svy

    Dear all,

    I am trying to plot the following line chart with the svy command but getting an error message:

    Code:
    svy: twoway (kdensity lnwage if female == 0) (kdensity lnwage if female == 1)
    It says: multiple if conditions not allowed

    I couldn't find any helpful resource online. Any idea how to work around this?

    Thanks in advance.

  • #2
    The only survey information needed for such plots is the weights. Unfortunately, kdensity doesn't take probability weights (pweights), but it does take frequency weights (fweights). A similar question about graphing histograms for survey data came up in 2007. The answer is to create frequency weights proportional to the pweights. See this response to that question and Austin Nichols's improved answer following.

    If you call the new frequency weight freqwt, your code will be:
    Code:
    twoway (kdensity lnwage if female == 0 [fw = freqwt])  ///
           (kdensity lnwage if female == 1  [fw = freqwt])
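
    For readers who want to build freqwt, here is a minimal sketch. It assumes the probability weight is stored in a variable named wgt (a placeholder; substitute your own). It rescales by the smallest weight before rounding, which is one simple choice and not necessarily the recipe from the 2007 thread:

    ```stata
    * Sketch only: wgt is assumed to hold the survey probability weight
    summarize wgt, meanonly

    * Scale so the smallest weight maps to about 1, then round,
    * because fweights must be integers
    generate long freqwt = round(wgt / r(min))

    twoway (kdensity lnwage if female == 0 [fw = freqwt]) ///
           (kdensity lnwage if female == 1 [fw = freqwt])
    ```

    Rounding discards some of the information in the weights. A larger scale factor gives a closer approximation, at the cost of a larger apparent sample size.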
    Last edited by Steve Samuels; 15 Oct 2018, 20:13.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2

    • #3
      Thank you so much for your brilliant response. I just saw kdensity does take aweights. Can I use that or do I need to calculate fweights as you suggested?

      This link says it's fine to do so:

      https://www.stata.com/statalist/arch.../msg01383.html
      Last edited by Alina Faruk; 16 Oct 2018, 04:03.

      • #4
        I think you need to compute the frequency weights. aweights don't give the same result.
        Code:
        . sysuse auto, clear
        . twoway (kdensity weight [fw = mpg]) (kdensity weight [aw = mpg]), saving(gwtry)
        (file gwtry.gph saved)
        . graph export gwtry.png
        [Figure gwtry.png: kernel density estimates of weight with fweights and with aweights, default bandwidths]


        Last edited by Steve Samuels; 16 Oct 2018, 05:27.


        • #5
          Correction: you can get identical graphs with aweights and fweights if you set the same bandwidth for each. Below you see that the estimated densities are indistinguishable.

          Left to itself, as in the code in my previous post, kdensity chose different bandwidths for the fweighted and aweighted estimates. You can see the bandwidths on the graphs if you issue two separate kdensity commands instead of one twoway command. The choices were bw = 140 for the fweight graph and 289.9 for the aweight graph, so the aweight estimate was smoothed much more heavily.
          This is a bit disturbing. Bandwidth choice is always a bit subjective unless some cross-validation procedure does the choosing.

          Code:
          . twoway (kdensity weight [fw = mpg], bw(140)) (kdensity weight [aw = mpg], bw(140))
          . graph export g2.png
          [Figure g2.png: kernel density estimates of weight with fweights and with aweights, both with bw(140)]
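
          To check the default bandwidths directly, here is a short sketch using the same auto data. kdensity leaves the bandwidth it used in r(bwidth), so each weight type can be run separately and the value displayed. The nograph option only suppresses the plots:

          ```stata
          sysuse auto, clear

          * Default bandwidth when mpg is treated as a frequency weight
          kdensity weight [fw = mpg], nograph
          display "fweight bandwidth: " r(bwidth)

          * Default bandwidth when mpg is treated as an analytic weight
          kdensity weight [aw = mpg], nograph
          display "aweight bandwidth: " r(bwidth)
          ```

          Either displayed value can then be passed to bw() in both plots so that the two estimates use the same amount of smoothing.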



          Last edited by Steve Samuels; 16 Oct 2018, 08:28.


          • #6
            Thanks a lot again.

            From your first reply, should my calculation of the fweight go like this:

            Code:
            codebook wgt
            g double wt = round(wgt*(1/units))
            and then set fw = wt? Here units is the value reported in the -codebook- output.


            • #7
              That should work, but I now recommend that you go with aweights. The Manual doesn't say how kdensity selects the bandwidths, but it's likely to be related to sample size: larger sample size, narrower bandwidth and less smoothing; smaller samples, wider bandwidths and more smoothing.

              Analytic weights are scaled to sum to the original sample size n. Ordinary probability weights, if not rescaled, sum to the population size N. With the formula in your post, the resulting frequency weights will sum to about 100 N. For real populations, this could be in the millions. Ordinarily, frequency weights sum to the sample size, and kdensity has no way of knowing that this isn't the case here. It sees a huge "sample size" and innocently chooses a very small bandwidth. As a result, the density shows more detail than the data can support.
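
              A rough way to see the scale problem on your own data is sketched below. It assumes wgt is the probability weight and wt is the rounded frequency weight from the earlier post (both names are taken from that post; lnwage and female come from the original question):

              ```stata
              * Compare the actual number of observations with the weight totals
              summarize wgt, meanonly
              display "observations: " r(N) "   sum of pweights: " r(sum)

              summarize wt, meanonly
              display "sum of candidate fweights: " r(sum)

              * With aweights, kdensity sees the real sample size
              twoway (kdensity lnwage if female == 0 [aw = wgt]) ///
                     (kdensity lnwage if female == 1 [aw = wgt])
              ```

              If the fweight total is far larger than the number of observations, the default bandwidth under fweights will be correspondingly narrow.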
              Last edited by Steve Samuels; 16 Oct 2018, 09:22.


              • #8
                Thank you so much for your insightful suggestions! I'll go for aweight then.


                • #9
                  Hello Steve,
                  How is it that the formula Alina refers to (based on the post by Austin Nichols) results in frequency weights that sum to about 100 N? Alina does not provide details about her data, so I am left wondering how you know that the sum would be 100 N and not, for instance, some other multiple of N.
                  Thank you, Alvaro
