  • Svyset for multilevel analysis with two PSU variables

    This is my first time posting, so I apologize for any formatting errors. I am using the National Crime Victimization Survey (NCVS) to look at changes in crime reporting over time (a difference-in-differences analysis). The NCVS recommends using subgroups when conducting analyses, so the data need to be svyset (that is, I can't include the weighting in the model code itself), and I am having trouble svysetting the data for a multilevel analysis.

    The data are incidents nested in people nested in households, which are the primary sampling unit, and my analysis has incidents nested within women.

    NCVS actually provides 3 separate data files at each level and I am using the incident level file, because my outcome is at the incident level (proportion of incidents reported to police).

    For reference, the svyset used for non-multilevel analyses with this particular (incident-level) dataset is:

    svyset V2118 [pweight=SERIES_WEIGHT], strata(strat) vce(linearized)

    The model I am trying to run:

    svy, subpop(sub): mepoisson prop_report TRUMP##LATINAW i.YEAR V3040 i.EDU i.MARITAL || IDPER:


    I have come across two errors when trying different ways of setting the data:

    1. svyset IDHH, weight(WGTHHCY) || _n, weight(WGTPERCY)

    This code produced this error: all observations in subpop() subpopulation have zero weights.

    After going back through all of my data management, I realized that the way NCVS has one set up the data to calculate incident totals and proportions sets the household (WGTHHCY) and person (WGTPERCY) weights to zero, so I think this is why I am getting this error. I don't think I am able to set up the data any other way based on the NCVS documentation and so it seems like this way of svysetting the data may not be feasible.
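
    One way to confirm which weights vanish inside the subpopulation is to inspect them directly. The following is a minimal sketch, assuming sub is the 0/1 indicator passed to subpop() above; it uses only the variable names already mentioned in this post.

    ```stata
    * Summarize each candidate weight within the analysis subpopulation
    * (assumption: sub == 1 marks the subpopulation)
    summarize WGTHHCY WGTPERCY SERIES_WEIGHT if sub == 1

    * Count subpopulation incidents that carry a positive, nonmissing weight
    count if sub == 1 & WGTHHCY > 0 & !missing(WGTHHCY)
    count if sub == 1 & WGTPERCY > 0 & !missing(WGTPERCY)
    count if sub == 1 & SERIES_WEIGHT > 0 & !missing(SERIES_WEIGHT)
    ```

    If the first two counts are zero while the third is not, that would be consistent with the household and person weights being zeroed out on the incident-level records.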

    2. I then came across another post on this forum where the poster was doing something similar so I played around with their code and came up with this:

    svyset V2117 V2118, strata(strat) weight(SERIES_WEIGHT) vce(linearized)

    The code produced this error: too many variables specified

    I think the issue here is that NCVS has two PSU variables and so I am not sure how to incorporate both into the svyset code. I tried using just one or other but I received this error: hierarchical groups are not nested within V2117 an error occurred when svy executed mepoisson.

    Since the issue is with the weights, I am not sure sharing a data sample would be helpful but I can try to use dataex, if needed.

    Any help would be greatly appreciated!


  • #2
    I'm not sure I follow all the details here, but regarding your second question, have you tried:
    Code:
    egen `c(obs_t)' psu = group(V2117 V2118)
    svyset psu, strata(strat) weight(SERIES_WEIGHT) vce(linearized)
    This code says, in effect, that the primary sampling unit is given by the pair of values of V2117 and V2118. Is that what you want to do?

    Added: I am troubled by
    I tried using just one or other but I received this error: hierarchical groups are not nested within V2117 an error occurred when svy executed mepoisson.
    Your only hierarchical group in this model is IDPER, which I take to be the person ID. The error message is calling your attention to the requirement that persons must be nested within PSUs. If this is failing with psu = V2117 by itself, it will only get worse with a more restrictive definition of PSU, such as the one I have proposed here. If V2118 is the actual primary sampling unit used in data collection, then that is what you should be using in your -svyset- command: the psu (and higher sampling unit) structure does not change with the analysis.
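
    Whether persons are in fact nested within a candidate PSU variable can be checked before running the model. A minimal sketch, assuming IDPER identifies persons across the whole file; crosses_psu and one_per_person are hypothetical helper names introduced only for this check:

    ```stata
    * Within each person, sort by V2117 and flag persons whose first and
    * last values differ, i.e. whose incidents span more than one V2117 value
    bysort IDPER (V2117): generate byte crosses_psu = V2117[1] != V2117[_N]

    * Tag one observation per person so each person is counted once
    egen byte one_per_person = tag(IDPER)

    * Number of persons not nested within V2117
    count if crosses_psu & one_per_person
    ```

    A nonzero count means some person's incidents fall under more than one value of V2117, which is the condition the error message describes. The same check can be repeated for V2118 or for the combined psu variable.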

    I think the key issue here is with
    the way NCVS has one set up the data to calculate incident totals and proportions sets the household (WGTHHCY) and person (WGTPERCY) weights to zero, so I think this is why I am getting this error.
    The error in calculating weights is not something that can or should be corrected by tampering with the sampling design specification in svyset. It's a matter of calculating the weights properly. Unfortunately, I can't advise you on that based on what is shown here (and might not be able to do so even with more information). But I think you would be well advised to post back with an explanation, and probably with the code and example data (-dataex-, of course), that currently calculates these weights, and perhaps I, or somebody else following the thread, will be able to fix that. If nobody here can help you, you might have to contact NCVS for help.
    Last edited by Clyde Schechter; 01 Sep 2023, 11:19.



    • #3
      Thanks so much, Clyde! To answer some of your questions, here is the guidance from NCVS on the PSU variables:

      "In a household survey with a cluster sample design,* PSUs are typically geographic areas that are the first level of selection in the sample. On the NCVS PUF, the actual PSUs are not available because of disclosure concerns. Instead, PSUs are approximated with the pseudo-stratum† (V2117) and half-sample§ (V2118) codes."

      The weights are all included in the dataset, so I am not calculating them myself.

      In that case, would my svyset code be:

      Code:
      egen `c(obs_t)' psu = group(V2117 V2118)
      svyset psu, IDPER weight(WGTPERCY), _n (SERIES_WEIGHT)
      I am unsure about the _n, as my understanding is that this might not be appropriate for cluster sampling?

      Or, based on the other post I saw here:

      Code:
      egen `c(obs_t)' psu = group(V2117 V2118)
      svyset psu, strata(strat) weight(SERIES_WEIGHT) vce(linearized)
      Thanks again!

      Update-- I tried the egen command you suggested and received this error:

      . egen `c(obs_t)' psu = group(V2117 V2118)]
      varlist not allowed
      r(101);
      Last edited by Kristin Bevilacqua; 01 Sep 2023, 15:39.



      • #4
        There is an extraneous ] character at the end of that -egen- command. The error message is not really informative, even misleading, but I'm confident that ] is the problem.

        Also, I think that
        Code:
        svyset psu, IDPER weight(WGTPERCY), _n (SERIES_WEIGHT)
        should be
        Code:
        svyset psu, IDPER weight(WGTPERCY) || _n (SERIES_WEIGHT)
        with no second comma, and the || before _n.
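
        For reference, the multistage form of -svyset- puts each stage's sampling unit before its own comma and options, with stages separated by ||, and stage-level weights given in weight(). A sketch of that pattern with this thread's variable names follows; which weight belongs at which stage is an assumption here, not something the NCVS guidance quoted in this thread confirms:

        ```stata
        * General pattern (w1, w2, w3 are placeholder weight variables):
        *   svyset su1, strata(strata1) weight(w1) || su2, weight(w2) || _n, weight(w3)

        * Same pattern with the variables from this thread
        * (assumption: person-level weight at the IDPER stage,
        *  incident-level weight at the observation stage)
        svyset psu, strata(strat) || IDPER, weight(WGTPERCY) || _n, weight(SERIES_WEIGHT)

        * Display the resulting design settings to confirm how Stata parsed them
        svyset
        ```

        Typing svyset with no arguments afterward shows the stages as Stata recorded them, which makes it easier to spot a unit or weight that landed at the wrong stage.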
        Last edited by Clyde Schechter; 01 Sep 2023, 16:43.



        • #5
          Thank you so much, Clyde! For some reason, I am still getting the error that the weights in my subpopulation are zero when I run the regression code. I think I may need to reach out to NCVS to see if they have any guidance.
