  • Survey data: using the svy command

    Hello,

    I'm using survey data (example below). I'm trying to find out how many individuals I have in the sample and how many of them are male and female.
    When I use the command "tab i_sex", the results show 11 310 individuals (male: 41.36%, female: 58.64%).
    When I use weights with the command "svy: tab i_sex", Stata reports 11 136 individuals (male: 43.76%, female: 56.24%).
    My question is: why do the results differ?
    Thank you in advance.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(pidp pid) byte(i_pno i_sex) int i_dvage
       76165  10689869 1 2 34
      732365  15752658 1 1 32
     1587125  17870879 1 2 51
     4849085 176725733 1 1 34
    68002725  10023526 1 2 63
    68008847        -8 1 2 59
    68010887        -8 1 2 53
    68029931        -8 2 1 48
    68031967        -8 1 2 69
    68035365  10403086 1 1 65
    68035367        -8 1 1 36
    68041487        -8 1 2 47
    68041491        -8 2 1 44
    68045567        -8 1 2 55
    68051007        -8 1 1 56
    68051011        -8 2 2 49
    68058487        -8 1 1 77
    68058491        -8 2 2 68
    68060531        -8 2 2 44
    68060533 160066204 2 2 61
    68060537 160066239 3 1 73
    68063247        -8 1 2 50
    68063927        -8 1 2 47
    68063931        -8 2 1 49
    68064605  10653872 1 1 68
    68064609  10653902 2 2 65
    68068007        -8 1 1 50
    68068082        -8 2 1 56
    68097245  10913629 1 2 67
    68097927        -8 1 2 68
    68120367        -8 1 2 66
    68120375        -8 1 2 37
    68125127        -8 1 2 51
    68125131        -8 2 1 26
    68125135        -8 3 2 21
    68133285  11218193 1 2 68
    68133289  11218282 1 2 32
    68136009  11234989 1 2 65
    68137365  11240547 1 2 63
    68138045  11242787 1 1 68
    68138049  11242817 2 2 67
    68138051        -8 2 2 63
    68144847        -8 1 1 49
    68144851        -8 2 2 41
    68148247        -8 1 1 69
    68150971        -8 2 2 58
    68150975        -8 3 1 29
    68155047        -8 1 2 60
    68155051        -8 2 1 65
    68157771        -8 2 2 23
    68159131        -8 2 2 37
    68160485  11418567 1 2 66
    68160489  11418591 1 1 41
    68173407        -8 1 2 60
    68180887        -8 1 2 47
    68184971        -8 2 2 42
    68185647        -8 1 2 55
    68187687        -8 1 1 62
    68187691        -8 2 2 58
    68191771        -8 1 2 45
    68193127        -8 1 2 64
    68195167        -8 1 1 74
    68195171        -8 2 2 74
    68195851        -8 2 2 43
    68197887        -8 1 2 56
    68197899        -8 3 2 22
    68197903        -8 4 1 19
    68199247        -8 1 1 33
    68207407        -8 1 2 72
    68207411        -8 2 1 78
    68211487        -8 1 1 65
    68214207        -8 1 1 58
    68216247        -8 1 2 43
    68218287        -8 1 1 74
    68231223        -8 2 2 18
    68238011        -8 2 2 59
    68262487        -8 1 1 47
    68266567        -8 1 2 80
    68278127        -8 1 2 70
    68288327        -8 1 2 44
    68288331        -8 2 1 44
    68291731        -8 1 2 62
    68293087        -8 1 2 49
    68293091        -8 2 1 54
    68293095        -8 3 1 29
    68293099        -8 4 1 26
    68293168        -8 2 1 28
    68294447        -8 1 1 60
    68294451        -8 2 2 44
    68297845  12521361 1 1 63
    68297849  12521396 2 2 61
    68297857  12521469 1 2 30
    68299207        -8 1 2 60
    68302611        -8 2 2 74
    68309407        -8 1 2 62
    68321647        -8 1 1 60
    68321651        -8 2 2 60
    68322327        -8 1 2 50
    68322331        -8 2 1 57
    68329807        -8 1 2 51
    end
    label values pid pid
    label def pid -8 "inapplicable", modify
    label values i_sex i_sex
    label def i_sex 1 "male", modify
    label def i_sex 2 "female", modify
    label values i_dvage i_dvage

  • #2
    Regarding the question above, I think this difference (between counting observations with and without the svy prefix) is most probably caused by missing values in the survey design variables.
    Can you please tell me which command I can use to confirm whether the difference is due to missing data?

    Thank you in advance.



    • #3
      In general terms, in survey data, not everyone has the same probability of being surveyed. For example, you might oversample racial minorities.

      In the US at least, the government sponsors several surveys that are representative of the full US population. If you summed the probability weights of all the persons in such a survey, you'd get a total equal to the estimated US (civilian, noninstitutionalized) population in the year in question. Thus, in general, a tabulation of the raw data will differ from svy tab.
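      To illustrate why the weighted and unweighted percentages differ, here is a minimal sketch on made-up toy data. The weight variable wt and its values are hypothetical (the real survey's weight variable is named in its documentation). Females are assumed to be sampled at twice the rate of males, so each female gets pweight 1/0.02 = 50 and each male 1/0.01 = 100.

```stata
* Hypothetical toy data: 2 males (pweight 100) and 4 females (pweight 50)
clear
input byte i_sex double wt
1 100
1 100
2 50
2 50
2 50
2 50
end
label define i_sex 1 "male" 2 "female"
label values i_sex i_sex

* Unweighted: 2 of 6 respondents are male (33.33%)
tabulate i_sex

* Sum of the weights estimates the population size (400 here, 200 per sex)
tabstat wt, by(i_sex) statistics(sum)

* Weighted: male share = 200/400 = 50%, because females were oversampled
svyset [pweight = wt]
svy: tabulate i_sex, count cell percent
```

      The raw tabulation describes the sample; svy: tabulate estimates the population shares, which is why the two sets of percentages need not agree.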

      Now, the odd part is that you actually have fewer people with svy tab than with a normal tabulation. Normally, the probability weight is 1 / (the probability of being sampled). In the big US surveys, I know for a fact that the pweights are usually several thousand per observation. As to your question, it does seem possible that, for whatever reason, the survey designers gave some people a pweight of zero, so they don't affect svy tabulations at all.

      I can't say this for a fact, because your sample data don't include the probability weights or the strata ID, and you don't show a svyset statement. In your sample, most respondents have a pid of -8. That may be a person ID, and -8 may mean they aren't supposed to be included in final calculations for some reason.
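      A sketch of how one might check this, again on hypothetical toy data: wt is a made-up weight variable with one missing and one zero value. On the real data, substitute the weight (and any strata and PSU variables) from your svyset statement and compare the counts with the "Number of obs" that svy reports.

```stata
* Hypothetical toy data: one missing weight and one zero weight
clear
input byte i_sex double wt
1 100
1 .
2 50
2 0
2 50
end

svyset [pweight = wt]

* Raw number of observations (what a plain -tabulate- uses)
count

* Observations svy cannot use because the weight is missing
count if missing(wt)

* Observations with a zero weight: check whether these explain any remaining gap
count if wt == 0

* Number of observations svy actually used
svy: tabulate i_sex
display "svy used " e(N) " observations"
```

      If the gap between the plain and svy counts equals the number of missing (or zero) design values, that accounts for the difference.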
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
