  • How to write correct svyset?

    Dear all,

    I am using Labor Force Survey data in a logit analysis, and therefore need help with svyset before I run my regression.
    However, I am having trouble doing this as I am not very familiar with svyset. (I am only analyzing a sub-population, only individuals who are employed.)

    I am supposed to follow this structure according to: https://www.stata.com/support/faqs/s...stage-designs/

    Code:
     svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
         || su2, fpc(fpc2) || _n, fpc(fpc3)

    Is the following correct? Should the urban/rural variable be included somewhere?
    Code:
    svyset psu [pweight = weight], strata(prov)
    Any pointers are greatly appreciated!

    Thank you very much!


    - - - -
    The sampling design is as follows according to: http://catalog.ihsn.org/index.php/ca...tab=study-desc

    The sample size of the survey is 50,640 households per quarter, equivalent to 16,880 households per month. Sample size was designed to ensure the statistical significance of data for region by quarter and for province by year. Households were randomly selected from the 15% sample enumeration areas of the Population and Housing Census 2009 following a two-stage procedure:
    1. Selecting enumeration areas
    2. Selecting households. All residents ages 15 and above were interviewed and enumerated.

    Sample Frame: The sample of the 2013 Labour force survey is the two-stage stratified sample, presented for the whole country, urban/rural areas; 6 socio-economic regions, Hanoi and Ho Chi Minh City for quarterly and all centrally governed cities/provinces for yearly. Each centrally governed province, city constitutes a main stratum with two sub-stratums of urban areas and rural areas. The sample frame is the 15% sample enumeration areas of the 2009 Population and Housing Census.

    Sample design: The survey followed a two-stage stratified sampling procedure designed as follows:
    - Stage 1 (selecting enumeration areas): Each centrally governed city/province constitutes a main stratum, after that, each main stratum was divided into 2 sub-stratums within each representing "urban" and "rural" areas. Then, the list of enumeration areas of cities/provinces (the master sampling frame was taken from the sampling frame 15% of the Population and Housing Census 2009) was divided into 2 independent samples (urban and rural) and enumeration areas were chosen by the Kish method.

    - Stage 2 (selecting households): for each enumeration area defined in stage 1, 15 enumeration households (55 provinces) or 20 enumeration households (8 provinces: ) were systematically chosen.
    Last edited by Kim Veloso; 22 Jun 2018, 11:17.

  • #2
    You haven't given us much to go on. Name the important design variables mentioned in the document: main strata; the urban/rural variable (if there is one); enumeration areas; household ID; respondent ID; and the design weight.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      Steve,

      Allow me to join this thread as this is relevant to the ongoing discussion.

      I have a confidentialised 1% sample of the census population from the Australian Bureau of Statistics. The data methodology is given here:
      HTML Code:
      http://www.abs.gov.au/ausstats/[email protected]/Latestproducts/2037.0.30.001Main%20Features202011?opendocument&tabname=Summary&prodno=2037.0.30.001&issue=2011&num=&view=
      The data are at three levels, i.e. individual, family and dwelling, with respective IDs ABSPID, ABSFID and ABSHID. The sample covers the whole country, with areas divided into states (STATE). The household ID is ABSFID, the respondent ID is ABSPID, and the dwelling ID is ABSHID, but the design weight is not given. A sample of some key variables, including sex and age, is as follows:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 ABSHID byte(ABSPID ABSFID Sex) float agegroup long STATE
      "CSF11B00000001" 0 1 2 6 1
      "CSF11B00000001" 0 1 1 3 1
      "CSF11B00000002" 0 1 2 4 1
      "CSF11B00000002" 0 1 1 3 1
      "CSF11B00000003" 0 1 1 1 1
      "CSF11B00000003" 0 1 2 5 1
      "CSF11B00000003" 0 1 1 1 1
      "CSF11B00000004" 0 1 1 5 1
      "CSF11B00000005" 0 1 2 9 1
      "CSF11B00000006" 0 1 1 4 1
      end
      label values Sex SEXP
      label def SEXP 1 "1. Male", modify
      label def SEXP 2 "2. Female", modify
      label values agegroup agegrouplbl
      label def agegrouplbl 1 "under 16", modify
      label def agegrouplbl 3 "20-29", modify
      label def agegrouplbl 4 "30-39", modify
      label def agegrouplbl 5 "40-49", modify
      label def agegrouplbl 6 "50-59", modify
      label def agegrouplbl 9 "85+", modify
      label values STATE STATE
      label def STATE 1 "NSW", modify
      My trouble is how to calculate and apply the weights and then survey-set my data. Given that we can calculate the weight as (sample size/population size), I am wondering whether doing that won't simply give me a single number for the weights at each level (individual, family and household). The methodology file accompanying the data says the ideal PSU is the dwelling, but I want to use the individual as the unit of analysis. It goes into detail on what to do to avoid misspecification, which I completely understand.


      Would you please advise on how I would go about survey-setting this dataset using the calculated weights and the individual as the unit of analysis?
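
For illustration only, here is a minimal sketch of what a single constant weight would look like. It assumes an equal-probability 1% sample, which the methodology file may or may not confirm; under that assumption the sampling weight is the inverse of the sampling fraction (population size/sample size), so every record gets the same value. The names dwelling and wt are hypothetical and not part of the ABS file.

```stata
* Assumption: every unit was selected with probability 0.01 (not confirmed by the source).
* The sampling weight is the inverse of the selection probability: 1/0.01 = 100.
generate double wt = 1/0.01

* Numeric dwelling identifier built from the string ID ABSHID (hypothetical name).
egen long dwelling = group(ABSHID)

* Dwelling as the PSU, as the methodology file suggests; individuals remain the
* unit of analysis because each row in the data is a person.
svyset dwelling [pweight = wt]

* Example: weighted proportions by sex, with standard errors clustered on dwelling.
svy: proportion Sex
```

A constant weight leaves estimated means and proportions unchanged and only scales totals; declaring the dwelling as the PSU is what affects the standard errors.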



      Best regards.

      Sunganani Kalemba
      PhD Student.
      Queensland



      • #4
        Please ask this question in a new topic, sunga. Kim's question concerns details of the Vietnam survey, and I hope to pursue those further. I'd like to remind you of the strong preference on Statalist for using full real names.
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2



        • #5
          I found the following data description: http://catalog.ihsn.org/index.php/ca...36/datafile/F1

          However it does not include the variable that is the sampling weight, so it's up to you to find that in the data. It's not necessary to include a sampling stage for households, as in the absence of finite population corrections, only between-PSU variation counts. Since all eligible respondents in a HH are studied, they receive the HH sampling weight.
          • tinh: Province/City (Stratum)
          • diaban: Enumeration Area Number (PSU)
          • hoso: Household Number
          • stt: ID number
          • ttnt: Urban/Rural (Substratum)

          The following code will work for individual and HH responses:
          Code:
          egen xstratum = group(tinh ttnt)   // This creates a separate category for urban and rural areas in each stratum.
          svyset diaban [pw = hhweight], strata(xstratum)   // hhweight is a placeholder: substitute the household sampling weight variable from your data
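
Since the original question concerns a logit on employed individuals only, a sketch of the subpopulation step may help. It assumes the design has already been declared with svyset as above. The names employed, insub, outcome, age, and female are hypothetical stand-ins for variables in the data.

```stata
* Assumes svyset has already been run with the PSU, weight, and strata shown above.
* All variable names below are hypothetical placeholders.

* Indicator for the subpopulation of interest: 1 = employed, 0 = everyone else.
generate byte insub = (employed == 1)

* Fit the logit on the employed subpopulation while keeping the full sample,
* so the design-based standard errors use all PSUs and strata.
svy, subpop(insub): logit outcome age i.female
```

Using subpop() rather than dropping the non-employed or adding an if qualifier keeps the full design information available for variance estimation.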
          Last edited by Steve Samuels; 03 Jul 2018, 17:04.
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2



          • #6
            My apologies, Steve.

            I am still getting my head around posting and updating my profile; excuse me for that. I have proceeded as advised.

            Sunganani Kalemba
            PhD Student.
            Queensland



            • #7
              Thank you very much for your help, Mr. Samuels! I highly appreciate it!



              • #8
                Dear all, Thank you very much!! I have learned a lot from your daily posts.
                Respectfully, Hassen
