  • Extracting a representative sample from bulk data

    Dear statalist members,

    I have a large data set containing answers to a national online survey. It's a non-probability sample, and it's not a panel either. People entered and answered the questions within a given period, so we recorded a lot of data, along with the respondents' socio-demographic characteristics.

    I want to post-stratify the data.
    I used the ipfweight package to generate a weighting factor based on gender/region/age group strata, using census data to define the population margins. It worked perfectly. (Thank you, Michael Bergmann!)

    Now, is it possible to use the same strata to extract a (probabilistic) sample from my data, using a matching technique?

    Thank you very much!
    Cristian
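
    For readers unfamiliar with the weighting step described above: iterative proportional fitting (raking) repeatedly rescales the weights so that each variable's weighted shares hit the population margins in turn. The sketch below is a minimal, language-neutral illustration in Python, not ipfweight's implementation; the function name `rake`, the toy data, and the margins are all assumptions made for the example.

```python
def rake(rows, margins, max_iter=100, tol=1e-8):
    """Return one weight per row so weighted shares match `margins`.

    rows    : list of dicts, one per respondent
    margins : {variable: {category: population share}}, shares sum to 1
    """
    w = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            total = sum(w)
            # Current weighted total of each category of `var`.
            current = {}
            for wi, row in zip(w, rows):
                current[row[var]] = current.get(row[var], 0.0) + wi
            # Rescale every row so this variable's margin is met exactly.
            for i, row in enumerate(rows):
                factor = target[row[var]] * total / current[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # all adjustment factors are close to 1
            break
    return w


# Toy data (hypothetical): an online sample that over-represents young women.
rows = (
    [{"gender": "F", "age": "young"}] * 60
    + [{"gender": "F", "age": "old"}] * 20
    + [{"gender": "M", "age": "young"}] * 10
    + [{"gender": "M", "age": "old"}] * 10
)
margins = {
    "gender": {"F": 0.5, "M": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}
weights = rake(rows, margins)
```

    Raking only matches the margins of each variable separately; it does not guarantee that the joint gender-by-age cells match the population.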

  • #2
    I'm not entirely sure what you have in mind here, particularly your reference to a "matching technique" (what is to be matched to what?), but it looks to me like -gsample-, available from SSC, should do what you want. It offers various options for sampling with weights, stratification, and sampling with or without replacement.

    • #3
      Hello, Mike,

      Thank you very much for the reply! gsample looks and works great; however, I'm not sure that sampling from my data using weights is the appropriate way to do it. I suspect it may introduce further hidden problems.

      I'm sure I didn't explain this well. What I want to do is use a "correct", known (probabilistic) sample to extract from my data a "match" on certain variables present in both samples (in my case gender/region/age group). Say I have a standard probabilistic sample (balanced with respect to the entire population) that includes variables A, B and C (along with others). I have a second, imbalanced data set containing the same variables. How can I use the first data set to extract a balanced subset from my data?
      It's a usual approach for balancing two data sets ("matching"), but I don't know how to do it in Stata for my case.

      Thank you again!
      C.
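
      One possible reading of the request above is coarse, cell-based matching: cross the shared variables into cells, take each cell's share from the reference sample, and draw at random without replacement from the imbalanced data so that the subset reproduces those shares. The Python sketch below illustrates that reading only; it is not a Stata solution and not necessarily what the poster intends, and the names `matched_subsample`, `reference`, and `pool` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from fractions import Fraction


def matched_subsample(reference, pool, keys, seed=12345):
    """Draw a subset of `pool` whose cell shares follow `reference`.

    Cells are the cross-classification of `keys`. Rows of `pool` that fall
    in cells absent from `reference` are never selected.
    """
    def cell(row):
        return tuple(row[k] for k in keys)

    ref_counts = Counter(cell(r) for r in reference)
    n_ref = sum(ref_counts.values())

    by_cell = defaultdict(list)
    for row in pool:
        by_cell[cell(row)].append(row)

    # Largest subset size at which every reference cell can still be filled
    # without replacement. A reference cell missing from `pool` forces zero.
    size = min(
        Fraction(len(by_cell[c]) * n_ref, n) for c, n in ref_counts.items()
    )

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subset = []
    for c, n in ref_counts.items():
        k = math.floor(size * n / n_ref)  # rows to draw from this cell
        subset.extend(rng.sample(by_cell[c], k))
    return subset


# Toy data (hypothetical): the reference is split evenly by gender,
# while the pool has 30 women and 10 men.
reference = [{"gender": "F"}] * 50 + [{"gender": "M"}] * 50
pool = [{"gender": "F", "id": i} for i in range(30)] + [
    {"gender": "M", "id": 30 + i} for i in range(10)
]
subset = matched_subsample(reference, pool, keys=["gender"])
```

      In the toy data the scarcest cell (men) caps the subset at 10 men and therefore 10 women. That is the cost of this approach: the rarest cell determines how much data survives. The subset is balanced only on the matching variables, so it is still not a probability sample of the population.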

      • #4
        Sorry, but I'm farther away from understanding your goal than I was before. Maybe someone else will have better insight than I do.
