  • How to do proportionate stratified sampling without replacement in STATA 13

    Hello everybody, I want to select my sample in STATA 13 based on three stratum variables with 12 strata in total (size: two categories; sector: three categories; intangible intensity: two categories). The selection should be proportional and without replacement.
    However, I can only find disproportionate selection commands that select, for instance, x% of each stratum.
    Can anyone help me out with this problem? Thanks in advance for your comments.
    Best,
    Tobias

  • #2
    Cross-posted at http://stackoverflow.com/questions/3...ut-replacement

    Please read and note our cross-posting policy, which is that you should tell us about it: http://www.statalist.org/forums/help#crossposting

    I'd recommend reading the whole document, up to http://www.statalist.org/forums/help#spelling



    • #3
      "Proportional selection" is not clear terminology, because it could also mean "sampling primary sampling units with probability proportional to size". You are, apparently, speaking of stratified simple random sampling with proportional allocation to strata. This means sampling the same percentage (i.e., proportion x 100) of the population in each stratum. (The percentages cannot always be exactly equal, because stratum sample size \(n\) and population size \(N\) are discrete.) You haven't shown us what commands you've tried (as the FAQ asks), but those that let you specify the percentage sampled do exactly what you ask for.
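
      As a minimal sketch of that last point, official Stata's -sample- command with the by() option draws the stated percentage without replacement within each group. The auto dataset, the 25% figure, the seed, and the variable names N_h, n_h, and pct_h are illustrative assumptions, not taken from the original question.

      ```stata
      * Sketch: proportional allocation with official -sample-, without replacement
      sysuse auto, clear
      set seed 20160505                    // arbitrary seed, only for reproducibility
      bysort foreign: generate N_h = _N    // stratum population size (hypothetical name)
      sample 25, by(foreign)               // keep about 25% of each stratum
      bysort foreign: generate n_h = _N    // stratum sample size (hypothetical name)
      generate pct_h = 100 * n_h / N_h     // realized sampling percentage per stratum
      tabstat N_h n_h pct_h, by(foreign) statistics(mean)
      ```

      The realized percentages in pct_h may differ slightly between strata when 25% of a stratum is not a whole number.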
      Last edited by Steve Samuels; 05 May 2016, 14:52.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Thank you for your replies. Please don't be offended that I didn't follow the forum policy. I just didn't know but will follow it from now on. Yes, I also asked the question at stackoverflow.com due to the fact that this is my first question in these forums.

        Steve's description is what I have in mind; my (especially German) literature is referring to this problem with the term proportionate sampling, sorry for the confusion.

        I had a look at three different commands or command families: "gsample", "sample" and "svyset", as well as related examples on the web, e.g. http://www.ats.ucla.edu/stat/stata/s...r_sampling.htm
        http://www.ats.ucla.edu/stat/stata/faq/gsample.htm

        However, I did not find a solution for my problem. My problem is the link between the literature (Steve's description) and the implementation in Stata. I know, for instance, that x% of the population are large firms and that y% fall into the manufacturing sector. Is there a way to tell Stata that the final sample should maintain these fractions? Is this approach / goal right, or did I get the idea of proportionate stratified sampling wrong?
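
        A short derivation of why a common sampling percentage keeps the population shares, ignoring rounding. The symbols are notation introduced here: \(f\) is the common sampling fraction, \(N_h\) and \(n_h\) are the population and sample sizes of stratum \(h\), and \(N\) and \(n\) are the totals.

        \[
        n_h = f N_h
        \quad\Longrightarrow\quad
        \frac{n_h}{n} = \frac{f N_h}{\sum_k f N_k} = \frac{N_h}{N}
        \]

        For example, with \(f = 0.2\), a stratum of \(N_h = 370\) in a population of \(N = 690\) gives \(n_h = 74\) and \(n = 138\), so \(74/138 = 370/690 \approx 0.536\). The same argument applies to any marginal share, such as all large firms or all manufacturing firms, because a margin is a sum of strata.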

        I think that "gsample" combined with frequency weights goes in this direction (see also the second link above), but as far as I understand it, gsample cannot work with more than one stratum variable. In my sample there are three stratum variables (size, sector, intangible intensity).

        Thank you again in advance for your comments.
        Last edited by Tobias Nell; 06 May 2016, 02:17. Reason: added reference to gsample example



        • #5
          Thank you for this discussion. I think I know where my problem was.

          The command "gsample" can select strata based on different variables. Therefore, I thought I had to define three different stratum variables. But the solution should be simpler.

          There are 12 strata in total (the large firms with high intensity in sector 1, the small firms with high intensity in sector 1, and so on), with each firm in the sample falling into exactly one of the strata.

          All I have to do is create a variable "strataident" with values from 1 to 12 identifying the different strata. I do this for the population dataset, so the number of firms falling into each stratum is representative of the population. The following command will then give me a stratified random sample that is representative of the population.

          Code:
          gsample 10, percent strata(strataident) wor
          Am I right? Once again, I look forward to your comments.
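
          One possible way to build such an identifier is egen's group() function, sketched here on simulated data. The variable names size, sector, and intensity, the 1,200 observations, and the seed are assumptions for illustration; gsample is a user-written command that, as far as I know, has to be installed from SSC together with moremata.

          ```stata
          * Sketch on simulated data: one stratum identifier from three variables
          * ssc install moremata               // assumed prerequisite of gsample
          * ssc install gsample                // user-written command from SSC
          clear
          set seed 12345                       // arbitrary seed
          set obs 1200                         // hypothetical population size
          generate byte size      = 1 + floor(2 * runiform())   // 2 categories
          generate byte sector    = 1 + floor(3 * runiform())   // 3 categories
          generate byte intensity = 1 + floor(2 * runiform())   // 2 categories
          egen strataident = group(size sector intensity)       // values 1 to 12
          tabulate strataident                 // stratum sizes in the population
          gsample 10, percent strata(strataident) wor
          tabulate strataident                 // stratum sizes in the sample
          ```

          Comparing the two tabulations shows whether the stratum shares carried over to the sample.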
          Last edited by Tobias Nell; 06 May 2016, 06:10.



          • #6

            Your version is okay, but more direct would be:
            Code:
            gsample 10, percent wor strata(size sector intensity)
            Here's an illustration. Notice all percentages in the sampled data are identical.
            Code:
            . sysuse auto, clear
            . recode rep78 1/2 = 3
            . expand 10
            (666 observations created)
            
            . tab rep78 foreign, cell
            
                Repair |
                Record |       Car type
                  1978 |  Domestic    Foreign |     Total
            -----------+----------------------+----------
                     3 |       370         30 |       400
                       |     53.62       4.35 |     57.97
            -----------+----------------------+----------
                     4 |        90         90 |       180
                       |     13.04      13.04 |     26.09
            -----------+----------------------+----------
                     5 |        20         90 |       110
                       |      2.90      13.04 |     15.94
            -----------+----------------------+----------
                 Total |       480        210 |       690
                       |     69.57      30.43 |    100.00
            
            
            . gsample 20, percent wor strata(rep78 foreign)
            (602 observations deleted)
            
            . tab rep78 foreign, cell
            
            
                Repair |
                Record |       Car type
                  1978 |  Domestic    Foreign |     Total
            -----------+----------------------+----------
                     3 |        74          6 |        80
                       |     53.62       4.35 |     57.97
            -----------+----------------------+----------
                     4 |        18         18 |        36
                       |     13.04      13.04 |     26.09
            -----------+----------------------+----------
                     5 |         4         18 |        22
                       |      2.90      13.04 |     15.94
            -----------+----------------------+----------
                 Total |        96         42 |       138
                       |     69.57      30.43 |    100.00
            I have to say that this thread illustrates why we ask posters (in FAQ 12) to describe their data and show the commands tried and the results, rather than make vague statements that something "doesn't work".
            Last edited by Steve Samuels; 06 May 2016, 07:40.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              Thanks a lot! The problem on my side was that I don't have the final population data yet. But yes, I could have tried it with the example data.

              Next time I will follow your suggestion.



              • #8
                Hi,
                I would like to select a random sample from my data set under the following conditions:
                • Two sites: site A = 450 and site B = 300
                • The age criteria for each site are given below:
                  1. 20% 0-<2y
                  2. 20% 2-<5y
                  3. 20% 5-<10y
                  4. 20% 10-<15y
                  5. 20% 15-<25y
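
                A rough sketch of one reading of these conditions, on simulated data: it assumes that 450 and 300 are the target sample sizes per site, so that 20% per age band means 90 and 60 observations per cell. The variable names site, age, agegrp, target, and u, the 5,000 simulated records, and the seed are all hypothetical.

                ```stata
                * Sketch: fixed number of draws per site-by-age cell, without replacement
                clear
                set seed 12345                                   // arbitrary seed
                set obs 5000                                     // hypothetical source data
                generate str1 site = cond(runiform() < 0.6, "A", "B")
                generate age = 25 * runiform()                   // simulated ages in years
                egen byte agegrp = cut(age), at(0 2 5 10 15 25) icodes   // 5 age bands, coded 0-4
                generate target = cond(site == "A", 90, 60)      // 20% of 450 and of 300 (assumed reading)
                generate double u = runiform()                   // random sort key
                bysort site agegrp (u): keep if _n <= target     // first 'target' rows in random order
                tabulate agegrp site                             // check the cell counts
                ```

                If a cell in the real data holds fewer records than its target, this keeps all of them, so the cell counts should be checked before sampling.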
