  • Testing statistical differences of survey data with population data

    Dear Forum members,

    I would like to test if a categorical survey variable is statistically different from population (i.e. census) data.

    I found the -csgof- command which would be a good solution. However, I would like to account for the survey design, and csgof does not support -svy-. A similar command, mgof, does support -svy- but I do not seem to be able to define the population values.

    In more detail: with -csgof- I can specify the exact distribution of the population data, e.g. csgof age, expperc(25 22 18 20 15). When I try the same with -mgof- and display the percentages (mgof age = (25 22 18 20 15), svy percent), the output shows that the values I specified were not taken into account.

    It would be great if you could help me with a solution for this or if you could recommend another method.

    Thanks a lot!
    Susanne

  • #2
    We have no idea what you are seeing, so we cannot really diagnose your problem. Show us everything, including a tabulation of age and the svyset statement. As FAQ 12 directs, put commands, data listings and results between CODE delimiters.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      Dear Steve,
      thanks for answering. And sorry for not having been clear.

      What I would like to do is a Chi2-test to test the distribution of my survey variable "age" against a known population distribution. Like this:

      csgof age_comp5, expperc(25 22 18 20 15)

      +----------------------------------------+
      | age_co~5 expperc expfreq obsfreq |
      |----------------------------------------|
      | 16-25 25 2560 1,807 |
      | 26-35 22 2252.8 1,952 |
      | 36-45 18 1843.2 2,263 |
      | 46-55 20 2048 2,506 |
      | 56-65 15 1536 1,712 |
      +----------------------------------------+

      chisq(4) is 479.85, p = 0

      However, I would also like to account for the survey design and "csgof" does not support "svy":

      svyset [pw=SPFWT0], jkrw(SPFWT1-SPFWT80, mult(.9875)) vce(jackknife)

      pweight: SPFWT0
      VCE: jackknife
      MSE: off
      jkrweight: SPFWT1 SPFWT2 SPFWT3 SPFWT4 SPFWT5 SPFWT6 SPFWT7 SPFWT8 SPFWT9 SPFWT10 SPFWT11 SPFWT12 SPFWT13
      SPFWT14 SPFWT15 SPFWT16 SPFWT17 SPFWT18 SPFWT19 SPFWT20 SPFWT21 SPFWT22 SPFWT23 SPFWT24 SPFWT25
      SPFWT26 SPFWT27 SPFWT28 SPFWT29 SPFWT30 SPFWT31 SPFWT32 SPFWT33 SPFWT34 SPFWT35 SPFWT36 SPFWT37
      SPFWT38 SPFWT39 SPFWT40 SPFWT41 SPFWT42 SPFWT43 SPFWT44 SPFWT45 SPFWT46 SPFWT47 SPFWT48 SPFWT49
      SPFWT50 SPFWT51 SPFWT52 SPFWT53 SPFWT54 SPFWT55 SPFWT56 SPFWT57 SPFWT58 SPFWT59 SPFWT60 SPFWT61
      SPFWT62 SPFWT63 SPFWT64 SPFWT65 SPFWT66 SPFWT67 SPFWT68 SPFWT69 SPFWT70 SPFWT71 SPFWT72 SPFWT73
      SPFWT74 SPFWT75 SPFWT76 SPFWT77 SPFWT78 SPFWT79 SPFWT80
      Single unit: missing
      Strata 1: <one>
      SU 1: <observations>
      FPC 1: <zero>

      .
      . svy: csgof age_comp5, expperc(25 22 18 20 15)

      csgof is not supported by svy with vce(jackknife); see help svy estimation for a list of Stata estimation commands
      that are supported by svy
      r(322);

      So I tried to run a Chi2-test with "mgof", but I do not seem to be able to specify the expected proportions. In the output below, the expected proportions all come out equal (20% each):

      . mgof age_comp5 = (25 22 18 20 15), svy percent

      Number of strata = 1 Number of obs = 5465
      Number of PSUs = . Pop size = 5.4e+07
      Design df = 79
      N of outcomes = 5
      F df1 = 1.95067
      F df2 = 154.103

      ---------------------------------------------------------
      Goodness-of-fit | Coef. F-value P-value
      ----------------------+----------------------------------
      Pearson's X2 | 105.7478 1549.8159 0.0000
      Log likelihood ratio | 104.1552 1526.4745 0.0000
      ---------------------------------------------------------

      age_comp5 | observed expected
      -------------+----------------------
      1 | 17.34 20.00
      2 | 18.04 20.00
      3 | 23.05 20.00
      4 | 23.72 20.00
      5 | 17.85 20.00
      -------------+----------------------
      Total | 100.00 100.00

      Is there a way to run the mgof command while specifying the expected proportions?

      Thanks again!
      Susanne









      • #4
        Dear Susanne, I asked that you put code and results between CODE delimiters, as described in FAQ 12. Please do so.
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2



        • #5
          Ok, here it is:

          What I would like to do is a Chi2-test to test the distribution of my survey variable "age" against a known population distribution. Like this:

          Code:
          csgof age_comp5, expperc(25 22 18 20 15)
          Here is the result:

          Code:
          +----------------------------------------+
          | age_co~5 expperc expfreq obsfreq |
          |----------------------------------------|
          | 16-25 25 2560 1,807 |
          | 26-35 22 2252.8 1,952 |
          | 36-45 18 1843.2 2,263 |
          | 46-55 20 2048 2,506 |
          | 56-65 15 1536 1,712 |
          +----------------------------------------+
          
          chisq(4) is 479.85, p = 0
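          For reference, the csgof statistic above can be reproduced by hand from the table. The sketch below assumes only the obsfreq and expfreq columns shown; the scalar name X2 is made up for illustration. It also makes visible that the expected frequencies are expperc/100 times the unweighted total of 10,240 observations, so this test takes no account of the survey weights.

          ```stata
          * Pearson chi-square: sum over categories of (observed - expected)^2 / expected
          scalar X2 = (1807-2560)^2/2560 + (1952-2252.8)^2/2252.8 + (2263-1843.2)^2/1843.2
          scalar X2 = X2 + (2506-2048)^2/2048 + (1712-1536)^2/1536
          display X2
          * upper-tail p-value on 5 - 1 = 4 degrees of freedom
          display chi2tail(4, X2)
          ```

          The first display should come out at roughly 479.85, in line with the csgof result.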
          However, I would also like to account for the survey design, and "csgof" does not support "svy". Here is my svyset command:

          Code:
          svyset [pw=SPFWT0], jkrw(SPFWT1-SPFWT80, mult(.9875)) vce(jackknife)
          Here is the output of svyset:
          Code:
          pweight: SPFWT0
          VCE: jackknife
          MSE: off
          jkrweight: SPFWT1 SPFWT2 SPFWT3 SPFWT4 SPFWT5 SPFWT6 SPFWT7 SPFWT8 SPFWT9 SPFWT10 SPFWT11 SPFWT12 SPFWT13
          SPFWT14 SPFWT15 SPFWT16 SPFWT17 SPFWT18 SPFWT19 SPFWT20 SPFWT21 SPFWT22 SPFWT23 SPFWT24 SPFWT25
          SPFWT26 SPFWT27 SPFWT28 SPFWT29 SPFWT30 SPFWT31 SPFWT32 SPFWT33 SPFWT34 SPFWT35 SPFWT36 SPFWT37
          SPFWT38 SPFWT39 SPFWT40 SPFWT41 SPFWT42 SPFWT43 SPFWT44 SPFWT45 SPFWT46 SPFWT47 SPFWT48 SPFWT49
          SPFWT50 SPFWT51 SPFWT52 SPFWT53 SPFWT54 SPFWT55 SPFWT56 SPFWT57 SPFWT58 SPFWT59 SPFWT60 SPFWT61
          SPFWT62 SPFWT63 SPFWT64 SPFWT65 SPFWT66 SPFWT67 SPFWT68 SPFWT69 SPFWT70 SPFWT71 SPFWT72 SPFWT73
          SPFWT74 SPFWT75 SPFWT76 SPFWT77 SPFWT78 SPFWT79 SPFWT80
          Single unit: missing
          Strata 1: <one>
          SU 1: <observations>
          FPC 1: <zero>
          Csgof does not support "svy":

          Code:
           . svy: csgof age_comp5, expperc(25 22 18 20 15)
          Here is the error message:

          Code:
          csgof is not supported by svy with vce(jackknife); see help svy estimation for a list of Stata estimation commands
          that are supported by svy
          r(322);
          So I tried to run a Chi2-test with "mgof", but I do not seem to be able to specify what the expected proportions are.

          Code:
          mgof age_comp5 = (25 22 18 20 15), svy percent
          In the result you can see that all expected proportions are equal:

          Code:
          Number of strata = 1 Number of obs = 5465
          Number of PSUs = . Pop size = 5.4e+07
          Design df = 79
          N of outcomes = 5
          F df1 = 1.95067
          F df2 = 154.103
          
          ---------------------------------------------------------
          Goodness-of-fit | Coef. F-value P-value
          ----------------------+----------------------------------
          Pearson's X2 | 105.7478 1549.8159 0.0000
          Log likelihood ratio | 104.1552 1526.4745 0.0000
          ---------------------------------------------------------
          
          age_comp5 | observed expected
          -------------+----------------------
          1 | 17.34 20.00
          2 | 18.04 20.00
          3 | 23.05 20.00
          4 | 23.72 20.00
          5 | 17.85 20.00
          -------------+----------------------
          Total | 100.00 100.00

          Is there a way to run the mgof command while specifying the expected proportions?

          Thanks again!
          Susanne










          • #6
            Thanks for the code delimiters.

            There is nothing in the help for mgof to show that it would accept the csgof-style list of percentages that you gave it:

            Code:
             mgof age = (25 22 18 20 15)
            I think that mgof should have issued an error message when you tried, but it just fell back to its default test against a uniform distribution.

            What will work is to supply the expected proportions in a variable (taken from an example in the help):

            Code:
            recode age (1=.25) (2=.22) (3=.18) (4=.20) (5=.15), gen(eprop)
            mgof age = eprop, svy percent
            I think that hypothesis testing is of questionable value here. No null hypothesis about descriptive attributes will be exactly true in finite populations (Cochran, 1977, p. 39). Imagine a census of two populations: their age distributions will never be exactly the same. So even if your test is non-significant, you can't claim equality. The real question is how different, or how close, the distributions are, which you can also examine with confidence intervals.
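            A minimal sketch of the confidence-interval approach, assuming the svyset from post #5 is already in effect and the variable is named age_comp5 as in that post. The census shares (.25 .22 .18 .20 .15) are then compared with the intervals by eye; this is one possible way to look at it, not a formal test.

            ```stata
            * Design-based category proportions with jackknife standard errors and CIs
            svy: proportion age_comp5

            * The same estimates as a one-way table, shown as percentages with CIs
            svy: tabulate age_comp5, percent ci
            ```

            How far each census share lies outside (or inside) the corresponding interval gives a direct sense of the size of the discrepancy, which the chi-square statistic alone does not.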
            Last edited by Steve Samuels; 30 Jan 2016, 15:28.
            Steve Samuels
            Statistical Consulting
            [email protected]

            Stata 14.2



            • #7
              I apologize for the cursory reference. It is: William Cochran, Sampling Techniques, Wiley, New York, 1977.
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2



              • #8
                Dear Steve,

                thank you, that was very helpful advice!

                I am aware of the limitations of any statistical test in this situation and planned to examine the CIs in any case. However, I will discuss different tests (such as Chi2) in my paper, and the reference you gave is a useful addition to that discussion.

                Regards
                Susanne
