  • Chi2 test with if command

    Dear Statalisters,

    In my current project I would like to compare an attribute of my sample with the same attribute in the population, in order to check whether the sample is representative of the population. For example, my sample includes the age of firms, which I've also collected through a survey. I sent the survey to all firms in a database (n=100) and received answers from about 50 (just as an example). Now I want to compare the age distribution of the sample with the age distribution of the population database. My Stata dataset looks like this:
    Firm     Survey  AGE_Survey  AGE_DB
    Firm 1   1       5           5
    Firm 2   1       2           2
    Firm 3   0                   3
    Firm 4   0                   2
    (AGE_Survey: age reported by the survey respondents, n=50; AGE_DB: age in the database, n=100)
    If I want to see the age distribution of the sample, I would for example use the following command (to omit the empty cells): tabulate AGE_Survey if Survey==1

    How can I compare the age distribution of AGE_Survey (Survey==1) with the age distribution of AGE_DB using a chi2 or similar test? I want Stata to omit the empty cells of the AGE_Survey column when comparing.

    I tried something like tab AGE_Survey if Survey==1 AGE_DB, chi2 but it did not work.

    Maybe someone has a recommendation?

    Regards,
    Alex
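    One way to frame the comparison asked for above is as a Pearson goodness-of-fit test: are the respondents' age counts consistent with the age shares in the full database? The sketch below computes it by hand from the example rows rather than with any user-written command; the variables n_db, n_obs, expected and contrib are hypothetical names. It assumes, as in the example, that a respondent's AGE_Survey equals its AGE_DB, so AGE_DB is used for both groups. It also treats the database shares as fixed and ignores that the respondents are part of the database, and the chi-square approximation needs reasonably large expected counts (a common rule of thumb is at least 5 per category), which four firms cannot provide.

    ```stata
    clear
    input Survey AGE_Survey AGE_DB Firm
    1 5 5 1
    1 2 2 2
    0 . 3 3
    0 . 2 4
    end

    // Database (population) count of firms in each age category
    bysort AGE_DB: generate n_db = _N
    // Number of survey respondents in each age category
    bysort AGE_DB: egen n_obs = total(Survey == 1)
    // Keep one row per age category
    bysort AGE_DB: keep if _n == 1

    // Total respondents and total firms in the database
    quietly summarize n_obs
    local n_resp = r(sum)
    quietly summarize n_db
    local n_pop = r(sum)

    // Expected respondent counts if respondents followed the database shares
    generate double expected = `n_resp' * n_db / `n_pop'

    // Pearson goodness-of-fit statistic with (categories - 1) degrees of freedom
    generate double contrib = (n_obs - expected)^2 / expected
    quietly summarize contrib
    local chi2 = r(sum)
    local df = _N - 1
    display "chi2(" `df' ") = " %6.3f `chi2' "   p = " %6.4f chi2tail(`df', `chi2')
    ```

    With real data, sparse age values would usually need to be grouped into bands before running such a test.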

  • #2
    Hello Christian,

    Welcome to the Stata Forum.

    You may try something like this:

    Code:
    * assumes the data have already been declared with svyset
    svy, subpop(if Survey == 1): tabulate AGE_Survey AGE_DB, row
    Hopefully that helps!

    Best,

    Marcos


    • #3
      Dear Marcos,

      Thank you for the fast reply. I think this goes in the right direction. However, Stata only uses the values of AGE_DB that meet the condition Survey==1. I want Stata to apply Survey==1 only to AGE_Survey and not to AGE_DB.

      Is there a way to tell Stata that?

      Regards,
      Christian (Christian-Alexander)


      • #4
        Well, if I understood correctly, I fear there is no way (nor any point) to get a 2-by-2 measure of association with a chi-square test when one of the four cells has missing values.

        Hopefully you will get further advice on this.
        Best regards,

        Marcos


        • #5
          Here is a trick to present the two distributions side by side. To actually compare the distributions, however, you should look at tab_chi (SSC) by Nick Cox. I have no experience with this program, but I think it does exactly what you are looking for. Also read the helpful notes by Richard Williams.

          Code:
          clear*
          input Survey    AGE_Survey    AGE_DB    Firm
          1    5    5    1
          1    2    2    2
          0    .    3    3
          0    .    2    4
          end
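          * respondents (Survey==1) get one duplicate, flagged to_expand==1;
          * non-respondents (expand count < 1) are kept once, with to_expand==0
          expand Survey*2, g(to_expand)
          lab de to_expand 1 "Survey respondents (n=50)" 0 "Database (n=100)"
          la val to_expand to_expand
          * column 0 = all firms in the database, column 1 = respondents only
          tab AGE_DB to_expand, col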
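          Adding chi2 to the side-by-side table above would not give a valid test, because each respondent is counted in both columns, while the Pearson test assumes independent observations. Since AGE_DB is recorded for every firm, a sketch under that assumption compares respondents (Survey==1) with non-respondents (Survey==0) instead, which is a common check for non-response bias. The exact option requests Fisher's exact test, which suits small cells; ranksum and ksmirnov are built-in alternatives that treat age as continuous.

          ```stata
          clear
          input Survey AGE_Survey AGE_DB Firm
          1 5 5 1
          1 2 2 2
          0 . 3 3
          0 . 2 4
          end

          // Respondents and non-respondents are disjoint groups, and AGE_DB is
          // observed for both, so a test of independence is appropriate here
          tabulate AGE_DB Survey, column chi2 exact

          // Rank-based comparisons of the two age distributions
          ranksum AGE_DB, by(Survey)
          ksmirnov AGE_DB, by(Survey)
          ```

          If respondents and non-respondents do not differ in age, the respondents' age distribution should mirror the database as a whole.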
