
  • Combine two datasets to make a t-test

    Hello.

    I have two different data sets. One contains 2448 women, and the other contains 552 women. The 552 women are a subset of the 2448 women. I want to test whether my 552 women are representative of the 2448 women. For that reason I want to do a t-test to see if there is a statistically significant difference between the two groups with respect to 6 different variables (mean birthweight, distribution of sex, etc.). I have all my information about the 2448 women in Stata file A, and my information about the 552 women in Stata file B. I now want to combine the two data sets (without merging!!) to be able to do my t-test.

    Can anybody help me?

    Thanks a lot
    Last edited by Camilla Christensen; 07 Apr 2018, 05:10.

  • #2
    #1 I suggest using "append" to combine the two data files and creating a new group variable that identifies which observations come from file 1 and which from file 2. Then you can run the t-test in one file.
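
A minimal sketch of the append approach described in #2. The two files are simulated stand-ins for file A and file B, and the names id, birthweight, and source are assumptions made for illustration; none of them come from the original data. The generate() option of append creates the marker variable (0 for the data in memory, 1 for the appended file).

```stata
* Sketch only: simulated stand-ins for file A (2448 women) and file B (552 women).
clear
set seed 12345
set obs 2448
generate long id = _n                               // hypothetical identifier
generate double birthweight = rnormal(3500, 500)    // hypothetical outcome
tempfile fileA fileB
save `fileA'

* Hypothetical file B: 552 of the women already in file A.
keep in 1/552
save `fileB'

* Stack B underneath A; source = 0 for rows from A, 1 for rows from B.
use `fileA', clear
append using `fileB', generate(source)
tabulate source
```

Note that with this layout the 552 women appear twice (once in each group), so a comparison by source contrasts the subset with the full sample that contains it rather than two independent groups.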



    • #3
      #2 When I append the two datasets, all the values in one of the datasets change to 0 or 1. Why is that?



      • #4
        Originally posted by Camilla Christensen
        Hello.

        I have two different data sets. One contains 2448 women, and one contains 552 women. The 552 women are a part of the 2448 women.
        At the risk of asking a silly question, is there a dichotomous variable in the larger dataset that has one value for the 552 women and another value for the other 1896 women?
        --
        Bruce Weaver
        Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
        Version: Stata/MP 18.0 (Windows)



        • #5
          Originally posted by Bruce Weaver

          At the risk of asking a silly question, is there a dichotomous variable in the larger dataset that has one value for the 552 women and another value for the other 1896 women?
          No, there is not. Would that help me?



          • #6
            Yes. You could use such a variable as the group variable for an unpaired t-test.

            Code:
            ttest y, by(group)
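
If no such variable exists yet, one way to build it is sketched below. This assumes both files share a unique identifier, here called id; that name, the simulated data, and the variable birthweight are hypothetical. The merge is used only to flag which women in file A also appear in file B; no variables are brought over from file B.

```stata
* Sketch only: simulated stand-ins; "id" is an assumed shared identifier.
clear
set seed 12345
set obs 2448
generate long id = _n
generate double birthweight = rnormal(3500, 500)
tempfile fileA fileB
save `fileA'

* Hypothetical file B: identifiers of the 552 women in the subsample.
keep in 1/552
keep id
save `fileB'

* Flag membership: _merge == 3 means the woman is in both files.
use `fileA', clear
merge 1:1 id using `fileB'
assert _merge != 2              // every woman in B should also be in A
generate byte group = (_merge == 3)
drop _merge
tabulate group

* Unpaired t-test: the 552 women versus the other 1896.
ttest birthweight, by(group)
```

With the flag in place, the two groups do not overlap, which is what an unpaired t-test assumes.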
            There are other issues you have not raised. Here are a few things that come to mind.
            1. Do you have 6 univariate questions or one multivariate question? (I'm thinking of the Huberty & Morris 1989 article as I ask that.)
            2. If you are going to use t-tests, I'd suggest using either the -unequal- or -welch- option, as the t-test will be sensitive to heterogeneity of variance, given such a large discrepancy in sample sizes.
            3. The estimated mean differences with CIs would be more informative than the p-values from 6 t-tests. (These can be seen in the t-test output in the 'diff' row.)
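
Points 2 and 3 could be sketched as follows, using simulated data and hypothetical variable names (group, birthweight, gest_age). The loop runs a Welch t-test per variable and then rebuilds the mean difference and its 95% CI from the results that ttest stores in r().

```stata
* Sketch only: simulated data with a 552 / 1896 split.
clear
set seed 2018
set obs 2448
generate byte group = (_n <= 552)
generate double birthweight = rnormal(3500, 500)
generate double gest_age = rnormal(39, 2)

foreach v of varlist birthweight gest_age {
    * Welch's approximation does not assume equal variances.
    ttest `v', by(group) welch

    * Mean difference (group 0 minus group 1) and its 95% CI.
    scalar md = r(mu_1) - r(mu_2)
    scalar hw = invttail(r(df_t), 0.025) * r(se)
    display "`v': diff = " %9.2f md ", 95% CI [" %9.2f (md - hw) ", " %9.2f (md + hw) "]"
}
```

The same interval appears in the 'diff' row of the ttest output; computing it by hand is only useful when the numbers need to be collected into a table.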
            HTH.
            --
            Bruce Weaver
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Version: Stata/MP 18.0 (Windows)
