  • Unpaired t-test with weight

    Hello,

    I'm working on descriptive statistics for a data set. Two variables, daily_f and daily_m, measure paternal and maternal involvement. Their weighted means are 0.43 and 0.69, respectively. I now want to run an unpaired t-test on these two variables, but -ttest- does not allow weights.

    The result of the unweighted t-test is as follows:

    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
     daily_f |   9,626    .4887804    .0050952    .4999001    .4787927     .498768
     daily_m |   9,626    .7738417    .0042641    .4183646    .7654831    .7822003
    ---------+--------------------------------------------------------------------
    combined |  19,252     .631311    .0034772    .4824619    .6244955    .6381266
    ---------+--------------------------------------------------------------------
        diff |           -.2850613    .0066441               -.2980843   -.2720383
    ------------------------------------------------------------------------------
        diff = mean(daily_f) - mean(daily_m)                          t = -42.9045
    Ho: diff = 0                                     degrees of freedom =    19250

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

    The unweighted means and the other statistics differ from the weighted ones, so this t-test result cannot be applied to the weighted data, right?

    After browsing similar posts, I found two approaches that might solve the problem:

    a) First, use the -mean- and -lincom- commands.
    Code:
    mean daily_f daily_m [iweight=w2sweight]
    lincom daily_f-daily_m
    The result is as below:
    Mean estimation                   Number of obs   = 15,126,283

    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
         daily_f |   .4452293   .0001278      .4449788    .4454797
         daily_m |   .7173577   .0001158      .7171308    .7175847
    --------------------------------------------------------------

    . lincom daily_f-daily_m //p<0.01

     ( 1)  daily_f - daily_m = 0

    ------------------------------------------------------------------------------
            Mean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |  -.2721285    .000134 -2031.01   0.000    -.2723911   -.2718659
    ------------------------------------------------------------------------------

    Because -mean- drops observations in which either of the two variables is missing, this test resembles a paired t-test rather than an unpaired one. Consequently, -mean- with -lincom- may not be the best solution.

    b) Another method is to append the data so that the paternal and maternal involvement variables are stacked into one parental involvement variable, and then to run -reg- with weights, using the parent's gender as the independent variable. However, this method does not seem very convenient.
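
    A minimal sketch of what approach b) could look like, using simulated stand-in data so that it runs on its own. The identifier id and the stacked variable names daily and parent are hypothetical, and the use of pweights with standard errors clustered on id is an assumption about how the sampling weight and the two rows per family should be handled, not a recommendation from the thread:

    ```stata
    * Simulated stand-in data; variable names mirror the question, values are made up
    clear
    set seed 12345
    set obs 500
    generate long id = _n
    generate byte daily_f = runiform() < 0.45
    generate byte daily_m = runiform() < 0.70
    generate double w2sweight = 1 + 9*runiform()

    * Stack the two variables: the stubs must share a name, the suffix identifies the parent
    rename (daily_f daily_m) (daily1 daily2)
    reshape long daily, i(id) j(parent)
    label define parent 1 "father" 2 "mother"
    label values parent parent

    * The coefficient on 2.parent is the weighted mean difference (mother minus father);
    * clustering on id allows the two rows from the same family to be correlated
    regress daily i.parent [pweight = w2sweight], vce(cluster id)
    ```

    Because -reshape long- keeps a row for each parent even when the other parent's value is missing, no observation is dropped just because one of the two variables is missing.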

    So, is there a good way to make an unpaired comparison between the means of two variables? Is it necessary to conduct a weighted t-test?

    Thank you so much.



  • #2
    You do not provide any information about what those variables are or what those weights are. In particular, if these are the paternal and maternal involvement for the same child(ren), then an unpaired test is inappropriate, because obviously if in family i the mother spends lots of time with the kids, the father can do other things. So the two measurements are strongly negatively related.

    It seems to me that approach a) as you describe it is fine. You are right that it drops data, but if the data are missing at random, this should not be a problem.

    If you want to be pedantic about not dropping data, and you are too lazy to do what you describe in b), here is a third approach, which is easy enough:
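
    One caveat on approach a), offered as a hedged sketch rather than a definitive fix: the output in #1 reports 15,126,283 observations, which suggests that with iweights -mean- counts the sum of the weights as the sample size, and that would explain the very small standard errors. If w2sweight is a sampling weight, specifying it as a pweight may be more appropriate. The data below are simulated stand-ins with the same variable names:

    ```stata
    * Simulated stand-in data; values are made up
    clear
    set seed 12345
    set obs 500
    generate byte daily_f = runiform() < 0.45
    generate byte daily_m = runiform() < 0.70
    generate double w2sweight = 1 + 9*runiform()

    * pweights keep the number of observations at the actual sample size
    mean daily_f daily_m [pweight = w2sweight]

    * Test of the difference between the two weighted means
    lincom daily_f - daily_m
    ```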


    c) using -suest-

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . reg price [weight=mpg]
    (analytic weights assumed)
    (sum of wgt is 1,576)
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |   521621024        73  7145493.48   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |   521621024        73  7145493.48   Root MSE        =    2673.1
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   5794.871   310.7422    18.65   0.000     5175.562    6414.179
    ------------------------------------------------------------------------------
    
    . est sto Price
    
    . reg rep [weight=mpg]
    (analytic weights assumed)
    (sum of wgt is 1,469)
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(0, 68)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |  71.7226076        68  1.05474423   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |  71.7226076        68  1.05474423   Root MSE        =     1.027
    
    ------------------------------------------------------------------------------
           rep78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   3.513955   .1236372    28.42   0.000     3.267241    3.760669
    ------------------------------------------------------------------------------
    
    . suest Price .
    
    Simultaneous results for Price, .
    
                                                    Number of obs     =         74
    
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Price_mean   |
           _cons |   5794.871   290.0459    19.98   0.000     5226.391     6363.35
    -------------+----------------------------------------------------------------
    Price_lnvar  |
           _cons |   15.78199   .2501711    63.08   0.000     15.29167    16.27232
    -------------+----------------------------------------------------------------
    _LAST_mean   |
           _cons |   3.513955   .1330524    26.41   0.000     3.253177    3.774733
    -------------+----------------------------------------------------------------
    _LAST_lnvar  |
           _cons |   .0532983   .1521848     0.35   0.726    -.2449785    .3515751
    ------------------------------------------------------------------------------
    
    . test [Price_mean]_cons=[_LAST_mean]_cons
    
     ( 1)  [Price_mean]_cons - [_LAST_mean]_cons = 0
    
               chi2(  1) =  398.67
             Prob > chi2 =    0.0000



    • #3
      Hi, Joro

      New command brings a new world! Thank you so much for the advice and I'll try it later.

      As for your comments: first, I assumed that exactly what the variables and the weight are does not matter for the method. The variables are frequencies of paternal and maternal involvement, and the weight is a sampling weight (cross-sectional). Second, more involvement by mothers leading to less by fathers is in line with the economic logic of time allocation, but the relationship between paternal and maternal involvement in childcare seems to be more complicated. Still, I think I should compare the results of the paired and unpaired tests, and what you said about missing data is definitely right. Last, I'm not too lazy for method b); I just believe Stata must offer a simpler method.

      Thank you again for bringing something interesting.



      • #4
        if I understand you correctly, a simple way is to recognize that (1) a regression with only a constant (i.e., no predictors/covariates) and the difference between your two paired variables as the response is the same as a paired t-test; and (2) you can use weights with regression
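
        A minimal sketch of that idea, assuming simulated stand-in data with the variable names from #1 and treating the weight as a pweight (an assumption; the variable diff is introduced here for illustration):

        ```stata
        * Simulated stand-in data; values are made up
        clear
        set seed 12345
        set obs 500
        generate byte daily_f = runiform() < 0.45
        generate byte daily_m = runiform() < 0.70
        generate double w2sweight = 1 + 9*runiform()

        * Within-observation difference; missing if either variable is missing
        generate diff = daily_f - daily_m

        * Constant-only regression: _cons is the weighted mean of diff,
        * and its t statistic tests whether that mean is zero
        regress diff [pweight = w2sweight]
        ```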



        • #5
          Hi, Rich

          Thanks for your suggestion.
          Yes, you are right: -reg- with weights is a better way. However, my problem was how to store a previous result (e.g., from -mean- or -reg-) and use it in later steps. -suest- solves that.

          Thanks a lot.



          • #6
            glad you found a solution; if I understand you correctly there are other ways to get what you want (one of the great things about Stata is that there is usually more than one way to get what is wanted) but that appears irrelevant now



            • #7
              Thanks. Haha, yes, Stata is brilliant! And what are the other ways? (if you'd like to share)



              • #8
                after estimating a regression, Stata saves a lot of the results in "r(table)" - you will probably need/want to do some simple arithmetic on those results but everything is there for you (if I understand what you want correctly); I suggest starting by making a new matrix that is the same as r(table) but will stay around and then extracting what you want/need



                • #9
                  Originally posted by Rich Goldstein:
                  after estimating a regression, Stata saves a lot of the results in "r(table)" - you will probably need/want to do some simple arithmetic on those results but everything is there for you (if I understand what you want correctly); I suggest starting by making a new matrix that is the same as r(table) but will stay around and then extracting what you want/need
                  (If I understand you correctly) yes, that is exactly what I want. Initially I just wanted to test whether there is a statistically significant difference between the means of two variables. I tried commands like this:
                  Code:
                  mean var1 [weight]
                  mean var2 [weight]
                  lincom mean_var1-mean_var2
                  However, mean_var1 cannot be accessed when -lincom- runs, because -lincom- only sees the results of the most recent estimation command. I then realized that my real problem is that I do not know how to refer to what is stored in memory (e.g., values, tables) and use it in later steps.
                  Thanks for your suggestion. Although I don't yet know how to make a new matrix, I'll start learning along these lines.



                  • #10
                    here is an example:
                    Code:
                    sysuse auto
                    regress gear i.for
                    mat li r(table)
                    mat b = r(table)
                    mat li b
                    see
                    Code:
                    help matrix
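
                    As a hedged follow-on sketch, individual numbers can then be pulled out of the copied matrix. Looking elements up by row and column name with rownumb() and colnumb() avoids depending on their position; the scalar names b_for and se_for are made up for illustration:

                    ```stata
                    sysuse auto, clear
                    regress gear_ratio i.foreign

                    * Copy r(table) so that it survives later commands
                    matrix b = r(table)

                    * Rows of r(table) include b, se, t, pvalue, ll and ul;
                    * columns are named after the coefficients
                    scalar b_for  = b[rownumb(b, "b"),  colnumb(b, "1.foreign")]
                    scalar se_for = b[rownumb(b, "se"), colnumb(b, "1.foreign")]

                    * Simple arithmetic on the stored results
                    display "coef = " b_for "  se = " se_for "  t = " b_for/se_for
                    ```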



                    • #11
                      interesting! I'll try matrix and apply it later. -matrix- should be helpful in several contexts. Thank you so much for bringing something new and useful. Although the post is about t-test, I have learnt more than that.
