  • T-Test with Dummy Variables

    Stata Users,

    I'm new to the program and I am trying to figure out how to conduct a t-test between my continuous dependent variable (mntlhlth) and my categorical independent variable (class) that shows the independent variable groups in the t-test. The class variable records which social class people identify themselves as, with responses 1-4 (1 = Lower Class, 2 = Working Class, 3 = Middle Class, and 4 = Upper Class).
    I created dummy variables for the different social classes using gen lowerclass = 1 if class == 1, but I am not sure how to proceed to conduct a t-test after this.
    I have tried ttest mntlhlth, by(lowerclass) and it says "1 group found, 2 required".
    How do I go about creating a t-test that shows all 4 of the social classes?
    Thank you in advance.

  • #2
    Rudia.
    welcome to this forum.
    Why don't you go:
    Code:
    regress mntlhlth i.class
    instead?
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      After you run the regression, look at -test- in the documentation to see how to do the tests.
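      A minimal sketch of what that could look like, assuming the variable names from #1 (mntlhlth, class) and that class takes only the codes 1-4. The error in #1 arises because gen lowerclass = 1 if class == 1 leaves every other observation missing, so -ttest- finds only one group. The indicator name lower01 is made up for illustration.
      Code:
      * 0/1 indicator: 1 = lower class, 0 = any other class, missing if class is missing
      generate byte lower01 = (class == 1) if !missing(class)
      ttest mntlhlth, by(lower01)

      * all four classes at once
      regress mntlhlth i.class
      * joint test that the class means are all equal
      testparm i.class
      * working class versus middle class
      test 2.class = 3.class
      * all pairwise differences in means
      pwcompare class, effects
      The regression route keeps every class in one model, whereas a separate -ttest- per dummy only compares one class against all the others pooled.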


      • #4
        Originally posted by Carlo Lazzaro
        Rudia.
        welcome to this forum.
        Why don't you go:
        Code:
        regress mntlhlth i.class
        instead?
        Hi Carlo,

        I wanted to follow up on this answer. Rudia was regressing a categorical variable (social class) on a continuous variable (mntlhlth), and the command you provided was useful for getting output similar to that of a ttest.

        My case is a bit different. I am interested in finding out whether the means of several categorical variables differ for a dependent variable which is coded as a dummy. Let me give you a few more details to understand the situation better. I want to know if there is a difference in the mean of an independent variable "invest", which is the investment made by an enterprise with respect to the previous year (it is coded as 1=did not invest, 2=less than last year, 3=same as last year, 4=more than last year), for women-owned and non women-owned enterprises. The dependent variable is "womenbuss", which is coded as 0 if an enterprise is not owned by a woman and as 1 if the enterprise is owned by a woman.

        Can I still use the same command? And would the interpretation be the same?

        Thanks for your help,
        Rodrigo


        • #5
          Rodrigo:
          if I got your query right, you can try something like:
          Code:
          . use http://www.stata-press.com/data/r15/lbw.dta
          (Hosmer & Lemeshow data)
          
          . logistic low i.race
          
          Logistic regression                             Number of obs     =        189
                                                          LR chi2(2)        =       5.01
                                                          Prob > chi2       =     0.0817
          Log likelihood = -114.83082                     Pseudo R2         =     0.0214
          
          ------------------------------------------------------------------------------
                   low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  race |
                black  |   2.327536   1.078613     1.82   0.068     .9385072    5.772385
                other  |   1.889234   .6571342     1.83   0.067     .9554577    3.735597
                       |
                 _cons |   .3150685   .0753382    -4.83   0.000     .1971825     .503433
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.
          
          . test [low]2.race=[low]3.race
          
           ( 1)  [low]2.race - [low]3.race = 0
          
                     chi2(  1) =    0.20
                   Prob > chi2 =    0.6575
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Hi Carlo,

            Thanks for your prompt response, I greatly appreciate it. You are correct, since the dependent variable is a dummy it should be a logit regression. Thanks for pointing that out.

            If I understood the output correctly, then for a given level of birth weight both black and "other race" are different from white at a 10% significance level. Then you proceeded to test if black was different from "other race" and it indeed is, also at a 10% significance level.

            How would I go about testing whether the mean of the white population is different between low=0 and low=1? This is where I was going with my initial question when, for example, trying to test if the mean of "invest=4" (people who invested more than in the previous year) was different between women-owned businesses and non women-owned businesses ("womenbuss=1" and "womenbuss=0", respectively).

            Thanks,
            Rodrigo



            • #7
              Rodrigo:
              do you mean something like:
              Code:
              . test [low]_cons=[low]3.race=[low]2.race, mtest(bonferroni)
              
               ( 1)  - [low]3.race + [low]_cons = 0
               ( 2)  - [low]2.race + [low]_cons = 0
              
              ---------------------------------------
                     |        chi2     df       p
              -------+-------------------------------
                (1)  |       10.97      1     0.0019 #
                (2)  |       10.35      1     0.0026 #
              -------+-------------------------------
                all  |       12.70      2     0.0017
              ---------------------------------------
                       # Bonferroni-adjusted p-values
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                Thanks Carlo, I have never performed a Bonferroni test, but I understand that you used it as a correction because multiple hypotheses are being tested. I might be confused in my interpretation, but how is this result different from the one provided by the regression you gave in #5?

                I have the impression that in both outputs, you are keeping the variable "low" constant and comparing the mean of low between races. What I am interested in, is to know if in keeping a certain race constant (i.e. black) there are significant differences between low=0 and low=1.

                Please correct me if I am wrong in my interpretation. Once again, thanks for your help!


                • #9
                  Rodrigo:
                  in #5 the _cons (ie, white race) was not included in -test-.
                  The variable low is simply the name of the regressand of the toy example; it does not affect the -test- outcome.
                  That said, there's something I fail to get in your query:
                  you are dealing with a simple logistic regression, ie, one with a single predictor (the investment made by an enterprise with respect to the previous year), the regressand being the business ownership (women yes or no).
                  You can test whether the levels of the predictor differ in terms of the variations they cause in the regressand, but you cannot have the same variable being a regressand and a predictor.
                  Sharing an example/excerpt of your data via -dataex- would help make things clearer. Thanks.
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Carlo,

                    That is a great idea, here is a sample of the database:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte(wom_ownbuss inversion shocks)
                    0 3 1
                    0 3 2
                    1 4 2
                    0 1 2
                    0 2 1
                    1 1 3
                    1 4 2
                    0 3 1
                    1 1 1
                    end
                    "shocks" is a continuous variable and as I mentioned before in #4 "invest" is a categorical variable. Since shocks is continuous, I can perform a ttest directly to compare the mean per type of business ownership:

                    Code:
                    . ttest shocks, by( wom_ownbuss)
                    
                    Two-sample t test with equal variances
                    ------------------------------------------------------------------------------
                       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
                    ---------+--------------------------------------------------------------------
                           0 |     498    1.937751    .0584324    1.303972    1.822946    2.052556
                           1 |     117    1.811966    .1078496    1.166572    1.598356    2.025576
                    ---------+--------------------------------------------------------------------
                    combined |     615    1.913821    .0515749    1.279017    1.812536    2.015106
                    ---------+--------------------------------------------------------------------
                        diff |            .1257852    .1314122               -.1322876     .383858
                    ------------------------------------------------------------------------------
                        diff = mean(0) - mean(1)                                      t =   0.9572
                    Ho: diff = 0                                     degrees of freedom =      613
                    
                        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                     Pr(T < t) = 0.8306         Pr(|T| > |t|) = 0.3389          Pr(T > t) = 0.1694
                    This output tells me that the mean of shocks in businesses not owned by women is 1.93 and it is 1.81 in the case of women-owned business. The difference is not significantly different from 0.

                    However, because of the nature of the "invest" (or "inversion" in Spanish) variable, I cannot conduct the ttest directly. That is why I tried to use the example you gave with the regress (in this case, the logit) command. What I would like to do is to test if there is any difference in the mean of a specific category of the variable invest between the 2 groups of businesses. I hope this brings more clarity to what I am trying to do.
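                    One way this comparison could be expressed, sketched under the assumption that the quantity of interest is the share of businesses falling in a given -inversion- category within each ownership group. The indicator name inv4 is made up for illustration.
                    Code:
                    * distribution of inversion within each ownership group, with a chi-squared test
                    tabulate inversion wom_ownbuss, column chi2

                    * share with inversion == 4 (invested more than last year) in each group
                    generate byte inv4 = (inversion == 4) if !missing(inversion)
                    prtest inv4, by(wom_ownbuss)
                    The mean of a 0/1 indicator is a proportion, so -prtest- plays the role that -ttest- plays for a continuous variable such as shocks.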

                    Thanks,
                    Rodrigo




                    • #11
                      Rodrigo:
                      it seems that this is the first time that you mention the dependent variable you're really interested in, -shocks-, which is different from "...a dependent variable which is coded as a dummy" [as reported in your #].
                      Hence we have gone back and forth without any gain in the previous posts, wasting our time: please read the FAQ about posting-related topics.
                      That said, you may want something along the following lines:
                      Code:
                      regress shocks i.wom_ownbuss##i.inversion
                      Last edited by Carlo Lazzaro; 02 May 2019, 23:46.
                      Kind regards,
                      Carlo
                      (Stata 19.0)


                      • #12
                        Hi Carlo,

                        Thanks for your help in figuring this out. I apologize for the inconvenience; I got confused between the dependent and independent variable without realizing it until your last post. I guess that using the -dataex- command from the beginning would have brought clarity much earlier in the process and would have avoided wasting your time. My apologies for this; it is definitely a lesson learned for my next posts.

                        Just one final remark and to make sure I understood the output of your suggestion:

                        Code:
                         regress shocks i.wom_ownbuss##i.inversion
                        
                              Source |       SS       df       MS              Number of obs =     603
                        -------------+------------------------------           F(  7,   595) =    1.19
                               Model |  13.6558372     7  1.95083388           Prob > F      =  0.3057
                            Residual |  974.523267   595  1.63785423           R-squared     =  0.0138
                        -------------+------------------------------           Adj R-squared =  0.0022
                               Total |  988.179104   602  1.64149353           Root MSE      =  1.2798
                        
                        ---------------------------------------------------------------------------------------
                                       shocks |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ----------------------+----------------------------------------------------------------
                                1.wom_ownbuss |   .1464552    .356106     0.41   0.681    -.5529223    .8458328
                                              |
                                    inversion |
                                           2  |   .4241451    .212551     2.00   0.046     .0067037    .8415865
                                           3  |   -.014821   .1894706    -0.08   0.938    -.3869335    .3572915
                                           4  |   .2089552   .1806512     1.16   0.248    -.1458363    .5637468
                                              |
                        wom_ownbuss#inversion |
                                         1 2  |  -.4727562   .4884012    -0.97   0.333    -1.431956    .4864437
                                         1 3  |  -.1873849   .4317837    -0.43   0.664     -1.03539    .6606206
                                         1 4  |  -.2986291   .4130451    -0.72   0.470    -1.109833    .5125745
                                              |
                                        _cons |   1.791045   .1563508    11.46   0.000     1.483978    2.098111
                        ---------------------------------------------------------------------------------------
                        The last section -wom_ownbuss#inversion- compares the mean of the -invest- variable per type of business ownership, with the constant being the base value for -invest-. Is this correct?

                        Finally, I have also tried to get a similar result by using the following command:

                        Code:
                        . mlogit inversion wom_ownbuss, base(1)
                        
                        Iteration 0:   log likelihood = -780.57905  
                        Iteration 1:   log likelihood = -780.56602  
                        Iteration 2:   log likelihood = -780.56602  
                        
                        Multinomial logistic regression                   Number of obs   =        604
                                                                          LR chi2(3)      =       0.03
                                                                          Prob > chi2     =     0.9989
                        Log likelihood = -780.56602                       Pseudo R2       =     0.0000
                        
                        ------------------------------------------------------------------------------
                           inversion |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        1            |  (base outcome)
                        -------------+----------------------------------------------------------------
                        2            |
                         wom_ownbuss |  -.0469722    .381627    -0.12   0.902    -.7949474    .7010029
                               _cons |   .1647552   .1660831     0.99   0.321    -.1607617    .4902722
                        -------------+----------------------------------------------------------------
                        3            |
                         wom_ownbuss |  -.0113489   .3373153    -0.03   0.973    -.6724746    .6497769
                               _cons |   .7651207   .1478845     5.17   0.000     .4752724    1.054969
                        -------------+----------------------------------------------------------------
                        4            |
                         wom_ownbuss |  -.0375721   .3227453    -0.12   0.907    -.6701412     .594997
                               _cons |   1.093625   .1411573     7.75   0.000     .8169616    1.370288
                        ------------------------------------------------------------------------------
                        Here, there are changes in the coefficients but the variables are still not significantly different from 0. Would you agree with this as an alternative?

                        Thanks,
                        Rodrigo


                        • #13
                          Rodrigo:
                          1) If you type:
                          Code:
                          regress shocks i.wom_ownbuss##i.inversion, allbase
                          you will have a clearer picture of what's going on with your interaction.
                          That said, I would say that, in your case, the _cons refers to the situation in which the reference categories are 0 for -wom_ownbuss- and 1 for -invest-.

                          2) No, I do not support your second code as an alternative to 1), as you're indirectly comparing two really different regression models.
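                          For point 1), a sketch of how the ownership difference within a single -inversion- category could be read off the interaction model, assuming the variable names used in #12:
                          Code:
                          regress shocks i.wom_ownbuss##i.inversion, allbase
                          * women-owned minus not women-owned, among businesses with inversion == 4
                          lincom 1.wom_ownbuss + 1.wom_ownbuss#4.inversion
                          * the same difference at every level of inversion
                          margins inversion, dydx(wom_ownbuss)
                          * joint test of the three interaction terms
                          testparm i.wom_ownbuss#i.inversion
                          The coefficient on 1.wom_ownbuss alone is the ownership difference only at the base level of -inversion-, which is why the interaction term has to be added for any other level.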
                          Kind regards,
                          Carlo
                          (Stata 19.0)


                          • #14
                            Thanks Carlo, I greatly appreciate your support!


                            • #15
                              Hi Carlo,

                              I would like to reopen this thread to ask a follow-up question. I want to know if there is a significant difference in pricing power between women-owned and non women-owned businesses. Therefore, I am planning to regress -pric_pow- on -wom_ownbuss-. The dependent variable -pric_pow- is a categorical variable with 4 levels, while the independent variable is a dummy (1=women-owned businesses). Considering the nature of the dependent variable, I decided to break it into several dummies (i.e. dum1=1 if -pric_pow- = 1 and 0 otherwise, dum2=1 if -pric_pow- = 2 and 0 otherwise, and so on). Here is an excerpt of the dataset with the variables I have just mentioned.

                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input byte(wom_ownbuss pric_pow dum1 dum2 dum3 dum4)
                              1 1 1 0 0 0
                              0 4 0 0 0 1
                              1 3 0 0 1 0
                              0 2 0 1 0 0
                              0 2 0 1 0 0
                              end
                              I then run a logit regression to determine if there is any significant difference, as shown below.

                              Code:
                              . logit dum1 wom_ownbuss
                              
                              Iteration 0:   log likelihood =  -385.9273  
                              Iteration 1:   log likelihood = -385.90021  
                              Iteration 2:   log likelihood = -385.90021  
                              
                              Logistic regression                               Number of obs   =        560
                                                                                LR chi2(1)      =       0.05
                                                                                Prob > chi2     =     0.8159
                              Log likelihood = -385.90021                       Pseudo R2       =     0.0001
                              
                              ------------------------------------------------------------------------------
                                      dum1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                               wom_ownbuss |   .0497615   .2139026     0.23   0.816    -.3694799     .469003
                                     _cons |   .1692921   .0946189     1.79   0.074    -.0161575    .3547416
                              ------------------------------------------------------------------------------
                              The variable -wom_ownbuss- is not significantly different from 0. I have taken the constant as the value for non-women owned businesses which is significantly different from 0 at a 10% significance level.

                              My questions are:
                              a) Is the interpretation of the output correct?
                              b) Would you recommend any other commands to run the regression without having to break the variable -pric_pow- into dummies?
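                              On question b), one possibility, offered as a sketch and not as an answer given in this thread, is to keep -pric_pow- as a single categorical outcome. Whether its four levels are ordered is not stated above, so the -ologit- line rests on that assumption.
                              Code:
                              * cross-tabulation with column percentages and a chi-squared test
                              tabulate pric_pow wom_ownbuss, column chi2
                              * multinomial logit with category 1 as the base outcome
                              mlogit pric_pow i.wom_ownbuss, baseoutcome(1)
                              * only if the four levels of pric_pow are ordered
                              ologit pric_pow i.wom_ownbuss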

                              Thanks for your help!
                              Rodrigo
