  • Testing the difference between means

    I am struggling to test the difference between the means of some indicators across firms. My sample has 9 companies, with quarterly data from 2016 to 2021. I created dummies to classify the firms into three groups: all, big and others. I used:

    gen big_firms=0
    replace big_firms=1 if inlist(firm_id, 1,2,3,4)
    gen other_firms=1 if big_firms==0

    Then I wanted to compare the means of the indicators between each group and the rest. I used:

    ttest ind1,by(big_firms)
    ttest ind1,by(other_firms)

    I was surprised that the p-values are exactly the same for big and other firms.

    I'm really confused by my results. Could you help me? Is there another test for means?

  • #2
    Guest,

    your first code already compares big (1) vs. other (0) firms.
    Two further comments:
    1) as per the FAQ, please share what you typed and what Stata gave you back. Thanks;
    2) the -unequal- variance option should be considered in your -ttest- code.
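    To illustrate point 2, here is a minimal sketch on the auto dataset shipped with Stata. The variable foreign merely stands in for your big_firms dummy and price for ind1; both substitutions are assumptions made for illustration. The -unequal- option drops the equal-variance assumption and uses Satterthwaite's degrees of freedom:

    ```stata
    * Sketch only: foreign plays the role of big_firms, price the role of ind1
    sysuse auto, clear

    * Default two-sample t test: assumes equal variances in both groups
    ttest price, by(foreign)

    * Same comparison without the equal-variance assumption
    ttest price, by(foreign) unequal
    ```

    If the two groups have similar spreads the two p-values will be close; if not, the second is usually the safer one to report.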
    Last edited by sladmin; 28 Aug 2023, 08:34. Reason: anonymize original poster
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you Carlo for your answer! I cannot share my Stata output because the data are sensitive. I hope the Stata community can help me anyway or at least give me a hint.
      Thank you for pointing out that the first command already compares big companies with the others. That isn't what I want.
      I wanted to show the indicator means for the 3 groups: all, big firms and other firms. I calculated them, and they differ slightly from group to group. For the big firms and the other firms I wanted p-values showing whether their means differ from the rest. It looks like the dummies I built are wrong. Or is it not feasible at all?


      • #4
        You have two groups, big firms and the rest. There isn't a t-test for comparing (1) all firms and (2) either or both groups, because those subsets overlap.
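        If the aim is mainly to report the three means side by side, something like the following may be enough. It is a sketch on the auto dataset, with foreign standing in for big_firms and price for ind1 (assumptions for illustration only). -tabstat- with by() lists each group and adds a Total row for all observations:

        ```stata
        sysuse auto, clear

        * Means, SDs and counts per group, plus a Total row for the full sample
        tabstat price, by(foreign) statistics(mean sd n)
        ```

        The Total row is descriptive only; the test that makes sense is the one between the two non-overlapping groups.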

        I am surprised your second t test worked at all, as

        Code:
        gen other_firms=1 if big_firms==0
        would yield 1 or missing and ttest is going to ignore missings on the by() variable.
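        A quick way to see both points, again sketched on the auto dataset with foreign as a stand-in for big_firms (an assumption for illustration): the if-based dummy contains no zeros, whereas a proper 0/1 complement only reverses the group order, so the t statistic changes sign and the two-sided p-value is identical.

        ```stata
        sysuse auto, clear

        * Stand-in for big_firms
        gen byte big = foreign

        * As in the original post: 1 where big == 0, missing everywhere else
        gen other_bad = 1 if big == 0
        tabulate other_bad, missing

        * A true 0/1 indicator for the complement group
        gen byte other = (big == 0)

        * Same two groups in reverse order: same two-sided p-value, t of opposite sign
        ttest price, by(big)
        ttest price, by(other)
        ```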


        Not the question, but

        Code:
        gen big_firms=0
        replace big_firms=1 if inlist(firm_id, 1,2,3,4)
        can just be

        Code:
        gen big_firms = inlist(firm_id, 1,2,3,4)
        as explained in various places, e.g. https://www.stata.com/support/faqs/d...rue-and-false/

        https://www.stata-journal.com/articl...article=dm0099


        • #5
          Guest,

          I would go -regress-, then:
          Code:
          . sysuse auto.dta
          . bysort rep78: sum price
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = 1
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          2      4564.5    522.5519       4195       4934
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = 2
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          8    5967.625    3579.357       3667      14500
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = 3
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         30    6429.233     3525.14       3291      15906
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = 4
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         18      6071.5    1709.608       3829       9735
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = 5
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         11        5913    2615.763       3748      11995
          
          ------------------------------------------------------------------------------------------------------------------------------------------
          -> rep78 = .
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          5      6430.4    3804.322       3799      12990
          
          
          . regress price i.rep78
          
                Source |       SS           df       MS      Number of obs   =        69
          -------------+----------------------------------   F(4, 64)        =      0.24
                 Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
              Residual |   568436416        64     8881819   R-squared       =    0.0145
          -------------+----------------------------------   Adj R-squared   =   -0.0471
                 Total |   576796959        68  8482308.22   Root MSE        =    2980.2
          
          ------------------------------------------------------------------------------
                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                 rep78 |
                    2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
                    3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
                    4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
                    5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                       |
                 _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
          ------------------------------------------------------------------------------
          
          . lincom 2.rep78 - 3.rep78
          
           ( 1)  2.rep78 - 3.rep78 = 0
          
          ------------------------------------------------------------------------------
                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   (1) |  -461.6083    1185.87    -0.39   0.698    -2830.656     1907.44
          ------------------------------------------------------------------------------
          
          .
          The -lincom- estimate equals the difference in mean price between rep78=2 and rep78=3 (5967.625 - 6429.233 = -461.608).
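          For the two-group case in this thread the same idea is simpler still. As a sketch (foreign again stands in for big_firms and price for ind1, which are assumptions for illustration), the coefficient on the dummy is the difference between the two group means, and its t test matches the equal-variance -ttest- up to sign:

          ```stata
          sysuse auto, clear

          * Coefficient on 1.foreign = mean price of foreign cars minus mean price of domestic cars
          regress price i.foreign

          * Equal-variance t test of the same difference; ttest reports group 0 minus group 1
          ttest price, by(foreign)
          ```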
          Last edited by sladmin; 28 Aug 2023, 08:34. Reason: anonymize original poster
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Thank you Carlo and Nick for such a thorough explanation! It pushed me one small step forward in understanding statistics. I decided to compare only big and other firms.
