
  • Split file by group, i.e., group 1 (non-clinical) vs. group 2 (clinical)

    Hi there,

    Could someone please advise how to split my Stata dataset by one variable, "Group", which has two levels: 1 (non-clinical) and 2 (clinical)? I would like to compare those two groups.

    I think I'm overthinking this. Many thanks in advance for your time and expertise.

  • #2
    Mary:
    welcome to this forum.
    Do you mean something along the following lines?
    Code:
    . set obs 2
    number of observations (_N) was 0, now 2
    
    . g Group= 0 in 1
    
    . replace Group=1 in 2
    
    . label define Group 0 "non clinical" 1 "clinical"
    
    . label val Group Group
    
    . list
    
         +--------------+
         |        Group |
         |--------------|
      1. | non clinical |
      2. |     clinical |
         +--------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)
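A minimal sketch of the same idea with the 1/2 coding described in the question. It assumes, purely for illustration, that the auto data stand in for the real dataset: Group is a hypothetical variable built from foreign so that 1 = non clinical and 2 = clinical, and price and mpg stand in for the variables to compare.

```stata
* Sketch: the auto data stand in for the real dataset (assumption)
sysuse auto, clear
* hypothetical Group variable coded as in the question: 1 = non clinical, 2 = clinical
generate Group = foreign + 1
label define grouplbl 1 "non clinical" 2 "clinical"
label values Group grouplbl
* separate descriptive statistics for each level of Group
by Group, sort: summarize price mpg
```

No new dataset is created: the -by- prefix runs -summarize- once per level of Group on the data in memory.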



    • #3
      Please provide a data example using -dataex- (see the FAQ), posted as the FAQ describes. Your description is vague in some respects and ambiguous in others; a data example will clear that up and make it much easier to give a helpful answer without guessing.
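For reference, a sketch of a -dataex- call, with the auto data as a stand-in. The variables listed here are only illustrative; in practice you would list your own (for example, Group and the outcomes of interest). See -help dataex- for the options available in your version.

```stata
sysuse auto, clear
* print a copy-and-paste data example for the chosen variables, first 10 observations
dataex foreign price in 1/10
```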



      • #4
        I think you are indeed overthinking this. In the following example, foreign takes the role of your Group.
        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . summarize weight length if foreign==0
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
              weight |         52    3317.115    695.3637       1800       4840
              length |         52    196.1346    20.04605        147        233
        
        . summarize weight length if foreign==1
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
              weight |         22    2315.909    433.0035       1760       3420
              length |         22    168.5455    13.68255        142        193
        
        . by foreign, sort: summarize weight length
        
        ------------------------------------------------------------------------------------------------
        -> foreign = Domestic
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
              weight |         52    3317.115    695.3637       1800       4840
              length |         52    196.1346    20.04605        147        233
        
        ------------------------------------------------------------------------------------------------
        -> foreign = Foreign
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
              weight |         22    2315.909    433.0035       1760       3420
              length |         22    168.5455    13.68255        142        193
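The same comparison can also be obtained as a single table with -tabstat- and its by() option. This is a sketch on the same auto data; the choice of statistics is an assumption and can be changed.

```stata
sysuse auto, clear
* one summary table, broken down by the levels of foreign
tabstat weight length, by(foreign) statistics(n mean sd min max)
```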



        • #5
          Dear Stata experts,
          I have two questions:
          Q1: I have a dataset with a group variable with two categories: cirrhosis present and cirrhosis absent. I want to split the data into two parts and calculate descriptive statistics separately for each category.
          Q2: My etiology variable has five groups, and I need to analyze groups 1 and 4 only, without including the other groups. I've tried searching for a "select" command or an "if" command without success. How can I select just these two groups for analysis?
          Please help.



          • #6
            Though there is an if command in Stata, most often that is not what you need. Most often (including in your use case) you will use if to choose a subset of observations on which to run another Stata command, such as summarize.

            See
            Code:
            help if
            You could, for instance, do something like:

            Code:
            sum myvar if etiology == 1 | etiology == 4
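A sketch on the auto data, assuming rep78 plays the role of etiology and price the role of myvar. The -inlist()- function is a shorthand for chaining several == conditions with |, so both lines below select the same observations and report one pooled summary for groups 1 and 4 together.

```stata
* auto data as a stand-in: rep78 plays the role of etiology, price of myvar
sysuse auto, clear
* both lines select the same observations: rep78 equal to 1 or 4, pooled
summarize price if rep78 == 1 | rep78 == 4
summarize price if inlist(rep78, 1, 4)
```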



            • #7
              Originally posted by Hemanshu Kumar (#6)
              I have applied this if qualifier, but it gives me results for the specified groups combined. I need separate statistics for rep78 = 1 and rep78 = 2.

                   Repair |
              record 1978 |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        1 |          2        2.90        2.90
                        2 |          8       11.59       14.49
                        3 |         30       43.48       57.97
                        4 |         18       26.09       84.06
                        5 |         11       15.94      100.00
              ------------+-----------------------------------
                    Total |         69      100.00

              . sum price if rep78==1 | rep78==2

                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                     price |         10        5687    3216.375       3667      14500

              When I select rep78 = 1 and rep78 = 2, Stata pools the two groups and reports combined statistics.



              • #8
                Code:
                help by
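A sketch of how this answers #7, using the auto data from that post: the if qualifier restricts the sample to rep78 equal to 1 or 2, and the by() option (or a loop over the two values) keeps the groups apart instead of pooling them. The statistics chosen are an assumption.

```stata
sysuse auto, clear
* separate statistics for each group, only for rep78 equal to 1 or 2
tabstat price if inlist(rep78, 1, 2), by(rep78) statistics(n mean sd min max)
* the same information, one -summarize- per group
foreach g in 1 2 {
    display as text _newline "rep78 = `g'"
    summarize price if rep78 == `g'
}
```

The loop form is convenient when each group needs more than one command; -tabstat- gives a single compact table.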



                • #9
                  Not much of an expert, but you could consider using `preserve`, `keep if` and `restore`; they might help in analysing the groups separately. Before anything else, keep a copy of your data.
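A minimal sketch of the `preserve` / `keep if` / `restore` approach on the auto data (rep78 values 1 and 2 are used only as an example). `preserve` stores a copy of the data in memory, `keep if` drops everything outside the chosen group, and `restore` brings the full dataset back, so the file on disk is never changed.

```stata
sysuse auto, clear
* keep only rep78 == 1 temporarily, analyse, then get the full data back
preserve
keep if rep78 == 1
summarize price
restore
* repeat for the second group
preserve
keep if rep78 == 2
summarize price
restore
```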

                  Comment


                  • #10
                    Originally posted by Harrison Ochieng (#9)
                    Thanks.



                    • #11
                      Originally posted by Hemanshu Kumar (#8)
                      Thank you.



                      • #12
                        Ankit:
                        you may want to consider what follows:
                        Code:
                        . use "C:\Program Files\Stata18\ado\base\a\auto.dta"
                        (1978 automobile data)
                        
                        *Reply to your 1st question*
                        . bysort foreign: sum price
                        
                        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        -> foreign = Domestic
                        
                            Variable |        Obs        Mean    Std. dev.       Min        Max
                        -------------+---------------------------------------------------------
                               price |         52    6072.423    3097.104       3291      15906
                        
                        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        -> foreign = Foreign
                        
                            Variable |        Obs        Mean    Std. dev.       Min        Max
                        -------------+---------------------------------------------------------
                               price |         22    6384.682    2621.915       3748      12990
                        
                        *Reply to your 2nd question*
                        . gen butler=0 if rep78<=3
                        
                        . replace butler=1 if rep78>3
                        
                        . bysort butler: sum price
                        
                        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        -> butler = 0
                        
                            Variable |        Obs        Mean    Std. dev.       Min        Max
                        -------------+---------------------------------------------------------
                               price |         40    6243.675     3425.43       3291      15906
                        
                        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        -> butler = 1
                        
                            Variable |        Obs        Mean    Std. dev.       Min        Max
                        -------------+---------------------------------------------------------
                               price |         34        6073    2315.435       3748      12990
                        
                        
                        .
                        Kind regards,
                        Carlo
                        (Stata 19.0)
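One caution about the second example: in Stata, numeric missing (.) is treated as larger than any number, so `replace butler=1 if rep78>3` also assigns butler = 1 to the cars whose rep78 is missing. That is why butler = 1 has 34 observations in the log, while the tabulation in #7 shows only 18 + 11 = 29 cars with rep78 equal to 4 or 5. A sketch of one way to keep those cars out of both groups:

```stata
sysuse auto, clear
* numeric missing (.) counts as larger than any number, so rep78 > 3 is true
* for the 5 cars with missing rep78; the if condition leaves butler missing for them
generate butler = (rep78 > 3) if !missing(rep78)
tabulate butler, missing
* tabstat leaves out missing values of the by() variable unless -missing- is specified
tabstat price, by(butler) statistics(n mean sd min max)
```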



                        • #13
                          Originally posted by Carlo Lazzaro (#12)

                          Thank you for your response. I appreciate the answer to my first question. I would like to rephrase my second question. Suppose I have three treatment groups (grp 1, grp 2 and grp 3) in a single column, and I want to compare weight between grp 1 and grp 3 using an independent-samples t-test, ignoring grp 2. Do I need to create a separate column for grp 1 and grp 3 to perform the analysis, or is there a way, as in a tool like SPSS, to select the desired groups and have the remaining group excluded from the analysis automatically? I would greatly appreciate your insights on this matter.



                          • #14
                            Ankit:
                            Code:
                            . use "C:\Program Files\Stata18\ado\base\a\auto.dta"
                            (1978 automobile data)
                            
                            . gen butler=0 if rep78==3
                            
                            
                            . replace butler=1 if rep78==4
                            
                            . ttest price, by(butler) unequal
                            
                            Two-sample t test with unequal variances
                            ------------------------------------------------------------------------------
                               Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                            ---------+--------------------------------------------------------------------
                                   0 |      30    6429.233    643.5995     3525.14    5112.924    7745.542
                                   1 |      18      6071.5    402.9585    1709.608    5221.332    6921.668
                            ---------+--------------------------------------------------------------------
                            Combined |      48    6295.083    427.0852    2958.933    5435.899    7154.268
                            ---------+--------------------------------------------------------------------
                                diff |            357.7333    759.3392               -1172.108    1887.574
                            ------------------------------------------------------------------------------
                                diff = mean(0) - mean(1)                                      t =   0.4711
                            H0: diff = 0                     Satterthwaite's degrees of freedom =  44.5217
                            
                                Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                             Pr(T < t) = 0.6801         Pr(|T| > |t|) = 0.6399          Pr(T > t) = 0.3199
                            
                            .
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Originally posted by Ankit Bhardwaj (#13)
                              In a hypothetical scenario, if I have three treatment groups (grp 1, grp 2, and grp 3) in a single column, and I want to compare the weight between grp 1 and grp 3 using an independent t-test, without considering grp 2. Do I need to create a separate column for grp 1 and grp 3 in order to perform the analysis, or is there a way to select the desired groups in a tool like SPSS and have the remaining group automatically excluded from the analysis? I would greatly appreciate your insights on this matter.
                              To add to Carlo's illustration, note that the two-sample t-test compares exactly two groups, so the by() variable may take only two values in the estimation sample. With more than two groups, you need to use the -if- qualifier to restrict the sample to the pair you want to compare, as the following example illustrates.

                              Code:
                              sysuse auto, clear
                              *1 vs. 2
                              ttest mpg if inlist(rep78, 1, 2), by(rep78)
                              
                              *1 vs. 4
                              ttest mpg if inlist(rep78, 1, 4), by(rep78)
                              Results:

                              Code:
                              . *1 vs. 2
                              
                              . ttest mpg if inlist(rep78, 1, 2), by(rep78)
                              
                              Two-sample t test with equal variances
                              ------------------------------------------------------------------------------
                                 Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                              ---------+--------------------------------------------------------------------
                                     1 |       2          21           3    4.242641   -17.11861    59.11861
                                     2 |       8      19.125    1.328768    3.758324    15.98296    22.26704
                              ---------+--------------------------------------------------------------------
                              Combined |      10        19.5    1.166667    3.689324    16.86082    22.13918
                              ---------+--------------------------------------------------------------------
                                  diff |               1.875    3.021731               -5.093125    8.843125
                              ------------------------------------------------------------------------------
                                  diff = mean(1) - mean(2)                                      t =   0.6205
                              H0: diff = 0                                     Degrees of freedom =        8
                              
                                  Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                               Pr(T < t) = 0.7239         Pr(|T| > |t|) = 0.5522          Pr(T > t) = 0.2761
                              
                              . *1 vs. 4
                              
                              . ttest mpg if inlist(rep78, 1, 4), by(rep78)
                              
                              Two-sample t test with equal variances
                              ------------------------------------------------------------------------------
                                 Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                              ---------+--------------------------------------------------------------------
                                     1 |       2          21           3    4.242641   -17.11861    59.11861
                                     4 |      18    21.66667     1.16316     4.93487    19.21261    24.12072
                              ---------+--------------------------------------------------------------------
                              Combined |      20        21.6    1.067215     4.77273    19.36629    23.83371
                              ---------+--------------------------------------------------------------------
                                  diff |           -.6666667    3.651484               -8.338149    7.004816
                              ------------------------------------------------------------------------------
                                  diff = mean(1) - mean(4)                                      t =  -0.1826
                              H0: diff = 0                                     Degrees of freedom =       18
                              
                                  Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                               Pr(T < t) = 0.4286         Pr(|T| > |t|) = 0.8572          Pr(T > t) = 0.5714
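A quick way to see the two-group restriction described above, as a sketch on the same auto data: without the -if- qualifier, all five rep78 groups enter the sample and -ttest- stops with an error. -capture noisily- is used here only so that the error message is shown without halting a do-file; the nonzero return code is then displayed.

```stata
sysuse auto, clear
* with all five rep78 groups in the sample, ttest stops with an error;
* capture noisily displays the message without halting a do-file
capture noisily ttest mpg, by(rep78)
display as text "return code: " _rc
```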

