
  • Contract variables with statistics of another variable, or combination of contract and collapse

    Dear Stata users,

    My question may be easy, but I could not find a direct command to achieve it. We all know there are two commands, -contract- and -collapse-, that are powerful for creating frequencies and summary statistics. What I want is a way to create frequencies for some variables and, at the same time, summary statistics for another variable. It would be in some way a combination of contract and collapse. The result would look like this:
    Code:
    sysuse bplong
    preserve
    contract sex agegrp when
    restore
    preserve
    collapse bp, by( sex agegrp when )
    restore
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(sex agegroup when freq) float bp
    0 1 1 20 153.45
    0 1 2 20 146.45
    0 2 1 20 159.05
    0 2 2 20 157.25
    0 3 1 20  165.3
    0 3 2 20 162.85
    1 1 1 20  149.9
    1 1 2 20  142.2
    1 2 1 20 151.15
    1 2 2 20  144.3
    1 3 1 20 159.85
    1 3 2 20  155.1
    end
    label values sex sex
    label def sex 0 "Male", modify
    label def sex 1 "Female", modify
    label values agegroup agegroup
    label def agegroup 1 "30-45", modify
    label def agegroup 2 "46-59", modify
    label def agegroup 3 "60+", modify
    label values when when
    label def when 1 "Before", modify
    label def when 2 "After", modify
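
One literal reading of "a combination of contract and collapse" is to run both and merge the results on the grouping variables. The following is only a sketch under that assumption; the tempfile name `freqs` and the variable name `freq` are made up for illustration, and the lines should be run together from a do-file so that the local macro survives.

```stata
sysuse bplong, clear

* Step 1: cell frequencies via contract, saved to a temporary file
tempfile freqs
preserve
contract sex agegrp when, freq(freq)
save `freqs'
restore

* Step 2: cell means via collapse, then attach the frequencies
collapse (mean) bp, by(sex agegrp when)
merge 1:1 sex agegrp when using `freqs', nogenerate

list, sepby(sex agegrp) noobs
```

This leaves one observation per combination of sex, agegrp and when, with both the mean of bp and the cell frequency.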

  • #2
    What about the new table command?

    Code:
    . sysuse bplong, clear
    (Fictional blood-pressure data)
    
    .
    . table (sex agegr when)(), nototal stat(freq) stat(mean bp)
    
    --------------------------------------
                     |  Frequency     Mean
    -----------------+--------------------
    Sex              |                    
      Male           |                    
        Age group    |                    
          30-45      |                    
            Status   |                    
              Before |         20   153.45
              After  |         20   146.45
          46-59      |                    
            Status   |                    
              Before |         20   159.05
              After  |         20   157.25
          60+        |                    
            Status   |                    
              Before |         20    165.3
              After  |         20   162.85
      Female         |                    
        Age group    |                    
          30-45      |                    
            Status   |                    
              Before |         20    149.9
              After  |         20    142.2
          46-59      |                    
            Status   |                    
              Before |         20   151.15
              After  |         20    144.3
          60+        |                    
            Status   |                    
              Before |         20   159.85
              After  |         20    155.1
    --------------------------------------
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you very much, Maarten Buis; the new -table- command looks good. It was wise of Stata to rewrite -table-. I hope a similar (community-contributed) command will be available for Stata 16 soon.
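
For readers still on Stata 16, a minimal sketch using the older -table- syntax, which accepts up to three grouping variables and a contents() list. The `version 16:` prefix is an assumption added so that the same line should also run in Stata 17 or later, where the old syntax is kept under version control.

```stata
sysuse bplong, clear

* Old-style -table-: row, column and supercolumn variables,
* with cell frequency and mean of bp in each cell
version 16: table sex agegrp when, contents(freq mean bp)
```

The layout differs from the output in #2, but the cells should carry the same frequencies and means.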



      • #4
        The count statistic may be useful:
        Code:
        collapse bp (count) patient, by(sex agegrp when)



        • #5
          Øyvind Snilsberg, thank you so much. Your code above is great! It also gives the total observations (N) in the last line and makes it easy to calculate percentages. Thank you again.
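
The percentage step mentioned above could be sketched as follows. The variable names `freq`, `total` and `percent` are illustrative choices, not part of the code in #4.

```stata
sysuse bplong, clear

* Mean of bp and a count of non-missing patient identifiers per cell
collapse (mean) bp (count) freq = patient, by(sex agegrp when)

* Share of each cell in the overall number of observations
egen total = total(freq)
gen percent = 100 * freq / total

list sex agegrp when freq percent bp, sepby(sex) noobs
```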



          • #6
            Here is a way using estout from SSC/Stata Journal. You can further customize the look of the table.

            Code:
            sysuse bplong, clear
            egen group= group(sex age when), label
            estpost tabstat bp, by(group) stats(mean count)
            esttab ., cells("count mean") noobs nonumb drop("Total") mlab(none) collab(Frequency Mean) varwidth(20)
            Res.:

            Code:
            . esttab ., cells("count mean") noobs nonumb drop("Total") mlab(none) collab(Frequency Mean) varwidth(20)
            
            ----------------------------------------------
                                    Frequency         Mean
            ----------------------------------------------
            Male 30-45 Before              20       153.45
            Male 30-45 After               20       146.45
            Male 46-59 Before              20       159.05
            Male 46-59 After               20       157.25
            Male 60+ Before                20        165.3
            Male 60+ After                 20       162.85
            Female 30-45 Before            20        149.9
            Female 30-45 After             20        142.2
            Female 46-59 Before            20       151.15
            Female 46-59 After             20        144.3
            Female 60+ Before              20       159.85
            Female 60+ After               20        155.1
            ----------------------------------------------



            • #7
              Thank you, Andrew. I have had -estout- installed since 2014 but never explored it as far as you have; I have only used it to export regression results. I will spend more time on its functionality.
              Last edited by Chen Samulsion; 15 Nov 2021, 03:24.



              • #8
                Interesting for me to recall my original motive for writing contract. (That was originally published as collfreq, thus avoiding any standard English word as a command name, as enjoined by StataCorp. Once the command was folded into official Stata, the name contract was readily available as the opposite of expand.)

                The motive was wanting to avoid work-arounds in collapse to get cell frequencies.

                As a twist on the fine approach in #4, which will work generally if not universally, note that count counts non-missing values whereas sometimes you want the number of observations, which may be different. A further work-around is exemplified here

                Code:
                sysuse bplong, clear 
                gen one = 1 
                collapse (sum) freq=one (count) nonmiss=bp (mean) mean=bp, by( sex agegrp when )
                Here the result is the same as in #4, but if no identifier variable exists, you need to do this, or else create an identifier first.
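
To see the distinction between the number of observations and the number of non-missing values, here is a sketch in which some values of bp are deliberately set to missing. That replacement is artificial and only for illustration; bplong itself has no such missing values.

```stata
sysuse bplong, clear

* Hypothetical missing values, introduced only to show the difference
replace bp = . in 1/5

gen one = 1
collapse (sum) freq = one (count) nonmiss = bp (mean) mean = bp, by(sex agegrp when)

* Cells where the observation count and the non-missing count disagree
list if freq != nonmiss, noobs
```

Here `freq` still counts every observation in the cell, while `nonmiss` drops the observations whose bp was set to missing.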



                • #9
                  Dear Nick Cox, I'm so glad that my question brought back memories of the early days. I did some searching and found that -collfreq- was published as dm59 in STB-44 in June 1998, and that it was officially introduced as a new feature in Stata 6: https://www.stata.com/stata6/
                  I am embarrassed that the motive behind my question runs against the design purpose of -contract-, i.e. avoiding work-arounds in -collapse- to get cell frequencies.
                  At all events, thank you for your caution and for your complement to Øyvind Snilsberg's code in #4; the code seems even more magical now.
                  Last edited by Chen Samulsion; 15 Nov 2021, 06:13.



                  • #10
                    No need or call for embarrassment here. collapse and contract have goals that overlap, but that isn't in any sense your fault, or even mine... An extra point is that, when it is what people want, contract is more direct.
