  • Analysis by Group

    input double(YBIRTH SEX CLINICALVAR) float(age CONDITION Group)
    2002 1 5 18 1 1
    2002 2 5 18 1 1
    2002 1 5 18 1 1
    2002 1 4 18 1 1
    2002 1 2 18 2 1
    2003 1 4 17 1 1
    2002 1 5 18 0 1
    2002 2 5 18 0 1
    2002 1 2 18 1 1
    2003 2 5 17 1 1
    2002 1 1 18 2 1
    2002 1 2 18 2 1
    2003 2 1 17 0 1
    2002 2 3 18 2 1
    2003 1 2 17 2 1
    2003 1 5 17 0 1
    2003 2 5 17 1 1
    2003 1 5 17 1 1
    2002 1 5 18 0 1
    2003 2 5 17 0 2
    2003 1 5 17 1 2
    2003 2 4 17 1 2
    2002 1 1 18 1 2
    2003 1 2 17 0 2
    2002 2 2 18 2 2
    2003 1 4 17 1 2
    2003 2 2 17 2 2
    2004 1 5 16 0 2
    2003 1 5 17 1 2

    To explain my dataset: I have 6 variables, namely year of birth, sex, a clinical variable, age, condition and group. How do I split the dataset by group (i.e. 1 or 2) so that I can then determine what impact group has on the other variables, i.e. how groups 1 and 2 differ in a person's age, sex, clinical history, condition, etc.? Also, if I wish to conduct a regression analysis to determine how much each variable is predicted by membership of group 1 or 2, am I best to do this before I split the dataset (if I can)? Many thanks in advance for your help. I'm used to SPSS, so I hope I have explained myself clearly.

  • #2
    Thanks for the helpful data example.

    That's a rather general question which to me implies that you need some guidance from teachers, supervisor, advisor, mentor, or more technical colleagues, depending on your situation.

    I see no need to split the dataset here in any sense. It is already split (distinguished) by group, which appears to be a major focus, and dividing into different datasets would only make some analyses more difficult and other analyses impossible.
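
    As a rough sketch of what working within one dataset can look like, the commands below summarize and tabulate by Group and fit one regression with Group as a predictor. The variable names come from the data example; the choice of CLINICALVAR as the outcome and of a linear model is only an assumption for illustration, and may well not suit your real data.

    ```stata
    * Summary statistics separately for each group, no splitting needed
    bysort Group: summarize age CLINICALVAR

    * Cross-tabulations with column percentages within each group
    tabulate CONDITION Group, column
    tabulate SEX Group, column

    * Restrict any command to one group with an if qualifier
    summarize age if Group == 1

    * Illustrative only: Group as a categorical predictor of one outcome
    regress CLINICALVAR i.Group
    ```

    A comparison like the last one needs both groups in the same dataset, which is one reason not to split.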

    Your first need is for some kind of descriptive overview. This sample code produces graphs that presumably should be more interesting with your full dataset.

    Code:
    clear 
    input double(YBIRTH SEX CLINICALVAR) float(age CONDITION Group)
    2002 1 5 18 1 1
    2002 2 5 18 1 1
    2002 1 5 18 1 1
    2002 1 4 18 1 1
    2002 1 2 18 2 1
    2003 1 4 17 1 1
    2002 1 5 18 0 1
    2002 2 5 18 0 1
    2002 1 2 18 1 1
    2003 2 5 17 1 1
    2002 1 1 18 2 1
    2002 1 2 18 2 1
    2003 2 1 17 0 1
    2002 2 3 18 2 1
    2003 1 2 17 2 1
    2003 1 5 17 0 1
    2003 2 5 17 1 1
    2003 1 5 17 1 1
    2002 1 5 18 0 1
    2003 2 5 17 0 2
    2003 1 5 17 1 2
    2003 2 4 17 1 2
    2002 1 1 18 1 2
    2003 1 2 17 0 2
    2002 2 2 18 2 2
    2003 1 4 17 1 2
    2003 2 2 17 2 2
    2004 1 5 16 0 2
    2003 1 5 17 1 2
    end 
    
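    set scheme s1color 
    * j counts the graphs so that each one gets a distinct name: G1, G2, ...
    local j = 1 
    * graphs collects the graph names for graph combine
    local graphs 
    * Y S a CO CL are unambiguous abbreviations of YBIRTH SEX age CONDITION CLINICALVAR
    foreach v in Y S a  CO CL {
        * levelsof leaves the distinct values in r(levels), used below as axis labels
        levelsof `v'
        histogram `v', horizontal freq by(G, note("")) discrete name(G`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
        local graphs `graphs' G`j'
        local ++j 
    }
    
    * put all the named graphs into one figure
    graph combine `graphs'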
    [Image: multihisto.png]



    • #3
      Great, thank you. Sorry, I feel like a novice. Might you have a manual you recommend? Perhaps I need to start there. In the above example, what does j refer to?



      • #4
        You're totally in order to feel like a novice, and we all started that way. If you want to get started with Stata, one recommendation is


        Help >> PDF documentation >> [GS] Getting Started

        If it's introductory statistics you want, I guess from your example that a congenial medical statistics text is where to start.


        Code:
        local j = 1
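        sets up a counter that is used in the loop: local ++j adds 1 each time around, so the graphs are named G1, G2, and so on. For more on loops see e.g. https://www.stata-journal.com/articl...article=pr0005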
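
        A minimal sketch of the counter on its own, using display in place of histogram so that nothing but the macro is at work (the variable names are from the data example; the message text is made up):

        ```stata
        local j = 1
        foreach v in YBIRTH SEX age {
            * the current value of the counter is substituted into the name
            display "graph G`j' shows `v'"
            * add 1 to the counter before the next time around the loop
            local ++j
        }
        ```

        This should display one line per variable, with the graph name going from G1 to G3.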
        Last edited by Nick Cox; 14 Dec 2020, 10:19.



        • #5
          Sorry, me again, hopefully for the final time. What if I wanted graphs showing percentages or density rather than frequency as above? What would I change "freq" to in the example below, please? When I use the command help percent to learn more, Stata doesn't seem to recognise it.

          set scheme s1color
          local j = 1
          local graphs
          foreach v in YBIRTH SEX age SPAS_TYPE GMFCS_PROFESSIONAL {
          levelsof `v'
          histogram `v', horizontal freq by(Group, note("")) discrete name(Group`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
          local graphs `graphs' Group`j'
          local ++j
          }
          graph combine `graphs'



          • #6
            In this context, the command of interest is -histogram-, so -help histogram- is what you want to consult. In that documentation, you will see that -histogram- includes density, fraction, frequency, and percent as what Stata calls "options" to that command, so you can change "freq" to e.g. "percent".
            For the most part, Stata's help is structured around *commands*, so you want first to look for help on the command, not its options.
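
            For example, the loop from #5 with percent in place of freq might look like this. It is a sketch that assumes the variable names of the data example in #1; substitute your own names, and density or fraction work the same way.

            ```stata
            set scheme s1color
            local j = 1
            local graphs
            foreach v in YBIRTH SEX age CONDITION CLINICALVAR {
                levelsof `v'
                * percent replaces freq; density or fraction are the other choices
                histogram `v', horizontal percent by(Group, note("")) discrete name(G`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
                local graphs `graphs' G`j'
                local ++j
            }
            graph combine `graphs'
            ```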

