Analysis by Group

Mary McCavert

#1

14 Dec 2020, 07:51

input double(YBIRTH SEX CLINICALVAR) float(age CONDITION Group)
2002 1 5 18 1 1
2002 2 5 18 1 1
2002 1 5 18 1 1
2002 1 4 18 1 1
2002 1 2 18 2 1
2003 1 4 17 1 1
2002 1 5 18 0 1
2002 2 5 18 0 1
2002 1 2 18 1 1
2003 2 5 17 1 1
2002 1 1 18 2 1
2002 1 2 18 2 1
2003 2 1 17 0 1
2002 2 3 18 2 1
2003 1 2 17 2 1
2003 1 5 17 0 1
2003 2 5 17 1 1
2003 1 5 17 1 1
2002 1 5 18 0 1
2003 2 5 17 0 2
2003 1 5 17 1 2
2003 2 4 17 1 2
2002 1 1 18 1 2
2003 1 2 17 0 2
2002 2 2 18 2 2
2003 1 4 17 1 2
2003 2 2 17 2 2
2004 1 5 16 0 2
2003 1 5 17 1 2
end

To explain my dataset: I have 6 variables, year of birth, sex, clinical variable, age, condition and group. How do I split the dataset by group (1 or 2) so that I can then determine what impact group has on the other variables, i.e. how groups 1 and 2 differ in people's age, sex, clinical history, condition, etc.? Also, if I wish to conduct a regression analysis to determine how much each variable is predicted by membership of group 1 or 2, am I best to do this before I split the dataset (if I can)? Many thanks in advance for your help. I'm used to SPSS, so I hope I have explained myself clearly.

Nick Cox


14 Dec 2020, 08:22

Thanks for the helpful data example.

That's a rather general question which to me implies that you need some guidance from teachers, supervisor, advisor, mentor, or more technical colleagues, depending on your situation.

I see no need to split the dataset here in any sense. It is already split (distinguished) by group, which appears to be a major focus, and dividing into different datasets would only make some analyses more difficult and other analyses impossible.
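As a minimal sketch of comparing the groups with the dataset left whole, assuming the example data from #1 are already in memory: -foreach- with -tabulate- and -tabstat- with -by()- give group-wise comparisons, and factor-variable notation (i.Group) puts group into a regression. The model choices are illustrative assumptions, not recommendations for your data: -ologit- treats CLINICALVAR as an ordered outcome, and group2 is a hypothetical indicator created here only because -logit- needs a 0/1 outcome.

```stata
* Assumes the 29 observations from the data example in #1 are in memory.
* Compare each categorical variable across groups: column percents plus a
* chi-squared test (with cells this small, an exact test may be preferable)
foreach v in SEX CONDITION CLINICALVAR {
    tabulate `v' Group, column chi2
}

* Age and year of birth summarized within each group, with no splitting
tabstat age YBIRTH, by(Group) statistics(n mean sd min max)

* Group as a predictor. Treating CLINICALVAR as ordered is an assumption;
* choose a model that fits what the variable actually measures.
ologit CLINICALVAR i.Group

* Or group membership as the outcome: -logit- needs a 0/1 variable, so
* create a (hypothetical) indicator that is 1 for Group 2, 0 for Group 1
generate byte group2 = (Group == 2)
logit group2 age i.SEX i.CONDITION
```

Which direction of model makes sense (group predicting the other variables, or the other variables predicting group) depends on the research question, but either way all observations stay in one dataset.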

Your first need is for some kind of descriptive overview. This sample code produces graphs that presumably should be more interesting with your full dataset.

Code:

clear 
input double(YBIRTH SEX CLINICALVAR) float(age CONDITION Group)
2002 1 5 18 1 1
2002 2 5 18 1 1
2002 1 5 18 1 1
2002 1 4 18 1 1
2002 1 2 18 2 1
2003 1 4 17 1 1
2002 1 5 18 0 1
2002 2 5 18 0 1
2002 1 2 18 1 1
2003 2 5 17 1 1
2002 1 1 18 2 1
2002 1 2 18 2 1
2003 2 1 17 0 1
2002 2 3 18 2 1
2003 1 2 17 2 1
2003 1 5 17 0 1
2003 2 5 17 1 1
2003 1 5 17 1 1
2002 1 5 18 0 1
2003 2 5 17 0 2
2003 1 5 17 1 2
2003 2 4 17 1 2
2002 1 1 18 1 2
2003 1 2 17 0 2
2002 2 2 18 2 2
2003 1 4 17 1 2
2003 2 2 17 2 2
2004 1 5 16 0 2
2003 1 5 17 1 2
end 

set scheme s1color 
local j = 1 
local graphs 
foreach v in Y S a  CO CL {
    levelsof `v'
    histogram `v', horizontal freq by(G, note("")) discrete name(G`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
    local graphs `graphs' G`j'
    local ++j 
}

graph combine `graphs'

[Attached graph: multihisto.png]


Mary McCavert

#3

14 Dec 2020, 09:13

Great, thank you. Sorry, I feel like a novice. Is there a manual you would recommend? Perhaps I need to start there. In the above example, what does j refer to?
Nick Cox

#4

14 Dec 2020, 10:06

It's totally in order to feel like a novice; we all started that way. If you want to get started with Stata, one recommendation is

Help >> PDF documentation >> [GS] Getting Started

If it's introductory statistics you want, I guess from your example that a congenial medical statistics text is where to start.

Code:

local j = 1

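is part of a loop: j is a local macro used as a counter, so that each graph gets its own name (G1, G2, and so on) and those names can be collected in the local macro graphs for -graph combine-. For more on loops see e.g. https://www.stata-journal.com/articl...article=pr0005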
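A stripped-down version of that loop, with -display- in place of -histogram- so that it runs without any data, shows what j does: it starts at 1, is used to build a distinct name on each pass, and is increased by 1 at the end of each pass. The names are only printed as text here; no graphs are drawn, and the three variable names are just placeholders taken from the example.

```stata
* j counts passes through the loop; graphs accumulates the names built from it
local j = 1
local graphs
foreach v in YBIRTH SEX age {
    * on each pass, the counter and the variable name are substituted into the text
    display "graph G`j' would show `v'"
    local graphs `graphs' G`j'
    * ++j adds 1 to the counter before the next pass
    local ++j
}
display "graph combine `graphs'"
```

Run as a block (for example from a do-file), this prints one line per variable with G1, G2 and G3 in turn, and then the -graph combine- command that the real loop would issue.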

Last edited by Nick Cox; 14 Dec 2020, 10:19.
Mary McCavert

#5

15 Dec 2020, 07:52

Sorry, me again, hopefully for the final time. What if I wanted graphs showing percentages or density rather than frequency as above? What would I change "freq" to in the example below, please? When I type help percent to learn more, Stata doesn't seem to recognise it.

set scheme s1color
local j = 1
local graphs
foreach v in YBIRTH SEX age SPAS_TYPE GMFCS_PROFESSIONAL {
    levelsof `v'
    histogram `v', horizontal freq by(Group, note("")) discrete name(Group`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
    local graphs `graphs' Group`j'
    local ++j
}
graph combine `graphs'
Mike Lacy

#6

15 Dec 2020, 08:26

In this context, the command of interest is -histogram-, so -help histogram- is what you want to consult. That documentation shows that -histogram- accepts density, fraction, frequency, and percent as what Stata calls "options" to that command, so you can change "freq" to, e.g., "percent". (Density is the default, so leaving out freq altogether gives density.)
For the most part, Stata's help is structured around *commands*, so you want first to look for help on the command, not its options.
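For example, the loop from #5 with -freq- changed to -percent- would look like this. It is a sketch assuming the full dataset, with the variable names used in #5, is in memory; -fraction- or -density- can be substituted in the same place, and the command that stores r(levels) is -levelsof- (one word).

```stata
set scheme s1color
local j = 1
local graphs
foreach v in YBIRTH SEX age SPAS_TYPE GMFCS_PROFESSIONAL {
    * levelsof leaves the distinct values in r(levels), used below as axis labels
    levelsof `v'
    * the only change from the frequency version: percent instead of freq
    histogram `v', horizontal percent by(Group, note("")) discrete name(Group`j', replace) yla(`r(levels)', ang(h)) bfcolor(blue*0.2) blcolor(blue)
    local graphs `graphs' Group`j'
    local ++j
}
graph combine `graphs'
```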