how to create a two way table with multiple variables

Jessica Maitland

Join Date: Jun 2016

Posts: 5
#1

how to create a two way table with multiple variables

27 Jun 2016, 08:38

Hello,

I am new to STATA and cant seem to create a frequency table in the format that I need. I want to create a single two-way table that shows the frequency of multiple variables by age category if in a certain year.

example:
0-4 4-8 8-12 ....

var1

var2

var3

var4

....

I have tried to use tabmult to do this, but it does not display age categories that have a count of zero, and it also displays the variables (which are all dummy variables) separated into 0 and 1. I want the table to show just the counts (variable = 1) in each age category. I have also tried tabm but when I put it by age category, I get an error saying "option by not allowed"

Any ideas?
Tags: None
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

27 Jun 2016, 08:46

Hello Jessica,

Welcome to the Stata Forum.

You may try - tabstat - commands with the statistics option as stat(count) or stat(n).

Best,

Marcos

Best regards,

Marcos
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#3

27 Jun 2016, 08:47

I don't understand your question. What, for example, goes in the var1 X 0-4 cell? Is it the count of observations where var1 takes on a value in the range 0-4? If so, does a value of 4 for var1 belong in the first column or the second, or do you want to have it in both (double-counting)?

It might be best to post a small sample of your data (please use -dataex-, -ssc install dataex-, -help dataex-), and the corresponding hand-worked result you want.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35722
#4

27 Jun 2016, 08:53

Please do study the FAQ Advice as we request all posters to do, especially

http://www.statalist.org/forums/help#stata

http://www.statalist.org/forums/help#spelling

As Clyde says, we don't get a clear picture of your data or of exactly what you want.

tabmult where does this come from? exactly what syntax did you try?

tabm where does this come from? exactly what syntax did you try?
1 like
Comment
Jessica Maitland

Join Date: Jun 2016

Posts: 5
#5

27 Jun 2016, 08:53

The variables are causes of death and 0-4 is an age category, so inside the cells are the number of deaths by the specified cause of death in that age group.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

27 Jun 2016, 09:00

The variables are causes of death and 0-4 is an age category, so inside the cells are the number of deaths by the specified cause of death in that age group.

That's somewhat clearer but still leaves me guessing. Do you mean that each of var1 to var4 is an indicator variable taking on the value 1 for some particular cuase of death and 0 for others? And is each of 0-4, 4-8, and 8-12 a separate 0/1 variable as well? Or are those levels of a single variable?

This really would go a lot faster if you would just post a small sample of your data.
1 like
Comment
Jessica Maitland

Join Date: Jun 2016

Posts: 5
#7

27 Jun 2016, 09:11

Sorry for the confusion! Please bear with me, the data is confidential and it is difficult for me to install packages in order to export any data samples as the computer with STATA is not equipped with USB or internet access.
I will try to provide a clearer example. Lets say, var1 = heart attack, if var1 == 1 then the individual died of a heart attack and so if var1 == 0 then the individual did not die of a heart attack. So var1 to var4 are causes of death such as heart attack, coded to be 1 or 0. 0-4, 4-8, 8-12 are age categories. I have exact ages, but grouped them by ranges to be a new variable "agecat"

For example -
50-54 55-59 60-64

heart attack 6 12 18

stroke 4 8 12

this fictitious data would then say that 6 people, within the age group of 50-54 years, died of a heart attack.
This is the format of the data I would like, but with around 10 causes of death and 18 age categories. I do not want to include any summary data other than frequencies and I do not want to include the negative events for each variable (example, i do not want to display when heart attacks did not occur, or when var1 == 0).

I hope that provides a clearer picture! Again, I am sorry for the inconvenience.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35722

27 Jun 2016, 09:25

Stata FAQ Advice #12 cited earlier addresses your situation already:

If your dataset is confidential, then provide a fake example instead.

You showed us the result of a tabulation; from that we can fake an example (but we shouldn't have to do that).

Code:

clear 
input var1 var2 agecat freq 
1 0 10 6 
1 0 11 12 
1 0 12 18 
0 1 10 4 
0 1 11 8 
0 1 12 12 
end 
expand freq 
label var var1 "heart attack" 
label var var2 "stroke" 
label def agecat 10 "50-54"  11 "55-59"    12 "60-64" 
label val agecat agecat 

* you need to start around here; if you already have an identifier, 
* you don't need to create one 
gen long id = _n 
reshape long var, i(id) j(condition) 
keep if var == 1 
label def condition 1 "heart attack" 2 "stroke"
label val condition condition 
tab condition agecat 

            |              agecat
   condition |     50-54      55-59      60-64 |     Total
-------------+---------------------------------+----------
heart attack |         6         12         18 |        36 
      stroke |         4          8         12 |        24 
-------------+---------------------------------+----------
       Total |        10         20         30 |        60

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

27 Jun 2016, 09:47

Ok, I think I understand your data layout now. Here's a toy data set that, if I understand correctly, resembles your data:

Code:

// CREATE TOY DATA
clear*
set obs 12
gen age_cat = "0-4" in 1/4
replace age_cat = "4-8" in 5/8
replace age_cat = "8-12" in 9/12
set seed 1234
foreach x in heart_attack stroke pulmonary renal {
    gen byte `x' = runiform() < 0.2
}

list, noobs clean abbr(16)

If that's right, you can get a table almost like what you asked for with just this:

Code:

collapse (sum) heart_attack stroke pulmonary renal, by(age_cat)
list, noobs clean abbr(16)

Now, that's "on its side" compared to what you asked for. If it's important to have it the way you asked for it, you can get from here to there with:

Code:

rename (heart_attack stroke pulmonary renal) dx_count=
reshape long dx_count, i(age_cat) j(disease) string
tabdisp disease age_cat, c(dx_count)

Comment

Jessica Maitland

Join Date: Jun 2016

Posts: 5
#10

27 Jun 2016, 10:58

Thank you Nick and Clyde!!

Clyde, your method works perfectly with my data! Except for one thing, the table is in the exact format I need it to be in, but I also need it by year and place. For example, I want to have a frequency table, which shows deaths by certain causes per age group, in each of the provinces of Canada in each year from 1995-2000. At the end, I would like a set of 5 frequency tables for each province, each year being in its own table.
I tried:

collapse (sum) heart_attack stroke pulmonary renal, by(age_cat death_year province)
list, noobs clean abbr(16)
rename (heart_attack stroke pulmonary renal) dx_count=
reshape long dx_count, i(age_cat) j(disease) string

but then I get an error saying "variable id does not uniquely identify the observations"
So, I then tried:

gen long id = _n
reshape long dx_count, i(id) j(disease) string
tabdisp disease age_cat, c(dx_count) if death_year == 1995 & province == Ontario

but got the error:
option if not allowed
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35722
#11

27 Jun 2016, 11:22

Your first syntax does not even mention id. So, the error message doesn't tally with the syntax.

Code:

tabdisp disease age_cat, c(dx_count) if death_year == 1995 & province == Ontario

The if qualifier is in the wrong place. Stata is telling you that it thinks you think if is part of an option. Also, unless you have a variable Ontario, you have a problem. Perhaps

Code:

tabdisp disease age_cat if death_year == 1995 & province == "Ontario", , c(dx_count)
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

#12

27 Jun 2016, 11:31

So it's like this:

Code:

// CREATE TOY DATA
clear*
set obs 3
gen country = "USA" in 1
replace country = "UK" in 2
replace country = "DEN" in 3

expand 5
by country, sort: gen year = 1999 + _n


expand 12
by country year, sort: gen age_cat = "0-4" if mod(_n, 3) == 0
replace age_cat = "4-8" if mod(_n, 3) == 1
replace age_cat = "8-12" if mod(_n, 3) == 2
gen obs_no = _n
set seed 1234
foreach x in heart_attack stroke pulmonary renal {
    gen byte `x' = runiform() < 0.2
}

// CREATE THE TABLES

collapse (sum) heart_attack stroke pulmonary renal, by(age_cat country year)

rename (heart_attack stroke pulmonary renal) dx_count=
reshape long dx_count, i(age_cat country year) j(disease) string
tabdisp disease age_cat, c(dx_count)

by country year, sort: tabdisp disease age_cat, c(dx_count)

Now, you have been asked nicely, twice by me, and reminded by Nick, that you could have facilitated this process by posting samples of your own data (or fake data with the same layout). Between us, Nick and I have created three fake data sets to illustrate the code. I think you are stretching our generosity to its limits here. It's one thing if you didn't at first realize that seeing the layout of the data is important to finding a solution, but after you've been prodded three times, you really ought to be responsive. Going forward, be helpful to those who are trying to help you.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35722
#13

27 Jun 2016, 11:34

I agree with Clyde. (So you don't think I am the nice one.)
Comment
Jessica Maitland

Join Date: Jun 2016

Posts: 5
#14

27 Jun 2016, 12:08

Thank you Nick, that fixed the problem. I apologize for not facilitating the process better. I am very new to stata and was just trying to work within my limitations. I appreciate all the support you both provided me.
Comment
Tiago Munhoz

Join Date: Mar 2015

Posts: 16
#15

27 Sep 2019, 15:54

Hello. Is there a way to include p-value using tabmult? Any tips?
Thanks in advance
Comment

	0-4	4-8	8-12	....
var1
var2
var3
var4
....

Announcement