Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to create a two way table with multiple variables

    Hello,

    I am new to STATA and cant seem to create a frequency table in the format that I need. I want to create a single two-way table that shows the frequency of multiple variables by age category if in a certain year.

    example:
    0-4 4-8 8-12 ....
    var1
    var2
    var3
    var4
    ....

    I have tried to use tabmult to do this, but it does not display age categories that have a count of zero, and it also displays the variables (which are all dummy variables) separated into 0 and 1. I want the table to show just the counts (variable = 1) in each age category. I have also tried tabm but when I put it by age category, I get an error saying "option by not allowed"

    Any ideas?

  • #2
    Hello Jessica,

    Welcome to the Stata Forum.

    You may try - tabstat - commands with the statistics option as stat(count) or stat(n).

    Best,

    Marcos
    Best regards,

    Marcos

    Comment


    • #3
      I don't understand your question. What, for example, goes in the var1 X 0-4 cell? Is it the count of observations where var1 takes on a value in the range 0-4? If so, does a value of 4 for var1 belong in the first column or the second, or do you want to have it in both (double-counting)?

      It might be best to post a small sample of your data (please use -dataex-, -ssc install dataex-, -help dataex-), and the corresponding hand-worked result you want.

      Comment


      • #4
        Please do study the FAQ Advice as we request all posters to do, especially

        http://www.statalist.org/forums/help#stata

        http://www.statalist.org/forums/help#spelling

        As Clyde says, we don't get a clear picture of your data or of exactly what you want.

        tabmult where does this come from? exactly what syntax did you try?

        tabm where does this come from? exactly what syntax did you try?

        Comment


        • #5
          The variables are causes of death and 0-4 is an age category, so inside the cells are the number of deaths by the specified cause of death in that age group.

          Comment


          • #6
            The variables are causes of death and 0-4 is an age category, so inside the cells are the number of deaths by the specified cause of death in that age group.
            That's somewhat clearer but still leaves me guessing. Do you mean that each of var1 to var4 is an indicator variable taking on the value 1 for some particular cuase of death and 0 for others? And is each of 0-4, 4-8, and 8-12 a separate 0/1 variable as well? Or are those levels of a single variable?

            This really would go a lot faster if you would just post a small sample of your data.

            Comment


            • #7
              Sorry for the confusion! Please bear with me, the data is confidential and it is difficult for me to install packages in order to export any data samples as the computer with STATA is not equipped with USB or internet access.
              I will try to provide a clearer example. Lets say, var1 = heart attack, if var1 == 1 then the individual died of a heart attack and so if var1 == 0 then the individual did not die of a heart attack. So var1 to var4 are causes of death such as heart attack, coded to be 1 or 0. 0-4, 4-8, 8-12 are age categories. I have exact ages, but grouped them by ranges to be a new variable "agecat"

              For example -
              50-54 55-59 60-64
              heart attack 6 12 18
              stroke 4 8 12
              this fictitious data would then say that 6 people, within the age group of 50-54 years, died of a heart attack.
              This is the format of the data I would like, but with around 10 causes of death and 18 age categories. I do not want to include any summary data other than frequencies and I do not want to include the negative events for each variable (example, i do not want to display when heart attacks did not occur, or when var1 == 0).

              I hope that provides a clearer picture! Again, I am sorry for the inconvenience.

              Comment


              • #8
                Stata FAQ Advice #12 cited earlier addresses your situation already:

                If your dataset is confidential, then provide a fake example instead.
                You showed us the result of a tabulation; from that we can fake an example (but we shouldn't have to do that).

                Code:
                clear 
                input var1 var2 agecat freq 
                1 0 10 6 
                1 0 11 12 
                1 0 12 18 
                0 1 10 4 
                0 1 11 8 
                0 1 12 12 
                end 
                expand freq 
                label var var1 "heart attack" 
                label var var2 "stroke" 
                label def agecat 10 "50-54"  11 "55-59"    12 "60-64" 
                label val agecat agecat 
                
                * you need to start around here; if you already have an identifier, 
                * you don't need to create one 
                gen long id = _n 
                reshape long var, i(id) j(condition) 
                keep if var == 1 
                label def condition 1 "heart attack" 2 "stroke"
                label val condition condition 
                tab condition agecat 
                
                            |              agecat
                   condition |     50-54      55-59      60-64 |     Total
                -------------+---------------------------------+----------
                heart attack |         6         12         18 |        36 
                      stroke |         4          8         12 |        24 
                -------------+---------------------------------+----------
                       Total |        10         20         30 |        60

                Comment


                • #9
                  Ok, I think I understand your data layout now. Here's a toy data set that, if I understand correctly, resembles your data:

                  Code:
                  // CREATE TOY DATA
                  clear*
                  set obs 12
                  gen age_cat = "0-4" in 1/4
                  replace age_cat = "4-8" in 5/8
                  replace age_cat = "8-12" in 9/12
                  set seed 1234
                  foreach x in heart_attack stroke pulmonary renal {
                      gen byte `x' = runiform() < 0.2
                  }
                  
                  list, noobs clean abbr(16)
                  If that's right, you can get a table almost like what you asked for with just this:
                  Code:
                  collapse (sum) heart_attack stroke pulmonary renal, by(age_cat)
                  list, noobs clean abbr(16)
                  Now, that's "on its side" compared to what you asked for. If it's important to have it the way you asked for it, you can get from here to there with:
                  Code:
                  rename (heart_attack stroke pulmonary renal) dx_count=
                  reshape long dx_count, i(age_cat) j(disease) string
                  tabdisp disease age_cat, c(dx_count)

                  Comment


                  • #10
                    Thank you Nick and Clyde!!

                    Clyde, your method works perfectly with my data! Except for one thing, the table is in the exact format I need it to be in, but I also need it by year and place. For example, I want to have a frequency table, which shows deaths by certain causes per age group, in each of the provinces of Canada in each year from 1995-2000. At the end, I would like a set of 5 frequency tables for each province, each year being in its own table.
                    I tried:

                    collapse (sum) heart_attack stroke pulmonary renal, by(age_cat death_year province)
                    list, noobs clean abbr(16)
                    rename (heart_attack stroke pulmonary renal) dx_count=
                    reshape long dx_count, i(age_cat) j(disease) string


                    but then I get an error saying "variable id does not uniquely identify the observations"
                    So, I then tried:

                    gen long id = _n
                    reshape long dx_count, i(id) j(disease) string
                    tabdisp disease age_cat, c(dx_count) if death_year == 1995 & province == Ontario


                    but got the error:
                    option if not allowed

                    Comment


                    • #11
                      Your first syntax does not even mention id. So, the error message doesn't tally with the syntax.

                      Code:
                      tabdisp disease age_cat, c(dx_count) if death_year == 1995 & province == Ontario
                      The if qualifier is in the wrong place. Stata is telling you that it thinks you think if is part of an option. Also, unless you have a variable Ontario, you have a problem. Perhaps

                      Code:
                      tabdisp disease age_cat if death_year == 1995 & province == "Ontario", , c(dx_count)

                      Comment


                      • #12
                        So it's like this:
                        Code:
                        // CREATE TOY DATA
                        clear*
                        set obs 3
                        gen country = "USA" in 1
                        replace country = "UK" in 2
                        replace country = "DEN" in 3
                        
                        expand 5
                        by country, sort: gen year = 1999 + _n
                        
                        
                        expand 12
                        by country year, sort: gen age_cat = "0-4" if mod(_n, 3) == 0
                        replace age_cat = "4-8" if mod(_n, 3) == 1
                        replace age_cat = "8-12" if mod(_n, 3) == 2
                        gen obs_no = _n
                        set seed 1234
                        foreach x in heart_attack stroke pulmonary renal {
                            gen byte `x' = runiform() < 0.2
                        }
                        
                        // CREATE THE TABLES
                        
                        collapse (sum) heart_attack stroke pulmonary renal, by(age_cat country year)
                        
                        rename (heart_attack stroke pulmonary renal) dx_count=
                        reshape long dx_count, i(age_cat country year) j(disease) string
                        tabdisp disease age_cat, c(dx_count)
                        
                        by country year, sort: tabdisp disease age_cat, c(dx_count)
                        Now, you have been asked nicely, twice by me, and reminded by Nick, that you could have facilitated this process by posting samples of your own data (or fake data with the same layout). Between us, Nick and I have created three fake data sets to illustrate the code. I think you are stretching our generosity to its limits here. It's one thing if you didn't at first realize that seeing the layout of the data is important to finding a solution, but after you've been prodded three times, you really ought to be responsive. Going forward, be helpful to those who are trying to help you.

                        Comment


                        • #13
                          I agree with Clyde. (So you don't think I am the nice one.)

                          Comment


                          • #14
                            Thank you Nick, that fixed the problem. I apologize for not facilitating the process better. I am very new to stata and was just trying to work within my limitations. I appreciate all the support you both provided me.

                            Comment


                            • #15
                              Hello. Is there a way to include p-value using tabmult? Any tips?
                              Thanks in advance

                              Comment

                              Working...
                              X