  • tabulate two variables to show the value of a third variable; save dataset as matrix; multiple plots using plotmatrix

    Hi all,

    I have data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int region float(agecat tempjan)
    1 1  23.16087
    1 2  28.99518
    1 3 31.462856
    2 1 18.091358
    2 2  25.40435
    2 3  29.77333
    3 1  38.58561
    3 2  50.13235
    3 3  64.27907
    4 1  40.23812
    4 2  54.53425
    4 3  61.50435
    end
    label values region region
    label def region 1 "NE", modify
    label def region 2 "N Cntrl", modify
    label def region 3 "South", modify
    label def region 4 "West", modify
    label values agecat agecat
    label def agecat 1 "19-29", modify
    label def agecat 2 "30-34", modify
    label def agecat 3 "35+", modify
    The data contain three variables: the value of tempjan by region and agecat. How can I generate a matrix that contains the values of tempjan, as in a two-way contingency table?
    The content of the matrix should look like the inner part of the table below (in bold)
    Code:
    tabulate region agecat, summarize(tempjan) means
    
                       Means of Average January temperature
    
        Census |            agecat
        Region |     19-29      30-34        35+ |     Total
    -----------+---------------------------------+----------
            NE |  23.16087  28.995181  31.462857 | 27.885366
       N Cntrl | 18.091358  25.404348  29.773333 | 21.694366
         South | 38.585612  50.132353  64.279069 |   46.1456
          West | 40.238125  54.534247  61.504348 | 46.225391
    -----------+---------------------------------+----------
         Total | 31.159172  38.398101  47.122137 | 35.748952
    I am asking this question because I want to use plotmatrix to visualize my data, and this command requires a matrix as input. My data are not really based on a contingency table, i.e. the third variable tempjan does not refer to the frequency or percentage of any other variable, so I cannot use twoway tab here. As you can see above, I tried tab, summarize(), but it has two problems: I do not know how to save the results as a matrix, and I do not know how to get rid of the "Total" row and column at the sides of the table.

    So I think I have the following questions:
    1. transform the data into a contingency-table-like format.
    2. save it as a matrix
    3. use plotmatrix to plot this matrix. However, I am not sure whether plotmatrix is the right command here. How can I make it recognize that the rows and columns correspond to the different categories of region and agecat?
    4. Also, I would really appreciate if anybody could tell me whether I could plot multiple matrices and combine them into the same graph using plotmatrix (something like addplot)? I do not see such an extension available in its help file.

    I also considered tabplot. But I feel it occupies too much space on a single page if I want to show multiple contingency tables together (the many bars take space, while plotmatrix can convey the information using shaded colors). Also, it is based on cross-tabulation, while my data are not (it assumes you want to show the fractions or percents of the third variable within each category defined by two given variables, while what I want to show is the original value of a third variable). However, if plotmatrix does not work, I can also use tabplot. But then I do not know how to transform my data into a cross-tabulation-like format so that tabplot can be run on it. The values of my variable of interest tempjan are not integers, so I cannot expand the data.

    Thank you!
    Last edited by shem shen; 18 Feb 2019, 09:46.

  • #2
    What's plotmatrix?

    Someone might add: what's tabplot? But I can answer that one. tabplot is from the Stata Journal.

    The underlying request is longstanding (FAQ Advice 12.1):

    If you are using community-contributed (also known as user-written) commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.

    Here are some examples:
    I am using xtreg in Stata 13.1.
    I am using estout from SSC in Stata 13.1.



    • #3
      Originally posted by Nick Cox
      What's plotmatrix?

      Someone might add: what's tabplot? But I can answer that one. tabplot is from the Stata Journal.

      The underlying request is longstanding (FAQ Advice 12.1):

      Thank you Nick for the reminder!

      Information on plotmatrix can be found here: https://www.stata.com/meeting/dcconf...9_radyakin.pdf
      tabplot is another command. Both can be installed with "ssc install ***".



      • #4
        Not so on tabplot. As I pointed out in #2 tabplot is to be downloaded from the Stata Journal. I really do know about that. I am the author.

        As for plotmatrix I think you are a little confused. ssc desc plotmatrix does indeed show a program -- by Adrian Mander. Adrian Mander isn't Sergiy Radyakin. I don't know whether Sergiy's command is downloadable.

        This rather underscores the point of FAQ Advice 12.1, of making clear which programs you are talking about.



        • #5
          Originally posted by Nick Cox
          Not so on tabplot. As I pointed out in #2 tabplot is to be downloaded from the Stata Journal. I really do know about that. I am the author.

          As for plotmatrix I think you are a little confused. ssc desc plotmatrix does indeed show a program -- by Adrian Mander. Adrian Mander isn't Sergiy Radyakin. I don't know whether Sergiy's command is downloadable.

          This rather underscores the point of FAQ Advice 12.1, of making clear which programs you are talking about.
          My apologies! I thought SSC and the Stata Journal were similar things.
          Thank you for pointing out my confusion. Yes, I mistook Adrian's command for Sergiy's. What I really meant is Sergiy's plotmatrix. However, as you said, I just searched in both Stata and Google and could not find where to download it.

          It seems that tabplot is now the only command that can do the job because plotmatrix seems unavailable. So my main issue now is how to transform my data set into the format compatible with tabplot. By the way, as the author of tabplot, do you know whether we could fill the bars using different colors (or patterns, which however do not seem possible according to your comment under another thread) according to the value of the row and column variables? For instance, in the thread (https://www.statalist.org/forums/for...ver-age-groups) you generated the graph below:

          Is it possible to make all bars in the leftmost column and all bars in the first row (that is, in age group 30-44 and cognlab 3) have the same color and all the remaining bars have another color? (The original code is copied and pasted below.)
          Also, is it possible to reduce the gap between the bars so as to make the graph smaller? I saw that there are two options, height and barwidth. However, both seem to apply only to the size of the bars rather than to the gap in between.
          Thank you very much!

          Code:
          gen age2 = 15 * floor(age/15)
          foreach age in 30 45 60 75 {
              local AGE = `age' + 14
              label define age2 `age' "`age'-`AGE'", modify
          }
          label val age2 age2
          
          catplot cognlab age2, percent(age2) stack recast(bar) asyvars yla(, ang(h)) ///
          legend(order(3 2 1) pos(3) col(1)) ///
          bar(3, bcolor(blue*0.6)) ///
          bar(2, fcolor(blue*0.2) lcolor(blue*0.6)) ///
          bar(1, fcolor(red*0.2) lcolor(red*0.6)) b1title(Age (years)) name(G1, replace)
          
          tabplot cognlab age2, percent(age2) yla(, ang(h)) ///
          showval ///
          xtitle(Age (years)) name(G2, replace) ///
          subtitle(% in age group) ///
          yreverse ///
          separate(cognlab) ///
          bar3(bcolor(blue*0.6)) ///
          bar2(bfcolor(blue*0.2) blcolor(blue*0.6)) ///
          bar1(bfcolor(red*0.2) blcolor(red*0.6))
          Last edited by shem shen; 18 Feb 2019, 10:54.



          • #6
            The separate() option allows separation into two groups. This is documented. You used it. The separating variable you want has to be defined in advance.

            But I think heatmap commands are available in Stata from the community. I just don't keep track of them.
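
            The point about defining the separating variable in advance can be sketched as follows, using the example data from #1. This is only an illustration, not code from the thread: edge is a hypothetical variable name, the colors are arbitrary, and I would expect bar1() to apply to the lower value of edge and bar2() to the higher, but that is worth checking against the help file.

            ```stata
            * Sketch only: assumes tabplot (Stata Journal, gr0066_1) is installed.
            clear
            input int region float(agecat tempjan)
            1 1  23.16087
            1 2  28.99518
            1 3 31.462856
            2 1 18.091358
            2 2  25.40435
            2 3  29.77333
            3 1  38.58561
            3 2  50.13235
            3 3  64.27907
            4 1  40.23812
            4 2  54.53425
            4 3  61.50435
            end

            * edge (hypothetical name) is 1 for cells in the first row (region 1)
            * or the leftmost column (agecat 1), and 0 for all other cells.
            gen byte edge = (region == 1) | (agecat == 1)

            * Bars are drawn separately for each value of edge.
            tabplot region agecat [iw=tempjan], separate(edge) ///
                bar1(bfcolor(blue*0.2) blcolor(blue)) ///
                bar2(bfcolor(red*0.2) blcolor(red))
            ```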



            • #7
              Originally posted by Nick Cox
              The separate() option allows separation into two groups. This is documented. You used it. The separating variable you want has to be defined in advance.

              But I think heatmap commands are available in Stata from the community. I just don't keep track of them.
              Thank you so much!



              • #8
                Originally posted by Nick Cox
                The separate() option allows separation into two groups. This is documented. You used it. The separating variable you want has to be defined in advance.

                But I think heatmap commands are available in Stata from the community. I just don't keep track of them.
                Hi Nick, sorry to bother you again. After reading your article on tabplot, I realized tabplot is indeed more suitable for my purpose. However, I am confused by one of the examples you mentioned there (page 203 of https://ageconsearch.umn.edu/bitstre...art_gr0004.pdf)

                Figure 8: tableplot permits the display of values of a third variable by combinations of two variables. Here the division of labor in some Mexican villages between females and males is shown for various tasks.
                I have two quick questions:
                1. So do you mean tabplot can indeed be used to show the value of a third variable according to the values of two other variables? I cannot find such an option in its help file. How did you do that? What is most interesting in that graph is that it contains bars for negative values (e.g. all cells in the bottom row). This is exactly what I need.
                2. I tried to use tabplot to display my data as below by supplying the third variable as [iw] (it contains negative values, so fw and aw are not applicable). It works, but there is one problem: if I increase the value of the last observation from 161.5 to 261.5 and generate a new plot, the height of that bar is still the same. However, I want it to increase to reflect the proportionate rise in tempjan from 161.5 to 261.5. So it seems [iw] cannot do the job. Do you have any suggestions?
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int region float(agecat tempjan)
                1 1  -23.16087
                1 2  28.99518
                1 3 31.462856
                2 1 18.091358
                2 2  -25.40435
                2 3  29.77333
                3 1  -38.58561
                3 2  50.13235
                3 3  64.27907
                4 1  40.23812
                4 2  54.53425
                4 3  161.50435
                end
                label values region region
                label def region 1 "NE", modify
                label def region 2 "N Cntrl", modify
                label def region 3 "South", modify
                label def region 4 "West", modify
                label values agecat agecat
                label def agecat 1 "19-29", modify
                label def agecat 2 "30-34", modify
                label def agecat 3 "35+", modify
                
                tabplot region agecat [iw=tempjan]



                • #9
                  The main results of

                  Code:
                  . search tabplot, sj
                  are

                  SJ-17-3 gr0066_1 . . . . . . . . . . . . . . . . Software update for tabplot
                  (help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
                  Q3/17 SJ 17(3):779
                  added options for reversing axis scales; improved handling of
                  axis labels containing quotation marks

                  SJ-16-2 gr0066 . . . . . . Speaking Stata: Multiple bar charts in table form
                  (help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
                  Q2/16 SJ 16(2):491--510
                  provides multiple bar charts in table form representing
                  contingency tables for one, two, or three categorical variables

                  SJ-12-3 gr0053 . Speaking Stata: Axis practice, or what goes where on a graph
                  (help multqplot if installed) . . . . . . . . . . . . . . . N. J. Cox
                  Q3/12 SJ 12(3):549--561
                  discusses variations on what goes on each axis of a two-way
                  plot; provides multiple quantile plots

                  SJ-4-2 gr0004 . Speaking Stata: Graphing categorical and compositional data
                  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
                  Q2/04 SJ 4(2):190--215 (no commands)
                  discusses graphical possibilities for categorical and
                  compositional data


                  You're citing the 2004 article from a .pdf posted in unauthorised and discourteous manner by someone at the University of Minnesota; an abuse not now very important since the .pdf has long since been available from https://www.stata-journal.com/sjpdf....iclenum=gr0004

                  Either way, the 2004 article was superseded by the 2016 and 2017 pieces cited above as far as tabplot is concerned.

                  You cite a mention of tableplot. tableplot is a quite different command and no longer developed by me.

                  Code:
                  . ssc type tableplot.ado
                  *! 1.0.7 NJC 22 January 2007
                  Your question #1 is cancelled by your question #2 in which you do the very thing you wonder is possible in #1. Further, explanation of the point is included in the help file. If you can't see this, then possibly you installed the wrong version.

                  A recipe for subverting tabplot to plot any variable that takes on a single value for each
                  cross-combination of categories is illustrated in the examples below. The key is to select precisely
                  one observation for each cross-combination and to specify that variable as (most generally) an
                  iweight.

                  Furthermore, using an iweight is the only possible method whenever a variable has at least some
                  negative values. In that case, you might do the following:

                  1. Consider changing the maximum height through height() to avoid overlap of bars variously
                  representing positive and negative values. By default, tabplot chooses the scale to accommodate
                  the longest bar to be shown, but it contains no special intelligence otherwise to avoid overlap of
                  bars in the same column or row.

                  2. If also using showval or showval(), consider changing offset() and using a transparent
                  bfcolor().
                  As for your example, if you change the largest value and it remains the largest value, then necessarily it still corresponds to the longest (tallest) bar. I think you'll find that the other bars have shrunk correspondingly.
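
                  A minimal sketch of that scaling behavior, reusing the data from #8. It assumes tabplot from the Stata Journal is installed; the graph names before and after and the height(0.4) setting are arbitrary choices for illustration.

                  ```stata
                  * Sketch only: assumes tabplot (Stata Journal, gr0066_1) is installed.
                  clear
                  input int region float(agecat tempjan)
                  1 1 -23.16087
                  1 2  28.99518
                  1 3 31.462856
                  2 1 18.091358
                  2 2 -25.40435
                  2 3  29.77333
                  3 1 -38.58561
                  3 2  50.13235
                  3 3  64.27907
                  4 1  40.23812
                  4 2  54.53425
                  4 3 161.50435
                  end

                  * One observation per region-agecat cell, so each iweight is the value plotted.
                  * A smaller height() leaves room between positive and negative bars.
                  tabplot region agecat [iw=tempjan], showval height(0.4) name(before, replace)

                  * Raise only the largest value. Its bar still has the maximum height;
                  * the difference should show up in the other bars, which get shorter.
                  replace tempjan = 261.50435 if region == 4 & agecat == 3
                  tabplot region agecat [iw=tempjan], showval height(0.4) name(after, replace)
                  ```

                  Comparing the two graphs side by side should make the rescaling visible, since bar lengths are relative to the largest value shown.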
                  Last edited by Nick Cox; 19 Feb 2019, 02:00.



                  • #10
                    Hi Nick,
                    Thank you very much! May I ask you one more question about tabplot? I have a plot as below.
                    1. Is it possible to insert a textbox in the lower-right blank space?
                    2. Is it possible to rearrange the position of the three plots? For instance, how can I put all the three plots in the same row or column, instead of two on top and one at bottom, which looks kind of awkward?
                    Thank you! I really like the tabplot command. Very informative and beautiful.

                    My current command is:
                    Code:
                    #delimit ;
                    tabplot Wp Hp [iw=impact], by(imptype, note("",place(ne)))
                    subtitle($lab, fcolor(none) size(*.9)) showval(impact, format(%2.1f) offset(.15)) height(.6)
                    xtitle("Husbands' earnings categories",size(*.9)) ytitle("Wives' earnings categories",size(*.9)) yla(, labsize(medsmall))
                    separate(imptype) bar1(bfcolor(red*0.2) blcolor(red)) bar2(bfcolor(blue*0.2) blcolor(blue)) bar3(bfcolor(green*0.2) blcolor(green))
                    scheme(s1mono)
                    ;
                    #delimit cr
                    [Image: figure2.png]



                    • #11
                      1. Is it possible to insert a textbox in the lower-right blank space?
                      A job for the Graph Editor.

                      2. Is it possible to rearrange the position of the three plots? For instance, how can I put all the three plots in the same row or column, instead of two on top and one at bottom, which looks kind of awkward?
                      See

                      Code:
                      help by option
                      for suboptions like row() and col() which you can use within tabplot.

                      Here is a simple example:

                      Code:
                      sysuse auto, clear 
                      tabplot rep78, by(foreign, col(1))
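
                      For three panels in a single row, a variant along the same lines might look like the following. It is only a sketch: pricegrp is a hypothetical grouping variable, created here just to get three by() groups out of the auto data.

                      ```stata
                      sysuse auto, clear

                      * pricegrp (hypothetical name) splits price into three groups coded 0, 1, 2.
                      egen pricegrp = cut(price), group(3)

                      * rows(1) puts all by() panels in one row; cols(1) would stack them in one column.
                      tabplot rep78, by(pricegrp, rows(1) note(""))
                      ```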



                      • #12
                        Thank you so much!
