  • designplot now available from SSC (something also for fans of descriptive tables)

    Thanks as usual to Kit Baum, a new package designplot is now available from SSC. Stata 8.2 is required.

    The name of the program may mean little or nothing to people. What's a design plot? The problem bites backwards more than forwards. Sometimes simple plots don't really need names in your papers and presentations: you just write or say "plotting something versus something else". It's almost an accident if a plot has a standard name (histogram, scatter plot, box plot) and often standard names are less than standard (what's a dot plot, and do you call it something else?). But a programmer writing a program must give it a name, and I chose designplot because "design plot" is a name in the literature. However, the design plots in the literature don't bear very much resemblance to the results of designplot.

    But let's curtail that dogged discussion (there's more in the help for those so inclined).

    Here's an example straight away:

    Code:
    sysuse auto
    set scheme s1color 
    designplot mpg foreign rep78




    The main ideas are:

    1. You name a response and at least one predictor.

    2. The graph shows summarize results for the response given the distinct levels of the predictors and their cross-combinations.

    3. The default is just the mean, but one or more results can be shown.

    4. If you name (say) two predictors, you get the zero-way breakdown (no breakdown at all), both one-way breakdowns for each predictor and the two-way breakdown for both predictors combined. (You are asked to swallow the non-standard term "zero-way" as a modest extension of standard terminology.)

    5. You can get less than #4 by restricting, e.g., to just the one-way breakdowns, or at most the one-way breakdowns.

    6. graph dot is used by default, but you can invoke graph hbar (which often works well) or graph bar (which less often works well).

    7. You can save the results graphed as a new dataset. This may help in tabulation or in preparing a new graph.
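To make the breakdowns in #4 concrete, here is a rough sketch that uses only official commands on the auto data. It lists the means behind each breakdown and is not part of designplot itself. Dropping observations with missing rep78, so that every breakdown uses the same sample, is an assumption of this sketch rather than a statement about what designplot does by default.

```stata
sysuse auto, clear

* assumption of this sketch: use the same sample for every breakdown
keep if !missing(foreign, rep78)

* zero-way breakdown: one overall mean
summarize mpg, meanonly
display "overall mean of mpg = " %5.2f r(mean)

* one-way breakdowns: means for each level of each predictor
tabulate foreign, summarize(mpg) means
tabulate rep78, summarize(mpg) means

* two-way breakdown: means for each observed cross-combination
tabulate foreign rep78, summarize(mpg) means
```

The plot from designplot mpg foreign rep78 puts summaries of this kind on one axis, which is the point of the command.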

    This works somewhat like the existing (and apparently rather neglected) grmeanby command and also a lot like graph dot used directly. But there are different twists. Otherwise the command would be pointless.

    #7 is new relative to both. The scope for multiscale breakdowns (zero-way, one-way, two-way and so on in one graph) is also new relative to both. grmeanby is restricted to means or medians (although any competent user-programmer could clone it quickly to do otherwise).

    Here is another simple example. We will look at means and medians, sort within groups on means, add variable labels and restrict scope to zero- and one-way breakdowns.

    Code:
    designplot mpg foreign rep78, stat(median mean) variablelabels maxway(1) entryopts(sort(2) descending)


    I would want to use the Graph Editor to tweak that, notably to tweak "Repair Record 1978" to two lines to take up less space, but that's always the sort of detail you want to improve.

    Here is a variant on a common problem often tackled with tables. People are often interested in seeing various univariate breakdowns of frequencies for categorical variables. (To get percents, save the results as a dataset, do a simple calculation and call up graph again.)

    Code:
     
    designplot mpg foreign rep78 if !missing(foreign, rep78), stat(count) recast(hbar) blabel(total) yla(none) t1title("frequencies") variablelabels ytitle("") ysc(r(0 72))


    One more example, assuming you're still reading. Looking at (one version of) the Titanic data, the focus is on variation in the fraction surviving as a response to age, sex, class and their interactions. The code is in the help file.



    This kind of graph can be useful for description or exploration and perhaps even give you ideas about whether your models need interaction terms.












  • #2
    Thanks to Kit Baum as usual, this program has been updated on SSC. The update mostly concerns an amplified help file.



    • #3
      Thanks for that, Nick; it looks exceedingly useful. I always feel like I eat up too much time trying to kludge together plots to see what I want to see when doing EDA.
      __________________________________________________ __
      Assistant Professor, Department of Biostatistics and Epidemiology
      School of Public Health and Health Sciences
      University of Massachusetts- Amherst



      • #4
        Although it's not mentioned in the Abstract, designplot was mentioned in my talk at the recent Boston meeting. Files are accessible from http://www.stata.com/meeting/boston14/abstracts/



        • #5
          Written up at http://www.stata-journal.com/article...article=gr0061



          • #6
            This is really an excellent program in concept. It would be more useful if the variable combinations (which appear below all the main ones) could be removed, as together they end up forming one huge list and make the graph unreadable (Age1, Age2, ..., Age10, Sex1, Sex2, Age1Sex1, Age1Sex2, ..., Age10Sex2).
            [Attached image: Screenshot 2020-09-24 220936.jpg]


            • #7
              Thanks for #6. In British English, at least, and perhaps more widely, it's proverbial that you can't fit a quart into a pint pot: https://www.collinsdictionary.com/di...nto-a-pint-pot (for the information of people blessed with purely metric systems of units: one quart = 2 pints). It's tacit in #1 that this will work well with up to about 30 entries; beyond that you will struggle, although changing the graph size can help.
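One way to anticipate crowding is to count entries before plotting. If every breakdown from zero-way upwards is shown, predictors with n1, n2, ... distinct levels give at most (1 + n1)(1 + n2)... entries, and fewer if some cross-combinations never occur. Here is a rough sketch on the auto data, using only official commands; the local macro name upper is just for illustration.

```stata
sysuse auto, clear

* start from 1 and multiply by (1 + number of distinct levels) per predictor
local upper = 1
foreach v of varlist foreign rep78 {
    quietly tabulate `v'
    local upper = `upper' * (1 + r(r))
}
display "at most `upper' entries"
```

For foreign (2 levels) and rep78 (5 levels) the bound is 3 x 6 = 18. The product grows quickly as predictors are added, which is why restricting to lower-order breakdowns helps.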

              Beyond that I can tell that you are using the oldest version, as the ? in the left margin are a side-effect of Stata's support for Unicode, which broke a trick used in the original version. A search reveals that the latest version should be downloaded from the files for Stata Journal 19(3):

              -----------------------------------------------------------------------------------------------------------------------
              search for designplot (manual: [R] search)
              -----------------------------------------------------------------------------------------------------------------------

              Search of official help files, FAQs, Examples, and Stata Journals

              SJ-19-3 gr0061_3 . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
              Q3/19 SJ 19(3):748--751
              any attempt to use the missing option of graph dot,
              graph hbar, or graph bar is now ignored and advice on
              what to do instead is shown

              SJ-17-3 gr0061_2 . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
              Q3/17 SJ 17(3):779
              help file updated

              SJ-15-2 gr0061_1 . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
              Q2/15 SJ 15(2):605--606
              bug fixed for Stata 14

              SJ-14-4 gr0061 Design plots for graphical summary of a response given factors
              (help designplot if installed) . . . . . . . . . . . . . . N. J. Cox
              Q4/14 SJ 14(4):975--990
              produces a graphical summary of a numeric response variable
              given one or more factors

              [...]

              designplot from http://fmwww.bc.edu/RePEc/bocode/d
              'DESIGNPLOT': module to produce a graphical summary of response given one
              or more factors / designplot produces a graphical summary of a numeric
              response / variable given one or more "factors", "factor" here meaning any
              / numeric or string variable treated in terms of its distinct / levels in


              To help more, I need to see minimally the command you issued and ideally an equivalent data example.



              • #8
                Thank you for the updated version.
                Here is the designplot command I issued, followed by a data example:
                Code:
                designplot Age - DivorceEver ,  recast(hbar)

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(Age Sex Education MarriedEver EthnicMinority DivorceEver)
                2 1 5 2  2 2
                2 2 6 2  2 2
                6 1 2 2  2 1
                3 2 2 1  2 2
                8 2 1 2  2 2
                7 1 6 1  2 2
                6 2 6 1  2 2
                6 1 7 1  2 1
                6 2 3 1  2 2
                5 2 4 1  2 2
                1 2 4 2  2 2
                7 1 1 1  2 2
                5 2 3 1  2 2
                5 1 7 2  2 2
                5 1 7 1  2 2
                5 1 7 1  2 2
                4 1 3 1  2 1
                2 1 4 2  2 2
                3 1 4 1  2 2
                5 2 4 1  2 2
                2 2 4 1  2 2
                3 2 4 1  2 2
                3 1 3 1  2 1
                3 2 5 2  2 2
                3 2 5 2  2 2
                4 1 3 2  2 2
                3 1 5 1  2 1
                2 2 4 2  2 2
                4 1 7 1  2 1
                3 1 4 2  2 2
                3 1 2 1  2 2
                6 2 3 1  2 2
                4 1 6 2  2 2
                6 2 6 1  1 2
                5 2 2 1  1 2
                5 2 2 1  2 2
                3 1 6 1  2 2
                5 1 2 1  2 2
                7 1 3 1  2 2
                1 1 2 2  2 2
                2 1 6 2  2 2
                7 1 1 1  2 2
                1 1 2 2  2 2
                6 2 5 2  1 2
                6 1 4 1  2 2
                3 1 3 2  2 2
                3 2 6 1  2 1
                8 2 1 1  2 2
                4 1 3 1  2 2
                2 2 3 2  2 2
                8 1 3 1  2 1
                5 1 6 1  2 1
                4 2 5 1  2 2
                2 2 5 2  2 2
                3 2 2 1  1 2
                7 1 5 2  2 2
                3 2 3 2  2 2
                7 1 7 1  2 1
                2 2 2 2  2 2
                3 2 1 2  1 2
                3 2 7 1  2 2
                7 1 4 1  2 1
                5 2 7 1  2 2
                6 2 1 1  2 2
                4 1 2 1  2 1
                2 2 2 2  2 2
                7 2 4 1  2 2
                3 1 3 2  2 2
                3 2 4 2  1 2
                8 1 1 1  2 2
                8 2 7 2  2 2
                3 1 2 1  2 2
                4 2 4 1  2 2
                6 2 4 1  1 1
                3 2 5 1  2 2
                1 2 2 2  2 2
                4 2 7 1  2 1
                3 2 7 1  2 2
                5 2 2 1  2 1
                7 2 1 1  2 1
                5 2 2 1  2 2
                3 1 3 1  2 2
                4 1 5 1  1 1
                6 2 2 1  2 2
                5 2 7 1  2 2
                4 2 1 1  2 2
                6 1 3 1  2 2
                7 1 2 1  1 1
                3 1 2 1  2 2
                7 1 4 1  2 1
                4 2 7 2  2 2
                4 1 3 2  2 2
                6 1 3 2  2 2
                2 1 1 2  2 2
                5 2 5 1  2 2
                2 2 6 2 .c 2
                4 2 4 1  2 2
                7 2 5 2  2 2
                5 1 2 1  2 2
                2 2 4 2  2 2
                end
                Last edited by Sonnen Blume; 25 Sep 2020, 07:49.



                • #9
                  Your syntax choice makes Age the response variable and the others predictors and plots means for many combinations of predictors. I am not a social scientist, but that doesn't seem to me to be a good idea.

                  Where you have a mix of variables -- some perhaps outcomes, the others predictors or either -- the help that designplot can give is as a kind of data overview that cuts out the need for multiple little tables or graphs. The trick here is to feed designplot a constant variable as outcome and then to ignore it.

                  Although the code here is fairly short, you may need a certain amount of fooling around before you get something you really like.

                  Code:
                   
                  gen one = 1
                  set scheme s1color
                  designplot one Age-DivorceEver , bar(1, blcolor(blue) bfcolor(blue*0.2)) stat(count) min(1) max(1) recast(hbar) variablenames t1title("")  blabel(total) ysc(alt) entryopts(label(labsize(small)))
                  [Attached image: designplot2.png]


                  Notes:

                  You can show variable labels if you want, or otherwise improve the text explaining categories.

                  Given the bar labels, the axis labels and ticks may seem redundant, or conversely.

                  The trade-off between better explanations of categories and keeping the display uncluttered is obvious in principle and hard to optimise in practice.
                  Last edited by Nick Cox; 25 Sep 2020, 08:53.



                  • #10
                    This is wonderful. I didn't realise that the first variable in the list is treated as the response variable.

                    So the
                    Code:
                    gen one=1
                    is doing the trick. Could you please say a bit about how this works? My goal is to show percentages instead of counts (because some bars can look very long relative to others). In a previous thread, you gave a solution to that (https://www.statalist.org/forums/for...-on-designbars). This time the approach solves the cluttering issue, but loses the percent option.
                    Could you please point to a reference for the use of the
                    Code:
                     gen one = 1
                    and
                    Code:
                    gen percent = 100/r(N)
                    commands, if available.

                    Thank you.



                    • #11
                      Creating a response and then ignoring it is exemplified but not trumpeted in the 2014 paper.

                      Initialising 100 / sample size is documented in the latest (2019) public version of the help in response to a previous question by ... Sonnen Blume.
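For readers who have not seen that help entry, the idea can be sketched as follows. If every observation carries the value 100/N, the sum within any category is that category's percent of the sample. This sketch uses the auto data and assumes that designplot accepts stat(sum) as graph dot does; the variable name percent is arbitrary, and designplot must be installed first (ssc install designplot).

```stata
sysuse auto, clear

* N = number of observations with both predictors non-missing
count if !missing(foreign, rep78)

* each such observation contributes 100/N, so group sums are percents
generate double percent = 100 / r(N) if !missing(foreign, rep78)

* assumed: stat(sum) totals the contributions; other options as used earlier in this thread
designplot percent foreign rep78 if !missing(foreign, rep78), stat(sum) maxway(1) recast(hbar) blabel(total)
```

The same reasoning applies to the constant response in #9: with a variable equal to 1, stat(count) or a sum gives frequencies, and rescaling the constant to 100/N turns those frequencies into percents.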
