  • transplot package downloadable from SSC

    Thanks as always to Kit Baum, a new transplot package is now downloadable from SSC using

    Code:
    ssc install transplot
    Stata 8 is the minimum requirement. That said, I have not tested this in Stata 8 but I imagine that people may shout out if there are problems running it in any old version of Stata.

    transplot draws plots for trying out transformations. It grows out of a long-term personal interest in transformations, an odd topic in that statistically experienced people seem to vary greatly in their willingness to transform.

    That said, often a researcher just knows from experience or theory that a particular transformation (including here a link function) should make sense.

    But sometimes you need to try out a transformation before you can be sure that it is a good idea -- or see that it is useless, or changes matters so little that it is pointless -- or even to spot that it is a bad idea.

    The immediate stimulus for this command came when I focused on why I never (well, hardly ever) use any of the official commands ladder, gladder, qladder. What I wanted instead was typically a focused comparison of what the data look like on the original scale and using one or at most a few different transformed scales. Most commonly, the question can just be: is taking logarithms a good idea?

    I gave a talk centred on transplot at the London Stata conference in September 2019 but did not release the code (or a help file, which I had not even written at the time). Marination has allowed some further modest extensions of functionality.

    The slides are accessible at https://www.stata.com/meeting/uk19/slides/uk19_cox.pptx

    The help file is fairly detailed so a few examples of the command at work should be enough for now.

    First, the command is used in one-way mode, naming (a) a distribution plotting command, (b) variables and (c) transformations (@ is the symbol for a variable on its original scale). The unsurprising big picture here is that each variable shown is strongly positively skewed and would be easier to work with when logged.


    Code:
    set scheme s1color 
    webuse grunfeld, clear
    transplot qnorm invest mvalue kstock, trans(@ log10) ms(Oh) combine(colfirst)
    [Figure: transplot_qnorm.png]
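
    For comparison, a by-hand version of one column of that display can be sketched with official commands only. This is just an illustration of the idea, not how transplot works internally; the variable name log10_invest and the graph names raw and logged are made up here.

    Code:
    webuse grunfeld, clear
    * logged copy of one variable
    gen log10_invest = log10(invest)
    * normal quantile plot on each scale, held in memory under a name
    qnorm invest, ms(Oh) name(raw, replace)
    qnorm log10_invest, ms(Oh) name(logged, replace)
    * stack the two panels in one column
    graph combine raw logged, cols(1)
    Doing this for several variables and several transformations soon gets tedious, which is the case for a single command.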



    Second, an example in which we play with logarithmic and reciprocal versions of a response variable:

    Code:
    sysuse auto, clear
    transplot scatter mpg weight, ytrans(@ log10 100/@) ms(Oh)
    [Figure: transplot_scatter.png]

    Third, you can try transforming the predictor too:

    Code:
    transplot scatter mpg weight, ytrans(@ log10 100/@) xtrans(@ log10) ms(Oh)  combine(colfirst)
    [Figure: transplot_scatter2.png]

  • #2
    Here's another example. This repeats themes from above -- transformations might help and plotting with respect to some reference distribution might help too -- but adds the idea of comparing groups.

    Do foreign and domestic cars in the auto data vary in mpg? Here you need qplot from the Stata Journal as well as transplot from SSC.


    Code:
    sysuse auto, clear
    set scheme s1color 
    transplot qplot mpg, over(foreign) trans(@ sqrt log 1000/@) scheme(s1color) legend(pos(11) ring(0) order(2 1) col(1)) trscale(invnormal(@)) xtitle(standard normal deviate)


    [Figure: transplot_twogroup.png]


    Here a commentary might run: normal quantile plots do show that foreign cars have higher mpg than domestic, but the comparison is more complicated than an additive shift, so is it (for example) multiplicative rather than additive? A logarithmic transformation does help -- noting along the way that a root transformation does not help much, so forget about it -- but we might as well keep going and use reciprocals. In fact it's a standard comment that gallons per so many miles (or litres per so many km!) is as natural a scale as the original, or more so. Evidently reciprocals flip the groups around so that domestic cars plot higher than foreign in the last panel -- being more inefficient.

    An implication of the first panel is that a t test oversimplifies!

    Although this example ends up as one line of code that does the Hogwarts stuff, I always find myself fooling around and building up to it bit by bit -- and sometimes digressing or deviating along the way. So it's more typical that several intermediate steps end up on the cutting room floor.



    • #3
      Dear Nick,

      Thank you for this additional example; being able to compare groups (better) is important for practical purposes.
      When I compare the first panel (top left) with the fourth panel (bottom right), my (possibly naive) interpretation is that the reciprocal transformation should give a quantile regression result with coefficients that are closer to each other across quantiles.

      Indeed, comparing the result of :
      Code:
      sysuse auto, clear
      gen mpg1000 = 1000/mpg
      sqreg mpg foreign , quantile(.10 .25 .5 .75 .90)
      *{results omitted}
      with the result of:
      Code:
      sqreg mpg1000 foreign , quantile(.10 .25 .5 .75 .90)
      *{results omitted}
      shows that the difference between the coefficients, for example, of q50 and q10 as well as of q50 and q90 is reduced (from 3 to 1.07 and from 3 to 1.65).
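
      As a sketch of how such differences might be read off directly rather than by eye, lincom can be applied to the stored coefficients. This assumes sqreg's usual equation names q10, q50 and q90 for these quantiles; the seed is arbitrary and only there because sqreg bootstraps its standard errors.
      Code:
      sysuse auto, clear
      * arbitrary seed, so that the bootstrap standard errors are reproducible
      set seed 2803
      sqreg mpg foreign, quantile(.10 .25 .5 .75 .90)
      * gap between the median and lower-tail coefficients on foreign
      lincom [q50]foreign - [q10]foreign
      * gap between the upper-tail and median coefficients on foreign
      lincom [q90]foreign - [q50]foreign
      The same two lincom lines can be repeated after the sqreg on mpg1000.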

      Of course this is a 'toy' example, so I cannot assume that this finding would replicate with similar data, but I suppose the objective of using transplot is to investigate whether such an analytical improvement presents itself or not.
      http://publicationslist.org/eric.melse



      • #4
        Indeed, the spirit of transplot is entirely descriptive or exploratory. What formal inferences follow -- in terms of, say, tests or model fits -- is a different question and I would not dream of including any such in the code.

        I would want to stress the critical role of transplot in underlining which tests make most sense, or how far tests make sense. The most common pitfall I've seen is comparing two or more means without being clear that additive shift is the main story. I've not seen any text discourage graphics before or alongside such tests, but my impression is that most encourage drawing histograms or box plots, variously inefficient or even irrelevant for consideration of means and variation around them.
