
  • Uncombining data points when graphing longitudinal data with profileplot

    Hi there,

    I'm using the "profileplot" command to plot sales data (public expenditure, GBP per 1000 population) over time (from 2013 to 2018) for 31 countries.

    My code thus far is:
    profileplot cost_13_pop cost_14_pop cost_15_pop cost_16_pop cost_17_pop cost_18_pop, ///
    by(Country2) graphregion(fcolor(white)) xtitle(" ") ///
    xlabel(1 "2013" 2 "2014" 3 "2015" 4 "2016" 5 "2017" 6 "2018") msymbol(i) ///
    legend(cols(1) pos(2) size(vsmall)) ///
    ytitle("Public expenditure" "(£ per 1000 population)" " ")


    which produces this graph:
    [Attached graph: expend_pop_time.tif]

    I have four questions I require help with:
    1. The graph seems to combine the 31 countries into 15. Is there a way around this, or is 15 the maximum number the graph can handle?
    2. Can "mean" be removed from the legend?
    3. Can "Variables" be removed from the x-axis? I've tried xtitle(" "), but this didn't work.
    4. How can I order the countries in the legend by descending value rather than alphabetically (i.e. Ireland is the top line, so I want it to come first in the legend)?

    Thanks for your time and advice,
    Georgia

  • #2
    profileplot is a community-contributed command from https://stats.idre.ucla.edu/stat/stata/ado/analysis (as you are asked to explain: https://www.statalist.org/forums/help#stata 12.1).

    It's just a wrapper for xtline, overlay, which you might just as well use directly. However, you are holding panel or longitudinal data in wide form, which isn't a good idea for most Stata purposes. I will come back to that later in this post.

    What you are seeing is just a general default that Stata has for the number of legend elements.

    You don't give a data example, but anyone can run this script and see the same problem:

    Code:
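    clear
    * 31 panels x 6 years = 186 observations
    set obs 186
    set seed 2803
    egen year = seq(), from(2013) to(2018)
    egen id = seq(), block(6)
    * lognormal outcome, so every value is positive
    gen y = exp(rnormal(0, 1))
    xtset id year
    * by default the legend does not show a key for every one of the 31 panels
    xtline y, overlay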
    Frankly, your graph is likely to strike many readers as a mess. The problem is not in goodwill, but in people's sheer inability to untangle spaghetti.

    If you worked harder at different line patterns as well as colours, the legend would be about twice as large, which is not the way to go.
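    If a key for every panel is wanted all the same, one possible workaround is to list the keys explicitly in legend(order()). This is only a sketch on simulated data: the count of 31 panels is taken from the question, the local macro name keys is arbitrary, and it assumes that an explicit order() overrides the default limit on legend elements.

```stata
* Simulated panel: 31 ids x 6 years (made-up data, not the poster's)
clear
set seed 2803
set obs 186
egen year = seq(), from(2013) to(2018)
egen id = seq(), block(6)
gen y = exp(rnormal(0, 1))
xtset id year

* Expand 1/31 into "1 2 ... 31" and ask for every key explicitly
numlist "1/31"
local keys `r(numlist)'
xtline y, overlay legend(order(`keys') cols(4) size(vsmall))
```

    The resulting legend is large, which is exactly the drawback described above.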

    I would use logarithmic scale for such data, assuming that there aren't any zeros.

    https://www.stata-journal.com/articl...article=gr0080 covers some (but by no means all!) of what can be said here constructively. It's currently behind a paywall, but I imagine that the author would be willing to send you a copy if you send an email. (In olden days, people used to send little postcards to authors asking for paper reprints.)

    I would consider 6 panels with about 5 countries each.
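    A minimal sketch of the panel idea combined with a logarithmic scale, again on simulated data. The variable group and the rule used to build it are hypothetical; with real data, grouping countries by level of expenditure would probably be more informative than grouping by identifier.

```stata
* Simulated panel: 31 ids x 6 years (made-up data)
clear
set seed 2803
set obs 186
egen year = seq(), from(2013) to(2018)
egen id = seq(), block(6)
gen y = exp(rnormal(0, 1))

* Hypothetical grouping: 6 panels of 5 or 6 ids each, in id order
gen group = ceil(id * 6 / 31)

* A log scale needs strictly positive values
assert y > 0

* connect(L) starts a new line whenever year stops increasing, i.e. at each new id
sort id year
twoway line y year, connect(L) by(group, note("")) yscale(log) ytitle("y (log scale)")
```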

    OR

    using a "front-and-back plot" as discussed at https://www.statalist.org/forums/for...ailable-on-ssc That's quite a long thread, but anyone can skim and skip through it.

    Your dataset isn't large. You could post it here using

    Code:
    help dataex
    
    dataex cost_13_pop cost_14_pop cost_15_pop cost_16_pop cost_17_pop cost_18_pop Country2
    If the call to dataex fails, then

    1. You are using a version of Stata earlier than the present, and it's asked that you say so (https://www.statalist.org/forums/help#version)

    2. You should just install dataex using ssc install dataex (https://www.statalist.org/forums/help#stata)

    That done, however, a reshape long is a much better idea for these data, using say

    Code:
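    * the stub cost_ leaves suffixes such as "13_pop" in year, hence the string option
    reshape long cost_ , i(Country2) j(year) string
    * strip "_pop", convert to numeric, and turn 13 into 2013
    replace year = subinstr(year, "_pop", "", .)
    destring year, replace
    replace year = year + 2000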
    However, you may well have other variables, in which case you may need more detailed advice.
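    An alternative sketch uses the @ placeholder of reshape, which marks where the year sits inside the variable name, so that year arrives as a numeric variable straight away. The toy data below are invented for illustration, and the new names cost and id are arbitrary choices.

```stata
* Toy wide-form data; the values are made up for illustration
clear
input str10 Country2 cost_13_pop cost_14_pop cost_15_pop
"Ireland" 10 12 15
"France"   5  6  7
end

* @ marks the position of the numeric suffix, so year comes out numeric
reshape long cost_@_pop, i(Country2) j(year)
rename cost__pop cost
replace year = year + 2000

* xtset needs a numeric panel identifier
encode Country2, gen(id)
xtset id year
list Country2 year cost, sepby(Country2)
```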

    NOTE: I haven't looked hard at the code, but showing a mean is a default for profileplot.






    Last edited by Nick Cox; 28 May 2020, 04:13.



    • #3
      Thanks Nick, this was super helpful.
      I've reshaped the data to long form and used the subsetplot command.
      My figure is as follows:
      [Attached graph: pub_expenditure.tif]

      Thanks again!



      • #4
        Thanks for coming back with a report. What's the story with Ireland?

        subsetplot (SSC) is considered superseded by fabplot (SSC).
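        A minimal sketch of a fabplot call on simulated long-form data, assuming the command is installed from SSC and is called with a twoway plot type followed by the y and x variables and a by() option. The variable names are invented.

```stata
* fabplot is community-contributed; install it from SSC if it is missing
capture which fabplot
if _rc ssc install fabplot

* Simulated panel: 31 ids x 6 years (made-up data)
clear
set seed 2803
set obs 186
egen year = seq(), from(2013) to(2018)
egen id = seq(), block(6)
gen y = exp(rnormal(0, 1))

* One small panel per id: that id in front, all the other ids as backdrop
fabplot line y year, by(id)
```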

        The suggestion to use logarithmic scale remains, to which I add that alphabetical order is rarely best.
