  • Graphing observations over time

    Hi,

    Sorry if I'm being stupid here (I am a bit of a Stata novice!), but I am looking for a way to graph the number of observations over time based on a date variable, filtered using other variables. In particular, I want to see the cumulative number of observations over a period of days (of the order of 50 days, with each observation recorded only as a date), with a single point plotted on the graph for each of those days. I also need to be able to filter the data on other variables at will.

    The former I haven't had a problem with as I can create temporary cumulative variables and graph these, but this is a bit of a workaround, and I don't particularly fancy coding this separately for each of the combinations of filters I might wish to apply.

    For example (and not what I am working on):

    Let's say my observations are purchases of tins of paint. Each has a date associated with the purchase, and is a tin of a specified colour and a specified size. I want to be able to graph the cumulative number of purchases of paint tins over time (with date as my x axis, cumulative number of tins as my y axis), but I also want to be able to easily filter this to look at the same graph for individual colours or sizes of tin. In my own work the number of combinations of 'colours' and 'sizes' of tins is well into four digits, so coding each separately is far from ideal. In an ideal world there is a single line of code to which I can just add 'if colour == XXX & size == YYY' at will.

    Am I just being stupid?

    Thanks!
    I

  • #2
    Welcome to the Stata Forum / Statalist.

    I'm not sure if I fully understand your query. Indeed, the best approach - as recommended in the FAQ - is to present data as a workable example. You may use just a toy example, if you will. Please make sure to share it under CODE delimiters or by installing - dataex - from SSC, which is also recommended in the FAQ.

    These comments being made, and considering there is no example with data to work on, I gather you may use time-series analysis for that. For example, - tsset - followed by - tsline - , perhaps with an "if" condition as you wish, may do the trick for you.
    Best regards,

    Marcos

    • #3
      Hi,

      Example data as described above:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float id str5 colour str7 size float purchasedate
        1 "Blue"  "Large"   21058
        2 "Blue"  "X-Large" 21066
        3 "Red"   "Small"   21054
        4 "Red"   "Medium"  21061
        5 "White" "Medium"  21062
        6 "Blue"  "Large"   21066
        7 "Red"   "Small"   21063
        8 "Blue"  "Large"   21065
        9 "Green" "X-Large" 21061
       10 "White" "Small"   21058
       11 "Red"   "Small"   21070
       12 "White" "Medium"  21058
       13 "Green" "X-Large" 21059
       14 "Blue"  "Large"   21057
       15 "White" "Medium"  21059
       16 "Blue"  "X-Large" 21060
       17 "Red"   "X-Large" 21064
       18 "White" "X-Large" 21057
       19 "Green" "X-Large" 21063
       20 "Green" "Small"   21061
       21 "White" "Large"   21057
       22 "Blue"  "Medium"  21057
       23 "Red"   "Large"   21059
       24 "Blue"  "Medium"  21056
       25 "White" "X-Large" 21067
       26 "Green" "Small"   21055
       27 "Red"   "Large"   21064
       28 "Green" "Large"   21061
       29 "Blue"  "Small"   21061
       30 "Green" "X-Large" 21064
       31 "White" "Medium"  21059
       32 "Red"   "Large"   21057
       33 "Blue"  "Medium"  21060
       34 "White" "X-Large" 21059
       35 "White" "Medium"  21054
       36 "Blue"  "Large"   21060
       37 "Red"   "Medium"  21060
       38 "Green" "Large"   21060
       39 "White" "Medium"  21055
       40 "Green" "X-Large" 21057
       41 "White" "Large"   21058
       42 "Blue"  "Medium"  21062
       43 "White" "Large"   21054
       44 "White" "Large"   21062
       45 "Green" "Medium"  21062
       46 "White" "X-Large" 21059
       47 "Green" "Large"   21061
       48 "Red"   "Large"   21061
       49 "White" "X-Large" 21058
       50 "Red"   "Large"   21064
       51 "White" "X-Large" 21056
       52 "Blue"  "Large"   21056
       53 "White" "Medium"  21057
       54 "Green" "Small"   21058
       55 "Blue"  "Large"   21056
       56 "Red"   "Medium"  21061
       57 "White" "Medium"  21061
       58 "Blue"  "Small"   21062
       59 "Red"   "X-Large" 21061
       60 "Green" "Large"   21063
       61 "Green" "X-Large" 21065
       62 "White" "X-Large" 21056
       63 "Green" "Medium"  21061
       64 "Blue"  "Large"   21062
       65 "Green" "Medium"  21062
       66 "Red"   "X-Large" 21058
       67 "Blue"  "Large"   21059
       68 "Red"   "Large"   21062
       69 "Green" "Large"   21061
       70 "Blue"  "X-Large" 21059
       71 "White" "Medium"  21068
       72 "Green" "X-Large" 21060
       73 "Red"   "X-Large" 21060
       74 "Blue"  "Medium"  21060
       75 "Red"   "Small"   21060
       76 "Green" "X-Large" 21060
       77 "Blue"  "Large"   21066
       78 "Red"   "Small"   21060
       79 "Red"   "Large"   21068
       80 "Red"   "X-Large" 21057
       81 "Blue"  "Medium"  21063
       82 "Blue"  "Small"   21061
       83 "Red"   "Large"   21067
       84 "Red"   "Medium"  21065
       85 "Red"   "X-Large" 21066
       86 "Blue"  "Medium"  21061
       87 "Red"   "X-Large" 21056
       88 "Red"   "X-Large" 21060
       89 "White" "X-Large" 21060
       90 "White" "X-Large" 21059
       91 "Red"   "Medium"  21064
       92 "White" "Medium"  21057
       93 "White" "X-Large" 21057
       94 "White" "Small"   21060
       95 "Red"   "Medium"  21063
       96 "Green" "Medium"  21059
       97 "Blue"  "X-Large" 21057
       98 "Blue"  "Medium"  21060
       99 "Green" "Large"   21057
      100 "Blue"  "X-Large" 21061
      end
      format %td purchasedate

      My understanding (which is, admittedly, quite limited) is that time-series analysis wouldn't be appropriate here, as the observations are the individual purchases rather than the dates, and therefore the same date occurs in many observations. Although I guess that if there is a way to restructure the data so that each date appears once, then this would be possible!

      Thanks,
      I
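
      A minimal sketch of one such restructuring (not taken from the thread; the variable names npurch and cumn are hypothetical, and the six toy rows are simply the first six rows of the example above): contract reduces the data to one row per date, tsset then accepts the now-unique dates, tsfill adds the days with no purchases, and sum() gives the running total. The filter on colour is just one possible condition.

      Code:
      clear
      input float id str5 colour str7 size float purchasedate
      1 "Blue"  "Large"   21058
      2 "Blue"  "X-Large" 21066
      3 "Red"   "Small"   21054
      4 "Red"   "Medium"  21061
      5 "White" "Medium"  21062
      6 "Blue"  "Large"   21066
      end
      format %td purchasedate

      * work on a temporary copy so the purchase-level data come back afterwards
      preserve
      * any filter can go here
      keep if colour == "Blue"
      * one row per date; npurch holds the number of purchases on that date
      contract purchasedate, freq(npurch)
      * dates are now unique, so tsset is allowed
      tsset purchasedate
      * add rows for days with no purchases (npurch is missing there)
      tsfill
      * running total; sum() treats missing values as zero
      gen cumn = sum(npurch)
      tsline cumn
      restore

      Changing the keep if line is all that is needed to switch to another colour or size, although each filter still means rerunning the whole block.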

      • #4
        This sounds just like graphs of the cumulative frequencies of your date variable, qualified as you wish by colour, size or any combination thereof.

        distplot (Stata Journal) is a dedicated command to install. Of the entries below, the latest indicates which version of the software to install; indeed, as I write, a further cosmetic update is in press in Stata Journal 17(3).

        Code:
        . search distplot , sj historical
        
        Search of official help files, FAQs, Examples, SJs, and STBs
        
        SJ-10-1 gr41_4  . . . . . . . . . . . . . . . . . Software update for distplot
                (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                Q1/10   SJ 10(1):164
                new reverse(ge) option specifies plotting probabilities or
                frequencies greater than or equal to any data value
        
        SJ-5-3  gr0018  . . . . . . . . . .  Speaking Stata: The protean quantile plot
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/05   SJ 5(3):442--460           (see gr41_3 and gr42_3 for commands)
                discusses quantile and distribution plots as used in
                the analysis of species abundance data in ecology
        
        SJ-5-3  gr41_3  . . . . . . . . . . . . . . . . . Software update for distplot
                (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                Q3/05   SJ 5(3):471
                simplified syntax; both by() and over() are now allowed
        
        SJ-4-2  gr0004  .  Speaking Stata: Graphing categorical and compositional data
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/04   SJ 4(2):190--215                                 (no commands)
                discusses graphical possibilities for categorical and
                compositional data
        
        SJ-4-1  gr0003  . . . . . . . . . . . . Speaking Stata: Graphing distributions
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q1/04   SJ 4(1):66--88                                   (no commands)
                a review of official and user-written commands for
                graphing univariate distributions; includes tricks
                beyond what is obviously and readily available
        
        SJ-3-4  gr41_2  . . . . . . . . . . . . . . . . . Software update for distplot
                (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                Q4/03   SJ 3(4):449
                option tscale() renamed as trscale()
        
        SJ-3-2  gr41_1  . . . . . . . . . . . . . . . . . Software update for distplot
                (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                Q2/03   SJ 3(2):211
                enhanced to use Stata 8 graphics and provides new options
        
        STB-51  gr41  . . . . . . . . . . . . . . . . . .  Distribution function plots
                (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                9/99    pp.12--16; STB Reprints Vol 9, pp.108--112
                plots the cumulative distribution function or survival function
                and allows multiple variables

        Here are some token code examples:

        Code:
        distplot purchasedate, over(colour) frequency c(J..)
        
        distplot purchasedate if colour == "Blue", frequency c(J)

        Please note: We ask for full real names here, not nicknames, abbreviations, or pseudonyms. The FAQ Advice explains further.
        Last edited by Nick Cox; 13 Sep 2017, 08:34.

        • #5
          That's excellent, thank you! I knew there must have been an easier way!

          I

          • #6
            Correction to #4: An update to distplot will not be appearing in SJ 17(3). I confused intention with action. I added a feature, or fixed a misfeature, but have yet to finish the task.

            • #7
              Dear Nick, thank you for the distplot program. This really is a basic graph that has been missing from Stata.

              • #8
                #7 Thanks for the thanks. The StataCorp line (so to speak) on this can be seen at

                Code:
                help cumul

                which underlines that cumul can calculate the (empirical (cumulative)) distribution function and line can then plot it.
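
                A minimal sketch of that official-Stata route, not taken from the thread: the freq option of cumul returns cumulative counts rather than proportions, and equal gives tied dates the same value. The variable name cumblue is hypothetical, and the six toy rows are the first six rows of the example data in #3.

                Code:
                clear
                input float id str5 colour str7 size float purchasedate
                1 "Blue"  "Large"   21058
                2 "Blue"  "X-Large" 21066
                3 "Red"   "Small"   21054
                4 "Red"   "Medium"  21061
                5 "White" "Medium"  21062
                6 "Blue"  "Large"   21066
                end
                format %td purchasedate

                * cumulative number of Blue purchases up to and including each date
                cumul purchasedate if colour == "Blue", generate(cumblue) freq equal
                * cumblue is missing outside the filter, so plot only where it is defined
                line cumblue purchasedate if !missing(cumblue), sort connect(J)

                Each new filter needs its own generated variable, which is the bookkeeping that distplot avoids.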