  • Time-series graph issues

    Dear Members,

    I am coming back to you as I have been stuck for the past few days attempting to graph the results of a randomised trial. I am very new to using STATA graph commands.

    I created a dataset for this purpose which has just three variables: time (0, 1, and 5 months), study group (control, group1, group2), and a variable containing the collapsed prevalence estimates (proportions) by study group and trial phase. All I wish to produce is a graph showing readers the trial results by study group and trial phase: three lines on the same graph, each representing the level of the primary outcome for one study group across the trial phases. I am assuming the points on the x-axis would be the trial phases (so 0, 1 and 5 months), and the y-axis would be the prevalence.

    So far all the commands I have tried have given me graphs that do not make sense (in terms of the y and x axis). The last command I tried is:

    twoway (tsline prevVar t, mcolor("red"))||(tsline prevVar t, mcolor("blue"))||(tsline prevVar t, mcolor("green")).

    This is from a command I found on one of the numerous help materials I had looked for online. And I have to admit that I may not understand all the command is doing.

    The graph I am getting is one where the x-axis is time, but it is labelled 1960m1, 1960m2 and so on up to 1960m6. I am not sure why.
    I don't know what the y-axis is, as it goes from 0 to 5 (whereas these should be proportions, and the highest result from the trial is close to 0.30). The graph legend also shows me 6 lines of different colors, whereas I only see 3 lines on the graph: 3 legend entries indicate the 3 trial time-points, and 3 the prevalence estimates that I do see on the graph.

    Would you please be kind enough to advise me?

    Thank you very much.

  • #2
    The appearance of your x-axis labels as 1960m1 through 1960m6 would, I imagine, be because your variable t has a %tm display format. If you remove that display format, you will just get numbers 1 through 6. Run -format t %1.0f-.

    As for the vertical axis extending up to 5 when you expect all values to be between 0 and 1, I can only suggest that you have an error in your data, and some values of prevVar are in fact greater than 1, and near 5. I have never known Stata to spuriously extend the axis. So run -summarize prevVar- to see the overall distribution of prevVar, and then find the offending observations and do whatever it takes to correct them.
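
    A minimal sketch of that check, assuming the variable is named prevVar as in the command quoted in the question:

    ```stata
    * Overall distribution, including the minimum and maximum
    summarize prevVar, detail

    * List observations that cannot be proportions.
    * !missing() is needed because Stata treats missing values as larger than any number.
    list if (prevVar < 0 | prevVar > 1) & !missing(prevVar)
    ```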

    The command you give appears to make three copies of the same graph. These would be superimposed on each other, but show up separately in the legends, accounting for why you have a legend with more entries than you have visible curves.

    So here's an outline of what you need to do:

    1. If your treatment group variable is not currently a numeric variable, create one using -encode-. For the purposes of illustration, I'll assume that control = 0, treatment 1 = 1, and treatment 2 = 2, and that this variable is called study_arm.

    2. You will need to reshape your data wide for this. I assume you have a variable, participant_id, that uniquely identifies each study participant's observations, and that for each participant you have a separate observation at each time period, the time periods being identified by variable t.
    Code:
    reshape wide prevVar, i(participant_id t) j(study_arm)
    3. Now you are ready to graph:
    Code:
    tsline prevVar0-prevVar2 t
    You can embellish that basic graph with -tsline- options as you see fit.

    As you can see, I had to write extensive assumptions and instructions about your data. All of that could have been avoided had you posted an example of your data using the -dataex- command. It would have saved both of us a lot of time. If you are running Stata version 15.1, -dataex- is part of your official installation. If running earlier Stata, run -ssc install dataex- to get the command. Either way, read -help dataex- for the simple instructions. When requesting help with code, always show example data, and always use -dataex- to do so.
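
    For illustration only, a -dataex- call might look like the sketch below. The variable names prevVar and t come from the command in the question, and study_arm is the name assumed in step 1, so substitute your own:

    ```stata
    * Only needed on Stata older than 15.1, where -dataex- is not built in
    ssc install dataex

    * Post the first 20 observations of the variables relevant to the question
    dataex prevVar t study_arm in 1/20
    ```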



    • #3
      Welcome to the Stata Forum/ Statalist,

      Maybe I’m wrong, but you’re demanding too much imagination from the ones who could help you.

      You could take a look at the FAQ, then present data/command/output according to the recommendations.

      With regard to the way you see dates on the x-axis, this will be linked to the way you - tsset - the data, which remains unknown to us so far.

      P.S.: Crossed with Clyde’s insightful reply.
      Best regards,

      Marcos



      • #4
        Dear Clyde and Marcos,

        Thank you very much for your response. I am extremely sorry if the way my message was written required you to use too much imagination. This was really not my intent. I honestly thought that my message was detailed.

        I have attempted to use dataex (I am not exactly sure how this works) to see if this would give you more clarity regarding the data. I have selected just three variables for this purpose: the trial phase (there are three phases), the study group (variable sgrp: three study groups), and one variable (prbinhwsoap) which contains 9 proportions corresponding to the prevalence for each study group at each trial phase. You will see that all numbers are between 0 and 1. There is no ID variable in this dataset, as I created it just for the purpose of the graphs. It is not the actual full dataset. I did not know I would need an ID variable for this dataset. I hope the example below is clearer.

        Thank you very much again for your time, and I am very sorry again.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double prbinhwsoap byte(phase sgrp)
         .05730659142136574 1 1
          .0338028185069561 1 2
         .06488011032342911 1 3
        .048013243824243546 2 1
          .0879712775349617 2 2
          .2431972771883011 2 3
         .06864988803863525 3 1
         .08771929889917374 3 2
          .2177777737379074 3 3
        end
        label values phase phase
        label def phase 1 "Baseline", modify
        label def phase 2 "One-month follow-up", modify
        label def phase 3 "Five-months follow-up", modify



        • #5






          Your use of -dataex- was perfect. Thank you.

          So, for your limited example, I think you want:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double prbinhwsoap byte(phase sgrp)
           .05730659142136574 1 1
            .0338028185069561 1 2
           .06488011032342911 1 3
          .048013243824243546 2 1
            .0879712775349617 2 2
            .2431972771883011 2 3
           .06864988803863525 3 1
           .08771929889917374 3 2
            .2177777737379074 3 3
          end
          label values phase phase
          label def phase 1 "Baseline", modify
          label def phase 2 "One-month follow-up", modify
          label def phase 3 "Five-months follow-up", modify
          
          
          reshape wide prbinhwsoap, i(phase) j(sgrp)
          tsset phase
          tsline prbinhwsoap*
          You may want to apply some options to the -tsline- command to prettify the horizontal axis or otherwise control the aesthetics of the graph.
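
          As one possible starting point, here is a sketch using -twoway line-, which draws the same lines from the reshaped data and takes the usual axis and legend options. The legend text "Group 1" to "Group 3" is a placeholder, since the example data do not label sgrp:

          ```stata
          * Run after the -reshape- above; phase holds the values 1, 2, 3
          twoway line prbinhwsoap1 prbinhwsoap2 prbinhwsoap3 phase, sort ///
              xlabel(1 "Baseline" 2 "One-month" 3 "Five-months") ///
              xtitle("Trial phase") ytitle("Prevalence (proportion)") ///
              legend(order(1 "Group 1" 2 "Group 2" 3 "Group 3") rows(1))
          ```

          Note that the phases are plotted as equally spaced even though the follow-ups are at 1 and 5 months.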



          • #6
            Dear Clyde,

            Thank you so so so much for your help. The recommended commands above worked perfectly. After reshaping the data, STATA created a prevalence variable for each of my study groups. I entered this command: tsline prbinhwsoap1 prbinhwsoap2 prbinhwsoap3 to get the three lines by study group. The x and y axes are now correct. I will now work on prettifying the graph.

            Thank you so much.



            • #7
              Dear Clyde and STATALIST,

              I am following up on the query I sent above regarding the time-series graph, if this is ok. I wish to add confidence intervals (range plots) to the values on the graph I plotted, as explained above (a time-series graph showing the results of a cluster trial by the 3 study phases and 3 trial arms). However, I am not sure how to go about it. The initial command used to create the graph was:
              tsline prbinhwsoap1 prbinhwsoap2 prbinhwsoap3

              From the STATA Graphics manual, I could see that this may require that I collapse my dataset, generate a standard deviation variable and create the upper and lower confidence limits. However, I am not sure how to go about this. Below is the dataset:


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double(prbinhwsoap1 prbinhwsoap2 prbinhwsoap3)
              5.730659008026123 3.380281686782837  6.488011360168457
              4.801324367523193 8.797127723693848 24.319726943969727
              6.864988327026367 8.771929740905762  21.77777862548828
              end
              Var1 is composed of the probability estimates of the primary outcome for trial arm 1 at each of the 3 study phases; Var2, the estimates for trial arm 2 at each study phase; the same goes for Var3.

              Thank you very much for your help again.

              Kind regards



              • #8
                I don't understand what you want to do here. If this is a continuation of the earlier work, then it seems you have exactly one observation for each group in each phase. (Indeed, that must be the case or the code in #5 would not have worked.) So there is nothing to take standard deviations or errors of and there is no range of values to show. Please clarify the source of variation that you are trying to show. And then post a data example that includes that variable.



                • #9
                  Dear Clyde,

                  I apologise, this was the wrong dataset. The actual dataset is below, with 'sgrp' the study group, 'compid' the unique ID, and 'phase2' the trial phase. For each compound (compid), the unique ID is repeated three times, corresponding to the measure taken at each trial phase. To reproduce the graph I had made previously based on the above thread, I tried to reshape my data to wide format. However, I get an error from STATA stating that the values in sgrp are not unique within phase2. I am not sure how to go about this. Regarding the confidence intervals, they would be around the values in my previous post (with the wrong dataset). Those values are the probabilities of the primary outcome in each of my trial groups at each trial phase. They are calculated per study group from all the values in the dataset below. I hope this is clearer.

                  Thank you very much.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str23 sgrp str4 compid float phase2 double prbinhwsoap
                  "TNSB-based intervention" "1479" 1 12.903225898742676
                  "TNSB-based intervention" "1479" 3               12.5
                  "TNSB-based intervention" "1479" 2             28.125
                  "Control"                 "1558" 3                  0
                  "Control"                 "1558" 1  2.702702760696411
                  "Control"                 "1558" 2                  0
                  "Control"                 "1559" 1                 10
                  "Control"                 "1559" 2                  0
                  "Control"                 "1559" 3 15.384614944458008
                  "Control"                 "1653" 3   8.69565200805664
                  "Control"                 "1653" 2  5.263157844543457
                  "Control"                 "1653" 1                  0
                  "HWS-only intervention"   "1682" 3                  0
                  "HWS-only intervention"   "1682" 1                  0
                  "HWS-only intervention"   "1682" 2                  0
                  "Control"                 "1693" 2                 10
                  "Control"                 "1693" 3 3.5087718963623047
                  "Control"                 "1693" 1  5.263157844543457
                  "HWS-only intervention"   "1712" 3                  0
                  "HWS-only intervention"   "1712" 2  3.846153736114502
                  "HWS-only intervention"   "1712" 1  6.666666507720947
                  "TNSB-based intervention" "1713" 3                 10
                  "TNSB-based intervention" "1713" 2                 12
                  "TNSB-based intervention" "1713" 1 3.4482758045196533
                  "HWS-only intervention"   "1724" 3  6.666666507720947
                  "HWS-only intervention"   "1724" 1                  0
                  "HWS-only intervention"   "1724" 2                  0
                  "HWS-only intervention"   "1740" 1 2.2727272510528564
                  "HWS-only intervention"   "1740" 2  5.882352828979492
                  "HWS-only intervention"   "1740" 3                  0
                  "TNSB-based intervention" "1741" 2  42.85714340209961
                  "TNSB-based intervention" "1741" 3  9.090909004211426
                  "TNSB-based intervention" "1741" 1                 16
                  "HWS-only intervention"   "1757" 2 13.333333015441895
                  "HWS-only intervention"   "1757" 3                  0
                  "HWS-only intervention"   "1757" 1                  0
                  "TNSB-based intervention" "1760" 2                 40
                  "TNSB-based intervention" "1760" 3  45.45454406738281
                  "TNSB-based intervention" "1760" 1                  0
                  "HWS-only intervention"   "203"  3 13.333333015441895
                  "HWS-only intervention"   "203"  2                  0
                  "HWS-only intervention"   "203"  1                  0
                  "HWS-only intervention"   "2147" 2 13.636363983154297
                  "HWS-only intervention"   "2147" 1                  0
                  "HWS-only intervention"   "2147" 3  16.66666603088379
                  "TNSB-based intervention" "2206" 1                 20
                  "TNSB-based intervention" "2267" 3                 36
                  "TNSB-based intervention" "2267" 1   4.34782600402832
                  "TNSB-based intervention" "2267" 2 10.810811042785645
                  "Control"                 "2308" 1  6.060606002807617
                  "Control"                 "2308" 2                  5
                  "Control"                 "2308" 3  4.166666507720947
                  "HWS-only intervention"   "2413" 1                  0
                  "HWS-only intervention"   "2413" 3                  0
                  "HWS-only intervention"   "2413" 2  5.128205299377441
                  "TNSB-based intervention" "2441" 2  42.85714340209961
                  "TNSB-based intervention" "2441" 1             21.875
                  "TNSB-based intervention" "2441" 3             15.625
                  "Control"                 "2448" 1  4.761904716491699
                  "Control"                 "2448" 2 3.4482758045196533
                  "Control"                 "2448" 3   2.17391300201416
                  "Control"                 "2493" 1  8.108108520507813
                  "Control"                 "2493" 3 15.384614944458008
                  "Control"                 "2493" 2  7.142857074737549
                  "Control"                 "2504" 1                  0
                  "Control"                 "2504" 2  3.846153736114502
                  "Control"                 "2504" 3                  0
                  "HWS-only intervention"   "2538" 1  11.11111068725586
                  "HWS-only intervention"   "2538" 3                 25
                  "HWS-only intervention"   "2538" 2                 20
                  "HWS-only intervention"   "2544" 3 3.4482758045196533
                  "HWS-only intervention"   "2544" 1                  0
                  "HWS-only intervention"   "2544" 2  9.523809432983398
                  "HWS-only intervention"   "26"   1 15.384614944458008
                  "HWS-only intervention"   "26"   3 14.814814567565918
                  "HWS-only intervention"   "26"   2  4.545454502105713
                  "TNSB-based intervention" "2759" 1  9.090909004211426
                  "TNSB-based intervention" "2759" 2                  0
                  "TNSB-based intervention" "2759" 3  11.11111068725586
                  "TNSB-based intervention" "2761" 2  55.55555725097656
                  "TNSB-based intervention" "2761" 1                  0
                  "TNSB-based intervention" "2761" 3  23.52941131591797
                  "HWS-only intervention"   "2765" 2  16.66666603088379
                  "HWS-only intervention"   "2765" 1   4.34782600402832
                  "HWS-only intervention"   "2765" 3  18.18181800842285
                  "HWS-only intervention"   "2777" 1  4.545454502105713
                  "HWS-only intervention"   "2777" 3 28.571428298950195
                  "HWS-only intervention"   "2777" 2  16.66666603088379
                  "Control"                 "278"  1                  0
                  "Control"                 "278"  2                  0
                  "Control"                 "278"  3                  0
                  "TNSB-based intervention" "2835" 3  19.44444465637207
                  "TNSB-based intervention" "2835" 1                  0
                  "TNSB-based intervention" "2835" 2 26.923076629638672
                  "TNSB-based intervention" "2989" 2 10.526315689086914
                  "TNSB-based intervention" "2989" 1 1.6666666269302368
                  "TNSB-based intervention" "2989" 3 29.032258987426758
                  "Control"                 "3001" 3                 20
                  "Control"                 "3001" 2                  0
                  "Control"                 "3001" 1                  0
                  end



                  • #10
                    Much clearer, thanks. There are a few obstacles in your path here. There may be a user written command that makes quick work of putting standard error bars on line graphs, but within official Stata we have only -rcap-, which gets cumbersome when you are plotting several graphs on the same panel. Next, reshaping your data to wide layout is made more complicated here because your sgrp variable is not only a string variable, but it takes on values that are not legal in variable names. So we have to deal with that. The gist of it is this:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str23 sgrp str4 compid float phase2 double prbinhwsoap
                    "TNSB-based intervention" "1479" 1 12.903225898742676
                    "TNSB-based intervention" "1479" 3               12.5
                    "TNSB-based intervention" "1479" 2             28.125
                    "Control"                 "1558" 3                  0
                    "Control"                 "1558" 1  2.702702760696411
                    "Control"                 "1558" 2                  0
                    "Control"                 "1559" 1                 10
                    "Control"                 "1559" 2                  0
                    "Control"                 "1559" 3 15.384614944458008
                    "Control"                 "1653" 3   8.69565200805664
                    "Control"                 "1653" 2  5.263157844543457
                    "Control"                 "1653" 1                  0
                    "HWS-only intervention"   "1682" 3                  0
                    "HWS-only intervention"   "1682" 1                  0
                    "HWS-only intervention"   "1682" 2                  0
                    "Control"                 "1693" 2                 10
                    "Control"                 "1693" 3 3.5087718963623047
                    "Control"                 "1693" 1  5.263157844543457
                    "HWS-only intervention"   "1712" 3                  0
                    "HWS-only intervention"   "1712" 2  3.846153736114502
                    "HWS-only intervention"   "1712" 1  6.666666507720947
                    "TNSB-based intervention" "1713" 3                 10
                    "TNSB-based intervention" "1713" 2                 12
                    "TNSB-based intervention" "1713" 1 3.4482758045196533
                    "HWS-only intervention"   "1724" 3  6.666666507720947
                    "HWS-only intervention"   "1724" 1                  0
                    "HWS-only intervention"   "1724" 2                  0
                    "HWS-only intervention"   "1740" 1 2.2727272510528564
                    "HWS-only intervention"   "1740" 2  5.882352828979492
                    "HWS-only intervention"   "1740" 3                  0
                    "TNSB-based intervention" "1741" 2  42.85714340209961
                    "TNSB-based intervention" "1741" 3  9.090909004211426
                    "TNSB-based intervention" "1741" 1                 16
                    "HWS-only intervention"   "1757" 2 13.333333015441895
                    "HWS-only intervention"   "1757" 3                  0
                    "HWS-only intervention"   "1757" 1                  0
                    "TNSB-based intervention" "1760" 2                 40
                    "TNSB-based intervention" "1760" 3  45.45454406738281
                    "TNSB-based intervention" "1760" 1                  0
                    "HWS-only intervention"   "203"  3 13.333333015441895
                    "HWS-only intervention"   "203"  2                  0
                    "HWS-only intervention"   "203"  1                  0
                    "HWS-only intervention"   "2147" 2 13.636363983154297
                    "HWS-only intervention"   "2147" 1                  0
                    "HWS-only intervention"   "2147" 3  16.66666603088379
                    "TNSB-based intervention" "2206" 1                 20
                    "TNSB-based intervention" "2267" 3                 36
                    "TNSB-based intervention" "2267" 1   4.34782600402832
                    "TNSB-based intervention" "2267" 2 10.810811042785645
                    "Control"                 "2308" 1  6.060606002807617
                    "Control"                 "2308" 2                  5
                    "Control"                 "2308" 3  4.166666507720947
                    "HWS-only intervention"   "2413" 1                  0
                    "HWS-only intervention"   "2413" 3                  0
                    "HWS-only intervention"   "2413" 2  5.128205299377441
                    "TNSB-based intervention" "2441" 2  42.85714340209961
                    "TNSB-based intervention" "2441" 1             21.875
                    "TNSB-based intervention" "2441" 3             15.625
                    "Control"                 "2448" 1  4.761904716491699
                    "Control"                 "2448" 2 3.4482758045196533
                    "Control"                 "2448" 3   2.17391300201416
                    "Control"                 "2493" 1  8.108108520507813
                    "Control"                 "2493" 3 15.384614944458008
                    "Control"                 "2493" 2  7.142857074737549
                    "Control"                 "2504" 1                  0
                    "Control"                 "2504" 2  3.846153736114502
                    "Control"                 "2504" 3                  0
                    "HWS-only intervention"   "2538" 1  11.11111068725586
                    "HWS-only intervention"   "2538" 3                 25
                    "HWS-only intervention"   "2538" 2                 20
                    "HWS-only intervention"   "2544" 3 3.4482758045196533
                    "HWS-only intervention"   "2544" 1                  0
                    "HWS-only intervention"   "2544" 2  9.523809432983398
                    "HWS-only intervention"   "26"   1 15.384614944458008
                    "HWS-only intervention"   "26"   3 14.814814567565918
                    "HWS-only intervention"   "26"   2  4.545454502105713
                    "TNSB-based intervention" "2759" 1  9.090909004211426
                    "TNSB-based intervention" "2759" 2                  0
                    "TNSB-based intervention" "2759" 3  11.11111068725586
                    "TNSB-based intervention" "2761" 2  55.55555725097656
                    "TNSB-based intervention" "2761" 1                  0
                    "TNSB-based intervention" "2761" 3  23.52941131591797
                    "HWS-only intervention"   "2765" 2  16.66666603088379
                    "HWS-only intervention"   "2765" 1   4.34782600402832
                    "HWS-only intervention"   "2765" 3  18.18181800842285
                    "HWS-only intervention"   "2777" 1  4.545454502105713
                    "HWS-only intervention"   "2777" 3 28.571428298950195
                    "HWS-only intervention"   "2777" 2  16.66666603088379
                    "Control"                 "278"  1                  0
                    "Control"                 "278"  2                  0
                    "Control"                 "278"  3                  0
                    "TNSB-based intervention" "2835" 3  19.44444465637207
                    "TNSB-based intervention" "2835" 1                  0
                    "TNSB-based intervention" "2835" 2 26.923076629638672
                    "TNSB-based intervention" "2989" 2 10.526315689086914
                    "TNSB-based intervention" "2989" 1 1.6666666269302368
                    "TNSB-based intervention" "2989" 3 29.032258987426758
                    "Control"                 "3001" 3                 20
                    "Control"                 "3001" 2                  0
                    "Control"                 "3001" 1                  0
                    end
                    
                    
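                    * Mean and standard error of the mean within each group and phase
                    collapse (mean) prbinhwsoap (semean) stderr = prbinhwsoap, by(sgrp phase2)
                    * Approximate 95% confidence limits (normal approximation)
                    gen upper = prbinhwsoap + 1.96*stderr
                    gen lower = prbinhwsoap - 1.96*stderr
                    
                    drop stderr
                    
                    * sgrp is a string whose values are not legal in variable names,
                    * so replace it with a numeric code before reshaping
                    encode sgrp, gen(n_sgrp)
                    label list n_sgrp
                    drop sgrp
                    reshape wide prbinhwsoap upper lower, i(phase2) j(n_sgrp)
                    * One line per group, with capped error bars overlaid
                    graph twoway line prbinhwsoap* phase2, sort || rcap upper1 lower1 phase2 ///
                        || rcap upper2 lower2 phase2 || rcap upper3 lower3 phase2, ///
                        note("Error bars represent 95% CIs")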
                    Now, you will want to improve this code by controlling the appearance of the legend, and you may want to further customize the graph using some -twoway- options, e.g. to control the colors of the error bars to match those of the lines they center on. The one "frill" that I did put in is the note at the bottom making it clear that the error bars represent 95% confidence intervals. This is because error bars are, in the literature as a whole, used indifferently to represent confidence intervals or standard errors or even standard deviations. Without clarification, the graph becomes an enigma.
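
                    One way those customizations might look is sketched below. It assumes the code above has just been run, so the variables prbinhwsoap1-3, upper1-3 and lower1-3 exist. The colors and titles are arbitrary choices for illustration:

                    ```stata
                    * -encode- numbers the groups alphabetically: 1 = Control,
                    * 2 = HWS-only intervention, 3 = TNSB-based intervention.
                    * Each line is followed by its own error bars in the same color.
                    twoway (line prbinhwsoap1 phase2, sort lcolor(navy))          ///
                           (rcap upper1 lower1 phase2, lcolor(navy))              ///
                           (line prbinhwsoap2 phase2, sort lcolor(maroon))        ///
                           (rcap upper2 lower2 phase2, lcolor(maroon))            ///
                           (line prbinhwsoap3 phase2, sort lcolor(forest_green))  ///
                           (rcap upper3 lower3 phase2, lcolor(forest_green)),     ///
                           legend(order(1 "Control" 3 "HWS-only intervention"     ///
                                        5 "TNSB-based intervention") rows(1))     ///
                           xlabel(1 2 3) xtitle("Trial phase")                    ///
                           ytitle("Mean of prbinhwsoap")                          ///
                           note("Error bars represent 95% CIs")
                    ```

                    The legend(order()) option keeps only the three line plots (plots 1, 3 and 5) in the legend, so the error bars do not get separate entries.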



                    • #11
                      Dear Clyde,

                      Thank you so much for the explanation and solution above. I still cannot understand how one can get to this level of knowledge! Thank you so much for the frill added for clarification. This is perfect! If it is ok, I will take some time to digest everything you have said and have a look at my data, as some of the estimates I am seeing are a little different from what they should be. Also, I was wondering if there is a way for the STATA graph CI commands to take clustering into account, with compid as the cluster unit variable? I can see that my confidence intervals are also a little different from what they should be. I shall be back here as soon as I have identified why I am not getting exactly the same estimates as the ones I reported.

                      Thank you so much again
