
  • Looking for an appropriate way to plot a graph [Stata 18.0]

    Hey everyone,

    I've just started using Stata, and I have absolutely no idea how to solve this.
    I have a sketch of a graph that I need to plot (see attachment). The graph should show the errors in participants' estimates relative to the real-world value for 8 different questions.
    [Attachment: drawing graph.png]

    My data structure is as follows:
    - Participants were asked for their estimates on 8 different topics, so I have 8 variables holding each participant's estimate in percent (>100 observations).
    - Participants differ by treatment (two treatments) and region (two regions).
    - For each of the 8 topics I also have one real-world value (in percent).

    The graph should distinguish between the two regions (Region 1 on the left, Region 2 on the right).
    For each of the 8 variables I need one dotted line showing
    - the real-world value (as the baseline)
    - the belief percentage for treatment==0: mean with confidence interval
    - the belief percentage for treatment==1: mean with confidence interval
    all on the same line.

    So far, I have calculated:
    - means
    - standard errors
    - lower and upper confidence interval level
    for all 8 variables.

    I have also generated new variables:
    Code:
     * split each statistic by region (north==0: South, north==1: North)
     foreach v of varlist v1 v2 v3 v4 v5 v6 v7 v8 {
        gen meanS_`v' = mean_`v' if north==0
        gen meanN_`v' = mean_`v' if north==1
    }
    (and likewise for the other statistics) to differentiate between regions in the graph, though I'm not sure this is necessary.

    Based on the drawing and the data structure, does anyone have an idea off the top of their head how to plot this graph?
    If anything is unclear, I'm happy to provide more detailed information.

    Thank you!

    Anna

  • #2
    Dear Anna,

    Welcome to Statalist. You could take a look at coefplot, which you can install from SSC:
    Code:
    * Set up
    ssc install coefplot, replace
    h coefplot // Check the help file
    coefplot offers a lot of flexibility for creating graphs of the kind sketched above.

    Do also read the Stata Journal paper by the author of this community-contributed command, Ben Jann.
    His working paper is also a 'must-read' that I can recommend.
    http://publicationslist.org/eric.melse



    • #3
      If coefplot solves this, that's great. Otherwise, here is a technique for preparing the data, after which the graph is just a twoway call.

      Code:
      clear 
      set obs 200 
      gen direction = cond(_n <= 100, "North", "South") 
      gen treatment = mod(_n, 2)
      set seed 314159 
      
      gen var1 = rnormal(0, 1)
      gen var2 = rnormal(1, 1)
      
      save sillyexample, replace
      
      * one dataset of means and CIs per variable, by treatment and region
      forval j = 1/2 { 
          use sillyexample, clear 
          statsby, by(treatment direction) : ci mean var`j'
          gen which = `j'
          save ci`j', replace
      }
      
      append using ci1 
      
      list
      Most of my code sets up a data example, as #1 doesn't include one. You would need to change the dataset and variable names, and with 8 variables the last command would be something like

      Code:
      append using ci1 ci2 ci3 ci4 ci5 ci6 ci7
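A minimal sketch of what that final twoway call might look like, built on the toy data above. Assumptions: the statsby results are named explicitly (mean, lb, ub) so the variable names are predictable, a tempfile replaces the files on disk, and the two treatment groups are nudged 0.15 apart on the y axis (an arbitrary offset) so their intervals don't overlap. With 8 variables, the loop would run over 1/8 and the ylabel() list would grow accordingly.

```stata
* Toy data as in #3: two variables instead of eight
clear
set obs 200
set seed 314159
gen direction = cond(_n <= 100, "North", "South")
gen treatment = mod(_n, 2)
gen var1 = rnormal(0, 1)
gen var2 = rnormal(1, 1)
tempfile raw results
save `raw'

* One row per variable x treatment x region: mean and 95% CI limits
forval j = 1/2 {
    use `raw', clear
    statsby mean=r(mean) lb=r(lb) ub=r(ub), by(treatment direction) clear: ci means var`j'
    gen which = `j'
    if `j' > 1 append using `results'
    save `results', replace
}

* One row per variable on the y axis; nudge the two treatment groups apart
gen ypos = which + cond(treatment == 1, 0.15, -0.15)
twoway (rcap lb ub ypos if treatment == 0, horizontal)      ///
       (scatter ypos mean if treatment == 0, msymbol(O))    ///
       (rcap lb ub ypos if treatment == 1, horizontal)      ///
       (scatter ypos mean if treatment == 1, msymbol(T)),   ///
       by(direction, note(""))                              ///
       ylabel(1 "Var 1" 2 "Var 2", angle(0)) ytitle("")     ///
       xtitle("Mean and 95% confidence interval")           ///
       legend(order(2 "Control" 4 "Treatment"))
```

Plotting rcap with the horizontal option keeps each variable on its own row, which is what the sketch in #1 asks for; a true value could be added per panel with xline() if it is constant.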



      • #4
        Thank you, Eric (#2).
        My first idea was coefplot, too, but I've only used it after regress or margins and am unsure whether it's applicable to my type of data.

        This is an example of what my data looked like at the beginning:
        obs  expat_percent  expat_true  expat_percent_error  treatment  north
          1              .       10.83                    .          0      0
          2              9       10.83                -1.83          0      0
          3             22       10.83                11.17          0      1
          4              .       10.83                    .          0      1
        ...            ...         ...                  ...        ...    ...
        140             31       10.83                20.17          1      1
        - expat_percent: participants' estimate of the percentage of expats living in a country
        - expat_percent_error = expat_percent - 10.83 (10.83 being the true value, expat_true)
        I need to visualize my participants' estimation errors, so expat_percent_error is the first of 8 variables of interest.
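To make the error definition concrete, a small sketch of computing it for several topics in one loop. It assumes each topic has a `<stub>_percent` estimate and a `<stub>_true` value, as the names in this thread suggest; the expat rows echo the example above, while the v2 values are invented purely for illustration.

```stata
* First four rows of the example above; the v2 columns are hypothetical
clear
input int obs byte(treatment north) float(expat_percent expat_true v2_percent v2_true)
  1 0 0  . 10.83 . 4
  2 0 0  9 10.83 6 4
  3 0 1 22 10.83 3 4
  4 0 1  . 10.83 5 4
end

* Estimation error = estimate minus true value (missing estimates stay missing)
foreach s in expat v2 {
    gen `s'_percent_error = `s'_percent - `s'_true
}
list
```

With all 8 topics, the stub list would simply read expat v2 v3 v4 v5 v6 v7 v8.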

        Because the mean and confidence interval of expat_percent_error will be shown, I was advised to use the collapse command:
        Code:
        preserve
        collapse (mean) mean_expat_percent_error = expat_percent_error (semean) se_y = expat_percent_error, by(treatment north)
        save mean_and_se, replace
        restore
        merge m:1 treatment north using mean_and_se, assert(3) nogen
        
        * normal-approximation 95% confidence limits
        gen yu = mean_expat_percent_error + 1.96 * se_y
        gen yl = mean_expat_percent_error - 1.96 * se_y
        So my data now look like this:
        obs  expat_percent  expat_true  expat_percent_error  treatment  north  mean_expat_percent_error     yu     yl
          1              .       10.83                    .          0      0                     10.02  12.34   8.73
          2              9       10.83                -1.83          0      0                     10.02  12.34   8.73
          3             22       10.83                11.17          0      1                      8.74  11.29   6.01
          4              .       10.83                    .          0      1                      8.74  11.29   6.01
        ...            ...         ...                  ...        ...    ...                       ...    ...    ...
        140             31       10.83                20.17          1      1                      8.85  10.92   6.23

        How can I proceed in this case to use coefplot? (I did try regress and margins, but that doesn't seem to be the right way.)
        Would a twoway scatterplot work instead?

        Thank you!



        • #5
          Dear Anna,
          You are right about the 'classical' functionality of coefplot, which processes stored results after regress or margins.
          But there is a third input source that coefplot can process: a matrix.
          It is a somewhat more involved exercise, but it allows for flexible coding and I use it a lot.
          I think you can use the collapse command to get the data for your plot (I have not looked at your data example in detail).
          But from #4, should I conclude that you want to plot 140 variables?
          Your paper sketch mentions 8 variables(?).
          http://publicationslist.org/eric.melse



          • #6
            Hi Eric,

            thanks for your reply, and sorry for the confusion.
            Regarding #4: 140 is the number of observations in my original dataset.
            I realize I wasn't clear enough in my description, so let me clarify.
            My original dataset:
            • 140 observations
            • Various variables (incl. treatment and north)
            • 8 variables of interest (the first is expat_percent_error; I've called the rest v2-v8)
            • expat_percent: participants' estimate of the percentage of expats living in a country
            • expat_percent_error = expat_percent - 10.83
            • the same for the 7 other variables of interest.
            (Only the expat columns are shown; v2_percent, v2_true, v2_percent_error and the same for v3-v8 follow.)
            obs  treatment  north  expat_percent  expat_true  expat_percent_error
              1          0      0              .       10.83                    .
              2          0      0              9       10.83                -1.83
              3          0      1             22       10.83                11.17
              4          0      1              .       10.83                    .
            ...        ...    ...            ...         ...                  ...
            140          1      1             31       10.83                20.17
            Because the mean and confidence interval of expat_percent_error (and v2-v8) will be shown, I was advised to use the collapse command:
            Code:
            * shown for expat_percent_error; v2-v8 need analogous mean_/se_ pairs
            collapse (mean) mean_expat_percent_error = expat_percent_error (semean) se_expat_percent_error = expat_percent_error, by(treatment north)
            After collapse, I had
            • 4 observations
            • 16 variables (mean & semean for all 8 variables of interest)
            obs (group)               mean_expat_percent_error  se_expat_percent_error  (same for v2-v8)
            1 (treatment=0, north=0)
            2 (treatment=0, north=1)
            3 (treatment=1, north=0)
            4 (treatment=1, north=1)
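            I then generated the confidence interval limits for all 8 variables:
            Code:
            foreach v in expat v2 v3 v4 v5 v6 v7 v8 {
                gen yu_`v'_percent_error = mean_`v'_percent_error + 1.96 * se_`v'_percent_error
                gen yl_`v'_percent_error = mean_`v'_percent_error - 1.96 * se_`v'_percent_error
            }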
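One way to cover all 8 variables in collapse without typing each target=source pair by hand is to build the specification in a loop. A minimal sketch on simulated data, assuming the variable names described above (expat_percent_error and v2_percent_error to v8_percent_error); the simulated values, seed, and macro names spec_mean/spec_se are illustrative only.

```stata
* Simulated stand-in for the 140-observation dataset (values are illustrative only)
clear
set obs 140
set seed 2024
gen treatment = runiform() < 0.5
gen north = runiform() < 0.5
gen expat_percent_error = rnormal(10, 8)
forval j = 2/8 {
    gen v`j'_percent_error = rnormal(5, 3)
}

* Build "mean_X = X" and "se_X = X" pairs for every *_percent_error variable
local spec_mean
local spec_se
foreach v of varlist *_percent_error {
    local spec_mean `spec_mean' mean_`v' = `v'
    local spec_se `spec_se' se_`v' = `v'
}
collapse (mean) `spec_mean' (semean) `spec_se', by(treatment north)

* Normal-approximation 95% limits, as in the post
foreach s in expat v2 v3 v4 v5 v6 v7 v8 {
    gen yu_`s'_percent_error = mean_`s'_percent_error + 1.96 * se_`s'_percent_error
    gen yl_`s'_percent_error = mean_`s'_percent_error - 1.96 * se_`s'_percent_error
}
list treatment north mean_expat_percent_error yl_expat_percent_error yu_expat_percent_error
```

The result is the 4-observation, 24-statistic layout described in this post (plus the se_ variables).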

            That left me with the following data:
            • 4 observations
            • 24 variables (mean, yu and yl for all 8 variables of interest)
            obs (group)               mean_expat_percent_error  yu_expat_percent_error  yl_expat_percent_error  (same for v2-v8)
            1 (treatment=0, north=0)                     10.02                   12.34                    8.73
            2 (treatment=0, north=1)                      8.74                   11.29                    6.01
            3 (treatment=1, north=0)                     10.05                   12.47                    9.02
            4 (treatment=1, north=1)                      8.85                   10.92                    6.23
            I can use this data set for plotting the graph if there is an appropriate way.
            I will try the coefplot matrix solution you suggested.




            My last step was probably unnecessary, but I thought I needed it:
            I merged the original dataset (140 obs) with the data from above.
            Code:
            merge m:1 treatment north using "~/data_from_above.dta", assert(3) nogen
            Result (the same columns follow for the 7 other variables):
            obs  treatment  north  expat_percent  expat_true  expat_percent_error  mean_expat_percent_error     yu     yl
              1          0      0              .       10.83                    .                     10.02  12.34   8.73
              2          0      0              9       10.83                -1.83                     10.02  12.34   8.73
              3          0      1             22       10.83                11.17                      8.74  11.29   6.01
              4          0      1              .       10.83                    .                      8.74  11.29   6.01
            ...        ...    ...            ...         ...                  ...                       ...    ...    ...
            140          1      1             31       10.83                20.17                      8.85  10.92   6.23
            The graph is supposed to look like this:
            [Attachment: updated drawing graph.png]

            My ideas were:
            • coefplot (I did not know about the matrix option, which is why I merged with the original dataset; I thought I had to run some kind of regression for it to work)
            • a twoway scatterplot (but I couldn't manage to show all 8 variables on the y-axis, with treatment=0 and treatment=1 on the same line for each variable)
            I hope this description is clearer.

            Thank you!
            Last edited by Anna Binger; 08 Aug 2024, 04:33.



            • #7
              Your data example is not well-organized. Presumably, you have all combinations of region (North, South) and treatment (Control, Treatment) per unit of observation. People are less likely to respond if you do not put in the work to create a satisfactory example. I will make an exception since you are new. Note that unless the treatment and control groups exhibit significant differences, placing the confidence intervals in the same row would make the presentation difficult to follow. You need offsets. Additionally, there is no need for a marker for the true values since they are constant; the vertical line is sufficient.

              Code:
              clear
              input int obs byte expat_percent double(expat_true expat_percent_error) byte(treatment north) double(mean_expat_percent_error yu yl)
                1  . 10.83     . 0 0 10.02 12.34 8.73
                1  . 10.83     . 1 0 12.02 14.34 10.73
                1  . 10.83     . 0 1 13.02 13.34 9.73
                1  . 10.83     . 1 1 13.02 15.34 11.73
                3 22 10.83 11.17 1 1  10.74 13.29 8.01
                3 22 10.83 11.17 0 1  8.74 11.29 6.01
                3 22 10.83 11.17 1 0  11.74 14.29 9.01
                3 22 10.83 11.17 0 0  9.74 12.29 7.01
              140 31 10.83 20.17 1 0  8.85 10.92 6.23
              140 31 10.83 20.17 1 1  11.85 13.92 9.23
              140 31 10.83 20.17 0 0  7.85 9.92 5.23
              140 31 10.83 20.17 0 1  10.85 12.92 8.23
              end
              
              cap noisily{
                  mkmat expat_true-yl if north & !treatment, mat(coefs10) rowname(obs)
                  mkmat expat_true-yl if north & treatment, mat(coefs11) rowname(obs)
                  mkmat expat_true-yl if !north & !treatment, mat(coefs00) rowname(obs)
                  mkmat expat_true-yl if !north & treatment, mat(coefs01) rowname(obs)
              }
              
              coefplot (mat(coefs10[,5]), ci((coefs10[,7] coefs10[,6])))  (mat(coefs11[,5]), ci((coefs11[,7] coefs11[,6]))), bylabel(North) || ///
              (mat(coefs00[,5]), ci((coefs00[,7] coefs00[,6]))) (mat(coefs01[,5]), ci((coefs01[,7] coefs01[,6]))), bylabel(South) ||, ///
              xline(`=coefs11[1,1]') leg(order(2 "Control" 4 "Treatment")) byopts(note(Dashed line represents true values))
              [Attachment: Graph.png]

              Last edited by Andrew Musau; 08 Aug 2024, 06:15.



              • #8
                New thread at https://www.statalist.org/forums/for...ence-intervals

                If you've abandoned this one, please say so. Otherwise please don't run two or more related threads at the same time.



                • #9
                  Hi Andrew,

                  Thank you so much, and my apologies.
                  This is my first time working with data and with Stata, and I thought it would be necessary to walk you through my entire thought and work process.
                  I now realize that was a mistake.

                  There seems to be a misunderstanding in #7. I made a mistake in how I presented my data.


                  Here is the data I need to present, in an easier-to-use form:

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input float(treatment north mean_v1 mean_v2 yu_v1 yu_v2 yl_v1 yl_v2)
                  0 0 12.92 4.38 15.19 5.21 10.61  3.5
                  0 1  8.05 3.07 10.59 4.15  5.47 1.99
                  1 0  13.4 5.42 17.96 6.76 10.82 4.08
                  1 1  8.11 2.99 12.54 3.68  5.62 2.32
                  end
                  (To make it easier, I reduced the number of v* variables from 8 to 2.)
                  The graph should then look like this:

                  [Attachment: drawing graph two var.png]

                  Thank you!



                  • #10
                    Hi Nick (#8),

                    My apologies. I did not mean to abandon this thread. It took me a while to figure out how to rephrase my question and present the data.
                    In this thread, I asked which would be the appropriate way to visualize my data.
                    In the other thread, I asked a specific question about the code for a scatterplot, which is why I thought it would not be appropriate to post it in this thread, even though it uses the same example data.

                    My apologies! I will take down the other thread if that is what I am supposed to do.

                    Anna



                    • #11
                      Same technique once you restructure your data, except that the points I make in #7 apply. Hollow symbols may help if you insist on having everything in one line.

                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input float(treatment north mean_v1 mean_v2 yu_v1 yu_v2 yl_v1 yl_v2)
                      0 0 12.92 4.38 15.19 5.21 10.61  3.5
                      0 1  8.05 3.07 10.59 4.15  5.47 1.99
                      1 0  13.4 5.42 17.96 6.76 10.82 4.08
                      1 1  8.11 2.99 12.54 3.68  5.62 2.32
                      end
                      
                      gen id=_n
                      reshape long mean_ yu_ yl_, i(id) j(which) string
                      destring which, ignore(v) replace
                      cap noisily{
                          mkmat which-yl if north & !treatment, mat(coefs10) rowname(which)
                          mkmat which-yl if north & treatment, mat(coefs11) rowname(which)
                          mkmat which-yl if !north & !treatment, mat(coefs00) rowname(which)
                          mkmat which-yl if !north & treatment, mat(coefs01) rowname(which)
                      }
                      
                      coefplot (mat(coefs10[,4]), ci((coefs10[,6] coefs10[,5])) msy(Oh)) ///
                      (mat(coefs11[,4]), ci((coefs11[,6] coefs11[,5])) msy(Th)), offset(0) bylabel(North) || ///
                      (mat(coefs00[,4]), ci((coefs00[,6] coefs00[,5]))) ///
                      (mat(coefs01[,4]), ci((coefs01[,6] coefs01[,5]))), offset(0) bylabel(South) ||, ///
                      leg(order(2 "Control" 4 "Treatment")) byopts(note(Dashed line represents true values)) ///
                      ciopts(recast(rcap)) ylab(1 "Var 1" 2 "Var 2")
                      [Attachment: Graph.png]



                      • #12
                        #10 Thanks for your explanation. You can't delete threads or posts of yours; this is explained at https://www.statalist.org/forums/help#closure

                        Otherwise, speaking for myself, there are now several questions in two threads, and several lines of attack suggested (from me too), so I think I'll bail out now and trust that your threads will converge.



                        • #13
                          Thank you, Andrew and Nick, this was incredibly helpful.



                          • #14
                            Hello everyone,
                            I have a follow-up question, and I'm not sure whether it belongs here or in a new thread.
                            Since it is based on the same data, I decided to try here first. I'm happy to open a new thread if that is more appropriate!

                            Data:
                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            input float(treatment north mean_v1 mean_v2 se_v1 se_v2)
                            0 0 12.92 4.38 .76 .25
                            0 1  8.05 3.07  .9 .19
                            1 0  13.4 5.42 .91 .24
                            1 1  8.11 2.99 .85 .21
                            end
                            I need to generate a table like this:
                                     North                           South
                                  Treatment = 0   Treatment = 1   Treatment = 0   Treatment = 1
                                mean      se    mean      se    mean      se    mean      se
                            Var 1     8.05    0.90    8.11    0.85   12.92    0.76   13.40    0.91
                            Var 2     3.07    0.19    2.99    0.21    4.38    0.25    5.42    0.24
                            My idea was to build matrices, as in #11:


                            Code:
                            gen id = _n
                            reshape long mean_ se_, i(id) j(which) string
                            destring which, ignore(v) replace
                            * this data has se_ instead of yu_/yl_, so the range ends at se_
                            cap noisily {
                                mkmat which-se_ if north & !treatment, mat(table10) rowname(which)
                                mkmat which-se_ if north & treatment, mat(table11) rowname(which)
                                mkmat which-se_ if !north & !treatment, mat(table00) rowname(which)
                                mkmat which-se_ if !north & treatment, mat(table01) rowname(which)
                            }
                            but I'm at a loss as to how to proceed. Does anyone have an idea? Thank you!
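For the table itself, one possible route that avoids four separate matrices is to reshape twice, so that each row is a variable and each column a region-by-treatment statistic. A sketch using the dataex example above; the group codes (N0, N1, S0, S1) and the matrix name T are hypothetical choices, north==1 is taken to be North as in the table, and the closing matrix step is optional, mainly to get grouped column headers from matlist.

```stata
* dataex example from this post
clear
input float(treatment north mean_v1 mean_v2 se_v1 se_v2)
0 0 12.92 4.38 .76 .25
0 1  8.05 3.07  .9 .19
1 0  13.4 5.42 .91 .24
1 1  8.11 2.99 .85 .21
end

* Long layout: one row per group x variable
gen id = _n
reshape long mean_ se_, i(id) j(which) string
destring which, ignore("v") replace

* Group code per region x treatment: N0, N1 (North), S0, S1 (South)
gen group = cond(north == 1, "N", "S") + string(treatment)
drop id treatment north

* Wide layout: one row per variable, one mean/se pair per group
reshape wide mean_ se_, i(which) j(group) string
order which mean_N0 se_N0 mean_N1 se_N1 mean_S0 se_S0 mean_S1 se_S1
list, noobs

* Optional: copy into a matrix with region/treatment column headers
mkmat mean_N0 se_N0 mean_N1 se_N1 mean_S0 se_S0 mean_S1 se_S1, matrix(T)
matrix rownames T = Var1 Var2
matrix colnames T = North_T0:mean North_T0:se North_T1:mean North_T1:se ///
    South_T0:mean South_T0:se South_T1:mean South_T1:se
matlist T, format(%5.2f)
```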



                            • #15
                              Start a new thread as this thread is about graphing and #14 is not.
