  • Graphing based on a condition of longitudinal data

    Dear Statalist User,

    I have a problem with graphing two scatter plots and their fitted lines from longitudinal data. I have the id (xwaveid), the year (wave), whether they receive salary/wages or not (wschave==1), whether they are in the control or treatment group (incontrolgroup==1 or intreatmentgroup==1), and their annual gross income. My data is as below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long xwaveid float wave byte wschave float(incontrolgroup intreatmentgroup annualgrossincome)
    100018 2006 1 1 0  24232
    100018 2007 1 1 0  22412
    100018 2008 1 1 0  12220
    100018 2009 1 1 0   8840
    100018 2010 1 1 0  14820
    100018 2011 1 1 0  18564
    100018 2012 1 1 0  14560
    100018 2013 1 1 0  17420
    100018 2014 1 1 0  19500
    100019 2006 1 1 0 119600
    100019 2007 1 1 0  72800
    100019 2008 1 1 0 119600
    100019 2009 1 1 0 104000
    100019 2010 1 1 0 140400
    100019 2011 1 1 0  72800
    100019 2012 1 1 0 135200
    100019 2013 1 1 0 140920
    100019 2014 1 1 0 156000
    100100 2006 1 0 1  53820
    100100 2007 1 0 1  65962
    100100 2008 1 0 1  55042
    100100 2009 1 0 1  65000
    100100 2010 1 0 1  59020
    100100 2011 1 0 1  62400
    100100 2012 1 0 1  80600
    100100 2013 1 0 1  78000
    100100 2014 1 0 1  80600
    100107 2006 1 1 0  50700
    100107 2007 1 1 0  46800
    100107 2008 1 1 0  47840
    100107 2009 1 1 0  52000
    100107 2010 1 1 0      .
    100107 2011 1 1 0  50752
    100107 2012 1 1 0  50778
    100107 2013 1 1 0  55328
    100107 2014 1 1 0  70200
    100138 2006 1 0 1  36322
    100138 2007 1 0 1  39000
    100138 2008 2 0 1      .
    100138 2009 1 0 1  38428
    100138 2010 1 0 1  34580
    100138 2011 1 0 1  38584
    100138 2012 1 0 1  56472
    100138 2013 1 0 1  53924
    100138 2014 1 0 1  57200
    100140 2006 1 1 0  28600
    100140 2007 1 1 0  52000
    100140 2008 1 1 0  31200
    100140 2009 1 1 0  26000
    100140 2010 1 1 0  33800
    100140 2011 1 1 0  26000
    100140 2012 1 1 0  39000
    100140 2013 1 1 0  46800
    100140 2014 1 1 0  44200
    100164 2006 1 0 1  65000
    100164 2007 1 0 1  39000
    100164 2008 1 0 1  59982
    100164 2009 1 0 1  80990
    100164 2010 1 0 1  79040
    100164 2011 1 0 1  67600
    100164 2012 1 0 1  82940
    100164 2013 1 0 1  83200
    100164 2014 2 0 1      .
    100165 2006 1 0 1  53118
    100165 2007 1 0 1  41600
    100165 2008 1 0 1  54600
    100165 2009 1 0 1  64792
    100165 2010 1 0 1  52000
    100165 2011 1 0 1  75296
    100165 2012 1 0 1  70200
    100165 2013 1 0 1  72982
    100165 2014 2 0 1      .
    100185 2006 1 0 1  46800
    100185 2007 1 0 1  50544
    100185 2008 1 0 1  38220
    100185 2009 2 0 1      .
    100185 2010 2 0 1      .
    100185 2011 1 0 1  45760
    100185 2012 1 0 1  34476
    100185 2013 2 0 1      .
    100185 2014 2 0 1      .
    100195 2006 1 0 1  49428
    100195 2007 1 0 1  61440
    100195 2008 1 0 1  63912
    100195 2009 1 0 1  51912
    100195 2010 1 0 1  53472
    100195 2011 2 0 1      .
    100195 2012 1 0 1  38480
    100195 2013 1 0 1  44200
    100195 2014 1 0 1  39000
    100196 2006 1 0 1  52000
    100196 2007 1 0 1  67600
    100196 2008 1 0 1  72800
    100196 2009 1 0 1  62400
    100196 2010 1 0 1  65728
    100196 2011 1 0 1  65000
    100196 2012 1 0 1  83200
    100196 2013 1 0 1  93600
    100196 2014 1 0 1  95472
    100338 2006 1 0 1  36000
    end
    format %ty wave
    label values wschave FWSCHAVE
    label def FWSCHAVE 1 "[1] Currently receives wage and salary income", modify
    label def FWSCHAVE 2 "[2] Does not currently receive wage and salary income", modify

    I have tried different combinations based on what I found online, but my graphs look very weird. (Since I haven't achieved a decent graph yet, I haven't inserted code for the fitted lines.) For the code below

    Code:
    twoway (scatter annualgrossincome wave if intreatmentgroup==1 & wschave==1,sort) (scatter annualgrossincome wave if incontrolgroup==1 & wschave==1,sort)
    I have obtained the below graph.

    [Attached image: Screen Shot 2019-01-17 at 11.58.24 pm.png]


    My goal is to look at trends in the control and treatment groups. I hope to see a similar trend until 2010 that diverges for the treatment group afterwards. But since I cannot create the graph properly, I cannot check for the existence of a parallel trend prior to 2010. Any help would be highly appreciated. Thank you very much in advance.

    Kind regards.

  • #2
    Evidently you have one outlier at around 6 million (units? AUD?) but even if it's spurious I would always use log scale for income. Here are two ideas, just looking at distributions and just looking at trajectories. Much depends on the size of your complete dataset.

    I start after your helpful data example code. Comments flag community-contributed code that must be installed to work.

    Code:
    set scheme s1color
    
    label define treatment 0 control 1 treatment
    label val intreatment treatment
    label var annual "annual gross income (000 units)"
    
    * -egen- function gmean() requires egenmore: ssc install egenmore
    * -stripplot- requires: ssc install stripplot
    stripplot annual , over(intreat) by(wave, note("") compact) ysc(log) cumul cumprob box centre vertical ///
    yla(10000 "10" 20000 "20" 50000 "50" 100000 "100", ang(h)) aspect(0.5) refline reflevel(gmean) ///
    xtitle("") subtitle(, fcolor(blue*0.1)) name(G1, replace)
    
    xtset xwaveid wave
    
    line annual wave, by(intreat, note("")) c(L) ysc(log) yla(10000 "10" 20000 "20" 50000 "50" 100000 "100", ang(h)) ///
    name(G2, replace) xla(2006/2014, format(%tyY)) subtitle(, fcolor(blue*0.1))
    [Attached image: kucuk1.png]


    [Attached image: kucuk2.png]
    Last edited by Nick Cox; 17 Jan 2019, 13:39.



    • #3
      Well, in the example data you posted, your command works just fine and produces a nice-looking graph. But in the data you posted, the annual gross incomes are all <= 156,000. The graph you posted shows that somewhere in your data you have a value (perhaps more than one) of annualgrossincome that exceeds 6,000,000. Because that point has to be included in the graph, it compresses the y-axis scaling so that all of the other incomes are bunched up at the bottom--hence the result you are seeing.
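      Before deciding, it may help to locate the extreme value(s) directly. A minimal sketch using the variable names from the posted example (the 1,000,000 cutoff is only an illustrative threshold, not something from the thread):

      Code:
      * inspect the distribution, then list the suspect rows
      summarize annualgrossincome, detail
      list xwaveid wave annualgrossincome if annualgrossincome > 1000000 & !missing(annualgrossincome)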

      First, I would check to see if that 6,000,000 annualgrossincome value is an error in the data--it's an extreme outlier and unless the data is denominated in a rather low-value currency it would characterize very few people in most populations. If it's not an error then you have some options to consider:

      1. Redefine the target population of your study in some reasonable way that excludes this (these) observation(s). Note that simply saying that annual gross income exceeding some amount is an exclusion is not a reasonable way to do this because apparently annual gross income is your outcome variable. It is never reasonable to restrict study samples based on the outcome variable. But you might be able to effectively exclude the offending observation(s) by restricting the allowable occupations or something like that.

      2. If there is no reasonable way to restrict the study in a way that leads to exclusion of the offending observation(s), consider using a log-scale on the vertical axis. That should improve the visual appearance.
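      As a minimal sketch of option 2, the original scatter command from #1 can be kept and only the y-axis changed (the labels in thousands are just one choice, not from the thread):

      Code:
      * same scatters as in #1, but on a log-scaled y-axis
      twoway (scatter annualgrossincome wave if intreatmentgroup==1 & wschave==1, sort) ///
             (scatter annualgrossincome wave if incontrolgroup==1 & wschave==1, sort), ///
             ysc(log) yla(10000 "10k" 50000 "50k" 100000 "100k" 1000000 "1m", ang(h))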

      Added: Crossed with #2. Nick makes the same points that I do. In addition, he contributes code that can produce much better graphs from this data--this is an area in which he excels, and where my skills are lackluster.



      • #4
        Hi Nick and Clyde,

        Thank you very much for your answers. Indeed there is a massive outlier, which is due to an erroneous calculation. While searching for it I found some more outliers. Once I have dealt with the mistakes in the data, I will apply the code provided by Nick and try to get some results. Thank you very much for your guidance.



        • #5
          By the way, there is a risk of spurious connections with -line- in #2.

          Suppose one panel had values for 2006 to 2009 only and the next panel had values for 2010 to 2014. Those two panel trajectories would be joined, as -line- just sees that wave increases in that section of the data.

          This code should be safe instead:

          Code:
          linkplot annual wave, link(xwaveid) ms(none) by(intreat, note("")) c(L) ysc(log) yla(10000 "10" 20000 "20" 50000 "50" 100000 "100", ang(h)) ///
          name(G2, replace) xla(2006/2014, format(%tyY)) subtitle(, fcolor(blue*0.1))
          where linkplot is from SSC.
          Last edited by Nick Cox; 18 Jan 2019, 06:26.



          • #6
            Hi Nick,
            Thank you for your code. I have been trying to implement it, but it is too sophisticated for me and I cannot run it without errors. I was reluctant to write again since your answer was so nicely written, but I have reached the point where I said "better to ask again than go crazy". Isn't there a simple way to just put two lines in one graph, where one line is for the control group and the other is for the treatment group? Whenever an observation is not in the control group it is in the treatment group, so I can say that

            incontrolgroup==1 and incontrolgroup!=1 are the two conditions for which I want the two lines. I am sorry for asking the same thing over and over and not understanding it.



            • #7
              In turn I don't understand what you seek. If you have more than one individual in each group, then you can't show each group as just one line.

              Further, sorry, but without seeing any code or error reports I really can't say what you're doing wrong. See FAQ Advice #12 for how to show what you tried.

              What you could try is different colours for your groups, but a glance at the graph in #2 should underline that it's hard to avoid a tangled mess without something like separate panels.
              Last edited by Nick Cox; 29 Jan 2019, 00:23.



              • #8
                I can try to explain myself a bit more. I have an unbalanced longitudinal dataset of individuals (identified through "xwaveid") over a 9-year time period (identified through "wave") and their total gross annual income. Some individuals belong to the treatment group and some to the control group for the entire 9-year period. Since it is unbalanced, not every xwaveid was interviewed in every wave.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long xwaveid float(wave totalgrossregularincome) str10 hhidate float intreatmentgroup
                100018 2006  10517 "10/09/2006" 0
                100018 2007   1499 "16/09/2007" 0
                100018 2008   9944 "13/09/2008" 0
                100018 2009   7904 "05/09/2009" 0
                100018 2010  15486 "03/10/2010" 0
                100018 2011  11400 "21/08/2011" 0
                100018 2012  24110 "16/09/2012" 0
                100018 2013  20100 "21/09/2013" 0
                100018 2014  20300 "13/09/2014" 0
                100019 2006 121017 "10/09/2006" 0
                100019 2007 111569 "16/09/2007" 0
                100019 2008 120994 "13/09/2008" 0
                100019 2009 111754 "05/09/2009" 0
                100019 2010 130496 "03/10/2010" 0
                100019 2011 160020 "21/08/2011" 0
                100019 2012 140100 "16/09/2012" 0
                100019 2013 150200 "21/09/2013" 0
                100019 2014 162100 "13/09/2014" 0
                100020 2006  20025 "10/09/2006" 0
                100020 2007   8000 "16/09/2007" 0
                100020 2008  30200 "13/09/2008" 0
                100020 2009  51300 "05/09/2009" 0
                100020 2010  54152 "03/10/2010" 0
                100020 2011  61850 "21/08/2011" 0
                100020 2012  84500 "16/09/2012" 0
                100020 2013  84250 "21/09/2013" 0
                100020 2014  75000 "13/09/2014" 0
                100021 2006      0 ""           0
                100021 2007      0 "16/09/2007" 0
                100021 2008   1500 "13/09/2008" 0
                100021 2009      0 "05/09/2009" 0
                100021 2010      0 "03/10/2010" 0
                100021 2011   1000 "21/08/2011" 0
                100021 2012   6000 "16/09/2012" 0
                100021 2013      0 "21/09/2013" 0
                100021 2014    730 "13/09/2014" 0
                100099 2006 159000 "16/09/2006" 1
                100099 2007  87300 "14/10/2007" 1
                100099 2008  85455 "24/09/2008" 1
                100099 2009 105797 "24/08/2009" 1
                100099 2010 101684 "01/09/2010" 1
                100099 2011  95300 "23/08/2011" 1
                100099 2012 125700 "19/09/2012" 1
                100100 2006  62000 "16/09/2006" 1
                100100 2007  73187 "27/09/2007" 1
                100100 2008  83452 "24/09/2008" 1
                100100 2009  92497 "04/09/2009" 1
                100100 2010  90526 "24/09/2010" 1
                100100 2011  90713 "20/09/2011" 1
                100100 2012 103787 "07/09/2012" 1
                100100 2013 173951 "05/09/2013" 1
                100100 2014 105287 "15/09/2014" 1
                100107 2006  41050 "17/09/2006" 0
                100107 2007  54700 "20/09/2007" 0
                100107 2008  47940 "17/09/2008" 0
                100107 2009  50995 "06/09/2009" 0
                100107 2010  51130 "08/09/2010" 0
                100107 2011  50700 "16/09/2011" 0
                100107 2012  53782 "26/09/2012" 0
                100107 2013  55428 "19/09/2013" 0
                100107 2014  53100 "25/09/2014" 0
                100138 2006  51655 "25/09/2006" 1
                100138 2007  47790 "07/09/2007" 1
                100138 2008  34256 "19/09/2008" 1
                100138 2009  52571 "07/09/2009" 1
                100138 2010  51850 "03/09/2010" 1
                100138 2011  61036 "11/09/2011" 1
                100138 2012  50994 "26/08/2012" 1
                100138 2013  59500 "25/08/2013" 1
                100138 2014  59000 "24/08/2014" 1
                100140 2006  10000 "20/09/2006" 0
                100140 2007  22000 "18/09/2007" 0
                100140 2008  30000 "09/09/2008" 0
                100140 2009  30900 "02/09/2009" 0
                100140 2010  29922 "08/09/2010" 0
                100140 2011  42000 "10/09/2011" 0
                100140 2012  31631 "13/09/2012" 0
                100140 2013  47000 "15/09/2013" 0
                100140 2014  45000 "17/09/2014" 0
                100164 2006  65100 "19/09/2006" 1
                100164 2007  64250 "15/09/2007" 1
                100164 2008  63250 "09/09/2008" 1
                100164 2009  82300 "16/09/2009" 1
                100164 2010  79741 "07/09/2010" 1
                100164 2011  85500 "05/09/2011" 1
                100164 2012 133150 "13/08/2012" 1
                100164 2013 140000 "13/08/2013" 1
                100164 2014 162704 "12/08/2014" 1
                100165 2006  51000 "19/09/2006" 1
                100165 2007  55000 "15/09/2007" 1
                100165 2008  56750 "09/09/2008" 1
                100165 2009  63900 "16/09/2009" 1
                100165 2010  73000 "07/09/2010" 1
                100165 2011  67000 "05/09/2011" 1
                100165 2012  72000 "13/08/2012" 1
                100165 2013  76500 "13/08/2013" 1
                100165 2014  55683 "12/08/2014" 1
                100166 2006      0 "19/09/2006" 1
                100166 2007   2000 "15/09/2007" 1
                100166 2008   1500 "09/09/2008" 1
                end
                format %ty wave
                label values intreatmentgroup treatment
                label def treatment 0 "control", modify
                label def treatment 1 "treatment", modify
                What I want to plot is one line for the control group (intreatmentgroup==0) and one line for the treatment group (intreatmentgroup==1) over the time period (wave==2006 until wave==2014), showing the mean of total annual gross income, with the condition that hhidate!="", i.e. they have to have been interviewed in that wave. The dataset assigns them zero income even if they have not been interviewed, and I do not want those zero values to be taken into account.

                I have tried so many different combinations, but I think I am making a mistake.

                twoway (line mean(totalgrossannualincome) if intreatmentgroup==1 & hhidate!="" wave, sort) (line mean(totalgrossannualincome) if intreatmentgroup==0 & hhidate!="" wave, sort)

                This is what I want to do, but I cannot translate my intentions into Stata syntax...
                The lines can be in different colours or differentiated through labels. However, I have to make several such graphs, which is why I am trying to understand how to make this kind of graph. Sorry for taking so long to figure it out. I am obviously a beginner.



                • #9
                  That syntax is some way from legal, but what you want seems straightforward.

                  Code:
                  egen wanted = mean(totalgrossregularincome) if hhidate != "", by(wave intreatmentgroup) 
                  separate wanted, by(intreatmentgroup) veryshortlabel 
                  line wanted? wave, sort lc(orange blue)
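                  For readers following along: -separate- creates wanted0 and wanted1 (one per group), which is why line wanted? wave draws one line per group while the original data stay intact. An equivalent sketch using -collapse- under -preserve-/-restore- (just an alternative phrasing, not part of the answer above):

                  Code:
                  preserve
                  keep if hhidate != ""
                  * one mean per wave-by-group cell
                  collapse (mean) totalgrossregularincome, by(wave intreatmentgroup)
                  twoway (line totalgrossregularincome wave if intreatmentgroup==0, sort lc(blue)) ///
                         (line totalgrossregularincome wave if intreatmentgroup==1, sort lc(orange)), ///
                         legend(order(1 "control" 2 "treatment"))
                  restore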



                  • #10
                    OMG, thank you so much. That was exactly what I wanted. Sorry for asking over and over again for the same thing. And this code I can understand, so I can also apply it to my other outcome variables. Thank you very much.



                    • #11
                      Good. I'd recommend geometric means for incomes, except that you have zeros. So, consider also looking at medians.



                      • #12
                        Code:
                         
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long xwaveid float(wave totalgrossregularincome) str10 hhidate float intreatmentgroup
                        100018 2006  10517 "10/09/2006" 0
                        100018 2007   1499 "16/09/2007" 0
                        100018 2008   9944 "13/09/2008" 0
                        100018 2009   7904 "05/09/2009" 0
                        100018 2010  15486 "03/10/2010" 0
                        100018 2011  11400 "21/08/2011" 0
                        100018 2012  24110 "16/09/2012" 0
                        100018 2013  20100 "21/09/2013" 0
                        100018 2014  20300 "13/09/2014" 0
                        100019 2006 121017 "10/09/2006" 0
                        100019 2007 111569 "16/09/2007" 0
                        100019 2008 120994 "13/09/2008" 0
                        100019 2009 111754 "05/09/2009" 0
                        100019 2010 130496 "03/10/2010" 0
                        100019 2011 160020 "21/08/2011" 0
                        100019 2012 140100 "16/09/2012" 0
                        100019 2013 150200 "21/09/2013" 0
                        100019 2014 162100 "13/09/2014" 0
                        100020 2006  20025 "10/09/2006" 0
                        100020 2007   8000 "16/09/2007" 0
                        100020 2008  30200 "13/09/2008" 0
                        100020 2009  51300 "05/09/2009" 0
                        100020 2010  54152 "03/10/2010" 0
                        100020 2011  61850 "21/08/2011" 0
                        100020 2012  84500 "16/09/2012" 0
                        100020 2013  84250 "21/09/2013" 0
                        100020 2014  75000 "13/09/2014" 0
                        100021 2006      0 ""           0
                        100021 2007      0 "16/09/2007" 0
                        100021 2008   1500 "13/09/2008" 0
                        100021 2009      0 "05/09/2009" 0
                        100021 2010      0 "03/10/2010" 0
                        100021 2011   1000 "21/08/2011" 0
                        100021 2012   6000 "16/09/2012" 0
                        100021 2013      0 "21/09/2013" 0
                        100021 2014    730 "13/09/2014" 0
                        100099 2006 159000 "16/09/2006" 1
                        100099 2007  87300 "14/10/2007" 1
                        100099 2008  85455 "24/09/2008" 1
                        100099 2009 105797 "24/08/2009" 1
                        100099 2010 101684 "01/09/2010" 1
                        100099 2011  95300 "23/08/2011" 1
                        100099 2012 125700 "19/09/2012" 1
                        100100 2006  62000 "16/09/2006" 1
                        100100 2007  73187 "27/09/2007" 1
                        100100 2008  83452 "24/09/2008" 1
                        100100 2009  92497 "04/09/2009" 1
                        100100 2010  90526 "24/09/2010" 1
                        100100 2011  90713 "20/09/2011" 1
                        100100 2012 103787 "07/09/2012" 1
                        100100 2013 173951 "05/09/2013" 1
                        100100 2014 105287 "15/09/2014" 1
                        100107 2006  41050 "17/09/2006" 0
                        100107 2007  54700 "20/09/2007" 0
                        100107 2008  47940 "17/09/2008" 0
                        100107 2009  50995 "06/09/2009" 0
                        100107 2010  51130 "08/09/2010" 0
                        100107 2011  50700 "16/09/2011" 0
                        100107 2012  53782 "26/09/2012" 0
                        100107 2013  55428 "19/09/2013" 0
                        100107 2014  53100 "25/09/2014" 0
                        100138 2006  51655 "25/09/2006" 1
                        100138 2007  47790 "07/09/2007" 1
                        100138 2008  34256 "19/09/2008" 1
                        100138 2009  52571 "07/09/2009" 1
                        100138 2010  51850 "03/09/2010" 1
                        100138 2011  61036 "11/09/2011" 1
                        100138 2012  50994 "26/08/2012" 1
                        100138 2013  59500 "25/08/2013" 1
                        100138 2014  59000 "24/08/2014" 1
                        100140 2006  10000 "20/09/2006" 0
                        100140 2007  22000 "18/09/2007" 0
                        100140 2008  30000 "09/09/2008" 0
                        100140 2009  30900 "02/09/2009" 0
                        100140 2010  29922 "08/09/2010" 0
                        100140 2011  42000 "10/09/2011" 0
                        100140 2012  31631 "13/09/2012" 0
                        100140 2013  47000 "15/09/2013" 0
                        100140 2014  45000 "17/09/2014" 0
                        100164 2006  65100 "19/09/2006" 1
                        100164 2007  64250 "15/09/2007" 1
                        100164 2008  63250 "09/09/2008" 1
                        100164 2009  82300 "16/09/2009" 1
                        100164 2010  79741 "07/09/2010" 1
                        100164 2011  85500 "05/09/2011" 1
                        100164 2012 133150 "13/08/2012" 1
                        100164 2013 140000 "13/08/2013" 1
                        100164 2014 162704 "12/08/2014" 1
                        100165 2006  51000 "19/09/2006" 1
                        100165 2007  55000 "15/09/2007" 1
                        100165 2008  56750 "09/09/2008" 1
                        100165 2009  63900 "16/09/2009" 1
                        100165 2010  73000 "07/09/2010" 1
                        100165 2011  67000 "05/09/2011" 1
                        100165 2012  72000 "13/08/2012" 1
                        100165 2013  76500 "13/08/2013" 1
                        100165 2014  55683 "12/08/2014" 1
                        100166 2006      0 "19/09/2006" 1
                        100166 2007   2000 "15/09/2007" 1
                        100166 2008   1500 "09/09/2008" 1
                        end
                        format %ty wave
                        label values intreatmentgroup treatment
                        label def treatment 0 "control", modify
                        label def treatment 1 "treatment", modify
                        
                        egen median = median(totalgrossregularincome) if hhidate != "" , by(wave intreatmentgroup) 
                        separate median, by(intreatmentgroup) veryshortlabel 
                        
                        egen mean = mean(totalgrossregularincome) if hhidate != "" , by(wave intreatmentgroup) 
                        separate mean, by(intreatmentgroup) veryshortlabel 
                        
                        twoway connected mean? median? wave, sort lc(orange blue orange blue) ms(O O T T) mc(orange blue orange blue) /// 
                        legend(order(- "means" 1 2 - "medians" 3 4) col(1) ring(0) pos(11))
                        [Attached image: merve.png]



                        • #13
                          Thank you so much. This makes it even easier for me, since I also have to plot the median. Now I know how to insert two different lines (of different measures) into one graph. Thank you very much, Mr. Cox.

