Problem with Manipulating data to construct a scatter plot (RDD)

Roger More

Join Date: Jul 2017

Posts: 59
#1

23 Sep 2017, 02:42

Dear all,

I hope you are doing well. I want to draw a judge-time graph from case-level data. Specifically, I have the following data set, where rows represent cases filed in the courts (e.g. case 5052013, heard by two judges, Amin Ud Din Khan and Abid Aziz Sheikh, forms the first two rows):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str31 idcaseyear int(yearfiled yeardecision) str32 judgename byte StateWins int caselag byte(lawyer_number AFR criminal constitutional) int pagesjudgenum
"5052013" 2013 2013 "Amin Ud Din Khan" . 0 2 1 0 1 .
"5052013" 2013 2013 "Abid Aziz Sheikh" 0 0 2 1 0 1 .
"162007" 2007 2009 "Syed Iftikhar Hussain Shah" . 6 3 1 0 0 .
"162007" 2007 2009 "Shoaib Saeed" . 6 3 1 0 0 .
"792013" 2013 2013 "Imtiaz Ahmed" . 0 2 1 0 1 .
"792013" 2013 2013 "Amin Ud Din Khan" 0 0 2 1 0 1 .
"3632012" 2012 2013 "Syed Iftikhar Hussain Shah" . 1 2 1 0 0 .
"3672004" 2004 2007 "Abdus Sattar Asghar" . 9 3 1 0 0 .
"102012" 2012 2013 "Syed Mansoor Ali Shah" . 1 4 1 0 0 .
"46382011" 2011 2013 "Umar Ata Bandial" . 2 2 1 0 1 .
end

I want to draw a scatter plot with "StateWins" on the y-axis and "yeardecision" on the x-axis, where each point is an individual judge (judgename).

Basically, I want to compare judges just before the decision year 2010 with judges just after the 2010 threshold.

I am not sure how to arrange my data to get a judge-time scatter plot.

Any help in this regard will be much appreciated. Thank you.

Kind Regards,
Roger

Last edited by Roger More; 23 Sep 2017, 02:46.

eric_a_booth

Join Date: Apr 2014
Posts: 288

23 Sep 2017, 07:23

Thanks for providing a -dataex-. In terms of just the syntax to create a scatterplot, you are looking for something like this:

Code:

scatter StateWins yeardecision, mlabel(judgename)

but based on the structure of your data, I bet that isn't giving you a satisfactory graph.

Here are some other ideas:

Code:

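*1. A few data manipulations to make this example work better with a graph, since your outcome was never 1 in the example:
keep StateWins judgename yeardecision
encode judgename, gen(judge)  // numeric judge id that carries the names as value labels
describe


* fake some wins: the last five observations get 1, then observations 1 through fifth-from-last get 0
replace StateWins = 1 in -5/l
replace StateWins = 0 in 1/-5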

**You can see how this isn't really helpful, but it's what you requested:
scatter StateWins yeardecision, mlabel(judge)

**Perhaps you want the predicted probability of StateWins from a model rather than the 0/1 outcome? E.g.:

* simulate a larger data set so the model has something to estimate
expand 100
replace StateWins = rbinomial(1, .4)
replace yeardecision = int(2007+runiform()*5)


logit StateWins i.yeardecision#i.judge
margins yeardecision#judge
marginsplot, noci


**2.  So, perhaps a better way to think about this is to sum up the number of wins by judge/year and in the pre/post-2010 period, and then plot them as a bar graph or connected plot, like:

bys judge yeardecision: egen numwins = total(StateWins)
scatter numwins yeardecision, mlabel(judge) //this still isn't ideal


graph hbar (mean) numwins, over(yeardecision) over(judge)

gen thresh = cond(yeardecision<2010, "<2010", ">=2010", "")
bys judge thresh: egen numwins2 = total(StateWins)
graph hbar (mean) numwins2 , over(thresh) over(judge)

tw (connected numwins yeardecision if thresh=="<2010", sort) (connected numwins2 yeardecision if thresh==">=2010", sort) , by(judge) legend(order(1 "<2010" 2 ">=2010"))
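
If the goal is one point per judge per year, a rough sketch might be to collapse to judge-year win rates first and scatter those. This is only an illustration under assumptions: the data are simulated (6 hypothetical judges, 20 cases per judge-year, win probability 0.4), the names winrate and ncases are made up, and the reference line sits at 2009.5 simply because it separates <2010 from >=2010.

```stata
* Simulated data for illustration only: 6 hypothetical judges, 2007-2013
clear
set seed 12345
set obs 6
gen judge = _n
expand 7
bysort judge: gen yeardecision = 2006 + _n
expand 20
gen byte StateWins = rbinomial(1, 0.4)

* One row per judge-year: share of cases the state won, and number of cases
collapse (mean) winrate = StateWins (count) ncases = StateWins, ///
    by(judge yeardecision)

* Each marker is one judge in one decision year
scatter winrate yeardecision, mlabel(judge) xline(2009.5) ///
    ytitle("Share of cases won by the state") xtitle("Decision year")
```

With real data, judge-years with very few cases will give noisy win rates, so it may be worth dropping or down-weighting cells where ncases is small.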

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

Roger More

Join Date: Jul 2017

Posts: 59
#3

23 Sep 2017, 08:12

Thank you very much. I will work through these examples and see if I get what I want. Thank you again!

Cheers!
Roger More

Join Date: Jul 2017

Posts: 59
#4

23 Sep 2017, 16:30

Dear Eric,

Thank you so much for your help. I have gone through your examples and learned a lot from them. I think your idea of using a model to compute predicted probabilities is very intuitive, but with 1000 judges it would be hard to get a clear picture of the data from line graphs or histograms for each judge. Perhaps you could suggest something.

Basically, I want to compare the StateWins of judges before and after the 2010 threshold, for ONLY those judges with the identifier AfterReformJudge == 0.
That is, I want to compare the state wins of the half of my judge sample (about 500) that has AfterReformJudge == 0, before and after the 2010 yeardecision threshold. The hypothesis I am testing is whether the state wins of these judges fell after 2010.

Do you have any suggestions for showing this in a single graph when I have many judges? In particular, I would be interested in comparing judges just before and just after 2010, for an RDD kind of interpretation.

My example dataset now with the AfterReformJudge is as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int yeardecision str32 judgename byte StateWins int caselag byte AfterReformJudge str31 idcaseyear
2009 "Abdus Sattar Asghar" . 2 0 "131442012"
2010 "Abdus Sattar Asghar" . 0 0 "25212014"
2013 "Abdus Sattar Asghar" . 0 0 "233842013"
2009 "Abid Aziz Sheikh" 0 1 1 "14102"
2009 "Abid Aziz Sheikh" . 3 1 "1452010"
2013 "Abid Aziz Sheikh" . 3 1 "1452010"
2013 "Abid Aziz Sheikh" . 6 1 "1652007"
2013 "Abid Aziz Sheikh" . 0 1 "1322009"
end
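
One possible way to get the comparison described above into a single graph, sketched on simulated data rather than the real sample: keep the AfterReformJudge == 0 judges, average StateWins by decision year, and fit a separate line on each side of 2010. The 50 hypothetical judges, the win probability of 0.4, and the variable name winrate are all assumptions made for illustration.

```stata
* Simulated data for illustration only: 50 hypothetical judges,
* half of them with AfterReformJudge == 0
clear
set seed 2017
set obs 50
gen judge = _n
gen byte AfterReformJudge = mod(_n, 2)
expand 7
bysort judge: gen yeardecision = 2006 + _n
expand 10
gen byte StateWins = rbinomial(1, 0.4)

* Keep the judges of interest and average their cases within each year
keep if AfterReformJudge == 0
collapse (mean) winrate = StateWins, by(yeardecision)

* Yearly means with a separate linear fit on each side of the threshold
twoway (scatter winrate yeardecision) ///
       (lfit winrate yeardecision if yeardecision < 2010) ///
       (lfit winrate yeardecision if yeardecision >= 2010), ///
       xline(2009.5) ///
       legend(order(1 "Yearly mean" 2 "Fit, <2010" 3 "Fit, >=2010")) ///
       ytitle("Share of cases won by the state") xtitle("Decision year")
```

Because decision year takes only a handful of values on each side of 2010, a graph like this is descriptive rather than a formal RD estimate.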

If you could help me with this, that would be great. Thank you in any case!

Cheers,
Roger

Last edited by Roger More; 23 Sep 2017, 16:46.