Simple Python Graphing Question

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#1

28 Mar 2023, 09:37

Hey everyone. I'm working on a project with my friend and mentor, who's a Python expert, so I wanna use this opportunity to learn more Python. More precisely, I'm trying to replicate graphs that I have already done in Stata. My question is this: How do I get Python to know that I want to plot the reference line at the year 1989, instead of at the index where 1989 appears? Consider the following code:

Code:

cls
clear *
python:
import pandas as pd
import matplotlib.pyplot as plt

hfont = {'fontname':'Times New Roman'}

df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', sep=';', parse_dates=['Year'], index_col='Year') #Imports our data
df = df.sort_values(by=['State']) #For some reason it wasn't sorted- changing that
df_Cali = df[df['State'] == 'California'] #For now... we only want California.

df_Cali.plot(y='PacksPerCapita', color=[(.17, .27, .57)], legend=None) # Our basic plot
plt.title('Tobacco Trends', fontsize=14, **hfont)
plt.xlabel('Year', fontsize=14, **hfont)
plt.xticks(fontsize=14, rotation=45, **hfont)
plt.ylabel('Cigarette Sales Per Capita', fontsize=14, **hfont)
plt.vlines(x=1989, ymin=40, ymax=140, color='red', label="Proposition 99") # !! The problem of interest.
plt.grid()
plt.show()
end

The graph may not be constructed quite as "Pythonic", but it does what I want. But, the intervention happened in 1989! Not 1970s-ish. Presumably this is because Python recognizes 1989 as 1970-something on the index, instead of as the variable "Year" that I want for it to be at. How might I get the reference line to be at the correct position, at the year 1989? Perhaps Leonardo Guizzetti or Daniel Schaefer might have thoughts?

Oh and also, if you have any edits you'd suggest to the code itself, like how to make it cleaner/more efficient, I'd appreciate it! I look forward to the day that I'll be fluently bilingual in both Stata and Python.
Daniel Schaefer

Join Date: Mar 2020

Posts: 813
#2

28 Mar 2023, 13:41

Hey Jared,

Here is my solution. I googled around a bit (wasn't going to dive too deep into the documentation) and I couldn't find an obvious way to place the vertical line based on the date label. I know the pandas basics, but I don't know the weeds of pandas all that well, so there may be a cleaner solution that I am unaware of. I also have no idea why you got a vertical line within your plot at all, since it seems like matplotlib.pyplot.vlines is expecting an index for x, and 1989 is clearly an index outside the bounds of the domain. Just sorting your data by the date gives a vertical line that I believe is out of bounds of your plot and therefore not rendered.
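One possible explanation for the 1970s-ish line, offered as an assumption rather than something confirmed in this thread: when the dates on the x-axis go through matplotlib's date handling, positions are stored as days since an epoch, which defaults to 1970-01-01 in matplotlib 3.3 and later. Under that assumption, a bare x=1989 would mean "1989 days after the epoch", not the year 1989. The standard-library arithmetic below shows where that would land:

```python
from datetime import date, timedelta

# Assumption: the x-axis counts days since 1970-01-01
# (matplotlib's default date epoch in version 3.3 and later).
EPOCH = date(1970, 1, 1)

# Where a bare x=1989 would land if it is read as "1989 days after the epoch".
where_1989_lands = EPOCH + timedelta(days=1989)
print(where_1989_lands)  # 1975-06-13

# The day count that would correspond to 1 January 1989 instead.
days_to_1989 = (date(1989, 1, 1) - EPOCH).days
print(days_to_1989)  # 6940
```

If this reading is right, it would also be consistent with the line showing up in the mid-1970s on the unsorted data.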

Below, I make a few changes to your code. First, I sort the data by year as well as state, so that the index corresponding to the year is more meaningful.

Code:

df = df.sort_values(by=['State', 'Year'])

Next, I want to find the row index corresponding to the year 1989. So I take the date index object from the dataframe (df_Cali.index); convert the index object to a list of datetime objects (the list(...) call); loop through each datetime object, extract the year, and put it in a list (the [datetime.year for datetime in ...] part); then I find the position in the list corresponding to 1989 (the .index(1989) call). This kind of syntax is referred to as a list comprehension, and can be useful for writing idiomatic and syntactically minimal for loops. Technically, we could save a little bit of processor time by writing this in a less compact way, but almost certainly not enough to matter from a practical standpoint.

Code:

year = [datetime.year for datetime in list(df_Cali.index)].index(1989)
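To make the one-liner above easier to follow, here is a sketch that unpacks it step by step. The list of dates is made up for illustration and merely stands in for df_Cali.index; only the standard library is used. The explicit loop shows the "less compact" version that stops at the first match instead of building the whole list of years first.

```python
from datetime import datetime

# Made-up yearly dates standing in for df_Cali.index (1985 through 1992).
dates = [datetime(y, 1, 1) for y in range(1985, 1993)]

# Step 1: extract the year from each datetime with a list comprehension.
years = [d.year for d in dates]

# Step 2: list.index returns the position of the first match
# (it raises ValueError if the year is not in the list).
position = years.index(1989)
print(position)  # 4

# The same search as an explicit loop that stops at the first hit.
position_loop = None
for i, d in enumerate(dates):
    if d.year == 1989:
        position_loop = i
        break

# A generator with next() also stops early and allows a default
# when the year is missing.
missing = next((i for i, d in enumerate(dates) if d.year == 2050), None)
```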

Finally, I just plug the year index in for the x parameter to the vlines method.

Code:

plt.vlines(x=year, ymin=40, ymax=140, color='red', label="Proposition 99")

The full script is below.

Code:

import pandas as pd
import matplotlib.pyplot as plt

# Constants
hfont = {'fontname':'Times New Roman'}

#Imports our data
df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', sep=';', parse_dates=['Year'], index_col='Year')
df = df.sort_values(by=['State', 'Year']) # For some reason it wasn't sorted- changing that
df_Cali = df[df['State'] == 'California'] # For now... we only want California.

# Our basic plot
df_Cali.plot(y='PacksPerCapita', color=[(.17, .27, .57)], legend=None)
plt.title('Tobacco Trends', fontsize=14, **hfont)
plt.xlabel('Year', fontsize=14, **hfont)
plt.xticks(fontsize=14, rotation=45, **hfont)
plt.ylabel('Cigarette Sales Per Capita', fontsize=14, **hfont)

year = [datetime.year for datetime in list(df_Cali.index)].index(1989)
plt.vlines(x=year, ymin=40, ymax=140, color='red', label="Proposition 99") # !! The problem of interest.

plt.grid()
plt.show()
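For comparison, here is a sketch of a different route that avoids positional indexing altogether: plot through matplotlib's own ax.plot (rather than df.plot as in the scripts in this thread) and hand axvline the date itself. The numbers are made up and stand in for the California series, and the Agg backend is only there so the sketch runs without a display. This is one option under those assumptions, not necessarily the cleanest for the original df.plot workflow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers standing in for California's PacksPerCapita series.
years = pd.to_datetime([f"{y}-01-01" for y in range(1985, 1993)])
packs = [102.8, 99.7, 97.5, 90.1, 82.4, 77.8, 68.7, 67.5]

fig, ax = plt.subplots()

# Plot through matplotlib directly so the x-axis uses matplotlib's date units.
ax.plot(years, packs, color=(.17, .27, .57))

# Pass the date itself; matplotlib converts it using the axis' date units.
event = pd.Timestamp("1989-01-01")
ax.axvline(event, color="red", label="Proposition 99")

ax.set_title("Tobacco Trends")
ax.set_xlabel("Year")
ax.set_ylabel("Cigarette Sales Per Capita")
ax.grid(True)
ax.legend()

# Sanity check: the event date falls inside the plotted x-range.
x_min, x_max = ax.get_xlim()
event_inside = x_min < mdates.date2num(event) < x_max
print(event_inside)
```

In an interactive session, drop the matplotlib.use("Agg") line and finish with plt.show().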
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

28 Mar 2023, 14:09

Yep, we essentially got the same solution! I did it like this:

Code:

cls
clear *
python:

import pandas as pd

import matplotlib.pyplot as plt

hfont = {'fontname':'Times New Roman'}

df = pd.read_csv('https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv', 
sep=';', parse_dates=['Year'], index_col='Year')


df = df.sort_values(by=['State'])


df_Cali = df[df['State'] == 'California']

df_Cali = df_Cali.sort_values(by=['Year'])


df_Cali.plot(y='PacksPerCapita', color=[(.17, .27, .57)], legend=None)


plt.title('Tobacco Trends', fontsize=14, **hfont)


plt.xlabel('Year', fontsize=14, **hfont)
plt.xticks(fontsize=14, rotation=45, **hfont)


plt.ylabel('Cigarette Sales Per Capita', fontsize=14, **hfont)

x_position = df_Cali.index.searchsorted('1989-01-01')

plt.vlines(x= x_position, ymin=40, ymax=140, color='red', label="Proposition 99")

plt.grid(True)

plt.show()

end
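A note on what searchsorted returns, sketched on a made-up yearly index rather than the real data: it gives the position where the value would be inserted to keep the index sorted, so it matches the row position of 1989 only when the index is sorted and actually contains that date. Index.get_loc is a stricter alternative that raises KeyError when the date is absent.

```python
import pandas as pd

# Made-up yearly index from 1970 to 2000, standing in for df_Cali.index.
idx = pd.DatetimeIndex(
    [pd.Timestamp(year=y, month=1, day=1) for y in range(1970, 2001)]
)

target = pd.Timestamp("1989-01-01")

# searchsorted: insertion point that keeps the (already sorted) index sorted.
pos_sorted = idx.searchsorted(target)

# get_loc: position of an exact match; raises KeyError if it is missing.
pos_exact = idx.get_loc(target)

# A date that is not in the index still gets an insertion point...
pos_midyear = idx.searchsorted(pd.Timestamp("1989-07-01"))

# ...whereas get_loc refuses to guess.
try:
    idx.get_loc(pd.Timestamp("1989-07-01"))
    found_midyear = True
except KeyError:
    found_midyear = False

print(pos_sorted, pos_exact, pos_midyear, found_midyear)
```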

Comment

Daniel Schaefer

Join Date: Mar 2020

Posts: 813
#4

28 Mar 2023, 14:12

Nice! I prefer your solution actually. Better to use the built-in methods like that. I was just feeling a little lazy and didn't want to read the docs!
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2392
#5

28 Mar 2023, 16:53

I have nothing to add here, as I learned a little Python some years ago but now have no professional use for it, so it's forgotten. Seems like you have found your solution though.
Comment
