Chapter 15: Hello charts¶
Python has a number of charting tools that can work hand-in-hand with
pandas. The most popular is matplotlib. It isn’t the prettiest thing in the world, but it offers straightfoward tools for exploring your data by making quick charts. And, best of all, it can display in your Jupyter Notebook.
Before we start, we’ll need to make sure
matplotlib is installed. Return to your terminal and try installing it with our buddy pip, as we installed other things before.
$ pip install matplotlib
After that completes, once again restart your notebook.
$ jupyter notebook
Now you can open your notebook and add a new cell below the imports that lets the system know you plan to make some charts and that it’s okay to surface them in the notebook.
Let’s return to where we set our proposition filter at the top and restore it our initial interest, Proposition 64.
prop = props[props.prop_name == 'PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.']
Now rerun the entire notebook, as we learned above. You will need to do this when you halt and restart your notebook on the command line. Reminder, you can do this by pulling down the
Cell menu at the top of the notebook and selecting the
Run all option.
Then scroll down to the bottom of the notebook and pick up where we last left off in the previous chapter.
If we want to chart out the top supporters of the proposition, we first need to select them from the dataset. Using the grouping and sorting tricks we learned earlier, the top 10 can returned like this:
top_supporters = support.groupby( ["contributor_firstname", "contributor_lastname"] ).amount.sum().reset_index().sort_values("amount", ascending=False).head(10)
We can then view them with a trick I bet you remember by now.
Now that matplotlib is installed, making a simple chart is as simple as stringing the
plot method onto the end of your
You can rotate the bar chart so that it is horizontal by subituting in the
The chart can be limited to the first five records by slipping in the
What are those y axis labels? Those are the row number (pandas calls them indexes) of each row. We don’t want that. We want the names. We can swap them in by saving our chart to a variable and then using another matplotlib option,
set_yticklabels to instruct the system which field to use.
chart = top_supporters.head(5).amount.plot.barh() chart.set_yticklabels(top_supporters.contributor_lastname)
Okay, but what if I want to combine the first and last name? We have the data we need in two separate columns, which we can put together simply by inventing a new field on our data frame and, just like a variable, setting it equal to a combination of the other fields.
top_supporters['contributor_fullname'] = top_supporters.contributor_firstname + " " + top_supporters.contributor_lastname
We can see the results right here.
Now using that in the chart is as simple as substituting in the
set_yticklabels method we used above.
chart = top_supporters.head(5).amount.plot.barh() chart.set_yticklabels(top_supporters.contributor_fullname)
That’s all well and good, but this chart is still pretty ugly. If you wanted to hand this data off to your graphics department, or try your hand at a simple chart yourself using something like Chartbuilder, you’d need to export this data into a spreadsheet.
Guess what? It’s this easy.