Chapter 6: Hello columns

In this chapter we’ll begin our analysis by learning how to analyze a column from a DataFrame.

Accessing a column

We’ll begin with the prop_name column, which stores the proposition each committee is seeking to influence.

To see the contents of a column separate from the rest of the DataFrame, add the column’s name to the variable following a period.

props.prop_name

That will list the column out as a Series, just like the ones we created from scratch in chapter three.
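If you’d like to confirm that for yourself, here is a minimal sketch. It assumes a tiny, made-up DataFrame standing in for props, since the real one comes from the file loaded earlier; the committee_name column and all of the values are invented for illustration. It also shows the bracket style, which pandas accepts as another way to reach the same column.

```python
import pandas as pd

# Hypothetical stand-in for the props DataFrame; the values are invented
props = pd.DataFrame({
    "prop_name": ["Prop A", "Prop B", "Prop A"],
    "committee_name": ["Yes on A", "No on B", "No on A"],
})

# Dot notation, as used in this chapter
dot_style = props.prop_name

# Bracket notation reaches the same column
bracket_style = props["prop_name"]

# Both are pandas Series holding identical values
print(type(dot_style))
print(dot_style.equals(bracket_style))
```

The bracket style comes in handy when a column name contains spaces or matches the name of a DataFrame method, since the dot style won’t work in those cases.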

And, just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.

In this case, the column is filled with text, so we don’t want to calculate statistics like the median and average, as we did before.

Counting a column’s values

There’s another built-in pandas tool that will total up the frequency of values in a column. In this case that could be used to answer the question: Which proposition had the most committees?

The method is called value_counts and it’s just as easy to use as any other method. All you need to do is add a second period after the column name and chain the method onto the tail end of your cell.

props.prop_name.value_counts()

Run the code and you should see the lengthy proposition names ranked by their number of committees.
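To see how the counting works on something small enough to check by eye, here is a sketch that uses a made-up stand-in for props. The proposition names are invented, and each row is assumed to represent one committee, so the output won’t match the real data.

```python
import pandas as pd

# Hypothetical stand-in for props: one row per committee, invented names
props = pd.DataFrame({
    "prop_name": ["Prop A", "Prop B", "Prop A", "Prop C", "Prop A", "Prop C"],
})

# Count how many times each proposition appears in the column
counts = props.prop_name.value_counts()
print(counts)

# By default the most frequent value is listed first
top_prop = counts.index[0]
top_total = counts.iloc[0]
print(top_prop, top_total)
```

Because the results are sorted from most to least frequent by default, the first entry answers the question of which proposition had the most committees.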

Resetting a DataFrame

You may have noticed that even though the result has two columns, pandas did not return a clean-looking table in the same way as head did for our DataFrame.

That’s because our column, a Series, acts a little bit differently than the DataFrame created by read_csv.

In most instances, if you have an ugly Series generated by a method like value_counts and you want to convert it into a pretty DataFrame, you can do so by tacking the reset_index method onto the tail end.

props.prop_name.value_counts().reset_index()
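Here is a rough sketch of that conversion, again using a made-up stand-in for props with invented proposition names. One caveat: the column labels that reset_index produces can differ depending on which version of pandas you have installed, so the sketch prints them rather than assuming what they will be.

```python
import pandas as pd

# Hypothetical stand-in for props: one row per committee, invented names
props = pd.DataFrame({
    "prop_name": ["Prop A", "Prop B", "Prop A", "Prop C", "Prop A", "Prop C"],
})

# value_counts returns a Series with the proposition names as its index
counts = props.prop_name.value_counts()
print(type(counts))

# reset_index moves that index into an ordinary column, giving a DataFrame
table = counts.reset_index()
print(type(table))
print(table)

# The exact labels here depend on the installed pandas version
print(list(table.columns))
```

The result has one row per proposition and two columns: one holding the names and one holding the counts.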

Why do Series and DataFrames behave differently? Why does reset_index have such a weird name?

Like so much in computer programming, the answer is simply “because the people who created the library said so.”

That’s not worth stressing about in this case, but it’s important to learn that all open-source programming tools have their quirks. Over time you’ll learn pandas has more than a few.

As a beginner, you should just figure out the ones you need and roll with them. As you get more advanced, if there’s something about the system you think could be improved, you should consider contributing to the Python code that runs pandas.