ASSIGNMENT¶
1) Replicate the lesson code. I recommend that you do not copy-paste.¶
Get caught up to where we got our example in class and then try and take things further. How close to "pixel perfect" can you make the lecture graph?
Once you have something that you're proud of, share your graph in the cohort channel and move on to the second exercise.
2) Reproduce another example from FiveThirtyEight's shared data repository.¶
WARNING: There are a lot of very custom graphs and tables at the above link. I highly recommend not trying to reproduce any that look like a table of values or something really different from the graph types that we are already familiar with. Search through the posts until you find a graph type that you are more or less familiar with: histogram, bar chart, stacked bar chart, line chart, seaborn relplot, etc. Recreating some of the graphics that 538 uses would be a lot easier in Adobe Photoshop/Illustrator than with matplotlib.
If you put in some time to find a graph that looks "easy" to replicate you'll probably find that it's not as easy as you thought.
If you start with a graph that looks hard to replicate you'll probably run up against a brick wall and be disappointed with your afternoon.
1-Replicate Lesson Code¶
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
plt.style.use('fivethirtyeight')
fake_data = pd.Series([38, 3, 2, 1, 2, 4, 6, 5, 5, 33], index=range(1,11))
fake_data.plot.bar(color='#ed713a', width=0.9);
fd2 = pd.Series(
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2,
3, 3, 3,
4, 4,
5, 5, 5,
6, 6, 6, 6,
7, 7, 7, 7, 7,
8, 8, 8, 8,
9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
fd2.value_counts().sort_index().plot.bar(color='#ed713a', width=0.9);
fig = plt.figure(facecolor='white', figsize=(5,4))
ax = fake_data.plot.bar(color='#ed713a', width=0.9)
ax.set(facecolor='white')
ax.patch.set_alpha(0.1)
plt.xlabel('Rating', fontweight='bold')
plt.ylabel('Percent of total votes', fontweight='bold')
plt.title("'An Inconvenient Sequel: Truth to Power' is divisive", fontsize=12, loc='left', x=-0.1, y=1.1, fontweight='bold')
plt.text(x=-1.7, y=42, s='IMDb ratings for the film as of Aug. 29', fontsize=10)
plt.xticks(rotation=0, color='#a7a7a7')
plt.yticks(range(0, 50, 10), labels=[f'{i}' if i!=40 else f'{i}%' for i in range(0, 50, 10)], color='#a7a7a7');
2-Replicate an Example from 538¶
The Mayweather-McGregor Fight As Told Through Emojis¶
I'm going to try to recreate the first horizontal bar plot in this article: https://fivethirtyeight.com/features/the-mayweather-mcgregor-fight-as-told-through-emojis/
pip install emoji
pip install regex
import emoji
import regex
mm = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/mayweather-mcgregor/tweets.csv')
mm.head()
mm.tail()
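def emoji_creator(text):
    """Return the most frequent emoji in `text` (the first one seen wins a tie)."""
    # Note: emoji.UNICODE_EMOJI may be missing in newer releases of the emoji package.
    emojis = {}
    for i in text:
        if i in emoji.UNICODE_EMOJI:
            if i in emojis:
                emojis[i] += 1
            else:
                emojis[i] = 1
    # max() raises ValueError if the text contains no emoji at all.
    return max(emojis, key=emojis.get)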
test_emojis = mm['text'][1]
print(emoji_creator(test_emojis))
mm['emojis_tweeted'] = [emoji_creator(i) for i in mm['text']]
mm.sort_values(by='created_at', ascending=False).head()
mm.loc[mm['text'].str.contains('boring')]
emoji_counts = mm['emojis_tweeted'].value_counts().nlargest(10)
emoji_counts
Two things are wrong here. First, I can't replicate the value counts of the most commonly used emojis closely enough. I get a broadly similar top ten, but the methodology they used is clearly different from mine when a tweet contains multiple emojis. Second, the chart on the webpage appears to cover 15 more minutes than the data set does. The subtext on the website says the data runs from 12:05 to 1:30, whereas the GitHub data only includes tweets up to 1:15. That additional post-fight discussion could account for part of the gap between my counts and theirs. For example, there may have been significant discussion after the fight about the purse each fighter took home, which would have raised the count of the 💰 emoji during the 15 minutes that are missing from the GitHub data set but included in the table on the website.
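To see how much the counting rule alone can move the numbers, here is a minimal sketch on made-up tweets. It compares the one-emoji-per-tweet rule that emoji_creator applies with two alternatives. I don't know which rule FiveThirtyEight actually used; the `EMOJIS` set, the `tweets` list, and the `emojis_in` helper are all hypothetical stand-ins, not part of the data set or the emoji package.

```python
from collections import Counter

# Hypothetical stand-in for a real emoji lookup: only these four
# characters are treated as emojis in this sketch.
EMOJIS = {'🔥', '😴', '💰', '🥊'}

# Made-up tweets, not rows from the FiveThirtyEight data set.
tweets = [
    'What a fight 🔥🔥🥊',
    'zzz 😴 😴 😴 wake me up 🔥',
    'pay day 💰🔥',
]

def emojis_in(text):
    """Return every emoji character in text, in order, with repeats."""
    return [ch for ch in text if ch in EMOJIS]

# Rule A (what emoji_creator does): each tweet votes once,
# for its most frequent emoji (first one seen wins a tie).
top_per_tweet = Counter(
    Counter(emojis_in(t)).most_common(1)[0][0]
    for t in tweets if emojis_in(t)
)

# Rule B: every single emoji occurrence counts.
every_occurrence = Counter(ch for t in tweets for ch in emojis_in(t))

# Rule C: each distinct emoji counts once per tweet.
once_per_tweet = Counter(ch for t in tweets for ch in set(emojis_in(t)))

for name, counts in [('A', top_per_tweet), ('B', every_occurrence), ('C', once_per_tweet)]:
    print(name, counts.most_common())
```

On these three tweets the 🔥 emoji gets 1, 4, or 3 votes depending on the rule, so a different rule could plausibly reorder a top-ten list even on identical data.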
Taking a Left Turn: Now that I've gotten the emoji situation under control and discovered that the first graphic on the page is actually an HTML table, I'm going to try to replicate one of the other images, which thankfully still makes use of the emojis¶
from IPython.display import display, Image
url = 'https://fivethirtyeight.com/wp-content/uploads/2017/08/roedermehtadottle-boxemoji-21.png?w=575'
example = Image(url=url, width=600)
display(example)
mm['created_at'] = pd.to_datetime(mm['created_at'], infer_datetime_format=True).dt.round('S')
mm.head()
fire = mm.loc[mm['emojis_tweeted'] == '🔥'].set_index('created_at').drop(['emojis', 'id', 'link', 'retweeted', 'screen_name', 'text'], axis=1)
fire['num'] = 1
fire_series = pd.Series(fire['num'], index=fire.index)
rolling_fire = fire_series.resample('30S').sum().rolling(8).mean()
snooze = mm.loc[mm['emojis_tweeted'] == '😴'].drop(['emojis', 'id', 'link', 'retweeted', 'screen_name', 'text'], axis=1).set_index('created_at')
snooze['num'] = 1
snooze_series = pd.Series(snooze['num'], index=snooze.index)
rolling_snooze = snooze_series.resample('30S').sum().rolling(8).mean()
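The fire and snooze cells above repeat the same steps: filter by emoji, count tweets in 30-second bins, then average over 8 bins (a 4-minute window). A minimal sketch of a helper that factors this out is below. The name `rolling_emoji_counts` and the tiny `demo` frame are hypothetical, and it assumes `created_at` has already been converted to datetimes.

```python
import pandas as pd

def rolling_emoji_counts(df, symbol, bin_size=pd.Timedelta(seconds=30), window=8):
    """Rolling mean of per-bin tweet counts for one emoji.

    Assumes df has a datetime 'created_at' column and an
    'emojis_tweeted' column, as in the cells above.
    """
    mask = df['emojis_tweeted'] == symbol
    # One row per matching tweet, indexed by time, each worth 1.
    ones = pd.Series(1, index=pd.DatetimeIndex(df.loc[mask, 'created_at']))
    # Count tweets per bin, then smooth over `window` consecutive bins.
    return ones.resample(bin_size).sum().rolling(window).mean()

# Made-up data for illustration only.
demo = pd.DataFrame({
    'created_at': pd.to_datetime([
        '2017-08-27 00:05:05', '2017-08-27 00:05:20',
        '2017-08-27 00:05:40', '2017-08-27 00:06:10',
    ]),
    'emojis_tweeted': ['🔥', '🔥', '😴', '🔥'],
})

result = rolling_emoji_counts(demo, '🔥', window=2)
print(result)
```

With this in place, the two series could be built as `rolling_emoji_counts(mm, '🔥')` and `rolling_emoji_counts(mm, '😴')`.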
plt.style.use('fivethirtyeight')
plt.figure(facecolor='#f0f0f0', figsize=(10,6))
snooze_ax = rolling_snooze.plot.line(color="#b4e3e7")
roll_ax = rolling_fire.plot.line(color='#d63830')
plt.title('Much hype, some boredom', fontsize=12, loc='left', x=-0.1, y=1.1, fontweight='bold')
#still a work in progress
STRETCH OPTIONS¶
1) Reproduce one of the following using the matplotlib or seaborn libraries:¶
- thanksgiving-2015
- candy-power-ranking
- or another example of your choice!
2) Make more charts!¶
Choose a chart you want to make, from Visual Vocabulary - Vega Edition.
Find the chart in an example gallery of a Python data visualization library:
Reproduce the chart. Optionally, try the "Ben Franklin Method." If you want, experiment and make changes.
Take notes. Consider sharing your work with your cohort!
# More Work Here