Making Explanatory Visualizations Assignment

ASSIGNMENT

1) Replicate the lesson code. I recommend that you do not copy-paste.

Get caught up to where we got our example in class and then try and take things further. How close to "pixel perfect" can you make the lecture graph?

Once you have something that you're proud of, share your graph in the cohort channel and move on to the second exercise.

2) Reproduce another example from FiveThirtyEight's shared data repository.

WARNING: There are a lot of very custom graphs and tables at the above link. I highly recommend not trying to reproduce any that look like a table of values or something really different from the graph types that we are already familiar with. Search through the posts until you find a graph type that you are more or less familiar with: histogram, bar chart, stacked bar chart, line chart, seaborn relplot, etc. Recreating some of the graphics that 538 uses would be a lot easier in Adobe Photoshop/Illustrator than with matplotlib.

  • If you put in some time to find a graph that looks "easy" to replicate you'll probably find that it's not as easy as you thought.

  • If you start with a graph that looks hard to replicate you'll probably run up against a brick wall and be disappointed with your afternoon.

1-Replicate Lesson Code

In [0]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

plt.style.use('fivethirtyeight')

fake_data = pd.Series([38, 3, 2, 1, 2, 4, 6, 5, 5, 33], index=range(1,11))

fake_data.plot.bar(color='#ed713a', width=0.9);
In [0]:
fd2 = pd.Series(
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     2, 2, 2, 
     3, 3, 3,
     4, 4,
     5, 5, 5,
     6, 6, 6, 6,
     7, 7, 7, 7, 7,
     8, 8, 8, 8,
     9, 9, 9, 9, 
     10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])

fd2.value_counts().sort_index().plot.bar(color='#ed713a', width=0.9);
In [0]:
fig = plt.figure(facecolor='white', figsize=(5,4))
ax = fake_data.plot.bar(color='#ed713a', width=0.9)
ax.set(facecolor='white')
ax.patch.set_alpha(0.1)

plt.xlabel('Rating', fontweight='bold')
plt.ylabel('Percent of total votes', fontweight='bold')
plt.title("'An Inconvenient Sequel: Truth to Power' is divisive", fontsize=12, loc='left', x=-0.1, y=1.1, fontweight='bold')

plt.text(x=-1.7, y=42, s='IMDb ratings for the film as of Aug. 29', fontsize=10)

plt.xticks(rotation=0, color='#a7a7a7')
plt.yticks(range(0, 50, 10), labels=[f'{i}' if i!=40 else f'{i}%' for i in range(0, 50, 10)], color='#a7a7a7');

2-Replicate an Example from 538

The Mayweather-McGregor Fight As Told Through Emojis

I'm going to try to recreate the first horizontal bar plot found in this article: https://fivethirtyeight.com/features/the-mayweather-mcgregor-fight-as-told-through-emojis/

In [0]:
pip install emoji
Requirement already satisfied: emoji in /usr/local/lib/python3.6/dist-packages (0.5.4)
In [0]:
pip install regex
Requirement already satisfied: regex in /usr/local/lib/python3.6/dist-packages (2019.8.19)
In [0]:
import emoji
import regex
In [0]:
mm = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/mayweather-mcgregor/tweets.csv')
mm.head()
Out[0]:
created_at emojis id link retweeted screen_name text
0 2017-08-27 00:05:34 True 901656910939770881 https://twitter.com/statuses/901656910939770881 False aaLiysr Ringe çıkmadan ateş etmeye başladı 😃#McGregor ...
1 2017-08-27 00:05:35 True 901656917281574912 https://twitter.com/statuses/901656917281574912 False zulmafrancozaf 😲😲😲😲😲 @lalylourbet2 https://t.co/ERUGHhQINE
2 2017-08-27 00:05:35 True 901656917105369088 https://twitter.com/statuses/901656917105369088 False Adriana11D 🇮🇪🇮🇪🇮🇪 💪💪#MayweathervMcgregor
3 2017-08-27 00:05:35 True 901656917747142657 https://twitter.com/statuses/901656917747142657 False Nathan_Caro_ Cest partit #MayweatherMcGregor 💪🏿
4 2017-08-27 00:05:35 True 901656916828594177 https://twitter.com/statuses/901656916828594177 False sahouraxox Low key feeling bad for ppl who payed to watch...
In [0]:
mm.tail()
Out[0]:
created_at emojis id link retweeted screen_name text
12113 2017-08-27 01:14:57 True 901674373635080193 https://twitter.com/statuses/901674373635080193 False tirivashe_md I should have become a golfer 👀 https://t.co/I...
12114 2017-08-27 01:14:58 True 901674378458304516 https://twitter.com/statuses/901674378458304516 False imjellly 😂🔫 https://t.co/VmsbbKmxRc
12115 2017-08-27 01:14:58 True 901674378093613057 https://twitter.com/statuses/901674378093613057 False lorybs_ 😂😂😂😂😂😂😂 rs yo https://t.co/UgMn2HwX9X
12116 2017-08-27 01:14:58 True 901674378500472833 https://twitter.com/statuses/901674378500472833 False ChoateNoah Money Mayweather wit da dub!!🥊💸#TMT#TKO#Maywea...
12117 2017-08-27 01:14:59 True 901674381738258432 https://twitter.com/statuses/901674381738258432 False dudette0114 Brilliant fight,🥊for a novice to go 9 rounds h...
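The first and last rows shown above suggest how much time the data set covers. The sketch below only does the arithmetic on those two printed timestamps; it assumes the rows are in chronological order, so that row 0 is the earliest tweet and row 12117 the latest, which the outputs hint at but do not prove.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the head() and tail() outputs above.
first = datetime.strptime("2017-08-27 00:05:34", FMT)  # row 0
last = datetime.strptime("2017-08-27 01:14:59", FMT)   # row 12117

# Subtracting two datetimes gives a timedelta.
span = last - first
print(span)
```

On the full DataFrame, comparing the minimum and maximum of the created_at column would be a sturdier check than relying on row order.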
In [0]:
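def emoji_creator(text):
  """Return the most frequent emoji in `text`, or None if it has none."""
  emojis = {}
  # Iterating a string yields one code point at a time, so multi-code-point
  # emojis (flags, skin-tone variants) are split into their components.
  for i in text:
    # emoji.UNICODE_EMOJI is the lookup provided by emoji 0.5.4 (installed above).
    if i in emoji.UNICODE_EMOJI:
      emojis[i] = emojis.get(i, 0) + 1
  # max() raises ValueError on an empty dict, so guard the no-emoji case.
  if not emojis:
    return None
  return max(emojis, key=emojis.get)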
In [0]:
test_emojis = mm['text'][1]
print(emoji_creator(test_emojis))
😲
In [0]:
mm['emojis_tweeted'] = [emoji_creator(i) for i in mm['text']]
In [0]:
mm.sort_values(by='created_at', ascending=False).head()
Out[0]:
created_at emojis id link retweeted screen_name text emojis_tweeted
12117 2017-08-27 01:14:59 True 901674381738258432 https://twitter.com/statuses/901674381738258432 False dudette0114 Brilliant fight,🥊for a novice to go 9 rounds h... 🥊
12116 2017-08-27 01:14:58 True 901674378500472833 https://twitter.com/statuses/901674378500472833 False ChoateNoah Money Mayweather wit da dub!!🥊💸#TMT#TKO#Maywea... 🥊
12115 2017-08-27 01:14:58 True 901674378093613057 https://twitter.com/statuses/901674378093613057 False lorybs_ 😂😂😂😂😂😂😂 rs yo https://t.co/UgMn2HwX9X 😂
12114 2017-08-27 01:14:58 True 901674378458304516 https://twitter.com/statuses/901674378458304516 False imjellly 😂🔫 https://t.co/VmsbbKmxRc 😂
12113 2017-08-27 01:14:57 True 901674373635080193 https://twitter.com/statuses/901674373635080193 False tirivashe_md I should have become a golfer 👀 https://t.co/I... 👀
In [0]:
mm.loc[mm['text'].str.contains('boring')]
Out[0]:
created_at emojis id link retweeted screen_name text emojis_tweeted
1401 2017-08-27 00:12:03 True 901658544767868929 https://twitter.com/statuses/901658544767868929 False BROCKLESNARRRR inb4 it ends up being a boring ass fight 😂#May... 😂
2718 2017-08-27 00:19:44 True 901660479138840576 https://twitter.com/statuses/901660479138840576 False elmascor #MayweathervMcgregor super boring 💤!!! 💤
3641 2017-08-27 00:25:39 True 901661966388924416 https://twitter.com/statuses/901661966388924416 False Hitch_Atl #MayweathervMcgregor and people say baseball i... 🤔
3825 2017-08-27 00:26:56 True 901662290331631616 https://twitter.com/statuses/901662290331631616 False ydntcr111 Give us the fight that we want #MayweathervMcg... 🙄
4032 2017-08-27 00:28:15 True 901662622071668737 https://twitter.com/statuses/901662622071668737 False CedX44 Ang boring nila. Kapag boring mayweather na su... 😂
4149 2017-08-27 00:28:58 True 901662799151194112 https://twitter.com/statuses/901662799151194112 False ERZEN This fight is boring I'm more excited about th... 🙄
4628 2017-08-27 00:32:20 True 901663647943950336 https://twitter.com/statuses/901663647943950336 False CrimsonGypsies #Mayweather fights are always boring as fuck. ... 😪
4806 2017-08-27 00:33:34 True 901663960356724736 https://twitter.com/statuses/901663960356724736 False Via_loves_Azumi I hope the coming fight is not boring.. and th... 👍
5116 2017-08-27 00:35:36 True 901664470916665345 https://twitter.com/statuses/901664470916665345 False royceangeles This mess boring af 😴 gn #MayweathervMcgregor 😴
5663 2017-08-27 00:38:56 True 901665309320298496 https://twitter.com/statuses/901665309320298496 False ChiCytian Da fight is fvcking boring 😴😴😂 #MayweathervMcg... 😴
6547 2017-08-27 00:44:31 True 901666715062472704 https://twitter.com/statuses/901666715062472704 False GGpeterAtom #MayweathervMcgregor those punches are liquid ... 😠
6856 2017-08-27 00:46:39 True 901667251878850561 https://twitter.com/statuses/901667251878850561 False KingAcer33 People saying its boring seem to be UFC McGreg... 😂
6925 2017-08-27 00:47:02 True 901667349383626752 https://twitter.com/statuses/901667349383626752 False CoachFlintham #Mayweather 100% the most boring of all time. ... 😴
11706 2017-08-27 01:11:44 True 901673564339822592 https://twitter.com/statuses/901673564339822592 False kingmiccooo "Floyd runs", "Floyds boring" LOL. Shhhhhhhhhh... 🙊
In [0]:
emoji_counts = mm['emojis_tweeted'].value_counts().nlargest(10)
emoji_counts
Out[0]:
😂    2949
🥊     939
👊     468
💪     372
👏     309
🇮     290
🤔     279
😭     258
🔥     244
🤣     224
Name: emojis_tweeted, dtype: int64

Two things are wrong here. First, I can't closely replicate the value counts of the most commonly used emojis. My top ten is roughly similar to theirs, but their methodology clearly differs from mine in how it accounts for tweets that contain multiple emojis. Second, the chart on the webpage appears to cover an additional 15 minutes that are not in the data set. The subtext on the website says the data runs from 12:05 to 1:30, whereas the GitHub data only includes tweets up to 1:15. That additional post-fight discussion could have skewed my results relative to theirs. For example, there may have been significant discussion after the fight about the purse each fighter took home, which could have significantly raised the count of the 💰 emoji during the 15 minutes that are included in the table on the website but missing from the data set on GitHub.
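To see how much the counting rule can matter, here is a minimal sketch comparing three ways of tallying emojis. The tweets are made up, and EMOJI_SET is a hypothetical stand-in for the emoji library's lookup so the snippet has no dependencies. Which rule 538 actually used is unknown to me; this only shows that the rules disagree.

```python
from collections import Counter

# Hypothetical stand-in for emoji.UNICODE_EMOJI (assumption, kept tiny on purpose).
EMOJI_SET = {"😂", "🥊", "💪", "😴"}

# Made-up tweets, not taken from the data set.
tweets = ["😂😂😂 🥊", "🥊🥊 what a fight 😂", "💪💪 🥊"]

def emojis_in(text):
    """Return every emoji character in text, in order, with repeats."""
    return [ch for ch in text if ch in EMOJI_SET]

# Rule A (what emoji_creator does): one vote per tweet, for its most frequent emoji.
dominant = Counter(Counter(emojis_in(t)).most_common(1)[0][0] for t in tweets)

# Rule B: every single occurrence counts.
every = Counter(ch for t in tweets for ch in emojis_in(t))

# Rule C: each distinct emoji counts once per tweet.
once = Counter(ch for t in tweets for ch in set(emojis_in(t)))

for name, tally in [("dominant", dominant), ("every", every), ("once", once)]:
    print(name, dict(tally))
```

With these three tweets, 🥊 gets 1 vote under rule A, 4 under rule B, and 3 under rule C, so the ranking of the top emojis can shift depending on the rule.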

Taking a left turn: now that I've gotten the emoji situation under control and discovered that the first graphic on the page is actually an HTML table, I'm going to try to replicate one of the other images, which thankfully still makes use of the emojis.

In [0]:
from IPython.display import display, Image

url = 'https://fivethirtyeight.com/wp-content/uploads/2017/08/roedermehtadottle-boxemoji-21.png?w=575'
example = Image(url=url, width=600)
display(example)
In [0]:
mm['created_at'] = pd.to_datetime(mm['created_at'], infer_datetime_format=True).dt.round('S')
mm.head()
Out[0]:
created_at emojis id link retweeted screen_name text emojis_tweeted
0 2017-08-27 00:05:34 True 901656910939770881 https://twitter.com/statuses/901656910939770881 False aaLiysr Ringe çıkmadan ateş etmeye başladı 😃#McGregor ... 😃
1 2017-08-27 00:05:35 True 901656917281574912 https://twitter.com/statuses/901656917281574912 False zulmafrancozaf 😲😲😲😲😲 @lalylourbet2 https://t.co/ERUGHhQINE 😲
2 2017-08-27 00:05:35 True 901656917105369088 https://twitter.com/statuses/901656917105369088 False Adriana11D 🇮🇪🇮🇪🇮🇪 💪💪#MayweathervMcgregor 🇮
3 2017-08-27 00:05:35 True 901656917747142657 https://twitter.com/statuses/901656917747142657 False Nathan_Caro_ Cest partit #MayweatherMcGregor 💪🏿 💪
4 2017-08-27 00:05:35 True 901656916828594177 https://twitter.com/statuses/901656916828594177 False sahouraxox Low key feeling bad for ppl who payed to watch... 🤣
In [0]:
fire = mm.loc[mm['emojis_tweeted'] == '🔥'].set_index('created_at').drop(['emojis', 'id', 'link', 'retweeted', 'screen_name', 'text'], axis=1)
fire['num'] = 1
fire_series = pd.Series(fire['num'], index=fire.index)
rolling_fire = fire_series.resample('30S').sum().rolling(8).mean()
In [0]:
snooze = mm.loc[mm['emojis_tweeted'] == '😴'].drop(['emojis', 'id', 'link', 'retweeted', 'screen_name', 'text'], axis=1).set_index('created_at')
snooze['num'] = 1
snooze_series = pd.Series(snooze['num'], index=snooze.index)
rolling_snooze = snooze_series.resample('30S').sum().rolling(8).mean()
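The resample('30S').sum().rolling(8).mean() chain used for both emojis counts tweets in 30-second bins and then averages each run of 8 bins, which is a four-minute rolling average. The sketch below spells out that arithmetic in plain Python on made-up timestamps. The function name binned_rolling_mean is hypothetical, and unlike pandas, which anchors bins to clock boundaries, this version starts its first bin at the first event, so bin edges can differ slightly.

```python
from collections import Counter
from datetime import datetime, timedelta

def binned_rolling_mean(timestamps, bin_seconds=30, window=8):
    """Count events per fixed-width bin, then average each run of `window` bins."""
    start = min(timestamps)
    # Index of the bin each timestamp falls into, counted from the first event.
    hits = Counter(int((t - start).total_seconds() // bin_seconds) for t in timestamps)
    # Bins with no events count as 0, like summing an empty resample bin.
    counts = [hits.get(i, 0) for i in range(max(hits) + 1)]
    # The first window-1 bins have no full window yet (pandas would emit NaN there).
    return [
        None if i + 1 < window else sum(counts[i + 1 - window:i + 1]) / window
        for i in range(len(counts))
    ]

# Made-up data: one event every 10 seconds for five minutes.
t0 = datetime(2017, 8, 27, 0, 5, 0)
events = [t0 + timedelta(seconds=10 * k) for k in range(30)]

smoothed = binned_rolling_mean(events)
print(smoothed)
```

Because the window needs 8 full bins, the first four minutes of each line have no value, which is worth remembering when comparing the start of the plot with the original chart.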
In [0]:
plt.style.use('fivethirtyeight')
plt.figure(facecolor='#f0f0f0', figsize=(10,6))

snooze_ax = rolling_snooze.plot.line(color="#b4e3e7")
roll_ax = rolling_fire.plot.line(color='#d63830')

plt.title('Much hype, some boredom', fontsize=12, loc='left', x=-0.1, y=1.1, fontweight='bold')

# Still a work in progress.
# Note: run the import cell in section 1 and the cells that build rolling_fire
# and rolling_snooze first. Run on its own in a fresh session, this cell fails
# with "NameError: name 'plt' is not defined".

STRETCH OPTIONS

1) Reproduce one of the following using the matplotlib or seaborn libraries:

2) Make more charts!

Choose a chart you want to make, from Visual Vocabulary - Vega Edition.

Find the chart in an example gallery of a Python data visualization library:

Reproduce the chart. Optionally, try the "Ben Franklin Method." If you want, experiment and make changes.

Take notes. Consider sharing your work with your cohort!

In [0]:
# More Work Here
