Assignment Answers
1) The task at hand was to demonstrate a basic understanding of the iconic trio of data science Python libraries. I first used NumPy, a library for fast numerical computing on arrays, to generate two sets of 25 random numbers between 1 and 100. I then used Pandas to organize these datasets into a DataFrame, which is Pandas-speak for a table like the ones you might see in a spreadsheet program such as Excel or Google Sheets. Using NumPy's absolute value function, I calculated the distance (the absolute difference) between corresponding values in the two datasets and assigned it to a third column. Finally, I used Matplotlib to visualize all of the data. The X and Y axes show the original random values, while the size of each point represents the squared and halved value of the absolute distance in the third column.
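The workflow described in answer 1 can be sketched roughly as follows. This is a minimal illustration, not the original assignment code. The random seed, the use of NumPy's `default_rng` generator, the column names `x`, `y`, and `distance`, and the output filename are all assumptions made for the example.

```python
# Minimal sketch of the answer-1 workflow (illustrative, not the original code).
# Assumed: seed 42, hypothetical column names "x", "y", "distance",
# and the output filename "scatter.png".
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

# Two sets of 25 random integers from 1 to 100 inclusive (the upper bound 101 is exclusive)
df = pd.DataFrame({
    "x": rng.integers(1, 101, size=25),
    "y": rng.integers(1, 101, size=25),
})

# Third column: absolute difference between each pair of values
df["distance"] = np.abs(df["x"] - df["y"])

# Marker size: the distance squared, then halved
sizes = df["distance"] ** 2 / 2

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], s=sizes, alpha=0.5)  # s controls marker area
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("scatter.png")
```

Passing `s=` to `scatter()` is what lets a third column appear in a 2D plot. Matplotlib interprets each value as a marker area in points squared, so larger distances draw larger circles.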
2) The most challenging part of this assignment was finding a way to incorporate a third dataset into a 2D scatterplot. Matplotlib's website provides many examples, and I found what I was looking for there: mapping the third column to the size of each point.
3) The most interesting thing I learned about was the additional parameters for the scatterplot. There are many more than the ones I used, but using any more seemed like overkill for this assignment.
4) I'm interested in learning more about the colormap (cmap) parameter for scatterplots. It seems like it could be a distinctive way to visualize certain sets of data.
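For reference, a colormap is applied by passing one numeric value per point through the `c` parameter of `scatter()` and naming a colormap with `cmap`. The sketch below reuses the absolute-distance idea from answer 1 as the color value. The seed, the `"viridis"` colormap, and the filename are arbitrary assumptions.

```python
# Minimal sketch of coloring scatter points with a colormap.
# Assumed: seed 0, the "viridis" colormap, and the filename "scatter_cmap.png".
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
x = rng.integers(1, 101, size=25)
y = rng.integers(1, 101, size=25)
distance = np.abs(x - y)

fig, ax = plt.subplots()
# c supplies one number per point; cmap maps those numbers to colors
points = ax.scatter(x, y, c=distance, cmap="viridis")
fig.colorbar(points, ax=ax, label="distance")  # legend for the color scale
fig.savefig("scatter_cmap.png")
```

Unlike marker size, color leaves every point equally visible, so it can carry a third variable without large markers hiding smaller ones.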