%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fb = pd.read_csv(
    '../data/fb_stock_prices_2018.csv', index_col='date', parse_dates=True
)
The pandas.plotting module
Pandas provides additional plotting functions in the pandas.plotting module for plot types that are not available through the plot() method.
About the Data
In this notebook, we will be working with Facebook’s stock price throughout 2018 (obtained using the stock_analysis package).
Setup
Scatter matrix
Easily create scatter plots between all columns in the dataset:
from pandas.plotting import scatter_matrix
scatter_matrix(fb, figsize=(10, 10))
[Output: a 5x5 array of Axes, one for each pairing of the open, high, low, close, and volume columns]
Changing the diagonal from histograms to KDE:
scatter_matrix(fb, figsize=(10, 10), diagonal='kde')
[Output: a 5x5 array of Axes, one for each pairing of the open, high, low, close, and volume columns]
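The pairwise relationships that a scatter matrix draws can also be summarized numerically with a correlation matrix. The following is a minimal sketch, not part of the original notebook: the `prices` dataframe is synthetic stand-in data with the same column names as `fb`, generated here only so the example runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the stock data (an assumption for illustration only)
rng = np.random.default_rng(0)
close = 180 + rng.normal(size=250).cumsum()
open_ = close + rng.normal(scale=0.5, size=250)

prices = pd.DataFrame({
    'open': open_,
    'high': np.maximum(open_, close) + rng.random(250),
    'low': np.minimum(open_, close) - rng.random(250),
    'close': close,
    'volume': rng.integers(1_000_000, 5_000_000, size=250),
})

# One correlation coefficient per panel of the scatter matrix
corr = prices.corr()
print(corr.round(2))
```

Each cell of `corr` corresponds to one panel in the grid, so tightly clustered diagonal bands in the scatter matrix should line up with coefficients close to 1.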
Lag plot
Lag plots let us see how a variable correlates with past observations of itself by plotting each value against the value a fixed number of periods (the lag) away. Random data has no pattern:
from pandas.plotting import lag_plot
np.random.seed(0) # make this repeatable
lag_plot(pd.Series(np.random.random(size=200)))
Data with some level of correlation to itself (autocorrelation) may have patterns. Stock prices are highly autocorrelated:
lag_plot(fb.close)
The default lag is 1, but we can alter this with the lag parameter. Let’s look at a 5-day lag (a week of trading activity):
lag_plot(fb.close, lag=5)
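To make the idea behind the lag plot concrete, the sketch below builds the pairs of points that such a plot displays and measures how strongly they are related with `Series.autocorr()`. It is an illustration under assumptions: `lag_pairs` is a hypothetical helper, and `walk` is a synthetic random walk standing in for a price series rather than the Facebook data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
noise = pd.Series(rng.random(200))                # random data, no pattern expected
walk = pd.Series(rng.normal(size=200).cumsum())   # synthetic stand-in for prices

def lag_pairs(series, lag=1):
    """Pair each observation with the one `lag` periods later (hypothetical helper)."""
    return pd.DataFrame({
        'y(t)': series.iloc[:-lag].to_numpy(),
        f'y(t + {lag})': series.iloc[lag:].to_numpy(),
    })

pairs = lag_pairs(walk, lag=5)

# Correlation between the series and its lagged copy
noise_corr = noise.autocorr(lag=1)
walk_corr = walk.autocorr(lag=1)
walk_corr_5 = walk.autocorr(lag=5)
print(noise_corr, walk_corr, walk_corr_5)
```

A value near zero matches a shapeless cloud in the lag plot, while a value near 1 matches points that hug the diagonal.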
Autocorrelation plots
We can use the autocorrelation plot to see whether this relationship is meaningful or just noise. Random data will not have any significant autocorrelation (it stays within the confidence bands drawn on the plot):
from pandas.plotting import autocorrelation_plot
np.random.seed(0) # make this repeatable
autocorrelation_plot(pd.Series(np.random.random(size=200)))
Stock data, on the other hand, does have significant autocorrelation:
autocorrelation_plot(fb.close)
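The quantity on the vertical axis of an autocorrelation plot can be approximated by hand. The sketch below is one common formulation of the sample autocorrelation, together with an approximate 95% bound of 1.96 divided by the square root of the number of observations; it is meant as an illustration and is not guaranteed to match the plotting function's internals exactly. The `autocorrelation` helper and the synthetic `walk` series are assumptions introduced for this example.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation at a given lag (hypothetical helper)."""
    y = np.asarray(values, dtype=float)
    n = len(y)
    mean = y.mean()
    c0 = np.sum((y - mean) ** 2) / n
    ch = np.sum((y[:n - lag] - mean) * (y[lag:] - mean)) / n
    return ch / c0

rng = np.random.default_rng(0)
noise = rng.random(200)                 # random data
walk = rng.normal(size=200).cumsum()    # synthetic stand-in for prices

# Approximate 95% bound for a series with no autocorrelation
bound_95 = 1.96 / np.sqrt(len(noise))

lags = range(1, 21)
noise_outside = sum(abs(autocorrelation(noise, lag)) > bound_95 for lag in lags)
walk_outside = sum(abs(autocorrelation(walk, lag)) > bound_95 for lag in lags)
print(noise_outside, walk_outside)
```

Counting how many lags fall outside the bound gives a rough numeric version of what the plot shows visually: few or none for random data, many for a strongly autocorrelated series.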
Bootstrap plot
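This plot helps us understand the uncertainty in our summary statistics by repeatedly drawing random samples from the data and calculating the mean, median, and midrange of each: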
from pandas.plotting import bootstrap_plot
fig = bootstrap_plot(fb.volume, fig=plt.figure(figsize=(10, 6)))
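The resampling behind a bootstrap plot can be sketched without any plotting. The example below is a simplified illustration, not the library's implementation: `bootstrap_statistics` is a hypothetical helper, the sample size of 50 and count of 500 are chosen for illustration, and `volume` is synthetic data standing in for `fb.volume`.

```python
import numpy as np

def bootstrap_statistics(values, size=50, samples=500, seed=0):
    """Draw random subsets and compute summary statistics of each (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(values, dtype=float)
    draws = np.array([
        rng.choice(data, size=size, replace=False) for _ in range(samples)
    ])
    return {
        'mean': draws.mean(axis=1),
        'median': np.median(draws, axis=1),
        'midrange': (draws.min(axis=1) + draws.max(axis=1)) / 2,
    }

# Synthetic stand-in for daily trading volume (an assumption for illustration)
volume = np.random.default_rng(1).lognormal(mean=17, sigma=0.4, size=251)

stats = bootstrap_statistics(volume)

# Spread of the resampled means: a rough picture of the uncertainty in the mean
mean_interval = np.percentile(stats['mean'], [2.5, 97.5])
print(mean_interval)
```

The wider the spread of a resampled statistic, the less certain we should be about the value computed from the full dataset.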