An Introduction to Jupyter Notebooks

Jupyter Notebooks are a file format (*.ipynb) that you can execute and explain your code in a step-wise format. > Jupyter Notebooks supports not only code execution in Python, but over 40 languages including R, Lua, Rust, and Julia with numerous kernels.

We can write in Markdown to write text with some level of control over your formatting. - Here’s a Link to Basic Markdown - Here’s a link to Markdown’s Extended Syntax

Topics We Will Cover - Importing different files and filetypes with pandas - Basic Statistical Analysis of tabular data with pandas and numpy - Creating Charts with python packages from the Matplotlib, Plotly, or HoloViz Ecosystem - Evaluate the potential usecases for each visualization package

EvidenceOfLearning
This is you, enjoying the learning process.

Step 1: Import pandas into your python program.

import pandas as pd

# This will import the pandas and numpy packages into your Python program.

df_json = pd.read_json('../data/food-waste-pilot/food-waste-pilot.json')
df_csv = pd.read_csv('../data/food-waste-pilot/food-waste-pilot.csv')
df_xlsx = pd.read_excel('../data/food-waste-pilot/food-waste-pilot.xlsx')

df_csv.shape

(152, 3)

df_csv.head() # Grabs the top 5 items in your Dataframe by default.

	Collection Date	Food Waste Collected	Estimated Earned Compost Created
0	2022-02-25	250.8	25
1	2022-03-02	298.8	30
2	2022-03-21	601.2	60
3	2022-03-28	857.2	86
4	2022-03-30	610.8	61

df_csv.tail() # Grabs the bottom 5 items in your Dataframe by default.

	Collection Date	Food Waste Collected	Estimated Earned Compost Created
147	2022-10-12	385.8	39
148	2022-10-28	713.6	71
149	2022-10-31	953.4	95
150	2022-12-14	694.4	69
151	2023-01-06	968.6	97

df_csv.columns

Index(['Collection Date', 'Food Waste Collected',
       'Estimated Earned Compost Created'],
      dtype='object')

df_csv.dtypes # Returns the data types of your columns.

Collection Date                      object
Food Waste Collected                float64
Estimated Earned Compost Created      int64
dtype: object

df_csv.describe()

	Food Waste Collected	Estimated Earned Compost Created
count	152.000000	152.000000
mean	526.873684	52.611842
std	197.838075	19.787631
min	0.000000	0.000000
25%	398.050000	39.750000
50%	531.500000	53.000000
75%	658.900000	66.000000
max	1065.800000	107.000000

df_csv.info() # Returns index, column names, a count of Non-Null values, and data types.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152 entries, 0 to 151
Data columns (total 3 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Collection Date                   152 non-null    object 
 1   Food Waste Collected              152 non-null    float64
 2   Estimated Earned Compost Created  152 non-null    int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 3.7+ KB

There are multiple methods to do type conversion using pandas as well.

# Oh no, we can see that our Collection Date is not the data type that we want, we need to convert it to a date value.

df_csv['Collection Date'] = pd.to_datetime(df_csv['Collection Date'])

df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152 entries, 0 to 151
Data columns (total 3 columns):
 #   Column                            Non-Null Count  Dtype         
---  ------                            --------------  -----         
 0   Collection Date                   152 non-null    datetime64[ns]
 1   Food Waste Collected              152 non-null    float64       
 2   Estimated Earned Compost Created  152 non-null    int64         
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 3.7 KB

# An alternative way to do this date conversion:

df_csv['Collection Date'] = df_csv['Collection Date'].apply(pd.to_datetime)

# astype() is more generic method to convert data types

df_csv['Collection Date'] = df_csv['Collection Date'].astype('datetime64[ns]')

df_csv.dtypes

Collection Date                     datetime64[ns]
Food Waste Collected                       float64
Estimated Earned Compost Created             int64
dtype: object

# Now that we have converted our Collection Date column to a datetime data type, we can use the dt.day_name() method to create a new column that contains the day of the week.

df_csv['Day of Week'] = df_csv['Collection Date'].dt.day_name()

# What if we want to know the date that we collected the most food waste?

df_csv.loc[
    df_csv['Food Waste Collected'].idxmax(),
    ['Collection Date']
]

Collection Date    2022-08-10 00:00:00
Name: 20, dtype: object

# If you wanted to see our top 10 collection dates, you could do this:

df_csv.nlargest(10,'Food Waste Collected')

	Collection Date	Food Waste Collected	Estimated Earned Compost Created	Day of Week
20	2022-08-10	1065.8	107	Wednesday
124	2022-12-27	987.4	99	Tuesday
48	2022-09-12	977.8	98	Monday
151	2023-01-06	968.6	97	Friday
149	2022-10-31	953.4	95	Monday
3	2022-03-28	857.2	86	Monday
123	2022-11-28	844.4	84	Monday
57	2022-12-05	834.4	83	Monday
91	2022-12-30	815.4	82	Friday
137	2022-07-18	807.8	81	Monday

df_csv.nsmallest(10,'Food Waste Collected')

	Collection Date	Food Waste Collected	Estimated Earned Compost Created	Day of Week
53	2022-11-11	0.0	0	Friday
95	2023-01-16	0.0	0	Monday
114	2022-09-05	0.0	0	Monday
97	2022-02-09	59.0	6	Wednesday
36	2022-02-16	102.8	10	Wednesday
63	2022-02-21	183.8	18	Monday
61	2022-02-11	197.0	20	Friday
39	2022-04-15	200.8	20	Friday
62	2022-02-18	202.8	20	Friday
29	2022-12-10	205.8	21	Saturday

df_csv.plot()

You have to make sure that `pandas` parses your dates

df_csv_parsed_dates = pd.read_csv('../data/food-waste-pilot/food-waste-pilot.csv', parse_dates=True, index_col="Collection Date")

df_csv_parsed_dates.plot()

An Introduction to Jupyter Notebooks

You have to make sure that pandas parses your dates

You have to make sure that `pandas` parses your dates