Project At A Glance

Objective: Iteratively forecast hourly temperature for a region over a fairly long period (8 years of data in this case).

Data: Time-series weather dataset recorded at the Max Planck Institute in Jena, Germany. [Download]

Implementation: Time-Series Forecasting, Sequential Long Short-Term Memory (LSTM)

Results:

Clear seasonal trends in the data, showing how temperature changes over the course of the year.
DataFrames comparing actual and predicted values for the training, validation, and test sets.
Visualizations to judge the model's performance.

Deployment: View this project on GitHub.

Dependencies

import os
import numpy as np
import pandas as pd
import tensorflow as tf

Dataset Initialization

Loading

zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)

csv_path, _ = os.path.splitext(zip_path)

df = pd.read_csv(csv_path)

df
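The path handling above relies on os.path.splitext removing only the final extension, so a name ending in .csv.zip becomes the .csv path. A minimal sketch of that step, using a hypothetical cache location (the real path is whatever get_file returns, and the notebook assumes it is the archive path):

```python
import os

# Hypothetical cache location, for illustration only.
zip_path = os.path.join("datasets", "jena_climate_2009_2016.csv.zip")

# splitext splits off only the last extension (".zip"), leaving the ".csv" name.
csv_path, ext = os.path.splitext(zip_path)

print(csv_path)
print(ext)
```

If the installed Keras version returns something other than the archive path from get_file, csv_path would need to be built differently.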

Hour-Wise Slicing

df = df[5::6]
df
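Taking every sixth row only lands on the hour if the file has exactly six readings per hour throughout. The sliced output in this notebook ends at 31.12.2016 23:10:00 rather than on the hour, which suggests the raw file is not perfectly regular. The sketch below uses a small synthetic series (not the Jena data) with one reading removed to show the drift, and compares it with selecting rows by timestamp. The names demo, by_position and on_the_hour are illustrative.

```python
import pandas as pd

# Synthetic 10-minute series standing in for the real data (illustrative only).
stamps = pd.date_range("2009-01-01 00:10:00", periods=24, freq="10min")
demo = pd.DataFrame({"T (degC)": range(24)}, index=stamps)

# Drop one reading (01:20) to imitate a gap in the record.
demo = demo.drop(stamps[7])

# Positional slicing: every 6th row starting at position 5.
by_position = demo[5::6]

# Timestamp-based selection: keep rows that fall exactly on the hour.
on_the_hour = demo[(demo.index.minute == 0) & (demo.index.second == 0)]

print(by_position.index.minute.tolist())
print(on_the_hour.index.minute.tolist())
```

After the gap, the positional slice picks readings ten minutes past the hour, while the timestamp filter stays aligned. Whether that matters for this forecast is a judgment call, since the model only sees a sequence of values.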

DateTime Indexing

df.index = pd.to_datetime(df['Date Time'], format = '%d.%m.%Y %H:%M:%S')
df[:6]
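The format string matters because the dataset writes dates day-first with dots. As a quick check of how '%d.%m.%Y %H:%M:%S' reads a value, here is a sketch using the standard library's datetime.strptime, which accepts the same directives. The second sample string is made up to show the day-first ordering.

```python
from datetime import datetime

fmt = "%d.%m.%Y %H:%M:%S"

# First timestamp of the hourly slice shown in this notebook.
parsed = datetime.strptime("01.01.2009 01:00:00", fmt)

# Made-up value: with %d first, "02.01" is 2 January, not 1 February.
day_first = datetime.strptime("02.01.2009 00:00:00", fmt)

print(parsed.isoformat(sep=" "))
print(day_first.strftime("%d %B %Y"))
```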

Temperature Plot (degC)

df1 = df['T (degC)']

df1.plot()

<AxesSubplot:xlabel='Date Time'>

Time-Series Window

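def create_dataset(df, window_size=5):
  """Build (samples, window_size, 1) inputs and next-step labels from a Series."""
  df_as_np = df.to_numpy()
  X = []
  y = []
  for i in range(len(df_as_np)-window_size):
    # Each input is window_size consecutive readings, one feature per time step.
    row = [[a] for a in df_as_np[i:i+window_size]]
    X.append(row)
    # The label is the reading that immediately follows the window.
    label = df_as_np[i+window_size]
    y.append(label)
  return np.array(X), np.array(y)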

window_size = 5
X, y = create_dataset(df1, window_size)
X.shape, y.shape

((70086, 5, 1), (70086,))
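To make the windowing concrete, the sketch below applies the same idea to a short toy series, where each window and its label are easy to read off. make_windows is a hypothetical stand-in for create_dataset that takes a plain array instead of a pandas Series. The last lines check the sample count reported above, taking the raw row count of 420551 from the loaded DataFrame's final index.

```python
import numpy as np

def make_windows(values, window_size):
    """Hypothetical array-based equivalent of create_dataset."""
    values = np.asarray(values)
    n_windows = len(values) - window_size
    # Stack overlapping windows: row i holds values[i : i + window_size].
    X = np.array([values[i:i + window_size] for i in range(n_windows)])
    # The label for window i is the value right after it.
    y = values[window_size:]
    # Add a trailing feature axis so X is (samples, window_size, 1).
    return X[..., np.newaxis], y

X_demo, y_demo = make_windows(np.arange(8.0), window_size=5)
print(X_demo.shape, y_demo.shape)
print(X_demo[0].ravel(), y_demo[0])

# Sample count for the real data: rows kept by [5::6], minus the window size.
hourly_rows = len(range(5, 420551, 6))
n_samples = hourly_rows - 5
print(hourly_rows, n_samples)
```

A series of length n always yields n - window_size samples, because the last window_size values have no following label.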

Train-Validation-Test Split

X_train, y_train = X[:60000], y[:60000]
X_val, y_val = X[60000:65000], y[60000:65000]
X_test, y_test = X[65000:], y[65000:]
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape

((60000, 5, 1), (60000,), (5000, 5, 1), (5000,), (5086, 5, 1), (5086,))
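The split is chronological: the slices are taken in order with no shuffling, so every validation and test window comes after the training windows. A small sketch of the same boundaries on an index array shows the resulting sizes and proportions. The variable names are illustrative.

```python
import numpy as np

n_samples = 70086  # number of windows reported above
idx = np.arange(n_samples)

# Same boundaries as the notebook: 60000 train, 5000 validation, the rest test.
train_idx, val_idx, test_idx = idx[:60000], idx[60000:65000], idx[65000:]

sizes = [len(train_idx), len(val_idx), len(test_idx)]
fractions = [round(s / n_samples, 3) for s in sizes]
print(sizes)
print(fractions)

# The three parts are contiguous and ordered in time.
print(train_idx[-1] < val_idx[0] < val_idx[-1] < test_idx[0])
```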

Model Setup, Layers and Callbacks

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam

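model = Sequential()
model.add(InputLayer((window_size, 1)))  # window_size time steps, 1 feature each
model.add(LSTM(64))
model.add(Dense(8, 'relu'))
model.add(Dense(1, 'linear'))  # single regression output: the next temperature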

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 64)                16896     
                                                                 
 dense (Dense)               (None, 8)                 520       
                                                                 
 dense_1 (Dense)             (None, 1)                 9         
                                                                 
=================================================================
Total params: 17,425
Trainable params: 17,425
Non-trainable params: 0
_________________________________________________________________
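The parameter counts in the summary can be reproduced by hand. An LSTM layer has four gates, each with input weights, recurrent weights and a bias. A Dense layer has one weight per input-output pair plus one bias per output. The helper names below are illustrative, not Keras functions.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias) per unit.
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    # One weight per input-output pair, plus one bias per output unit.
    return inputs * units + units

lstm = lstm_params(input_dim=1, units=64)
dense = dense_params(inputs=64, units=8)
dense_1 = dense_params(inputs=8, units=1)
total = lstm + dense + dense_1

print(lstm, dense, dense_1, total)
```

Note that window_size does not appear in the count: the LSTM reuses the same weights at every time step.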

# Save to 'model/' only when val_loss (the default monitored quantity) improves.
cp1 = ModelCheckpoint('model/', save_best_only=True)
model.compile(loss=MeanSquaredError(), optimizer=Adam(learning_rate=0.0001), metrics=[RootMeanSquaredError()])

model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=12, callbacks=[cp1])

Epoch 1/12
1875/1875 [==============================] - ETA: 0s - loss: 96.9206 - root_mean_squared_error: 9.8448

WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.

INFO:tensorflow:Assets written to: model\assets

WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x00000253ED78C0D0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.

1875/1875 [==============================] - 12s 5ms/step - loss: 96.9206 - root_mean_squared_error: 9.8448 - val_loss: 7.7739 - val_root_mean_squared_error: 2.7882
Epoch 2/12
1869/1875 [============================>.] - ETA: 0s - loss: 12.2745 - root_mean_squared_error: 3.5035

WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.

INFO:tensorflow:Assets written to: model\assets

WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x00000253ED78C0D0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.

1875/1875 [==============================] - 9s 5ms/step - loss: 12.2448 - root_mean_squared_error: 3.4993 - val_loss: 0.9448 - val_root_mean_squared_error: 0.9720
Epoch 3/12
1864/1875 [============================>.] - ETA: 0s - loss: 2.3387 - root_mean_squared_error: 1.5293

WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.
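Two numbers in the log can be sanity-checked with simple arithmetic. The reported root_mean_squared_error should be close to the square root of the reported loss, since the loss is mean squared error. The 1875 steps per epoch match 60000 training samples divided by a batch size of 32, which is the Keras default when fit is called without batch_size. The value pairs below are copied from the log above.

```python
import math

# (loss, root_mean_squared_error) pairs copied from the training log.
logged = [
    (96.9206, 9.8448),   # epoch 1, training
    (7.7739, 2.7882),    # epoch 1, validation
    (12.2448, 3.4993),   # epoch 2, training
    (0.9448, 0.9720),    # epoch 2, validation
]

# Largest gap between sqrt(loss) and the logged RMSE.
max_gap = max(abs(math.sqrt(mse) - rmse) for mse, rmse in logged)
print(max_gap)

# Steps per epoch, assuming the default batch size of 32.
steps_per_epoch = math.ceil(60000 / 32)
print(steps_per_epoch)
```

Since the units are degrees Celsius, a validation RMSE of 0.9720 means one-step predictions were off by roughly one degree by the second epoch.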

Predictions and Variance

from tensorflow.keras.models import load_model
# Reload the best checkpoint written by the ModelCheckpoint callback.
model = load_model('model')

Training

train_predictions = model.predict(X_train).flatten()
train_results = pd.DataFrame(data={'Train Predictions':train_predictions, 'Actuals':y_train})
train_results

Test

test_predictions = model.predict(X_test).flatten()
test_results = pd.DataFrame(data={'Test Predictions':test_predictions, 'Actuals':y_test})
test_results

Validation

val_predictions = model.predict(X_val).flatten()
val_results = pd.DataFrame(data={'Val Predictions':val_predictions, 'Actuals':y_val})
val_results
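The results DataFrames pair each prediction with its actual value. A per-row error column and a couple of summary metrics make the comparison explicit. The sketch below uses the first five test rows shown in this notebook's output rather than the full test set, so the figures describe only that small sample. The column name "Error" is an assumption.

```python
import numpy as np
import pandas as pd

# First five rows of test_results, copied from the notebook output.
sample = pd.DataFrame({
    "Test Predictions": [14.317657, 13.151199, 12.813556, 12.440071, 12.000203],
    "Actuals": [13.99, 13.46, 12.93, 12.43, 12.17],
})

# Signed error per row: positive means the model predicted too warm.
sample["Error"] = sample["Test Predictions"] - sample["Actuals"]

mae = sample["Error"].abs().mean()             # mean absolute error
rmse = np.sqrt((sample["Error"] ** 2).mean())  # root mean squared error

print(sample)
print(f"MAE: {mae:.3f}  RMSE: {rmse:.3f}")
```

The same three lines applied to train_results, val_results or test_results would give metrics over the full sets.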

Results and Visualization

Training

import matplotlib.pyplot as plt
plt.plot(train_results['Train Predictions'][50:100])
plt.plot(train_results['Actuals'][50:100])

[<matplotlib.lines.Line2D at 0x253842f9ca0>]

Test

plt.plot(test_results['Test Predictions'][:100])
plt.plot(test_results['Actuals'][:100])

[<matplotlib.lines.Line2D at 0x253eda4e5b0>]

Validation

plt.plot(val_results['Val Predictions'][:100])
plt.plot(val_results['Actuals'][:100])

[<matplotlib.lines.Line2D at 0x2538435bd00>]

Output of df after loading (raw 10-minute readings):

	Date Time	p (mbar)	T (degC)	Tpot (K)	Tdew (degC)	rh (%)	VPmax (mbar)	VPact (mbar)	VPdef (mbar)	sh (g/kg)	H2OC (mmol/mol)	rho (g/m**3)	wv (m/s)	max. wv (m/s)	wd (deg)
0	01.01.2009 00:10:00	996.52	-8.02	265.40	-8.90	93.30	3.33	3.11	0.22	1.94	3.12	1307.75	1.03	1.75	152.3
1	01.01.2009 00:20:00	996.57	-8.41	265.01	-9.28	93.40	3.23	3.02	0.21	1.89	3.03	1309.80	0.72	1.50	136.1
2	01.01.2009 00:30:00	996.53	-8.51	264.91	-9.31	93.90	3.21	3.01	0.20	1.88	3.02	1310.24	0.19	0.63	171.6
3	01.01.2009 00:40:00	996.51	-8.31	265.12	-9.07	94.20	3.26	3.07	0.19	1.92	3.08	1309.19	0.34	0.50	198.0
4	01.01.2009 00:50:00	996.51	-8.27	265.15	-9.04	94.10	3.27	3.08	0.19	1.92	3.09	1309.00	0.32	0.63	214.3
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
420546	31.12.2016 23:20:00	1000.07	-4.05	269.10	-8.13	73.10	4.52	3.30	1.22	2.06	3.30	1292.98	0.67	1.52	240.0
420547	31.12.2016 23:30:00	999.93	-3.35	269.81	-8.06	69.71	4.77	3.32	1.44	2.07	3.32	1289.44	1.14	1.92	234.3
420548	31.12.2016 23:40:00	999.82	-3.16	270.01	-8.21	67.91	4.84	3.28	1.55	2.05	3.28	1288.39	1.08	2.00	215.2
420549	31.12.2016 23:50:00	999.81	-4.23	268.94	-8.53	71.80	4.46	3.20	1.26	1.99	3.20	1293.56	1.49	2.16	225.8
420550	01.01.2017 00:00:00	999.82	-4.82	268.36	-8.42	75.70	4.27	3.23	1.04	2.01	3.23	1296.38	1.23	1.96	184.9

Output of df after hour-wise slicing:

	Date Time	p (mbar)	T (degC)	Tpot (K)	Tdew (degC)	rh (%)	VPmax (mbar)	VPact (mbar)	VPdef (mbar)	sh (g/kg)	H2OC (mmol/mol)	rho (g/m**3)	wv (m/s)	max. wv (m/s)	wd (deg)
5	01.01.2009 01:00:00	996.50	-8.05	265.38	-8.78	94.40	3.33	3.14	0.19	1.96	3.15	1307.86	0.21	0.63	192.7
11	01.01.2009 02:00:00	996.62	-8.88	264.54	-9.77	93.20	3.12	2.90	0.21	1.81	2.91	1312.25	0.25	0.63	190.3
17	01.01.2009 03:00:00	996.84	-8.81	264.59	-9.66	93.50	3.13	2.93	0.20	1.83	2.94	1312.18	0.18	0.63	167.2
23	01.01.2009 04:00:00	996.99	-9.05	264.34	-10.02	92.60	3.07	2.85	0.23	1.78	2.85	1313.61	0.10	0.38	240.0
29	01.01.2009 05:00:00	997.46	-9.63	263.72	-10.65	92.20	2.94	2.71	0.23	1.69	2.71	1317.19	0.40	0.88	157.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
420521	31.12.2016 19:10:00	1002.18	-0.98	272.01	-5.36	72.00	5.69	4.09	1.59	2.54	4.08	1280.70	0.87	1.36	190.6
420527	31.12.2016 20:10:00	1001.40	-1.40	271.66	-6.84	66.29	5.51	3.65	1.86	2.27	3.65	1281.87	1.02	1.92	225.4
420533	31.12.2016 21:10:00	1001.19	-2.75	270.32	-6.90	72.90	4.99	3.64	1.35	2.26	3.63	1288.02	0.71	1.56	158.7
420539	31.12.2016 22:10:00	1000.65	-2.89	270.22	-7.15	72.30	4.93	3.57	1.37	2.22	3.57	1288.03	0.35	0.68	216.7
420545	31.12.2016 23:10:00	1000.11	-3.93	269.23	-8.09	72.60	4.56	3.31	1.25	2.06	3.31	1292.41	0.56	1.00	202.6

Output of df[:6] after DateTime indexing:

	Date Time	p (mbar)	T (degC)	Tpot (K)	Tdew (degC)	rh (%)	VPmax (mbar)	VPact (mbar)	VPdef (mbar)	sh (g/kg)	H2OC (mmol/mol)	rho (g/m**3)	wv (m/s)	max. wv (m/s)	wd (deg)
Date Time
2009-01-01 01:00:00	01.01.2009 01:00:00	996.50	-8.05	265.38	-8.78	94.4	3.33	3.14	0.19	1.96	3.15	1307.86	0.21	0.63	192.7
2009-01-01 02:00:00	01.01.2009 02:00:00	996.62	-8.88	264.54	-9.77	93.2	3.12	2.90	0.21	1.81	2.91	1312.25	0.25	0.63	190.3
2009-01-01 03:00:00	01.01.2009 03:00:00	996.84	-8.81	264.59	-9.66	93.5	3.13	2.93	0.20	1.83	2.94	1312.18	0.18	0.63	167.2
2009-01-01 04:00:00	01.01.2009 04:00:00	996.99	-9.05	264.34	-10.02	92.6	3.07	2.85	0.23	1.78	2.85	1313.61	0.10	0.38	240.0
2009-01-01 05:00:00	01.01.2009 05:00:00	997.46	-9.63	263.72	-10.65	92.2	2.94	2.71	0.23	1.69	2.71	1317.19	0.40	0.88	157.0
2009-01-01 06:00:00	01.01.2009 06:00:00	997.71	-9.67	263.66	-10.62	92.7	2.93	2.71	0.21	1.69	2.72	1317.71	0.05	0.50	146.0

Output of train_results:

	Train Predictions	Actuals
0	-9.957811	-9.67
1	-9.741530	-9.17
2	-8.840203	-8.10
3	-7.372804	-7.66
4	-7.201422	-7.04
...	...	...
59995	6.082879	6.07
59996	7.174989	9.88
59997	12.125348	13.53
59998	15.739516	15.43
59999	16.349472	15.54

Output of test_results:

	Test Predictions	Actuals
0	14.317657	13.99
1	13.151199	13.46
2	12.813556	12.93
3	12.440071	12.43
4	12.000203	12.17
...	...	...
5081	-1.141928	-0.98
5082	-1.407380	-1.40
5083	-1.605908	-2.75
5084	-3.087414	-2.89
5085	-3.146893	-3.93

Output of val_results:

	Val Predictions	Actuals
0	15.523746	14.02
1	13.245414	13.67
2	12.967385	12.27
3	11.410678	11.19
4	10.355401	10.85
...	...	...
4995	17.438112	18.27
4996	17.329201	17.85
4997	17.202988	16.65
4998	15.849868	15.85
4999	14.996454	15.09