Commit 488b642e authored by Roberto Ugolotti's avatar Roberto Ugolotti

Add ML cookbook

parent 27347bb7
# Content
This repository contains files or data that can be of interest to run and personalise JEODPP services.
## jeodpp-text-terminal-service
This folder contains configuration files for `screen`.
## ml-cookbook
This folder contains some examples of using deep learning libraries inside Jupyter Notebooks.
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Jupyter Notebooks
.ipynb_checkpoints
# Image Classification
This folder contains notebooks that perform image classification using PyTorch, Keras, and MXNet.
## How to run the examples
To run the examples in [JeoLab](https://jeodpp.jrc.ec.europa.eu/jhub/), copy a notebook and the zip file containing the data (`images.zip`) into your folder, then launch the notebook with an environment in which the required library is installed (see the [available environments](https://jeodpp.jrc.ec.europa.eu/jhub/)).
The notebook will automatically extract the data contained in `images.zip` and train a simple Convolutional Neural Network to distinguish between satellite images of forest and industrial areas.
Each notebook contains some references to the documentation of the package used.
## How to use your own data
If you want to use these scripts as a base to train on your own dataset, the images must be divided into separate folders according to their classes:
```
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
```
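Before training, it can help to check that your folder layout matches this structure. The helper below is a minimal sketch (the function name and the set of accepted extensions are illustrative, not part of the notebooks); it counts the images found in each class subfolder and skips hidden directories such as `.ipynb_checkpoints`, which would otherwise be picked up as an extra class:

```python
from pathlib import Path

def count_images_per_class(root):
    """Count image files in each class subfolder of `root`.

    Returns a dict mapping class name -> number of images found.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        # Skip files and hidden directories such as .ipynb_checkpoints
        if class_dir.is_dir() and not class_dir.name.startswith('.'):
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in {'.jpg', '.jpeg', '.png'}
            )
    return counts
```

If a class shows zero images, or an unexpected class name appears, the loaders used in the notebooks (`image_dataset_from_directory`, `ImageFolderDataset`, `ImageFolder`) would infer the wrong number of classes.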
%% Cell type:markdown id:d4beb900-5ae3-43b0-8517-c89876a895fa tags:
# Keras Example
This notebook shows a simple image classification example using Keras.
The code is based on https://keras.io/examples/vision/image_classification_from_scratch/
The data is a selection from http://madm.dfki.de/files/sentinel/EuroSAT.zip
%% Cell type:code id:f6284ec5-d53f-44d5-8b93-5c2229fa0496 tags:
``` python
# Example taken from https://keras.io/examples/vision/image_classification_from_scratch/
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```
%% Cell type:code id:1863a3b6-f1f1-4172-b319-dacf28a91538 tags:
``` python
# Check that Tensorflow will use GPU
from tensorflow.python.client import device_lib
assert 'GPU' in str(device_lib.list_local_devices())
```
%% Cell type:markdown id:ec21d233-06b1-4d75-b4eb-04ff183a897c tags:
Reads data from disk. Data must be structured in this way:
```
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
```
%% Cell type:code id:f1b91f72-e6d8-4037-84c4-4ed824d9be1d tags:
``` python
!unzip -qo images.zip
!rm -rf images/.ipynb_checkpoints/  # Otherwise Keras will try to read images from this directory and get the wrong number of classes
image_size = 64, 64
batch_size = 32
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "images/",
    validation_split=0.2,
    subset="training",
    seed=1234,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='categorical',
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "images/",
    validation_split=0.2,
    subset="validation",
    seed=1234,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='categorical',
)
```
%% Cell type:markdown id:298ecb95-e700-43d2-a901-877d8417c827 tags:
Plot some images
%% Cell type:code id:1ea29621-1865-4127-b384-ad043238d220 tags:
``` python
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(np.argmax(labels[i]))
        plt.axis("off")
```
%% Cell type:markdown id:317c35b1-a4de-447d-9e63-3959a6a6b97f tags:
Augment training data with flipping and rotation
%% Cell type:code id:f75972b2-cdd2-4fe2-9e90-cc5453a8b564 tags:
``` python
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal"),
        layers.experimental.preprocessing.RandomRotation(0.1),
    ]
)
```
%% Cell type:markdown id:d6841041-a27c-44fe-8e07-60e842adbb9c tags:
Plot some augmented data
%% Cell type:code id:bb6ac2af-4194-400d-853c-f30f36bbcf8e tags:
``` python
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
```
%% Cell type:markdown id:c43e5b56-f98f-4e46-a2db-2aaa5aa4dc57 tags:
Create a deep network for classification. It contains the `data_augmentation` layer created before, a rescaling layer, and a stack of convolutional blocks with residual connections, followed by a softmax layer used for classification.
%% Cell type:code id:dd137900-44bd-4536-bb3b-ecd0602fc9c3 tags:
``` python
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)
    # Entry block
    x = layers.experimental.preprocessing.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    previous_block_activation = x  # Set aside residual
    for size in [128, 256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
        # Project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual
    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    activation = "softmax"
    units = num_classes
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)

model = make_model(input_shape=image_size + (3,), num_classes=2)
# keras.utils.plot_model(model, show_shapes=True)  # Requires pydot
```
%% Cell type:code id:fcb0538a-7f79-450a-8f65-a45967561704 tags:
``` python
epochs = 10
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",  # Labels are one-hot (label_mode='categorical') and the output is a 2-unit softmax
    metrics=["accuracy"],
)
model.fit(
    train_ds, epochs=epochs, validation_data=val_ds,
)
```
%% Cell type:code id:c37c970d-b1d7-42d6-a97f-0b040cf8a85e tags:
``` python
img = keras.preprocessing.image.load_img(
    "images/Forest/Forest_1.jpg", target_size=image_size
)
img_array = keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create batch axis
predictions = model.predict(img_array)
score = predictions[0]
print(
    "This image is %.1f percent Forest and %.1f percent Industrial."
    % (100 * score[0], 100 * score[1])
)
```
%% Cell type:markdown id:11a99532-17de-4450-8ffa-5247496c622f tags:
# MXNet Example
This notebook shows a simple classification example using MXNet.
The code is based on https://cv.gluon.ai/build/examples_classification/index.html
The data is a selection from http://madm.dfki.de/files/sentinel/EuroSAT.zip
%% Cell type:code id:be891056-f8ed-4340-a920-87bb1bbd72f4 tags:
``` python
import mxnet as mx
from mxnet import gluon
from mxnet import autograd as ag
from mxnet.gluon import nn
from mxnet.gluon.data.vision import transforms
```
%% Cell type:code id:b386909e-261f-4971-b0ba-1e8b5c584a02 tags:
``` python
mx.random.seed(1234)
batch_size = 32
n_classes = 2
```
%% Cell type:code id:59057db7-c23a-4bc1-ae9e-87624f3dd7d6 tags:
``` python
!unzip -qo images.zip
!rm -rf images/.ipynb_checkpoints/ # Otherwise gluon will try to read images from this directory and get the wrong number of classes
# Images read from disk will be converted to tensor and normalized. Data augmentation is also performed
transform_train = transforms.Compose([
    transforms.RandomFlipLeftRight(),
    transforms.RandomFlipTopBottom(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet mean and stddev
])
train_data = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset('images').transform_first(transform_train),
    batch_size=batch_size, shuffle=True, num_workers=1)
```
%% Cell type:markdown id:eedb5f93-7163-4652-9499-4887998a20a0 tags:
Define a Convolutional Neural Network
%% Cell type:code id:9c534007-8507-49fd-afaa-dc3e2aaffb14 tags:
``` python
net = nn.Sequential()
# Add a sequence of layers.
net.add(
    nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
    nn.MaxPool2D(pool_size=2, strides=2),
    nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
    nn.MaxPool2D(pool_size=2, strides=2),
    # The dense layer will automatically reshape the 4-D output of the last
    # max pooling layer into the 2-D shape: (x.shape[0], x.size / x.shape[0])
    nn.Dense(120, activation="relu"),
    nn.Dense(84, activation="relu"),
    nn.Dense(n_classes)
)
```
%% Cell type:markdown id:78e13e93-75bd-466d-a138-a813618205b6 tags:
Get the GPU and initialize the network
%% Cell type:code id:2ab8b7d8-7135-4114-991e-290bab6e1f86 tags:
``` python
assert mx.context.num_gpus()
ctx = mx.gpu(0)
net.initialize(ctx=ctx)
```
%% Cell type:markdown id:9df40855-e429-424f-a877-386f2d27910c tags:
Define the training parameters: optimizer, loss, and metric
%% Cell type:code id:0cfbc537-f514-4933-a329-bcb0ad058945 tags:
``` python
# Nesterov accelerated gradient descent
optimizer = 'nag'
# Set parameters
optimizer_params = {'learning_rate': 0.1, 'wd': 0.0001, 'momentum': 0.9}
# Define our trainer for net
trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)
# Define loss and metric
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
train_metric = mx.metric.Accuracy()
```
%% Cell type:markdown id:6c880933-cc88-44aa-8d61-d8c9ba1b5508 tags:
Train the network
%% Cell type:code id:74edac2b-57ef-4b93-8517-5b1c309f4117 tags:
``` python
epochs = 10
for epoch in range(epochs):
    train_metric.reset()
    train_loss = 0
    # Loop through each batch of training data
    for i, batch in enumerate(train_data):
        # Extract data and label
        data = gluon.utils.split_and_load(batch[0], ctx_list=[ctx], batch_axis=0)
        label = gluon.utils.split_and_load(batch[1], ctx_list=[ctx], batch_axis=0)
        # AutoGrad
        with ag.record():
            output = [net(X) for X in data]
            loss = [loss_fn(yhat, y) for yhat, y in zip(output, label)]
        # Backpropagation
        for l in loss:
            l.backward()
        # Optimize
        trainer.step(batch_size)
        # Update metrics
        train_loss += sum([l.sum().asscalar() for l in loss])
        train_metric.update(label, output)
    name, acc = train_metric.get()
    print('Epoch %d - accuracy on training set %.3f' % (epoch + 1, acc))
```
%% Cell type:markdown id:adab7ae6-fc15-4bac-aa25-506628d78abd tags:
# PyTorch Example
This notebook shows a simple classification example using PyTorch.
The code is based on https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
The data is a selection from http://madm.dfki.de/files/sentinel/EuroSAT.zip
%% Cell type:code id:5613e36b-5d23-464f-9abb-914fdd1ecfff tags:
``` python
import torch
import torchvision
from torchvision import datasets, transforms
import numpy as np
import pylab as plt
```
%% Cell type:markdown id:cb030ffd-505b-4d52-a8eb-6c88443d2cde tags:
Ensure that CUDA is available and that PyTorch sees the GPU, then create a device object. PyTorch requires you to explicitly send networks and data to the GPU.
%% Cell type:code id:6bd24fe2-bfd0-4e4c-9240-28c47c74ebc6 tags:
``` python
assert torch.cuda.is_available()
assert torch.cuda.device_count() > 0
device = torch.device("cuda:0")
```
%% Cell type:code id:ca5f3f2b-5fcc-461e-899e-0eaaab0552bc tags:
``` python
batch_size = 8
classes = ['Forest', 'Industrial']
n_classes = len(classes)
n_epochs = 10
np.random.seed(1234)
```
%% Cell type:markdown id:5d34c385-c824-42f3-9a18-3aa1f34986e6 tags:
Read the data from disk. Convert it to a tensor (otherwise it will be read as a PIL image), then normalize and resize it.
The data is then split into training and test sets.
%% Cell type:code id:c49ed5b8-41a9-418a-a431-3da3db19e233 tags:
``` python
img_transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
     transforms.Resize((32, 32))]
)
!unzip -qo images.zip
!rm -rf images/.ipynb_checkpoints/  # Otherwise PyTorch will try to read images from this directory and get the wrong number of classes
dataset = datasets.ImageFolder('images', transform=img_transform)
n_train = int(0.9 * len(dataset))
# The split sizes must sum exactly to len(dataset)
train_set, test_set = torch.utils.data.random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)
train_iter = iter(train_loader)
```
%% Cell type:markdown id:945b0555-ab79-4cfa-ab3b-83411601f68f tags:
Show some examples of training data
%% Cell type:code id:8822e7f6-30f4-4c04-8da4-06d3aca05b6b tags:
``` python
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

images, labels = next(train_iter)  # train_iter.next() was removed in recent PyTorch versions
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(batch_size)))
```
%% Cell type:markdown id:f5917040-e0d6-4a12-9017-91bfa1848ea4 tags:
Create a simple Convolutional Neural Network
%% Cell type:code id:78e10cb5-2714-40d4-8957-521376ef2719 tags:
``` python
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 36, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(36, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, n_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net(n_classes).to(device)  # Send the network to the GPU
```
%% Cell type:markdown id:3431cb5a-4a0e-456b-ad1e-d3e3b102e519 tags:
Define the loss function and the optimizer
%% Cell type:code id:27fa5f98-9f92-43b4-a8f1-25d1662b66a0 tags:
``` python
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
```
%% Cell type:markdown id:c6fc47b1-25d9-418c-9fc4-c67ca7b4fe4b tags:
Train the network. Iterate over epochs, and over training set.
%% Cell type:code id:e5e0c966-72f5-43b9-9aa5-3d09a62e027e tags:
``` python
for epoch in range(n_epochs):  # loop over the dataset multiple times
    print('Epoch %d' % (epoch + 1))
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        # Remember to send data to GPU
        outputs = net(inputs.to(device))
        loss = criterion(outputs, labels.to(device))
        loss.backward()
        optimizer.step()