Saturday, September 11, 2021

Automating the configuration of a GAN in PyTorch

Target audience: Expert
Estimated reading time: 60'

This post illustrates how to automate the creation of deep convolutional generative adversarial networks (DCGAN) by inferring the configuration of the generator from the discriminator. We will use the ubiquitous real vs. fake image detection scenario for our GAN model.

This post does not go into the details of generative adversarial networks or convolutional networks. It focuses on automating the configuration of some of their components. It is assumed the reader has a basic understanding of convolutional neural networks and the PyTorch library.

The challenge

For those not familiar with GANs:
GANs are unsupervised learning models that discover patterns in data and use those patterns to generate new samples (data augmentation) that are almost indistinguishable from the original data. GANs are part of the generative model family, along with variational auto-encoders or MLE. The approach reframes the problem as a supervised learning problem using two adversarial networks:
  • Generator model trained to generate new samples
  • Discriminator model that attempts to classify the new samples as real (from the original dataset) or fake (generated)
Please refer to the reference section to learn more about generative adversarial networks.

Designing and configuring the generator and discriminator of a generative adversarial network (GAN), or the encoder and decoder layers of a convolutional variational auto-encoder (VAE), can be a tedious and repetitive task.
Some of the steps can be fully automated, because the generative network of a convolutional GAN can be configured as the mirror (or inversion) of the discriminator using a de-convolutional network. The same automation technique applies to the instantiation of the decoder of a VAE given an encoder.
Functional representation of a simple deep convolutional GAN

Neural component reusability is key to generating a de-convolutional network from a convolutional network. To this end, we break down a neural network into computational blocks.

Convolutional neural blocks

At the highest level, a generative adversarial network is composed of at least two neural networks: a generator and a discriminator.
These two neural networks can be broken down into neural blocks, or groups of PyTorch modules: hidden layer, batch normalization, regularization, pooling mode and activation function. Let's consider a discriminator built using a convolutional neural network followed by a fully connected (feed-forward) network. The PyTorch modules associated with any given layer are assembled as a neural block class.
The PyTorch modules of the convolutional neural block are:
  • Conv2d: Convolutional layer with input, output channels, kernel, stride and padding
  • Dropout: Drop-out regularization layer
  • BatchNorm2d: Batch normalization module
  • MaxPool2d: Pooling layer
  • ReLU, Sigmoid, ...: Activation functions

Representation of a convolutional neural block

The constructor for the neural block initializes all its parameters and its modules in the proper order. For the sake of simplicity, regularization elements such as drop-out (bagging of sub-networks) are omitted.
class ConvNeuralBlock(nn.Module):
    def __init__(self,
                 ...
                 bool = False):
        super(ConvNeuralBlock, self).__init__()
        # Assertions are omitted
        # 1- Initialize the input and output channels
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.is_spectral = is_spectral
        modules = []
        conv_module = nn.Conv2d(   # 2- Create a 2-dimension convolution layer
            ...

        if self.is_spectral:          # 6- If this is a spectral norm block
            conv_module = nn.utils.spectral_norm(conv_module)
        if batch_norm:                # 3- Batch normalization
            ...
        if activation is not None:    # 4- Activation function
            ...
        if max_pooling_kernel > 0:    # 5- Pooling module
            ...
        self.modules = tuple(modules)
We consider the case of a generative model for images. The first step (1) is to initialize the number of input and output channels, then create the 2-dimension convolution (2), a batch normalization module (3), an activation function (4) and finally a max pooling module (5). The spectral norm regularization (6) is optional.

The convolutional neural network is assembled from convolutional and feed-forward neural blocks in the following build method.
class ConvModel(NeuralModel):
    def __init__(self,                   # 1- Default constructor
                 ...
                 nn.Sequential,          # 2- PyTorch convolutional modules
                 int = -1,
                 nn.Sequential = None):  # 3- PyTorch fully connected modules
        super(ConvModel, self).__init__(model_id)
        self.input_size = input_size
        self.output_size = output_size
        self.conv_model = conv_model
        self.dff_model_input_size = dff_model_input_size
        self.dff_model = dff_model

    @classmethod
    def build(cls,
              ...
              list) -> NeuralModel:
        # 4- Initialize the input and output size for the convolutional layer
        input_size = conv_neural_blocks[0].in_channels
        output_size = conv_neural_blocks[
            len(conv_neural_blocks) - 1].out_channels

        # 5- Generate the model from the sequence of conv. neural blocks
        conv_modules = [conv_module for conv_block in conv_neural_blocks
                        for conv_module in conv_block.modules]
        conv_model = nn.Sequential(*conv_modules)

        # 6- If a fully connected network is included in the model ..
        if dff_neural_blocks is not None and not is_vae:
            dff_modules = [dff_module
                           for dff_block in dff_neural_blocks
                           for dff_module in dff_block.modules]
            dff_model_input_size = dff_neural_blocks[ ...
            dff_model = nn.Sequential(* ...
        else:
            dff_model_input_size = - ...
            dff_model = None
        return cls(model_id, conv_dimension, input_size, output_size,
                   conv_model, dff_model_input_size, dff_model)

The default constructor (1) initializes the number of input/output channels, the PyTorch modules for the convolutional layers (2) and the fully connected layers (3).
The class method, build, instantiates the convolutional model from the convolutional neural blocks and feed-forward neural blocks. It initializes the size of the input and output layers from the first and last neural blocks (4), then generates the PyTorch convolutional modules (5) and the fully connected layer modules (6) from the neural blocks.
Next we build the de-convolutional neural network from the convolutional blocks.

Inverting a convolutional block

The process to build a GAN is as follows:
  1. Specify components (PyTorch modules) for each convolutional layer
  2. Assemble these modules into a convolutional neural block
  3. Create a generator and discriminator network by aggregating the blocks
  4. Wire the generator and discriminator to produce a fully functional GAN
The goal is to create a builder that generates the de-convolutional network implementing the GAN generator from the convolutional network defined in the previous section.
The first step is to extract the de-convolutional block from an existing convolutional block.

Conceptual conversion of a convolutional block into a de-convolutional block

The default constructor for the neural block of a de-convolutional network defines all the key parameters used in the network except the pooling module (not needed). The following code snippet illustrates the instantiation of a de-convolutional neural block using the convolution parameters such as the number of input and output channels, kernel size, stride and padding, batch normalization and activation function.
class DeConvNeuralBlock(nn.Module):
    # The default constructor
    def __init__(self,
                 ...
                 bool) -> object:
        super(DeConvNeuralBlock, self).__init__()

        self.in_channels = in_channels
        self.out_channels = out_channels
        modules = []
        # Two dimension de-convolution layer
        de_conv = nn.ConvTranspose2d( ...
        if batch_norm:   # Add the batch normalization
            ...
        self.modules = modules
Note that the de-convolution block does not have any pooling capabilities.

The class method, auto_build, takes a convolutional neural block, the number of input and output channels and an optional activation function to generate a de-convolutional neural block of type DeConvNeuralBlock. The kernel size, stride and padding of the de-convolution layer are extracted from the convolutional block in the private method __resize.
@classmethod
def auto_build(cls,
               conv_block: ConvNeuralBlock,
               in_channels: int,
               out_channels: int = None,
               activation: nn.Module = None) -> nn.Module:
    # Extract the parameters of the source convolutional block
    kernel_size, stride, padding, batch_norm, activation = \
        DeConvNeuralBlock.__resize(conv_block, activation)

    # Override the number of input channels for this block if defined
    next_block_in_channels = in_channels if in_channels is not None \
        else conv_block.out_channels

    # Override the number of output channels for this block if specified
    next_block_out_channels = out_channels if out_channels is not None \
        else conv_block.in_channels
    return cls( ...
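
The mirroring rule applied by auto_build can be illustrated without PyTorch. The following is a minimal sketch, not the author's implementation: BlockSpec and mirror are hypothetical names, and the activation is represented by a plain string. It only shows that the channels are swapped while kernel size, stride and padding are carried over, with optional overrides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockSpec:
    """Hypothetical, framework-free description of one neural block."""
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int
    padding: int
    batch_norm: bool
    activation: str


def mirror(conv: BlockSpec,
           in_channels: Optional[int] = None,
           out_channels: Optional[int] = None,
           activation: Optional[str] = None) -> BlockSpec:
    """Derive a de-convolutional spec from a convolutional one.

    Channels are swapped unless overridden; kernel size, stride,
    padding and batch normalization are copied unchanged.
    """
    return BlockSpec(
        in_channels=in_channels if in_channels is not None else conv.out_channels,
        out_channels=out_channels if out_channels is not None else conv.in_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        batch_norm=conv.batch_norm,
        activation=activation if activation is not None else conv.activation,
    )


# A discriminator block: 3 image channels in, 64 feature maps out
conv = BlockSpec(3, 64, kernel_size=4, stride=2, padding=1,
                 batch_norm=True, activation="LeakyReLU")
# Its mirror, with the activation overridden for a generator output layer
de_conv = mirror(conv, activation="Tanh")
print(de_conv.in_channels, de_conv.out_channels, de_conv.activation)  # 64 3 Tanh
```

Keeping the overrides optional mirrors the signature of auto_build: only the first and last blocks of a generator usually need them.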

Sizing de-convolutional layers

The next task consists of computing the size of the components of the de-convolutional block from the original convolutional block.
def __resize(conv_block: ConvNeuralBlock,
             updated_activation: nn.Module) -> (int, int, int, bool, nn.Module):
    conv_modules = list(conv_block.modules)
    # 1- Extract the various components of the convolutional neural block
    _, batch_norm, activation = DeConvNeuralBlock.__de_conv_modules(conv_modules)
    # 2- Override the activation function for the output layer, if necessary
    if updated_activation is not None:
        activation = updated_activation
    # 3- Compute the parameters for the de-convolutional layer, from the conv. block

    kernel_size, _ = conv_modules[ ...
    stride, _ = conv_modules[ ...
    padding = conv_modules[ ...

    return kernel_size, stride, padding, batch_norm, activation
The __resize method extracts the PyTorch modules for the de-convolutional layers from the original convolutional block (1), overrides the activation function of the block if one is supplied (2) and finally initializes the parameters of the de-convolutional layer (3).

The helper method, __de_conv_modules, extracts the PyTorch modules related to the convolutional layer, batch normalization module and activation function for the de-convolution from the PyTorch modules of the convolution.
def __de_conv_modules(conv_modules: list) -> \
        (torch.nn.Module, torch.nn.Module, torch.nn.Module):
    activation_function = None
    deconv_layer = None
    batch_norm_module = None
    # 4- Extract the PyTorch de-convolutional modules from the convolutional ones
    for conv_module in conv_modules:
        if DeConvNeuralBlock.__is_conv(conv_module):
            deconv_layer = conv_module
        elif DeConvNeuralBlock.__is_batch_norm(conv_module):
            batch_norm_module = conv_module
        elif DeConvNeuralBlock.__is_activation(conv_module):
            activation_function = conv_module
    return deconv_layer, batch_norm_module, activation_function

One key step is to compute the size of the image along the various convolutional and de-convolutional neural layers.

Convolutional layers
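Given a padding p, a kernel size k and a stride s, the width of the two-dimensional output data for an input of width W_in is

$$W_{out} = \left\lfloor \frac{W_{in} + 2p - k}{s} \right\rfloor + 1$$

and the height of the two-dimensional output data for an input of height H_in is

$$H_{out} = \left\lfloor \frac{H_{in} + 2p - k}{s} \right\rfloor + 1$$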
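
As a quick numerical check, the standard output size of a convolution along one dimension, floor((size + 2p - k) / s) + 1, can be computed in a few lines. This is a small sketch assuming a dilation of 1 (the default of torch.nn.Conv2d); the function name conv_output_size is hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width (or height) of a 2D convolution along one dimension.

    Assumes dilation = 1. Integer division implements the floor.
    """
    return (size + 2 * padding - kernel) // stride + 1


# A 28-pixel wide image, 4x4 kernel, stride 2, padding 1: the width is halved
print(conv_output_size(28, kernel=4, stride=2, padding=1))  # 14
# A 3x3 kernel with stride 1 and padding 1 preserves the size
print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32
```

The same function applies to the height, since kernel, stride and padding are taken to be identical along both dimensions here.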

De-convolutional layers
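As expected, the formula to compute the size of the output of a de-convolutional layer is the mirror image of the formula for the output size of the convolutional layer. Without output padding or dilation, the width and height are

$$W_{out} = (W_{in} - 1)\,s - 2p + k \qquad H_{out} = (H_{in} - 1)\,s - 2p + k$$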
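
The mirror relationship can be verified numerically. The sketch below assumes a dilation of 1 and no output padding, in which case a transposed convolution produces (size - 1) * s - 2p + k values along one dimension; both function names are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution along one dimension (dilation = 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def de_conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a transposed convolution along one dimension
    (dilation = 1, output_padding = 0)."""
    return (size - 1) * stride - 2 * padding + kernel


# Round trip with the same kernel, stride and padding
down = conv_output_size(28, kernel=4, stride=2, padding=1)      # 14
up = de_conv_output_size(down, kernel=4, stride=2, padding=1)   # 28
print(down, up)

# The round trip is exact only when (size + 2p - k) is divisible by the stride
lossy = de_conv_output_size(conv_output_size(8, 3, 2, 0), 3, 2, 0)
print(lossy)  # 7, not 8
```

When the division by the stride is not exact, the mirrored layer comes out slightly smaller than the original input; torch.nn.ConvTranspose2d exposes an output_padding argument for that situation.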


Assembling the de-convolutional network

Finally, the de-convolutional model, of type DeConvModel, is created using the sequence of PyTorch modules, de_conv_model. Once again, the default constructor (1) initializes the size of the input layer (2) and output layer (3) and loads the PyTorch modules, de_conv_modules, for all de-convolutional layers.
class DeConvModel(NeuralModel, ConvSizeParams):
    def __init__(self,   # 1 - Default constructor
                 ...
                 int,    # 2 - Size first layer
                 int,    # 3 - Size output layer
                 ...
        super(DeConvModel, self).__init__(model_id)
        self.input_size = input_size
        self.output_size = output_size
        self.de_conv_modules = de_conv_modules

    @classmethod
    def build(cls,
              ...
              list,   # 4- Input to the builder
              int = None,
              torch.nn.Module = None) -> NeuralModel:

        de_conv_neural_blocks = []

        # 5- Need to reverse the order of convolutional neural blocks
        ...

        # 6- Traverse the list of convolutional neural blocks
        for idx in range(len(conv_neural_blocks)):
            conv_neural_block = conv_neural_blocks[idx]
            new_in_channels = ...
            activation = None
            last_out_channels = None

            # 7- Update num. input channels for the first de-convolutional layer
            if idx == 0:
                new_in_channels = in_channels
            # 8- Define, if necessary, the activation function for the last layer
            elif idx == len(conv_neural_blocks) - 1:
                if last_block_activation is not None:
                    activation = last_block_activation
                if out_channels is not None:
                    last_out_channels = out_channels

            # 9- Apply transposition to the convolutional block
            de_conv_neural_block = DeConvNeuralBlock.auto_build(conv_neural_block, ...
        # 10- Instantiate the de-convolutional network from its neural blocks
        de_conv_model = DeConvModel.assemble(model_id, de_conv_neural_blocks)
        del de_conv_neural_blocks
        return de_conv_model
The alternate constructor, build, creates and configures the de-convolutional model from the convolutional blocks, conv_neural_blocks (4). The order of the de-convolutional layers requires the list of convolutional blocks to be reversed (5).
For each block of the convolutional network (6), the method overrides the number of input channels of the first de-convolutional layer (7), updates the activation function and the number of output channels for the output layer (8) and derives the de-convolutional block from its convolutional counterpart (9).
Finally, the de-convolutional neural network is assembled from these blocks (10).
@classmethod
def assemble(cls, model_id: str, de_conv_neural_blocks: list):
    input_size = de_conv_neural_blocks[ ...
    output_size = de_conv_neural_blocks[
        len(de_conv_neural_blocks) - 1].out_channels
    # 11- Generate the PyTorch convolutional modules used by the default constructor
    conv_modules = tuple([conv_module for conv_block in de_conv_neural_blocks
                          for conv_module in conv_block.modules
                          if conv_module is not None])
    de_conv_model = torch.nn.Sequential(*conv_modules)
    return cls(model_id, input_size, output_size, de_conv_model)
The assemble method constructs the final de-convolutional neural network from the blocks, de_conv_neural_blocks, by aggregating the PyTorch modules associated with each block (11).
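
The reversal and channel overrides performed by build can be sketched on plain tuples. This is an illustration under simplifying assumptions, not the author's code: each block is reduced to a hypothetical (in_channels, out_channels) pair, mirror_channels is a made-up name, and the latent size of 100 is an arbitrary example value.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (in_channels, out_channels) of one neural block


def mirror_channels(conv_blocks: List[Block],
                    in_channels: Optional[int] = None,
                    out_channels: Optional[int] = None) -> List[Block]:
    """Reverse the convolutional blocks and swap the channels of each one."""
    mirrored = [(out_ch, in_ch) for in_ch, out_ch in reversed(conv_blocks)]
    if in_channels is not None:    # override the input of the first block
        mirrored[0] = (in_channels, mirrored[0][1])
    if out_channels is not None:   # override the output of the last block
        mirrored[-1] = (mirrored[-1][0], out_channels)
    return mirrored


discriminator = [(3, 64), (64, 128), (128, 256)]
generator = mirror_channels(discriminator, in_channels=100)
print(generator)  # [(100, 128), (128, 64), (64, 3)]

# Consecutive blocks must agree: output channels feed the next block's input
assert all(a[1] == b[0] for a, b in zip(generator, generator[1:]))
```

The final assertion is the invariant the builder has to preserve, whatever overrides are applied to the first and last blocks.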


  • Python 3.8
  • PyTorch 1.7.2


Monday, June 21, 2021

Open Source Lambda architecture for deep learning

Target audience: Beginner
Estimated reading time: 15'

The world of data scientists accustomed to Python scientific libraries has been shaken up by the emergence of 'big data' frameworks such as Apache Hadoop, Spark and Kafka. This presentation introduces a variant of the Lambda architecture and describes, very briefly, the seamless integration of various open source components. This post is a high-level overview of the key services of a typical architecture.

Core data flow

The concept and architecture are versatile enough to accommodate a variety of open source and commercial solutions and services besides the frameworks prescribed in this presentation. The open source framework PyTorch is used to illustrate the integration of big data frameworks such as Apache Kafka and Spark with a deep learning library to train, validate and test deep learning models.

Alternative libraries such as Keras or TensorFlow could also be used.

Let's consider the use case of training and validating a deep learning model, using Apache Spark to load, parallelize and preprocess the data. Apache Spark takes advantage of a large number of servers and CPU cores.

In this simple design, the workflow is broken down into 6 steps:
  1. Apache Spark loads, then parallelizes, the training data from AWS S3
  2. Spark distributes the data pre-processing, cleansing and normalization across multiple worker nodes
  3. Spark forwards the processed data to the PyTorch cluster
  4. Flask converts requests into prediction queries to the PyTorch model
  5. The PyTorch model generates a prediction
  6. Run-time metrics are broadcast through Kafka
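
The first three steps follow a partition, process, then collect pattern. The sketch below imitates it with the Python standard library only, as a rough stand-in and not as Spark code: threads play the role of worker nodes, and the helper names partition and normalize are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(data: List[float], n: int) -> List[List[float]]:
    """Split the data into at most n contiguous chunks (step 1)."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def normalize(chunk: List[float], lo: float, hi: float) -> List[float]:
    """Min-max scale one chunk with statistics shared by all workers (step 2)."""
    span = (hi - lo) or 1.0  # avoid a division by zero on constant data
    return [(x - lo) / span for x in chunk]


raw = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
chunks = partition(raw, 3)
lo, hi = min(raw), max(raw)

# Each chunk is normalized independently, here by a pool of threads
with ThreadPoolExecutor(max_workers=3) as pool:
    cleaned = list(pool.map(lambda c: normalize(c, lo, hi), chunks))

# Step 3: gather the processed chunks into the batch handed to the trainer
batch = [x for chunk in cleaned for x in chunk]
print(batch[0], batch[-1])  # 0.0 1.0
```

The global minimum and maximum are computed once and shared, so every worker applies the same scaling regardless of which chunk it receives.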

Key services

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. It extends the functionality of NumPy and scikit-learn to support the training, evaluation and commercialization of complex machine learning models.

Apache Spark is an open source cluster computing framework for fast real-time processing. 
It supports Scala, Java, Python and R programming languages and includes streaming, graph and machine learning libraries.

Apache Kafka is an open-source distributed event streaming framework for large-scale, real-time data processing and analytics.
It captures data from various sources in real-time as a continuous flow and routes it to the appropriate processor. 

Ray Tune is a distributed hyper-parameter tuning framework particularly suitable for deep learning models. It significantly reduces the cost of optimizing the configuration of a model. It is a wrapper around other open source libraries.

Apache Hive is an open source data warehouse platform that facilitates reading, writing and managing large datasets residing in distributed storage such as Hadoop and Apache Spark.

Flask is a Python-based web development platform built as a micro-framework to support the REST protocol. Its minimalist approach to web interfaces makes it a very intuitive tool for building micro-services.

Amazon Simple Storage Service (S3) is a highly available, secure object storage service with very high durability (11 nines), scalability and support for versioning. It is versatile enough to accommodate any kind of data format.


Apache Spark   
Apache Kafka
Ray Tune
Apache Hive
Flask - Pallets project
Amazon S3

This informational post introduced the high-level components of a Lambda architecture. Such orchestration of services is the foundation of the iterative machine learning modeling concept known as MLOps. MLOps will be discussed in a future post.

Sunday, March 28, 2021

MLOps for data scientists

Target audience: Beginner
Estimated reading time: 20'

This post is a high-level introduction to the key components of MLOps from the data scientist's perspective. MLOps addresses the lack of reliability and transparency in the development and deployment of machine learning models.

AI Productization overview

MLOps is a collection of tools supporting the lifecycle of data-centric AI: training models, conducting error analysis to identify the types of data the algorithm does poorly on, acquiring more data via data augmentation, addressing inconsistent definitions of the data labels, and ultimately using production data for continuous refinement of the model.

MLOps seeks to automate the training and validation of ML models and improve their quality, while also focusing on business and regulatory requirements. It integrates the functions of data engineering, data science and DevOps into a single predictable process in the following areas:
  • Deployment and automation
  • Reproducibility of models and predictions
  • Diagnostics
  • Governance and regulatory compliance (SOC 2, HIPAA)
  • Scalability and latency
  • Collaboration
  • Business use cases & metrics
  • Monitoring and management
  • Technical support

Predictable ML lifecycle

MLOps defines the ML lifecycle management, such as integration with model generation, the software development cycle (Jira, GitHub), continuous testing and delivery, orchestration and deployment, health, diagnostics, performance governance and business metrics. From the data science perspective, MLOps defines the continuous and iterative collection/pre-processing of data, model training and evaluation, and deployment in production.

Data-centric AI

Andrew Ng introduced the concept of data-centric AI. He proposes to shift the focus of AI practitioners from model/algorithm development to the quality of the data they use to train the models. In the traditional, model-centric approach to AI, data is collected to train and validate a given model with limited regard for the quality of the data.
Data-centric AI improves the odds that AI projects and machine learning models succeed when they are deployed in the real world.

Fig 1. Overview of continuous development in data-centric AI - courtesy Andrew Ng

There are several differences between the traditional model-centric AI and data-centric AI:

Model-centric AI | Data-centric AI
Goal is to collect all the data you can and develop a model good enough to deal with noise to avoid overfitting. | Goal is to select a subset of the training data with the highest consistency and reliability so multiple models perform well.
Hold the data fixed and iteratively improve the model and code. | Hold the model and code fixed and iteratively improve the data.

Repeatable processes

Predictable delivery of ML products or services relies on three elements:
  • Repeatable process
  • Lifecycle management tools
  • Product management
The goal is to apply known repeatable software development (Scrum, Kanban, ...) and DevOps best practices to the training and validation of ML models. Moving model training, tuning and validation to operations and automation increases the number of tasks that are controllable and predictable.

Fig 2. Productization of training and validation of models

As illustrated in fig. 1, the deployment process in the model-centric AI leaves little room for integrating the training and validation of the model with new data. In the data-centric AI approach, the model is deployed very early in the development cycle, allowing for continuous integration and update of the model(s) with feedback and new data. 

AI lifecycle management tools

Quite a few open source tools have been introduced over the last 3 years to support the introduction and implementation of MLOps throughout the entire engineering organization.

Although most of the development tools commonly used in software engineering are applicable to MLOps, some ML lifecycle tools have been introduced over the last couple of years.

  • DVC manages version control for ML projects
  • Polyaxon provides data scientists with lifecycle automation in a collaborative environment
  • MLflow manages the entire ML lifecycle, from experimentation to deployment. It includes a model registry for managing the various versions of a model
  • Kubeflow provides workflow automation and deployment in Kubernetes containers
  • Metaflow manages the automation pipeline and deployment

AutoML frameworks are increasingly used for rapid ML development, similar to GUI development.

Canary, frictionless release

A robust testing and deployment process is critical to the success of any AI project. The canary release makes the migration of a model from the development/staging environment to production frictionless. The process consists of routing a percentage of requests to a new version or sandbox according to criteria defined by the product manager (modality, customer, metrics, …). This approach reduces the risk of failure in deployment to production because there is no need for roll-back. It is just a matter of stopping the traffic to the new version.
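
One possible way to route a fixed percentage of requests is to hash a request (or customer) identifier into a bucket between 0 and 99. The sketch below is an illustration under that assumption, not a description of any specific product: the route function, the identifier format and the 10% share are all hypothetical.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Return 'canary' for roughly canary_percent % of identifiers, else 'stable'."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in [0, 100)
    return "canary" if bucket < canary_percent else "stable"


requests = [f"customer-{i}" for i in range(10_000)]
share = sum(route(r, 10) == "canary" for r in requests) / len(requests)
print(f"{share:.1%} of requests reached the new model version")

# Stopping the canary is a configuration change, not a roll-back
assert all(route(r, 0) == "stable" for r in requests)
```

Hashing the identifier, rather than drawing a random number per request, keeps a given customer on the same model version for the whole duration of the canary.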