Introduction

What is Ava ML?

Ava ML is a toolkit that simplifies supervised training of neural networks for research purposes. It is designed to facilitate many tasks that are common when applying deep neural networks.

Features of Ava ML include:

Standard models, datasets and metrics: Avoids re-implementations and allows for a quick start. Datasets can be downloaded with a single command and shared with other users on the same system.
Feature Extractors: A large number of implemented feature extractors pre-trained on ImageNet (via the pretrainedmodels Python package).
Command Line Interface: Train and evaluate models by running simple instructions in the command line.
Experiment definitions: Facilitates reproducibility and describes hyperparameters in a transparent way. LaTeX tables can be output directly.
Visual inspection of datasets: Enables verification that datasets (including their parameters) are properly implemented.
Standard Structure: The division into models, datasets and metrics facilitates collaboration in projects when multiple persons are involved.

Note: this is an early preview version and is not meant to be used in production.

Installation

To install ava, first clone the git repository using

git clone https://gitlab.gwdg.de/cns-group-public/ava-ml

Now the package can be installed by:

python3 setup.py develop --user

The setup will create a folder ~/.ava containing the file paths.yaml, which defines the following entries:

TRAINED_MODELS_PATH: location where trained models are stored. To prevent overwriting important models, consider removing write access with chmod a-w <model-file>.
CACHE_PATH: location where data is cached (sometimes required for fast training)
DATASETS: location of datasets

The path to each individual dataset can also be set, e.g. by adding a line:

MNIST: /path/to/mnist

Next Steps

Here are a few ideas of what to do next:

To generate and view the documentation run ./build-doc.py and open doc/index.htm in a browser.
Verify that the installation was successful by running the tests in tests.
Start training with ava train <model> <dataset>.

Features

Training and Scoring

We distinguish two phases: training and scoring (or evaluation). Training creates a model file and a model-args file that encode the obtained parameters (weights).

Models, Datasets and Metrics

Models are trained and evaluated on datasets using a metric. All three components are defined separately and can be exchanged almost freely (given they are compatible). This means that once an image classification model is designed, it can be trained on all image classification datasets.

Feature Extractors

These are re-usable components of the model.

Transformations

Common operations such as image resizing.

Inspection

Use ava inspect <dataset> <dataset-args> to explore datasets.

Caching

Training and evaluation are expensive. Hence, many intermediate results are cached.

Models

Writing a Model

Required Methods

__init__(): Initialization of the model (e.g. initialize the neural network)
forward(*x): Receive inputs *x and do forward pass

Baselines

Baselines are special models that are not trained but have direct access to the Dataset. They are indicated by returning True from the method is_baseline. With access to the dataset, a baseline can, for example, compute the mode of the label distribution.
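
As an illustration of the idea, here is a minimal, dependency-free sketch of a mode baseline. The class name ModeBaseline and the way the labels are handed over are assumptions made for this example; only the is_baseline and forward method names come from the text above, and the real base class may look different.

```python
from collections import Counter


class ModeBaseline:
    """Hypothetical baseline that always predicts the most frequent label."""

    def __init__(self, labels):
        # `labels` stands in for the ground truth the baseline reads from the dataset.
        self.mode = Counter(labels).most_common(1)[0][0]

    def is_baseline(self):
        # Marks the model as a baseline, i.e. it is not trained.
        return True

    def forward(self, *x):
        # Ignore the inputs and return the mode of the label distribution.
        return self.mode


baseline = ModeBaseline([2, 0, 2, 1, 2])
prediction = baseline.forward("any input")
```

Such a baseline gives a lower bound that any trained classifier on the same dataset should beat.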

Available Models

Classification

ImageFEClassifier (self, base, multiclass=None, pretrained=False, n_classes=10, freeze_level=None, pre_pool_out=False, ignore_input=False, kwargs)

Base class that turns feature extractors into classifiers.

    Args:
        - base: name of the feature extractor the classifier is based on
        for other arguments see ImageClassifierBase

    Returns (forward method):
        - either Vector of n_classes or Matrix (n_classes x multiclass)

Segmentation

ResNetSegmentation (self, base_network, decoder_shape=m, out_channels=10, init_state_dict=None, pretrained=False, multipredict=False, thresholds=None, pyramid=False, dropout=None, small_resnet=False)

decoder_shape defines the factors for the number of feature maps in the decoder layers.

PSPNet (self, out_channels=10, decoder_shape=m, dropout=None, base=drn105, pretrained=False, with_skip=False)

Own implementation of PSPNet. Uses neither auxiliary outputs for the loss nor synchronized batch norm.

Video

Recurrent (self, n_classes, base=resnet50, multiclass=None, encoder_freeze=None, hop_sizes=(1,), resize_input=False, hidden_size=512, dropout=None, tpp=None, lstm_layers=1, no_pretrain=False, ignore_input=False, input_channels=3, feature_extraction_size=None, rnn_type=lstm, output_hidden=False, vector_input=None, kwargs)

No description yet

SpaceTime (self, n_classes, multiclass=None, input_channels=3, dense=1024, size=small, ignore_input=False, kwargs)

Following the LTC model by Varol et al. (slightly modified)

SingleFrameFeature (self, n_classes, base=resnet50, dropout=None, ignore_image=False)

No description yet

MultiFramePooling (self, n_classes, base=resnet50, dropout=None, ignore_image=False, vector_input=None, input_channels=3, no_pretrain=False, encoder_freeze=None, feature_extraction_size=None, kwargs)

No description yet

Blocks

Blocks are reusable components of neural networks.

Available Blocks

DatasetNormalize (self, in_range, mean, std, dims=4, color_dim=1)

No description yet

Feature Extraction

The feature extraction API provides access to several state-of-the-art feature extractors. For example, these can be combined with an image classifier (see ImageClassifier for details) or initialized in custom code using

from ava.models.feature_extraction import get_feature_extractor
fe = get_feature_extractor('resnet50')

Available FeatureExtractors

.dilated_resnet: drn38, drn54, drn105
.inception: bninception, xception, inception4, inception3
.lightweight: mnist_net, light_unet, light_convnet
.resnet: resnet18, resnet50, resnet101, resnet152
.se_resnet: se_resnet50, se_resnet101, se_resnext50, se_resnext101
.squeezenet: squeezenet
.yolo: yolo, yolo_openimages

Loss

The loss function tells us how well the predictions of the model match the ground truth provided by the dataset. Several loss functions have been implemented.
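
For orientation, these are the textbook definitions of cross entropy and binary cross entropy for a single sample. The symbols are chosen for this note (y is the ground truth, \hat{y} the predicted probability, K the number of classes). Whether the implementations listed below expect probabilities or logits, and how they reduce over samples or pixels, is not specified here.

```latex
\mathcal{L}_{\mathrm{CE}}(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k
\qquad
\mathcal{L}_{\mathrm{BCE}}(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr]
```

The dense variants apply the same expressions at every pixel of the prediction.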

Available Losses

Standard

cross_entropy (y_true, y_pred)

No description yet

binary_cross_entropy (y_true, y_pred)

No description yet

prob_cross_entropy (y_pred, y_gt, with_mask=False)

No description yet

splitted_cross_entropy (y_pred, y_gt, split_ranges)

No description yet

binary_cross_entropy_masked (input, target, channel_weights, mask, class_mean=False)

No description yet

Dense

dense_cross_entropy (y_pred, y_gt, loss_weights=None, out_channel_weights=None)

Cross entropy for pixel-wise predictions.

dense_binary_cross_entropy (y_pred, y_gt, loss_weights=None, class_mean=False, use_mask=False, out_channel_weights=None, apply_sigmoid=False)

Binary cross entropy for pixel-wise predictions.

Datasets

Custom implementations for several datasets are provided. A list can be found below.

Writing a Dataset

Data Type

The data type describes the input and ground truth of the dataset. For image classification this would be image(input_image)->C(class). Names can be optionally provided in brackets.

super().__init__('image(input_image)->C(class)')

C: categorical
M: multi-label
denseC: pixel-wise categorical
denseM: pixel-wise multi-label
image: RGB image
video: RGB video
videoGS: greyscale video
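
To make the format of the data type string concrete, here is a small sketch that splits such a string into its input and ground-truth parts. The helper names parse_part and parse_data_type are hypothetical and not part of ava; the sketch only handles one input and one output, and it only knows the type tokens listed above.

```python
import re

# Type tokens listed in this section.
KNOWN_TYPES = {"C", "M", "denseC", "denseM", "image", "video", "videoGS"}

# A type token optionally followed by a name in brackets, e.g. "image(input_image)".
_PART = re.compile(r"^(\w+)(?:\((\w+)\))?$")


def parse_part(part):
    """Return (type, name) for one side of the arrow; name is None if omitted."""
    match = _PART.match(part.strip())
    if match is None or match.group(1) not in KNOWN_TYPES:
        raise ValueError(f"unknown data type: {part!r}")
    return match.group(1), match.group(2)


def parse_data_type(spec):
    """Split 'input->ground_truth' into two (type, name) pairs."""
    input_part, target_part = spec.split("->")
    return parse_part(input_part), parse_part(target_part)


parsed = parse_data_type("image(input_image)->C(class)")
unnamed = parse_data_type("video->M")
```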

Required Methods

__init__(subset, seed): Initialization of the dataset
- subset: Defines the subset.
- seed: Can be used to randomly generate different subsets. Useful for cross-validation.
__getitem__(idx):
- idx: index into the samples array, i.e. draws the idx-th sample.

Special Attributes

sample_ids: List of sample identifiers. Its length determines the dataset length. This attribute should be set in the __init__ method.
default_loss: Default loss function to be used.
default_metrics: List of default metrics.
model_config: Arguments to be passed to the model. E.g. number of classes
visualization_hooks: Dictionary of functions to apply on input and outputs. E.g. convert classification id back to name.
additional_visualizations: Additional visualizations, combining different inputs and outputs.
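
The required methods and attributes can be put together as in the following skeleton. This is a sketch under assumptions: ToyDataset is a hypothetical stand-alone class rather than a subclass of ava's dataset base class, the subset names 'train' and 'test' and the 8/2 split are made up, and the samples are placeholders instead of real images.

```python
import random


class ToyDataset:
    """Hypothetical dataset skeleton; not a subclass of ava's real base class."""

    def __init__(self, subset, seed=None):
        # A real dataset would first declare its data type, e.g.
        # super().__init__('image(input_image)->C(class)').
        all_ids = [f"sample_{i:03d}" for i in range(10)]
        # The seed controls the shuffle, so different seeds give different splits.
        random.Random(seed).shuffle(all_ids)
        # Assumed subset names: 'train' takes the first 8 ids, anything else the rest.
        self.sample_ids = all_ids[:8] if subset == "train" else all_ids[8:]
        # Arguments passed on to the model, here the number of classes.
        self.model_config = {"n_classes": 2}

    def __len__(self):
        # The dataset length follows from sample_ids.
        return len(self.sample_ids)

    def __getitem__(self, idx):
        sample_id = self.sample_ids[idx]
        # Placeholder input and label derived from the sample id.
        label = int(sample_id[-3:]) % 2
        return sample_id, label


train = ToyDataset("train", seed=0)
test = ToyDataset("test", seed=0)
```

Using the same seed for both subsets keeps them disjoint, while a different seed yields a different split, which is the basis for cross-validation.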

Available Datasets

Image Classification

MNIST (self, subset, factor=1, augmentation=None, resize_factor=None, seed=None)

No description yet

Cifar10 (self, subset, greyscale=False)

No description yet

Image Segmentation

ADE150 (self, subset, augmentation=False, image_size=(352, 352), image_maxbox=False, intermediate_size=None, bgr=False)

No description yet

Video

UCF101 (self, subset, augmentation=False, image_size=(100, 150), n_frames=5)

http://crcv.ucf.edu/data/UCF101.php

    Installation:
      mkdir UCF101; cd UCF101
      wget http://crcv.ucf.edu/data/UCF101/UCF101.rar
      unrar UCF101.rar
      wget http://crcv.ucf.edu/data/UCF101/UCF101TrainTestSplits-RecognitionTask.zip
      unzip UCF101TrainTestSplits-RecognitionTask.zip

SomethingFrames (self, subset, n_frames, resize_factor=None, image_size=(100, 150), crop_padding=20, early=None, cache=False, with_optical_flow=False, feature_extractor=None, no_avgpool=False, augmentation=False, multi_seq=None, fixed_frame=None, sampling=binned)

Frames of 20BN's SomethingSomething dataset.
`feature_extractor`: Extract features in the dataset such that they can be cached. Augmentation is not
possible in that case.

Transformations

This section lists common data transformations that can be used to build a Dataset object.

Available Transformations

Resize and Crop

tensor_resize (tensor, target_size_or_scale_factor, interpret_as_min_bound=False, interpret_as_max_bound=False, channel_dim=None, interpolation=bilinear, autoconvert=False, keep_channels_last=False)

Resizes a tensor (e.g. an image) along two dimensions while
    dimension `channel_dim` remains constant.

    Args:
        tensor: The input tensor
        target_size_or_scale_factor: Depending on the type...
         - a tuple (int, int) it specifies the target size.
           One dimension can be set to None if interpret_as_bound is not set.
         - a float it specifies the scale factor
        interpret_as_min_bound: TODO
        interpret_as_max_bound: TODO
        channel_dim: TODO
        interpolation: Used interpolation mode
        autoconvert: TODO
        keep_channels_last: TODO

    Returns:
        The resized tensor

    Best performance is obtained when `channel_dim` is 2.

resize_by_mode (img, mode, length)

Resizes the input image depending on the provided mode to length.

    Args:
        img: Input image
        mode: Used mode. Can be
            - max_side:
            - max_width:
            - max_height:
            - size:
        length: Length value to be used

    Returns:
        Resized image

pad_to_square (img)

Adds padding such that a square image is returned.

random_crop (tensor, target_size, image_dimensions=(0, 1))

Randomly samples a crop of size `target_size` from
`tensor` along `image_dimensions`

random_crop_slices (origin_size, target_size)

Gets slices of a random crop.
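
The idea behind a random crop expressed as slices can be sketched as follows. The function random_crop_slices_sketch and its rng parameter are hypothetical names for this illustration, not the library implementation; the sketch works on any number of dimensions and uses nested lists so that it has no dependencies.

```python
import random


def random_crop_slices_sketch(origin_size, target_size, rng=random):
    """Return one slice per dimension describing a random crop."""
    slices = []
    for origin, target in zip(origin_size, target_size):
        if target > origin:
            raise ValueError("target size exceeds origin size")
        # Any start in [0, origin - target] keeps the crop inside the tensor.
        start = rng.randint(0, origin - target)
        slices.append(slice(start, start + target))
    return tuple(slices)


# A 6x8 "image" stored as nested lists.
image = [[row * 10 + col for col in range(8)] for row in range(6)]
rows, cols = random_crop_slices_sketch((6, 8), (3, 4), rng=random.Random(0))
crop = [line[cols] for line in image[rows]]
```

Returning slices instead of the cropped data makes it possible to apply the same crop to several aligned tensors, e.g. an image and its segmentation mask.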

sample_random_patch (img_shape, target_size, importance_map=None, random_shift=True, rescale=None)

Takes an image `img` (HxWx3) and randomly samples a
    smaller (`target_size`) patch according to the probability density `importance_map`.
    If random_scaling is True, before cropping, the image is scaled.
    If `random_shift` is True, the patch is slightly shifted from the center.
    importance_map_size can speed up the sampling process by using a smaller map than the image.
    Returns:
        numpy slice object to be applied on the image dimensions
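
A simplified sketch of importance-weighted patch sampling is shown below. It is an assumption-laden illustration, not the library code: sample_patch_slices is a hypothetical name, scaling and random shifting are left out, the importance map is a nested list with the same height and width as the image, and the patch is assumed to fit inside the image.

```python
import random


def sample_patch_slices(img_shape, target_size, importance_map=None, rng=random):
    """Pick a patch centre (optionally weighted) and return (row, col) slices."""
    height, width = img_shape[:2]
    patch_h, patch_w = target_size
    positions = [(r, c) for r in range(height) for c in range(width)]
    weights = None
    if importance_map is not None:
        # Flatten the map in the same row-major order as `positions`.
        weights = [importance_map[r][c] for r, c in positions]
    center_r, center_c = rng.choices(positions, weights=weights, k=1)[0]
    # Clamp the top-left corner so the patch stays inside the image.
    top = min(max(center_r - patch_h // 2, 0), height - patch_h)
    left = min(max(center_c - patch_w // 2, 0), width - patch_w)
    return slice(top, top + patch_h), slice(left, left + patch_w)


# All probability mass sits on pixel (5, 5), so the patch must contain it.
importance = [[1.0 if (r, c) == (5, 5) else 0.0 for c in range(8)] for r in range(8)]
rows, cols = sample_patch_slices((8, 8, 3), (4, 4), importance, rng=random.Random(1))
```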

Color

get_gamma_offset_lut (gamma, offset)

No description yet

apply_gamma_offset (img, gamma=None, offset=None, lut=None, channel_dim=2, keep_channels_last=False)

No description yet

add_noise (img, std, pos_noise=None, neg_noise=None)

Adds noise to an image in the range of 0-255.
Generating random numbers is very time-consuming. It can be avoided by providing a pre-computed noise template
from which a crop is sampled

randomize_color_hs (img, max_h_shift, max_s_shift, max_v_shift)

No description yet

rescale_intensities (img, data_type, in_intensity_range, out_intensity_range)

No description yet

Sampling

sample_binned (sequence, bins, bin_randomness=0, fill_gaps=None)

Divides the input sequence into `k` bins and draws a random sample from each bin such that the resulting sequence
has length `k`
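
The binning idea can be sketched in a few lines. This is a simplified illustration with a hypothetical function name; it ignores the bin_randomness and fill_gaps parameters and assumes the sequence has at least as many elements as bins.

```python
import random


def sample_binned_sketch(sequence, bins, rng=random):
    """Split `sequence` into `bins` contiguous bins and draw one item per bin."""
    n = len(sequence)
    if bins > n:
        raise ValueError("need at least one element per bin")
    samples = []
    for i in range(bins):
        # Integer bin edges that cover the sequence without gaps or overlap.
        start = i * n // bins
        end = (i + 1) * n // bins
        samples.append(sequence[rng.randrange(start, end)])
    return samples


frames = list(range(20))
picked = sample_binned_sketch(frames, bins=4, rng=random.Random(0))
```

Because one sample is drawn per bin, the result keeps the temporal order of the sequence while still varying between epochs.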

sample_equidistant (sequence, n_samples, offset=None)

Samples from sequence such that the intervals between samples are equal.
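
A minimal sketch of equidistant sampling, assuming an integer step and an offset that defaults to 0. The function name is hypothetical, and the actual function may treat offset=None differently (for example by choosing the offset at random).

```python
def sample_equidistant_sketch(sequence, n_samples, offset=0):
    """Take `n_samples` items separated by a constant step, starting at `offset`."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    # Largest integer step that keeps all samples inside the sequence.
    step = (len(sequence) - offset) // n_samples
    if step < 1:
        raise ValueError("sequence too short for the requested samples")
    return [sequence[offset + i * step] for i in range(n_samples)]


picked = sample_equidistant_sketch(list(range(10)), n_samples=5)
shifted = sample_equidistant_sketch(list(range(10)), n_samples=3, offset=1)
```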

sample_spaced (sequence, n_samples, min_dist)

Sample from `sequence` such that indices have a distance of at least `min_dist`.

    The problem looks like this:
    let "|" be shapes and md the minimal distance: a | md + b1 | md + b2 | c
    now a, b1,.. and c can be chosen freely as long as: n_frames - n_shapes * md = a + b1 + ... + c

    We have some base_positions. These are the smallest indices possible for the respective sample.
    E.g. if the spacing is 3, base positions are: 0, 3, 6, ...
    To this we add randomly sampled offsets which need to sum to the remaining free space.

    Here the offsets are cumulative, i.e. each offset is at least as large as the previous ones. By sampling from
    an arange and subsequent sorting, we obtain offsets that do not exceed the free space.
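
The base-position-plus-offset scheme described above can be sketched as follows. This is an illustration with a hypothetical function name; it draws the offsets with the standard library instead of sampling from an arange, and it allows equal offsets, so neighbouring samples may be exactly min_dist apart.

```python
import random


def sample_spaced_sketch(sequence, n_samples, min_dist, rng=random):
    """Draw `n_samples` items whose indices are at least `min_dist` apart."""
    # Space left over once every sample sits at its smallest possible index.
    free = len(sequence) - ((n_samples - 1) * min_dist + 1)
    if free < 0:
        raise ValueError("sequence too short for this spacing")
    # Sorted offsets are non-decreasing, so they never shrink a gap below min_dist.
    offsets = sorted(rng.randint(0, free) for _ in range(n_samples))
    # Base positions are 0, min_dist, 2 * min_dist, ...
    indices = [i * min_dist + off for i, off in enumerate(offsets)]
    return [sequence[i] for i in indices]


picked = sample_spaced_sketch(list(range(30)), n_samples=4, min_dist=3, rng=random.Random(0))
gaps = [b - a for a, b in zip(picked, picked[1:])]
```

The largest possible index is the last base position plus the full free space, which is exactly the last index of the sequence.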

sample_limited_repetitions (sequence, n_samples, max_repetitions)

No description yet

Python API

Components of ava can be called within python scripts.

To load a pre-trained model:

from ava.models import load_pretrained_model
model = load_pretrained_model('model_name')

It is also straightforward to use transformations:

from ava.transformations import tensor_resize, random_crop
img = tensor_resize(img, (100, 100), interpret_as_min_bound=True)
img = random_crop(img, (100, 100))

Command Line Interface

The command line interface of ava offers several options:

ava train <model> <dataset> <options>
ava score <model> <dataset> <options>
ava inspect <dataset>: check the dataset implementation
ava benchmark <dataset>: assess the speed and potential bottlenecks of datasets
ava experiment <experiment-definition>: run an experiment

Customized Datasets

You can define customized datasets and models outside the ava source code tree by placing them in Python modules ending with _dataset.py or _model.py in your project folder. Datasets and models defined this way can then be used in the CLI and in experiment files.

See the examples/sample_project folder for an example.
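
To illustrate the naming convention, the sketch below lists the files in a project folder that would qualify. It is not ava's discovery code: find_custom_modules and the file names are hypothetical, and whether ava also searches subfolders is not covered here.

```python
import tempfile
from pathlib import Path


def find_custom_modules(project_dir):
    """List Python files that follow the _dataset.py / _model.py naming convention."""
    root = Path(project_dir)
    datasets = sorted(p.name for p in root.glob("*_dataset.py"))
    models = sorted(p.name for p in root.glob("*_model.py"))
    return datasets, models


# Build a throwaway project folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("flowers_dataset.py", "tiny_model.py", "notes.txt"):
        (Path(tmp) / name).touch()
    found_datasets, found_models = find_custom_modules(tmp)
```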

Experiment Files

Experiments can be defined in YAML by specifying pairs of models and datasets as well as their parameters. You can find examples of experiment definitions in the experiments folder, e.g. run

ava experiment experiments/mnist.yaml

Important variables are:

common_train_args: arguments that hold for all model-dataset pairs for training (e.g. batch size)
common_test_args: arguments that hold for all model-dataset pairs for evaluation
configurations: A list of individual (model, dataset, parameter) tuples for training.
test_dataset: A list of datasets (with parameters) for evaluation.
individual_test_dataset: A list of individual evaluation datasets (with parameters) for each configuration.

The experiment can generate tables which can be directly used in tex files. These are the relevant parameters:

tex_transpose

For more parameters, see the example experiment files.

Preparation Tools

There are some useful preparation tools available using

ava tool <tool>

Here <tool> can be one of these:

rescale: rescale images
extract-features: extract features from images
extract-frames: extract frames from videos
list-files: writes an index of text files
validate-dataset: check if subsets overlap
clean-models: remove models
download-video: downloads video

Development

This folder and its subfolders contain modules that are relevant for the core functions of the library, e.g. training and scoring models.

General Structure

Ava is structured into several sub-packages.

core (see below)
datasets: Dataset definitions
metrics: Metrics to measure performance
models (see below)
scripts: Useful tools
third_party: Code that was not written by us
transformations: Data preprocessing routines

Core

This sub-package contains core functions, such as launching training and experiments. It comprises the following subpackages:

common:
data_types: Definitions of data types for visualization
training: Training logic
visualize: Framework for visualizing input data and predictions

Additionally these modules exist:

arguments: The argparse configuration for the CLI
benchmarks:
dataset:
experiment:
model:
plots:
logging: Logging to console, log file and visdom
score:
table:
visualize:

Model

The models directory contains implementations of several models.

Folders

video: Models for video classification.
dense: Models that predict densely, e.g. for semantic segmentation.
deprecated: Old models that should no longer be used. These models will not be available on the CLI.