Introduction

What is Ava ML?

Ava ML is a toolkit that simplifies supervised training of neural networks for research purposes. It is designed to facilitate many tasks common in applying deep neural networks.

Features of Ava ML include:

Note: this is an early preview version and is not meant to be used in production.

Installation

To install ava, first clone the git repository using

git clone https://gitlab.gwdg.de/cns-group-public/ava-ml

The package can then be installed with:

python3 setup.py develop --user

The setup will create a folder ~/.ava containing the file paths.yaml.

The path to each individual dataset can also be set, e.g. by adding a line:

MNIST: /path/to/mnist
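
As a rough illustration of this format, the sketch below reads flat NAME: /path entries like the one above into a dictionary. The helper name read_dataset_paths is hypothetical and not part of ava. It only handles simple one-line entries, so a real YAML parser would be the more robust choice.

```python
def read_dataset_paths(text):
    """Parse flat 'NAME: /path' lines into a dict (hypothetical helper, not ava's API)."""
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Split at the first colon only, so colons inside the path survive.
        name, sep, value = line.partition(':')
        if not sep:
            continue  # not a 'key: value' line
        paths[name.strip()] = value.strip()
    return paths


example = "# dataset locations\nMNIST: /path/to/mnist\n"
dataset_paths = read_dataset_paths(example)
print(dataset_paths['MNIST'])
```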

Next Steps

Here are a few ideas of what to do next:

Features

Training and Scoring

We differentiate two phases: training and scoring (or evaluation). Training will result in a model and a model-args file being created that encode the obtained parameters (weights).

Models, Datasets and Metrics

Models are trained and evaluated on datasets using a metric. All three components are defined separately and can be exchanged almost freely (given they are compatible). This means that once an image classification model is designed, it can be trained on all image classification datasets.

Feature Extractors

These are re-usable components of the model.

Transformations

Common operations such as image resizing.

Inspection

Use ava inspect <dataset> <dataset-args> to explore datasets.

Caching

Training and evaluation are expensive. Hence, many intermediate results are cached.


Models

Writing a Model

Required Methods

Baselines

Baselines are special models that are not trained but have direct access to the Dataset. A model marks itself as a baseline by returning True from the method is_baseline. With access to the dataset, a baseline can, for example, compute the mode of the label distribution.
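
The idea can be sketched in plain Python. Only the method name is_baseline comes from the text above. The class name ModeBaseline, its constructor and the predict method are assumptions made for illustration and do not reflect ava's actual model interface.

```python
from collections import Counter


class ModeBaseline:
    """Hypothetical baseline that always predicts the most frequent label."""

    def __init__(self, labels):
        # A baseline may look at the dataset directly; here only the labels are needed.
        self.mode = Counter(labels).most_common(1)[0][0]

    def is_baseline(self):
        # Signals that this model is not trained.
        return True

    def predict(self, sample):
        # The input is ignored: every sample receives the most frequent label.
        return self.mode


baseline = ModeBaseline(labels=[3, 1, 3, 7, 3, 1])
prediction = baseline.predict(sample=None)
```

Such a model gives a lower bound that any trained classifier on the same dataset should beat.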

Available Models

Classification

ImageFEClassifier (self, base, multiclass=None, pretrained=False, n_classes=10, freeze_level=None, pre_pool_out=False, ignore_input=False, **kwargs)
Base class that turns feature extractors into classifiers.

    Args:
        - base: name of the feature extractor the classifier is based on
        for other arguments see ImageClassifierBase

    Returns (forward method):
        - either Vector of n_classes or Matrix (n_classes x multiclass)

Segmentation

ResNetSegmentation (self, base_network, decoder_shape='m', out_channels=10, init_state_dict=None, pretrained=False, multipredict=False, thresholds=None, pyramid=False, dropout=None, small_resnet=False)
`decoder_shape` defines different factors for the number of maps in the decoder layers.
PSPNet (self, out_channels=10, decoder_shape='m', dropout=None, base='drn105', pretrained=False, with_skip=False)
Custom implementation of PSPNet. It uses neither auxiliary outputs for the loss nor synchronized batch norm.

Video

Recurrent (self, n_classes, base='resnet50', multiclass=None, encoder_freeze=None, hop_sizes=(1,), resize_input=False, hidden_size=512, dropout=None, tpp=None, lstm_layers=1, no_pretrain=False, ignore_input=False, input_channels=3, feature_extraction_size=None, rnn_type='lstm', output_hidden=False, vector_input=None, **kwargs)
No description yet
SpaceTime (self, n_classes, multiclass=None, input_channels=3, dense=1024, size='small', ignore_input=False, **kwargs)
Following the LTC model by Varol et al. (slightly modified)
SingleFrameFeature (self, n_classes, base='resnet50', dropout=None, ignore_image=False)
No description yet
MultiFramePooling (self, n_classes, base='resnet50', dropout=None, ignore_image=False, vector_input=None, input_channels=3, no_pretrain=False, encoder_freeze=None, feature_extraction_size=None, **kwargs)
No description yet

Blocks

Blocks are reusable components of neural networks.

Available Blocks

DatasetNormalize (self, in_range, mean, std, dims=4, color_dim=1)
No description yet

Feature Extraction

The feature extraction API provides access to several state-of-the-art feature extractors. For example, these can be combined with an image classifier (see ImageClassifier for details) or initialized in custom code using

from ava.models.feature_extraction import get_feature_extractor
fe = get_feature_extractor('resnet50')

Available FeatureExtractors

.dilated_resnet: drn38, drn54, drn105
.inception: bninception, xception, inception4, inception3
.lightweight: mnist_net, light_unet, light_convnet
.resnet: resnet18, resnet50, resnet101, resnet152
.se_resnet: se_resnet50, se_resnet101, se_resnext50, se_resnext101
.squeezenet: squeezenet
.yolo: yolo, yolo_openimages

Loss

The loss function tells us how well the predictions of the model match the ground truth provided by the dataset. Several loss functions have been implemented.
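
To make this concrete, here is a minimal sketch of cross entropy for a single sample in plain Python. It is not ava's cross_entropy implementation, whose expected inputs are not documented here. The function name cross_entropy_single is made up for this example, and probs is assumed to be an already normalized probability vector.

```python
import math


def cross_entropy_single(probs, target_index):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(probs[target_index])


# The true class is index 1 in both cases.
confident = cross_entropy_single([0.05, 0.90, 0.05], target_index=1)
uncertain = cross_entropy_single([0.40, 0.30, 0.30], target_index=1)
```

The loss is small when the model puts most of its probability mass on the true class and grows as that probability shrinks, so confident is lower than uncertain here.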

Available Losses

Standard

cross_entropy (y_true, y_pred)
No description yet
binary_cross_entropy (y_true, y_pred)
No description yet
prob_cross_entropy (y_pred, y_gt, with_mask=False)
No description yet
splitted_cross_entropy (y_pred, y_gt, split_ranges)
No description yet
binary_cross_entropy_masked (input, target, channel_weights, mask, class_mean=False)
No description yet

Dense

dense_cross_entropy (y_pred, y_gt, loss_weights=None, out_channel_weights=None)
Cross entropy for pixel-wise predictions.
dense_binary_cross_entropy (y_pred, y_gt, loss_weights=None, class_mean=False, use_mask=False, out_channel_weights=None, apply_sigmoid=False)
Binary cross entropy for pixel-wise predictions.

Datasets

Custom implementations for several datasets are provided. A list can be found below.

Writing a Dataset

Data Type

The data type describes the input and ground truth of the dataset. For image classification this would be image(input_image)->C(class). Names can be optionally provided in brackets.

super().__init__('image(input_image)->C(class)')
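
To show how such a data type string breaks down, the following sketch splits it into typed, optionally named entries. This is not ava's parser. The function names and the use of commas to separate several entries on one side are assumptions made for illustration.

```python
import re

# One entry looks like 'image(input_image)' or just 'image'.
ENTRY = re.compile(r'^\s*(?P<type>\w+)\s*(?:\((?P<name>\w+)\))?\s*$')


def parse_side(side):
    """Parse one side of the arrow into a list of (type, optional name)."""
    entries = []
    for part in side.split(','):  # comma-separated entries are an assumption
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f'cannot parse entry: {part!r}')
        entries.append((match.group('type'), match.group('name')))
    return entries


def parse_data_type(spec):
    """Split 'input->ground_truth' into two lists of (type, optional name)."""
    inputs, arrow, outputs = spec.partition('->')
    if not arrow:
        raise ValueError("expected '->' between input and ground truth")
    return parse_side(inputs), parse_side(outputs)


inputs, outputs = parse_data_type('image(input_image)->C(class)')
```

Here inputs holds the image entry named input_image and outputs holds the class entry named class. An entry without a name yields None in the second position.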

Required Methods

Special Attributes

Available Datasets

Image Classification

MNIST (self, subset, factor=1, augmentation=None, resize_factor=None, seed=None)
No description yet
Cifar10 (self, subset, greyscale=False)
No description yet

Image Segmentation

ADE150 (self, subset, augmentation=False, image_size=(352, 352), image_maxbox=False, intermediate_size=None, bgr=False)
No description yet

Video

UCF101 (self, subset, augmentation=False, image_size=(100, 150), n_frames=5)
http://crcv.ucf.edu/data/UCF101.php

    Installation:
      mkdir UCF101; cd UCF101
      wget http://crcv.ucf.edu/data/UCF101/UCF101.rar
      unrar UCF101.rar
      wget http://crcv.ucf.edu/data/UCF101/UCF101TrainTestSplits-RecognitionTask.zip
      unzip UCF101TrainTestSplits-RecognitionTask.zip
SomethingFrames (self, subset, n_frames, resize_factor=None, image_size=(100, 150), crop_padding=20, early=None, cache=False, with_optical_flow=False, feature_extractor=None, no_avgpool=False, augmentation=False, multi_seq=None, fixed_frame=None, sampling='binned')
Frames of 20BN's SomethingSomething dataset.
        `feature_extractor`: Extract features in the dataset such that they can be cached. Augmentation is not
        possible in that case.

Transformations

This section lists common data transformations that can be used to build a Dataset object.

Available Transformations

Resize and Crop

tensor_resize (tensor, target_size_or_scale_factor, interpret_as_min_bound=False, interpret_as_max_bound=False, channel_dim=None, interpolation='bilinear', autoconvert=False, keep_channels_last=False)
Resizes a tensor (e.g. an image) along two dimensions while
    dimension `channel_dim` remains constant.

    Args:
        tensor: The input tensor
        target_size_or_scale_factor: Depending on the type...
         - a tuple (int, int) it specifies the target size.
           One dimension can be set to None if neither interpret_as_min_bound nor interpret_as_max_bound is set.
         - a float it specifies the scale factor
        interpret_as_min_bound: TODO
        interpret_as_max_bound: TODO
        channel_dim: TODO
        interpolation: Used interpolation mode
        autoconvert: TODO
        keep_channels_last: TODO

    Returns:
        The resized tensor

    Best performance is obtained when `channel_dim` is 2.
resize_by_mode (img, mode, length)
Resizes the input image depending on the provided mode to length.

    Args:
        img: Input image
        mode: Used mode. Can be
            - max_side:
            - max_width:
            - max_height:
            - size:
        length: Length value to be used

    Returns:
        Resized image
pad_to_square (img)
Adds padding such that a square image is returned.
random_crop (tensor, target_size, image_dimensions=(0, 1))
Randomly samples a crop of size `target_size` from
    `tensor` along `image_dimensions`
random_crop_slices (origin_size, target_size)
Gets slices of a random crop.
sample_random_patch (img_shape, target_size, importance_map=None, random_shift=True, rescale=None)
Takes an image `img` (HxWx3)  and randomly samples a
    smaller (`target size`) patch according to the probability density `importance_map`.
    If random_scaling is True, before cropping, the image is scaled.
    If random shift is True, the patch is slightly shifted from the center.
    importance_map_size can speed up the sampling process by using a smaller map than the image.
    Returns:
        numpy slice object to be applied on the image dimensions
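
The idea behind random_crop_slices, returning slices rather than the cropped data, can be sketched as follows. This is an illustrative stand-in, not the library's implementation. The function name crop_slices and its rng parameter are assumptions, and a nested list stands in for an image array.

```python
import random


def crop_slices(origin_size, target_size, rng=random):
    """Return one slice per dimension describing a random crop (illustrative only)."""
    slices = []
    for origin, target in zip(origin_size, target_size):
        if target > origin:
            raise ValueError('target size must not exceed the original size')
        # Any start in [0, origin - target] keeps the crop inside the image.
        start = rng.randint(0, origin - target)
        slices.append(slice(start, start + target))
    return tuple(slices)


# A 6x8 "image" stored as nested lists; a seeded generator keeps the run repeatable.
image = [[r * 8 + c for c in range(8)] for r in range(6)]
rows, cols = crop_slices(origin_size=(6, 8), target_size=(4, 4), rng=random.Random(0))
crop = [row[cols] for row in image[rows]]
```

Returning slices lets the same crop window be applied to several aligned arrays, for example an image and its segmentation mask.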

Color

get_gamma_offset_lut (gamma, offset)
No description yet
apply_gamma_offset (img, gamma=None, offset=None, lut=None, channel_dim=2, keep_channels_last=False)
No description yet
add_noise (img, std, pos_noise=None, neg_noise=None)
Adds noise to an image in the range of 0-255.
    Generating random numbers is very time-consuming. It can be avoided by providing a pre-computed noise template
    from which a crop is sampled
randomize_color_hs (img, max_h_shift, max_s_shift, max_v_shift)
No description yet
rescale_intensities (img, data_type, in_intensity_range, out_intensity_range)
No description yet

Sampling

sample_binned (sequence, bins, bin_randomness=0, fill_gaps=None)
Divides the input sequence into `k` bins and draws a random sample from each bin such that the resulting sequence
    has length `k`
sample_equidistant (sequence, n_samples, offset=None)
Samples from sequence such that the intervals between samples are equal.
sample_spaced (sequence, n_samples, min_dist)
Sample from `sequence` such that indices have a distance of at least `min_dist`.

    The problem looks like this:
    let "|" be shapes and md the minimal distance: a | md + b1 | md + b2 | c
    now a, b1,.. and c can be chosen freely as long as: n_frames - n_shapes * md = a + b1 + ... + c

    We have some base_positions. These are the smallest indices possible for the respective sample.
    E.g. if the spacing is 3, base positions are: 0, 3, 6, ...
    To this we add randomly sampled offsets which need to sum to the remaining free space.

    Here the offsets are cumulative, i.e. each offset is larger than previous ones. By sampling from
    an arange and subsequent sorting, we obtain such offsets that do not exceed the free space.
sample_limited_repetitions (sequence, n_samples, max_repetitions)
No description yet
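
The base-position-plus-offset scheme described for sample_spaced, and the one-draw-per-bin idea of sample_binned, can be sketched in plain Python. These are not the library's implementations. The function names, the rng parameter and the choice to return indices instead of sequence elements are assumptions. The offsets here are drawn with replacement, so they are non-decreasing rather than strictly increasing.

```python
import random


def spaced_indices(n_items, n_samples, min_dist, rng=random):
    """Pick n_samples sorted indices that are at least min_dist apart."""
    if n_samples < 1:
        raise ValueError('n_samples must be at least 1')
    # Smallest possible index for each sample: 0, min_dist, 2 * min_dist, ...
    base = [i * min_dist for i in range(n_samples)]
    # Slots left over once the tightest possible layout is placed.
    free = (n_items - 1) - base[-1]
    if free < 0:
        raise ValueError('sequence too short for this spacing')
    # Sorted draws act as cumulative offsets: non-decreasing and never above `free`.
    offsets = sorted(rng.randint(0, free) for _ in range(n_samples))
    return [b + o for b, o in zip(base, offsets)]


def binned_indices(n_items, n_bins, rng=random):
    """Split range(n_items) into n_bins contiguous bins and draw one index per bin."""
    if not 1 <= n_bins <= n_items:
        raise ValueError('need between 1 and n_items bins')
    # Integer bin edges; every bin is non-empty because n_bins <= n_items.
    edges = [i * n_items // n_bins for i in range(n_bins + 1)]
    return [rng.randrange(lo, hi) for lo, hi in zip(edges, edges[1:])]


rng = random.Random(0)
spaced = spaced_indices(n_items=20, n_samples=4, min_dist=3, rng=rng)
binned = binned_indices(n_items=20, n_bins=5, rng=rng)
```

Because the sorted offsets never decrease, adding them to the base positions can only widen the gaps, so the minimum distance is preserved.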

Python API

Components of ava can be called within Python scripts.

To load a pre-trained model:

from ava.models import load_pretrained_model
model = load_pretrained_model('model_name')

It is also straightforward to use transformations:

from ava.transformations import tensor_resize, random_crop
img = tensor_resize(img, (100, 100), interpret_as_min_bound=True)
img = random_crop(img, (100, 100))

Command Line Interface

The command line interface of ava offers several options:

Customized Datasets

You can define custom datasets and models outside the ava source tree by placing them in Python modules whose names end with _dataset.py or _model.py in your project folder. Datasets and models defined this way can then be used in the CLI and in experiment files.

See the examples/sample_project folder for an example.
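
The naming convention can be illustrated with a short sketch that lists matching files in a folder. How ava itself scans the project folder is not described here. The helper find_custom_modules and the file names are hypothetical, and searching only the top level of the folder is an assumption.

```python
import tempfile
from pathlib import Path


def find_custom_modules(project_dir):
    """List files named *_dataset.py or *_model.py directly inside project_dir."""
    project = Path(project_dir)
    datasets = sorted(p.name for p in project.glob('*_dataset.py'))
    models = sorted(p.name for p in project.glob('*_model.py'))
    return datasets, models


# Demonstrate on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('digits_dataset.py', 'tiny_model.py', 'notes.txt'):
        (Path(tmp) / name).touch()
    datasets, models = find_custom_modules(tmp)
```

Files that do not follow the suffix convention, such as notes.txt above, are ignored.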


Experiment Files

Experiments can be defined in YAML by specifying pairs of models and datasets as well as their parameters. You can find examples of experiment definitions in the experiments folder, e.g. run

ava experiment experiments/mnist.yaml

Important variables are:

The experiment can generate tables which can be directly used in tex files. These are the relevant parameters:

For more parameters, see the example experiment files.


Preparation Tools

There are some useful preparation tools available using

ava tool <tool>

Here <tool> can be one of these:


Development

This folder and its subfolder contain modules that are relevant for the core functions of the library, e.g. training and scoring models.

General Structure

Ava is structured into several sub-packages.

Core

This sub-package contains core functions, such as those for launching training and experiments. It contains the following sub-packages:

Additionally these modules exist:

Model

The models directory contains implementations of several models.

Folders