Eztorch#

Introduction#

Eztorch is a library that makes training, validation, and testing in PyTorch easy, with a focus on image and video self-supervised representation learning and on evaluating the learned representations on downstream tasks.

It was first developed to factor out code during Julien Denize’s PhD thesis on self-supervised representation learning and its applications to image and video analysis, which led to several academic contributions.

This documentation was built to ease the use of the code.

How to Install#

To install this repository, you need a recent version of PyTorch (>= 2.0) and all of Eztorch's dependencies.

You can just launch the following commands:

cd eztorch
conda create -y -n eztorch
conda activate eztorch
conda install -y pip
conda install -y -c conda-forge libjpeg-turbo
pip install -e .
pip uninstall -y pillow
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

The -e argument is optional; it performs a development (editable) installation, so changes made in the repository take effect without reinstalling the package.

If you want a lighter installation that only installs the main dependencies, install the requirements listed in requirements_lite.txt and then launch the pip install.
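
To quickly check that the installation works, you can run a minimal sanity check from Python. This is only a sketch, not part of Eztorch itself; it merely assumes that PyTorch and the eztorch package are importable:

# Minimal sanity check of the installation (a sketch, not an official Eztorch script).
import torch
import eztorch  # only checks that the package is importable

print(torch.__version__)  # should report a version >= 2.0
print("Eztorch imported successfully")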

How to use#

  1. Read the tutorials on PyTorch Lightning and Hydra to make sure you understand these libraries.

  2. Take a look at the Eztorch documentation.

  3. Use the configs provided in eztorch/configs/run/ or make your own.

  4. Pass your config to the running scripts in the run/ folder.

Eztorch is a library; therefore, you can import its components from anywhere as long as your Python environment has Eztorch installed.

from eztorch.models.siamese import SCEModel

model = SCEModel(...)

Dependencies#

Eztorch relies on various libraries to handle different parts of the pipeline: why redo, probably worse, what people who know these problems best have already built?

Its main dependencies are:

  • PyTorch Lightning for easy setup of:

    • Data preparation through the datamodules

    • Models through the Lightning modules

    • Training, validation, and testing on various device types (CPU, GPU, TPU), with or without distributed training, through the trainer

  • Hydra to configure your various experiments (a short instantiation sketch follows this list):

    • Write configurations in Python or YAML

    • Enjoy hierarchical configuration

    • Let Hydra handle object instantiation

    • Speak the same language in Bash or Python to configure your jobs

  • Torchaug for efficient GPU and batched data augmentations, as a replacement for Torchvision where relevant.
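
For illustration, here is a minimal sketch of the instantiation pattern Hydra provides (hydra.utils.instantiate with a _target_ key). The configuration content below is hypothetical and not taken from Eztorch's config files:

import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hypothetical config: `_target_` tells Hydra which class to instantiate.
cfg = OmegaConf.create(
    {"optimizer": {"_target_": "torch.optim.SGD", "lr": 0.1, "momentum": 0.9}}
)

model = torch.nn.Linear(8, 2)
# Hydra builds the object from the config; extra arguments are passed at call time.
optimizer = instantiate(cfg.optimizer, params=model.parameters())
print(type(optimizer))  # torch.optim.SGD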

Among the more specific dependencies, we can cite:

  • Timm to instantiate image models (see the sketch after this list)

  • PyTorchVideo for the video pipeline:

    • Clip samplers to select one or multiple clips per video

    • Datasets with decoders to read videos

    • Specific transforms for videos

    • Models for videos
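
As an illustration of how timm image models are typically instantiated, here is a minimal sketch; the model name is just an example and not necessarily one used in Eztorch's configs:

import timm
import torch

# Build a standard image backbone; num_classes=0 removes the classification head
# so the forward pass returns pooled features.
backbone = timm.create_model("resnet50", pretrained=False, num_classes=0)

features = backbone(torch.randn(2, 3, 224, 224))
print(features.shape)  # torch.Size([2, 2048])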

How to contribute#

To contribute, follow this process:

  1. Open an issue if you find it necessary to discuss the changes with the maintainers.

  2. Check out a new branch.

  3. Make your modifications.

  4. Document your changes.

  5. Ask for your branch to be merged into main.

  6. Follow the merging process with the maintainers.

Issue#

If you find an error, have trouble making this work, or have any questions, please open an issue describing your problem.

License#

This project is under the CeCILL 2.1 license.