Distributed Deep Learning

The following libraries provide the communication functions needed for distributed training, primarily allReduce and broadcast.
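To make the two collectives concrete, here is a minimal single-process sketch of their semantics. It only simulates the workers as Python lists; the function names `all_reduce_sum` and `broadcast` are illustrative and are not the API of any of the libraries listed below.

```python
from typing import List

Vector = List[float]


def all_reduce_sum(worker_values: List[Vector]) -> List[Vector]:
    """Simulate allReduce(sum): every worker receives the element-wise sum."""
    total = [sum(column) for column in zip(*worker_values)]
    # Each worker gets its own copy of the reduced result.
    return [list(total) for _ in worker_values]


def broadcast(worker_values: List[Vector], root: int = 0) -> List[Vector]:
    """Simulate broadcast: every worker receives a copy of the root's value."""
    root_value = worker_values[root]
    return [list(root_value) for _ in worker_values]


if __name__ == "__main__":
    # Three simulated workers, each holding a local gradient of length 2.
    gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(all_reduce_sum(gradients))
    print(broadcast(gradients, root=1))
```

In data-parallel training, allReduce is typically used to combine gradients across workers, and broadcast to distribute the initial model weights from one worker to all others.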

IBM Spectrum MPI: Classic tool for distributed computing. Still commonly used for distributed deep learning.
NVIDIA NCCL: NVIDIA’s GPU-to-GPU communication library. Since NCCL 2, inter-node communication is supported.
IBM DDL: Provides a topology-aware allReduce. It can divide communication optimally across hierarchies of fabrics and uses different communication protocols at different levels of the hierarchy. When WMLCE is installed, all related frameworks come with IBM DDL support: no additional software packages need to be compiled, and only the training scripts have to be modified to use the distributed deep learning APIs.
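The idea of splitting an allReduce across a hierarchy of fabrics can be illustrated with a small single-process simulation. This is not IBM DDL's actual algorithm, only the generic two-level pattern under the assumption of one fast fabric inside each node and one slower fabric between nodes; `hierarchical_all_reduce` and the nested-list layout are hypothetical names chosen for illustration.

```python
from typing import List

Vector = List[float]


def vector_sum(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def hierarchical_all_reduce(nodes: List[List[Vector]]) -> List[List[Vector]]:
    """Two-level allReduce(sum) over nodes[node_index][gpu_index]."""
    # Step 1: reduce inside each node over the fast local fabric.
    node_sums = [vector_sum(gpus) for gpus in nodes]
    # Step 2: allReduce between one leader per node over the slower network.
    global_sum = vector_sum(node_sums)
    # Step 3: broadcast the result inside each node to all local GPUs.
    return [[list(global_sum) for _ in gpus] for gpus in nodes]


if __name__ == "__main__":
    # Two simulated nodes with two GPUs each, one value per GPU.
    cluster = [[[1.0], [2.0]], [[3.0], [4.0]]]
    print(hierarchical_all_reduce(cluster))
```

Only one vector per node crosses the slower inter-node link in step 2, which is the motivation for organizing the communication by topology.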

The following integrations enable distributed training in deep learning frameworks on top of these communication libraries:

TensorFlow Distribution Strategies. Native TensorFlow distribution methods.
IBM DDL. Provides integrations into common frameworks, including a TensorFlow operator that integrates IBM DDL with TensorFlow, and a similar integration for PyTorch.
Horovod [Sergeev et al. 2018]. Provides integration libraries for common frameworks that enable distributed training on top of common communication libraries. IBM DDL can be used as a backend for Horovod.
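As an illustration of what such an integration looks like in a training script, here is a minimal sketch using Horovod's PyTorch bindings. The toy model, the synthetic data, and the learning-rate scaling are assumptions made for the example; the script has to be started with one process per GPU by a launcher, and which launcher to use depends on the backend and is not covered here.

```python
def train(epochs: int = 1) -> None:
    """Data-parallel training sketch; run one process per GPU."""
    # Imported here so this file can be loaded without Horovod installed.
    import torch
    import horovod.torch as hvd

    hvd.init()  # set up communication between the worker processes
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())  # one local GPU per process
    device = torch.device("cuda" if use_cuda else "cpu")

    model = torch.nn.Linear(10, 1).to(device)  # toy model (assumption)
    # Scaling the learning rate by the worker count is a common convention.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged across workers (allReduce).
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )
    # Send the initial weights from rank 0 to all workers (broadcast).
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)

    for _ in range(epochs):
        inputs = torch.randn(32, 10, device=device)  # synthetic batch
        targets = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        if hvd.rank() == 0:  # log from a single worker only
            print(f"loss: {loss.item():.4f}")


if __name__ == "__main__":
    train()
```

The training loop itself is unchanged from a single-GPU script; the distributed behavior comes from the wrapped optimizer and the initial broadcast.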

IBM DDL - Documentation and Tutorial:

IBM DDL integration with TensorFlow/Keras
IBM DDL integration with PyTorch
IBM DDL integration with Horovod
IBM DDL APIs for a better integration

Examples:

Keras/TensorFlow
PyTorch

How to get Horovod with DDL: follow the instructions below. The first three steps (adding the channel, creating the environment, and installing WMLCE) can be skipped if WMLCE is already installed.

Add ppc64le conda channel for WMLCE

conda config --prepend channels \
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

Create and activate a Conda virtual environment

conda create --name horovod python=3.6
conda activate horovod

Install WMLCE (TensorFlow, PyTorch, DDL, etc.)

conda install powerai

Install the packages to build Horovod

conda install gxx_linux-ppc64le=7.3.0 cffi cudatoolkit-dev

Install Horovod with DDL backend

HOROVOD_CUDA_HOME=$CONDA_PREFIX HOROVOD_GPU_ALLREDUCE=DDL pip install horovod --no-cache-dir

Or install Horovod with direct NCCL support (recommended for PyTorch)

HOROVOD_CUDA_HOME=$CONDA_PREFIX HOROVOD_NCCL_HOME=$CONDA_PREFIX HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL pip install horovod --no-cache-dir
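After either install command, it can be useful to check which backends the resulting Horovod build actually contains. The sketch below assumes the PyTorch bindings are installed and that the Horovod version exposes the `mpi_built`, `nccl_built`, and `ddl_built` helpers; since older releases may lack some of them, each one is looked up defensively and reported as None when missing. The function name `horovod_build_info` is hypothetical.

```python
from typing import Dict, Optional


def horovod_build_info() -> Optional[Dict[str, Optional[bool]]]:
    """Report which communication backends Horovod was compiled with.

    Returns None if Horovod (with PyTorch bindings) cannot be imported.
    """
    try:
        import horovod.torch as hvd
    except (ImportError, OSError):
        return None

    info: Dict[str, Optional[bool]] = {}
    for name in ("mpi_built", "nccl_built", "ddl_built"):
        check = getattr(hvd, name, None)  # helper may not exist in old versions
        info[name] = bool(check()) if callable(check) else None
    return info


if __name__ == "__main__":
    build = horovod_build_info()
    if build is None:
        print("Horovod is not importable in this environment.")
    else:
        for backend, available in build.items():
            print(f"{backend}: {available}")
```

With the DDL install, `ddl_built` would be expected to report True, and with the NCCL install, `nccl_built`; a False or None value suggests the build did not pick up the intended backend.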

The original IBM DDL paper can be found at: https://arxiv.org/pdf/1708.02188.pdf