IBM Watson Machine Learning Community Edition (WML-CE) and Open Cognitive Environment (Open-CE)

Watson Machine Learning Community Edition is an IBM Cognitive Systems offering that is designed for the rapidly growing and quickly evolving AI category of deep learning. WML-CE and Open-CE bring together a suite of capabilities from the open source community and combine them into a single enterprise distribution of software that covers complete lifecycle management: installation and configuration; data ingest and preparation; building, optimizing, and training the model; inference; testing; and moving the model into production. WML-CE and Open-CE take advantage of a distributed architecture so that your teams can iterate quickly through the training cycle on more data and continuously improve the model over time.

WML-CE and Open-CE are designed for scale, with software optimized for both single-server and cluster deep learning training. They offer many optimizations that ease installation and management and can help accelerate performance:

  • Ready-to-use deep learning frameworks (TensorFlow, PyTorch, Caffe, Caffe2, ONNX, and Keras).

  • Powerful and scalable machine learning libraries (Snap ML and NVIDIA RAPIDS).

  • Distributed as prebuilt containers, or on demand through the Conda provisioning process.

  • Includes dependencies and libraries.

  • Easy updates: Code updates arrive from a repository.

  • Validated deep learning platform with each release.

  • Dedicated support teams for deep learning.

  • Designed for enterprise scale with multisystem cluster performance and large memory support.

  • Supported on GPU-accelerated IBM AC922 servers and on accelerated x86 architecture servers.

[1] Install Anaconda

All users on Satori will have two folders:

/home/<username>
/nobackup/users/<username>

If you choose to use your own Anaconda install, rather than the system-wide modules, please download and install Anaconda3 in /nobackup/users/<your-username>/anaconda3. The /nobackup disk partition has much more space than /home. In addition, all files in /home are backed up automatically, while the /nobackup partition is not backed up. Anaconda3 can be reinstalled at any time in less than 10 minutes, so no backup is needed.

cd /nobackup/users/$(whoami)
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-ppc64le.sh
sh Anaconda3-2022.05-Linux-ppc64le.sh -f -p /nobackup/users/$(whoami)/anaconda3
source ~/.bashrc
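To confirm which of the two locations has more free space before installing, a check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper names free_gib and report are illustrative, and the figures are filesystem-level free space, which may differ from any per-user quota.

```python
import shutil
from pathlib import Path


def free_gib(path):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def report(paths):
    """Map each existing path to its free space in GiB; missing paths are skipped."""
    return {str(p): round(free_gib(p), 1) for p in map(Path, paths) if p.exists()}


if __name__ == "__main__":
    # Assumption: the last component of the home directory is the Satori username.
    user = Path.home().name
    for path, gib in report([f"/home/{user}", f"/nobackup/users/{user}"]).items():
        print(f"{path}: {gib} GiB free")
```

On a machine where neither folder exists, the script prints nothing rather than failing.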

By default the Anaconda installer uses your home folder (under anaconda3); the -p option in the command above places it in /nobackup instead. All the WML-CE packages are installed in a sub-directory named after the virtual environment you choose (e.g. anaconda3/envs/wmlce-1.7.0).

You may also refer to the Using Anaconda page for steps on how to use the system-wide environment modules (suggested method).

[2] WML-CE and Open-CE: Setting up the software repository

The WML-CE and Open-CE MLDL packages are distributed as conda packages in an online conda repository. conda must be configured to give priority to installing packages from this channel.

Add the IBM WML-CE channel to the conda configuration by running the following command:

conda config --prepend channels \
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

Add the MIT Open-CE channel to the conda configuration by running the following command:

conda config --prepend channels \
https://opence.mit.edu

NOTE: Going forward, for new AI frameworks and related packages, the opence.mit.edu conda channel will be the preferred choice.

If you cannot find the packages you need through these channels, add the following Open-CE channel, which has newer versions of some packages such as PyTorch:

conda config --prepend channels https://ftp.osuosl.org/pub/open-ce/current
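Because each command prepends, the channel added last ends up first in the list, and conda gives higher priority to channels listed earlier. The following sketch models that ordering in plain Python. It does not call conda; the starting list of just "defaults" and the helper name prepend_channel are assumptions for illustration.

```python
def prepend_channel(channels, url):
    """Model `conda config --prepend channels URL`: URL moves to the front, no duplicates."""
    return [url] + [c for c in channels if c != url]


# Assumption: the configuration starts with only the "defaults" channel.
channels = ["defaults"]
for url in [
    "https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/",
    "https://opence.mit.edu",
    "https://ftp.osuosl.org/pub/open-ce/current",
]:
    channels = prepend_channel(channels, url)

# Highest priority first.
for rank, channel in enumerate(channels, start=1):
    print(rank, channel)
```

If all three commands are run in the order shown, the OSU Open-CE channel ranks highest. To keep opence.mit.edu on top, prepend it last.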

[4] WML-CE: Installing all frameworks at the same time

All the MLDL frameworks except RAPIDS packages can be installed at the same time by using the powerai meta-package. All the RAPIDS packages can be installed using the powerai-rapids meta-package.

conda install powerai

Additional packages can be installed, with the conda environment activated, by running the following command:

conda install <package name>

For example:

conda install tensorflow
conda install pytorch
conda install powerai-rapids
conda install dali
conda install apex

To find specific Python package versions, search with conda as in the examples below (specify the channel with -c if it is not already configured as described above):

conda search pytorch
conda search pytorch==1.7.1 -c https://opence.mit.edu
conda search 'pytorch>=1.6' -c https://opence.mit.edu

NOTE: During the conda install, the packages are downloaded from the internet and after downloading, the license agreement is presented. Read the license agreement and accept the terms and conditions to complete the install. If you decline the license agreement the packages are not installed. After you finish reading the license agreement, future installations can be automated to silently accept the license agreement by running the following command before running the conda install command:

export IBM_POWERAI_LICENSE_ACCEPT=yes

The license needs to be accepted only once per user.
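For scripted installs, the environment variable above can be set for a single conda call instead of being exported in the shell. The sketch below is one way to do that from Python. The helper names license_env and install are hypothetical, and the dry_run default means nothing is installed unless you ask for it.

```python
import os
import shutil
import subprocess


def license_env(base=None):
    """Copy an environment mapping and add the license-acceptance variable."""
    env = dict(os.environ if base is None else base)
    env["IBM_POWERAI_LICENSE_ACCEPT"] = "yes"
    return env


def install(packages, dry_run=True):
    """Build (and optionally run) a non-interactive `conda install` command."""
    cmd = ["conda", "install", "--yes", *packages]
    if dry_run or shutil.which("conda") is None:
        return cmd  # only report what would be run
    subprocess.run(cmd, env=license_env(), check=True)
    return cmd


print(" ".join(install(["powerai"])))
```

Read the license agreement once interactively before automating its acceptance this way.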

[5] WML-CE: Testing ML/DL frameworks (PyTorch, TensorFlow etc) installation

conda activate wmlce-1.7.0
python
  a. PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)
lin = nn.Linear(5, 3)  # maps from R^5 to R^3, parameters A, b
# data is 2x5.  A maps from 5 to 3... can we map "data" under A?
data = torch.randn(2, 5)
print(lin(data))  # yes
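The test above runs on the CPU. To also check whether PyTorch can see any GPUs, a short report such as the following can be used. It is a sketch: the function name torch_gpu_report is illustrative, and on a node without GPUs it is expected to report cuda_available as False.

```python
def torch_gpu_report():
    """Return basic facts about the PyTorch install, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "version": torch.__version__,
        "cuda_available": available,
        "gpu_count": torch.cuda.device_count() if available else 0,
    }


print(torch_gpu_report())
```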
  b. TensorFlow

from __future__ import print_function  # must be the first statement in a script; only needed on Python 2
import tensorflow as tf
# The two lines below enable TF1.x compatibility mode on TF2.x; do not use them with TF1.x
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
# Create a Constant op
# The op is added as a node to the default graph.
#
# The value returned by the constructor represents the output
# of the Constant op.
hello = tf.constant('Hello, TensorFlow!')
# Start tf session
sess = tf.Session()
# Run the op
print(sess.run(hello))

c. Caffe test with the LSF workload manager; this runs remotely on one of the available Satori compute nodes

cd ~/
conda install keras
wget https://raw.githubusercontent.com/mit-satori/getting-started/master/lsf-templates/template-caffetest-singlenode.lsf
bsub < template-caffetest-singlenode.lsf
bjobs    # check the job status; repeat until the job has finished
bpeek    # show the output of the running job so far; repeat as needed

The template-caffetest-singlenode.lsf file has the following content:

#BSUB -L /bin/bash
#BSUB -J "caffe-test"
#BSUB -o "caffe-test_o.%J"
#BSUB -e "caffe-test_e.%J"
#BSUB -n 4
#BSUB -R "span[ptile=4]"
#BSUB -gpu "num=4"
#BSUB -q "normal"
#BSUB -x

HOME2=/nobackup/users/$(whoami)
PYTHON_VIRTUAL_ENVIRONMENT=wmlce-1.7.0
CONDA_ROOT=$HOME2/anaconda3
source ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate $PYTHON_VIRTUAL_ENVIRONMENT

caffe-test
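To reuse this template for other commands or environments, the same file can be generated from a few parameters. The sketch below reproduces the directives shown above. The function name render_lsf and its parameters are hypothetical, and setting -n, ptile, and the GPU count to the same number is an assumption carried over from the template, where all three are 4.

```python
def render_lsf(job_name, command, env_name, gpus=4, queue="normal", exclusive=True):
    """Return the text of a single-node LSF job script modelled on the template above."""
    lines = [
        "#BSUB -L /bin/bash",
        f'#BSUB -J "{job_name}"',
        f'#BSUB -o "{job_name}_o.%J"',
        f'#BSUB -e "{job_name}_e.%J"',
        f"#BSUB -n {gpus}",
        f'#BSUB -R "span[ptile={gpus}]"',
        f'#BSUB -gpu "num={gpus}"',
        f'#BSUB -q "{queue}"',
    ]
    if exclusive:
        lines.append("#BSUB -x")  # request exclusive use of the node
    lines += [
        "",
        "HOME2=/nobackup/users/$(whoami)",
        f"PYTHON_VIRTUAL_ENVIRONMENT={env_name}",
        "CONDA_ROOT=$HOME2/anaconda3",
        "source ${CONDA_ROOT}/etc/profile.d/conda.sh",
        "conda activate $PYTHON_VIRTUAL_ENVIRONMENT",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


print(render_lsf("caffe-test", "caffe-test", "wmlce-1.7.0"))
```

The printed text can be saved to a file and submitted with bsub < <file name>, in the same way as the downloaded template.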

You can also try your own custom ML/DL code; if any libraries are missing, install them with:

conda install <package name>
pip install <package name>

If you don’t have any more errors you are ready to submit jobs on the compute nodes :)

Controlling WML-CE release packages

The conda installer uses a set of rules to determine which packages to install. Channel priorities and package versions are weighted heavily, but the installer also considers factors such as the number of packages that would need to be installed, whether any packages would need to be upgraded or removed, and so on.

The conda installer will sometimes come up with a surprising installation solution. It may prefer to install:

  • Packages from Anaconda channels over the WML CE channel in spite of channel priorities.

  • Packages from an older release of WML CE in spite of newer versions being available.

You can guide the conda installer to ensure that it chooses the desired WML CE package using the strict channel priority option and the powerai-release meta-package.

  1. Strict channel priority

The strict channel priority option forces the conda installer to give additional weight to the priority of channels defined in the configuration. It is useful in cases where the conda installer prefers packages from lower-priority channels. The simplest use is to add --strict-channel-priority to the install command:

conda install --strict-channel-priority tensorflow

You can check the priority of the channels in the configuration by running the following:

conda config --show
...
channel_priority: flexible
channels:
  - https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
  - defaults
...
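When checking several accounts or environments, the two relevant settings can be pulled out of that output programmatically. The following sketch parses text in the format shown above. The function name parse_conda_config is illustrative, and the parser only looks at the channel_priority and channels keys.

```python
def parse_conda_config(text):
    """Extract channel_priority and the ordered channel list from `conda config --show` text."""
    priority, channels, in_channels = None, [], False
    for line in text.splitlines():
        stripped = line.strip()
        if line.startswith("channel_priority:"):
            priority = line.split(":", 1)[1].strip()
            in_channels = False
        elif line.startswith("channels:"):
            in_channels = True
        elif in_channels and stripped.startswith("- "):
            channels.append(stripped[2:].strip())
        elif stripped and not line[0].isspace():
            in_channels = False  # a new top-level key ends the channels list
    return priority, channels


sample = """channel_priority: flexible
channels:
  - https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
  - defaults
"""
priority, channels = parse_conda_config(sample)
print(priority, channels)
```

The first entry in the returned list is the highest-priority channel.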

You could permanently change the channel priority setting to strict:

conda config --set channel_priority strict
  2. WML-CE release meta-package

The powerai-release meta-package can be used to specify the WML CE release you want to install from. It is useful when the installer prefers packages from an earlier release, or if you intentionally want to install packages from an older release. Examples:

(my-wmlce-env) $ conda install pytorch powerai-release=1.7.0
(my-wmlce-env) $ conda install pytorch powerai-release=1.6.2

The --strict-channel-priority option can be used with powerai-release for greater control:

conda install --strict-channel-priority pytorch powerai-release=1.7.0

Additional conda channels

The main WML CE conda channel is described above. That channel includes the formal, supported WML CE releases.

Additional conda channels are available to complement the main channel. Packages in these channels are not formally supported. Both of these channels are optional. WML CE will install and run fine without either. They can also be used independently of each other (Supplementary does not need Early Access or vice versa). Use them if you want the packages they provide and do not need formal support.

The WML CE Supplementary channel is available at: https://anaconda.org/powerai/.

This channel includes packages that are not part of WML CE, but which may be useful to WML CE users. The packages are built from recipes in the WML CE GitHub repository: https://github.com/ibm/powerai.

Problem reports and recipe contributions from the community are welcome. More information about the Supplementary channel can be found in the PowerAI Supplementary Channel README.

The WML-CE Early Access channel is available at: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/.

This channel is updated occasionally with the latest versions of various packages included in WML CE. The purpose of the channel is to make new versions of frameworks available in advance of formal WML CE releases. Packages published in the Early Access channel may not exactly match a later WML-CE release. For example, package and prerequisite versions may differ.

Packages in the Early Access channel might depend on packages in the main channel, so both channels might be needed in the conda config.

Example of getting EA WML-CE software:

conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/
conda create -n wmlce-ea python=3.7
conda activate wmlce-ea
conda install tensorflow

Alternative:

conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/
conda create -n wmlce-ea python=3.6
conda activate wmlce-ea
conda install tensorflow=2.1.0=gpu_py36_914.g4f6e601
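The last command in the alternative pins an exact build using the name=version=build form. The py36 part of the build string indicates a Python 3.6 build, which matches the python=3.6 environment created above. The sketch below splits such a spec into its parts. The helper name split_spec is illustrative, and it handles only this single "=" form, not specs such as pytorch==1.7.1 or 'pytorch>=1.6'.

```python
def split_spec(spec):
    """Split a `name=version=build` conda spec; missing parts become None."""
    parts = spec.split("=")
    name, version, build = (parts + [None, None])[:3]
    return {"name": name, "version": version, "build": build}


pinned = split_spec("tensorflow=2.1.0=gpu_py36_914.g4f6e601")
print(pinned)
```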

To test your TF2 code, you can use the Deep Convolutional Generative Adversarial Network (DCGAN) Jupyter notebook, or browse the TensorFlow tutorials at https://github.com/tensorflow/docs/tree/master/site/en/tutorials