A Tutorial on IBM LSF Scheduler with Examples

Introduction to IBM LSF Scheduler:

IBM Spectrum LSF (Load Sharing Facility) is a powerful workload management and job scheduling system designed to optimize the utilization of computing resources in high-performance computing (HPC) environments. LSF provides a robust set of features that enable efficient resource allocation, job scheduling, and management, making it an essential tool for large-scale parallel computing.

In this tutorial, we will explore the basics of using IBM LSF Scheduler, covering key concepts and command-line examples to help you get started with managing your HPC workload effectively.

  1. LSF Terminology:

  • Host: A computing resource available in the cluster, which can be a physical machine or a virtual machine.
  • Queue: A cluster-wide holding area for submitted jobs. Each queue carries its own scheduling policies (such as priority, run limits, and which users and hosts it may use), and jobs wait in it until LSF dispatches them to suitable hosts.
  • Job: A unit of work submitted to the scheduler, which consists of executable code and its resource requirements.
  • Job ID: A unique identifier assigned to each submitted job by the scheduler.
  • Job Submission Script: A script that defines job-specific settings and resource requirements through #BSUB directives, followed by the commands to run.
  • bsub: Command to submit a job to the LSF scheduler.

  2. Submitting a Simple Batch Job:

To submit a batch job, create a job submission script (e.g., my_job.sh) with the following content:

#!/bin/bash
#BSUB -J my_job_name
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -W 1:00

echo "Running my batch job on LSF!"

  • #BSUB -J my_job_name: Sets the job name.
  • #BSUB -n 4: Requests 4 job slots, which on most clusters means 4 CPU cores.
  • #BSUB -R "span[ptile=2]": Places 2 of those slots on each host, so the 4 slots are spread across 2 hosts.
  • #BSUB -W 1:00: Sets the run time limit to 1 hour (the format is [hours:]minutes).
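
The arithmetic behind -n, ptile, and -W can be checked with a small sketch. The helper names hosts_needed and walltime_minutes are made up for this illustration and are not part of LSF; the sketch also assumes a plain [hours:]minutes value with no host suffix.

```shell
#!/bin/sh
# hosts_needed: how many hosts a job spans for a given -n and ptile value.
# $1: slots requested with -n, $2: ptile value. Uses ceiling division.
hosts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# walltime_minutes: convert a -W value in [hours:]minutes form to minutes.
walltime_minutes() {
    echo "$1" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 + 0 }'
}

hosts_needed 4 2        # prints 2
walltime_minutes 1:00   # prints 60
```

For the script above, 4 slots at 2 per host gives 2 hosts, and 1:00 is a 60-minute limit.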

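Submit the job using the bsub command. The input redirection (<) matters: it makes bsub read the script from standard input and process the #BSUB directives inside it: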

bsub < my_job.sh
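
Later commands such as bjobs and bkill need the job ID, so it can help to capture it at submission time. The sketch below assumes bsub prints its usual confirmation message of the form "Job <12345> is submitted to queue <normal>."; the function name extract_job_id is made up for this example.

```shell
#!/bin/sh
# extract_job_id: read bsub's confirmation message on stdin and print
# the numeric job ID. Prints nothing if the message has another shape.
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Hypothetical usage on a cluster where bsub is available:
#   job_id=$(bsub < my_job.sh | extract_job_id)
#   bjobs "$job_id"
```

If your site customizes the submission message, the sed pattern would need adjusting.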

  3. Checking Job Status:

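Use bjobs to check the status of your jobs. With no arguments it lists your unfinished jobs (pending, running, and suspended); pass a job ID to look at a single job: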

bjobs
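
When many jobs are in flight, a per-state summary is often easier to read than the full table. This is a minimal sketch that assumes the default bjobs layout, where the first line is a header and the third column is STAT; count_by_state is a name invented for this example.

```shell
#!/bin/sh
# count_by_state: read default bjobs output on stdin and print one
# "STATE COUNT" line per job state, sorted by state name.
# Skips the header line and any continuation line with fewer than 3 fields.
count_by_state() {
    awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Hypothetical usage on a cluster where bjobs is available:
#   bjobs | count_by_state
```

If your site changes the default output format, the column number in the awk program would have to change with it.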

  4. Deleting a Job:

To cancel a job, whether it is still pending or already running, pass its job ID to bkill:

bkill JOB_ID

  5. Job Dependencies:

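You can set job dependencies using the -w option in the bsub command. The ended() condition is met when the named job finishes, whether it succeeded or failed; use done() instead if the second job should start only after a successful run. For example, to run "job2" once "job1" has finished: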

bsub -w "ended(job1)" < job2.sh
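
Dependency expressions can also name jobs by ID and combine several conditions with &&. The sketch below builds such an expression from a list of job IDs; dep_all_done is a helper invented for this example, and the IDs in the usage comment are placeholders.

```shell
#!/bin/sh
# dep_all_done: join the given job IDs into a dependency expression that
# is satisfied only when every listed job has finished successfully.
dep_all_done() {
    result=""
    for id in "$@"; do
        if [ -z "$result" ]; then
            result="done($id)"
        else
            result="$result && done($id)"
        fi
    done
    printf '%s\n' "$result"
}

# Hypothetical usage on a cluster where bsub is available:
#   bsub -w "$(dep_all_done 101 102)" < job3.sh
```

Referring to jobs by ID avoids ambiguity when several jobs share the same name.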
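
  6. Running Parallel Jobs:

LSF has no separate parallel environment object to define. A parallel job requests several slots with -n, controls how they are spread across hosts with span, and then starts the program through an MPI launcher such as mpiexec: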

#!/bin/bash
#BSUB -J my_parallel_job
#BSUB -n 8
#BSUB -R "span[ptile=4]"
#BSUB -W 2:00
#BSUB -q my_queue
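# "all" keeps the rest of the submission environment; naming only one variable would propagate just that variable
#BSUB -env "all, OMP_NUM_THREADS=2"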

mpiexec -n 8 ./my_mpi_program
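
Hard-coding the rank count in the mpiexec line means it must be kept in sync with -n by hand. One alternative, sketched here, is to derive it from the LSB_MCPU_HOSTS variable that LSF sets inside a running job, assuming its usual "host slots host slots ..." layout. The function name total_slots is made up for this example.

```shell
#!/bin/sh
# total_slots: sum the slot counts in a string shaped like LSB_MCPU_HOSTS,
# for example "hostA 4 hostB 4".
total_slots() {
    # Split the argument into positional parameters: host count host count ...
    set -- $1
    total=0
    while [ "$#" -ge 2 ]; do
        total=$((total + $2))
        shift 2
    done
    echo "$total"
}

total_slots "hostA 4 hostB 4"   # prints 8

# Hypothetical usage inside a job script:
#   mpiexec -n "$(total_slots "$LSB_MCPU_HOSTS")" ./my_mpi_program
```

Many MPI builds integrated with LSF pick up the allocation on their own, so check your site's documentation before relying on this.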

  7. Job Arrays:

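Job arrays let you submit many similar jobs with a single bsub call. The index range goes in square brackets after the job name, and each element reads its own index from the LSB_JOBINDEX environment variable: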

#!/bin/bash
#BSUB -J "my_array_job[1-10]"
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -W 0:30

echo "Running array job ${LSB_JOBINDEX} on LSF!"

Submit the job array:

bsub < my_array_job.sh
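
A common use of the array index is to give each element a different input. The sketch below picks the line of a parameter file that matches the index; param_for_index and the file name params.txt are assumptions made for this illustration, not part of LSF.

```shell
#!/bin/sh
# param_for_index: print the line of a parameter file that corresponds
# to a 1-based array index.
# $1: path to the parameter file, $2: index (e.g. the value of LSB_JOBINDEX)
param_for_index() {
    sed -n "${2}p" "$1"
}

# Hypothetical usage inside the array job script, with one parameter per line:
#   param=$(param_for_index params.txt "$LSB_JOBINDEX")
#   ./my_program "$param"
```

With a 10-line params.txt, the array declared as [1-10] would process one line per element.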

Conclusion:

IBM LSF Scheduler is a feature-rich tool for managing HPC workloads in large-scale environments. This tutorial introduced key LSF concepts and showed how to submit batch jobs, check and cancel them, set job dependencies, run parallel jobs, and use job arrays. With this knowledge, you can use IBM LSF to make better use of cluster resources and streamline your high-performance computing tasks.
