SmartSuspend

SmartSuspend provides batch job preemption capability to maximize data center efficiency.

Technology

Data center administrators in a batch-computing environment must balance multiple goals: improving job throughput, meeting service level agreements, and maximizing resource utilization. These goals often conflict with each other, forcing a compromise in one or more areas.

Evergrid's SmartSuspend provides an innovative solution to meet these seemingly conflicting requirements. SmartSuspend enables a running job to be safely preempted by a higher priority job. It achieves this by suspending the current lower priority job and relinquishing both its memory and its license (if applicable), so that a set amount of system resources on the server node can be dedicated to the higher priority job. When the higher priority job completes (assuming no other higher priority jobs are waiting), those resources are automatically reassigned to the lower priority job, which continues from where it left off. Because the suspended job keeps its progress, no compute cycles are lost, which increases job throughput while maximizing server utilization.

How SmartSuspend Works

For example, SmartSuspend can suspend two running jobs when a higher priority job needs immediate access to a set amount of server resources. When the high priority job completes, the two suspended jobs resume automatically from the point at which they were suspended.
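The suspend-and-resume flow described above can be sketched as a small simulation. This is only a toy model of the scheduling behavior, not Evergrid's implementation: the `Job` and `Node` classes, the step counter, and the convention that a larger number means higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int        # assumption: larger number = higher priority
    total_steps: int     # hypothetical unit of work
    done_steps: int = 0  # progress is kept while the job is suspended

    @property
    def finished(self) -> bool:
        return self.done_steps >= self.total_steps


@dataclass
class Node:
    """Toy model of a server node that runs one job at a time."""
    running: Optional[Job] = None
    waiting: List[Job] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        if self.running is None:
            self.running = job
            self.log.append(f"start {job.name}")
        elif job.priority > self.running.priority:
            # Preempt: the lower priority job keeps its progress and waits.
            low = self.running
            self.log.append(f"suspend {low.name} at step {low.done_steps}")
            self.waiting.append(low)
            self.running = job
            self.log.append(f"start {job.name}")
        else:
            self.waiting.append(job)

    def tick(self) -> None:
        """Advance the running job one step; on completion pick the next job."""
        if self.running is None:
            return
        self.running.done_steps += 1
        if self.running.finished:
            self.log.append(f"finish {self.running.name}")
            self.running = None
            if self.waiting:
                nxt = max(self.waiting, key=lambda j: j.priority)
                self.waiting.remove(nxt)
                self.running = nxt
                verb = "resume" if nxt.done_steps > 0 else "start"
                self.log.append(f"{verb} {nxt.name} at step {nxt.done_steps}")


def demo() -> List[str]:
    node = Node()
    node.submit(Job("low", priority=1, total_steps=5))
    node.tick()
    node.tick()                      # "low" has completed 2 of 5 steps
    node.submit(Job("high", priority=10, total_steps=2))
    while node.running is not None:  # run until the node is idle
        node.tick()
    return node.log


if __name__ == "__main__":
    print("\n".join(demo()))
```

In this sketch the lower priority job resumes at the step where it was suspended, so none of its earlier work is repeated.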

Features

Based on Evergrid's ground-breaking OS Abstraction Layer, SmartSuspend preempts a running job by suspending the application's CPU usage and paging out its memory, leaving only a minimal footprint while keeping sockets alive.

Traditional job suspend methods send SIGSTOP or SIGTSTP to the jobs. One challenge with this approach is that the suspended jobs still hold the system memory and the software licenses allocated to them. SmartSuspend relinquishes both the memory and the licenses (if applicable) so that the system resources on the server node can be dedicated to the higher priority job.

SmartSuspend can be used with popular queuing systems, such as Platform LSF or PBS Professional. Its simple command line interface enables integration with third-party queuing systems with minimal effort.

The command line utility can be used interchangeably with both parallel and serial applications, and can be run from anywhere on the network as long as ssh access is available.

SmartSuspend is a lightweight solution with less than 1% performance overhead, and it is completely transparent to both applications and the operating system.

SmartSuspend supports all serial applications, such as EDA applications, as well as CAE, CFD, and other MPI-based parallel applications.
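For comparison, the traditional signal-based suspend mentioned above can be reproduced in a few lines. This is a minimal sketch for POSIX systems only, and it shows the conventional SIGSTOP/SIGCONT approach, not how SmartSuspend works. The function name `stop_and_resume` and the sleeping child process are hypothetical stand-ins for a real batch job.

```python
import os
import signal
import subprocess
import sys
from typing import Tuple


def stop_and_resume() -> Tuple[bool, int]:
    """Stop a child process with SIGSTOP, then let it continue with SIGCONT."""
    # Stand-in for a batch job: a child process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not only exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)
        # While stopped, the process still exists and still owns its memory;
        # only its CPU usage has been paused.
        os.kill(child.pid, signal.SIGCONT)
    finally:
        child.kill()              # clean up the demo process
        returncode = child.wait()
    return was_stopped, returncode


if __name__ == "__main__":
    print(stop_and_resume())
```

A stopped process consumes no CPU time, but the kernel keeps its address space and any open connections, which is the limitation of signal-based suspension described above.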

System Requirements

Contact sales@evergrid.com or call 866.993.4743 to learn more about how SmartSuspend can add value in your environment.