Researchers Optimize Codes for Polaris at Latest Hackathon


Nov. 8, 2022 – The powerful computing resources that drive innovation in scientific research evolve rapidly, with new hardware and software technologies emerging constantly.

The ALCF-NVIDIA GPU Hackathon hosted a total of 11 teams to help them get their applications running efficiently on high-performance computing machines such as Polaris. Credit: Argonne.

To help keep the user community apprised of such developments, the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science user facility at DOE’s Argonne National Laboratory, hosts several annual events to train researchers on how to best use various software, systems, and machines to further explore the possibilities of science.

This year, the ALCF, in collaboration with NVIDIA and the OpenACC Organization, hosted a multi-day virtual hackathon, the first event with access to Argonne’s new Polaris system, an HPE Apollo Gen10+ machine equipped with NVIDIA A100 Tensor Core GPUs (graphics processing units) and AMD EPYC processors.

The hackathon is designed to help teams of three to six developers accelerate their codes on ALCF resources using a portable programming model or an AI framework of their choice. Each team is assigned mentors from ALCF and NVIDIA, who draw on their expertise and experience to guide participants in porting their code to GPUs and optimizing its performance.

A total of 11 teams participated this year, researching a wide array of topics including black hole imaging, fusion plasma dynamics, and the development of synthetic genes that can help predict the viral escape of SARS-CoV-2 genomes. With access to Polaris, these teams were able to optimize their codes on the ALCF’s largest GPU-powered system to date.

The Polaris software environment is equipped with the HPE Cray Programming Environment, HPE Performance Cluster Manager (HPCM) system software, and the ability to test programming models, such as OpenMP and SYCL, that will be available on Aurora and the next generation of DOE’s high-performance computing (HPC) systems. Those who used Polaris this year also benefited from the NVIDIA HPC Software Development Kit (SDK), a suite of compilers, libraries, and tools that support GPU acceleration of HPC modeling and simulation applications.

However, the users were not the only ones who benefited from using Polaris during the hackathon. By stress testing the system and identifying software issues, they provided information that helped ALCF staff improve the software environment on Polaris ahead of its deployment to the broader HPC community. The attendees also asked many questions about how to use the new system for tasks such as submitting jobs, compiling codes, and using performance and debugging tools, which in turn helped ALCF staff improve the support documentation.

“While we strive to have all the hardware and software kinks worked out of a new system before opening it up, there are unfortunately always some issues that new users will experience on a system, given the great variety of workloads we support at ALCF,” says Chris Knight, ALCF computational scientist. “Opening Polaris access during the hackathon to this group of application developers, spanning simulation, data, and learning workloads, where they could work directly with ALCF staff to resolve issues, greatly improved the initial user experience for the rest of the community when they gained access.”

Argonne’s Brian Homerding gives an overview of Polaris hardware at the hackathon. Credit: Argonne.

For the Black Hole Hunter team from the Center for Astrophysics | Harvard & Smithsonian, the hackathon offered an opportunity to advance the development of their GPU-based application for processing data observed by the next-generation Event Horizon Telescope (ngEHT) to reconstruct black hole images. Working with their mentors and ALCF computing resources, the researchers set out to increase the computational efficiency of their GPU kernel functions and to improve end-to-end throughput by tuning input/output (I/O).

The team learned how to use the NVIDIA Nsight Systems tool to analyze the performance of each component of their application, as well as the importance of careful profiling to isolate and address performance bottlenecks. They discovered that their application spent much more time on memory copies than on GPU computation, indicating a need to increase the concurrency of the two processes. After removing redundant time-monitoring code from their GPU module, the team reduced the time consumed per data block from 5 milliseconds to 1.4 milliseconds, roughly a 3.6-fold reduction.
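
As a rough illustration of why the balance between copies and computation matters, the sketch below is a simplified timing model, not the team’s code or an Nsight Systems measurement. It assumes each data block needs one host-to-GPU copy followed by one kernel, that one copy engine and one compute engine can work at the same time, and that staging buffers are unlimited; the function names and timings are hypothetical.

```python
def serial_time(n_blocks, copy_ms, compute_ms):
    """Total time when each block is copied, then computed, with no overlap."""
    return n_blocks * (copy_ms + compute_ms)


def overlapped_time(n_blocks, copy_ms, compute_ms):
    """Total time for a two-stage pipeline (one copy engine, one compute engine).

    Copies are issued back to back; a block's kernel starts once its own copy
    has finished and the previous block's kernel has finished.
    """
    copy_done = 0.0     # time at which the most recent copy completes
    compute_done = 0.0  # time at which the most recent kernel completes
    for _ in range(n_blocks):
        copy_done += copy_ms
        compute_done = max(copy_done, compute_done) + compute_ms
    return compute_done


print(serial_time(4, 3.0, 1.0))      # 16.0
print(overlapped_time(4, 3.0, 1.0))  # 13.0
```

With hypothetical timings of 3 ms of copying and 1 ms of computing per block, four blocks take 16 ms serialized but 13 ms overlapped. Overlap hides the compute time, yet the slower stage (here the copies) still sets the pace, which is why profiling to find the dominant stage comes before deciding where to optimize.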

“Our mentors gave us very valuable suggestions and advice on how to optimize the modules,” says Wei Yu, a member of the Black Hole Hunter team. “We made excellent connections with our ALCF mentors and the entire hackathon team at Argonne and NVIDIA, as well as the other participants.”

The team plans to further optimize their application, with their hackathon mentors helping out as advisors along the way.

Another team attended the hackathon to continue their work on improving the performance of the Nek5000/NekRS computational fluid dynamics code on various advanced architectures. The team, made up of researchers from Argonne and the University of Illinois, was able to scale to all of Polaris, demonstrating 80% strong-scaling efficiency at 3 million points per GPU.
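
Strong-scaling efficiency is commonly defined as the achieved speedup divided by the ideal speedup when a fixed-size problem is spread over more GPUs. The sketch below computes it under that standard definition. The article does not give the team’s baseline GPU count or runtimes, so the function name and the numbers used are hypothetical.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Achieved speedup divided by ideal speedup for a fixed-size problem.

    t_ref: runtime on the baseline number of GPUs, n_ref
    t_n:   runtime on the larger number of GPUs, n
    """
    if min(t_ref, n_ref, t_n, n) <= 0:
        raise ValueError("runtimes and GPU counts must be positive")
    actual_speedup = t_ref / t_n  # how much faster the larger run is
    ideal_speedup = n / n_ref     # perfect scaling with the GPU count
    return actual_speedup / ideal_speedup


print(strong_scaling_efficiency(100.0, 8, 12.5, 80))  # 0.8
```

For example, a hypothetical run taking 100 s on 8 GPUs and 12.5 s on 80 GPUs achieves an 8x speedup against an ideal 10x, or 80% efficiency. Because the total problem size stays fixed in a strong-scaling study, the work per GPU shrinks as GPUs are added, so a figure such as 3 million points per GPU presumably describes the per-GPU load at the scale where the efficiency was measured.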

At scale, the team made algorithmic improvements that led to a 10% improvement in time-to-solution. The researchers also demonstrated that running NekRS on Polaris reduced time-to-solution for a Nek5000 benchmark problem for the U.S. Nuclear Regulatory Commission by an order of magnitude compared with the ALCF’s Theta system.

Stay tuned to the ALCF events webpage for details on upcoming facility workshops and training events.

About Argonne

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.


Supply: Logan Ludwig, ALCF
