Liwaiwai

Architecting The Future Of Supercomputing

  • August 23, 2023
  • Dean Marc

As chief architect and principal investigator for the Aurora supercomputer at Argonne National Laboratory in Illinois, Olivier Franza plays a leading role in bringing one of the most ambitious scientific instruments – not to mention the world’s largest GPU cluster – into existence.

Aurora is among the most anticipated and highly visible projects Intel has been a part of in recent memory – a bold bet on Intel’s entire system portfolio. The machine is expected to be the first supercomputer with a peak performance reaching 2 exaflops, or 2×10^18 floating point operations per second.


Partner with liwaiwai.com
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

That puts a bit of pressure on Franza, a 22-year Intel veteran who joined the Aurora project as system hardware architect in 2016, oversaw the pivot to a GPU-based machine and became chief architect in 2021.

“The chief architect is responsible for defining the overall system architecture of the supercomputer, according to the customer’s high-level requirements,” Franza explains. “There are fundamental ones like general performance metrics and power envelope, but also inherent features like RAS – reliability, availability, serviceability – that are essential to building a scalable system.”

His responsibilities also encompass the details of the system topology from a node to a rack to the complete system, including its networking fabric and storage components.

A Roadmap Pivot Opens Opportunity to Shape Future Products

When initial planning began for Aurora, a U.S. Department of Energy-sponsored system, the design consisted of a collection of Intel technologies. However, changes to Intel’s product roadmap, notably the end of the Xeon Phi and Omni-Path product families, required a restart. As Intel made plans to build data center GPUs, Franza became enmeshed in discussions on the design of the Intel® Data Center GPU Max Series (code-named Ponte Vecchio).

In this way, Aurora isn’t just a one-off system. Rather, it helped inform the Intel-wide strategy and product portfolio to address scale and performance at the highest level.

“We infused all the Aurora system-level requirements down to the components’ level,” Franza says.

The architecture and concept for the Intel® Xeon® CPU Max Series with high bandwidth memory, for instance, grew out of features of the Intel Xeon Phi platform, the first product to integrate an innovative memory architecture for high bandwidth and high capacity on package.

Additionally, the need for high performance drove further advances across all subsystems, from the compute blade’s thermo-mechanical solution to its dense physical integration, to storage.

“Intel ended up architecting a completely new storage concept, DAOS (distributed asynchronous object storage),” Franza says. DAOS is an open-source software ecosystem that enables high-speed storage on traditional hardware. “Aurora will be among the first systems to use it, and by far the largest.”

From Designing Components to Bolting Together Thousands of Systems

The Aurora project drove system-level thinking and broad collaboration across various business units inside Intel, as well as with Argonne scientists and with engineers at Hewlett Packard Enterprise, the project’s other main partner.

“Getting the whole team to align and deliver a machine like Aurora is, for many of us, a once-in-a-lifetime experience,” Franza says.

Although engineers installed the final blade in June, the project continues to keep Franza up at night as the system passes through the stages of testing, stabilization and validation at scale.

He provides guidance to a large team working on system bring-up, validation, stabilization, optimization and enablement of full-system performance workloads. The most notable of these workloads is the High Performance Linpack (HPL) benchmark, which determines the ranking of the world’s top systems on the biannual Top500 list.
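For readers unfamiliar with HPL: it times how long a system takes to solve a dense N×N linear system Ax = b by LU factorization, and converts that time into a rate using a fixed operation count of (2/3)N^3 + (3/2)N^2. The sketch below shows only that arithmetic; the function names and the problem size and run time in the usage are hypothetical and are not Aurora measurements. The 2-exaflop peak used for the efficiency figure is the expected value quoted above.

```python
def hpl_operation_count(n: int) -> float:
    """Fixed operation count HPL uses when reporting performance for an
    n x n system: (2/3) n^3 + (3/2) n^2 (LU factorization plus solves)."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_rate(n: int, seconds: float) -> float:
    """Achieved rate in FLOP/s (Rmax) for a run of size n lasting `seconds`."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return hpl_operation_count(n) / seconds


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak (Rpeak) actually sustained (Rmax)."""
    return rmax / rpeak


if __name__ == "__main__":
    # Hypothetical run parameters, chosen only to illustrate the arithmetic.
    n = 1_000_000
    seconds = 3_600.0
    rmax = hpl_rate(n, seconds)
    rpeak = 2e18  # expected peak of 2 exaflops, as stated in the article
    print(f"Rmax = {rmax:.3e} FLOP/s, efficiency = {efficiency(rmax, rpeak):.2%}")
```

Because the operation count grows as N^3, real HPL runs use the largest N that fits in memory so that computation dominates communication.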

Each morning, Franza joins the daily standup meeting to scrutinize the nightly runs on every single node and to map out a game plan for the next day’s work and beyond. Each afternoon, a daily closeout meeting summarizes progress and hurdles. The work never stops; the machine always runs.

“We have a step-by-step approach to methodically validate and stabilize at scale,” he explains. “You start with the blade, then move to the rack, then multiple racks, and you scale from there.”
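The blade-to-rack-to-system progression Franza describes can be pictured as a sequence of ever-larger validation stages, where a larger stage is attempted only after every smaller one passes. This is a minimal sketch under stated assumptions: the doubling schedule, the function names and the `run_check` callback are hypothetical illustrations, not Intel's actual procedure. The figure of 64 blades per rack is derived from the article's totals (10,624 blades across 166 racks).

```python
from typing import Callable, List, Optional


def validation_stages(total_racks: int, blades_per_rack: int) -> List[int]:
    """Hypothetical stage sizes in blades: one blade, one rack, then a
    doubling number of racks, always finishing with the full system."""
    stages = [1]
    racks = 1
    while racks < total_racks:
        stages.append(racks * blades_per_rack)
        racks *= 2
    stages.append(total_racks * blades_per_rack)
    return stages


def validate_at_scale(
    stages: List[int], run_check: Callable[[int], bool]
) -> Optional[int]:
    """Run `run_check` at each stage in order and return the first stage size
    (in blades) that fails, or None if all stages pass. Larger stages are not
    attempted until every smaller one has passed."""
    for blades in stages:
        if not run_check(blades):
            return blades
    return None


if __name__ == "__main__":
    # 166 racks of 64 blades each, derived from the article's totals.
    stages = validation_stages(total_racks=166, blades_per_rack=64)
    print(stages)

    # Hypothetical check that only succeeds up to 4,096 blades.
    first_failure = validate_at_scale(stages, lambda blades: blades <= 4096)
    print(f"first failing stage: {first_failure} blades")
```

Stopping at the first failing stage keeps debugging at the smallest scale that reproduces a problem, which is the point of scaling up step by step.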

Aurora is made up of 10,624 compute blades, boasting 63,744 Intel Max Series GPUs – more GPUs than any other system in the world – and 21,248 Intel Xeon Max CPUs across 166 racks.
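The totals above divide evenly, which suggests a regular layout: 6 GPUs and 2 CPUs per blade, and 64 blades per rack. The article gives only the totals, so this sketch assumes identical blades spread evenly across racks; `per_unit` is a hypothetical helper that refuses to report a ratio when the division is not exact.

```python
# Totals quoted in the article.
BLADES = 10_624
GPUS = 63_744
CPUS = 21_248
RACKS = 166


def per_unit(total: int, units: int) -> int:
    """Exact per-unit count; raises ValueError if the total doesn't divide evenly."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} is not evenly divisible by {units}")
    return quotient


gpus_per_blade = per_unit(GPUS, BLADES)    # 6
cpus_per_blade = per_unit(CPUS, BLADES)    # 2
blades_per_rack = per_unit(BLADES, RACKS)  # 64
gpus_per_cpu = per_unit(GPUS, CPUS)        # 3

print(f"{gpus_per_blade} GPUs and {cpus_per_blade} CPUs per blade, "
      f"{blades_per_rack} blades per rack")
```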

“It’s the size of four tennis courts, which sounds like a lot, right?” he says. “But it’s only when you actually go see it that you just realize the sheer magnitude of the project.”

Franza must ensure the vast system is stable, functional and performing. It’s a daunting task, but the end is within reach.

“Walking through the aisles, with all the lights on, and feeling that the machine is running is impressive and obviously extremely rewarding,” he says. “It’s a very tangible achievement that speaks for itself.”

A ‘Once-in-a-Lifetime’ Effort, a Science-Shaping Supercomputer

What keeps him going, through engineering hurdles and unexpected roadblocks, is the opportunity to build “an extraordinary machine” that will power impactful research. He cites Aurora’s enormous potential for cancer research as an area where the project will benefit us all.

“I think that’s something that is going to make us very proud,” he says.

Not only will Aurora work on solving some of the most complex scientific and engineering problems in the world, it will also be an ideal platform for running generative AI and applying it to research. “It will enable one of the biggest large language models planned to date, the 1 trillion parameter Aurora GenAI project, enhancing, enabling and easing the lives of scientists,” Franza says.

But it’s the teamwork and camaraderie he enjoys more than anything else.

“It’s an extended effort, and it requires a lot of perseverance,” he says. “The core team has maintained a marathon mentality where it’s not over until it’s over. We needed the kind of people that can effectively focus for a long time on something immensely challenging. And in the end, the accomplishment is something that very few can say they have achieved.”

Source: cyberpogo.com


Dean Marc

Part of the more nomadic tribe of humanity, Dean believes a boat anchored ashore, while safe, is a tragedy, as this denies the boat its purpose. Dean normally works as a strategist, advisor, operator, mentor, coder, and janitor for several technology companies, open-source communities, and startups. Otherwise, he's on a hunt for some good bean or leaf to enjoy a good read on some newly (re)discovered city or walking roads less taken with his little one.
