Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
  • Data

Scaling Machine Learning Inference With NVIDIA Tensorrt And Google Dataflow

  • January 26, 2023
  • relay

A collaboration between Google Cloud and NVIDIA has enabled Apache Beam users to maximize the performance of ML models within their data processing pipelines, using NVIDIA TensorRT and NVIDIA GPUs alongside the new Apache Beam TensorRTEngineHandler.

The NVIDIA TensorRT SDK provides high-performance, neural network inference that lets developers optimize and deploy trained ML models on NVIDIA GPUs with the highest throughput and lowest latency, while preserving model prediction accuracy. TensorRT was specifically designed to support multiple classes of deep learning models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformer-based models.

Deploying and managing end-to-end ML inference pipelines while maximizing infrastructure utilization and minimizing total costs is a hard problem. Integrating ML models in a production data processing pipeline to extract insights requires addressing challenges associated with the three main workflow segments:

  1. Preprocess large volumes of raw data from multiple data sources to use as inputs to train ML models to “infer / predict” results, and then leverage the ML model outputs downstream for incorporation into business processes.
  2. Call ML models within data processing pipelines while supporting different inference use-cases: batch, streaming, ensemble models, remote inference, or local inference. Pipelines are not limited to a single model and often require an ensemble of models to produce the desired business outcomes.
  3. Optimize the performance of the ML models to deliver results within the application’s accuracy, throughput, and latency constraints. For pipelines that use complex, computate-intensive models for use-cases like NLP or that require multiple ML models together, the response time of these models often becomes a performance bottleneck. This can cause poor hardware utilization and requires more compute resources to deploy your pipelines in production, leading to potentially higher costs of operations.
Read More  Better Data For Better AI: New Speech Datasets And Benchmarks For Data

Google Cloud Dataflow is a fully managed runner for stream or batch processing pipelines written with Apache Beam. To enable developers to easily incorporate ML models in data processing pipelines, Dataflow recently announced support for Apache Beam’s generic machine learning prediction and inference transform, RunInference. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use models in production pipelines without needing lots of boilerplate code.

You can see an example of its usage with Apache Beam in the following code sample. Note that the engine_handler is passed as a configuration to the RunInference transform, which abstracts the user from the implementation details of running the model.

engine_handler = TensorRTEngineHandlerNumPy(
          min_batch_size=4,
          max_batch_size=4,
          engine_path=
          'gs://gcp_bucket/single_tensor_features_engine.trt')

pcoll = pipeline | beam.Create(SINGLE_FEATURE_EXAMPLES)
predictions = pcoll | RunInference(engine_handler)

Along with the Dataflow runner and the TensorRT engine, Apache Beam enables users to address the three main challenges. The Dataflow runner takes care of pre-processing data at scale, preparing the data for use as model input. Apache Beam’s single API for batch and streaming pipelines means that RunInference is automatically available for both use cases. Apache Beam’s ability to define complex multi-path pipelines also makes it easier to create pipelines that have multiple models. With TensorRT support, Dataflow now also has the ability to optimize the inference performance of models on NVIDIA GPUs.

For more details and samples to start using this feature today please have a look at the NVIDIA Technical Blog, “Simplifying and Accelerating Machine Learning Predictions in Apache Beam with NVIDIA TensorRT.” Documentation for RunInference can be found at the Apache Beam document site and for Dataflow docs.

Read More  Google Cloud Next 2019 | Think Big, Think Global

By: Reza Rokni (Senior Staff Developer Advocate Dataflow) and Ruichao Ren (Deep Learning Specialist)
Source: Google Cloud Blog

relay

Related Topics
  • Apache Beam
  • Google Cloud
  • Google Dataflow
  • Machine Learning
  • NVIDIA
  • TensorRT
You May Also Like
View Post
  • Data
  • Design
  • Engineering
  • Tools

Sumitovant More Than Doubles Its Research Output In Its Quest To Save Lives

  • March 21, 2023
View Post
  • Data
  • Platforms
  • Technology

How Osmo Is Digitizing Smell With Google Cloud AI Technology

  • March 20, 2023
View Post
  • Data
  • Engineering
  • Tools

Built With BigQuery: How Sift Delivers Fraud Detection Workflow Backtesting At Scale

  • March 20, 2023
View Post
  • Data

Understand And Trust Data With Dataplex Data Lineage

  • March 17, 2023
View Post
  • Big Data
  • Data

The Benefits And Core Processes Of Data Wrangling

  • March 17, 2023
View Post
  • Artificial Intelligence
  • Data
  • Machine Learning
  • Technology

ChatGPT: How To Prevent It Becoming A Nightmare For Professional Writers

  • March 16, 2023
View Post
  • Data
  • Engineering
  • Machine Learning

Sentiment Analysis With BigQuery ML

  • March 13, 2023
View Post
  • Artificial Intelligence
  • Data

Introducing Casual Conversations v2: A More Inclusive Dataset To measure Fairness

  • March 13, 2023
Stay Connected!
LATEST
  • 1
    6 ways Google AI Is Helping You Sleep Better
    • March 21, 2023
  • 2
    AI Could Make More Work For Us, Instead Of Simplifying Our Lives
    • March 21, 2023
  • 3
    Microsoft To Showcase Purpose-Built AI Infrastructure At NVIDIA GTC
    • March 21, 2023
  • 4
    The Next Generation Of AI For Developers And Google Workspace
    • March 21, 2023
  • 5
    Sumitovant More Than Doubles Its Research Output In Its Quest To Save Lives
    • March 21, 2023
  • 6
    How Osmo Is Digitizing Smell With Google Cloud AI Technology
    • March 20, 2023
  • 7
    Built With BigQuery: How Sift Delivers Fraud Detection Workflow Backtesting At Scale
    • March 20, 2023
  • 8
    Building The Most Open And Innovative AI Ecosystem
    • March 20, 2023
  • 9
    Understand And Trust Data With Dataplex Data Lineage
    • March 17, 2023
  • 10
    Limits To Computing: A Computer Scientist Explains Why Even In The Age Of AI, Some Problems Are Just Too Difficult
    • March 17, 2023

about
About
Hello World!

We are liwaiwai.com. Created by programmers for programmers.

Our site aims to provide materials, guides, programming how-tos, and resources relating to artificial intelligence, machine learning and the likes.

We would like to hear from you.

If you have any questions, enquiries or would like to sponsor content, kindly reach out to us at:

[email protected]

Live long & prosper!
Most Popular
  • 1
    The Benefits And Core Processes Of Data Wrangling
    • March 17, 2023
  • 2
    We Cannot Even Agree On Dates…
    • March 17, 2023
  • 3
    Financial Crisis: It’s A Game & We’re All Being Played
    • March 17, 2023
  • 4
    Using ML To Predict The Weather And Climate Risk
    • March 16, 2023
  • 5
    Google Is A Leader In The 2023 Gartner® Magic Quadrant™ For Enterprise Conversational AI Platforms
    • March 16, 2023
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
  • About

Input your search keywords and press Enter.