
Shorten The Path To Insights With Aiven For Apache Kafka And Google BigQuery

  • March 9, 2023
  • relay

Every company aims to be data driven, but getting accurate data in front of the right stakeholders in a timely manner can be complex. The challenge grows when the source data resides in different technologies, each with its own access interfaces and data formats.

This is where the combination of Aiven for Apache Kafka® and Google BigQuery excels: it can source data from a wide ecosystem of tools and push it, in streaming mode, to BigQuery, where datasets can be organized, manipulated and queried.

From data sources to Apache Kafka with Kafka Connect

Alongside Apache Kafka, Aiven offers the ability to create a managed Kafka Connect cluster. The 30+ available connectors integrate Kafka with a wide range of technologies as both source and sink, each defined by a JSON configuration file. Moreover, if the connector for a particular technology isn’t in the list, integrating with a self-managed Kafka Connect cluster provides complete freedom of connector selection while keeping the benefit of the fully managed Apache Kafka cluster.

If the data source is a database, connectors like the Debezium source for PostgreSQL enable reliable and fast change data capture using the database’s native replication features, adding minimal load on the source system.
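
As a rough illustration of the JSON configuration such a connector takes, the sketch below builds a Debezium PostgreSQL source configuration as a Python dictionary and serializes it. The property names are standard Debezium settings as far as we know, but the helper name, host, credentials, slot and table names are all hypothetical placeholders, and the exact set of required properties depends on the connector version, so treat it as a starting point rather than a verified setup.

```python
import json


def debezium_pg_source_config(name, host, port, dbname, user, password, tables):
    """Build a Kafka Connect config for a Debezium PostgreSQL source.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "name": name,
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": host,
        "database.port": str(port),
        "database.dbname": dbname,
        "database.user": user,
        "database.password": password,
        # Logical name used as the prefix of the generated topics.
        # Newer Debezium releases call this property "topic.prefix".
        "database.server.name": "shop",
        # pgoutput is PostgreSQL's built-in logical decoding plugin.
        "plugin.name": "pgoutput",
        "slot.name": "debezium_slot",
        "table.include.list": ",".join(tables),
    }


config = debezium_pg_source_config(
    name="pg-orders-source",          # hypothetical connector name
    host="pg.example.internal",       # hypothetical host
    port=5432,
    dbname="shop",
    user="replicator",
    password="change-me",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document a Kafka Connect cluster accepts when a connector is created; Kafka Connect expects every value as a string, which is why the port is converted.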

Data in Apache Kafka

During the ingestion phase, to optimize throughput, connectors can use the Avro data format and store the data’s schema in Karapace, Aiven’s open source tool for schema registry and REST API endpoints.
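
To make the schema step more concrete, here is a minimal sketch of how an Avro schema could be registered. It assumes Karapace is reached through the Confluent-compatible Schema Registry REST API (a POST to /subjects/<subject>/versions); the registry URL, subject name and record fields are invented for illustration, and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical Avro schema for a click event.
CLICK_SCHEMA = {
    "type": "record",
    "name": "Click",
    "namespace": "com.example.events",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "product_id", "type": "string"},
        {"name": "clicked_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}


def build_register_request(registry_url, subject, avro_schema):
    """Build (without sending) the HTTP request that registers a schema."""
    # The registry API expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


request = build_register_request(
    "https://karapace.example.internal:8081/",  # hypothetical endpoint
    "clicks-value",
    CLICK_SCHEMA,
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the registration against a real registry, where authentication would also be needed.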

Data in Apache Kafka is stored in topics, which can have an associated retention policy defining how long, or up to what size, the data is kept. Topics can be read by one or more consumers, either independently or sharing the work as members of the same application (a “consumer group” in Apache Kafka terms).
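
The difference between independent consumers and a consumer group can be shown with a toy model. The sketch below is not Kafka's actual assignment logic, only a simplified round-robin illustration: within one group the partitions of a topic are divided among the members, while each group as a whole still sees every partition. Group and consumer names are made up.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Divide partitions among the consumers of ONE group, round-robin."""
    members = sorted(consumers)
    assignment = defaultdict(list)
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return dict(assignment)


partitions = [0, 1, 2, 3, 4, 5]
groups = {
    "reporting": ["r1", "r2", "r3"],   # three consumers share the work
    "bigquery-sink": ["s1"],           # a single consumer reads everything
}

# Each group is assigned independently of the others.
plan = {group: assign_partitions(partitions, members) for group, members in groups.items()}
for group, assignment in plan.items():
    print(group, assignment)
```

Every partition appears exactly once per group, which is why adding consumers to a group spreads the load, while adding a new group creates another independent reader of the same data.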

If the data needs reshaping before it lands in the target datastore, Aiven for Apache Flink can perform such transformations in streaming mode using SQL statements. Cleansing data, or enriching it with data coming from different technologies, are common examples.
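
The shape of such a SQL transformation can be tried out locally. The sketch below uses Python's built-in SQLite rather than Flink, so it runs as a one-off batch query instead of a continuous streaming job, and the table names, columns and rows are invented. It only illustrates the kind of cleansing (trimming, lower-casing, dropping empty values) and enrichment (joining a second source) that a streaming SQL statement would express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clicks (user_id TEXT, product_id TEXT, query TEXT);
    CREATE TABLE products (product_id TEXT, category TEXT);
    INSERT INTO clicks VALUES
        ('u1', 'p1', '  Coke '),
        ('u2', 'p2', NULL),
        ('u3', 'p1', 'cola');
    INSERT INTO products VALUES ('p1', 'beverages'), ('p2', 'snacks');
    """
)

# Cleansing: normalize the search text and drop rows without a query.
# Enrichment: add the product category from a second table.
# ORDER BY is here only to make the batch output deterministic.
TRANSFORM_SQL = """
SELECT c.user_id, LOWER(TRIM(c.query)) AS query, p.category
FROM clicks AS c
JOIN products AS p ON p.product_id = c.product_id
WHERE c.query IS NOT NULL
ORDER BY c.user_id
"""

enriched = conn.execute(TRANSFORM_SQL).fetchall()
for row in enriched:
    print(row)
```

In a streaming engine the same SELECT would typically feed an INSERT INTO a target table backed by an Apache Kafka topic, with the source tables also mapped to topics.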

Push data to Google BigQuery

Once the data is in the right shape to be analyzed, the Apache Kafka topic can be pushed to BigQuery in streaming mode using the dedicated sink connector. The connector has a wide range of configuration options, including the timestamp to be used for partitioning and the thread pool size, which defines the number of concurrent writing threads.
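
A minimal sketch of a sink configuration covering the two options mentioned above might look like the following. The property names (timestampPartitionFieldName, threadPoolSize and the others) are taken from the open-source BigQuery sink connector as we understand it and should be checked against the connector documentation for the version in use; the helper name, project, dataset, topic and field names are placeholders, and credential settings are deliberately left out.

```python
import json


def bigquery_sink_config(name, topics, project, dataset, *, partition_field, thread_pool_size=10):
    """Build a Kafka Connect config for the BigQuery sink connector.

    Credentials (for example a service account key) are omitted here.
    """
    return {
        "name": name,
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "topics": ",".join(topics),
        "project": project,
        "defaultDataset": dataset,
        "autoCreateTables": "true",
        # Column whose timestamp drives BigQuery table partitioning.
        "timestampPartitionFieldName": partition_field,
        # Number of concurrent threads writing to BigQuery.
        "threadPoolSize": str(thread_pool_size),
    }


sink = bigquery_sink_config(
    name="clicks-to-bigquery",        # hypothetical connector name
    topics=["clicks_enriched"],       # hypothetical topic
    project="my-gcp-project",         # hypothetical project id
    dataset="analytics",
    partition_field="clicked_at",
    thread_pool_size=4,
)
print(json.dumps(sink, indent=2))
```

Raising the thread pool size increases write concurrency at the cost of more simultaneous requests to BigQuery, so it is worth tuning against the quotas of the target project.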

The data, arriving in streaming mode via Apache Kafka, has now landed in one or more BigQuery tables, ready for further analysis and processing. BigQuery offers a rich set of SQL functions that let you parse nested datasets, apply complex geographical transformations, and even train and use machine learning models, among others. The depth of BigQuery SQL functions enables analysts, data engineers and scientists to perform their work in a single platform using the common SQL language.

A streaming solution for fast analytics

With the wide set of source connectors available and its streaming capabilities, Aiven for Apache Kafka is the perfect fit to enable the data to flow from a huge variety of data sources to BigQuery for analytics.

One example of a customer using this pattern is the retail media platform Streaem, part of Wunderman Thompson. Streaem provides a self-service retail media platform for retailers and their brands to monetise areas of their site and in-store digital assets by combining intelligence about their products and signals from their customers with campaign information provided by advertisers. For example, a user might type “Coke” into a search box and, as well as seeing the regular results, also see some sponsored listings. Then, as they browse around the site, there could be promoted products based on their previous interactions.

Streaem are fully committed to using Google Cloud as their platform of choice, but their technology is event-driven and based around Kafka as a message broker, which is not natively available on Google Cloud. Using Aiven’s Apache Kafka offering on top of Google Cloud lets Streaem get the best of both worlds: industry-standard event streaming on their preferred cloud, without the headache of managing Kafka themselves. With multiple microservices deployed, all of which need a consistent and up-to-date view of the world, Kafka is an obvious service to place at the center of their platform, making sure everything has the latest information in a way that will scale as Streaem itself grows.

“At Streaem we use Kafka as a core part of our platform where event-based data is a key enabler for the services we deliver to our clients” says Garry Turkington, CTO. “Using hosted data services on GCP allows us to focus on our core business logic while we rely on Aiven to deliver a high-performance and reliable platform we can trust.”

Analytics is still a challenge in a Kafka-only world, so Streaem uses a managed open-source Kafka connector on the Aiven platform to stream the microservices data into Google BigQuery. This means that data about customer activity, keyword auctions or anything else in the live platform is available with low latency in BigQuery, powering Streaem’s reporting dashboards and providing up-to-date aggregations for live decisions. By using Google Cloud Platform, Aiven for Apache Kafka, and BigQuery, Streaem can be confident that their production systems are running smoothly whilst they concentrate their efforts on growing their business.

Other use cases

Aiven for Apache Kafka along with Google Cloud BigQuery is driving crucial insights across a range of industry verticals and use cases. For example:

  • Retail: Demand Planning with BQML, Recommendation Engines, Product Search
    • Aiven is leveraged at a large European retail chain for open source database and event streaming infrastructure (Aiven for Apache Kafka, Aiven for OpenSearch, Aiven for Postgres, Aiven for Redis). The data is then fed to trained models in BigQuery ML to recommend products to purchase. These models can be exposed as APIs managed in Vertex AI for production applications.
  • E-commerce: Real-Time Dynamic Pricing
    • A global travel booking site uses Aiven for data streaming infrastructure (Aiven for Apache Kafka), handling global pricing and demand data in near real-time, and Aiven for OpenSearch for SIEM and application search use cases. Data then flows into BigQuery for analytics, giving the customer a best-in-class enterprise data warehouse.
  • Gaming: Player Analytics
    • Aiven powers data streaming (via Aiven for Apache Kafka) for a Fortune 500 gaming company, supporting multiple gaming titles and more than 100 million players globally. Analytics in BigQuery drives critical insights using player metadata.

Conclusion / Next Steps

The combination of Aiven for Apache Kafka and Google BigQuery drives analytics on the latest data in near real time, minimizing the time to insight and maximizing the impact. Customers of Aiven and Google are already taking advantage of this powerful combination, and seeing the benefits to their business. If you would like to experience this for yourself, sign up for Aiven and use the following links to learn more:

  • Aiven for Apache Kafka to discover the features, plans and options available for a managed Apache Kafka service
  • Apache Kafka BigQuery sink connector to review the settings and examples of pushing data from Apache Kafka to BigQuery
  • To learn more about Google Cloud BigQuery, click here.
  • Ready to give it a try? Click here to check out Aiven’s listing on Google Cloud Marketplace, and let us know what you think.

By: Kevin Bowman (Solution Architect, Aiven) and Ritika Suri (Technology Partnerships Director, Google Cloud)
Originally published at Google Cloud Blog

Source: Cyberpogo
