Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
  • Artificial Intelligence
  • Machine Learning

Feathr: LinkedIn’s Feature Store Is Now Available On Azure

  • April 18, 2022
  • liwaiwai.com

Feature store motivation

With the advance of AI and machine learning, companies start to use complex machine learning pipelines in various applications, such as recommendation systems, fraud detection, and more. These complex systems usually require hundreds to thousands of features to support time-sensitive business applications, and the feature pipelines are maintained by different team members across various business groups.

In these machine learning systems, we see many problems that consume lots of energy of machine learning engineers and data scientists, in particular duplicated feature engineering, online-offline skew, and feature serving with low latency.


Partner with liwaiwai.com
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Figure 1: Illustration on problems that feature store solves

Figure 1: Illustration on problems that feature store solves.

Duplicated feature engineering

  • In an organization, thousands of features are buried in different scripts and in different formats; they are not captured, organized, or preserved, and thus cannot be reused and leveraged by teams other than those who generated them.
  • Because feature engineering is so important for machine learning models and features cannot be shared, data scientists must duplicate their feature engineering efforts across teams.

Online-offline skew

  • For features, offline training and online inference usually require different data serving pipelines—ensuring consistent features across different environments is expensive.
  • Teams are deterred from using real-time data for inference due to the difficulty of serving the right data.
  • Providing a convenient way to ensure data point-in-time correctness is key to avoid label leakage.

Serving features with low latency

  • For real-time applications, getting feature lookups from database for real-time inference without compromising response latency and with high throughput can be challenging.
  • Easily accessing features with very low latency is key in many machine learning scenarios, and optimizations needs to be done to combine different REST API calls to features.

To solve those problems, a concept called feature store was developed, so that:

  • Features are centralized in an organization and can be reused
  • Features can be served in a synchronous way between offline and online environment
  • Features can be served in real-time with low latency
Read More  Cash App Uses Google Cloud To Power Mobile Payments Innovation And research

Introducing Feathr, a battle-tested feature store

Developing a feature store from scratch takes time, and it takes much more time to make it stable, scalable, and user-friendly. Feathr is the feature store that has been used in production and battle-tested in LinkedIn for over 6 years, serving all the LinkedIn machine learning feature platform with thousands of features in production.

At Microsoft, the LinkedIn team and the Azure team have worked very closely to open source Feathr, make it extensible, and build native integration with Azure. It’s available in this GitHub repository and you can read more about Feathr on the LinkedIn Engineering Blog.

Some of the highlights for Feathr include:

  • Scalable with built-in optimizations. For example, based on some internal use case, Feathr can process billions of rows and PB scale data with built-in optimizations such as bloom filters and salted joins.
  • Rich support for point-in-time joins and aggregations: Feathr has high performant built-in operators designed for Feature Store, including time-based aggregation, sliding window joins, look-up features, all with point-in-time correctness.
  • Highly customizable user-defined functions (UDFs) with native PySpark and Spark SQL support to lower the learning curve for data scientists.
  • Pythonic APIs to access everything with low learning curve; Integrated with model building so data scientists can be productive from day one.
  • Rich type system including support for embeddings for advanced machine learning/deep learning scenarios. One of the common use cases is to build embeddings for customer profiles, and those embeddings can be reused across an organization in all the machine learning applications.
  • Native cloud integration with simplified and scalable architecture, which is illustrated in the next section.
  • Feature sharing and reuse made easy: Feathr has built-in feature registry so that features can be easily shared across different teams and boost team productivity.
Read More  Microsoft Build 2019 | Building Autonomous Systems with Machine Teaching

Feathr on Azure architecture

The high-level architecture diagram below articulates how would a user interacts with Feathr on Azure:

Feathr on Azure architecture.

Figure 2: Feathr on Azure architecture.

  1. A data or machine learning engineer creates features using their preferred tools (like pandas, Azure Machine Learning, Azure Databricks, and more). These features are ingested into offline stores, which can be either:
    • Azure SQL Database (including serverless), Azure Synapse Dedicated SQL Pool (formerly SQL DW).
    • Object storage, such as Azure BLOB storage, Azure Data Lake Store, and more. The format can be Parquet, Avro, or Delta Lake.
  2. The data or machine learning engineer can persist the feature definitions into a central registry, which is built with Azure Purview.
  3. The data or machine learning engineer can join on all the feature dataset in a point-in-time correct way, with Feathr Python SDK and with Spark engines such as Azure Synapse or Databricks.
  4. The data or machine learning engineer can materialize features into an online store such as Azure Cache for Redis with Active-Active, enabling multi-primary, multi-write architecture that ensures eventual consistency between clusters.
  5. Data scientists or machine learning engineers consume offline features with their favorite machine learning libraries, for example scikit-learn, PyTorch, or TensorFlow to train a model in their favorite machine learning platform such as Azure Machine Learning, then deploy the models in their favorite environment with services such as Azure Machine Learning endpoint.
  6. The backend system makes a request to the deployed model, which makes a request to the Azure Cache for Redis to get the online features with Feathr Python SDK.
Read More  AT&T And IBM Bring The Power Of 5G, Hybrid Cloud And AI To Drive Innovation For Clients Across Industries

A sample notebook containing all the above flow is located in the Feathr repository for more reference.

Feathr has native integration with Azure and other cloud services. The table below shows these integrations:

Feathr component Cloud Integrations

Offline store – Object Store

Azure Blob Storage
Azure ADLS Gen2
AWS S3

Offline store – SQL

Azure SQL DB
Azure Synapse Dedicated SQL Pools (formerly SQL DW)
Azure SQL in VM
Snowflake

Online store

Azure Cache for Redis

Feature Registry

Azure Purview

Compute Engine

Azure Synapse Spark Pools
Databricks

Machine Learning Platform

Azure Machine Learning
Jupyter Notebook

File Format

Parquet
ORC
Avro
Delta Lake

Table 1: Feathr on Azure Integration with Azure Services.

Installation and getting started

Feathr has a pythonic interface to access all Feathr components, including feature definition and cloud interactions, and is open sourced here. The Feathr python client can be easily installed with pip:

pip install -U feathr

For more details on getting started, please refer to the Feathr Quickstart Guide. The Feathr team can also be reached in the Feathr community.

Going forward

In this blog, we’ve introduced a battle-tested feature store, called Feathr, which is scalable and enterprise ready, with native Azure integrations. We are dedicated to bringing more functionalities into Feathr and Feathr on Azure integrations, and feel free to give any feedback by raising issues in Feathr GitHub repository.

  • Checkout the GitHub repository for Feathr.
  • Reach out to Feathr community.
  • Read Feathr open-source blog post from our LinkedIn colleagues.

 

By Xiaoyong Zhu Principal Data Scientist, Azure Data
Source Microsoft Azure

This blog post is co-authored by David Stein, Senior Staff Software Engineer, Jinghui Mo, Staff Software Engineer, and Hangfei Lin, Staff Software Engineer, all from Feathr team.


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

liwaiwai.com

Related Topics
  • Azure Machine Learning
  • Azure SQL
  • Feathr
  • LinkedIn
  • Microsoft Azure
  • PySpark
You May Also Like
View Post
  • Artificial Intelligence
  • Platforms

Oracle CloudWorld 2023: 6 Key Takeaways From The Big Annual Event

  • September 25, 2023
View Post
  • Artificial Intelligence

3 Ways AI Can Help Communities Adapt To Climate Change In Africa

  • September 25, 2023
Robotic Hand | Lights
View Post
  • Artificial Intelligence
  • Technology

Nvidia H100 Tensor Core GPUs Come To Oracle Cloud

  • September 24, 2023
View Post
  • Artificial Intelligence
  • Engineering
  • Technology

AI-Driven Tool Makes It Easy To Personalize 3D-Printable Models

  • September 22, 2023
View Post
  • Artificial Intelligence
  • Data

Applying Generative AI To Product Design With BigQuery DataFrames

  • September 21, 2023
View Post
  • Artificial Intelligence
  • Platforms

Combining AI With A Trusted Data Approach On IBM Power To Fuel Business Outcomes

  • September 21, 2023
Microsoft and Adobe
View Post
  • Artificial Intelligence
  • Machine Learning
  • Platforms

Microsoft And Adobe Partner To Deliver Cost Savings And Business Benefits

  • September 21, 2023
View Post
  • Artificial Intelligence
  • Technology

Huawei Connect 2023: Accelerating Intelligence For Shared Success

  • September 20, 2023
A Field Guide To A.I.
Navigate the complexities of Artificial Intelligence and unlock new perspectives in this must-have guide.
Now available in print and ebook.

charity-water



Stay Connected!
LATEST
  • 1
    Oracle CloudWorld 2023: 6 Key Takeaways From The Big Annual Event
    • September 25, 2023
  • 2
    3 Ways AI Can Help Communities Adapt To Climate Change In Africa
    • September 25, 2023
  • Robotic Hand | Lights 3
    Nvidia H100 Tensor Core GPUs Come To Oracle Cloud
    • September 24, 2023
  • 4
    AI-Driven Tool Makes It Easy To Personalize 3D-Printable Models
    • September 22, 2023
  • 5
    Applying Generative AI To Product Design With BigQuery DataFrames
    • September 21, 2023
  • 6
    Combining AI With A Trusted Data Approach On IBM Power To Fuel Business Outcomes
    • September 21, 2023
  • Microsoft and Adobe 7
    Microsoft And Adobe Partner To Deliver Cost Savings And Business Benefits
    • September 21, 2023
  • Coffee | Laptop | Notebook | Work 8
    First HP Work Relationship Index Shows Majority of People Worldwide Have an Unhealthy Relationship with Work
    • September 20, 2023
  • 9
    Huawei Connect 2023: Accelerating Intelligence For Shared Success
    • September 20, 2023
  • 10
    Document AI Workbench Is Now Powered By Generative AI To Structure Document Data Faster
    • September 15, 2023

about
About
Hello World!

We are liwaiwai.com. Created by programmers for programmers.

Our site aims to provide materials, guides, programming how-tos, and resources relating to artificial intelligence, machine learning and the likes.

We would like to hear from you.

If you have any questions, enquiries or would like to sponsor content, kindly reach out to us at:

[email protected]

Live long & prosper!
Most Popular
  • Intel Innovation 1
    Intel Innovation 2023
    • September 15, 2023
  • 2
    Microsoft And Oracle Expand Partnership To Deliver Oracle Database Services On Oracle Cloud Infrastructure In Microsoft Azure
    • September 14, 2023
  • 3
    Real-Time Ubuntu Is Now Available In AWS Marketplace
    • September 12, 2023
  • 4
    IBM Brings Watsonx To ESPN Fantasy Football With New Waiver Grades And Trade Grades
    • September 13, 2023
  • Data 5
    UK Space Sector Has Sights Set On Artificial Intelligence And Machine Learning Professionals
    • September 15, 2023
  • /
  • Artificial Intelligence
  • Explore
  • About
  • Contact Us

Input your search keywords and press Enter.