Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • About
  • Data
  • Development
  • Technology
  • Tools

Data Systems That Learn To Be Better

  • August 13, 2020
  • relay

data storage

Big data has gotten really, really big: By 2025, all the world’s data will add up to an estimated 175 trillion gigabytes. For a visual, if you stored that amount of data on DVDs, it would stack up tall enough to circle the Earth 222 times.

One of the biggest challenges in computing is handling this onslaught of information while still being able to efficiently store and process it. A team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believe that the answer rests with something called “instance-optimized systems.”

Traditional storage and database systems are designed to work for a wide range of applications because of how long it can take to build them — months or, often, several years. As a result, for any given workload such systems provide performance that is good, but usually not the best. Even worse, they sometimes require administrators to painstakingly tune the system by hand to provide even reasonable performance.

In contrast, the goal of instance-optimized systems is to build systems that optimize and partially re-organize themselves for the data they store and the workload they serve.

“It’s like building a database system for every application from scratch, which is not economically feasible with traditional system designs,” says MIT Professor Tim Kraska.

As a first step toward this vision, Kraska and colleagues developed Tsunami and Bao. Tsunami uses machine learning to automatically re-organize a dataset’s storage layout based on the types of queries that its users make. Tests show that it can run queries up to 10 times faster than state-of-the-art systems. What’s more, its datasets can be organized via a series of “learned indexes” that are up to 100 times smaller than the indexes used in traditional systems.

Read More  Introducing Document AI platform, A Unified Console For Document Processing

Kraska has been exploring the topic of learned indexes for several years, going back to his influential work with colleagues at Google in 2017.

Harvard University Professor Stratos Idreos, who was not involved in the Tsunami project, says that a unique advantage of learned indexes is their small size, which, in addition to space savings, brings substantial performance improvements.

“I think this line of work is a paradigm shift that’s going to impact system design long-term,” says Idreos. “I expect approaches based on models will be one of the core components at the heart of a new wave of adaptive systems.”

Bao, meanwhile, focuses on improving the efficiency of query optimization through machine learning. A query optimizer rewrites a high-level declarative query to a query plan, which can actually be executed over the data to compute the result to the query. However, often there exists more than one query plan to answer any query; picking the wrong one can cause a query to take days to compute the answer, rather than seconds.

Traditional query optimizers take years to build, are very hard to maintain, and, most importantly, do not learn from their mistakes. Bao is the first learning-based approach to query optimization that has been fully integrated into the popular database management system PostgreSQL. Lead author Ryan Marcus, a postdoc in Kraska’s group, says that Bao produces query plans that run up to 50 percent faster than those created by the PostgreSQL optimizer, meaning that it could help to significantly reduce the cost of cloud services, like Amazon’s Redshift, that are based on PostgreSQL.

Read More  Mobileye’s Self-Driving Secret? 200PB Of Data

By fusing the two systems together, Kraska hopes to build the first instance-optimized database system that can provide the best possible performance for each individual application without any manual tuning.

The goal is to not only relieve developers from the daunting and laborious process of tuning database systems, but to also provide performance and cost benefits that are not possible with traditional systems.

Traditionally, the systems we use to store data are limited to only a few storage options and, because of it, they cannot provide the best possible performance for a given application. What Tsunami can do is dynamically change the structure of the data storage based on the kinds of queries that it receives and create new ways to store data, which are not feasible with more traditional approaches.

Johannes Gehrke, a managing director at Microsoft Research who also heads up machine learning efforts for Microsoft Teams, says that his work opens up many interesting applications, such as doing so-called “multidimensional queries” in main-memory data warehouses. Harvard’s Idreos also expects the project to spur further work on how to maintain the good performance of such systems when new data and new kinds of queries arrive.

Bao is short for “bandit optimizer,” a play on words related to the so-called “multi-armed bandit” analogy where a gambler tries to maximize their winnings at multiple slot machines that have different rates of return. The multi-armed bandit problem is commonly found in any situation that has tradeoffs between exploring multiple different options, versus exploiting a single option — from risk optimization to A/B testing.

Read More  The Information Arms Race Can’t Be Won, But We Have To Keep Fighting

“Query optimizers have been around for years, but they often make mistakes, and usually they don’t learn from them,” says Kraska. “That’s where we feel that our system can make key breakthroughs, as it can quickly learn for the given data and workload what query plans to use and which ones to avoid.”

Kraska says that in contrast to other learning-based approaches to query optimization, Bao learns much faster and can outperform open-source and commercial optimizers with as little as one hour of training time.In the future, his team aims to integrate Bao into cloud systems to improve resource utilization in environments where disk, RAM, and CPU time are scarce resources.

“Our hope is that a system like this will enable much faster query times, and that people will be able to answer questions they hadn’t been able to answer before,” says Kraska.

A related paper about Tsunami was co-written by Kraska, PhD students Jialin Ding and Vikram Nathan, and MIT Professor Mohammad Alizadeh. A paper about Bao was co-written by Kraska, Marcus, PhD students Parimarjan Negi and Hongzi Mao, visiting scientist Nesime Tatbul, and Alizadeh.

The work was done as part of the Data System and AI Lab ([email protected]), which is sponsored by Intel, Google, Microsoft, and the U.S. National Science Foundation.

relay

Related Topics
  • Bao
  • Big Data
  • Computer Science and Artificial Intelligence Laboratory
  • CSAIL
  • Machine Learning
  • MIT
  • Tsunami
You May Also Like
View Post
  • Data
  • Platforms
  • Technology

How Osmo Is Digitizing Smell With Google Cloud AI Technology

  • March 20, 2023
View Post
  • Data
  • Engineering
  • Tools

Built With BigQuery: How Sift Delivers Fraud Detection Workflow Backtesting At Scale

  • March 20, 2023
View Post
  • Platforms
  • Technology

Building The Most Open And Innovative AI Ecosystem

  • March 20, 2023
View Post
  • Data

Understand And Trust Data With Dataplex Data Lineage

  • March 17, 2023
View Post
  • Artificial Intelligence
  • Technology

Limits To Computing: A Computer Scientist Explains Why Even In The Age Of AI, Some Problems Are Just Too Difficult

  • March 17, 2023
View Post
  • Big Data
  • Data

The Benefits And Core Processes Of Data Wrangling

  • March 17, 2023
View Post
  • Technology

We Cannot Even Agree On Dates…

  • March 17, 2023
View Post
  • Artificial Intelligence
  • Machine Learning
  • Platforms
  • Technology

Using ML To Predict The Weather And Climate Risk

  • March 16, 2023

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay Connected!
LATEST
  • 1
    How Osmo Is Digitizing Smell With Google Cloud AI Technology
    • March 20, 2023
  • 2
    Built With BigQuery: How Sift Delivers Fraud Detection Workflow Backtesting At Scale
    • March 20, 2023
  • 3
    Building The Most Open And Innovative AI Ecosystem
    • March 20, 2023
  • 4
    Understand And Trust Data With Dataplex Data Lineage
    • March 17, 2023
  • 5
    Limits To Computing: A Computer Scientist Explains Why Even In The Age Of AI, Some Problems Are Just Too Difficult
    • March 17, 2023
  • 6
    The Benefits And Core Processes Of Data Wrangling
    • March 17, 2023
  • 7
    We Cannot Even Agree On Dates…
    • March 17, 2023
  • 8
    Financial Crisis: It’s A Game & We’re All Being Played
    • March 17, 2023
  • 9
    Using ML To Predict The Weather And Climate Risk
    • March 16, 2023
  • 10
    Google Is A Leader In The 2023 Gartner® Magic Quadrant™ For Enterprise Conversational AI Platforms
    • March 16, 2023

about
About
Hello World!

We are liwaiwai.com. Created by programmers for programmers.

Our site aims to provide materials, guides, programming how-tos, and resources relating to artificial intelligence, machine learning and the likes.

We would like to hear from you.

If you have any questions, enquiries or would like to sponsor content, kindly reach out to us at:

[email protected]

Live long & prosper!
Most Popular
  • 1
    The Future Of AI Is Promising Yet Turbulent
    • March 16, 2023
  • 2
    ChatGPT: How To Prevent It Becoming A Nightmare For Professional Writers
    • March 16, 2023
  • 3
    Midjourney Selects Google Cloud To Power AI-Generated Creative Platform
    • March 8, 2023
  • 4
    A Guide To Managing Your Agile Engineering Team
    • March 15, 2023
  • 5
    10 Ways Wikimedia Does Developer Advocacy
    • March 15, 2023
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
  • About

Input your search keywords and press Enter.