Simplify machine learning workflows with BigQuery ML and Vertex AI
Organizations that follow a siloed approach to managing databases, analytics and machine learning often need to move data from one system to another. This leads to data duplication with no single source of truth and makes it difficult to adhere to security and governance requirements.
Additionally, when building ML pipelines, you need to train and deploy your models. Therefore you need to plan your infrastructure for scale. You also need to make sure that your ML models are tuned and optimized to run efficiently on your infrastructure. For example, you may need a large set of kubernetes clusters or access to GPU-based clusters so that you can train your models quickly. This forces organizations to hire highly skilled professionals with deep knowledge of Python, Java and other programming languages.
Google’s data cloud provides a unified data and AI solution to help you overcome these challenges and simplify your machine learning workflows. BigQuery’s serverless, scalable architecture helps you create a powerful single source of truth for your data. BigQuery ML brings machine learning capabilities directly into your data warehouse through a familiar SQL interface. BigQuery ML’s native integration with Vertex AI allows you to leverage MLOps tooling to deploy, scale, and manage your models.
BigQuery ML and Vertex AI help accelerate the adoption of AI across your organization.
- Easy data management: Manage ML workflows without moving data from BigQuery, eliminating security and governance problems. The ability to manage workflows within your datastore removes a big barrier to ML development and adoption.
- Reduce infrastructure management overhead: BigQuery takes advantage of the massive scale of Google’s compute and storage infrastructure. You don’t need to manage huge clusters or HPC infrastructure to do ML effectively.
- Remove skillset barrier: BigQuery ML is SQL based. This allows many model types to be directly available in SQL, such as regression, classification, recommender systems, deep learning, time series, anomaly detection, and more.
- Deploy models and operationalize ML workflows: Vertex AI Model Registry makes it easy to deploy BigQuery ML models to a Vertex AI REST endpoint for online or batch predictions. Further, Vertex AI Pipelines automate your ML workflows, helping you reliably go from data ingestion to deploying your model in a way that lets you monitor and understand your ML system.
Get started with BigQuery ML in three steps
Step 1: Bring your data into BigQuery automatically via Pub/Sub in real time or in batch using BigQuery utilities or through one of our partner solutions. In addition, BigQuery can access data that may be residing in open source format such as Parquet/Hudi residing in object storage using BigLake. Learn more about loading data into BigQuery.
Step 2: Train a model by running a simple SQL query (create model) in BigQuery and point to the dataset. BigQuery is highly scalable in terms of compute and storage, whether it is a dataset with 1000 rows or billions of rows. Learn more about model training in BigQuery ML.
Increase impact with new capabilities in BigQuery ML
At Next ‘22, we announced several innovations in BigQuery ML that help you to quickly and easily operationalize ML at scale. To get early access and check out these new capabilities, submit this interest form.
1. Scale with MLOps and pipelines
When you are training a lot of models across your organization, managing models, comparing results, and creating repeatable training processes can be incredibly difficult. New capabilities make it easier to operationalize and scale BigQuery ML models with Vertex AI’s MLOps capabilities.
Vertex AI Model Registry is now GA, providing a central place to manage and govern the deployment of all your models, including BigQuery ML models. You can use Vertex AI Model Registry for version control and ML metadata tracking, model evaluation and validation, deployment and model reporting. Learn more here.
Another capability that further helps operationalize ML at scale is Vertex AI Pipelines, a serverless tool for orchestrating ML tasks so that they can be executed as a single pipeline, rather than manually triggered each task (e.g. train a model, evaluate the model, deploy to an endpoint) separately. We are introducing more than 20 BigQuery ML components to simplify orchestrating BigQuery ML operations. This eliminates the need for developers and ML engineers to write their own custom components to invoke BigQuery ML jobs. Additionally, if you are Data Scientist who prefers running code over SQL, you can now use these operators to train and predict in BigQuery ML.
- BigQuery ML trained models
- Imported models of various formats
- Remote models (will be available soon)
BigQuery ML supports several models out-of-the-box. However, some customers want to inference with models that are already trained in other platforms. Therefore, we are introducing new capabilities that allow users to import models beyond TensorFlow into BigQuery ML, starting with TFLite and XGBoost.
- Automatic preprocessing. BigQuery ML performs automatic preprocessing during training. For more information, see Automatic feature preprocessing.
- Manual preprocessing. BigQuery ML provides the TRANSFORM clause for you to define custom preprocessing using the manual preprocessing functions. You can also use these functions outside the TRANSFORM clause.
Further, when you export BigQuery ML models by registering with Vertex AI Model Registry or manually, transform clauses will also be exported with it. This really simplifies online model deployment to Vertex.
5. Multivariate time series forecasting
Many BigQuery customers use the natively supported ARIMA PLUS model to forecast future demand and plan their business operations. Until now customers could forecast using only a single input variable. For example, to forecast ice cream sales, along with target metrics the past sales, customers could not forecast using external covariates such as weather. With this launch, users can now make more accurate forecasts by taking more than one variable into account through multivariate time series forecasting with ARIMA_PLUS_XREG (ARIMA_PLUS with external regressors (such as weather, location, etc).
Submit this form to try these new capabilities that help you accelerate your data to AI journey with BigQuery ML. Check out this video to learn more about these features and see a demo of how ML on structured and unstructured data can really transform marketing analytics.
Acknowledgements:It was an honor and privilege to work on this with Amir Hormati, Polong Lin, Candice Chen, Mingge Deng, Yan Sun. We further acknowledge Manoj Gunti, Shana Matthews and Neama Dadkhahnikoo for support, work they have done and their inputs.
By: Abhinav Khushraj (Product Management, Google) and Firat Tekiner (Product Management, Google)
Source: Google Cloud Blog