We launched BigQuery ML, an integrated part of Google Cloud’s BigQuery data warehouse, in 2018 as a SQL interface for training and using linear models. Many customers with large amounts of data in BigQuery adopted BigQuery ML because it brought ML directly to their stored data, removing the need to ETL that data into a separate system. Because linear models are easy to explain, they worked well for many of our customers.
However, as many Kaggle machine learning competitions have shown, some non-linear model types, such as XGBoost and AutoML Tables, work very well on structured data. Recent advances in Explainable AI based on SHAP values have also made it easier for customers to understand why these non-linear models make a given prediction. Google Cloud AI Platform already supports training these non-linear models, and we have integrated BigQuery ML with Cloud AI Platform to bring these capabilities to BigQuery. We have added the ability to train and use three new types of regression and classification models: boosted trees using XGBoost, AutoML Tables, and DNNs using TensorFlow. Models trained in BigQuery ML can also be exported and deployed for online prediction on Cloud AI Platform or on a customer’s own serving stack. We have also expanded the supported use cases to include recommendation systems, clustering, and time series forecasting.
We are announcing the general availability of the following: boosted trees using XGBoost, deep neural networks (DNNs) using TensorFlow, and model export for online prediction. Here are more details on each:
Boosted trees using XGBoost
You can train and use boosted tree models built with the XGBoost library. Tree-based models capture feature non-linearity well, and XGBoost is one of the most popular libraries for building boosted tree models. These models have performed very well on structured data in Kaggle competitions, and they are less complex and opaque than neural networks because you can inspect the individual decision trees to understand how the model works. This should be one of the first models you build for any problem. Get started with the documentation to understand how to use this model type.
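As a rough illustration of what using this model type looks like, the Python sketch below only composes BigQuery ML SQL as strings; it does not call BigQuery. The `CREATE MODEL` options `model_type`, `input_label_cols`, and `max_iterations` and the `ML.PREDICT` function are BigQuery ML SQL, while the helper function names and the dataset, table, and column names are hypothetical placeholders.

```python
# Sketch only: builds BigQuery ML SQL text; nothing here talks to BigQuery.
# Helper names and all dataset/table/column names are hypothetical.


def boosted_tree_create_model_sql(
    model: str,
    training_table: str,
    label_col: str,
    classifier: bool = True,
    max_iterations: int = 20,
) -> str:
    """Return a CREATE MODEL statement that trains an XGBoost boosted tree model."""
    model_type = "BOOSTED_TREE_CLASSIFIER" if classifier else "BOOSTED_TREE_REGRESSOR"
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}',\n"
        f"        input_label_cols=['{label_col}'],\n"
        f"        max_iterations={max_iterations}) AS\n"  # number of boosting rounds
        f"SELECT * FROM `{training_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Return a query that scores every row of input_table with the trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"


if __name__ == "__main__":
    print(boosted_tree_create_model_sql("mydataset.churn_bt", "mydataset.customers", "churned"))
    print(predict_sql("mydataset.churn_bt", "mydataset.new_customers"))
```

The printed statements could be run in the BigQuery console, or passed to `Client().query(...)` from the `google-cloud-bigquery` Python package.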
Deep neural networks using TensorFlow
These are fully connected neural networks, corresponding to the DNNClassifier and DNNRegressor estimators in TensorFlow. Using a DNN reduces the need for feature engineering, because the hidden layers learn many feature interactions and transformations on their own. However, hyperparameter choices make a significant difference in performance, and understanding them requires more advanced data science skills. We recommend this model type only for experienced data scientists, and suggest using a hyperparameter tuning service such as Google Vizier to optimize the models. Get started with the documentation to understand how to use this model type.
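Because hyperparameters matter so much for DNNs, a common pattern is to train several candidate configurations and compare them. The sketch below, again only composing SQL strings, uses BigQuery ML's `DNN_CLASSIFIER`/`DNN_REGRESSOR` model types with the `hidden_units`, `dropout`, and `learn_rate` options. It substitutes a plain grid search for a tuning service like Google Vizier, purely for illustration; the helper names and table names are hypothetical.

```python
from itertools import product
from typing import List, Sequence

# Sketch only: composes BigQuery ML SQL; helper and table names are hypothetical.
# A simple grid stands in for a real tuning service such as Google Vizier.


def dnn_create_model_sql(
    model: str,
    training_table: str,
    label_col: str,
    hidden_units: Sequence[int],
    classifier: bool = True,
    dropout: float = 0.0,
    learn_rate: float = 0.001,
) -> str:
    """Return a CREATE MODEL statement for a fully connected DNN."""
    if not hidden_units or any(u <= 0 for u in hidden_units):
        raise ValueError("hidden_units must be a non-empty list of positive layer sizes")
    model_type = "DNN_CLASSIFIER" if classifier else "DNN_REGRESSOR"
    units = ", ".join(str(u) for u in hidden_units)  # e.g. [128, 64] -> "128, 64"
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}',\n"
        f"        input_label_cols=['{label_col}'],\n"
        f"        hidden_units=[{units}],\n"
        f"        dropout={dropout},\n"
        f"        learn_rate={learn_rate}) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def dnn_grid_sql(
    model_prefix: str,
    training_table: str,
    label_col: str,
    unit_options: Sequence[Sequence[int]],
    learn_rates: Sequence[float],
) -> List[str]:
    """Return one CREATE MODEL statement per (hidden_units, learn_rate) combination."""
    return [
        dnn_create_model_sql(f"{model_prefix}_{i}", training_table, label_col, units, learn_rate=lr)
        for i, (units, lr) in enumerate(product(unit_options, learn_rates))
    ]


if __name__ == "__main__":
    for stmt in dnn_grid_sql("mydataset.churn_dnn", "mydataset.customers", "churned",
                             [[64], [128, 64]], [0.01, 0.001]):
        print(stmt, end="\n\n")
```

Each candidate model could then be compared with `ML.EVALUATE` on held-out data before choosing one. A grid grows quickly with each added hyperparameter, which is one reason a dedicated tuning service is preferable for real workloads.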
Model export for online prediction
Once you have built a model in BigQuery ML, you can export it for online prediction, or for further inspection and editing with TensorFlow or XGBoost tools. You can export all model types except time series models. All models except boosted trees are exported as a TensorFlow SavedModel, which can be deployed for online prediction or inspected and edited further with TensorFlow tools. Boosted tree models are exported in XGBoost Booster format, which can likewise be deployed online or inspected and edited further. Get started with the documentation to understand how to export models and use them for online prediction.
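The export rules above are simple enough to write down directly. This sketch composes an `EXPORT MODEL` statement (the BigQuery ML SQL statement that writes a model to Cloud Storage) and maps a model type to the export format described in the text. The helper names are hypothetical, and treating `ARIMA`-prefixed model types as the non-exportable time series models is an assumption.

```python
# Sketch only: composes BigQuery ML SQL and encodes the export rules from the text.
# Helper names are hypothetical; "ARIMA*" as the time series model type is an assumption.


def export_model_sql(model: str, gcs_uri: str) -> str:
    """Return an EXPORT MODEL statement that writes the model to Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("destination must be a Cloud Storage URI starting with gs://")
    return f"EXPORT MODEL `{model}`\nOPTIONS(URI='{gcs_uri}')"


def export_format(model_type: str) -> str:
    """Return the export format for a BigQuery ML model_type, as described in the text."""
    mt = model_type.upper()
    if mt.startswith("ARIMA"):  # assumption: ARIMA* identifies time series models
        raise ValueError("time series models cannot be exported")
    if mt.startswith("BOOSTED_TREE"):
        return "XGBoost Booster"
    return "TensorFlow SavedModel"  # every other exportable model type


if __name__ == "__main__":
    print(export_model_sql("mydataset.churn_bt", "gs://my-bucket/models/churn_bt/"))
    print(export_format("BOOSTED_TREE_CLASSIFIER"))
```

Checking the format up front tells you which toolchain (XGBoost or TensorFlow) you will need to serve or inspect the exported files.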
We are building a set of notebooks covering common patterns (use cases) for these models that we see across different industries. Check out all the tutorials and notebooks.
By Abhishek Kashyap, Product Manager
Source: Google Cloud Blog