Liwaiwai

Want To Use AutoML Tables From A Jupyter Notebook? Here’s How

  • January 21, 2020
  • liwaiwai.com

While there’s no doubt that machine learning (ML) can be a great tool for businesses of all shapes and sizes, actually building ML models can seem daunting at first. Cloud AutoML—Google Cloud’s suite of products—provides tools and functionality to help you build ML models that are tailored to your specific needs, without needing deep ML expertise.

AutoML solutions provide a user interface that walks you through each step of model building, including importing data, training your model on the data, evaluating model performance, and predicting values with the model. But, what if you want to use AutoML products outside of the user interface? If you’re working with structured data, one way to do it is by using the AutoML Tables SDK, which lets you trigger—or even automate—each step of the process through code.



There is a wide variety of ways that the SDK can help embed AutoML capabilities into applications. In this post, we’ll use an example to show how you can use the SDK from end-to-end within your Jupyter Notebook. Jupyter Notebooks are one of the most popular development tools for data scientists. They enable you to create interactive, shareable notebooks with code snippets and markdown for explanations. Without leaving Google Cloud’s hosted notebook environment, AI Platform Notebooks, you can leverage the power of AutoML technology.

There are several benefits to using AutoML technology from a notebook. Each step and setting can be codified so that it runs the same way every time, for everyone. Also, it’s common, even with AutoML, to need to manipulate the source data before training the model on it. In a notebook, you can use familiar tools like pandas and numpy to preprocess the data in the same workflow. Finally, you have the option of building a model with another framework and ensembling it with the AutoML model for potentially better results. Let’s get started!
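As a concrete illustration of that second benefit, here is a minimal preprocessing sketch. The column names match the fraud dataset used below, but the values are made up, and the `LogAmount` feature is a hypothetical example of a transformation you might apply before importing the data into AutoML Tables:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the fraud data (real column names, made-up values).
# In practice you might load the table with pandas-gbq or the BigQuery client.
df = pd.DataFrame({
    "Time":   [0, 406, 4462],
    "Amount": [149.62, 0.0, 239.93],
    "Class":  [0, 1, 1],
})

# Hypothetical preprocessing before importing into AutoML Tables:
# drop zero-amount records and add a log-scaled copy of the skewed Amount column.
df = df[df["Amount"] > 0].copy()
df["LogAmount"] = np.log1p(df["Amount"])
```

Because the whole workflow lives in one notebook, transformations like this run in the same session as the SDK calls that follow.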


Understanding the data

The business problem we’ll investigate in this blog is how to identify fraudulent credit card transactions. The technical challenge we’ll face is how to deal with imbalanced datasets: only 0.17% of the transactions in the dataset we’re using are marked as fraud. More details on this problem are available in the research paper Calibrating Probability with Undersampling for Unbalanced Classification.
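To make that imbalance concrete: the ULB fraud-detection table contains 284,807 transactions, of which only 492 are labeled as fraud. A couple of lines of arithmetic show where the 0.17% figure comes from:

```python
# Class balance of the ULB credit card fraud dataset.
fraud_count = 492
total_count = 284_807

fraud_rate = fraud_count / total_count
print(f"Fraudulent share: {fraud_rate:.2%}")  # prints "Fraudulent share: 0.17%"
```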

To get started, you’ll need a Google Cloud Platform project with billing enabled. To create a project, follow the instructions here. For a smooth experience, check that the necessary storage and ML APIs are enabled. Then, follow this link to access BigQuery public datasets in the Google Cloud console.

In the Resources tree in the bottom-left corner, navigate through the list of datasets until you find ml-datasets, and then select the ulb-fraud-detection table within it.


Click the Preview tab to preview sample records from the dataset. Each record has the following columns:

  • Time is the number of seconds between the first transaction in the dataset and the time of the selected transaction.
  • V1-V28 are columns that have been transformed via a dimensionality reduction technique called PCA that has anonymized the data.
  • Amount is the transaction amount.
  • Class is the label: 1 if the transaction is fraudulent, 0 otherwise.

Set up your Notebook Environment

Now that we’ve looked at the data, let’s set up our development environment. The notebook we’ll use can be found in AI Hub. Select the “Open in GCP” button, then choose to deploy the notebook on either a new or an existing notebook server.


Configure the AutoML Tables SDK

Next, let’s highlight key sections of the notebook. Some details, such as setting the project ID, are omitted for brevity, but we highly recommend running the notebook end-to-end when you have an opportunity.


We’ve recently released a new and improved AutoML Tables client library. You will first need to install the library and initialize the Tables client.

!pip install google-cloud-automl --user

# TablesClient lives in the v1beta1 surface of the client library
from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=PROJECT_ID, region=REGION)

By the way, we recently announced that AutoML Tables can now be used in Kaggle kernels. You can learn more in this tutorial notebook, but the setup is similar to what you see here.

Import the Data

The first step is to create a BigQuery dataset, which is essentially a container for the data. Next, import the data from the BigQuery fraud detection dataset. You can also import from a CSV file in Google Cloud Storage or directly from a pandas dataframe.


# The URI pattern is bq://<project>.<dataset>.<table>
BIGQUERY_INPUT_URI = 'bq://bigquery-public-data.ml_datasets.ulb_fraud_detection'

# Create dataset
client.create_dataset(dataset_display_name=DATASET_DISPLAY_NAME)

# Import data into dataset
response = client.import_data(
    dataset_display_name=DATASET_DISPLAY_NAME,
    bigquery_input_uri=BIGQUERY_INPUT_URI
)

Train the Model

First, we have to specify the column we would like to predict, or our target column, with set_target_column(). The target column for our example is “Class”: 1 if the transaction is fraudulent, 0 if not.

Then, we’ll specify which columns to exclude from the model. We’ll only exclude the target column, but you could also exclude IDs or other information you don’t want to include in the model.

There are a few other things you might want to do that aren’t needed in this example:

  • Set weights on individual columns
  • Create your own custom test/train/validation split and specify the column to use for the split
  • Specify which timestamp column to use for time-series problems
  • Override the data types and nullable status that AutoML Tables inferred during data import

The one slightly unusual thing that we did in this example is override the default optimization objective. Since this is a very imbalanced dataset, it’s recommended that you optimize for AU-PRC, or the area under the Precision/Recall curve, rather than the default AU-ROC.
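To see why the default objective can mislead on imbalanced data, consider a degenerate classifier on a toy sample where 3% of transactions are fraud. Accuracy looks excellent even though the model never catches a single fraud, which precision/recall-based metrics like AU-PRC would expose:

```python
# A degenerate classifier on a 3%-fraud sample: it never flags fraud.
labels = [1] * 3 + [0] * 97   # 1 = fraud
preds = [0] * 100             # always predicts "not fraud"

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
false_neg = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall = true_pos / (true_pos + false_neg)

print(accuracy)  # 0.97 -- looks impressive
print(recall)    # 0.0  -- catches no fraud at all
```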


# Set the target column
client.set_target_column(
    dataset_display_name=DATASET_DISPLAY_NAME,
    column_spec_display_name=TARGET_COLUMN
)

# Set columns to exclude as features
EXCLUDE_COLUMNS = [TARGET_COLUMN]

# Create the model
response = client.create_model(
    MODEL_DISPLAY_NAME,
    dataset_display_name=DATASET_DISPLAY_NAME,
    train_budget_milli_node_hours=TRAIN_BUDGET,
    exclude_column_spec_names=EXCLUDE_COLUMNS,
    optimization_objective=OPTIMIZATION_OBJECTIVE
)

Evaluate the Model

After training has been completed, you can review various performance statistics on the model, such as the accuracy, precision, recall, and so on. The metrics are returned in a nested data structure, and here we are pulling out the AU-PRC and AU-ROC from that data structure.


me = client.list_model_evaluations(model_display_name=MODEL_DISPLAY_NAME)
metrics = list(me)[1].classification_evaluation_metrics

Deploy and Predict with the Model

To enable online predictions, the model must first be deployed. (You can perform batch predictions without deploying the model).


response = client.deploy_model(model_display_name=MODEL_DISPLAY_NAME)

We’ll create a hypothetical transaction record with similar characteristics and predict on it. After invoking the predict() API with this record, we receive a data structure with each class and its score. The code below finds the class with the maximum score.


from random import randint, uniform

# Create an example record with similar characteristics as the dataset, and predict on it.
record = [randint(0, 100000)] + [round(uniform(-1, 1), 2) for i in range(28)] + [randint(0, 1000)]
print("record: {}".format(record))
result = client.predict(record, model_display_name=MODEL_DISPLAY_NAME)

prediction = max(result.payload, key=lambda x: x.tables.score).tables
print("value:  {}".format(prediction.value.string_value))
print("score:  {}".format(round(prediction.score, 5)))
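The selection logic above is just an argmax over per-class scores. The sketch below mocks the payload with plain tuples to make that visible; the real response wraps each entry in protobuf objects (entry.tables.value, entry.tables.score):

```python
# Mock of the prediction payload: one (class value, score) pair per class.
# Scores here are invented for illustration.
payload = [("0", 0.021), ("1", 0.979)]

value, score = max(payload, key=lambda entry: entry[1])
print("value: {}".format(value))
print("score: {}".format(round(score, 5)))
```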

Conclusion

Now that we’ve seen how you can use AutoML Tables straight from your notebook to produce an accurate model of a complex problem, all with a minimal amount of code, what’s next?


To find out more, the AutoML Tables documentation is a great place to start. When you’re ready to use AutoML in a notebook, the SDK guide has detailed descriptions of each operation and parameter. You might also find our samples on GitHub helpful.

After you feel comfortable with AutoML Tables, you might want to look at other AutoML products. You can apply what you’ve learned to solve problems in the Natural Language, Translation, Video Intelligence, and Vision domains.

 

Karl Weinmeister

This guide is republished from the Google Cloud blog.



Related Topics
  • Cloud AutoML
  • Datasets
  • Google Cloud
  • Jupyter
  • Jupyter Notebook
  • Tutorial