Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
Liwaiwai Liwaiwai
  • /
  • Artificial Intelligence
  • Machine Learning
  • Robotics
  • Engineering
    • Architecture
    • Design
    • Software
    • Hybrid Cloud
    • Data
  • Learning
  • About
  • Artificial Intelligence
  • Data
  • Machine Learning
  • Platforms

Want To Use AutoML Tables From A Jupyter Notebook? Here’s How

  • January 21, 2020
  • liwaiwai.com

While there’s no doubt that machine learning (ML) can be a great tool for businesses of all shapes and sizes, actually building ML models can seem daunting at first. Cloud AutoML—Google Cloud’s suite of products—provides tools and functionality to help you build ML models that are tailored to your specific needs, without needing deep ML expertise.

AutoML solutions provide a user interface that walks you through each step of model building, including importing data, training your model on the data, evaluating model performance, and predicting values with the model. But, what if you want to use AutoML products outside of the user interface? If you’re working with structured data, one way to do it is by using the AutoML Tables SDK, which lets you trigger—or even automate—each step of the process through code.


Partner with liwaiwai.com
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

There is a wide variety of ways that the SDK can help embed AutoML capabilities into applications. In this post, we’ll use an example to show how you can use the SDK from end-to-end within your Jupyter Notebook. Jupyter Notebooks are one of the most popular development tools for data scientists. They enable you to create interactive, shareable notebooks with code snippets and markdown for explanations. Without leaving Google Cloud’s hosted notebook environment, AI Platform Notebooks, you can leverage the power of AutoML technology.

There are several benefits of using AutoML technology from a notebook. Each step and setting can be codified so that it runs the same every time by everyone. Also, it’s common, even with AutoML, to need to manipulate the source data before training the model with it. By using a notebook, you can use common tools like pandas and numpy to preprocess the data in the same workflow. Finally, you have the option of creating a model with another framework, and ensemble that together with the AutoML model, for potentially better results. Let’s get started!

Read More  OpenAI Releases Free ChatGPT iPhone App

Understanding the data

The business problem we’ll investigate in this blog is how to identify fraudulent credit card transactions. The technical challenge we’ll face is how to deal with imbalanced datasets: only 0.17% of the transactions in the dataset we’re using are marked as fraud. More details on this problem are available in the research paper Calibrating Probability with Undersampling for Unbalanced Classification.

To get started, you’ll need a Google Cloud Platform project with billing enabled. To create a project, follow the instructions here. For a smooth experience, check that the necessary storage and ML APIs are enabled. Then, follow this link to access BigQuery public datasets in the Google Cloud console.

In the Resources tree in the bottom-left corner, navigate through the list of datasets until you find ml-datasets, and then select the ulb-fraud-detection table within it.

ulb-fraud-detection.png

Click the Preview tab to preview sample records from the dataset. Each record has the following columns:

  • Time is the number of seconds between the first transaction in the dataset and the time of the selected transaction.
  • V1-V28 are columns that have been transformed via a dimensionality reduction technique called PCA that has anonymized the data.
  • Amount is the transaction amount.
ulb-fraud-detection 1.png

Set up your Notebook Environment

Now that we’ve looked at the data, let’s set up our development environment. The notebook we’ll use can be found in AI Hub. Select the “Open in GCP” button, then choose to either deploy the notebook in a new or existing notebook server.

set up Notebook Environment.png
ai hub.png

Configure the AutoML Tables SDK

Next, let’s highlight key sections of the notebook. Some details, such as setting the project ID, are omitted for brevity, but we highly recommend running the notebook end-to-end when you have an opportunity.

Read More  PyCon 2019 | Put Down The Deep Learning: When Not To Use Neural Networks And What To Do Instead

We’ve recently released a new and improved AutoML Tables client library. You will first need to install the library and initialize the Tables client.

!pip install google-cloud-automl --user
client = automl.TablesClient(project=PROJECT_ID, region=REGION)

By the way, we recently announced that AutoML Tables can now be used in Kaggle kernels. You can learn more in this tutorial notebook, but the setup is similar to what you see here.

Import the Data

The first step is to create a BigQuery dataset, which is essentially a container for the data. Next, import the data from the BigQuery fraud detection dataset. You can also import from a CSV file in Google Cloud Storage or directly from a pandas dataframe.


# The URI pattern is bq://..

BIGQUERY_INPUT_URI = ‘bq://bigquery-public-data.ml_datasets.ulb_fraud_detection’ # Create dataset client.create_dataset(dataset_display_name=DATASET_DISPLAY_NAME) # Import data into dataset response = client.import_data(dataset_display_name=DATASET_DISPLAY_NAME, bigquery_input_uri=BIGQUERY_INPUT_URI)

Train the Model

First, we have to specify which column we would like to predict, or our target column, with set_target_column(). The target column for our example will be “Class”—either 1 or 0, if the transaction is fraudulent or not.

Then, we’ll specify which columns to exclude from the model. We’ll only exclude the target column, but you could also exclude IDs or other information you don’t want to include in the model.

There are a few other things you might want to do that aren’t necessary needed in this example:

  • Set weights on individual columns
  • Create your own custom test/train/validation split and specify the column to use for the split
  • Specify which timestamp column to use for time-series problems
  • Override the data types and nullable status that AutoML Tables inferred during data import

The one slightly unusual thing that we did in this example is override the default optimization objective. Since this is a very imbalanced dataset, it’s recommended that you optimize for AU-PRC, or the area under the Precision/Recall curve, rather than the default AU-ROC.


# Set the target column
client.set_target_column(
  dataset_display_name=DATASET_DISPLAY_NAME,
  column_spec_display_name=TARGET_COLUMN
)

# Set columns to exclude as features
EXCLUDE_COLUMNS = [TARGET_COLUMN]

 # Create the model
 response = client.create_model(
   MODEL_DISPLAY_NAME,
   dataset_display_name=DATASET_DISPLAY_NAME,
   train_budget_milli_node_hours=TRAIN_BUDGET,
   exclude_column_spec_names=EXCLUDE_COLUMNS,
   optimization_objective=OPTIMIZATION_OBJECTIVE
 )

Evaluate the Model

After training has been completed, you can review various performance statistics on the model, such as the accuracy, precision, recall, and so on. The metrics are returned in a nested data structure, and here we are pulling out the AU-PRC and AU-ROC from that data structure.


me = client.list_model_evaluations(model_display_name=MODEL_DISPLAY_NAME)
metrics = list(me)[1].classification_evaluation_metrics

Deploy and Predict with the Model

To enable online predictions, the model must first be deployed. (You can perform batch predictions without deploying the model).


response = client.deploy_model(model_display_name=MODEL_DISPLAY_NAME)

We’ll create a hypothetical transaction record with similar characteristics and predict on it. After invoking the predict() API with this record, we receive a data structure with each class and its score. The code below finds the class with the maximum score.


# Create an example record with similar characteristics as dataset, and predict on it.

record = [randint(0, 100000)] + [round(uniform(-1,1), 2) for i in range(28)] + [randint(0, 1000)]
print("record: {}".format(record))
result = client.predict(record, model_display_name=MODEL_DISPLAY_NAME)

prediction = max(result.payload, key=lambda x: x.tables.score).tables
print("value:  {}".format(prediction.value.string_value))
print("score:  {}".format(round(prediction.score, 5)))

Conclusion

Now that we’ve seen how you can use AutoML Tables straight from your notebook to produce an accurate model of a complex problem, all with a minimal amount of code, what’s next?

Read More  Google Cloud And STS To Automate U.S. Navy Maintenance Inspections Using AI And ML Technology

To find out more, the AutoML Tables documentation is a great place to start. When you’re ready to use AutoML in a notebook, the SDK guide has detailed descriptions of each operation and parameter. You might also find our samples on GitHub helpful.

After you feel comfortable with AutoML Tables, you might want to look at other AutoML products. You can apply what you’ve learned to solve problems in Natural Language, Translation, Video Intelligence, and Video domains.

 

Karl Weinmeister

This guide is republished from Google Cloud blog.


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

liwaiwai.com

Related Topics
  • Cloud AutoML
  • Datasets
  • Google Cloud
  • Jupyter
  • Jupyter Notebook
  • Tutorial
You May Also Like
View Post
  • Artificial Intelligence
  • Technology

NASA’s Mars Rovers Could Inspire A More Ethical Future For AI

  • September 26, 2023
View Post
  • Artificial Intelligence
  • Platforms

Oracle CloudWorld 2023: 6 Key Takeaways From The Big Annual Event

  • September 25, 2023
View Post
  • Artificial Intelligence

3 Ways AI Can Help Communities Adapt To Climate Change In Africa

  • September 25, 2023
Robotic Hand | Lights
View Post
  • Artificial Intelligence
  • Technology

Nvidia H100 Tensor Core GPUs Come To Oracle Cloud

  • September 24, 2023
View Post
  • Artificial Intelligence
  • Engineering
  • Technology

AI-Driven Tool Makes It Easy To Personalize 3D-Printable Models

  • September 22, 2023
View Post
  • Artificial Intelligence
  • Data

Applying Generative AI To Product Design With BigQuery DataFrames

  • September 21, 2023
View Post
  • Artificial Intelligence
  • Platforms

Combining AI With A Trusted Data Approach On IBM Power To Fuel Business Outcomes

  • September 21, 2023
Microsoft and Adobe
View Post
  • Artificial Intelligence
  • Machine Learning
  • Platforms

Microsoft And Adobe Partner To Deliver Cost Savings And Business Benefits

  • September 21, 2023
A Field Guide To A.I.
Navigate the complexities of Artificial Intelligence and unlock new perspectives in this must-have guide.
Now available in print and ebook.

charity-water



Stay Connected!
LATEST
  • 1
    NASA’s Mars Rovers Could Inspire A More Ethical Future For AI
    • September 26, 2023
  • 2
    Oracle CloudWorld 2023: 6 Key Takeaways From The Big Annual Event
    • September 25, 2023
  • 3
    3 Ways AI Can Help Communities Adapt To Climate Change In Africa
    • September 25, 2023
  • Robotic Hand | Lights 4
    Nvidia H100 Tensor Core GPUs Come To Oracle Cloud
    • September 24, 2023
  • 5
    AI-Driven Tool Makes It Easy To Personalize 3D-Printable Models
    • September 22, 2023
  • 6
    Applying Generative AI To Product Design With BigQuery DataFrames
    • September 21, 2023
  • 7
    Combining AI With A Trusted Data Approach On IBM Power To Fuel Business Outcomes
    • September 21, 2023
  • Microsoft and Adobe 8
    Microsoft And Adobe Partner To Deliver Cost Savings And Business Benefits
    • September 21, 2023
  • Coffee | Laptop | Notebook | Work 9
    First HP Work Relationship Index Shows Majority of People Worldwide Have an Unhealthy Relationship with Work
    • September 20, 2023
  • 10
    Huawei Connect 2023: Accelerating Intelligence For Shared Success
    • September 20, 2023

about
About
Hello World!

We are liwaiwai.com. Created by programmers for programmers.

Our site aims to provide materials, guides, programming how-tos, and resources relating to artificial intelligence, machine learning and the likes.

We would like to hear from you.

If you have any questions, enquiries or would like to sponsor content, kindly reach out to us at:

[email protected]

Live long & prosper!
Most Popular
  • Intel Innovation 1
    Intel Innovation 2023
    • September 15, 2023
  • 2
    Microsoft And Oracle Expand Partnership To Deliver Oracle Database Services On Oracle Cloud Infrastructure In Microsoft Azure
    • September 14, 2023
  • 3
    Real-Time Ubuntu Is Now Available In AWS Marketplace
    • September 12, 2023
  • 4
    IBM Brings Watsonx To ESPN Fantasy Football With New Waiver Grades And Trade Grades
    • September 13, 2023
  • 5
    Document AI Workbench Is Now Powered By Generative AI To Structure Document Data Faster
    • September 15, 2023
  • /
  • Artificial Intelligence
  • Explore
  • About
  • Contact Us

Input your search keywords and press Enter.