Making Your Pictures Worth A Thousand Labels! (With Cloud Vision API)

They say a picture is worth a thousand words. But how do you make those words available and useful? Around the world, we are generating more images than ever before, and it’s no surprise that businesses are turning to image recognition technology to seize the opportunities this growing body of data creates.

Cloud Vision API is a powerful tool for performing a variety of tasks on your image data, including label detection, text recognition, and object detection. Whether it’s identifying products in a retail store, analyzing social media posts for brand mentions, or scanning through millions of images to find a specific object, the Cloud Vision API can help businesses automate their image analysis workflows and gain valuable insights from their visual data. To protect privacy and help you build responsibly, the Cloud Vision API offers features to limit personal identification, such as person blur, which hides identifiable features.

Let’s explore a few of the key features of the Cloud Vision API.

Detect famous landmarks

Landmark detection allows you to analyze images to identify specific landmarks such as buildings, natural features, and other recognizable locations. Cloud Vision API recognizes landmarks and provides information about them, including their name, location, and other relevant details. Perhaps you are trying to identify the landmarks in images shared by customers as part of social campaigns, or want to build a mobile app that provides information to tourists on famous landmarks.

In the image below, Cloud Vision API has detected the Eiffel Tower, as shown in the visualized response. Also detected, though not shown in this visualization, were Pont de Bir-Hakeim (the bridge) and Champ de Mars (the park in front of the Eiffel Tower).

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Cloud_Vision_API.max-1400x1400.jpg
Response from landmark detection feature visualized. Original image courtesy of John Towner.
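
As a rough illustration of what sits behind a visualization like this, here is a minimal Python sketch that builds a request body for the REST `images:annotate` endpoint and reads landmark names and coordinates from a response. The helper names (`build_request`, `summarize_landmarks`) and the bucket path are hypothetical, and the sample response is hand-written for illustration rather than real API output.

```python
import json

# REST endpoint that accepts the request body built below (sent as an authenticated POST).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_uri, feature_type, max_results=5):
    """Build the JSON body for one image and one feature type."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }


def summarize_landmarks(response):
    """Return (name, score, latitude, longitude) for each landmark annotation."""
    results = []
    for annotation in response.get("landmarkAnnotations", []):
        locations = annotation.get("locations", [])
        lat_lng = locations[0].get("latLng", {}) if locations else {}
        results.append(
            (
                annotation.get("description"),
                annotation.get("score"),
                lat_lng.get("latitude"),
                lat_lng.get("longitude"),
            )
        )
    return results


# Hand-written sample shaped like one entry of the "responses" array.
# The score and coordinates are illustrative values, not real output.
sample_response = {
    "landmarkAnnotations": [
        {
            "description": "Eiffel Tower",
            "score": 0.9,
            "locations": [{"latLng": {"latitude": 48.858, "longitude": 2.294}}],
        }
    ]
}

# "gs://my-bucket/paris.jpg" is a hypothetical Cloud Storage path.
body = build_request("gs://my-bucket/paris.jpg", "LANDMARK_DETECTION")
print(json.dumps(body, indent=2))
print(summarize_landmarks(sample_response))
```

Keeping the request builder separate from the response parser means the same builder can be reused with other feature types, such as `LABEL_DETECTION` or `TEXT_DETECTION`.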

Detect objects and label images

Object detection and labeling are two related features that enable you to identify and classify objects within an image. Object detection detects and locates objects within an image, and provides information such as the position, size, and orientation of each object. Labels, on the other hand, provide a general classification of the content within an image.

Object detection has practical applications in many industries such as self-driving vehicles (where it’s critical), retail, manufacturing and more, while labels can be used to help classify and organize large collections of images, or to categorize and filter content.

You can see the similarities and differences in the responses provided by the object detection and labeling features in this image taken in Setagaya.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Cloud_Vision_API.max-1300x1300.jpg
Response from object detection feature visualized. The green bounding boxes were added to the original image with the response data from the Cloud Vision API. Original image courtesy of Alex Knight.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Cloud_Vision_API.max-1300x1300.jpg
Response from labels feature visualized. Original image courtesy of Alex Knight.
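
Drawing bounding boxes like the green ones above requires turning the response data into pixel coordinates. The sketch below assumes the REST response shape, in which each object annotation carries a `boundingPoly` with `normalizedVertices` between 0 and 1, and in which zero-valued fields may be omitted from the JSON. The helper names and the sample response are hypothetical and hand-written for illustration.

```python
def to_pixel_box(normalized_vertices, width, height):
    """Convert normalized polygon vertices to a (left, top, right, bottom) pixel box."""
    # A missing "x" or "y" is treated as 0.0 (assumption: zero values may be omitted).
    xs = [v.get("x", 0.0) * width for v in normalized_vertices]
    ys = [v.get("y", 0.0) * height for v in normalized_vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


def labels_above(response, min_score):
    """Return label descriptions whose score is at least min_score."""
    return [
        annotation["description"]
        for annotation in response.get("labelAnnotations", [])
        if annotation.get("score", 0.0) >= min_score
    ]


# Hand-written sample; names, scores, and vertices are illustrative only.
sample_response = {
    "localizedObjectAnnotations": [
        {
            "name": "Bicycle",
            "score": 0.89,
            "boundingPoly": {
                "normalizedVertices": [
                    {"x": 0.25, "y": 0.5},
                    {"x": 0.75, "y": 0.5},
                    {"x": 0.75, "y": 1.0},
                    {"x": 0.25, "y": 1.0},
                ]
            },
        }
    ],
    "labelAnnotations": [
        {"description": "Street", "score": 0.95},
        {"description": "Alley", "score": 0.6},
    ],
}

# Objects come with a location; labels describe the image as a whole.
for obj in sample_response["localizedObjectAnnotations"]:
    box = to_pixel_box(obj["boundingPoly"]["normalizedVertices"], width=800, height=600)
    print(obj["name"], box)

print(labels_above(sample_response, 0.8))
```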

Detect text

Cloud Vision API detects and extracts text from images, even if it’s handwritten or in different languages. Once it detects text, the API can provide information about the position, orientation, and size of each text element, as well as individual words and their bounding boxes.

In this image of a traffic sign, Cloud Vision API has detected the text and provided it in the response.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Cloud_Vision_API.max-1400x1400.jpg
Response from text detection feature visualized.
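
To make the shape of a text detection result more concrete, here is a small sketch that assumes the REST response layout, in which the first entry of `textAnnotations` holds the full detected text and the remaining entries hold individual words with their bounding polygons. The helper name `split_text_annotations` and the sample response are hypothetical and hand-written for illustration.

```python
def split_text_annotations(response):
    """Return (full_text, words), where words is a list of (text, vertices) pairs."""
    annotations = response.get("textAnnotations", [])
    if not annotations:
        return "", []
    # Assumption: the first annotation covers the whole detected text block.
    full_text = annotations[0].get("description", "")
    words = []
    for annotation in annotations[1:]:
        vertices = annotation.get("boundingPoly", {}).get("vertices", [])
        # Missing coordinates are treated as 0 (zero values may be omitted).
        corners = [(v.get("x", 0), v.get("y", 0)) for v in vertices]
        words.append((annotation.get("description", ""), corners))
    return full_text, words


# Hand-written sample; text and pixel coordinates are illustrative only.
sample_response = {
    "textAnnotations": [
        {
            "locale": "en",
            "description": "SPEED\nLIMIT",
            "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 110, "y": 20},
                                          {"x": 110, "y": 80}, {"x": 10, "y": 80}]},
        },
        {
            "description": "SPEED",
            "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 110, "y": 20},
                                          {"x": 110, "y": 45}, {"x": 10, "y": 45}]},
        },
        {
            "description": "LIMIT",
            "boundingPoly": {"vertices": [{"x": 10, "y": 55}, {"x": 110, "y": 55},
                                          {"x": 110, "y": 80}, {"x": 10, "y": 80}]},
        },
    ]
}

full_text, words = split_text_annotations(sample_response)
print(repr(full_text))
for text, corners in words:
    print(text, corners)
```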

Detect explicit content

Cloud Vision API can automatically identify and flag explicit or inappropriate content within an image using five categories: adult, spoof, medical, violence, and racy. For each category, the API returns a likelihood rating, ranging from “very unlikely” to “very likely”, which you can compare against thresholds in your application to decide how to handle images that exceed them. This feature is particularly useful for filtering or moderating user-generated content.

Luckily for the images I shared here, each category has been deemed “very unlikely” to be present. Phew!

https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Cloud_Vision_API.max-1300x1300.jpg
https://storage.googleapis.com/gweb-cloudblog-publish/images/6_Cloud_Vision_API.max-1300x1300.jpg
https://storage.googleapis.com/gweb-cloudblog-publish/images/7_Cloud_Vision_API.max-1500x1500.jpg
Responses from explicit content feature visualized.
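
Applying a threshold to these results could look something like the sketch below. It assumes the REST response shape, in which `safeSearchAnnotation` maps each category to a likelihood name such as `VERY_UNLIKELY` or `LIKELY`. The helper name `flagged_categories`, the default threshold, and the sample values are hypothetical choices for illustration.

```python
# Likelihood names in increasing order of confidence.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]

CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search, threshold="LIKELY"):
    """Return the categories whose likelihood is at or above the threshold."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    flagged = []
    for category in CATEGORIES:
        # A category missing from the response is treated as UNKNOWN.
        likelihood = safe_search.get(category, "UNKNOWN")
        if LIKELIHOOD_ORDER.index(likelihood) >= limit:
            flagged.append(category)
    return flagged


# Hand-written samples; the values are illustrative only.
clean_image = {category: "VERY_UNLIKELY" for category in CATEGORIES}
borderline_image = {
    "adult": "VERY_UNLIKELY",
    "spoof": "UNLIKELY",
    "medical": "VERY_UNLIKELY",
    "violence": "POSSIBLE",
    "racy": "LIKELY",
}

print(flagged_categories(clean_image))
print(flagged_categories(borderline_image))
print(flagged_categories(borderline_image, threshold="POSSIBLE"))
```

Where to set the threshold depends on the application: a stricter setting such as `POSSIBLE` flags more images for review, while `VERY_LIKELY` flags only the most confident cases.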

Next Steps

These are just a few of the Cloud Vision API’s features and of the ways it can help your business automate image analysis workflows and gain valuable insights from your visual data.

Head to the interactive walkthrough tutorials in Python, Node.js, Go, and Java to see, step by step, how to access the API and to learn more about the features you can integrate into your own applications! The tutorials can be completed at no cost within the Google Cloud Free Tier.

By Alicia Williams, Developer Advocate
Originally published at Google Cloud

Source: Cyberpogo

