Applying Generative AI To Product Design With BigQuery DataFrames

September 21, 2023

For any company, naming a product or service is complex and time-consuming. This process is particularly challenging in the pharmaceutical industry. Typically, companies start by brainstorming and researching thousands of names. They must ensure that the names are unique, compliant with regulations, and easy to pronounce and remember. With so many factors to consider, multiplied across an entire product catalog, the process must be designed to scale.

In this blog post, we will show how data analytics and generative AI can help unleash the creative process and accelerate testing. We will provide a step-by-step guide to generating potential drug names using BigQuery DataFrames. Please note that this post only illustrates the concepts and does not address any regulatory requirements.

Background

Our goal in this demonstration is to generate a set of 10 brand names that can be reviewed by a panel of experts for an imaginary generic drug called “Entropofloxacin”. Drugs with the suffix -floxacin belong to the fluoroquinolone class of antibiotics.

We’ll use the text-bison model, a large language model that has been trained on a massive dataset of text and code. It can generate text, translate languages, write different kinds of creative content, and answer all kinds of questions.

We will also give the model the following indications and usage: “Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections. It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.”

Getting started

If you want to follow along, the code in this post comes from the Drug Name Generation notebook. We highlight the key steps here and leave some details to the notebook.

We will be using BigQuery DataFrames to perform generative AI operations. It’s a brand new way to access BigQuery, providing a DataFrame interface that Python developers and data scientists are familiar with. It brings compute capabilities directly to your data in the Cloud, enabling you to process massive datasets. BigQuery DataFrames directly supports a wide variety of ML use cases, which we will showcase here.

Zero-shot learning

Let’s start with a base case, where we simply ask the model a question through a prompt. No examples, no chains, just a simple request-and-response scenario.

```python
# NUM_NAMES, GENERIC_NAME and USAGE are constants assumed to be defined earlier
# in the notebook (e.g. 10, "Entropofloxacin" and the indications text above).
zero_shot_prompt = f"""Provide {NUM_NAMES} unique and modern brand names in Markdown bullet point format. Do not provide any additional explanation.

Be creative with the brand names. Don't use English words directly; use variants or invented words.

The generic name is: {GENERIC_NAME}

The indications and usage are: {USAGE}."""
print(zero_shot_prompt)
```

We can submit our prompt to the model using the `model.predict()` function, which takes a DataFrame as input. Because our simple scenario has a single string input and a single string output, I’ve created a helper function that wraps the input string in a DataFrame and extracts the string value from the returned DataFrame. The function includes an optional temperature parameter that controls the degree of randomness, which can be helpful in a creative context.

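```python
import bigframes.pandas


# TEMPERATURE is a constant assumed to be defined earlier in the notebook;
# `model` is the PaLM2TextGenerator created in the next step.
def predict(prompt: str, temperature: float = TEMPERATURE) -> str:
    # Wrap the single prompt string in a one-row BigQuery DataFrame
    input_df = bigframes.pandas.DataFrame({"prompt": [prompt]})

    # Run the model and return the generated text from the first (only) row
    return model.predict(input_df, temperature=temperature).ml_generate_text_llm_result.iloc[0]
```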
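The `temperature` argument deserves a quick illustration. Conceptually, many LLMs divide the raw token scores (logits) by the temperature before applying softmax and sampling, so low values sharpen the distribution toward the most likely token and high values flatten it. The sketch below is a generic illustration with made-up logits, not a view into how text-bison is implemented; `softmax_with_temperature` is a hypothetical helper. A temperature of 0 is usually handled as a special case (always pick the most likely token), which this sketch does not model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability lands on the top token; at higher temperature the less likely tokens get a real chance, which is why a higher setting can produce more varied names.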

To get a response, we first need to create a model reference using a BigQuery connection. Then we can pass the prompt to our helper method.

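```python
import bigframes.pandas
from bigframes.ml.llm import PaLM2TextGenerator
from IPython.display import Markdown

# Get the BigQuery DataFrames session
session = bigframes.pandas.get_global_session()

# Define the model; connection_name refers to the BigQuery connection set up in the notebook
model = PaLM2TextGenerator(session=session, connection_name=connection_name)

# Invoke the LLM with the prompt
response = predict(zero_shot_prompt)

# Render the results as Markdown in the notebook
Markdown(response)
```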
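Because the zero-shot prompt asks for Markdown bullet points, the returned `response` is plain text that can also be post-processed before expert review. The hypothetical `parse_bullet_names` helper below assumes the model uses `-`, `*` or `+` bullets, extracts one name per bullet, and drops case-insensitive duplicates, since candidate names need to be unique. It is a sketch; real responses may need more defensive parsing.

```python
import re

# Assumes bullet lines such as "- Xylocin", "* Zervox" or "+ Zarox".
BULLET = re.compile(r"^\s*[-*+]\s+(.*\S)\s*$")


def parse_bullet_names(markdown: str) -> list[str]:
    """Extract bullet items from a Markdown list, dropping case-insensitive duplicates."""
    names: list[str] = []
    seen: set[str] = set()
    for line in markdown.splitlines():
        match = BULLET.match(line)
        if not match:
            continue  # skip blank lines and any stray prose
        name = match.group(1).strip("*_ ")  # drop surrounding bold/italic markers
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names


sample = "- Xylocin\n- Zervox\n* Zarox\n- zervox\n\n- **Zeroxy**"
print(parse_bullet_names(sample))  # ['Xylocin', 'Zervox', 'Zarox', 'Zeroxy']
```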

And now, the exciting part. Here are several responses we get:

Xylocin
Zervox
Zarox
Zeroxy
Xerozid

These names might work! You might notice that the names are very similar. Well, that might not actually be a problem. According to “The art and science of naming drugs”: “The letters “X,” “Y” and “Z” often appear in brand names because they give a drug a high-tech, sciency sounding name (Xanax, Xyrem, Zosyn). Conversely, “H,” “J” and “W” are sometimes avoided because they are difficult to pronounce in some languages.”
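The quoted guidance can be turned into a quick, informal screen. The hypothetical `letter_profile` helper reports which of the letters X, Y and Z (associated with a high-tech feel) and H, J and W (sometimes avoided for pronunciation reasons) appear in each candidate. It is only a heuristic aid for reviewers, not a naming or regulatory check; "Hajwex" is a deliberately awkward made-up name included for contrast.

```python
# Heuristic letter sets taken from the quoted article; not an official rule.
FAVORED = set("xyz")  # letters associated with a "sciency" sounding name
AVOIDED = set("hjw")  # letters sometimes avoided as hard to pronounce in some languages


def letter_profile(name: str) -> dict[str, object]:
    """Summarize which favored and avoided letters appear in a candidate name."""
    letters = set(name.casefold())
    return {
        "name": name,
        "favored": sorted(letters & FAVORED),
        "avoided": sorted(letters & AVOIDED),
    }


for candidate in ["Xylocin", "Zervox", "Xerozid", "Hajwex"]:
    print(letter_profile(candidate))
```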

Few-shot learning

Next, let’s try expanding on this base case by providing a few examples. This is referred to as few-shot learning, in which the examples provide a little more context to help shape the answer. It’s like providing some training data without retraining the whole model.

Fortunately, there is a public BigQuery FDA dataset available at bigquery-public-data.fda_drug that can help us with this task!

We can easily extract a few useful columns from the dataset into a dataframe using BigFrames:

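```python
import bigframes.pandas as bpd

# Read three columns of interest from the public FDA drug label table
df = bpd.read_gbq(
    "bigquery-public-data.fda_drug.drug_label",
    col_order=["openfda_generic_name", "openfda_brand_name", "indications_and_usage"],
)
```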

It’s also straightforward to sample the dataset for a few useful examples. Let’s run this code and peek at what we’ll include in our prompt.

```python
# Take a sample and convert it to a pandas DataFrame for local use
df_examples = df.sample(NUM_EXAMPLES).to_pandas()

# Display the sampled examples in the notebook
df_examples
```

We can create a more sophisticated prompt with three components:

- General instructions (e.g., generate n brand names)
- Multiple examples sampled above
- Information about the drug we’d like to generate a name for (entropofloxacin)
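The notebook’s exact prompt-assembly code isn’t reproduced here, so the following is a minimal sketch, assuming each example is a dict keyed by the three FDA column names. The hypothetical `build_few_shot_prompt` function concatenates the three components in the same layout as the prompt shown next: instructions, one "Generic name / Usage / Brand name" block per example, and the target drug with an open "Brand names:" line for the model to complete.

```python
def build_few_shot_prompt(
    examples: list[dict[str, str]],
    generic_name: str,
    usage: str,
    num_names: int = 10,
) -> str:
    """Assemble instructions, worked examples, and the target drug into one prompt."""
    header = (
        f"Provide {num_names} unique and modern brand names in Markdown bullet point "
        "format, related to the drug at the bottom of this prompt.\n\n"
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.\n\n"
        f"First, we will provide {len(examples)} examples to help with your thought process.\n\n"
        "Then, we will provide the generic name and usage for the drug we'd like you "
        "to generate brand names for.\n"
    )
    # One block per example, separated by a blank line.
    shots = "\n".join(
        f"Generic name: {ex['openfda_generic_name']}\n"
        f"Usage: {ex['indications_and_usage']}\n"
        f"Brand name: {ex['openfda_brand_name']}\n"
        for ex in examples
    )
    # The target drug ends with an open "Brand names:" line for the model to complete.
    target = f"\nGeneric name: {generic_name}\nUsage: {usage}\nBrand names:"
    return header + shots + target


examples = [
    {
        "openfda_generic_name": "FLUOCINOLONE ACETONIDE",
        "openfda_brand_name": "Synalar",
        "indications_and_usage": "SYNALAR® Solution is indicated for the relief of the "
        "inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses.",
    },
]
print(build_few_shot_prompt(
    examples,
    "Entropofloxacin",
    "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections.",
))
```

With the pandas `df_examples` sampled earlier, `df_examples.to_dict("records")` yields a list in this shape.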

Our prompt will now look like this, truncating some sections for readability:

Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.

Be creative with the brand names. Don’t use English words directly; use variants or invented words.

First, we will provide 3 examples to help with your thought process.

Then, we will provide the generic name and usage for the drug we’d like you to generate brand names for.
Generic name: BUPRENORPHINE HYDROCHLORIDE
Usage: 1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…
Brand name: Belbuca

Generic name: DROSPIRENONE/ETHINYL ESTRADIOL/LEVOMEFOLATE CALCIUM AND LEVOMEFOLATE CALCIUM
Usage: 1 INDICATIONS AND USAGE Safyral is an estrogen/progestin COC containing a folate…
Brand name: Safyral

Generic name: FLUOCINOLONE ACETONIDE
Usage: INDICATIONS AND USAGE SYNALAR® Solution is indicated for the relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses.
Brand name: Synalar

Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…
Brand names:

With this prompt, the model generates a very different set of brand names. With the examples included, the names are anchored on the generic name.

Entrol
Entromycin
Entrozol
Entroflox
Entroxil
Entrosyn
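To put a rough number on that anchoring, you could compare each candidate with the generic name. The hypothetical `anchoring_report` helper below uses the standard library’s `os.path.commonprefix` for the shared-prefix length and `difflib.SequenceMatcher.ratio()` for overall similarity (a value between 0 and 1). It is an illustrative metric of our choosing, not part of the original workflow, applied here to the names shown in this post.

```python
from difflib import SequenceMatcher
from os.path import commonprefix


def anchoring_report(names: list[str], generic_name: str) -> list[tuple[str, int, float]]:
    """For each name, return (name, shared-prefix length, similarity to the generic name)."""
    generic = generic_name.casefold()
    report = []
    for name in names:
        candidate = name.casefold()
        # commonprefix compares strings character by character.
        prefix_len = len(commonprefix([candidate, generic]))
        similarity = SequenceMatcher(None, candidate, generic).ratio()
        report.append((name, prefix_len, round(similarity, 2)))
    return report


zero_shot = ["Xylocin", "Zervox", "Zarox", "Zeroxy", "Xerozid"]
few_shot = ["Entrol", "Entromycin", "Entrozol", "Entroflox", "Entroxil", "Entrosyn"]
for label, names in (("zero-shot", zero_shot), ("few-shot", few_shot)):
    for row in anchoring_report(names, "Entropofloxacin"):
        print(label, row)
```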

Bulk generation

Now that we’ve learned the fundamentals of prompts & responses with BigQuery DataFrames, let’s explore generating names at scale. How can you generate candidate names when you have thousands of products? We can perform multiple operations in the Cloud without bringing the data into local memory within the notebook.

Let’s start by querying for drugs that don’t have a brand name in the FDA dataset. Technically, we are querying for drugs whose brand name matches their generic name.

```python
import bigframes.pandas as bpd

# Query 3 columns of interest from the drug label dataset
df_missing = bpd.read_gbq(
    "bigquery-public-data.fda_drug.drug_label",
    col_order=["openfda_generic_name", "openfda_brand_name", "indications_and_usage"],
)

# Exclude any rows with missing data
df_missing = df_missing.dropna()

# Keep only rows in which openfda_brand_name equals openfda_generic_name
df_missing = df_missing[
    df_missing["openfda_generic_name"] == df_missing["openfda_brand_name"]
]
```

We’ll pass a whole dataframe column of prompts to BigFrames instead of a single string prompt. Let’s look at how we could construct this column.

```python
# Build one prompt per row; the trailing space after the first sentence
# keeps it from running into the next one
df_missing["prompt"] = (
    "Provide a unique and modern brand name related to this pharmaceutical drug. "
    + "Don't use English words directly; use variants or invented words. The generic name is: "
    + df_missing["openfda_generic_name"]
    + ". The indications and usage are: "
    + df_missing["indications_and_usage"]
    + "."
)
```
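To make the row-level logic of the filtering and prompt-building steps explicit, here is the same logic written in plain Python over a list of dicts. The hypothetical `prompts_for_unbranded` function is only a local illustration: in the actual workflow, BigQuery DataFrames runs these operations in BigQuery without pulling rows into the notebook. Note the space after each sentence; without it, adjacent string literals run together ("drug.Don't").

```python
from typing import Optional

# Column names from the FDA drug label table used in this post.
COLUMNS = ("openfda_generic_name", "openfda_brand_name", "indications_and_usage")


def prompts_for_unbranded(rows: list[dict[str, Optional[str]]]) -> list[str]:
    """Mirror the DataFrame steps: drop incomplete rows, keep brand == generic, build prompts."""
    prompts = []
    for row in rows:
        # Rough equivalent of dropna(): skip rows missing any of the three columns.
        if any(row.get(col) is None for col in COLUMNS):
            continue
        # Keep only rows whose "brand name" is just the generic name.
        if row["openfda_generic_name"] != row["openfda_brand_name"]:
            continue
        prompts.append(
            "Provide a unique and modern brand name related to this pharmaceutical drug. "
            "Don't use English words directly; use variants or invented words. "
            f"The generic name is: {row['openfda_generic_name']}. "
            f"The indications and usage are: {row['indications_and_usage']}."
        )
    return prompts
```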

Next, let’s create a new helper function for batch prediction. We’ll use the column as-is without any transformation from/to strings.

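```python
import bigframes.pandas


# predict() also accepts a single-column Series of prompts
def batch_predict(
    prompts: bigframes.pandas.Series, temperature: float = TEMPERATURE
) -> bigframes.pandas.Series:
    # Generate one response per prompt; the model runs in BigQuery, not locally
    return model.predict(prompts, temperature=temperature).ml_generate_text_llm_result


response = batch_predict(df_missing["prompt"])
```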

After the operation completes, let’s take a look at one of the generated brand names for “alcohol free hand sanitizer”:

**Sani-Tize**

This is a modern and unique brand name for an alcohol-free hand sanitizer. It is derived from the words “sanitize” and “tize”, which give it a scientific and technical feel. The name is also easy to spell and pronounce, making it memorable and easy to market.
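The batch response contains a short explanation along with the name. If you only want the candidate names, a hypothetical post-processing helper such as `extract_brand_name` can pull out the first bold (`**...**`) span, falling back to the first non-empty line. This assumes responses follow the format shown above, which an LLM does not guarantee.

```python
import re

# Non-greedy match for the first Markdown bold span, e.g. "**Sani-Tize**".
BOLD = re.compile(r"\*\*(.+?)\*\*")


def extract_brand_name(response: str) -> str:
    """Return the first bold span, else the first non-empty line, else an empty string."""
    match = BOLD.search(response)
    if match:
        return match.group(1).strip()
    for line in response.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

One option is to bring the results local with `response.to_pandas()` and apply the helper with pandas’ `Series.map(extract_brand_name)`.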

In this scenario, we saw that generative AI is a powerful tool for accelerating the branding process. While we walked through a pharmaceutical drug name scenario, these concepts could be applied to any industry. We also saw that BigQuery puts the tools for multiple prompting styles in one place, all behind an intuitive DataFrame interface.

Enjoy applying these creative tools to your next project! For more information, feel free to check out the quickstart documentation.

By: Karl Weinmeister (Engineering Manager)
Originally published at: Google Cloud Blog

Source: cyberpogo.com

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!
