
Applying Generative AI To Product Design With BigQuery DataFrames

For any company, naming a product or service is complex and time-consuming. This process is particularly challenging in the pharmaceutical industry. Typically, companies start by brainstorming and researching thousands of names. They must ensure that the names are unique, compliant with regulations, and easy to pronounce and remember. With so many factors to consider, multiplied across an entire product catalog, the process must be designed to scale.

In this blog post, we will show how data analytics and generative AI can support the creative process and accelerate testing. We will provide a step-by-step guide on how to generate potential drug names using BigQuery DataFrames. Please note that this blog post simply illustrates the concepts and does not address any regulatory requirements.

Background

Our goal in this demonstration is to generate a set of 10 brand names that can be reviewed by a panel of experts for an imaginary generic drug called “Entropofloxacin”. Drugs with the suffix -floxacin belong to the fluoroquinolones class of antibiotics.

We’ll use the text-bison model, a large language model that has been trained on a massive dataset of text and code. It can generate text, translate languages, write different kinds of creative content, and answer all kinds of questions.

We will also provide these indications & usage to the model: “Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections. It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.”

Getting started

If you want to follow along, the code in this blog post comes from the Drug Name Generation notebook. We will highlight key steps here, leaving some details in the notebook.

We will be using BigQuery DataFrames to perform generative AI operations. It’s a brand new way to access BigQuery, providing a DataFrame interface that Python developers and data scientists are familiar with. It brings compute capabilities directly to your data in the Cloud, enabling you to process massive datasets. BigQuery DataFrames directly supports a wide variety of ML use cases, which we will showcase here.


Zero-shot learning

Let’s start with a base case, where we simply ask the model a question, through a prompt. No examples, no chains, just a simple request and response scenario.

zero_shot_prompt = f"""Provide {NUM_NAMES} unique and modern brand names in Markdown bullet point format. Do not provide any additional explanation.

Be creative with the brand names. Don't use English words directly; use variants or invented words.

The generic name is: {GENERIC_NAME}

The indications and usage are: {USAGE}."""
print(zero_shot_prompt)
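The prompt above depends on constants that are defined in the notebook rather than in this post. As a rough sketch, they might look like the following. The values for NUM_NAMES and GENERIC_NAME come from the Background section, USAGE is shortened here to the first sentence of the indications text, and the TEMPERATURE value is purely an assumption for illustration.

```python
# Assumed setup; the notebook's own definitions may differ.
NUM_NAMES = 10  # the Background section asks for 10 brand names
GENERIC_NAME = "Entropofloxacin"
TEMPERATURE = 0.5  # hypothetical default, not taken from the notebook

# Shortened to the first sentence of the indications and usage text.
USAGE = (
    "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat "
    "a variety of bacterial infections"
)

# Quick check that the placeholders render as expected inside an f-string.
preview = f"Provide {NUM_NAMES} brand names. The generic name is: {GENERIC_NAME}"
print(preview)
```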

We can submit our prompt to the model using the `model.predict()` function, which takes a dataframe as input. Since our simple scenario has one string input and one string output, we’ve created a helper function that wraps the input string in a dataframe and extracts the string value from the returned dataframe. The helper includes an optional temperature parameter to control the degree of randomness, which can be helpful in a creative context.

import bigframes.pandas

def predict(prompt: str, temperature: float = TEMPERATURE) -> str:
  # Create a single-row dataframe holding the prompt
  input = bigframes.pandas.DataFrame(
    {
      "prompt": [prompt],
    }
  )

  # Return the generated text from the first (and only) row
  return model.predict(input, temperature=temperature).ml_generate_text_llm_result.iloc[0]
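To build some intuition for what the temperature parameter does, here is a small, self-contained sketch of temperature scaling applied to a softmax. This is a general illustration of the technique, not a description of how text-bison samples internally, and the scores below are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 1.5)  # flatter distribution

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A lower temperature concentrates probability on the top-scoring candidate, while a higher temperature spreads it out, which is why a higher value tends to produce more varied names.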

To get a response, we first need to create a model reference using a BigQuery connection. Then we can pass the prompt to our helper method.

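import bigframes.pandas
from bigframes.ml.llm import PaLM2TextGenerator
from IPython.display import Markdown

# Get BigFrames session
session = bigframes.pandas.get_global_session()

# Define the model (connection_name is set earlier in the notebook)
model = PaLM2TextGenerator(session=session, connection_name=connection_name)

# Invoke LLM with prompt
response = predict(zero_shot_prompt)

# Print results as Markdown
Markdown(response)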
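Since we asked for Markdown bullet points, the response can also be turned into a Python list for later review. Here is a minimal sketch, assuming the model follows the bullet format; `parse_bullets` is a hypothetical helper that is not part of the notebook, and the sample response is invented.

```python
import re

def parse_bullets(markdown_text: str) -> list[str]:
    """Extract bullet items from Markdown, dropping duplicates but keeping order."""
    names = []
    for line in markdown_text.splitlines():
        # Match lines such as "* Name", "- Name", or "+ Name"
        match = re.match(r"^\s*[-*+]\s+(.*\S)\s*$", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names

sample_response = "* Xylocin\n* Zervox\n* Zarox\n* Zervox"  # hypothetical model output
print(parse_bullets(sample_response))
```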

And now, the exciting part. Here are several responses we get:

Xylocin
Zervox
Zarox
Zeroxy
Xerozid

These names might work! You might notice that the names are very similar. Well, that might not actually be a problem. According to “The art and science of naming drugs”: “The letters “X,” “Y” and “Z” often appear in brand names because they give a drug a high-tech, sciency sounding name (Xanax, Xyrem, Zosyn). Conversely, “H,” “J” and “W” are sometimes avoided because they are difficult to pronounce in some languages.”
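The letter observation in that quote is easy to check against a list of candidates. The sketch below flags which of the quoted letters appear in each name; `letter_profile` is a hypothetical helper, and the two letter sets simply mirror the quote rather than any formal naming rule.

```python
FAVORED = set("XYZ")  # letters the quoted article says often appear in brand names
AVOIDED = set("HJW")  # letters the quoted article says are sometimes avoided

def letter_profile(name: str) -> dict:
    """Report which favored and avoided letters occur in a candidate name."""
    letters = set(name.upper())
    return {
        "name": name,
        "favored": sorted(letters & FAVORED),
        "avoided": sorted(letters & AVOIDED),
    }

names = ["Xylocin", "Zervox", "Zarox", "Zeroxy", "Xerozid"]
for candidate in names:
    print(letter_profile(candidate))
```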

Few-shot learning

Next, let’s try expanding on this base case by providing a few examples. This is referred to as few-shot learning, in which the examples provide a little more context to help shape the answer. It’s like providing some training data without retraining the whole model.


Fortunately, there is a public BigQuery FDA dataset available at bigquery-public-data.fda_drug that can help us with this task!

We can easily extract a few useful columns from the dataset into a dataframe using BigFrames:

import bigframes.pandas as bpd

df = bpd.read_gbq("bigquery-public-data.fda_drug.drug_label",
    col_order=["openfda_generic_name",
               "openfda_brand_name",
               "indications_and_usage"])

And it’s straightforward to sample the dataset for a few useful examples. Let’s run this code and peek at what we want to include in our prompt.

# Take a sample and convert to a Pandas dataframe for local usage.
df_examples = df.sample(NUM_EXAMPLES).to_pandas()

df_examples

We can create a more sophisticated prompt with 3 components:

  • General instructions (e.g. generate 𝑛 brand names)
  • Multiple examples generated above
  • Information about the drug we’d like to generate a name for (entropofloxacin)

Our prompt will now look like this, truncating some sections for readability:


Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.

Be creative with the brand names. Don’t use English words directly; use variants or invented words.

First, we will provide 3 examples to help with your thought process.

Then, we will provide the generic name and usage for the drug we’d like you to generate brand names for.
Generic name: BUPRENORPHINE HYDROCHLORIDE
Usage: 1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…
Brand name: Belbuca

Generic name: DROSPIRENONE/ETHINYL ESTRADIOL/LEVOMEFOLATE CALCIUM AND LEVOMEFOLATE CALCIUM
Usage: 1 INDICATIONS AND USAGE Safyral is an estrogen/progestin COC containing a folate…
Brand name: Safyral

Generic name: FLUOCINOLONE ACETONIDE
Usage: INDICATIONS AND USAGE SYNALAR® Solution is indicated for the relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses.
Brand name: Synalar

Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…
Brand names:
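A prompt like the one above can be assembled from the three components with plain string formatting. Here is a minimal sketch, assuming the sampled rows are available as dictionaries keyed by the dataset's column names (for example via `df_examples.to_dict("records")`); `build_few_shot_prompt` is a hypothetical helper, and the notebook may construct its prompt differently.

```python
def build_few_shot_prompt(examples, generic_name, usage, num_names=10):
    """Assemble instructions, worked examples, and the target drug into one prompt."""
    parts = [
        f"Provide {num_names} unique and modern brand names in Markdown bullet "
        "point format, related to the drug at the bottom of this prompt.",
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.",
        f"First, we will provide {len(examples)} examples to help with your "
        "thought process.",
        "Then, we will provide the generic name and usage for the drug we'd "
        "like you to generate brand names for.",
    ]
    # One block per example, ending with the known brand name
    for ex in examples:
        parts.append(
            f"Generic name: {ex['openfda_generic_name']}\n"
            f"Usage: {ex['indications_and_usage']}\n"
            f"Brand name: {ex['openfda_brand_name']}"
        )
    # The target drug is left open-ended so the model completes the list
    parts.append(f"Generic name: {generic_name}\nUsage: {usage}\nBrand names:")
    return "\n\n".join(parts)

# One example row, truncated the same way as in the prompt shown above
examples = [
    {
        "openfda_generic_name": "BUPRENORPHINE HYDROCHLORIDE",
        "openfda_brand_name": "Belbuca",
        "indications_and_usage": "1 INDICATIONS AND USAGE BELBUCA is indicated "
        "for the management of pain…",
    }
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "Entropofloxacin",
    "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat "
    "a variety of bacterial…",
)
print(few_shot_prompt)
```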


With this prompt, the model generates a very different set of brand names. With the examples included, the names are now anchored on the generic name.

Entrol
Entromycin
Entrozol
Entroflox
Entroxil
Entrosyn

Bulk generation

Now that we’ve learned the fundamentals of prompts & responses with BigQuery DataFrames, let’s explore generating names at scale. How can you generate candidate names when you have thousands of products? We can perform multiple operations in the Cloud without bringing the data into local memory within the notebook.


Let’s start with querying for drugs that don’t have a brand name in the FDA dataset. Technically, we are querying for drugs where the brand name and generic name match.

# Query 3 columns of interest from drug label dataset
df_missing = bpd.read_gbq("bigquery-public-data.fda_drug.drug_label",
  col_order=["openfda_generic_name",
             "openfda_brand_name",
             "indications_and_usage"])

# Exclude any rows with missing data
df_missing = df_missing.dropna()

# Include rows in which openfda_brand_name equals openfda_generic_name
df_missing = df_missing[
  df_missing["openfda_generic_name"] == df_missing["openfda_brand_name"]]

We’ll pass a whole dataframe column of prompts to BigFrames instead of a single string prompt. Let’s look at how we could construct this column.

df_missing["prompt"] = (
  "Provide a unique and modern brand name related to this pharmaceutical drug. "
  + "Don't use English words directly; use variants or invented words. The generic name is: "
  + df_missing["openfda_generic_name"]
  + ". The indications and usage are: "
  + df_missing["indications_and_usage"]
  + "."
)

Next, let’s create a new helper function for batch prediction. We’ll use the column as-is without any transformation from/to strings.

def batch_predict(
  input: bigframes.pandas.Series, temperature: float = TEMPERATURE
) -> bigframes.pandas.Series:
  # Run the model over every prompt in the column and return the generated text
  return model.predict(input, temperature=temperature).ml_generate_text_llm_result

response = batch_predict(df_missing["prompt"])

After the operation completes, let’s take a look at one of the generated brand names for “alcohol free hand sanitizer”:


**Sani-Tize**

This is a modern and unique brand name for an alcohol-free hand sanitizer. It is derived from the words “sanitize” and “tize”, which give it a scientific and technical feel. The name is also easy to spell and pronounce, making it memorable and easy to market.
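Note that this response includes an explanation along with the name, since the batch prompt did not ask the model to omit one. If only the name is needed, it can be pulled out with a little post-processing. Here is a rough sketch, assuming the name is either bolded in Markdown or appears on the first non-empty line; `extract_brand_name` is a hypothetical helper, not part of the notebook.

```python
import re

def extract_brand_name(response: str) -> str:
    """Return the first bolded phrase, else the first non-empty line."""
    bold = re.search(r"\*\*(.+?)\*\*", response)
    if bold:
        return bold.group(1).strip()
    for line in response.splitlines():
        if line.strip():
            return line.strip()
    return ""

sample = (
    "**Sani-Tize**\n\n"
    "This is a modern and unique brand name for an alcohol-free hand sanitizer."
)
print(extract_brand_name(sample))
```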


In this scenario, we saw that generative AI is a powerful tool for accelerating the branding process. While we walked through a pharmaceutical drug name scenario, these concepts could be applied to any industry. We also saw that BigQuery puts all of the tools in one place for multiple prompting styles, all with an intuitive DataFrame interface.

Enjoy applying these creative tools to your next project! For more information, feel free to check out the quickstart documentation.

By: Karl Weinmeister (Engineering Manager)
Originally published at: Google Cloud Blog

Source: cyberpogo.com

