Dynatask, A New Paradigm Of AI Benchmarking Is Now Available For The AI Community

  • October 1, 2021

It’s been one year since Facebook AI launched Dynabench, a first-of-its-kind platform that radically rethinks benchmarking in AI. Starting today, we’re unlocking Dynabench’s full capabilities for the AI community — AI researchers can now create their own custom tasks to better evaluate the performance of natural language processing (NLP) models in more flexible, dynamic, and realistic settings for free.

This new feature, called Dynatask, makes it easy for researchers to leverage human annotators to actively fool NLP models and identify weaknesses through natural interactions. This dynamic approach arguably better reflects the way people behave and react as compared with previous benchmarks, which test against fixed data points and are prone to saturation. Researchers can also use our evaluation-as-a-service capabilities and compare models on our dynamic leaderboard, which goes beyond just accuracy and explores a more holistic measurement of fairness, robustness, compute, and memory.

Dynabench initially launched with four tasks: natural language inference (created by Yixin Nie and Mohit Bansal of UNC Chapel Hill), question answering (created by Max Bartolo, Pontus Stenetorp, and Sebastian Riedel of UCL), sentiment analysis (created by Atticus Geiger and Chris Potts of Stanford), and hate speech detection (created by Bertie Vidgen of the Turing Institute and Zeerak Waseem Talat of the University of Sheffield/Simon Fraser University).

Over the past year, we’ve launched a visual question answering task and low-resource machine translation tasks. We also powered the multilingual translation challenge at the Workshop on Machine Translation. Cumulatively, these dynamic data collection efforts have so far resulted in eight published papers, 400K raw examples, and four open source large-scale data sets.

“Dynatask opens up a world of possibilities for task creators. They can set up their own tasks with little coding experience, easily customize annotation interfaces, and enable interactions with models hosted on Dynabench. This makes dynamic adversarial data collection considerably more accessible to the research community,” said Max Bartolo of University College London.

Now, we hope that by enabling custom NLP tasks for the entire AI community, we’ll empower the field to explore entirely new research directions. High-quality and holistic model evaluation is critical to the long-term success of AI, and we believe that Dynabench, as a collaborative effort, will play an important role in the future of benchmarking.


How to use Dynatask

Dynatask is highly flexible and customizable. A single task can have one or more owners, who define the settings of each task. For example, owners can choose which existing data sets they want to use in the evaluation-as-a-service framework. They can select from a wide variety of evaluation metrics to measure model performance, including not only accuracy but also robustness, fairness, compute, and memory. Anyone can upload models to a task’s evaluation cloud, where scores and other metrics are computed on the selected data sets. Once those models have been uploaded and evaluated, they can be placed in the loop for dynamic data collection and human-in-the-loop evaluation. Task owners can collect data via the web interface on dynabench.org or through crowdsourced annotators (such as on Mechanical Turk).
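To make the metric mix concrete, here is a minimal sketch of how a weighted, multi-metric leaderboard score could be combined. The metric names, normalization, and default weights below are illustrative assumptions for this example, not Dynabench’s actual scoring scheme:

```python
from dataclasses import dataclass

@dataclass
class ModelScores:
    # All fields and their normalization are illustrative assumptions.
    accuracy: float    # fraction correct on the selected data sets
    robustness: float  # accuracy under perturbed inputs
    fairness: float    # accuracy parity across demographic slices
    compute: float     # normalized inverse of FLOPs (higher is better)
    memory: float      # normalized inverse of peak memory (higher is better)

def leaderboard_score(s: ModelScores, weights=None) -> float:
    """Weighted average of per-metric scores.

    Task owners choose which metrics count and how much; the default
    weights here are made up for illustration.
    """
    weights = weights or {
        "accuracy": 0.4, "robustness": 0.2, "fairness": 0.2,
        "compute": 0.1, "memory": 0.1,
    }
    return sum(getattr(s, metric) * w for metric, w in weights.items())

scores = ModelScores(accuracy=0.88, robustness=0.74, fairness=0.81,
                     compute=0.6, memory=0.7)
print(round(leaderboard_score(scores), 3))  # → 0.792
```

The point of the weighted blend is that a model which tops the accuracy column can still rank below a smaller, fairer, more robust model once the other columns are counted.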

Let’s walk through a concrete example that illustrates the different components. Suppose there were no Natural Language Inference tasks yet, and you wanted to start one.

  • Step 1: Log into your Dynabench account and fill out the “Request new task” form on your profile page.

  • Step 2: Once approved, you will have a dedicated task page and a corresponding admin dashboard that you, as the task owner, control.

  • Step 3: On the dashboard, choose the existing datasets that you want to evaluate models on when they are uploaded, along with the metrics you want to use for evaluation.

  • Step 4: Next, submit baseline models, or ask the community to submit them.

  • Step 5: If you then want to collect a new round of dynamic adversarial data, where annotators are asked to create examples that fool the model, you can upload new contexts to the system and start collecting data through the task owner interface.

  • Step 6: Once you have enough data and find that training on the data helps improve the system, you can upload better models and then put those in the data collection loop to build even stronger ones.
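The collection loop in Steps 5 and 6 can be sketched as follows. `collect_round`, the scripted annotator, and the toy model are hypothetical stand-ins for Dynabench’s hosted models and web interface; the only point is the keep-the-example-if-it-fools-the-model logic:

```python
def collect_round(contexts, get_annotator_example, model_predict):
    """Keep only examples that fool the in-the-loop model.

    For each context, the annotator writes an (input, gold_label) pair;
    the pair joins the new round's data set only if the model's
    prediction disagrees with the gold label.
    """
    fooling_examples = []
    for context in contexts:
        text, gold_label = get_annotator_example(context)
        if model_predict(text) != gold_label:  # model was fooled
            fooling_examples.append((text, gold_label))
    return fooling_examples

# Toy stand-ins: a naive keyword sentiment "model" and a scripted annotator.
def toy_model(text):
    return "positive" if "good" in text else "negative"

examples = {
    "ctx1": ("the movie was not good at all", "negative"),  # fools the model
    "ctx2": ("a good film", "positive"),                    # model gets it right
}
round_data = collect_round(["ctx1", "ctx2"], lambda c: examples[c], toy_model)
print(round_data)  # only the fooling example survives
```

In the real platform the fooled/not-fooled check happens live as the annotator types, so annotators get immediate feedback and can keep revising an example until it breaks the model.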


We used the same basic process to construct several dynamic data sets, like Adversarial Natural Language Inference. Now, with our tools available for the broader AI community, anyone can construct data sets with humans and models in the loop.

Get started with Dynatask now

Dynabench is centered on community. We want to empower the AI community to explore better, more holistic, and more reproducible approaches to model evaluation. Our goal is to make it easy for anyone to construct high quality human-and-model-in-the-loop data sets. With Dynatask, you can move beyond accuracy-only leaderboards toward a more holistic evaluation of AI models that are more closely aligned with the expectations and needs of the people who interact with them.

At Facebook AI, we believe in collaborative open science, scientific rigor, and responsible innovation. Of course, the platform will continue to evolve and change as the community grows, befitting its dynamic nature. We invite you to join our Dynabench community.

Create new examples that fool existing models, upload new models for evaluation, or request your own Dynabench task now.

Go to Tasks under your user profile to start creating a task today.

By Tristan Thrush, Research Associate | Adina Williams, Research Scientist | Douwe Kiela, Research Scientist
Source Facebook AI Research

