Artificial intelligence feeds on data, and data is piling up from increasingly cheap sensors and surging Internet use: videos, images, text; time series data, machine data; structured, unstructured and semi-structured data. And while AI is currently confined to narrow problems in discrete domains, the ambition of machine-learning researchers globally is to write algorithms that can cross domains, transferring learning from one kind of data to another.
Show a computer vision system millions of X-rays of confirmed lung cancer patients and the system will become an expert at diagnosing lung cancer from X-rays.
Around the world, hordes of unskilled workers are annotating data used to train such machine-learning models. Images, videos, audio and text are being labeled by working mothers in Madagascar, migrant workers in Beijing, uneducated young men in India and otherwise unemployed autistic adults in the United States. But what are they doing, exactly?
Besides tagging objects in an image – this is a car, that is a person – or flagging child pornography in videos, or identifying verbs in a block of text or instruments in music, this growing yet disparate army, soon to be millions of people, is filling vast data lakes with meaning. These lakes are not yet connected, but once filled, they will remain indefinitely. Eventually, canals will be dug between them and, at some point, the lakes will become seas and then oceans of human understanding in digital form. That data will inform ever more sophisticated machine-learning models, which are already drinking in knowledge and making decisions based on what they learn. It’s a remarkable endeavor that will change human life forever.
Meaning is a relationship between two kinds of things: signs and the things they express or signify. To an infant, an elephant is not an ‘elephant’ until it is named and only then does it take on meaning. To a computer, an elephant is even less: nothing more than an array of light waves hitting a digital image sensor that converts those waves into numbers stored on a memory chip. It isn’t until a human tells the computer what those numbers are that a supervised learning system can begin to use that information in a meaningful way.
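To make that concrete, here is a minimal sketch of supervised learning, assuming a toy world in which an "image" is only four brightness values. The data, the labels and the `train` and `predict` helpers are all invented for illustration; real systems use far larger images and far more capable models, but the principle is the same: the numbers mean nothing until a person names them.

```python
# Hypothetical toy data: each "image" is just a flat list of pixel
# brightness values (0 = dark, 1 = bright). The numbers carry no meaning
# until a human attaches a label to them.
labeled_images = [
    ([0.9, 0.8, 0.9, 0.7], "elephant"),
    ([0.8, 0.9, 0.7, 0.8], "elephant"),
    ([0.1, 0.2, 0.1, 0.3], "mouse"),
    ([0.2, 0.1, 0.3, 0.2], "mouse"),
]


def train(examples):
    """Average the pixel values of every example that shares a label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, pixels):
    """Return the label whose average image is closest to the new one."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(labeled_images)
print(predict(model, [0.85, 0.9, 0.8, 0.75]))  # prints "elephant"
```

Strip the labels from `labeled_images` and `train` has nothing to learn from: the human annotation is what turns the arrays into knowledge.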
So, the woman in Madagascar, the worker in Beijing, the man in India and the autistic adult in the U.S. are effectively encoding human knowledge click by click so that it can be transmitted to rudimentary electronic brains. The brains, made up of massive blocks of computer code, may still be rudimentary, but they can already recognize patterns or identify features – that spot on a lung in an X-ray image, for example – in some cases faster and more accurately than human experts.
AI systems, meanwhile, are being built to manufacture labeled data synthetically, creating virtual cities, for example, to train computer-vision systems for autonomous vehicles, or spinning endless strings of virtual time series to train financial-market prediction models. Synthesizers can spin up endless amounts of data, particularly for so-called corner cases that are rare in real life. In time, there will be many times more synthetic data, which is cheaper and quicker to produce, than so-called ground-truth, hand-labeled data.
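A rough sketch of the idea, assuming a deliberately simple generator for price-like time series: because the program decides what each series is before it draws it, every example arrives already labeled, and a rare corner case such as a sudden crash can be produced as often as needed. The function names, drift values and crash size below are illustrative assumptions, not a description of any real synthesizer.

```python
import random


def make_series(kind, length=30, rng=None):
    """Generate one synthetic series whose label is known by construction."""
    rng = rng or random.Random()
    # Hypothetical drift per step for each kind of series.
    drift = {"uptrend": 0.5, "downtrend": -0.5, "crash": 0.5}[kind]
    value, series = 100.0, []
    for step in range(length):
        value += drift + rng.gauss(0, 0.2)   # trend plus a little noise
        if kind == "crash" and step == length // 2:
            value -= 40.0                    # corner case: a sudden drop
        series.append(round(value, 2))
    return series, kind


def make_dataset(count, seed=0):
    """Build `count` labeled examples without any human annotation."""
    rng = random.Random(seed)
    kinds = ["uptrend", "downtrend", "crash"]
    return [make_series(rng.choice(kinds), rng=rng) for _ in range(count)]


dataset = make_dataset(1000)
print(len(dataset), dataset[0][1])
```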
But hand-labeled data will continue to be the gold standard: knowledge painstakingly transferred from human to machine on training data platforms, software designed to allow people scattered around the world to work on the same data sets. Lakes become seas and seas become oceans.
As algorithms improve, what computers can do with that reservoir of labeled data will expand exponentially. It’s already starting to happen: transfer learning algorithms can apply what they’ve learned from one dataset to another. The harder challenge is building models that can cross modalities, learning from video, audio and text alike.
Labeled data ties modalities together: natural language processing to computer vision, for example. Show a computer-vision model an image and it can give you the correct natural-language label, or show the computer model a word and it can give you a correct corresponding image. Researchers are working on multimodal systems that can fuse meaning between images and text, learning from visual data and applying that learning to language or vice versa.
Supervised learning is constrained to relatively narrow domains defined largely by labeled data.
Humans, of course, learn mostly without labels. Most researchers agree that computers will have to go beyond supervised learning to reach the Holy Grail of human-level intelligence.
There is reinforcement learning, which does not rely on labeled data and is modeled after reward-driven learning in the brain. Set a goal for a reinforcement learning system and it will work toward that goal through trial and error until it is consistently receiving a reward, like a rat pushing a lever to receive a pellet of food.
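The rat and the lever can be sketched in a few lines. What follows is a minimal illustration using a simple epsilon-greedy strategy, one of many possible approaches; the pellet probabilities, the action names and the `run_trials` function are assumptions made up for this example.

```python
import random

# Hypothetical environment: the chance of a food pellet for each action.
PELLET_CHANCE = {"wander": 0.1, "press_lever": 0.8}


def run_trials(num_trials=2000, explore_rate=0.1, seed=0):
    """Learn by trial and error which action tends to earn a reward."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in PELLET_CHANCE}  # estimated reward
    tries = {action: 0 for action in PELLET_CHANCE}

    for _ in range(num_trials):
        if rng.random() < explore_rate:
            action = rng.choice(list(PELLET_CHANCE))   # explore at random
        else:
            action = max(value, key=value.get)         # exploit best guess
        reward = 1.0 if rng.random() < PELLET_CHANCE[action] else 0.0
        tries[action] += 1
        # Nudge the running average toward the reward just observed.
        value[action] += (reward - value[action]) / tries[action]
    return value, tries


value, tries = run_trials()
print(max(value, key=value.get), tries)
```

Nobody tells the program that the lever is the right answer. It only ever sees whether a pellet arrived, and its estimates drift toward the action that pays.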
There is self-supervised learning, which depends on massive amounts of unlabeled data to accumulate enough background knowledge that some sort of common sense can emerge.
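One common flavor of self-supervision is to hide part of the data and ask the model to predict it from the rest, so the data supplies its own labels. The sketch below assumes a tiny invented corpus and a simple count of which word follows which; it stands in for the idea only and is nowhere near the scale at which anything like common sense might emerge.

```python
from collections import Counter, defaultdict

# Hypothetical unlabeled corpus: nobody has annotated anything here.
corpus = (
    "the elephant drinks water . the elephant eats grass . "
    "the rat eats grain . the rat drinks water ."
).split()


def build_model(words):
    """Count which word follows each word; the text is its own label."""
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequently seen follower of `word`, or None."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


model = build_model(corpus)
print(predict_next(model, "drinks"))  # prints "water"
```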
But so far supervised learning works best, and so the data mountains will continue to be worked into labeled data, with training data platforms acting as the ore crushers and sluice boxes and smelters of machine-readable understanding.
The great minds behind the algorithms win awards and are recorded in the history books, but the hard labor of artificial intelligence is provided anonymously by a global army of human labelers: mothers and sons and fathers and sisters, filling ponds and lakes and seas with meaning. If mankind and machines ever reach the fabled singularity, the oceans of knowledge that they have filled are what will lead us first to human-level intelligence.
Manu Sharma is an aerospace engineer who previously worked at computer vision companies DroneDeploy and Planet Labs, where he spent much of his time building in-house infrastructure for deep learning models. He is now co-founder of Labelbox, a training data platform for deep learning systems.
This article is republished from hackernoon.com