Posts in tag: Machine Translation


CCMatrix is the largest data set of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pulled from snapshots of the CommonCrawl public data set, CCMatrix is more than 50 times larger than the WikiMatrix corpus that we shared last year. Gathering a data set of this size …

Years ago, on a flight from Amsterdam to Boston, two American nuns seated to my right listened to a voluble young Dutchman who was out to discover the United States. He asked the nuns where they were from. Alas, Framingham, Massachusetts, was not on his itinerary, but, he noted, he had ‘shitloads of time and …

PyCon 2019 | Applied Deep Learning For NLP Using PyTorch. Speaker: Elvis Saravia. Natural language processing (NLP) has experienced rapid growth over the last few years and has become an important skill for building applications that range from social features to clinical and health solutions. In this tutorial, we will introduce PyTorch as …