Chinese search engine giant Baidu has launched what it says is the world’s largest Chinese natural language processing (NLP) database, along with several other artificial intelligence (AI) products, as it seeks to diversify its revenue sources.

NLP is a branch of AI concerned with enabling computers to understand the way humans naturally speak and type online, turning such information into structured data for further analysis.

The project, called Qian Yan (“thousand words” in Chinese), is a collaboration with industry group China Computer Federation. Baidu said in a press release on Tuesday that it is meant to help the industry cope with a lack of computing capacity and linguistic data, both of which are barriers to the development of NLP technology.

Data scientists from 11 local universities and enterprises contributed to Qian Yan’s first phase, which includes 20 open-source Chinese data sets covering seven major machine learning tasks, such as reading comprehension and the open-domain dialogue used in chatbots, the press release said.

“In the future, we hope that more data scientists can participate in Qian Yan, jointly promote the progress of Chinese information processing technology, and build a worldwide Chinese information processing influence,” said Wu Hua, chairman of the Baidu Technical Committee.

She added that the company aims to build at least 100 Chinese NLP data sets covering more than 20 tasks within the next three years.

The ambitious project comes as Baidu seeks to diversify its revenue mix, amid a shift in internet usage patterns that has chipped away at its dominance in the search engine industry.

The company reported a 5 per cent year-on-year drop in online marketing revenue in 2019, to 78.1 billion yuan (S$15 billion), as it faced rising competition from self-contained super-app ecosystems such as Tencent Holdings’ WeChat, as well as short-video platforms such as Tencent-backed Kuaishou and ByteDance’s Douyin.

Baidu CEO Robin Li said earlier this month that the company’s new AI-related businesses, including cloud, smart devices and smart transport, will contribute to revenue growth in the coming years.

The company made a breakthrough in the NLP field in December last year, when its language model, ERNIE (Enhanced Representation through kNowledge IntEgration), beat models from Microsoft and Google in an ongoing natural language processing competition.

On Tuesday, Baidu also launched several other NLP-related products including TextMind, an intelligent document analysis platform that uses ERNIE to help users analyse and compare documents.

“We have been continually committed to integrating natural language processing technology into platforms and products, which will generate a lot of value in applications,” said Wu Tian, Baidu’s vice-president.

Other tech giants are also venturing into the field, however. Alibaba Group Holding – the parent company of the Post – has been developing NLP systems for several years, and said in June last year that its model beat humans in a reading comprehension test developed by Microsoft.

Tencent also has a range of AI-powered models, including one that can read text from images of a wide range of documents such as identification cards and driving licences.

