Artificial Intelligence Can Write Wikibooks
January 14, 2019
MIT Technology Review – “Machine Learning—The Complete Guide” is a weighty tome. At more than 6,000 pages, this book is a comprehensive introduction to machine learning, with up-to-date chapters on artificial neural networks, genetic algorithms and machine vision.
But this is no ordinary publication. It is a Wikibook, a textbook that anyone can access or edit, made up from articles on Wikipedia, the vast online encyclopedia.
Crowdsourced information is constantly updated with the latest advances and continually edited to correct errors and ambiguities. But because Wikipedia is so vast, it can be hard to decide what to include and what to leave out, and editing the content can take a lot of time.
That’s why Shahar Admati, of BGU’s Department of Software and Information Systems Engineering, and her colleagues at BGU have developed a way to automatically generate Wikibooks using machine learning. They call their machine the Wikibook-bot. “The novelty of our technique is that it is aimed at generating an entire Wikibook without human involvement,” says Admati.
The approach is relatively straightforward. The researchers began by identifying a set of existing Wikibooks that can act as a training data set.
Since these Wikibooks form a kind of gold standard both for training and testing, the team needed a way to ensure their quality. “We chose to concentrate on Wikibooks that were viewed at least 1,000 times, based on the assumption that popular Wikibooks are of a reasonable quality,” Admati says.
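The popularity filter Admati describes can be sketched in a few lines of Python. This is only an illustration: the `Wikibook` record, its `page_views` field and the sample books are hypothetical, and only the 1,000-view threshold comes from the team's description.

```python
from dataclasses import dataclass


@dataclass
class Wikibook:
    """Hypothetical record for one existing Wikibook."""
    title: str
    page_views: int


# Threshold quoted by the researchers; everything else here is illustrative.
MIN_VIEWS = 1000


def select_training_books(books, min_views=MIN_VIEWS):
    """Keep only the books viewed at least `min_views` times."""
    return [book for book in books if book.page_views >= min_views]


# Made-up sample data, not real Wikibooks or real view counts.
books = [
    Wikibook("Book A", 250),
    Wikibook("Book B", 1000),
    Wikibook("Book C", 4821),
]

training = select_training_books(books)
print([book.title for book in training])  # ['Book B', 'Book C']
```

The comparison is inclusive, matching the phrase "at least 1,000 times."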
The team then divided the task of creating a Wikibook into several parts, each of which requires a different machine-learning skill. The task begins with a title generated by a human, describing a concept of some kind, such as “Machine Learning—The Complete Guide.”
The first task is to sort through the entire set of Wikipedia articles to determine which are relevant enough to include. “This task is challenging due to the sheer volume of articles that exist in Wikipedia and the need to select the most relevant articles among millions of available articles,” says Admati.
The team created an algorithm that looked at each article and automatically determined whether including it would make the Wikibook's network structure more similar to that of human-generated books. If not, the article was left out.
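To make the idea concrete, here is a rough Python sketch of such an include-or-skip decision. It is not the team's algorithm. It assumes, purely for illustration, that "network structure" is summarized by a single number, the link density of the book (the fraction of article pairs connected by a Wikipedia link), and that human-generated books have a known target density. The function names, the target value and the sample links are all hypothetical.

```python
from itertools import combinations


def link_density(articles, links):
    """Fraction of article pairs in the book that are joined by a link."""
    n = len(articles)
    if n < 2:
        return 0.0
    possible_pairs = n * (n - 1) / 2
    linked_pairs = sum(
        1
        for a, b in combinations(sorted(articles), 2)
        if frozenset((a, b)) in links
    )
    return linked_pairs / possible_pairs


def build_book(seed, candidates, links, target_density):
    """Greedily add each candidate only if it moves the book's link
    density closer to the (assumed) density of human-generated books."""
    book = set(seed)
    for article in candidates:
        current_gap = abs(link_density(book, links) - target_density)
        new_gap = abs(link_density(book | {article}, links) - target_density)
        if new_gap < current_gap:
            book.add(article)  # structure became more "human-like"
        # otherwise the article is left out
    return book


# Made-up link data between articles (treated as undirected).
links = {
    frozenset(("Machine learning", "Neural network")),
    frozenset(("Machine learning", "Genetic algorithm")),
    frozenset(("Machine learning", "Machine vision")),
    frozenset(("Neural network", "Machine vision")),
    frozenset(("Genetic algorithm", "Machine vision")),
}

book = build_book(
    seed={"Machine learning", "Neural network"},
    candidates=["Genetic algorithm", "Banana", "Machine vision"],
    links=links,
    target_density=0.8,  # hypothetical value for human-generated books
)
print(sorted(book))
# ['Genetic algorithm', 'Machine learning', 'Machine vision', 'Neural network']
```

In this toy run the unrelated "Banana" article is rejected because adding it would drag the book's link density far below the target. A real system would presumably compare several structural features at once, and would need to prefilter candidates rather than scan millions of articles one by one.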
Once the articles were selected, the next step was to group them into chapters. The final step was to determine the order in which the articles should appear in each chapter.
What’s next? Admati and her team plan to produce a range of Wikibooks on subjects not yet covered by human-generated books. They will then monitor the page views and edits to these books to see how popular they become and how heavily they are edited, compared with human-generated books. “This will be a real-world test for our approach,” says Admati.