Sciling helps reveal problems in the collaborative economy via Natural Language Processing
Asher and Lyric Fergusson are two bloggers and researchers specialising in travel who love to journey around the world with their two small children and share useful content for other travellers such as travel guides, advice and research. Their articles and studies have been published in prestigious media such as The New York Times, The Guardian, CNN, Forbes, National Geographic, Lonely Planet and many more.
In September 2017, they experienced two consecutive nightmares with Airbnb that left them on the streets of Paris with a 10-month-old baby; scared, vulnerable and with nowhere to go. This led them to wonder if Airbnb was safe, so when they got home they decided to analyse over 1,000 reviews from guests to discover the level of safety in booking accommodation on the platform and what the most common problems were.
The results of their study attracted so much media attention that ASIS International and the research department at the John Jay College of Criminal Justice at the City University of New York offered to back and fund a more comprehensive study into the possible hidden safety threats behind two of the most controversial success stories in the sharing economy: Airbnb and Uber. This deeper study into the hidden problems of the two companies would mean analysing the tweets sent to their customer service accounts. But Asher knew that classifying millions of conversations by hand would be a titanic task.
About the client
Asher and Lyric Fergusson’s blog, in their own words, “helps hundreds of thousands of people every month to stay safe, healthy and happy at home and while they travel”. The John Jay College research department and ASIS International are committed to making progress on the most significant problems facing the world today in areas such as public safety and social justice.
Social networks are increasingly becoming an environment for expressing feelings, opinions and complaints, and for denouncing abusive behaviour. That is the case with Twitter, where users have realised that writing a tweet mentioning a brand and describing their problem is a direct way to urge the company to get in touch with them, while also warning other users.
With this in mind, Asher came to the conclusion that a good way of discovering the most common problems to which Airbnb and Uber users are exposed would be to analyse the tweets in these companies’ customer service accounts. Given the volume of data he would have to process, he realised that the only option was to automate this task using artificial intelligence, and more specifically, Natural Language Processing, a specialisation which is at the core of Sciling.
Twitter is one of the most popular social networks in the world. That makes it one of the biggest sources of information available on the Internet. Nevertheless, there are certain peculiarities in this social network that make it difficult to extract knowledge. “In addition to the high volume of information, the short length of the tweets also poses a challenge. Since micro-texts don’t provide enough occurrences of words, traditional classification methods are of limited use for them. Without using more ingenious techniques, the results obtained won’t reach the level of precision required,” explains Vicent Alabau, Director of Operations at Sciling.
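As an illustration of the kind of “more ingenious technique” the quote alludes to, one common trick for short texts is to work with character n-grams rather than whole words, so that even a brief tweet shares many overlapping features with the training data. The sketch below is a hypothetical, minimal example with invented categories and data; it is not Sciling’s actual system.

```python
# Hypothetical sketch: character trigrams ease the sparsity problem that
# whole-word features suffer from on short texts such as tweets.
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character trigrams, padded to capture word boundaries."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two bags of n-grams."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Invented toy data; the real project used labelled customer-service tweets.
training = {
    "cancellation": "host cancelled our booking last minute cancellation no warning",
    "overcharge": "driver overcharged me charged twice for the ride no refund",
}
centroids = {label: char_ngrams(text) for label, text in training.items()}

def classify(tweet):
    """Assign the category whose n-gram profile is most similar."""
    return max(centroids, key=lambda lbl: cosine(char_ngrams(tweet), centroids[lbl]))

print(classify("they cancelled the reservation the night before"))
```

Because trigrams of “cancelled” overlap heavily with those of “cancellation”, the unseen tweet is matched to the right category even though the two texts share few whole words.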
Besides ensuring a higher success rate, the implementation process proposed by Sciling made it possible to categorise the more than two million tweets accumulated in the Airbnb and Uber customer service accounts with minimal time and effort.
After four iterations of the process, a classifier was obtained that enables the Airbnb and Uber users’ most recurring problems to be identified. Not only that, but with the resulting information it is also possible to carry out studies on the appearance of certain words or to analyse the evolution of certain problems over time.
But the possibilities of these types of algorithms go far beyond obtaining information relevant to the sphere of safety and security. The researchers from the John Jay College point out the following in their study: “Examining data from the social networks could be a proactive way of avoiding a loss of reputation. A loss of reputation is difficult to quantify, but it is a paramount concern for any organisation. The organisation runs the risk of losing not only the client who had a bad experience, but everybody who read the tweet.”
“The technology developed is useful for any client-centric business. Not only does it allow the problems that users have with a particular service to be discerned; it also makes it possible for any company to develop a customer-centric strategy to analyse their clients’ feelings, anticipate their problems, discover trends, learn their level of satisfaction and many other applications,” explains Antonio Salas, Director of Marketing at Sciling.
“The research profile of Sciling’s staff impressed me and gave me confidence that they would figure out how to make this project a big success no matter what obstacles we might come across.”
Asher Fergusson, Asher & Lyric
“With the help of artificial intelligence, large amounts of data from the social networks can be analysed efficiently and effectively.”
Chelsea A. Binns, Assistant Professor, John Jay College of Criminal Justice
The implementation process
In order to achieve the highest precision possible despite the limitations posed by the short texts typical of micro-blogging, a working method was set up based on a cyclic process that fine-tuned the algorithm with every iteration.
The process begins with a discovery phase, in which around 100 tweets are analysed manually in order to create an initial set of categories. Next, using the categories identified in that phase, about 1,500 tweets are labelled by hand. These make up the dataset from which the algorithm learns in the next stage, known as the training stage. Once the algorithm has been trained, it is applied to the entire collection of tweets in order to classify it completely.
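The label-then-train-then-classify flow can be sketched as follows. The tiny Naive Bayes classifier, the categories and the example tweets are all invented for illustration; they stand in for the real system trained on roughly 1,500 manually labelled tweets.

```python
# Minimal sketch of "label a seed set, train, then classify the rest",
# using a pure-Python multinomial Naive Bayes. Data is invented.
import math
from collections import Counter, defaultdict

def tokens(text):
    return text.lower().split()

def train(labelled):
    """labelled: list of (tweet, category) pairs from the manual phase."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    cat_counts = Counter()
    for text, cat in labelled:
        cat_counts[cat] += 1
        word_counts[cat].update(tokens(text))
    return word_counts, cat_counts

def classify(model, text):
    """Pick the category with the highest smoothed log-probability."""
    word_counts, cat_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    def log_prob(cat):
        total = sum(word_counts[cat].values())
        score = math.log(cat_counts[cat] / sum(cat_counts.values()))
        for w in tokens(text):
            # Laplace smoothing so unseen words don't zero out the score
            score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        return score
    return max(cat_counts, key=log_prob)

# Toy stand-in for the ~1,500 hand-labelled tweets.
seed = [
    ("host cancelled our booking last minute", "cancellation"),
    ("reservation cancelled with no warning", "cancellation"),
    ("driver charged me twice for one ride", "overcharge"),
    ("no refund after being charged double", "overcharge"),
]
model = train(seed)
print(classify(model, "the host cancelled on us the night before"))
```

Once trained on the seed set, the same `classify` call is simply applied to every remaining tweet in the collection.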
The process could have ended there. However, Sciling introduced a couple of additional steps that are carried out cyclically until the desired precision is achieved. Once all of the tweets have been classified, the categories for which the algorithm was least successful (those with less data and more noise) are identified. The tweets in these categories are then validated by hand, indicating only whether the assigned category is correct or not. This step, which is far less costly in time and resources than the manual labelling stage, allows the system to be retrained and to obtain better results in the next iteration.
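One turn of this validate-and-retrain cycle might look like the sketch below. The toy classifier, the scoring heuristic for spotting the weakest category, and the simulated human validation (a ground-truth dictionary standing in for a person answering yes/no) are all assumptions for illustration, not Sciling’s implementation.

```python
# Hedged, self-contained sketch of one iteration of the validation cycle.
from collections import Counter, defaultdict

def train_keywords(labelled):
    """Toy classifier: per-category word counts."""
    model = defaultdict(Counter)
    for text, cat in labelled:
        model[cat].update(text.lower().split())
    return model

def predict_with_score(model, text):
    """Return (best category, word-overlap score) for a tweet."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def one_iteration(labelled, unlabelled, truth):
    """Train, classify everything, hand-validate the weakest category,
    and fold the confirmed labels back into the training data."""
    model = train_keywords(labelled)
    preds = [(t, *predict_with_score(model, t)) for t in unlabelled]
    # "Least successful" category: lowest average score among its predictions.
    by_cat = defaultdict(list)
    for text, cat, score in preds:
        by_cat[cat].append((text, score))
    weakest = min(by_cat, key=lambda c: sum(s for _, s in by_cat[c]) / len(by_cat[c]))
    # Binary human validation, simulated here by a ground-truth dict.
    confirmed = [(t, weakest) for t, _ in by_cat[weakest] if truth[t] == weakest]
    return labelled + confirmed

seed = [
    ("host cancelled booking", "cancellation"),
    ("charged twice for ride", "overcharge"),
]
unlabelled = ["cancelled my reservation suddenly", "driver charged extra fees"]
truth = {
    "cancelled my reservation suddenly": "cancellation",
    "driver charged extra fees": "overcharge",
}
augmented = one_iteration(seed, unlabelled, truth)
```

Because validation only requires a yes/no answer per tweet, each cycle grows the training set at a fraction of the cost of labelling from scratch, which is why the full process converged after four iterations.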
- Natural language processing technologies are part of our core work.
- Our team of researchers is accustomed to applying the most groundbreaking technologies to solve any operational challenge.
- We have over 15 years’ experience in carrying out projects involving research into Machine Learning and its applications.