The Singapore Government has been actively promoting its Smart Nation and digitalization initiatives. Digitalization brings benefits: as more data is gathered, the government gains more insight into how policies should be structured. Below are examples of how GovTech data scientists have been deployed to solve problems for citizens.
1) Circle Line Rogue Train 2016 using Visual Analytics
Back in 2016, trains were breaking down every few weeks, and the engineers could not identify the cause. In came the data scientists, who proposed a few possible hypotheses based on the data given to them. The engineers narrowed down the investigation using those hypotheses and were finally able to identify a rogue train that was causing the breakdowns.
A detailed blog post on how they approached the data was published online and can be read here.
2) Tributes to Lee Kuan Yew 2015 using Topic Modelling
There were 50,000+ tributes written online by netizens. Let’s try to categorize them and summarize them into a report. The easiest way to categorize the tributes is through a word cloud, which is what we see below.
As with most word clouds of condolences, the most common phrase is ‘rest in peace’, so the words ‘rest’ and ‘peace’ appear most often in the Word Counts to the left. This doesn’t really give much insight into the data.
If we perform a Noun Count instead, the three most frequent nouns are ‘kampong’, ‘airport’ and ‘nationaldayparade’. Lee Kuan Yew is most noted for establishing Changi International Airport, for transforming Singapore from its ‘kampong’ days to HDB flats, and for always being an inspiring figure during National Day.
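To see why plain word counts are dominated by stock phrases, here is a minimal Python sketch. The sample tributes and the stop-word list are made up for illustration and are not taken from the actual dataset.

```python
import re
from collections import Counter

# Hypothetical sample tributes, invented for illustration only.
tributes = [
    "Rest in peace, Mr Lee. Thank you for everything.",
    "Thank you Sir. May you rest in peace.",
    "Rest in peace. Singapore will miss you.",
]

# A tiny hand-picked stop-word list (an assumption); a real analysis
# would use a much fuller one.
STOP_WORDS = {"in", "you", "for", "may", "mr", "will", "the", "a"}

def word_counts(texts):
    """Count lowercase word frequencies across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

counts = word_counts(tributes)
print(counts.most_common(3))
```

Even in this toy sample, ‘rest’ and ‘peace’ come out on top, which says little about what people actually wrote beyond the standard condolence.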
A different approach is to use Topic Modelling. Below are the results.
Most of the tributes are general condolences, followed by mentions of his contributions to Singapore, offers of gratitude, sharing of personal experiences, expressions of emotion and feelings, and reflections on his place in history and the legacy he has left. Looking at the word cloud after Topic Modelling has been applied, we can see which words were expressed more frequently.
Typically, a tribute does not fall into just one category. We tend to express gratitude or share a personal experience together with certain feelings. Below we can see the relationships between the different categories. The size of each circle reflects the frequency of occurrence.
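A chart like this rests on two kinds of counts: how often each category occurs (the circle size) and how often two categories appear in the same tribute (the relationship). Here is a rough Python sketch of that counting. The category labels and the tagged tributes are hypothetical, and this is not necessarily how the original chart was produced.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each tribute is tagged with one or more categories.
tagged_tributes = [
    {"condolence", "gratitude"},
    {"gratitude", "personal experience"},
    {"condolence"},
    {"gratitude", "personal experience", "emotion"},
]

def category_stats(tagged):
    """Return (count per category, count per pair of co-occurring categories)."""
    singles = Counter()
    pairs = Counter()
    for cats in tagged:
        singles.update(cats)
        # sorted() makes ("a", "b") and ("b", "a") count as the same pair.
        pairs.update(combinations(sorted(cats), 2))
    return singles, pairs

singles, pairs = category_stats(tagged_tributes)
print(singles.most_common())
print(pairs.most_common())
```

In this toy data, ‘gratitude’ would get the biggest circle, and its strongest link would be to ‘personal experience’.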
3) HDB Emails Analysis using Clustering
HDB received as many as 100,000 emails on HDB flat sales alone between 2014 and 2015. Excluding public holidays and weekends, that comes out to about 200 emails a day on average. Email volume follows seasonal trends, rising especially before or during a sales launch period. That requires at least 2 to 3 staff replying to emails full time.
HDB decided to do something about it and tried to discover what insights could be gained from the vast number of emails. A machine learning technique, Clustering, was applied to the key collection emails to understand what people were asking. The machine learning algorithm identified 4 main clusters.
With this data, HDB understood that they had staff solely handling customer requests to arrange dates for key collection. So in Jan 2017, they launched an online system for key collection, which lets customers change their dates easily. Here is a brief sharing on how HDB used data to launch their online system for key collection.
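The 200-a-day figure can be checked with some quick arithmetic. The sketch below assumes the period runs from 1 Jan 2014 to 31 Dec 2015 and that about 11 public holidays a year fall on weekdays. Both are my assumptions, not figures from HDB.

```python
from datetime import date, timedelta

def count_weekdays(start, end):
    """Count Monday-to-Friday days from start to end, inclusive."""
    days = (end - start).days + 1
    return sum(
        1 for i in range(days)
        if (start + timedelta(days=i)).weekday() < 5
    )

TOTAL_EMAILS = 100_000    # upper figure quoted in the text
HOLIDAYS_PER_YEAR = 11    # assumption: weekday public holidays per year

weekdays = count_weekdays(date(2014, 1, 1), date(2015, 12, 31))
working_days = weekdays - 2 * HOLIDAYS_PER_YEAR
emails_per_day = TOTAL_EMAILS / working_days
print(weekdays, working_days, round(emails_per_day))
```

Under those assumptions there are roughly 500 working days in the two years, which gives about 200 emails per working day.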
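The source does not say which clustering algorithm was used. As an illustration only, here is a bare-bones k-means on bag-of-words vectors in plain Python. The email snippets, the choice of 2 clusters, and the starting seeds are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical email snippets, invented for illustration only.
emails = [
    "can i change my key collection date",
    "reschedule key collection appointment date",
    "when is the next sales launch",
    "sales launch date for new flats",
]

def vectorize(texts):
    """Turn each text into a word-count vector over a shared vocabulary."""
    bags = [Counter(re.findall(r"[a-z]+", t.lower())) for t in texts]
    vocab = sorted({w for bag in bags for w in bag})
    return [[bag[w] for w in vocab] for bag in bags]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seeds, iterations=10):
    """Plain k-means; seeds are indices of vectors used as starting centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

labels = kmeans(vectorize(emails), seeds=[0, 2])
print(labels)
```

In this toy example the two key collection emails land in one cluster and the two sales launch emails in the other. A real pipeline would likely remove stop words, weight terms (for example with TF-IDF), and use a tested library rather than hand-rolled code.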
Moving forward, I feel data will play a greater role in how policies and new initiatives are formulated. Currently, a lot of time is spent conducting surveys and on-the-ground studies, and by the time the policies or new initiatives are released, they are either not comprehensive or will soon be obsolete. One challenge to realizing this is the availability of data. There are also typically concerns over security and privacy as data becomes more prevalent.