M1 said that an unexpected congestion problem spread to different layers of its network, interrupting the process of confirming that people making calls on it were indeed M1 customers. This resulted in many customers being unable to make calls for more than five hours.
Here is the full statement from M1:
SINGAPORE - M1 takes a serious view of the network incident that lasted 5 hours and 15 minutes on Tuesday, 4 February, which affected customers making voice calls on our network. SMS and mobile data services were also affected, intermittently, during the incident.
A full investigation to determine the root cause of the incident is ongoing. We would like to share our preliminary findings and the immediate measures we are taking to further enhance our network.
Our preliminary findings suggest that an unexpected call processing software issue had prevented our customers from registering on the mobile network.
This issue originated from a pair of mobile site switches which performed in an unstable manner, causing intermittent connection problems between our switches and customer databases. The cause of this is still being investigated by our vendor.
To make voice calls, customers have to be authenticated by our network. This is the beginning of the call processing flow. However, the intermittent connection interrupted the authentication process. These unsuccessful authentication attempts tied up network resources, leading to congestion at the customer databases. The congestion cascaded to another network layer, tying up additional network resources and preventing customers from making or receiving calls.
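The cascade described above can be illustrated with a toy queueing model. This is only a sketch under stated assumptions, not a model of M1's actual network: the function `simulate`, the arrival rate, the database capacity, and the failure rate are all hypothetical numbers chosen for illustration. It shows how failed authentication attempts that keep retrying can build a backlog, and why that backlog can persist for a while after the unstable link is fixed.

```python
import random


def simulate(failure_rate, unstable_ticks, total_ticks=60,
             arrivals=80, capacity=100, seed=0):
    """Toy model of authentication requests hitting a customer database.

    All parameters are illustrative assumptions:
      failure_rate   - chance that an attempt fails while the link is unstable
      unstable_ticks - number of ticks before the link is stabilised
      arrivals       - new authentication requests arriving per tick
      capacity       - attempts the database can process per tick
    Returns the backlog of unauthenticated requests after each tick.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    backlog = 0
    history = []
    for tick in range(total_ticks):
        backlog += arrivals
        # The database can only work on `capacity` attempts per tick.
        attempts = min(backlog, capacity)
        # Attempts fail only while the link is still unstable.
        rate = failure_rate if tick < unstable_ticks else 0.0
        succeeded = sum(1 for _ in range(attempts) if rng.random() >= rate)
        # Failed attempts stay in the backlog and are retried next tick,
        # consuming capacity that new requests would otherwise use.
        backlog -= succeeded
        history.append(backlog)
    return history


if __name__ == "__main__":
    stable = simulate(failure_rate=0.0, unstable_ticks=0)
    unstable = simulate(failure_rate=0.5, unstable_ticks=20)
    print("healthy link, peak backlog:", max(stable))
    print("unstable link, backlog when link is fixed:", unstable[19])
    print("unstable link, backlog six ticks later:", unstable[25])
```

In this sketch the healthy link never builds a backlog, because capacity exceeds arrivals. With an unstable link, roughly half the attempts are wasted, the backlog grows every tick, and it drains only gradually once the link recovers, since the spare capacity per tick is small.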
M1 has invested significantly and remains committed to building a highly advanced and resilient network to deliver better customer experience. To this end, our network now utilises the Advanced Telecommunications Computing Architecture, which enables dynamic allocation of network resources. This has, however, increased the complexity of the network architecture and, accordingly, of the troubleshooting process.
Although we quickly identified and stabilised the unstable switches, the congestion had already spread across our network. We therefore had to clear the congestion across the network layers, and services were restored once it was cleared.
During the incident, we updated our customers through relevant channels such as our Facebook page, our corporate website, and through our call centre hotline. We also kept the media updated on the situation, including the results of our preliminary investigations that same day.
"We are continually upgrading our network to deliver better customer experience and improve network resiliency. With a more sophisticated network, it has inevitably increased the level of complexity in troubleshooting. I sincerely apologise for the inconvenience caused to our customers from this incident," said Ms Karen Kooi, Chief Executive Officer, M1 Limited.
Ms Kooi added: "While the root cause for the mobile site switch instability is being investigated, we are reconfiguring our site switches to eliminate congestion that may arise due to unstable intermittent connections. We will also be deploying a software enhancement that will enable us to better manage sudden and unexpected extreme traffic conditions. In addition, an independent expert will be appointed to review our network architecture and connectivity to further enhance network performance."