MT06 Issues
Incident Report for TelcoSwitch Group
Postmortem

It appears that a customer on MT06 made a large CDR query which took some time to process. When the system didn't respond instantly, they repeatedly made the same query, adding extra load on the database.

This left the database server unable to accept new queries, which in turn affected the operation of the MT06 server.

You may have noticed that the issue was initially resolved quite quickly as the system failed over to a backup server. However, the customer continued to make queries and very quickly overloaded that database server as well.

Once we were able to block the queries, the system became stable again and has remained so.

It is not acceptable for the system to allow a single customer to degrade service for others, and there are existing resource protection mechanisms in place to prevent this. Our engineers are looking at ways to strengthen these mechanisms to avoid a recurrence of this or similar issues, and these changes will be pushed out to all CallSwitch servers.

Posted Aug 09, 2019 - 11:18 BST

Resolved
The issue has been identified and resolved.

Engineers will continue to monitor the service closely.
Posted Aug 08, 2019 - 11:41 BST
Identified
This issue has now been identified as only affecting the MT06 server. Engineers are working to restore service as a matter of urgency.
Posted Aug 08, 2019 - 11:16 BST
Investigating
We have had reports of a small number of inbound numbers being unable to accept calls.

Our engineers are investigating.
Posted Aug 08, 2019 - 10:30 BST
This incident affected: CallSwitch (Hosted Telephony Platform).