Worldwide IT outage was digital red flag

The recent IT outage should serve as a wake-up call to organisations to ensure they are better prepared to deal with the unexpected

A passenger scours a screen displaying a flight schedule at Don Mueang International Airport Terminal 1 in Bangkok on July 19. (Reuters photo)

The worldwide IT outage on July 19 caused by a faulty software update issued by cybersecurity firm CrowdStrike Holdings has served as a wake-up call to local organisations that it's time to ensure their systems are better prepared to deal with the unexpected.

According to Bloomberg, CrowdStrike said a bug in a quality-assurance tool the company uses to check updates for errors allowed flawed data to reach customers, causing the major IT outage.

The company pushed through an update for Windows devices on July 19 via a rapid-response mechanism, meant to respond quickly to changing threats. That update contained a critical flaw. CrowdStrike's "content validator", which is supposed to test updates for errors before they go out, malfunctioned and let the bug through, the company said in an incident report published on July 24.

That undetected error crashed Windows systems and sparked one of the most spectacular rolling IT failures ever.

Microsoft and CrowdStrike have already rolled out fixes and many systems have been restored. CrowdStrike said it is working to improve Rapid Response Content testing in the future.

Some business operations in Thailand were also hit, including airlines operating out of major airports such as Suvarnabhumi and Don Mueang.

PREPARE FOR EMERGENCIES

The Stock Exchange of Thailand (SET) revealed that the bourse was not affected by the IT outage because the SET does not use CrowdStrike as a securities software vendor.

Senior executive vice-president Thirapun Sanpakit, also head of the bourse's information technology division, said the SET regularly tests its IT systems and is always preparing for such incidents.

After the outage, the SET asked one of its securities system vendors how the global outage happened so that it could outline measures in case it strikes again.

The SET also has a backup site to deal with emergencies. The system is tested at least once a year.

Most brokerage firms use a Linux-based (open-source) front-end system with relatively strong security. However, some brokers that use CrowdStrike were affected by the incident, but managed to solve the problem quickly without incurring any damage.

"The CrowdStrike system problem arose from the 'auto update' setting. Therefore, we have warned members to change the auto update system to a manual update system to prevent the system from going down if such an occurrence happens again," said Mr Thirapun.

As for affected listed companies, they should learn from the experience and the entire industry needs to better prepare to deal with such problems in the future, he added.

AirAsia passengers queue at counters inside Don Mueang International Airport Terminal 1 amid system outages disrupting the airline's operations, in Bangkok on July 19, 2024. (Reuters photo)

BOLSTERING CYBERSECURITY

Jomkwan Kongsakul, deputy secretary-general of the Securities and Exchange Commission (SEC), said the regulator along with capital market business operators and stakeholders are preparing to establish a Thailand Computer Emergency Response Team (TCM-CERT) to enhance cybersecurity for the capital market.

While the SEC was not affected by the CrowdStrike incident, some business operators that use CrowdStrike products were affected. However, all the problems were fixed and they managed to resume normal service later the same day, she added.

Since the incident occurred, the SEC has been closely monitoring the situation and has provided basic advice and assistance to related businesses to help minimise the impact.

"The SEC is preparing to deal with such incidents in the future by collaborating with stakeholders in cybersecurity companies as well as capital market business operators, regulators, the financial industry and the Thai Computer Emergency Response Team."

She added that the SEC places importance on the cyber resilience of all organisations involved.

Following the outage, the banking sector's Computer Emergency Response Team (CERT) sent guidelines to members of the Thai Bankers' Association on how to prevent possible threats to the IT supply chain. It suggested the banks prepare contingency plans, based on scenarios of massive IT outages.

CONFIDENCE MAINTAINED

Thai AirAsia was among the airlines hit by the global IT outage at Don Mueang airport. The event disrupted its online system, including check-in and reservation services, prompting the airline to deploy a manual service to deal with passengers.

Tansita Akrarittipirom, head of commercial for Thai AirAsia, said while the two-day incident caused delays to over 200 flights and affected 40,000 passengers, the airline has been operating all flights without leaving any passengers stranded.

AirAsia's system went down on July 19 from 12.30pm until 10pm, and on July 20 from 6.30-11am.

Airports of Thailand also deployed staff at the airport to assist passengers on those days.

She said the incident would not affect traveller sentiment and passengers remain confident about flying with airlines.

In the past, the airline experienced system outages or software updates, but only for short periods, which meant it could notify passengers in advance and allow them to prepare. However, the recent outage was the longest disruption of this kind, lasting almost 24 hours.

"Airlines still require IT experts to help manage their business since it is a mode of public transportation that serves a massive number of people, unlike small scale operations like hotels," said Ms Tansita.

She said the airline is holding discussions on how to upgrade its crisis preparations after the incident so that it can better manage large numbers of passengers.

MORE RESILIENT DIGITAL FUTURE

Matthew Hardman, chief technology officer (Asia-Pacific) at Hitachi Vantara, said while the cause of the IT outage was not a cyber-attack, this was a stark reminder of the vulnerability of the interconnected digital world.

It also shows organisations are not necessarily ready for an outage in relation to their core business processes. This highlights the importance of data protection and cyber resiliency to ensure a quick recovery and business continuity.

"Disruptions can erode public trust, and organisations must prioritise a swift return to normalcy. But this event shouldn't just be a recovery effort -- it's a catalyst for positive change. Now is the time to ensure your systems are prepared for the unexpected," he said.

Associate Professor Siriyupa Roongrerngsuke, executive advisor in the office of the CEO at Bumrungrad International Hospital, said organisations should never rely only on a single cybersecurity vendor.

Evidently, organisations worldwide either lack contingency plans or are not well prepared to implement them when a single point of failure in software has cascading effects that disrupt entire business ecosystems, Ms Siriyupa said.

Organisations' chief information officers should carefully reconsider which security products they use and see if they need to diversify across different security products to prevent and minimise future outages.

According to a blog post of IT research firm IDC, the recent IT outage highlights the contrasting trust and attestation mechanisms taken by operating system vendors such as Microsoft, Apple and Red Hat in allowing the ecosystem of independent software vendors direct access to certain parts of the operating system stack, especially software that can potentially severely impact the system kernel.

While this issue impacted Windows devices -- both network and human centric -- managed by CrowdStrike, no iOS, macOS or even Linux devices were affected. That is very telling and should compel vendors like Microsoft and Apple to take a long hard look at what "openness" means in the wake of regulations like the EU's Digital Markets Act.

It should also compel the largely Windows-dependent customer base to redefine their long-term cyber recovery strategy.

DON'T SOLELY TRUST AUTO UPDATES

Axel Winter, chief executive of Xponential, a joint venture between Pivot Digital and Siam Piwat and former chief technology officer of Siam Piwat, said for non-critical systems, organisations should first test any change and not just use auto updates.

Small changes can have a significant impact. In effect, even cloud providers offer legacy support and make ample announcements in the case of high-impact changes.

Having a plan to recover from disasters -- such as a backup system -- is important, but it won't always prevent problems caused by automatic updates. This is because backup systems often copy the same software and settings as the original system, so they might have the same issues.

BCP DRILLS REQUIRED

Morragot Kulatumyotin, managing director of Internet Thailand (Inet), said customers who host servers with Inet and use CrowdStrike and Windows were impacted. They represent around 1% of all customers and their problems were fixed within four hours.

"We can fix problems quickly as soon as we detect them early. Our risk management framework prioritises early detection and rapid resolution to best assist our customers," Mrs Morragot said.

Moreover, organisations need to conduct a drill of their business continuity plan (BCP) once or twice a year, she added.

Many companies, with IT-related work as part of their daily operations, have no BCP to keep their businesses going, raising concerns over work interruptions if a global computing outage recurs, said Suphan Mongkolsuthree, chairman of Synnex Thailand, a local distributor of IT products and an IT and cloud service provider.

"I would say very few SMEs [small and medium-sized enterprises] have BCPs," said Mr Suphan.

SMEs are among the companies most prone to the severe impact of an outage. Currently many of them have liquidity problems and are struggling to gain access to financial sources, according to the Federation of Thai Industries.

Mr Suphan noted that only large companies have sufficient budgets to invest in preventive measures for their IT systems because such investments are usually costly.

BCPs are a crucial part of any company's risk management, he said.

BACKUP PLANS IN PLACE

A spokesperson for Nestlé Thailand said the company was not affected by the global technology outage, as the company did not use any services provided by CrowdStrike.

However, as cybersecurity is the main risk and challenge for every company across the globe, the company always has contingency and backup plans in place, as well as teams working to prevent unexpected cybersecurity risks, the spokesperson said.

Thienprasit Chaiyapatranun, president of the Thai Hotels Association, said it had not received any reports that its members had been affected by the technical outage.

"To prevent operating systems from crashing, hotel operators can operate in an analogue way by printing guest information beforehand in case any disruptions in vital computer systems occur," Mr Thienprasit said.

With regard to cybersecurity risks, he said most of the hotels' guests come via online travel agency platforms, which are global operators equipped with resilient cybersecurity systems.

Molpasorn Shoowong, Somruedi Banchongduang, Lamonphet Apisitniran and Kuakul Mornkum
