Postmortem: Downtime, 23 March 2023


On Thursday 23 March 2023, our trading engine was suspended from 12:55 UTC to 13:20 UTC, and the exchange operated in cancel-only mode during this time.

This suspension was triggered by our audit process detecting a possible inconsistency in the exchange state and halting the exchange as a precautionary measure. 

At BitMEX, our positions and margins are checked multiple times a minute, and balances are cross-checked against on-chain records every 10 minutes. Any bug, flaw or intrusion that causes positions not to match immediately halts our exchange.
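To make the idea of a fail-safe audit concrete, here is a minimal sketch in Python. It is not BitMEX's audit process: the `Exchange` class, the `audit` function and the figures are hypothetical, and the check is reduced to comparing a ledger total against an expected total.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Toy exchange state. All names here are hypothetical."""
    balances: dict          # account name -> balance in the internal ledger
    halted: bool = False


def audit(exchange: Exchange, expected_total: int) -> bool:
    """Halt the exchange if the ledger total does not match the expected total.

    Returns True when the books balance, False when the exchange was halted.
    """
    ledger_total = sum(exchange.balances.values())
    if ledger_total != expected_total:
        exchange.halted = True  # fail safe: stop first, investigate afterwards
        return False
    return True


# Integer units avoid floating-point rounding in the comparison.
ok_exchange = Exchange(balances={"alice": 70, "bob": 30})
bad_exchange = Exchange(balances={"alice": 70, "bob": 29})
```

In this sketch, `audit(ok_exchange, 100)` passes and leaves the exchange running, while `audit(bad_exchange, 100)` sets `halted`. The point is that a mismatch stops the exchange before anyone has worked out why it happened.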

We program our exchange to halt, as protecting the assets of our users is central to our operations. We don’t compromise security for convenience, which is why we’ve lost zero cryptocurrency. And as communicated during the downtime via our Status page and social media channels, all of our users’ funds remained safe at all times. 

Following a comprehensive review of the downtime, we are providing you, our users, with a more detailed summary of the incident as set out below. Should you have any additional questions, as always you can reach us by contacting Support.

What happened on 23 March?

As shared above, this suspension was triggered by our audit process detecting a possible inconsistency in the exchange state and halting the exchange as a precautionary measure.

A race condition surrounding internal transfers caused a momentary state inconsistency. The audit process detected this and took action – as intended. 
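For readers unfamiliar with this class of bug, the sketch below shows in general terms how a transfer performed in two steps can look inconsistent to a check that runs between them. It is a generic illustration, not a description of the actual defect: the account names, amounts and function names are hypothetical.

```python
import threading

balances = {"alice": 100, "bob": 0}
lock = threading.Lock()


def total() -> int:
    return sum(balances.values())


def transfer_in_two_steps(src: str, dst: str, amount: int):
    """Debit, pause, then credit, so the caller can observe the gap."""
    balances[src] -= amount
    yield  # a check scheduled here sees funds that appear to be missing
    balances[dst] += amount
    yield


steps = transfer_in_two_steps("alice", "bob", 40)
next(steps)                    # debit applied, credit still pending
mid_transfer_total = total()   # momentarily inconsistent
next(steps)                    # credit applied
final_total = total()          # consistent again


def transfer_atomic(src: str, dst: str, amount: int) -> None:
    """Apply both legs under one lock so no reader sees a half-done transfer."""
    with lock:
        balances[src] -= amount
        balances[dst] += amount


def audited_total() -> int:
    """Read the total under the same lock that transfers use."""
    with lock:
        return total()
```

Here the mid-transfer total is 60 rather than 100, even though nothing is lost once the second step completes. Taking a shared lock for both the transfer and the check is one common way to remove the window; other designs are possible.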

Our engineering team were able to assess the situation, identify the root cause and deem it safe for trading to continue. 

What was the user impact?

The exchange operated in cancel-only mode while trading and matching were suspended for 25 minutes. There was some market disruption when the exchange resumed trading.

Currently, our support team is responding to those customers who were impacted. No customers were liquidated as a result of the market disruption. 

What is being done to ensure this does not happen again?

As a 24/7 exchange, we strive to offer perfect uptime and uninterrupted services; however, this cannot be guaranteed. We don’t make compromises when it comes to the safety of our users’ funds. You can read more about how we keep funds safe here.

A working group has been formed to explore how we at BitMEX can better handle the resumption of trading in future. Additionally, we are reviewing the behaviour of the audit process to see if full suspension can be avoided under certain strict circumstances.

How can I protect myself against the risks associated with downtime events?

We provide a number of sophisticated tools and settings through which it is possible for traders to control the risk posed to them by downtime events.

For traders who use our API, we offer a “Dead Man’s Switch” feature, which enables traders to set a timeout for cancellation of orders in case they are unable to reach the exchange. For implementation details and an example of setting a timeout, please see the documentation for this feature here.
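The mechanism behind a dead man's switch can be sketched locally in a few lines of Python. This models the idea only and does not call the BitMEX API: the class, the function and the order IDs are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class DeadMansSwitch:
    """Timer that must be refreshed regularly, or it is considered expired."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.deadline = None  # no timer armed until the first heartbeat

    def heartbeat(self, now: float) -> None:
        """Client is still connected: push the deadline forward."""
        self.deadline = now + self.timeout_s

    def expired(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline


def tick(switch: DeadMansSwitch, open_orders: list, now: float) -> list:
    """Cancel every open order once the switch has expired.

    Returns the orders that were cancelled on this tick.
    """
    if switch.expired(now):
        cancelled = list(open_orders)
        open_orders.clear()
        return cancelled
    return []


switch = DeadMansSwitch(timeout_s=60)
orders = ["order-1", "order-2"]

switch.heartbeat(now=0)
at_30 = tick(switch, orders, now=30)   # within the timeout: nothing happens
switch.heartbeat(now=30)               # refreshed: deadline moves to 90
at_80 = tick(switch, orders, now=80)   # still within the timeout
at_95 = tick(switch, orders, now=95)   # no heartbeat since 30: cancel all
```

As long as the client keeps sending heartbeats, orders stay open. Once the heartbeats stop, for example because the trader can no longer reach the exchange, the orders are cancelled when the timeout elapses.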

For enhanced protection, traders can set the trigger price of a stop order to reference Last Price, Mark Price or Index Price. When the exchange is reopened after a downtime event, dramatic swings in the Last Price can take place. To avoid stop orders being triggered by any dramatic swings in Last Price, traders can set their stops to trigger based on the Mark Price or Index Price, rather than the Last Price.
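The difference between trigger references can be illustrated with a small simulation. The prices below are invented for the example, and the functions are a simplified model of a sell stop rather than the exchange's matching logic.

```python
def sell_stop_triggered(stop_price: int, prices: dict, reference: str) -> bool:
    """A sell stop fires when the chosen reference price falls to the stop."""
    return prices[reference] <= stop_price


def first_trigger(stop_price: int, ticks: list, reference: str):
    """Return the index of the first tick that fires the stop, or None."""
    for i, prices in enumerate(ticks):
        if sell_stop_triggered(stop_price, prices, reference):
            return i
    return None


# Hypothetical ticks after a reopen: Last Price briefly spikes down,
# while Mark Price moves far less in this made-up scenario.
ticks = [
    {"last": 28000, "mark": 28000},
    {"last": 26500, "mark": 27900},
    {"last": 27950, "mark": 27950},
]

hit_on_last = first_trigger(27000, ticks, "last")
hit_on_mark = first_trigger(27000, ticks, "mark")
```

With these numbers, the stop referencing Last Price fires on the second tick, while the same stop referencing Mark Price never fires.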

These are just two example features of the BitMEX platform that allow our traders to control risk and reduce the impact of downtime. Our Support team can provide further information about these (and any other) risk management tools on the platform.

Conclusion 

We hope this blog post provides all users with a clear description of this downtime incident and the robust ongoing steps we’re taking to enhance platform stability and our overall trading experience.

As always, we encourage any concerned users and those impacted by this downtime to contact Support.