Policy on Bitcoin Hard Forks

Anyone can create a chain fork of Bitcoin at any time. The possibility of a User Activated Hard Fork (UAHF) on 1 August 2017 requires that we clarify our position on any and all potential hardforks.

BitMEX Policies

At BitMEX, our top priority is protecting the assets of our customers. In order for us to effectively do this, we insist that any Bitcoin hard fork includes the following:

  • Strong two-way replay protection, enabled by default, such that transactions on each chain are invalid on the other chain.
  • A clean break, such that the new chain cannot be “wiped out” by the original chain.
  • A modification to the block header such that all wallets (including light clients) are required to upgrade to follow the new chain.
  • A minimum of three months of open testing and review before the client is formally released, and a further three months after release before the fork activates.

Should a hardfork not follow these policies, we will not list the coin and may not allow users to withdraw this coin from BitMEX. To be clear, we do not intend to access or keep these coins. The administrative overhead of distributing any and all hardforked coins (including Bitcoin-based distributions like Byteball/Lumens) is prohibitive and BitMEX will not monitor or maintain balances of hard-forked coins.

Additionally, support of any forked currency is solely at the discretion of BitMEX. While we will snapshot users’ margin balances at the time of the fork in case we decide to distribute, there is no guarantee that it will be safe, desirable, or practical to do so. If this concerns you, you should withdraw your funds before any given fork and handle the split on your own.

In a future case where a block size increase has supermajority support of the community and is handled responsibly, BitMEX intends to follow said chain and we will communicate with our traders accordingly.

1 August 2017 UAHF BitMEX Policy

The Bitcoin Cash (BCC) proposal is aimed at increasing the blocksize. It is scheduled to take place on 1 August 2017. This change is incompatible with the current Bitcoin ruleset and therefore a new coin may be created, which is to be named “Bitcoin Cash”.

It is our understanding that the UAHF proposal does not include two-way replay protection enabled by default. Should the UAHF occur, BitMEX may be unable to protect the new Bitcoin Cash tokens on behalf of clients.

As such, BitMEX will not support the split or distribution of Bitcoin Cash, nor will BitMEX be liable for any Bitcoin Cash sent to BitMEX. Therefore, it is up to our users to withdraw from BitMEX prior to August 1st if they wish to access Bitcoin Cash tokens or any other hardfork.

BitMEX considers any and all hardforks in this vein as altcoins. The .BXBT and .BXBTJPY indices will remain unchanged and will not include BCC.

Policy on BIP91, BIP148, and a Potential Chain Split

There is potential for the Bitcoin network to split if BIP91 (“Reduced Threshold SegWit MASF”) is not activated before the BIP148 UASF (“Mandatory Activation of SegWit Deployment”) start time of 01 Aug 2017 at 00:00 UTC. The following actions will be taken by BitMEX in the event of a potential network split:


  • If BIP91 locks in and enters its activation period (336 blocks, approximately 56 hours), then starting 4 hours before the potential activation time BitMEX will require 6 block confirmations before crediting deposits. This guards against a blockchain reorg, as non-SegWit blocks will be orphaned. The restriction will be lifted shortly after BIP91 activates, once BitMEX principals have ascertained that the chance of further reorgs is below a safe threshold.
  • If BIP91 does not activate before 31 July 2017 13:00 UTC, deposits and withdrawals will be halted until further notice in anticipation of a possible chain split. A round of withdrawals will be processed that day.
  • If you wish to trade the chain split, this date is also the deadline for deposits.
  • If there is a prolonged chain split, and thus two or more versions of Bitcoin, BitMEX will process withdrawals on both the majority and minority chains once withdrawals are re-enabled.
    • In order for coins on either chain to be properly withdrawn, a viable method of avoiding replay attacks must be developed.
      • While there is a risk of a replay attack (transactions on one chain also being valid on another), it will not be possible to process deposits or withdrawals.
    • BitMEX will create a snapshot of users’ margin balances (including unrealised PNL) on 31 July 2017 13:00 UTC. A user’s balance of the minority chain’s coins will be equal to their balance at this time.

Contract Settlement and Marking

  • If the Bitcoin network splits, currently-listed futures at the time of the split will settle on the sum of spot exchange-listed versions of Bitcoin.
    • It may not be possible to predict or plan to get reliable pricing data from our current Index exchanges, or they may not list the minor coin(s) at all. In the event of a network split BitMEX reserves the right to move all Bitcoin derivatives to Last Price Protected Marking, until a stable index can be composed.
    • Each chain will be represented by a different index, and the sum will be taken to compose the Mark and Settlement Prices. The indices will be kept separate so as to handle a split in which not every constituent of the BitMEX Index lists a given chain.
      • For example, if “Classic” (pre-BIP148) coins are listed on Bitstamp but not GDAX, the Bitstamp “Classic” price will comprise 100% of that part of the index sum.
  • Contracts listed after the split will settle on one version of Bitcoin, chosen by BitMEX. Only contracts listed pre-split will settle on the sum.
  • After the split, the XBTUSD Perpetual Swap will be timed to switch underlying indices in tandem with a futures contract. Ample notice will be given. Like futures, the new index will reference only one chain.
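
The sum-of-indices rule for pre-split contracts can be sketched as follows. This is a minimal illustration only: it assumes each per-chain index is an equal-weighted average of the constituent exchanges that list that chain, which the policy does not specify, and the prices and chain labels are hypothetical.

```python
def chain_index(prices_by_exchange):
    """Equal-weighted average over the exchanges that list this chain.

    Exchanges that do not list the chain are passed as None and skipped.
    Equal weighting is an assumption made for this sketch.
    """
    listed = [p for p in prices_by_exchange.values() if p is not None]
    if not listed:
        raise ValueError("no constituent exchange lists this chain")
    return sum(listed) / len(listed)


def settlement_sum(chains):
    """Sum of the per-chain indices, as used for contracts listed pre-split."""
    return sum(chain_index(prices) for prices in chains.values())


# Hypothetical prices, for illustration only.
chains = {
    "BIP148": {"Bitstamp": 2000.0, "GDAX": 2010.0},
    "Classic": {"Bitstamp": 400.0, "GDAX": None},  # GDAX does not list it
}
print(settlement_sum(chains))
```

In this sketch the Bitstamp price makes up 100% of the “Classic” index because it is the only constituent listing that chain, matching the example in the list above.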

Bitcoin Basis Volatility

Cash and carry arbitrage is a staple strategy for traders seeking high risk adjusted returns. Because Bitcoin can go to infinity but only fall to zero, speculators at the margin are buyers. Add in 100x leverage, and bullish speculators are willing to pay very high per annum interest rates to go long Bitcoin for short periods of time.

Cash and Carry Step by Step:

  1. Buy $1,000 worth of Bitcoin at $2,000
  2. Sell 1,000 BitMEX futures contracts, e.g. XBTM17, at $2,500
  3. Wait until expiry, and collect $500 of profit

Given the current rally, the BitMEX Bitcoin / USD 30 June 2017 futures contract, XBTM17, has traded at a premium (or a positive basis) for almost the entire length of the contract. Traders who sell futures and buy Bitcoin essentially earn a fixed rate of return until expiry to lend long speculators synthetic USD.

To earn the full premium, you must keep the trade on until expiry. However, while arbitrageurs are making good money from this strategy, they are leaving food on the table.

The above chart plots the hourly annualised premium of XBTM17. It is immediately apparent that the premium itself is very volatile. The premium takes the stairs up, and the elevator down.
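
As a rough sketch of how an annualised premium of this kind can be computed, the snippet below scales the raw premium by 365 over the days remaining to expiry. The simple (non-compounded) convention, the function name, and the 45-day figure are assumptions for illustration, not BitMEX's published methodology; the prices are those from the step-by-step example.

```python
def annualised_premium(futures_price, spot_price, days_to_expiry):
    """Simple (non-compounded) annualised basis, as a fraction of spot."""
    if spot_price <= 0 or days_to_expiry <= 0:
        raise ValueError("spot_price and days_to_expiry must be positive")
    raw_premium = futures_price / spot_price - 1.0
    return raw_premium * 365.0 / days_to_expiry


# Prices from the example above; 45 days to expiry is an assumed value.
example = annualised_premium(2500.0, 2000.0, 45)
print(f"{example:.2%}")
```

Because the raw premium is divided by the time left, the same dollar premium annualises to a much larger figure close to expiry, which is one reason the hourly series is so volatile.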

For traders already comfortable with cash and carry arbitrage, it is time to add a mean reversion strategy to your toolkit. Below are the relevant statistics for the data sample:

Min: -27.31%
Max: +122.30%
Mean: +17.87%
Median: +8.95%
Stdev: 24.68%

While you can’t pick the lows and highs, in general terms you can put on positive carry trades during times of extreme stress and close out when the premium normalises. That allows you to increase your returns during the contract period.

Recently the XBTM17 premium traded over 100% p.a., then a few days later traded at a discount. Instead of waiting another 20 days until expiry, arbitrageurs actively trading the basis could realise the entire 100% return in under 24 hours.

A simple yet effective strategy is what I call the Stair Step Cash and Carry Arbitrage. Assume that a normal range for the premium for a quarterly contract is 0% to 60% p.a. As the premium rises, at each 10% interval I will sell 10,000 XBTM17 contracts and buy $10,000 worth of Bitcoin. If the premium hits 60%, I will have sold at 10%, 20%, 30%, 40%, 50%, and 60%, so my expected return on $60,000 should I hold until expiry is the average of those entries, 35% p.a.

If the premium falls from 60% to 0%, I will buy back 30,000 XBTM17 contracts and sell $30,000 worth of Bitcoin at every 30% increment. The key point is that at all times I have a portfolio with positive carry, and I benefit from volatility in the premium.

You can get very sophisticated about how you choose your increments and sizing of trades. The more granular your steps, the more profit you can harvest from the volatile premium.
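
The arithmetic behind the stair step can be checked with a short sketch. It assumes equal-sized rungs at each 10% interval, as in the example; the function names are illustrative.

```python
def stair_step_entries(step_pct=10, top_pct=60, usd_per_step=10_000):
    """Return (premium %, USD sold) for each rung on the way up."""
    return [(p, usd_per_step) for p in range(step_pct, top_pct + 1, step_pct)]


def average_entry_premium(entries):
    """Size-weighted average annualised premium locked in across all rungs."""
    total_usd = sum(usd for _, usd in entries)
    return sum(p * usd for p, usd in entries) / total_usd


entries = stair_step_entries()
total = sum(usd for _, usd in entries)
avg = average_entry_premium(entries)
print(total, avg)
```

Passing a smaller `step_pct` with a proportionally smaller `usd_per_step` models the more granular ladders mentioned above.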

Don’t Get High On Your Own Supply

The DAO’s world record crowdsale was eclipsed one year later by the recent Bancor ICO. Bancor raised close to 400,000 ETH valued at $152 million. ICO mania is upon us and the number of projects raising in excess of $10 million is mind blowing.

The common theme amongst the ICOs is that the tokens are created using the Ethereum protocol. Additionally, to subscribe, most ICOs require punters to tender Ether and only Ether.

The killer app so far for Ethereum is ICOs. The rampant FOMO-induced greed means that newbies and old hands alike must obtain Ether. They either sell a fiat currency or Bitcoin to obtain the mana from Big Daddy Vitalik.

The rapid price appreciation of Ether is due, in large part, to the demand by speculators to buy ICOs. The price remains high after the ICO ends because most teams to date have not cashed out of their loot.

Most teams will not be able to return to the market again begging for more money if they misallocate their capital. The question for teams is how to protect the value of all the Ether raised. Failure to cash out or properly hedge could mean a rapid evaporation of the paper wealth they now enjoy.

While they do hold physical ETH, salaries and expenses must be paid in domestic fiat currencies. Ethereum raised $18 million of Bitcoin in 2014 but failed to hedge it, and was then forced to liquidate some holdings at a 50% loss to pay expenses. The development roadmap was also altered due to lack of funds.

The current crop of ICO cool kids witnessed the hardships placed on the Ethereum development team by lack of hedging. At the first sign of Ether weakness, they will rush to the exit. The team who sells first, sells at the best prices. When it is known that ICO teams are cashing out, the stampede and its induced casualties will have everyone singing Hakuna Matata.

If I were an ICO team, I would apply a 50% haircut to the value of any ETH raised. Even if I wanted to sell it, I know I couldn’t because the liquidity would disappear long before I could liquidate my entire stack. All the buyers at the margin bought Ether to invest in the ICO; they won’t be around to catch falling knives during the correction.

Some in the community believe it is irresponsible for development teams to raise such godlike sums of money. They protest that a few million dollars worth of crypto should suffice for any project. However, if all other teams do yuge ICOs, the liquidity won’t be there when you need to sell even if you raise a “responsible” amount of money. Game theory has thus set in and it is now in your best interest to follow the herd and raise as much as you can, expecting that you will only be able to spend a fraction of that amount.

Pop Goes the Weasel

The Bancor ICO set a new benchmark. The $500 million mark is my mental goal post for the height of insanity and that will be the day that Icarus will burn in the noonday sun. Tezos and/or EOS are positioned to meet or eclipse that number.

My second mental goal post is if Ether reaches parity with Bitcoin in terms of market cap. The profit taking at that level could cause the ripple that forces a calamitous unwind of the 2017 ICO bubble.

The correction in Ether and the secondary market prices of ICO tokens will be disorderly. However, even if you do agree with my views, don’t let your haterade preclude you from making money during this glorious bull market.

Why Don’t Banks Own Bitcoin Exchanges?

Fiat government money is one of the biggest profit centres for banks globally. Even though we are all human, to trade and interact with each other requires the exchange of different coins, paper, and electronic credits. Each one of these transactions benefits banks who move money globally for a fee.

At the most simplistic level, the trading of cryptocurrencies and/or digital tokens is just a standard foreign exchange transaction. If banks love foreign exchange, why then do they have an aversion to Bitcoin and other digital currencies?

A Close Second

Banks claim to care about optics, and say they abhor risk to their reputation. Yet – time and time again they’ve shown that they will gladly launder money for dictators, arms dealers, and drug lords. Why the aversion to conducting standard FX transactions involving digital currencies?

Banks also are not scared of breaking laws – but only so long as every other bank is doing it too. When it comes to a new business line that might be controversial to regulators, it is better to be a close second than the first. If meaningful punishment meted out to the first mover is lacking, the rest of the banks will follow headlong like lemmings.

Too Much Money to Ignore

Digital currency trading on-exchange volumes are too big to ignore. Given the level of fees charged to eager traders and speculators, the leading exchanges post revenue figures that value them in the billions of dollars using standard exchange revenue multiples.

The funny part is that a bank is necessary to operate a fiat to digital currency exchange. The on-ramp into the digital currency space requires the exchange to hold client fiat money with a bank. While the bank makes transaction fees on the movement of fiat, they miss out on the very profitable churn generated by digital currency exchange clients.

Hundreds of millions of dollars in fees will be made this year by the leading exchanges. The board of any bank that banks a digital currency exchange should be ashamed of themselves for not expanding their operations and offering, at a minimum, Bitcoin to fiat exchange services.

Trust a Banker

Throughout history, bankers have been held in low esteem. The handling of money is viewed as unclean, while rentier landholders are given the trappings of aristocracy. However the plebes, patricians, and governments still trust banks with their money.

Bank offices exude confidence, grandeur, and – most importantly – security. Contrast that to a slick and minimal website of a digital currency startup. Their airy San Fran, New York, or London offices don’t produce the same effect on potential customers.

The lack of trust between users and exchanges due to hacks, thefts, and a lack of business acumen is one of the largest reasons people don’t take the first step to acquiring some Bitcoin. Through generations of social conditioning, people believe a bank is the best place to store wealth. If your friendly neighbourhood bank offered the ability to buy, sell, and store Bitcoin directly, trust – and therefore trading volumes – would be much higher.

Figure It Out

Senior bankers appear to be catching on. A Barclays executive recently asked the FCA in London to figure out how a bank can join the party. [CNBC]

A wink and a nod from the appropriate alphabet letter agencies will set off a stampede for banks to open their own exchanges. The next question will be: buy or build?

Banks fail miserably at cyber security. Bitcoin, in many cases, does not require physical proximity to where it is stored in order to steal it. In the long run, the real differentiating feature in the exchange landscape is security. Banks that don’t want to “learn” how to conduct proper cyber security should buy the leading exchanges to whom they currently extend banking services – if they can find a trustworthy one.

The conversation between a bank and exchange will be very simple. Either the bank can buy out the exchange at a very generous multiple, or the bank will close the exchange’s accounts, and make it very difficult for the exchange to obtain an account with another organisation in a particular domicile.

As banks slowly come around to the revenue generating potential that a fiat to digital currency exchange offers, traders will abandon exchanges not explicitly owned by a bank. Banks already have millions of hungry financially repressed customers to whom they can immediately offer digital currency trading. The liquidity will immediately shift to bank-backed exchanges.

The Empire Strikes Back

Exchanges who have fought valiantly through the nuclear winter of 2014 and 2015 may not wish to sell out to the banks they loathe. Exchanges who can tap traditional VC funds or the ICO market should purchase struggling banks.

With a bank and its licenses as cover, existing exchanges can continue to innovate faster than an incumbent large bank.

This Time is Different

Yes, it’s likely we’re in a bubble. While the prices of many digital currencies are likely to decline in the short term, banks are now acutely aware of this new asset class. News stories about legacy banks “thinking” about how to offer digital currency trading to their clients will become more common.

I am extremely confident that by 2H2018 there will be a large storied bank that offers Bitcoin trading and storage to its customers. The stampede of Johnny-come-lately banks into the digital currency exchange space will be exciting to watch.

Here Come The Bankers

Goldman Sachs and Morgan Stanley, after their clients pestered them long enough, recently released two widely read reports on where the Bitcoin price is headed. Even though both banks believe the current rally is a bubble, it is very positive that so many clients demanded research on the cryptocurrency.

If there is that much pent up demand, the next question for executives is whether or not the industry is big enough yet to support one or more full time employees market making and trading Bitcoin and other digital currencies. The following is a thought experiment on the cost benefit analysis for starting a digital currency trading desk for a bulge bracket bank.


Traders Are Expensive

Before a young man or woman begins blowing up your capital, the resources needed to get them started run close to a million dollars alone. Market data feeds like Bloomberg and Reuters can run in the tens of thousands of dollars per month. As an example, while I was an ETF trader at Deutsche Bank, my market data costs were $50,000 per month.

The next and bigger cost center is the number of support staff needed. Compliance, middle office, back office, and IT personnel are needed to help a trader effectively perform his or her duties.

The final and most important asset a trader needs is capital. From the bank’s perspective, this capital has a cost. Investors in investment and commercial banks demand a certain return on equity (ROE) for their investment. Goldman Sachs is the most profitable bank by a country mile, mainly because its management actually has a clue about how to use capital effectively.

Over the past 5 years Goldman averaged an ROE of 10%. I will use this as the benchmark for the following calculations.

The trader himself needs to get paid. Given the risk involved in trading Bitcoin, a bank would assign a mid-career trader to the desk. Assume this person’s annual total compensation is $500,000. For an equities banker this might be the MD’s take home pay, but for a good FICC trader it is average.

Trading Bitcoin requires a trading desk to have accounts on the leading Bitcoin exchanges. Exchanges, as we know, get hacked repeatedly. Insurance in the Bitcoin space, for good reason, does not exist. The desk needs to assign a probability of default on the exchange. Using the Bitfinex haircut as an example, let’s assume the yearly probability of default is 35%.


Cost Summary:

Trader Support: $1 million
Trader Pay: $500 thousand
ROE: 10%
Default Risk: 35%

The next consideration is how much capital to allocate to this trading operation. Even if the desk is able to achieve the ROE, making just a few million dollars won’t be worth the hassle. The approval for a Bitcoin trading desk would need to come from the CEO. Lloyd Blankfein doesn’t get out of bed for less than $10 million of profit. Let’s assume that the bank, at a minimum, must be able to deploy $100 million.

Given that 35% of the capital deployed will be spirited away, the returns must be achieved on $65 million. In order to make $1.5 million (Cost) + $10 million (Required ROE), the desk must make $11.5 million on $65 million, or roughly 17.7%.

A 17.70% annual return is very achievable. I routinely speak about arbitrage opportunities that yield in excess of 50% per annum. However, you cannot put $65 million into any trades I describe without tremendous market impact.
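
The hurdle-rate arithmetic can be reproduced with a short sketch using the figures from the cost summary. The function and parameter names are illustrative, and the model is deliberately as simple as the thought experiment: the defaulted capital is written off entirely, while costs and the ROE target are charged against the full allocation.

```python
def required_return(capital, default_prob, costs, roe_target):
    """Return needed on surviving capital to cover costs plus the ROE hurdle."""
    surviving_capital = capital * (1.0 - default_prob)
    required_profit = costs + capital * roe_target
    return required_profit / surviving_capital


hurdle = required_return(
    capital=100_000_000,  # minimum deployment assumed in the text
    default_prob=0.35,    # yearly exchange default probability
    costs=1_500_000,      # trader support plus trader pay
    roe_target=0.10,      # benchmark ROE
)
print(f"{hurdle:.2%}")
```

Lowering `default_prob` shows how sensitive the hurdle is to exchange risk: with no default risk at all the same desk would need only 11.5% on $100 million.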

The trading desk will not have a mandate to just punt Bitcoin or Altcoins. They will search for pricing discrepancies between exchanges, or between spot and its derivative. When massive directional bets are removed as a strategy, it is very hard to put that much size into arbitrage trades.


Not Now, But Soon

With a market cap close to $100 billion, the entire crypto space is worth evaluating for a trading desk. However, the market still cannot support the volume needed for a trading desk to meet its hurdle rate.

When the top 5 most liquid Bitcoin / USD exchanges trade in excess of $1 billion per day on average, then we will see the first bank sponsored Bitcoin trading desks emerge. Given that yesterday $500 million was traded by the top 5 exchanges, we are not far away.

Postmortem: Downtime, July 14, 2017


On July 14, 2017, we suffered a minor downtime as a runaway ZFS snapshot process froze up disk I/O on the trading engine. No data was lost. While the outage was relatively minor and required only a host reboot, we took additional time to re-verify data, clean up ZFS snapshots, and fix the underlying issue.

We apologize for the disruption.

If you are interested in our recent migration to ZFS, please see this post.

Tezos Rapture

The most anticipated ICO of 2017, Tezos, will begin July 1st.

What does Tezos do? Do you really care? In a nutshell: Tezos is souped up Ethereum. Tezos claims to fix governance issues and features formal verification of smart contracts. They don’t want a DAOsaster happening on their watch.

Tezos, like Ethereum, will launch via an un-capped ICO. If you so desire, you can pledge all your hard earned Bitcoin and Ether for new and shiny Tezzies (XTZ). Tezzies are the tokens that are used natively on the Tezos protocol. They sound super cute, n’est-ce pas?

Bancor raised $152 million this month and surpassed the DAO as the largest crowdfunding project in human history. One of Tezos’ esteemed advisors, Emin Gun Sirer, in a recent Hacking Distributed blog post, stopped short of calling Bancor an outright fraud. With ICO euphoria at an all-time high, Tezos could crush the record just set by Bancor. My target is $500 million.

The token generation event (TGE – repeat after me “it is not a security”) will take place over 2,000 Bitcoin blocks. That is approximately two weeks. Every fundraising milestone reached will be covered to death and will engage financial reporters, an element missing from past ICOs.
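
The two-week figure follows from Bitcoin's 10-minute target block interval. A minimal sketch of the conversion is below; actual block times vary, so this is an expectation rather than a schedule.

```python
from datetime import timedelta

TARGET_BLOCK_INTERVAL = timedelta(minutes=10)  # Bitcoin's target block spacing


def expected_duration(blocks, interval=TARGET_BLOCK_INTERVAL):
    """Expected wall-clock time for a given number of blocks."""
    return blocks * interval


tge = expected_duration(2000)
print(tge.days, "days", tge.seconds // 3600, "hours")
```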

DLS = Shadystan

The structure of Tezos is a little different from many ICOs you may be familiar with. Essentially the founders constructed a US entity, Dynamic Ledger Solutions (DLS), that gets a payout if certain conditions are met. The Tezos foundation, which issues the tokens and receives the Bitcoin and Ether, is contractually obligated to pay DLS 8.5% of the proceeds and 10% of the tokens.

A Swiss foundation raises hundreds of millions of dollars from retail investors globally, and then funnels a significant portion of the loot back into a US company. DLS owns all the Tezos IP, which the foundation essentially purchases with proceeds from the token sale. Ding Ding Ding, SEC please look over here. Not even a white-shoe law firm like Wachtell Lipton Rosen & Katz could wipe the stink off of this structure.

If the price of Tezzies performs badly in the secondary market, and / or the tech does not deliver as advertised, a horde of grannies and grandpas have a nice juicy US entity to sue into oblivion.

DLS also could face the wrath of an ambitious securities prosecutor. If I were a young Southern District of New York prosecutor, DLS would be the perfect target for an opening salvo in the war on ICOs. Someone will get perp walked due to ICO events. Who better to publicly tar and feather than a gaggle of Goldman and Bridgewater bankers?

The Tezos founders are incentivised, through their DLS stake, to engineer a quick cash grab. The bigger the raise, the more immediate funds they receive through the 8.5% sale proceeds payout. The short-term incentives rub many potential investors the wrong way.

Rally Time

Even in Shadystan, people will rush to purchase Bitcoin and Ether for the sole purpose of investing in Tezos. The Tezos Bitcoin and Ether vacuum will provide a strong bid for the funding currencies. During the DAO ICO, Ether rallied 40%. A similar phenomenon should occur during the Tezos TGE, especially if I am correct with my fundraising target of $500 million, which is almost 0.7% of the total market cap of Bitcoin and Ethereum combined.

Tezzies will not be distributed until late 2017. During those months, people who invested more than they could stomach to lose will start getting the shakes. There is a non-zero probability that Tezzies never materialise. Freaked out investors will search for any guidepost as to what the market believes the price of soon to be distributed Tezzies to be. We can help – through the darkness, the light of BitMEX will shine bright.

BitMEX Has Got Your Back

The BitMEX Tezzie / Bitcoin 29 December 2017 futures contract, XTZZ17, is now live. Using only Bitcoin, traders can speculate on the future value of XTZ. Traders can go long or short with up to 2x leverage.

XTZZ17 provides a real market for the future value of Tezzies. Punters who bought into the ICO and want to take some off the table can short XTZZ17. Traders who missed the TGE but still want a piece can go long XTZZ17. Each contract is worth 1 XTZ.

Given the hype surrounding Tezos, I think XTZZ17 could rival the intensity and volatility of Zcash prior to spot being listed. Get ready to strap yourselves in.

Postmortem: Downtime, July 5, 2017


On July 5, 2017, we suffered a prolonged downtime – our longest since launch in November 2014 – due to a server issue. Trading was suspended from 23:30 UTC until 03:45 UTC, for a total suspension of 4 hours and 15 minutes.

Those of you who trade with us know that we take our uptime very seriously, and the record shows it. Before this month, we had not had a single month with less than 99.9% uptime, with our longest 100% streak reaching nearly 300 days.
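
To put the 99.9% figure in context, the sketch below computes the downtime budget that threshold allows in a 31-day month and the uptime implied by this suspension alone. It assumes the 03:45 UTC resumption fell on July 6, since the window crosses midnight, and it ignores any other incidents in the month.

```python
from datetime import datetime, timedelta


def uptime_fraction(downtime, period):
    """Fraction of the period during which the service was available."""
    return 1.0 - downtime / period


# Suspension window from the text; the July 6 resumption date is inferred.
suspended = datetime(2017, 7, 5, 23, 30)
resumed = datetime(2017, 7, 6, 3, 45)
downtime = resumed - suspended
july = timedelta(days=31)

print(downtime)
print(f"{uptime_fraction(downtime, july):.3%}")
print(july * 0.001)  # downtime budget at 99.9% uptime
```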

So what happened?

The crypto market is exploding, as many of you know. While we have one of the most sophisticated trading engines in the industry, its focus has always been on correctness (remargining positions continuously, auditing every trade), rather than speed. This was a winning strategy from 2014 to 2016, and we’ve never lost an execution, but as we entered record-setting volume in the beginning of this year, requests started to queue up.


Optimizing the BitMEX Trading Engine

We started optimizing. The web layer, up to this point, hadn’t had any issues – we could always scale it horizontally – but the engine (at this time) cannot be horizontally scaled. We partnered with Kx, the makers of kdb+, which powers our engine. We began testing new storage subsystems and server configurations. We settled on an upgrade plan, set for five days hence (July 11), and began testing the switchover. We simulated the switchover thrice, each time setting a timer so that we could best estimate our downtime. The plan was:

  • Move to a larger instance with a faster local SSD, and
  • Move from bcache + ext4 to ZFS.

Some more details on those actions:

  • EBS is slow. So we would move the trading engine from an AWS c3.xlarge, which we used for its fast local SSDs in combination with bcache, to an i3.2xlarge. This gives us far faster local SSDs and nearly 20x the local SSD storage, so we can easily cache our entire data set.
  • ZFS gives us some distinct advantages over other filesystems:
    • ZFS checksums individual blocks, preventing data rot. It can be scheduled to automatically check & repair drives (this is called a scrub), and can be configured to alert on varied criteria. This goes a long way toward ensuring the continued integrity of our data.
    • ZFS allows us to easily mirror and replicate our data across multiple volumes and physical locations.
    • ZFS snapshots are cheap, especially compared to traditional backup systems that must check the size & modified time of every file; in our testing, we can snapshot as often as every second (!) without any significant performance regression.
    • Kdb+ data is stored in a columnar fashion, like so:
      ├── foreignNotional
      ├── grossValue
      ├── homeNotional
      ├── price
      ├── side
      ├── size
      ├── symbol
      ├── tickDirection
      ├── timestamp
      └── trdMatchID
    • This data is highly compressible – in practice we see compression rates approaching 4x. This directly translates to less data over the wire to EBS and faster checkpointing & lower latency on the write log. For example, du is able to show the “apparent size”, that is, the size the OS thinks these files are, versus the actual space usage:
      /u/l/b/e/d/h/execution $ du --apparent-size -h
      955M .
      /u/l/b/e/d/h/execution $ du -h
      268M .
    • ZFS has the concept of the ARC (fast in-memory caching, an adaptive combination of MFU and MRU caches; in practice, the MFU cache is better for our use case), and the L2ARC, which provides a second-level spillover of this data, ideally to fast local SSD. It even compresses, leading to some eye-popping metrics:
      L2 ARC Size: (Adaptive)       1.17 TiB
      Compressed:            33.74% 403.90 GiB 
      Header Size:            0.08% 931.12 MiB
    • ZFS snapshots are amazing, and easy to code for. This allows us to do things that would be impossible otherwise, such as automatically snapshotting the engine data before and after any code changes. This is only practically possible because of the instant nature of snapshots.
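
The apparent-versus-actual comparison shown in the du output above can also be approximated from a script. The sketch below is an illustration rather than the tooling used here: it relies on the POSIX `st_blocks` field (allocated space in 512-byte units), counts regular files only, and so will not match du exactly because du also accounts for directories.

```python
import os


def compression_ratio(path):
    """Apparent size divided by allocated size for every file under path.

    st_size is the apparent size; st_blocks counts the 512-byte units
    actually allocated (POSIX only), which reflects filesystem compression.
    Returns NaN when nothing is allocated, e.g. for an empty directory.
    """
    apparent = allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent / allocated if allocated else float("nan")


# The du figures quoted above (955M apparent, 268M on disk) as a ratio:
print(round(955 / 268, 2))
```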

I could go on. We’re ZFS superfans.


What Went Wrong

As Donald Rumsfeld once said:

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

We had the plan ready to go, checklists ready, and we had simulated the switchover a few times. We started preparing a zpool for use with the production engine.

Here’s where it went wrong.

19:47 UTC: We create a mirrored target zpool that would become the engine’s new storage. In order to not influence I/O performance on the running engine, we snapshot the data storage drive, then remount it to the instance. This is not something we did in our test runs.

Bcache, if you haven’t used it before, is a tricky beast. It actually moves the superblock of a partition up by 8KB and uses that space for specific metadata. One piece of this metadata is a UUID, so bcache can identify unique drives. And that makes perfect sense, in the physical world. It’s in the virtualized world that this becomes a problem. What happens when you snapshot a volume – bcache superblock and all – and attach it?

Without any interaction, the kernel automatically mounted the drive, figuring it was also the backing device on the existing (running) bcache device, and appeared to start spreading writes randomly across both devices. As you can imagine, this is a problem: it began to trash the filesystem minute by minute, and we didn’t know it was happening. It seemed odd that it had mounted a bcache1 drive, but we were not immediately alarmed. No errors were thrown, and writes continued to succeed. We start migrating data to the zpool.
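One guard against this class of failure is to compare superblock UUIDs before a cloned volume is attached to a host that already has the original. The sketch below is an illustration only: it assumes the UUIDs have already been read from each device by some other means, and the device names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_uuids(device_uuids: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every UUID that is claimed by more than one device.

    `device_uuids` maps a device path to the UUID found in its superblock.
    """
    by_uuid = defaultdict(list)
    for device, uuid in sorted(device_uuids.items()):
        # UUIDs are compared case-insensitively.
        by_uuid[uuid.lower()].append(device)
    return {u: devs for u, devs in by_uuid.items() if len(devs) > 1}


# Hypothetical example: a volume and a snapshot-derived clone of it.
clash = duplicate_uuids({
    "/dev/xvdf": "1b2c3d4e-0000-4000-8000-aaaaaaaaaaaa",
    "/dev/xvdg": "1B2C3D4E-0000-4000-8000-AAAAAAAAAAAA",
    "/dev/xvdh": "99999999-0000-4000-8000-bbbbbbbbbbbb",
})
if clash:
    print("refusing to attach, duplicate UUIDs:", clash)
```

A non-empty result is a signal to stop before the kernel sees both devices at once.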

22:09 UTC: A foreign data scraper on the engine instance (we read in pricing data from nearly every major exchange) throws an “overlap mismatch”. This means that, when writing new trades, the data on disk did not mesh perfectly with what was in memory. We begin investigating and repairing the data from our redundant scrapers, not aware of the bcache issue.

23:02 UTC: A read of historical data from the quote table fails. This causes the engine team serious concern. We begin to verify all tables on disk to ensure they match memory. Several do not. We realize we can no longer trust the disk, but we aren’t sure why.
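A table-by-table comparison of this kind can be sketched with content hashes. This is not the engine’s actual verification code: the representation of a table as serialized bytes and the function names are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a serialized table."""
    return hashlib.sha256(data).hexdigest()


def mismatched_tables(in_memory: Dict[str, bytes],
                      on_disk: Dict[str, bytes]) -> List[str]:
    """Names of tables whose on-disk copy is missing or differs from memory."""
    bad = []
    for name in sorted(in_memory):
        disk_bytes = on_disk.get(name)
        if disk_bytes is None or digest(disk_bytes) != digest(in_memory[name]):
            bad.append(name)
    return bad
```

Treating memory as the source of truth matches the situation described here, where the disk could no longer be trusted.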

We begin snapshotting the volume every minute to aid in a rebuild, and our engine developers start copying all in-memory data to a fresh volume.

23:05 UTC: We schedule an engine suspension. To give traders time to react, we set the downtime for 23:30 UTC and send out this notice. We initially assume this is an EBS network issue and plan to migrate to a new volume.

23:30 UTC: The engine suspends and we begin shutting down processes, dumping all of their contents to disk. At this point we believe we have identified the cause of the corruption (bcache disk mounting).

Satisfied that all data is on multiple disks, we shut down the instance, flushing its contents to disk, and wait for it to come back up.

It doesn’t. We perform the usual dance (if you’ve ever seen a machine fail to boot on AWS, you know this one): unmount the root volume, attach to another instance, check the logs. No visible errors.

We take a breath and chat. This is going to be more difficult than we thought.

23:50 UTC: We decide to move the timetable up on the ZFS and instance migration. It becomes very clear that we can’t trust bcache. We already have our migration script written – we begin ticking boxes. We clone our Testnet engine, which had already been migrated to ZFS, and begin copying data to it. The new instance has 2x the CPU & 4x the RAM, and a 1.7TB NVMe drive. We’re looking forward to the increased firepower.

00:30 UTC: We migrate all the init scripts and configuration, then mount a recent backup. We have trouble getting the bcache volume to mount correctly as a regular ext4 filesystem. The key is recalling that bcache moved the superblock 8KB forward. We mount a loopback device & start copying.
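The loopback trick can be sketched as two commands: a loop device that starts 8KB into the backing device, then an ordinary ext4 mount of that loop device. The snippet only builds the argument lists; the device and mountpoint names are hypothetical, and mounting read-only is our assumption for a copy-off scenario rather than a record of what was typed that night.

```python
from typing import List

# The 8 KB shift described above, in bytes.
BCACHE_DATA_OFFSET = 8 * 1024


def loop_setup_cmd(backing_device: str,
                   offset: int = BCACHE_DATA_OFFSET) -> List[str]:
    """argv for a loop device that skips the bcache metadata region.

    --find picks a free loop device and --show prints its name.
    """
    return ["losetup", "--find", "--show",
            "--offset", str(offset), backing_device]


def mount_cmd(loop_device: str, mountpoint: str) -> List[str]:
    """argv to mount the loop device as ext4, read-only (an assumption)."""
    return ["mount", "-t", "ext4", "-o", "ro", loop_device, mountpoint]


print(" ".join(loop_setup_cmd("/dev/xvdf")))
print(" ".join(mount_cmd("/dev/loop0", "/mnt/recovery")))
```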

We also set up an sshfs tunnel to Testnet to migrate any missing scraper data. The engine team begins recovering tables.

~01:00 UTC: We destroy and remount the pool to work around EBS<->S3 prewarming issues. While the files copy, we begin implementing our new ZFS-based backup scheme and, as we work, replicate per-minute snapshots to another instance. This becomes valuable several times as we verify data.
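Snapshot replication of this sort is typically an incremental `zfs send` piped into `zfs receive` on the other host. The sketch below only assembles that pipeline as a string; the dataset, snapshot, and host names are hypothetical, and transport over `ssh` is an assumption rather than a description of our setup.

```python
import shlex


def replicate_cmd(dataset: str, prev_snap: str, new_snap: str,
                  remote_host: str, remote_dataset: str) -> str:
    """Shell pipeline that sends the delta between two snapshots to a remote pool."""
    # -i sends only the blocks that changed between prev_snap and new_snap.
    send = ["zfs", "send", "-i",
            f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return " | ".join(
        " ".join(shlex.quote(part) for part in cmd) for cmd in (send, recv)
    )


print(replicate_cmd("tank/engine", "m0001", "m0002",
                    "backup-host", "backup/engine"))
```

Because each send carries only the changes since the previous snapshot, running it every minute stays cheap as long as the previous snapshot still exists on both sides.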

~02:00 UTC: The copy has finished and the zpool is ready to go. Bcache trashed blocks all over the disk, so the engine team begins recovering from backup. This is painstaking work, but between all the backups we had taken, we have all the data.

~03:00 UTC: The backfill is complete and we are verifying data. Everything looks good. We didn’t lose a single execution. Relief starts flooding through the room. We start talking timetables. We partition the local NVMe drive into a 2GB ZIL & 1.7TB L2ARC and attach it to the pool to get ready for production trading.
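Attaching the two NVMe partitions to a pool comes down to two `zpool add` calls, one with the `log` vdev type for the ZIL and one with `cache` for the L2ARC. A minimal sketch that builds those commands follows; the pool name and partition paths are hypothetical.

```python
from typing import List


def attach_nvme_cmds(pool: str, log_partition: str,
                     cache_partition: str) -> List[List[str]]:
    """argv lists that add a separate intent-log device and an L2ARC device."""
    return [
        # Small partition: dedicated device for the ZFS intent log (ZIL).
        ["zpool", "add", pool, "log", log_partition],
        # Large partition: second-level read cache (L2ARC).
        ["zpool", "add", pool, "cache", cache_partition],
    ]


for cmd in attach_nvme_cmds("tank", "/dev/nvme0n1p1", "/dev/nvme0n1p2"):
    print(" ".join(cmd))
```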

03:05 UTC: We bring the site back online, scheduling unsuspension at 03:45 UTC.  Our support team begins telling customers the new timeline. Chat comes back on.

03:45 UTC: The engine unsuspends and trading restarts. Fortunately, the Bitcoin price has barely moved over these four hours. We consider our place in the world.



While we prepared for this event, actually experiencing it was quite different.

Over the next two days, the team communicated constantly. We wrote lists of everything that went wrong: where our alerting failed, where we could introduce additional checksumming, how we might stream trade data to another instance and increase the frequency of backups. We introduced more fine-grained alerts up and down the stack, and began testing them.

To us, this particular episode is an example of an “unknown unknown”. Modern-day stacks are too large, too complicated, for any single person to fully understand every single aspect. We had tested this migration, but we had failed to adequately replicate the exact scenario. The best game to play is constant defense:

  1. Don’t touch production.
  2. Really, don’t touch production.
  3. Treat in-service instances as immutable: clone, modify, test, switch.

As we scale over the coming months, we will be implementing more systems toward this end, toward the eventual goal of having an infrastructure resilient to even multiple-node failures. We want to deploy a Simian Army.

Already, we are making improvements:

  • Moving to ZFS itself was a long-planned, significant step that gives us much stronger data consistency guarantees, far more frequent snapshotting, and better performance.
  • We are developing automated tools to re-check data integrity at intervals (outside of our existing checks + ZFS checksumming), and to identify problems sooner.
  • We have reviewed every aspect of our alerting system, reworking several gaps in our coverage and implementing many more fail-safes.
  • We have greatly expanded the number of jobs covered under Dead Man’s Snitch, a service that has proven invaluable over the last few years.
  • We have implemented additional backup destinations and re-tested. We are frequently replicating data across continents and three cloud providers.
  • We continue to implement new techniques for increasing the repeatability of our architecture, so that major pieces can be torn down and rebuilt at will without significant developer knowledge.
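The dead man’s switch pattern behind several of these items is simple: a job checks in only when it succeeds, and the monitor alerts when an expected check-in goes missing. Here is a minimal sketch of that pattern; `run_with_checkin` and the `check_in` callable are hypothetical stand-ins, and the call to the monitoring service itself is deliberately left out.

```python
from typing import Callable


def run_with_checkin(job: Callable[[], None],
                     check_in: Callable[[], None]) -> bool:
    """Run `job`; call `check_in` only if it finished without raising."""
    try:
        job()
    except Exception:
        # No check-in is sent, so the monitor alerts once the ping is overdue.
        return False
    check_in()
    return True
```

The useful property is that silence is the failure signal: a crashed host, a hung job, and a broken cron entry all look the same to the monitor.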

Thanks to our great customers for being understanding while we were down, and for continuing to support us.