Kakao, Data Center Fire, the Data Residency Dilemma
Data residency for a small country presents an awkward dilemma
I used to work at an open source distributed database startup called PingCAP. As the exec driving the company’s global market expansion effort, I talked to a lot of cloud architects and distributed systems engineers (as potential customers) about how a distributed database can be used for “disaster recovery”. It is a term used often by technical folks whose day (and night) job it is to keep things up and running at all times, no matter what happens.
When I first heard the term “disaster recovery” in the cloud context, I thought it was a bit dramatic. What sort of “disaster” are we talking about here that needs military-level operational practices, multiple backup plans, and sophisticated technologies to “recover” from? The answers I often got were along the lines of: “You never know. It could be anything! An earthquake. A typhoon. A pack of rats chewing up cables.”
As I’ve spent more time in the cloud and software infrastructure industry, I’ve grown to appreciate the reality that shit happens! Disasters are unavoidable! They may not happen all the time, but when they do, if you don’t have the right technology, architecture, or operation to recover, these disasters will ruin you.
A week ago, such a disaster happened in South Korea, ruined a CEO, triggered an investigation from the country’s president, and illuminated the awkward dilemma of data residency for a small country.
Kakao’s Disaster
The disaster I’m referring to is the fire that broke out in a data center near Seoul and lasted more than 10 hours, causing outages across all services provided by Kakao, the Korean tech giant. Kakao’s suite of apps, which covers messaging, ride hailing, maps, gaming, webtoons, payment, and banking, is used by over 90% of South Korea’s Internet users and by hundreds of thousands of the Korean diaspora living around the world. It’s a legit SuperApp. If you travel to South Korea, as I did earlier this year, the first thing you do is install Kakao Talk and Kakao Map, so you can message people and find your way around.
Since the fire, the co-CEO of Kakao responsible for its data operation resigned. South Korea’s president, Yoon Suk-yeol, is launching an investigation, forming a new "Digital Crisis Management Headquarters", and voicing alarms about Kakao’s monopolistic power. There are even conspiracy theories about the fire being an act of sabotage, not an accident.
The intrigue of monopoly, investigation, sabotage, and more will keep this firestorm burning (pun intended) for weeks to come. But they all steer away from the root problem that this fire illuminates – is a robust data center disaster recovery practical in a small country?
Because disasters are unavoidable, industry best practice for implementing “disaster recovery” is not to avoid disasters, but to mitigate damages by what is commonly known as “geo-replication.” In layman's terms, “geo-replication” for a tech company means making copies of apps, services, and users’ data and putting copies in different data centers that are geographically far away as backups. If a disaster hits one data center, you turn on one of your backups to keep your apps and services running (albeit with slower performance due to farther geographical distance), while you recover from the disaster, e.g. putting out a fire.
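To make the failover idea concrete, here is a minimal sketch in Python. It is a toy model, not a description of how Kakao or any real operator does this: the `DataCenter` class, the `pick_serving_site` function, and the site names are all hypothetical, and a real system would also have to replicate data and detect failures, which this skips entirely.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str             # hypothetical label, not a real facility
    healthy: bool = True  # in practice this would come from health checks


def pick_serving_site(primary, backups):
    """Return the primary if it is healthy, else the first healthy backup.

    Returns None when every site is down, i.e. a full outage.
    """
    for site in [primary, *backups]:
        if site.healthy:
            return site
    return None


primary = DataCenter("primary-near-users")
backups = [DataCenter("backup-far-away")]

# Simulate the disaster: the primary goes down, traffic moves to the backup.
primary.healthy = False
active = pick_serving_site(primary, backups)
print(active.name if active else "full outage")
```

With an empty `backups` list, the same call returns `None`, which is the situation the rest of this post is about.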
At first glance, it does not look like Kakao had a geo-replication scheme in place that could support disaster recovery. SK Holdings C&C operates the data center that caught fire, and it appears to be the only data center the company operates in South Korea! If that is indeed the case, then there is no other SK-run data center for Kakao to geo-replicate to. That would explain the long outage, because it is otherwise unthinkable for a data center fire to last more than 10 hours without a backup data center being turned on to keep Kakao’s apps functioning while the fire was being put out.
Data center locations can often be treated as trade secrets, so perhaps SK does have another data center that we don't know about. Even so, given South Korea’s size as a country, would it be far enough away to support disaster recovery?
How Far is “Far Enough”?
There are no hard and fast rules for how far is “far enough” for geo-replication. But we do have some reference points from large countries with advanced data center layouts to draw from.
In the US, it is well-known that the densest concentration of data centers is in Northern Virginia. Why? Because the location is close to both dense cities of east coast Internet users and Washington DC, where data center operators (AWS, GCP, Equinix, Digital Realty, etc.) can leverage existing, high-quality digital infrastructure already built for the US government. A common backup location for the Northern Virginia cluster is Council Bluffs, Iowa, a small town in the midwestern state that has become a hot destination for building data centers. (For more on why Iowa is attractive for data centers, see this deep dive from The Atlantic a few years ago.)
How far is Northern Virginia from Council Bluffs? About 1,100 miles (~1,770 kilometers).
Let’s look at another reference point, in China. It is also well-known that Beijing has a cluster of data centers, for more or less the same reason as Northern Virginia – high concentration of Internet users, access to high-quality infrastructure that serves the government, etc. A common backup location is Xi’an – home to the Terracotta warriors, birthplace of dumplings (disputable), and data centers that store backup copies if a disaster hits Beijing.
How far is Beijing from Xi’an? About 1,100 kilometers (~700 miles).
Whether the rule of thumb is 1,100 miles or 1,100 kilometers, that is at least twice as far as the entire north-south “height” of South Korea. The distance from Seoul to Jeju Island, South Korea’s southernmost territory (and where Kakao’s corporate HQ happens to be), is only about 450 kilometers.
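The comparison above is simple arithmetic, sketched here in Python. All three distances are the rough figures quoted in this post, not measured values, and the variable names are only for illustration.

```python
KM_PER_MILE = 1.609344  # definition of the international mile


def miles_to_km(miles: float) -> float:
    return miles * KM_PER_MILE


# Rough figures quoted in the text, not measured values.
us_pair_km = miles_to_km(1100)  # Northern Virginia to Council Bluffs
china_pair_km = 1100.0          # Beijing to Xi'an
seoul_to_jeju_km = 450.0        # Seoul to Jeju Island

us_ratio = us_pair_km / seoul_to_jeju_km
china_ratio = china_pair_km / seoul_to_jeju_km

print(f"US pair: {us_pair_km:.0f} km, {us_ratio:.1f}x Seoul-Jeju")
print(f"China pair: {china_pair_km:.0f} km, {china_ratio:.1f}x Seoul-Jeju")
```

Under these figures, the Chinese pair is roughly 2.4 times the Seoul-to-Jeju distance and the US pair is close to 4 times.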
To provide good user experiences, proximity matters. To provide good disaster recovery, distance matters.
It makes 100% sense for SK to build a data center in the suburbs of Seoul. It makes 100% sense for Kakao and other Korean tech companies to use this data center to serve their millions of users, mostly concentrated in Seoul. However, to make disaster recovery work, it makes little sense for the backup data center to be located inside South Korea given its size.
And that brings us to the awkward dilemma of data residency for a small country.
Data Residency for Small Countries
Laws and policies around data residency have been popping up in many countries in the last few years. Since the promulgation of the EU’s GDPR and a growing consensus among national governments that their citizens’ data is worth at least something, countries like China, India, Brazil, Nigeria, and many others have all started legislating their own flavor of GDPR to exert control over their people’s data. As the most “wired” country in the world, South Korea has been developing its own “flavor” as well!
Although the justification for data residency can be different – some highlight national security concerns, some note personal privacy issues, some are straight up rent-seeking from rich tech companies – the way to comply is pretty much the same: store Country X citizens’ data in data centers located in Country X and nowhere else.
As our discussion of the “Kakao fire” has hopefully illustrated, the “nowhere else” part can be tricky, if there is physically not enough land to implement disaster recovery with far enough distance for geo-replication within your own borders.
The tradeoff between data residency and disaster recovery presents an awkward dilemma for small but technologically-advanced countries.
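The dilemma can be framed as two filters applied to the same list of candidate backup sites. The sketch below is a toy model under stated assumptions: the `Site` class, the `eligible_backups` function, the site names, and the "XX" country code are all hypothetical, and the 1,100 km threshold is just the rough rule of thumb from this post, not an industry standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str               # hypothetical label
    country: str            # country code where the site sits
    km_from_primary: float  # rough distance from the primary data center


def eligible_backups(sites, home_country, min_km, enforce_residency=True):
    """Keep sites that are far enough away and, if residency is enforced,
    located inside the home country."""
    return [
        s for s in sites
        if s.km_from_primary >= min_km
        and (not enforce_residency or s.country == home_country)
    ]


sites = [
    Site("domestic-far-south", "KR", 450),  # about the Seoul-to-Jeju figure
    Site("hypothetical-abroad", "XX", 1100),
]

strict = eligible_backups(sites, "KR", min_km=1100)
relaxed = eligible_backups(sites, "KR", min_km=1100, enforce_residency=False)

print("with residency:", [s.name for s in strict])
print("without residency:", [s.name for s in relaxed])
```

With both filters on, nothing survives. Relaxing either one (a shorter distance, or a site abroad) produces a candidate, and that choice is the tradeoff.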
This is also where data locality and foreign relations with neighboring countries interconnect in interesting ways. After all, many EU countries are just as small and technologically-advanced as South Korea. GDPR has managed to function (so far) in large part because two decades’ worth of trust has been seeded among EU countries, so that when GDPR took effect in 2018, most members felt comfortable being treated as a single data collective without fear of being taken hostage. Scandinavian countries that use the Stockholm data center are not too concerned about the geo-replication in Frankfurt or Paris.
The same, unfortunately, cannot be said of East Asian countries. For a data center in Seoul to have a backup located 1,000 miles (or 1,000 kilometers) away, it would have to be somewhere in China, Japan, Russia, or the middle of the ocean.
Plenty of grievances, both historical and current, still exist among all these countries, while relationships and alliances are shifting constantly. Of course, I’m not saying that if the backup data center is located in Osaka or Shanghai, Japan or China will take Kakao's user data hostage to harm South Korea. But it is a risk that has to be negotiated and minimized, not assumed away.
When the “Kakao fire” investigation wraps up, the public conclusion is unlikely to be: “disasters like this fire are unavoidable in data centers, South Korea is too small for proper geo-replication, so we need to strengthen relationships with our neighbors to architect robust disaster recovery while enforcing data residency.”
But it might as well be.
p.s. the bilingual (English/Chinese) version of this post is published on interconnected.blog