
The Evolution of DNS Redundancy

By Michael Smith

 

Neustar Security Services (NSS) is immensely proud to announce the release of UltraDNS2. It combines the UltraDNS platform that NSS has operated for years with a new, separate platform called DNS2, designed to give our customers a separate infrastructure for additional redundancy and higher availability, with the features you have come to expect from us.

But first, a history lesson of how DNS redundancy has evolved.

Early networks used a hosts file that held a mapping between hostnames and IP addresses; this is still used in some environments for a small number of entries. System administrators either edited these files by hand on each system or used a protocol like FTP to synchronize them between systems. This process worked acceptably as long as there were few systems, the files were short, and changes were infrequent. It quickly ran into the other kind of DNS: Does Not Scale. So in 1987, RFC 1034 and RFC 1035 laid out the modern Domain Name System as we know it today, with types of queries and the concept of zones and zone files. Later additions to the DNS RFCs introduced the XFR, or zone transfer, which lets DNS servers synchronize zone data whenever a zone file changes. DNS also came with some built-in constraints, such as a limit on the number of authoritative nameservers for a domain, depending on the top-level domain (TLD) and the registrar.
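For illustration, a hosts file is just a static name-to-address table, one mapping per line (the names are placeholders, and the addresses come from the RFC 5737 documentation ranges):

```
# /etc/hosts: each line maps one IP address to one or more hostnames
192.0.2.10      www.example.com     www
192.0.2.20      mail.example.com    mail
```

Every machine on the network needed its own up-to-date copy of this file, which is exactly the synchronization burden DNS was created to remove.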

Over time, domain administrators have adopted some best practices to keep their authoritative DNS servers available:

  • Designating at least two nameservers for every domain
  • Running multiple self-hosted DNS servers in different physical locations
  • Upsizing DNS servers to accommodate sudden spikes in traffic
  • Using different network providers to diversify infrastructure and reduce the impact of ISP outages
  • Using cloud DNS providers for additional capacity and performance
  • Finally, using multiple cloud DNS providers to diversify across the full infrastructure

The “good old reliable” zone transfers served as the glue to keep it all synchronized.
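A minimal BIND-style zone file excerpt (names and addresses here are placeholders) shows the at-least-two-nameservers practice in concrete form:

```
$ORIGIN example.com.
$TTL 3600
@    IN  SOA  ns1.example.com. hostmaster.example.com. (
         2024010101 ; serial
         7200       ; refresh
         900        ; retry
         1209600    ; expire
         3600 )     ; negative-caching TTL
@    IN  NS   ns1.example.com.
@    IN  NS   ns2.example.com.   ; second nameserver, ideally on a different network
www  IN  A    192.0.2.10
```

When the serial number in the SOA record increases, secondary servers know the zone has changed and pull a fresh copy via zone transfer.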

Fast forward to September 2016, when the Mirai botnet broke the internet (yet again) for several days because of DNS outages. Mirai, a large botnet made up of thousands of hacked IoT devices such as internet-connected cameras, was used to attack Brian Krebs's site, krebsonsecurity.com. Because Krebs's site had adequate DDoS (Distributed Denial of Service) mitigation in place to protect the website itself, the attackers pivoted to the supporting infrastructure: the authoritative nameservers for the domain. These servers belonged to a cloud DNS provider that hosted zone information for many customers. The attack caused a large amount of collateral damage, as all the websites and other services that relied solely on that provider became unavailable.

A best practice since 2016 for a medium or large website operator is to use two cloud DNS service providers to gain diversity across infrastructure, software, and operations personnel. However, using two different authoritative cloud DNS providers has its downsides. Anytime you use two different cloud providers, you are forced to use the lowest common denominator of features across all of them to maintain compatibility. You also lose the orchestration and automation that individual cloud providers offer, and you end up falling back on zone transfers or manual updates to keep the zone information synchronized.

Today, DNS serves many purposes that do not fit nicely into the system of zone text files and zone transfers that has been in use for over 40 years: load balancing across servers, data centers, or clouds with liveness tests and failover; geographic affinity to specific data centers; wildcards or aliases that match a request for any hostname inside the domain, or for the top of the domain itself; and validation of responses using DNSSEC (DNS Security Extensions). The configurations for these advanced features look more like nested data structures than the simple listing of records used in zone files. These nested structures do not fit into a text zone file, and they cannot be transmitted using the zone transfer protocol built into DNS.
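As a provider-neutral sketch (the field names below are invented for illustration, not any vendor's actual schema), compare a flat zone record with the nested structure an advanced load-balancing pool needs:

```python
# A traditional zone record is a flat tuple: name, TTL, type, value.
flat_record = ("www.example.com.", 300, "A", "192.0.2.10")

# An advanced load-balancing pool is a nested structure: health probes,
# a failover policy, and an ordered list of pool members. No zone
# transfer can carry this; only a provider's own API can.
lb_pool = {
    "name": "www.example.com.",
    "ttl": 300,
    "policy": "failover",
    "monitor": {"method": "HTTPS", "path": "/health", "interval_s": 30},
    "members": [
        {"address": "192.0.2.10", "priority": 1, "enabled": True},
        {"address": "198.51.100.10", "priority": 2, "enabled": True},
    ],
}

def active_answer(pool):
    """Return the address of the highest-priority enabled member."""
    live = [m for m in pool["members"] if m["enabled"]]
    return min(live, key=lambda m: m["priority"])["address"]

print(active_answer(lb_pool))  # → 192.0.2.10
```

Disabling the primary member (as a failed liveness test would) makes `active_answer` return the secondary address instead, which is the failover behavior a flat record simply cannot express.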

A more problematic issue is that you need some way to manage changes and apply them to the configurations of all your service providers. That means either depending on a third-party orchestration tool, at increased cost and complexity, or accepting a higher rate of human error as you “sneakernet” changes from one infrastructure to another. Across a large number of domains and zones, the chance of an error occurring is incredibly high.
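One hedged sketch of the synchronization problem: once record sets have been fetched from two providers (the fetching itself, via each provider's API or a zone transfer, is out of scope here), detecting drift between them is a set comparison. All names and addresses below are placeholders.

```python
# Record sets from two hypothetical providers, modeled as sets of
# (name, type, value) tuples.
provider_a = {
    ("www.example.com.", "A", "192.0.2.10"),
    ("mail.example.com.", "A", "192.0.2.20"),
}
provider_b = {
    ("www.example.com.", "A", "192.0.2.10"),
    ("mail.example.com.", "A", "192.0.2.99"),  # stale entry
}

def zone_drift(a, b):
    """Return the records present in one provider but not the other."""
    return {"only_in_a": a - b, "only_in_b": b - a}

drift = zone_drift(provider_a, provider_b)
for side, records in drift.items():
    for record in sorted(records):
        print(side, record)
```

Detecting the drift is the easy part; deciding which side is authoritative and pushing the correction everywhere is where orchestration tooling, or error-prone manual work, comes in.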

Everything that has been discussed so far is the reason NSS built UltraDNS2. Our goal is to create a second separate authoritative DNS infrastructure and to limit any shared processes and technologies between it and our UltraDNS infrastructure while at the same time ensuring feature compatibility. This means that we walk a very thin tightrope between these two competing concepts, and I think we have done an excellent job at balancing these opposite ends of the spectrum.

It started with a risk management approach: we took our many years of knowledge and experience, sat down, and looked at the common failure modes and patterns that have happened in DNS and why people have evolved into using multiple cloud DNS providers.

  • The impact of “bad neighbors” or “frequently attacked neighbors” on the same authoritative server when you do not have enough segregation from their traffic
  • Human error by Network Operations Center (NOC) staff making manual changes to routing and switching
  • Electrical outages or natural disasters at a single data center location
  • ISP outages that impact a huge section of IP services to their customers
  • Errors in updating the DNS software and server daemons
  • Network outages

Then we looked at all the problems that using multiple cloud DNS providers creates:

  • Costs and complications of having multiple contracts
  • Complexity and errors in managing zone information to keep it current across service providers
  • Advanced features do not work across service providers
  • Reporting and statistics are in a separate silo for each provider, so you never get a unified view

So then, what exactly is separate in UltraDNS2?

  • UltraDNS2 is hosted separately, and its network is operated by a leading anycast cloud provider, with network operations for the underlying systems, provisioning, automation, and routing policies distinct from those used for UltraDNS
  • UltraDNS2 is built in separate, new Points of Presence to increase availability and redundancy but also to reduce the impact of facility outages
  • UltraDNS2 has a separate Network Operations Center and staff to minimize the impact of human error or loss of monitoring and management of the infrastructure
  • UltraDNS2 uses completely different provisioning, automation, and routing policies to reduce the impact of circuit, routing, and other network outages
  • Customers are placed in small, isolated groups of servers, and each customer receives a unique set of nameservers, which significantly reduces the risk and impact of co-tenancy
  • Staged updates, upgrades, and other changes to reduce the impact of errors and bugs in server applications, deployment, and orchestration

To keep feature parity and simplicity in management, we did have to have some components that are shared between UltraDNS2 and UltraDNS:

  • Same management portal, reporting, and billing interface
  • Same features—including advanced features—as UltraDNS
  • Same APIs to automate zone management and integrate with third-party tools
  • Same database and zone information propagation (it is worth noting that even if these functions are unavailable, our DNS servers continue to answer queries)
  • Same server software and server management software

At this point, you may be asking: is UltraDNS2 for you? It is if you meet one or more of the following criteria:

  • You have a large volume of DNS queries
  • You want to achieve business continuity and disaster recovery goals
  • You are using UltraDNS and want more availability and redundancy across infrastructure
  • You are using UltraDNS and other providers and want to use advanced features like load balancing or DNSSEC
  • You are using UltraDNS and other providers and want to reduce complexity and cost

UltraDNS2 is the result of using our 20+ years of experience to address the historical problems with DNS resiliency while maintaining a balance with service management. We are proud to bring UltraDNS2 to you to allow your business to thrive online with peace of mind.

Contact us today to learn how UltraDNS2 is the complete solution for your business.

 