You're thinking of robots in the data center, right? There are actually a number of ways to go about it that do not involve our soon-to-be sentient friends.

Let's start by defining the objective. What do we want to achieve here? Bare metal automation is used for a variety of reasons, but fundamentally, it is about offering compute, storage, and network resources to some application or user for further consumption.
The challenge is how to provide a new server with a certain configuration (RAM, disk, cores, GPUs) and connect it in a certain way to other servers without touching any server or cable.
This is useful for speeding up service delivery for service providers and reducing truck rolls in telco-edge cloud scenarios.
Let's break it down, shall we?
Step 1 - disaggregate storage
Storage is the biggest pain for a service provider. Too little and customers will run out. Too much, and your solution is too expensive. Hence, service providers do what's called storage disaggregation.
There are two major ways of doing this:
Option 1: External storage

Some bare metal automation solutions (including MetalSoft) allow the operation of diskless servers (also called netboot). Servers have no local storage, and they boot via iSCSI from external storage. This way, you can quickly move the volumes between hosts if you want to perform an upgrade.
From our experience, this setup is actually FASTER than local hard drives and even some local SSDs due to the storage system's cache.
From a cost perspective, it is about the same. Storage space costs more than local disks per GB, but the storage system makes up for it through deduplication and compression.
This setup is limited by the network links of the server (e.g., if you have Gbit connections, you are limited to about 100MB/s). The other limit is latency (read IOPS). From our experience, this is negligible versus spinning disks, but it starts to become apparent against setups with more than one local NVMe SSD.
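As a quick sanity check on that ceiling, here is a back-of-the-envelope sketch (the ~20% protocol overhead is an assumption, not a measured figure):

```python
# Rough usable storage bandwidth for netboot over iSCSI.
# Assumption: ~20% of raw line rate lost to Ethernet/TCP/iSCSI framing.

def usable_throughput_mb_s(link_gbps: float, links: int = 1,
                           overhead: float = 0.20) -> float:
    """Approximate usable storage bandwidth in MB/s."""
    raw_mb_s = link_gbps * 1000 / 8          # Gbit/s -> MB/s
    return raw_mb_s * (1 - overhead) * links

print(usable_throughput_mb_s(1))       # ~100 MB/s on a single Gbit link
print(usable_throughput_mb_s(10, 2))   # ~2000 MB/s on two bonded 10Gbit links
```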
A new class of transport, NVMe-over-Fabrics, promises to eliminate this issue as well. There are multiple flavors, including my favorite, RDMA over Converged Ethernet (RoCE), which promises performance levels identical to direct-attached storage (DAS).
Option 2: Common pool of local disks

Other solutions take a different approach, whereby the disks stay in the nodes but are aggregated into a common pool and exported over the network to the other nodes.
This solution doesn't quite work for pure bare metal in a multi-tenant environment due to the security implications. However, it does work very well with private Kubernetes clusters (by using something such as Rook+Ceph). This setup works best at the edge, as it avoids having to deploy a dedicated external storage system.
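To make the trade-off concrete, here's a toy capacity calculation for such a pool, assuming the 3-way replication Ceph commonly defaults to (the node count and drive size are made-up illustrative numbers):

```python
# Toy capacity model for a pooled-local-disk (Rook+Ceph style) setup.
# Assumption: 3-way replication, Ceph's common default; erasure coding
# would change the usable/raw ratio.

nodes = 4                 # illustrative edge cluster
disks_per_node = 2
disk_tb = 3.84            # e.g. 3.84TB NVMe drives
replication = 3

raw_tb = nodes * disks_per_node * disk_tb
usable_tb = raw_tb / replication
print(f"raw: {raw_tb:.1f} TB, usable: {usable_tb:.1f} TB")
# raw: 30.7 TB, usable: 10.2 TB
```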
Step 2 - standardize on CPU and RAM configurations
From my experience as a product manager for a service provider, I can tell you that users do not actually need very specific configuration sizes, so it pays to standardize.

Users typically think in terms of "I need a Small server" rather than "I want a server with 10 cores and 17GB of RAM." Nobody is that precise.
The ratio that works best as a general-purpose server seems to be 4GB RAM for every hyper-threaded 'core' (1:4).
An "S" server would be a 8 HT cores, 32GB RAM machine. An "M" server would be a 16 (or 20) HT cores 64GB RAM and so forth. Your base "SKUs" would look like this:
"S" server configuration: 1x Intel® Xeon® E-2134 Processor (4 cores/ 8 threads) 32GB RAM.
"M" server configuration: 1x Intel® Xeon® E-2278G Processor (8 cores/ 16 threads) 64 GB RAM.
"L" server configuration: 2x Intel® Xeon® Gold 6328H Processor (16 cores 32 threads) 256GB RAM.
This should cover 80% of your requests. Beyond this, you will also have some "specialized" instance types with a 1:8 ratio of HT cores:RAM for memory-intensive apps, or 1:2 for CPU-intensive apps, but remember that those are exceptions, not the rule, and you can always fall back to one of the larger standard configurations: a workload always fits inside a server with more memory as long as it has enough CPU.
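As a minimal sketch of why that fallback works, here is the "pick the smallest standard SKU that fits" logic (the SKU table simply mirrors the S/M/L sizes above):

```python
# Map a free-form request onto the smallest standard SKU that fits.
# SKU table mirrors the S/M/L configurations above: (HT cores, GB RAM).

SKUS = {"S": (8, 32), "M": (16, 64), "L": (64, 256)}

def pick_sku(cores_needed: int, ram_gb_needed: int) -> str:
    for name, (cores, ram) in SKUS.items():   # smallest first
        if cores_needed <= cores and ram_gb_needed <= ram:
            return name
    raise ValueError("request exceeds the largest standard SKU")

print(pick_sku(10, 17))   # -> "M": 10 cores / 17GB lands on the 16-core / 64GB box
```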
I've attached a graph of one of the RAM capacity distributions that I've seen in practice so you can calculate your numbers. Lean towards having more of the extremes (S and L/XL) rather than the Ms.
This standardization will help you greatly by avoiding having to physically upgrade and downgrade servers all the time. That is not only costly and time-consuming but can also damage the server itself while a RAM DIMM is being inserted. It's also hit-and-miss; we've had to re-seat so many DIMMs over the years that it's just not worth it.
If you use netboot and a customer needs an upgrade, you simply move the volume over to the new system and reboot. If you use a locally installed OS, you provision the new instance in addition to the first one, copy the apps and data over, and shut down the first one.
To learn more, check out this blog post.
Step 3 - Automate switch provisioning
This is perhaps the key to the above process. Cabling is one of the biggest pains and the piece most likely to fail. To completely eliminate the need to touch the servers after you've racked them, you need switch provisioning automation.

You typically pre-cable 2 or 4 connections per server to one or two leaf switches when you rack the server.
After that, you have the freedom to programmatically "link" the ports to whatever network you want from the infrastructure.
We recommend using MLAG-capable switches and enabling link aggregation across different switches by default for all links. This way, if a switch fails, you're still up and running.
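To give a flavor of what the "link the ports" step looks like, here is a hypothetical sketch of attaching a server's pre-cabled ports to a tenant network through an automation API. The endpoint, field names, and identifiers are invented for illustration; they are not MetalSoft's actual API:

```python
# Hypothetical example: attach a server's pre-cabled ports to a network
# via a bare metal automation API. URL, paths, and fields are invented.
import requests

API = "https://automation.example.com/api"   # placeholder endpoint

def attach_server_to_network(server_id: str, network_id: str) -> None:
    # Both pre-cabled ports join one LACP aggregate that spans the MLAG
    # switch pair, so losing one leaf switch doesn't take the server down.
    resp = requests.post(f"{API}/servers/{server_id}/interfaces", json={
        "network": network_id,
        "ports": ["eth0", "eth1"],
        "aggregation": "lacp",
    }, timeout=30)
    resp.raise_for_status()

attach_server_to_network("srv-0042", "tenant-vlan-110")
```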
Conclusions
So there you have it: you can now provision, de-provision, and repurpose bare metal to serve different needs without touching it (after the initial racking, that is). The servers don't actually change, of course. It's the application that moves between them.
In a different article, I'll get into more detail about the server configurations you can use.