Benchmark between x86 and ARM cloud servers

What exactly are ARM CPUs, and how do ARM cloud servers compare to traditional x86 machines? We put them to the test!

ARM, best known for its use in mobile devices and in Apple's M2 CPU, is making strong inroads into the server market. For example, the supercomputer "Fugaku", built by Fujitsu and ranked the world's most powerful in 2020, runs on ARM CPUs.

ARM vs x86

CPUs differ in their instruction set architecture (ISA). CPUs from Intel and AMD are based on the x86 ISA, while chips built on Arm's designs use the ARM ISA. The two ISAs are not compatible: software compiled for the x86 ISA will not run on the ARM ISA and vice versa.

The x86 ISA was designed by Intel in the late 1970s. After a legal dispute, Intel and AMD agreed that AMD also has the right to develop chips based on the x86 ISA. AMD later developed the first 64-bit variant of x86, and the two manufacturers again agreed that Intel may produce 64-bit x86 chips as well. The small Taiwanese company VIA is the only other company worldwide with an x86 license that allows it to develop these chips. As a result, we see almost exclusively Intel and AMD CPUs on the shelves.

Ampere Altra CPU

ARM (Advanced RISC Machines) was developed in the 1980s by Acorn and is now developed by the company Arm. Arm creates ARM CPU designs and licenses those designs to third parties; it does not manufacture ARM CPUs itself. For example, Samsung, MediaTek, Apple and Qualcomm hold an ARM ISA license and build their own ARM CPUs.

Server hardware manufacturers must choose between the x86 and the ARM ISA. Until now, the choice usually fell on x86, because most software is written for that ISA. You can work with emulators to run x86 software on ARM chips, but it is better to avoid that because it comes at the expense of performance.

The technical difference

The x86 ISA is based on the CISC principle (Complex Instruction Set Computing), while ARM processors are based on the RISC principle (Reduced Instruction Set Computing). Therein lies the essential technical difference. A single x86 instruction can describe a complex, multi-step operation, which may take several clock cycles to complete. ARM instead uses simple instructions, each designed to complete in roughly one clock cycle, so a complex task requires more of them.

What is a clock cycle?

Clock speed is the indicator of the speed of a CPU and refers to the frequency at which the processor generates the clock signals that synchronize operations within the processor. The unit of measurement is "clock cycles per second," or hertz (Hz). Today's CPUs perform billions of clock cycles per second.
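As a small worked example of the relationship between clock speed and cycle time: the duration of one cycle is the inverse of the frequency. The 3.0 GHz figure below is only an illustrative value, not the measured clock speed of any VM in this benchmark.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / frequency_hz


# Illustrative value only: a CPU clocked at 3.0 GHz.
example_frequency_hz = 3.0e9
print(f"{cycle_time_ns(example_frequency_hz):.3f} ns per cycle")
```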

To handle those complex instructions, x86 CPUs rely on additional transistors and circuitry. ARM doesn't need that extra hardware. Because the same task takes more instructions on ARM, programs do require more RAM than on x86, but the low cost of RAM today makes that acceptable, unlike a decade or so ago. ARM with RISC thus has the advantage of being able to use simpler and more economical hardware.

A more detailed explanation of the difference between ARM and x86 would distract us too much from this benchmark. If you are interested, be sure to watch the video below from Gary Explains (English, 20 min.). Red Hat's article 'ARM vs x86: What's the difference' also provides a concise summary.

ARM vs x86 - Key differences explained

ARM and cloud servers

Providers such as DigitalOcean, Vultr, Linode (Akamai) and UpCloud offer virtual servers on x86 CPUs. Hetzner offers ARM cloud servers in addition to x86. Thanks in part to the lower power consumption (and therefore less heat and less cooling), Hetzner can offer these servers at a lower price than their x86 counterparts.

In 2018, hyperscaler AWS launched its first ARM cloud servers, based on its self-developed 'AWS Graviton' ARM CPU. Google Cloud has offered ARM cloud servers under the product name 'Tau T2A' since 2022. In 2023, Hetzner became the first mid-market provider to offer ARM servers, based on Ampere Altra.

Pricing

We included a number of ARM cloud servers from AWS, Google and Hetzner in a price comparison. An exact comparison is not possible because the combinations of vCPU and RAM do not match exactly, but it gives an idea. We converted USD prices to EUR. Prices are from August 2023 and exclude VAT.

Provider | Type | Price/month
AWS | a1.large (2 vCPU, 4 GB RAM) | €38.60
Google | t2a.standard-2 (2 vCPU, 8 GB RAM) | €54.46
Hetzner | CAX11 (2 vCPU, 4 GB RAM) | €3.79
AWS | a1.xlarge (4 vCPU, 8 GB RAM) | €78.38
Google | t2a.standard-4 (4 vCPU, 16 GB RAM) | €108.93
Hetzner | CAX21 (4 vCPU, 8 GB RAM) | €6.49

What we compare

Given the large price difference between AWS, Google Cloud and Hetzner, we only compare cloud servers from Hetzner in this benchmark. We compare Hetzner's x86 offering with their ARM cloud servers. We compare three x86 virtual machines (VMs) with three similar ARM VMs. Prices are from August 2023 and are exclusive of VAT. All VMs had a 'shared CPU' and local storage.

We were curious if we could measure a difference in performance between the x86 and the ARM cloud servers.

Type | Specifications | Price/month
CX21 | Intel, 2 vCPU, 4 GB RAM | €5.35
CPX31 | AMD, 4 vCPU, 8 GB RAM | €13.60
CPX41 | AMD, 8 vCPU, 16 GB RAM | €25.20
CAX11 | Arm64, 2 vCPU, 4 GB RAM | €3.79
CAX21 | Arm64, 4 vCPU, 8 GB RAM | €6.49
CAX31 | Arm64, 8 vCPU, 16 GB RAM | €12.49
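To put the price gap into numbers, the following minimal sketch divides each x86 price by the price of the ARM VM with the same vCPU and RAM count. The pairing of VMs follows the comparison in this article; the variable and function names are our own.

```python
# Monthly prices in EUR (August 2023, excl. VAT), copied from the table above.
PAIRS = [
    ("CX21", 5.35, "CAX11", 3.79),
    ("CPX31", 13.60, "CAX21", 6.49),
    ("CPX41", 25.20, "CAX31", 12.49),
]


def price_ratio(x86_price: float, arm_price: float) -> float:
    """How many times more expensive the x86 VM is than its ARM counterpart."""
    return x86_price / arm_price


for x86_name, x86_price, arm_name, arm_price in PAIRS:
    ratio = price_ratio(x86_price, arm_price)
    print(f"{x86_name} vs {arm_name}: x86 costs {ratio:.2f}x the ARM price")
```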

How we compared the virtual machines

We ran the benchmarks on an Ubuntu 22.04 LTS installation. We installed all updates but made no other changes. For the benchmarks we used Phoronix Test Suite v10.8.4. When the standard deviation between test runs exceeded 2.5%, an additional run was performed until the standard deviation fell below 2.5%, with a maximum of 40 runs. The average value was recorded as the result.
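The repeat-until-stable rule described above can be sketched as follows. This is an illustration of the idea, not the Phoronix Test Suite's own implementation: the function name `run_until_stable`, the starting count of three runs and the use of the sample standard deviation are all assumptions.

```python
import statistics
from typing import Callable, List


def run_until_stable(
    run_once: Callable[[], float],
    min_runs: int = 3,            # assumption: start with three runs
    max_runs: int = 40,           # upper limit named in the text
    max_rel_stdev: float = 0.025, # 2.5% threshold named in the text
) -> float:
    """Repeat a benchmark until the relative standard deviation is small enough.

    Assumes positive scores. Returns the mean of all collected results.
    """
    results: List[float] = [run_once() for _ in range(min_runs)]
    while len(results) < max_runs:
        mean = statistics.mean(results)
        rel_stdev = statistics.stdev(results) / mean
        if rel_stdev <= max_rel_stdev:
            break
        results.append(run_once())
    return statistics.mean(results)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Stand-in for a real benchmark run: scores scattered around 1000.
    print(run_until_stable(lambda: rng.gauss(1000, 20)))
```

Dividing the standard deviation by the mean makes the 2.5% threshold independent of the unit of the score, which matters here because the tests report values ranging from a few hundred to several billion.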

We performed the benchmarks below, covering the system as a whole, single-core and multi-core CPU performance, RAM, storage and the network. Because we focus on the CPU in this benchmark, we ran three different multi-core CPU tests.

  • pts/apache (focus on the system). This is a test of the Apache HTTPD web server. This Apache HTTPD web server benchmark test profile makes use of the Golang "Bombardier" program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients.
  • pts/hint (focus on single core CPU). This test runs the U.S. Department of Energy's Ames Laboratory Hierarchical INTegration (HINT) benchmark.
  • pts/stockfish (focus on multi core CPU). This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads.
  • pts/openssl (focus on multi core CPU). OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities.
  • pts/compress-7zip (focus on multi core CPU). This is a test of 7-Zip compression/decompression with its integrated benchmark feature.
  • pts/stream (focus on the memory). This is a benchmark of Stream, the popular system memory (RAM) benchmark.
  • pts/postmark (focus on the storage). This is a test of NetApp's PostMark benchmark designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile will set PostMark to perform 25,000 transactions with 500 files simultaneously with the file sizes ranging between 5 and 512 kilobytes.
  • pts/speedtest-cli (focus on the network). This test profile uses the open-source speedtest-cli client to benchmark your Internet connection's upload/download performance and latency against the Speedtest.net servers. We did not manually select the Speedtest.net servers.
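The profiles listed above could be queued from a small script like the one below. It is a hedged sketch: the article does not state how the suite was invoked, so the use of the `phoronix-test-suite benchmark <profile>` subcommand and the helper names `build_command` and `run_all` are assumptions. Note that `benchmark` normally prompts interactively for test options, so the script defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess
from typing import List

# Test profiles named in the list above.
TEST_PROFILES = [
    "pts/apache",
    "pts/hint",
    "pts/stockfish",
    "pts/openssl",
    "pts/compress-7zip",
    "pts/stream",
    "pts/postmark",
    "pts/speedtest-cli",
]


def build_command(profile: str) -> List[str]:
    """Build the command line that installs (if needed) and runs one profile."""
    return ["phoronix-test-suite", "benchmark", profile]


def run_all(dry_run: bool = True) -> None:
    """Run every profile in order, or just print the commands in a dry run."""
    for profile in TEST_PROFILES:
        command = build_command(profile)
        if dry_run or shutil.which(command[0]) is None:
            print("would run:", " ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```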

Results

For clarity, we always placed the x86 and the ARM VM with the same specifications (cores and memory) in adjacent columns. We kept this column order throughout:

  • CX21 (x86 2C4M: 2 cores and 4 GB memory)
  • CAX11 (ARM 2C4M)
  • CPX31 (x86 4C8M)
  • CAX21 (ARM 4C8M)
  • CPX41 (x86 8C16M)
  • CAX31 (ARM 8C16M)

Benchmark: pts/apache

Focus on the system. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Concurrent Requests: 20 (Reqs/sec) | 8929.03 | 8396.6 | 21583.93 | 11851.32 | 28775.56 | 13866.27
Concurrent Requests: 100 (Reqs/sec) | 8721.59 | 8500.81 | 23902.41 | 13827.31 | 40303.72 | 25811.99
Concurrent Requests: 200 (Reqs/sec) | 8409.15 | 8360.37 | 24437.22 | 14412.63 | 44176.32 | 27884.56
Concurrent Requests: 500 (Reqs/sec) | 8099.12 | 8659.95 | 23457.27 | 14417.18 | 40599 | 25157.7
Concurrent Requests: 1000 (Reqs/sec) | 8109.32 | 8509.21 | 23123.16 | 14061.9 | 39624.6 | 25842.47

Here we see that the x86 VMs with 4 or more cores clearly score better than the ARM VMs, while the 2-core VMs are roughly on par.

Benchmark: pts/hint

Focus on single core CPU. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
FLOAT (QUIPs) | 245077735.5 | 324279108 | 255728756.6 | 323302616.1 | 256482173.5 | 323597349

In this single-core CPU test, the ARM CPUs score noticeably better, by roughly 26% to 32%.

Benchmark: pts/stockfish

Focus on multi core CPU. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Total Time (Nodes/s) | 2212880 | 2458096 | 6258003 | 5206669 | 13194825 | 9757003

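The more cores, the better the x86 machines score in this multi-core CPU test: at 2 cores the ARM VM is still slightly ahead, but from 4 cores upward x86 pulls ahead and the gap widens.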
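One way to look at how the Stockfish results scale is to divide each score by the number of vCPUs. The sketch below does that with the figures from the table above; the helper name `nodes_per_vcpu` is our own.

```python
# Stockfish results (nodes/s) and vCPU counts, copied from the table above.
STOCKFISH = {
    "CX21 (Intel)": (2, 2212880),
    "CAX11 (ARM)": (2, 2458096),
    "CPX31 (AMD)": (4, 6258003),
    "CAX21 (ARM)": (4, 5206669),
    "CPX41 (AMD)": (8, 13194825),
    "CAX31 (ARM)": (8, 9757003),
}


def nodes_per_vcpu(vcpus: int, nodes_per_second: int) -> float:
    """Throughput per vCPU, a rough indicator of how well a VM scales."""
    return nodes_per_second / vcpus


for name, (vcpus, score) in STOCKFISH.items():
    print(f"{name:13s} {nodes_per_vcpu(vcpus, score):>12,.0f} nodes/s per vCPU")
```

Seen this way, the ARM VMs stay at roughly 1.2 to 1.3 million nodes/s per vCPU at every size, while the x86 figure rises from about 1.1 million on the Intel-based CX21 to about 1.6 million on the AMD-based CPX31 and CPX41. The widening gap therefore seems to stem at least partly from the switch from Intel to AMD between the x86 tiers, rather than from the core count alone.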

Benchmark: pts/openssl

Focus on multi core CPU. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Algorithm: SHA256 (byte/s) | 417284440 | 1560538643 | 2827727480 | 3150497120 | 5768424633 | 6344199607
Algorithm: SHA512 (byte/s) | 436448190 | 536639563 | 1262125013 | 1075240673 | 2628704613 | 2152238263
Algorithm: RSA4096 (sign/s) | 339 | 98.8 | 899 | 197.6 | 1819.1 | 393.4
Algorithm: RSA4096 (verify/s) | 22348.4 | 8065.4 | 58486.2 | 16153.3 | 118880.3 | 32243.2
Algorithm: ChaCha20 (byte/s) | 6560171807 | 2520305663 | 10879505840 | 5043876203 | 22068229733 | 10070370990
Algorithm: AES-128-GCM (byte/s) | 7450392257 | 5945657843 | 15182816323 | 11888070930 | 31208029663 | 23774275450
Algorithm: AES-256-GCM (byte/s) | 5528949737 | 4826336007 | 13902395437 | 9682090120 | 28343330043 | 19344769023
Algorithm: ChaCha20-Poly1305 (byte/s) | 3290716726 | 1749239397 | 7049334327 | 3499712147 | 14375567270 | 6982201343

ARM wins the SHA256 test on all VM sizes and the SHA512 test on the 2-core VMs. In all other tests, the x86 machines score better in this multi-core CPU test.

Benchmark: pts/compress-7zip

Focus on multi core CPU. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Compression Rating (MIPS) | 7681 | 10286 | 19046 | 20368 | 36971 | 39161
Decompression Rating (MIPS) | 5392 | 8634 | 13913 | 17024 | 28768 | 34205

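For compression, the CPUs are fairly evenly matched on the 4- and 8-core VMs (ARM leads by about 6% to 7%), while on the 2-core VMs ARM leads by about 34%. For decompression, the ARM CPUs win on every VM size, by about 19% to 60%.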

Benchmark: pts/stream

Focus on the memory. A higher score is better.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Copy (MB/s) | 20394.3 | 43034 | 85046.9 | 74044.8 | 123673.2 | 110804.1
Scale (MB/s) | 21657 | 41494.6 | 50265.9 | 71391.2 | 77822 | 106582.1
Triad (MB/s) | 24999.3 | 40091.9 | 54782.8 | 71122.4 | 83999.5 | 114068.5
Add (MB/s) | 24930.9 | 39883.2 | 54297.9 | 70704.4 | 83635.8 | 113249.5

The ARM machines win the memory test by a large margin, with the exception of the Copy test on the 4- and 8-core VMs, where the x86 machines are ahead.

Benchmark: pts/postmark

Focus on the storage. A higher score is better. Each VM used ext4 as its file system.

Test | CX21 (Intel 2C4M) | CAX11 (ARM 2C4M) | CPX31 (AMD 4C8M) | CAX21 (ARM 4C8M) | CPX41 (AMD 8C16M) | CAX31 (ARM 8C16M)
Disk Transaction Performance (TPS) | 2459 | 3846 | 4629 | 3989 | 4716 | 3969

For the VMs with 4 and 8 CPU cores, the x86 servers beat the ARM machines by roughly 16% to 19%. On the 2-core VMs, the ARM machine is clearly faster.

Conclusion

We converted each test score into a percentage comparison and combined those percentages into an overall distribution in order to arrive at a final comparison. You can see this in the graph below.

We see that the 2C4M VMs score about the same. Looking at the 4C8M and 8C16M VMs, x86 is the winner, but it also costs about twice as much.
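A percentage comparison of this kind can be sketched as follows. The exact method behind the graph is not described in this article, so this is only one plausible approach: it expresses each ARM score as a percentage of the matching x86 score and combines them with a geometric mean. The selection of one result per benchmark for the 4C8M pair, and the function name `relative_percent`, are assumptions made for the example.

```python
from statistics import geometric_mean

# (test, x86 score on CPX31, ARM score on CAX21); higher is better for every test.
SCORES_4C8M = [
    ("apache, 200 concurrent (reqs/s)", 24437.22, 14412.63),
    ("hint FLOAT (QUIPs)", 255728756.6, 323302616.1),
    ("stockfish (nodes/s)", 6258003, 5206669),
    ("7zip compression (MIPS)", 19046, 20368),
    ("stream triad (MB/s)", 54782.8, 71122.4),
    ("postmark (TPS)", 4629, 3989),
]


def relative_percent(arm_score: float, x86_score: float) -> float:
    """ARM score expressed as a percentage of the x86 score."""
    return 100.0 * arm_score / x86_score


percentages = [relative_percent(arm, x86) for _, x86, arm in SCORES_4C8M]
for (name, _, _), pct in zip(SCORES_4C8M, percentages):
    print(f"{name:32s} ARM = {pct:6.1f}% of x86")

# Geometric mean so that no single large ratio dominates the overall figure.
overall_percent = geometric_mean(percentages)
print(f"overall: ARM = {overall_percent:.1f}% of x86")
```

The outcome depends strongly on which tests are included and how they are weighted, so a different selection can shift the overall figure in either direction.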

If Apache is your workload, then it still seems better to go for x86 CPUs. But if your application requires a lot of memory, you will be satisfied with ARM machines. If your application is CPU-intensive, then it is best to do a test to see whether you continue to build on x86 or on ARM cloud servers.

💡
Of course, you can always scale your application horizontally, spreading the load over several ARM servers at about half the price of the equivalent x86 servers.
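As a rough illustration of that trade-off, the sketch below compares one CPX31 with two CAX21 servers, using the Apache results at 200 concurrent requests and the prices listed earlier. It assumes perfect linear scaling across servers and ignores the cost and overhead of a load balancer, which is optimistic; the names used are our own.

```python
def throughput_per_euro(requests_per_second: float, monthly_price_eur: float) -> float:
    """Requests per second obtained for each euro of monthly cost."""
    return requests_per_second / monthly_price_eur


# Apache results at 200 concurrent requests and monthly prices from the tables above.
CPX31 = {"reqs": 24437.22, "price": 13.60}  # x86, 4 vCPU
CAX21 = {"reqs": 14412.63, "price": 6.49}   # ARM, 4 vCPU

arm_servers = 2
arm_total_price = arm_servers * CAX21["price"]
# Assumption: perfect linear scaling, no load-balancer overhead.
arm_total_reqs = arm_servers * CAX21["reqs"]

print(f"1x CPX31: {CPX31['reqs']:.0f} reqs/s for EUR {CPX31['price']:.2f}")
print(f"{arm_servers}x CAX21: {arm_total_reqs:.0f} reqs/s for EUR {arm_total_price:.2f}")
print(f"x86: {throughput_per_euro(CPX31['reqs'], CPX31['price']):.0f} reqs/s per euro")
print(f"ARM: {throughput_per_euro(CAX21['reqs'], CAX21['price']):.0f} reqs/s per euro")
```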

Try it yourself?

Do you want to try an ARM server yourself? Via this (affiliate) link from Hetzner you will receive a starting credit of 20 euros. Good luck and have fun!