SoftBank Improves VM Capacity and Reduces Costs

Performance verification of Intel® Optane™ persistent memory leads to higher VM capacity and lower infrastructure costs.

At a glance:

  • SoftBank, a communications carrier, uses more than 10,000 virtual machines for its internal IT infrastructure, mainly for mission-critical systems.

  • The IT Infrastructure Division decided to switch the virtualization infrastructure to servers with Intel® Optane™ persistent memory, delivering up to 3x improvement in VM capacity.¹


Promoting the Beyond Carrier Strategy and Taking the Information Revolution to a New Stage

SoftBank Corp. develops a range of businesses centered on mobile communications, broadband services, and fixed-line telecommunications, based on the corporate philosophy “Information Revolution—Happiness for everyone.” The company, which has led the SoftBank Group’s communications business for many years, was listed on the First Section of the Tokyo Stock Exchange in December 2018. Under its Beyond Carrier strategy, which goes beyond the traditional telecommunications business model, it is now taking the information revolution to a new stage by providing innovative services across a wide range of industries and pursuing further growth.

In its core communications business, SoftBank addresses diverse needs through three brands: SoftBank, Y!mobile, and LINE MOBILE, a multicarrier MVNO. SoftBank and Y!mobile provide unique services in cooperation with Yahoo!, one of the largest portal sites in Japan. In addition to building out fiber-optic lines for fixed-line and corporate services, the company is promoting technology development and demonstration experiments for new service solutions that use advanced technologies such as AI, IoT, robotics, and automated driving, with a view to putting high-speed, high-capacity, low-latency 5G to practical use.

The IT Infrastructure Division of the Information Technology Division supports the company’s Beyond Carrier strategy of constant innovation and new challenges. “Communication services and their infrastructure have become indispensable in everyday life and business, and they are more important than ever,” said Tadashi Suzuki, Senior Director, IT Infrastructure Division, Information Technology Division. “The role of our IT Division is to pursue the latest technology while prioritizing the safety and stability of communications, and to make people happy through the information revolution.”

Validating Features and Performance of Intel® Optane™ Persistent Memory to Improve Server Consolidation Rate

As a communications carrier, SoftBank runs more than 10,000 virtual machines for its internal IT infrastructure, mainly for mission-critical systems. These include information systems used by approximately 45,000 employees of the company and its partner companies; applications used in day-to-day operations, such as call center systems and the store and billing systems used in mobile shops; and consumer services such as smartphone applications and cloud services. Since the merger of SoftBank Mobile, SoftBank BB, SoftBank Telecom, and Y!mobile in 2015, the company has managed multiple distributed data centers with more than 1,000 racks.

SoftBank has used virtualization platforms for more than 10 years to improve infrastructure efficiency, but its hardware has been aging, and the number of operational steps required when a failure occurs has been increasing year by year. Because each virtualization platform was introduced wherever it best suited a given business application, the environment ended up mixing commercial products and open-source software (OSS), including VMware, Hyper-V, Xen, KVM, and OpenStack.

With more than 1,000 virtual machines added each year, demand for data center rack space was also increasing. To raise the server consolidation rate and reduce infrastructure costs, SoftBank built an integrated virtualization platform standardized on VMware in 2018 and, in June 2019, began consolidating the systems running on the previous virtualization platforms onto it.

However, when the team checked virtualization infrastructure resource usage after migrating about 1,000 virtual machines, it found that the memory allocation rate had almost reached its limit (100%), even though CPU capacity was still available. Kohei Tanemura, Director, IT Infrastructure Department, IT Infrastructure Division, Information Technology Division, explained:

“Because communication traffic has peaks, we invest in the platform based on busy-season capacity. CPU usage fluctuates, but the memory initially allocated to each VM is fixed and reserved, so memory tends to run out first. Also, whereas CPUs can be overcommitted, allocating more virtual resources than there are physical resources, memory overcommitment is not recommended, so memory is exhausted first when we try to increase the consolidation rate.”
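The trade-off Tanemura describes can be made concrete with a small capacity model: CPU is allowed an overcommit ratio, memory is reserved 1:1, and the host's VM limit is whichever resource runs out first. This is a minimal sketch under loudly hypothetical assumptions: the host sizes, the 4 vCPU / 16 GB VM profile, and the 3:1 vCPU overcommit ratio are illustrative only, and the model ignores hypervisor overhead and HA reserve capacity.

```python
"""Hypothetical model of why memory, not CPU, limits VM consolidation.

CPU can be overcommitted (more vCPUs than physical cores), but memory is
reserved 1:1, so the memory limit is usually hit first. Host sizes, VM
sizes, and the overcommit ratio are illustrative assumptions only.
"""
from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    memory_gb: int


@dataclass
class VmProfile:
    vcpus: int
    memory_gb: int


def max_vms(host: Host, vm: VmProfile, cpu_overcommit: float,
            mem_overcommit: float = 1.0) -> dict:
    """Return the VM count each resource allows and which one binds first."""
    cpu_limit = int(host.physical_cores * cpu_overcommit // vm.vcpus)
    mem_limit = int(host.memory_gb * mem_overcommit // vm.memory_gb)
    return {
        "cpu_limit": cpu_limit,
        "memory_limit": mem_limit,
        "max_vms": min(cpu_limit, mem_limit),
        # Ties are reported as "cpu".
        "bottleneck": "memory" if mem_limit < cpu_limit else "cpu",
    }


vm = VmProfile(vcpus=4, memory_gb=16)
# Hypothetical DRAM-only host: memory runs out while CPU headroom remains.
print(max_vms(Host(physical_cores=40, memory_gb=384), vm, cpu_overcommit=3.0))
# Same hypothetical host with four times the memory: CPU becomes the limit.
print(max_vms(Host(physical_cores=40, memory_gb=1536), vm, cpu_overcommit=3.0))
```

With these assumed numbers, the first host is memory-bound at 24 VMs even though CPU would allow 30; quadrupling memory moves the bottleneck back to CPU, which is the effect SoftBank was looking for.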

To address these issues, SoftBank focused on Intel Optane persistent memory, a non-volatile memory that offers both high access performance and large capacity. Immediately after Intel released it in April 2019, the company took it up and decided to evaluate whether it could be introduced into the integrated virtualization platform and to verify its operation and usefulness. “The purpose was to study the server design we would introduce in the future and to verify whether it could help increase the server consolidation rate,” recalls Yusuke Omiya, Manager, also from the IT Infrastructure Department.

Verifying Deployment, Function, Performance, and Design in a Configuration Similar to the Commercial Environment

The IT Infrastructure Division obtained two servers equipped with Intel Optane persistent memory as evaluation machines and decided to conduct verification in a configuration similar to its commercial environment, in which each server rack holds 16 servers, 2 network devices, and 1 storage unit. Two of the 16 servers in a rack were replaced with the evaluation machines, and the verification was performed without changing the network, storage, or VMware virtualization configuration. “We increase availability by loosely coupling each rack. Even if a failure occurs, its effect is limited to that rack and does not affect the whole system,” says Omiya. The verification is summarized below.

Evaluation Machine

Server #1

  • CPU: Intel® Xeon® Gold 6252 processor
  • Memory: Intel Optane persistent memory 128GB x 12 (1.5TB)
  • Memory cache: DRAM 16GB x 12 (192GB)
  • SSD: SATA 480GB, NVMe* 4TB
  • NIC: 10Gb Dual port

Server #2

  • CPU: Intel® Xeon® Platinum 8260L processor
  • Memory: Intel Optane persistent memory 256GB x 12 (3.0TB)
  • Memory cache: DRAM 32GB x 12 (384GB)
  • SSD: SATA 480GB, NVMe* 4TB
  • NIC: 10Gb Dual port
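The module counts above determine each server's capacity. Assuming both servers run in Memory Mode (which the “Memory cache” label suggests), the operating system sees the persistent memory as main memory and uses DRAM only as a cache in front of it. The sketch below derives the totals and the DRAM-to-persistent-memory ratio from the listed specs; nothing beyond those specs is assumed.

```python
# Capacity arithmetic for the two evaluation servers listed above.
# Module sizes and counts come from the spec list; totals and ratios are derived.

SERVERS = {
    "Server #1 (Xeon Gold 6252)": {"pmem_module_gb": 128, "dram_module_gb": 16, "modules": 12},
    "Server #2 (Xeon Platinum 8260L)": {"pmem_module_gb": 256, "dram_module_gb": 32, "modules": 12},
}


def capacity(spec: dict) -> dict:
    pmem_gb = spec["pmem_module_gb"] * spec["modules"]
    dram_gb = spec["dram_module_gb"] * spec["modules"]
    return {
        # In Memory Mode this is the capacity the OS sees as main memory.
        "pmem_gb": pmem_gb,
        # DRAM is not added to visible memory; it caches the persistent memory.
        "dram_cache_gb": dram_gb,
        "pmem_per_dram": pmem_gb / dram_gb,
    }


for name, spec in SERVERS.items():
    c = capacity(spec)
    print(f"{name}: {c['pmem_gb']} GB PMem, "
          f"{c['dram_cache_gb']} GB DRAM cache, ratio 1:{c['pmem_per_dram']:.0f}")
```

Both servers keep the same 1:8 DRAM-to-persistent-memory ratio, so the larger configuration doubles both the visible memory and the cache in front of it.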

Validation Schedule

Validation was conducted over three months, from July to September 2019. In the first month, the company surveyed the environment, made internal adjustments, and drew up a detailed verification plan. In August, the team moved to on-site work at the data center and built the verification environment. The actual verification started in the second week of August, and the functional and performance verifications took about three weeks. September was set aside as a period in which configuration changes to the verification environment were prohibited, and a long-run test was conducted, keeping the CPU load at nearly 100% for one month. Design work for production deployment then began based on the performance verification results.

Validation Items

In this verification, nine tests were conducted across four categories: deployment, function, performance, and design. The validation items were chosen to reflect commercial use and cover everything from initial installation through functional and performance testing to power consumption measurement. Omiya says, “Power consumption is an important point in production operation. We manage more than 1,000 racks in multiple data centers, and the power available per rack is fixed. So we needed to confirm in this verification that consumption stayed within the acceptable range.”

Deployment

  • Installation: OS can be installed using internal standard procedures
  • Add vCenter/Cluster: Can add to vCenter/Cluster

Function

  • Basic functions: Verify basic VMware functions such as virtual machine deployment, vMotion, and DRS
  • Failure test: Test VMware HA and the mixed-server environment by powering off servers
  • Long-run test: Run at 100% CPU load for one month during the evaluation machine loan period

Performance

  • Benchmarking: Run benchmarks on the servers with Intel Optane persistent memory
  • Benchmark comparison: Compare benchmark results with existing servers

Design

  • Power consumption measurement: Measure required power consumption for design
  • Tentative design: Draft a tentative design for introducing Intel Optane persistent memory
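Omiya's point about a fixed per-rack power allowance amounts to a budget check fed by the power consumption measurement item. The sketch below is illustrative only: the 8 kW rack budget and every per-device wattage are hypothetical placeholders, and only the rack layout (16 servers, 2 network devices, 1 storage unit) comes from the article.

```python
"""Hypothetical rack power-budget check. The per-rack budget and the
per-device wattages are placeholders, not SoftBank's figures."""

RACK_BUDGET_W = 8000.0  # hypothetical fixed power allowance per rack


def rack_power_ok(devices: dict[str, tuple[int, float]],
                  budget_w: float = RACK_BUDGET_W) -> tuple[bool, float]:
    """devices maps name -> (count, measured watts per unit).

    Returns (fits_within_budget, total_watts).
    """
    total = sum(count * watts for count, watts in devices.values())
    return total <= budget_w, total


# Rack layout from the verification: 16 servers, 2 network devices, 1 storage unit.
# Wattages are hypothetical; in practice they would come from the measurements.
layout = {
    "server": (16, 420.0),
    "network": (2, 150.0),
    "storage": (1, 600.0),
}
ok, total = rack_power_ok(layout)
print(f"total {total:.0f} W, within budget: {ok}")
```

Using measured peak rather than average wattage per device is the conservative choice, since the rack allowance is a hard limit.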

Validation Results

The deployment and functional verifications were completed successfully, confirming that the servers can be used in a commercial environment. In server performance verification with the UnixBench benchmark tool, the evaluation machine equipped with the Intel Xeon Gold 6252 processor scored higher than the current machine equipped with the Intel® Xeon® Silver 4114 processor when an 8-core, 32 GB VM was configured and measured.

Memory latency measured with the LMbench3 benchmark tool showed no significant change for the 8-core, 32 GB VM. For an 8-core, 384 GB VM, however, peak latency was about 110 ns in the existing environment, compared to about 340 ns on the Intel Xeon Gold 6252 processor server with Intel Optane persistent memory.

This is probably because the VM's 384 GB of memory exceeded the DRAM cache (192 GB on this server), so data overflowed to Intel Optane persistent memory and latency increased. In the end, SoftBank found no problems when running applications for testing.
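Why latency rises once a VM outgrows the DRAM cache can be shown with a simple two-level model: each access is either a DRAM-cache hit or a miss served from persistent memory. This is a rough sketch, not SoftBank's analysis: the 100 ns and 350 ns latencies are hypothetical round numbers, the hit rate assumes uniform random access, and the model ignores NUMA effects and hypervisor overhead, so it is not expected to reproduce the measured 340 ns. Only the 192 GB cache size comes from the server spec.

```python
"""Rough two-level latency model for Memory Mode (DRAM cache in front of
persistent memory). Latencies and the uniform-access hit-rate estimate
are simplifying assumptions, not measured SoftBank values."""


def hit_rate_uniform(working_set_gb: float, dram_cache_gb: float) -> float:
    # Uniform random access: the cache can hold at most cache/working_set of the data.
    return min(1.0, dram_cache_gb / working_set_gb)


def effective_latency_ns(hit_rate: float, dram_ns: float, pmem_ns: float) -> float:
    # Expected latency = hits served from DRAM + misses served from persistent memory.
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * pmem_ns


DRAM_NS, PMEM_NS = 100.0, 350.0  # hypothetical round numbers
CACHE_GB = 192  # DRAM cache of the Xeon Gold 6252 evaluation server

for vm_gb in (32, 192, 384):
    h = hit_rate_uniform(vm_gb, CACHE_GB)
    print(f"{vm_gb:>4} GB VM: hit rate {h:.2f}, "
          f"~{effective_latency_ns(h, DRAM_NS, PMEM_NS):.0f} ns")
```

The qualitative result matches the observation: a 32 GB VM fits entirely in the cache and sees DRAM-like latency, while a 384 GB VM misses on a large share of accesses.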


Estimated Cost per VM Can Be Reduced by 41% YoY

The verification showed that memory allocation capacity per VM could be improved by 33%¹ and that up to about three times as many VMs could be accommodated. For actual procurement, the company adopted the Intel® Xeon® Gold 6222V processor, which has a similar number of cores and low power consumption. As a result, the unit cost per VM is estimated to be 41% lower than procurement costs in the first half of 2018. Although the company expects the unit cost of each server to rise by 15%, the higher consolidation rate will reduce the number of servers and VMware licenses, lowering overall costs.
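The cost argument above is a break-even question: a server that costs 15% more pays off as long as it hosts enough additional VMs. The sketch below models that under stated assumptions. Only the 15% price increase comes from the article; the cost units, the hypothetical per-server license cost (assuming licenses are counted per server, for example per CPU socket), and the VM densities are illustrative, and the sketch does not attempt to reproduce the 41% figure.

```python
"""Hedged sketch of the per-VM cost trade-off. Only the 15% server-price
increase comes from the article; cost units, the per-server license cost,
and the VM densities are hypothetical."""


def cost_per_vm(server_cost: float, license_cost: float, vms_per_server: float) -> float:
    # Hardware plus per-server licensing, amortized over the VMs on one host.
    return (server_cost + license_cost) / vms_per_server


def breakeven_consolidation(old_server: float, new_server: float, license_cost: float) -> float:
    # Factor by which VMs per server must grow so that cost per VM does not rise.
    return (new_server + license_cost) / (old_server + license_cost)


OLD_SERVER, NEW_SERVER = 100.0, 115.0  # new server costs 15% more
LICENSE = 50.0  # hypothetical per-server license cost in the same units

print(f"break-even, hardware only: "
      f"{breakeven_consolidation(OLD_SERVER, NEW_SERVER, 0.0):.2f}x")
print(f"break-even, with licenses: "
      f"{breakeven_consolidation(OLD_SERVER, NEW_SERVER, LICENSE):.2f}x")
old = cost_per_vm(OLD_SERVER, LICENSE, vms_per_server=20)
new = cost_per_vm(NEW_SERVER, LICENSE, vms_per_server=30)
print(f"per-VM cost change at 1.5x density: {new / old - 1:+.0%}")
```

Because per-server licenses dilute the hardware price increase, the break-even consolidation gain drops below 1.15x once licensing is included; any consolidation gain beyond that point lowers cost per VM.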

Raising the consolidation ratio per physical server also reduces the number of racks that must be installed each year to meet demand, improving data center space efficiency. The amount of shared storage is also expected to decrease in proportion to the number of racks.
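The rack savings follow from two ceiling divisions: VMs to servers, then servers to racks. The sketch below takes 16 servers per rack from the verification setup and about 1,000 new VMs per year from the article; the VMs-per-server densities (a baseline of 20 and a roughly 3x figure of 60) are hypothetical.

```python
"""Sketch of how a higher consolidation ratio shrinks annual rack demand.
16 servers per rack and about 1,000 new VMs per year come from the article;
the VMs-per-server figures are hypothetical."""


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division without floating-point rounding.
    return -(-a // b)


def racks_for_demand(new_vms: int, vms_per_server: int,
                     servers_per_rack: int = 16) -> tuple[int, int]:
    """Return (servers, racks) needed to host new_vms."""
    servers = ceil_div(new_vms, vms_per_server)
    return servers, ceil_div(servers, servers_per_rack)


for density in (20, 60):  # hypothetical baseline vs. about 3x consolidation
    servers, racks = racks_for_demand(1000, density)
    print(f"{density} VMs/server -> {servers} servers, {racks} racks")
```

Since shared storage is provisioned per rack in this layout, the same rack count also drives the storage reduction mentioned above.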

Switching All Integrated Virtualization Infrastructure to Servers with Intel Optane Persistent Memory

Because the verification results met expectations, the IT Infrastructure Division decided to switch server procurement for its virtualization infrastructure to servers with Intel Optane persistent memory from the second half of FY2019. “All virtualization platforms released after February 2020 will use servers with Intel Optane persistent memory. We’ll start with about 300 of these servers, migrating workloads to them and operating them,” says Tanemura.

Having switched to the new virtualization platform, the IT Infrastructure Division is now considering applying Intel Optane persistent memory to databases and multi-access edge computing (MEC). The results of this verification are also being shared with the application development and DevOps departments in the IT Division. “Virtual machine memory exhaustion is a problem for every department. We will actively share information to improve the efficiency of company-wide systems, and we look forward to receiving the latest information from Intel,” says Suzuki.

Intel Optane Persistent Memory

Intel Optane persistent memory is low-cost, high-capacity memory that delivers low latency and near-DRAM performance (up to 40 times faster than conventional SSDs). Because it is non-volatile, data stored in App Direct Mode is retained even if the server loses power, for example because of a failure.

It has two operating modes: “App Direct Mode,” in which it operates as ultra-high-speed non-volatile memory, and “Memory Mode,” in which it can be used as large-capacity memory in the same way as ordinary DRAM. Because persistent memory costs less per byte than DRAM, Memory Mode provides inexpensive main memory with near-DRAM performance.
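In App Direct Mode, software addresses persistent memory directly rather than through transparent main memory. A common pattern is to memory-map a file on a DAX-enabled filesystem and update it with ordinary stores. The Python sketch below shows that pattern for a tiny persistent counter. The mount point `/mnt/pmem0` is a hypothetical example; `mmap.flush()` issues `msync()`, whereas production code would more typically use PMDK (for example libpmem's `pmem_persist()`). On a non-DAX filesystem the same code runs but goes through the page cache.

```python
"""Minimal sketch of App Direct-style access: map a file, update it with
ordinary byte writes, then flush. PMEM_FILE is a hypothetical path on a
DAX-mounted filesystem; any writable path works for testing."""

import mmap
import os

PMEM_FILE = "/mnt/pmem0/counter.bin"  # hypothetical DAX-mounted location
SIZE = 8  # one 64-bit little-endian counter


def increment_persistent_counter(path: str = PMEM_FILE) -> int:
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Make sure the file is large enough to map the counter.
        if os.fstat(fd).st_size < SIZE:
            os.ftruncate(fd, SIZE)
        with mmap.mmap(fd, SIZE) as buf:
            value = int.from_bytes(buf[:SIZE], "little") + 1
            buf[:SIZE] = value.to_bytes(SIZE, "little")  # plain store into the mapping
            buf.flush()  # msync(); PMDK's pmem_persist() is the lower-overhead DAX path
            return value
    finally:
        os.close(fd)
```

Each call survives a process restart because the value lives in the mapped file rather than in process memory; in Memory Mode, by contrast, applications would simply allocate ordinary RAM and no flush step is involved.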

Challenge

  • Reduce data center space
  • Reduce server costs
  • More memory per server

Solution

  • Intel Optane persistent memory
  • Intel Xeon Gold 6222V processor

Results

  • Quick response
  • 33% more memory allocated per VM¹
  • Up to 3x increase in VM capacity¹
  • Unit price per VM reduced by 41% y/y¹
  • Improved data center space efficiency

SoftBank Corp.

Established: December 9, 1986

Capital: 204,309 million yen (As of March 31, 2019)

Sales: 3,746.3 billion yen (fiscal year ending March 2019)

Employees: Approx. 17,100 (as of March 31, 2019)

Business Activities: Provision of mobile communication services, sale of mobile devices, provision of fixed-line telecommunication services, provision of Internet connection services
