Unveiling the ZStack Cloud Vhost Primary Storage Architecture: Achieving Million-Level IOPS per Disk with 100-Microsecond Latency

2025-03-19 11:54


ZStack Cloud 5.0.0 supports the next-generation high-performance all-flash distributed storage. This article explores the ZStack Cloud Vhost primary storage integration architecture from a technical implementation perspective.

Introduction

With the rise of artificial intelligence, enterprises’ demand for data storage systems has gone beyond basic capacity and stability, focusing more on performance, high-concurrency processing capabilities, and flexible scalability. This is particularly true for industries such as finance, healthcare, and manufacturing, which are facing unprecedented data storage pressures.

The financial industry, long known for prioritizing stability, also places stringent performance requirements on its application systems. Typical business scenarios such as trading, risk control, and asset management face new storage performance challenges, particularly in bandwidth and latency, when handling the large-scale concurrency and real-time data analysis typical of today's internet-style operations.

In the healthcare industry, over 80% of medical storage data comes from rapidly growing medical imaging, which expands at an annual rate of 30-40%. Medical institutions are also required to retain inpatient electronic medical records for at least 30 years and outpatient electronic medical records for at least 15 years, and individual clinical departments add their own retention requirements, all of which places higher demands on existing storage architectures.

Industry 4.0 has had a massive impact on modern manufacturing, especially in real-time business scenarios such as MES (Manufacturing Execution System). The entire production line depends heavily on storage performance: any delay beyond the acceptable range can cause production incidents that directly affect the business.

Traditional storage solutions, such as SAN, often encounter performance bottlenecks, high latency, and expansion difficulties when handling complex and high-frequency data, which hampers the development of user businesses. To break through this dilemma, a high-performance all-flash distributed storage system based on the vhost-user protocol has emerged. This system, with its innovative design concepts and advanced technical features, provides a high-performance, scalable, and cost-effective solution for industries such as finance, healthcare, and manufacturing.

Technical Principles

The mainstream virtio solution is a device model designed for virtualization environments, providing standard interfaces and implementations for virtual I/O devices. The virtio driver is designed to minimize host backend operations (vmexits) to improve I/O efficiency. Although virtio is a significant optimization over fully emulated devices, I/O processing inside the QEMU I/O thread remains inefficient. Taking virtio-blk with a file backend as an example, its I/O path runs roughly as follows: the guest driver places a request in the virtqueue and notifies the host, the resulting vmexit is handled by KVM in the host kernel, the QEMU I/O thread in user space picks up the request and issues a read/write system call against the backing file in the host kernel, and the completion travels back along the same path before an interrupt is injected into the guest.

This path involves two context switches between host user space and kernel space for every request, which increases latency.

Figure: virtio solution

vhost technology moves the backend data processing of virtio devices outside the QEMU process, avoiding state switching and system calls within QEMU. Instead, it directly calls the device drivers on the host (such as block device read/write or network send/receive) to perform the actual I/O operations, greatly enhancing backend performance.

As one of the core technologies behind this high-performance storage, the vhost-user protocol enables data to move directly between user-space applications and virtual machines, significantly reducing context switches between kernel and user space, lowering CPU interrupt-handling overhead, and improving overall performance. Compared to traditional virtio, vhost-user shortens the I/O path: the guest places requests in virtqueues that live in shared memory, and the user-space backend consumes and completes them directly, bypassing the QEMU I/O thread entirely.

Figure: vhost-user solution

As a user-space implementation of vhost, the vhost-user backend communicates with the QEMU process over a Unix domain socket to obtain the virtqueue configuration and memory layout, and then uses mmap to map the guest's memory into its own address space as shared memory. This mechanism allows a user-space application on the host (such as a storage gateway) to access the virtio device's virtqueues directly, without routing requests through QEMU or the kernel, which significantly improves I/O performance.
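
To make the mechanism concrete, the following is a minimal C sketch of what a vhost-user backend does with a memory region received from QEMU; the structure and field names are illustrative assumptions, not ZStack's actual code. The region's file descriptor arrives over the Unix socket (via SCM_RIGHTS), is mapped with mmap, and guest-physical addresses found in virtqueue descriptors can then be translated to host pointers and accessed directly.

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* One entry of the memory table QEMU sends during the vhost-user
     * handshake (illustrative layout). */
    struct mem_region {
        uint64_t guest_phys_addr;   /* guest-physical start of the region     */
        uint64_t size;              /* region length in bytes                 */
        uint64_t mmap_offset;       /* offset of the region inside the fd     */
        int      fd;                /* file descriptor passed over the socket */
        void    *host_va;           /* filled in after mmap()                 */
    };

    /* Map the shared guest RAM into the backend's address space. */
    static int map_region(struct mem_region *r)
    {
        void *p = mmap(NULL, r->size + r->mmap_offset,
                       PROT_READ | PROT_WRITE, MAP_SHARED, r->fd, 0);
        if (p == MAP_FAILED)
            return -1;
        r->host_va = (uint8_t *)p + r->mmap_offset;
        return 0;
    }

    /* Translate a guest-physical address taken from a virtqueue descriptor
     * into a host pointer the storage gateway can read or write directly. */
    static void *gpa_to_hva(const struct mem_region *r, uint64_t gpa)
    {
        if (gpa < r->guest_phys_addr || gpa - r->guest_phys_addr >= r->size)
            return NULL;
        return (uint8_t *)r->host_va + (gpa - r->guest_phys_addr);
    }

Because every descriptor address can be resolved this way, the backend reads request headers and data buffers in place, which is what removes the extra copies and context switches described above.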

Technical Features

However, relying solely on the vhost-user protocol is not enough to reach million-level IOPS and 100-microsecond-class latency on a single disk. To get there, ZStack Cloud has implemented several key technologies:

Lock-Free Architecture

ID sharding distributes I/O tasks across separate processing channels, so each resource is owned by one channel and the data path can be built without locks, which directly reduces I/O latency. ZStack Cloud combines this fully user-space, lock-free data path with a run-to-completion (RTC) I/O processing model, core binding, and asynchronous polling, yielding end-to-end I/O handling with no context switches, no locks, and no interrupts, and with network I/O handled entirely in user space. This design reduces software-stack latency to 20–40 μs, compared with the roughly 400 μs typical of traditional distributed I/O stacks, and fully exploits new-generation hardware and network technologies.
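
A simplified sketch of the idea follows, under the assumption of one single-producer/single-consumer ring per polling core; the names, sizes, and sharding function are illustrative, not ZStack internals.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RING_SIZE 1024                      /* must be a power of two */

    struct io_request { uint64_t volume_id; /* ...request payload... */ };

    /* One single-producer/single-consumer ring per worker core. */
    struct spsc_ring {
        _Atomic uint32_t head;                  /* written by the producer */
        _Atomic uint32_t tail;                  /* written by the consumer */
        struct io_request *slots[RING_SIZE];
    };

    static bool ring_push(struct spsc_ring *r, struct io_request *req)
    {
        uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (h - t == RING_SIZE)
            return false;                       /* ring is full */
        r->slots[h & (RING_SIZE - 1)] = req;
        atomic_store_explicit(&r->head, h + 1, memory_order_release);
        return true;
    }

    static struct io_request *ring_pop(struct spsc_ring *r)
    {
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
        uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
        if (t == h)
            return NULL;                        /* ring is empty */
        struct io_request *req = r->slots[t & (RING_SIZE - 1)];
        atomic_store_explicit(&r->tail, t + 1, memory_order_release);
        return req;
    }

    /* ID sharding: the same volume always maps to the same core. */
    static inline unsigned shard(uint64_t volume_id, unsigned ncores)
    {
        return (unsigned)(volume_id % ncores);
    }

    struct worker_ctx { unsigned core; struct spsc_ring ring; };

    /* Run-to-completion worker: pinned to one core, it polls its own ring and
     * finishes each request before picking up the next, never crossing threads. */
    static void *worker(void *arg)
    {
        struct worker_ctx *ctx = arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(ctx->core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        for (;;) {
            struct io_request *req = ring_pop(&ctx->ring);
            if (req) {
                /* submit to the backend, poll for completion, ack the guest */
            }
        }
        return NULL;
    }

Because a given volume ID always hashes to the same core, requests for that volume are only ever touched by one thread, so no locking or cross-core synchronization is needed on the data path.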

Full-Stack Memory Zero-Copy

ZStack Cloud optimizes data transfer with zero-copy techniques, avoiding repeated copies between user space and kernel space: data moves directly from the sender's buffer to the receiver's buffer without CPU intervention, greatly reducing processing time and resource consumption. Backed by huge-page shared memory and RDMA networking, the zero-copy path runs from virtual machine I/O memory through the storage gateway to the OSDs, and huge-page memory is centrally managed so that allocations stay on the local NUMA node and preserve NUMA access performance.
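
As an illustration of the memory side of this design (a sketch under assumptions, not the actual implementation), the snippet below allocates a huge-page buffer, binds it to the local NUMA node with mbind(), and registers it with the RDMA NIC via libibverbs; the same physical pages can then travel from the virtual machine's I/O path to the storage gateway and on to the OSDs without a CPU copy.

    #include <infiniband/verbs.h>   /* libibverbs: ibv_reg_mr()          */
    #include <numaif.h>             /* mbind(), MPOL_BIND                */
    #include <stddef.h>
    #include <sys/mman.h>

    /* Allocate 'len' bytes of huge-page memory and keep it on one NUMA node,
     * so the core that polls the data never crosses the NUMA interconnect. */
    static void *alloc_hugepages_on_node(size_t len, int node)
    {
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;
        unsigned long nodemask = 1UL << node;
        mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0);
        return buf;
    }

    /* Register the buffer with the RDMA NIC once; afterwards the NIC can DMA
     * directly out of (and into) these pages, which is what makes the
     * gateway-to-OSD hop zero-copy. */
    static struct ibv_mr *register_for_rdma(struct ibv_pd *pd, void *buf, size_t len)
    {
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }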

Reducing Guest-to-Backend Notification Overhead

In the storage gateway, the vhost backend relies on polling rather than per-request notifications, so the virtual machine does not have to kick the backend every time it issues an I/O; this removes notification-related vmexit and interrupt overhead and improves system performance.
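
The idea can be expressed against the standard virtio ring layout from <linux/virtio_ring.h>; the sketch below is not ZStack's actual gateway code, and memory barriers and used-ring completion are omitted for brevity. It suppresses guest-to-backend notifications and finds new work by polling the available ring.

    #include <linux/virtio_ring.h>
    #include <stdint.h>

    /* Tell the guest driver it does not need to "kick" the backend (i.e. write
     * the doorbell and trigger a vmexit) after queuing buffers: the backend
     * will discover new requests by polling instead. */
    static void suppress_guest_kicks(struct vring *vr)
    {
        vr->used->flags |= VRING_USED_F_NO_NOTIFY;
    }

    /* Busy-poll loop: compare the index we processed last with the index the
     * guest publishes in the available ring; any gap means new requests. */
    static void poll_ring(struct vring *vr, uint16_t *last_avail_idx)
    {
        for (;;) {
            uint16_t avail = vr->avail->idx;        /* written by the guest */
            while (*last_avail_idx != avail) {
                uint16_t head = vr->avail->ring[*last_avail_idx % vr->num];
                (void)head; /* the descriptor chain starting at 'head' would be
                               processed and completed into the used ring here */
                (*last_avail_idx)++;
            }
        }
    }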

Multi-Core Concurrency

ZStack Cloud takes advantage of modern multi-core CPUs by supporting multiple queues per disk. A balancing algorithm distributes the virtqueues evenly across cores, enabling high concurrency, keeping the load balanced, and significantly improving a single disk's Guest I/O throughput and response time.
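
A minimal balancing sketch (structure and function names are assumptions for illustration): each virtqueue of a multi-queue disk is handed to the polling core that currently services the fewest queues, so one busy disk's queues spread across CPUs instead of piling onto a single core.

    #include <stddef.h>

    struct queue_worker {
        unsigned core_id;      /* CPU core this poller is pinned to */
        unsigned nr_queues;    /* virtqueues it already services    */
    };

    /* Pick the worker with the fewest assigned virtqueues. */
    static size_t least_loaded(const struct queue_worker *w, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (w[i].nr_queues < w[best].nr_queues)
                best = i;
        return best;
    }

    /* Called once per virtqueue when a multi-queue disk is attached; returns
     * the core that will poll this queue. */
    static unsigned assign_virtqueue(struct queue_worker *w, size_t n)
    {
        size_t idx = least_loaded(w, n);
        w[idx].nr_queues++;
        return w[idx].core_id;
    }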

Performance Verification

To verify the real-world performance of Vhost primary storage, we used a typical server cluster with a high-speed RDMA network. The test environment was configured as follows:

RDMA network configuration:

Three-server configuration:

During testing, Vhost primary storage delivered outstanding performance:

Low Latency Performance

Under strict latency-test conditions, with fio set to iodepth=1 and numjobs=1, 4 KB random reads and writes inside the virtual machine completed with latency of around 100 microseconds.

High Throughput Capability

To test the system's high-concurrency handling, we raised the fio parameters to iodepth=64 and numjobs=8. The system sustained million-level small-block random IOPS while maintaining industry-leading read and write performance.
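
For reference, runs of this kind can be described with an fio job file along the following lines; the device path, runtime, and I/O engine here are assumptions rather than the exact job files used in the tests above.

    ; 4 KB random-read example; a matching randwrite job is run the same way
    [global]
    ioengine=libaio
    direct=1
    bs=4k
    rw=randread
    time_based=1
    runtime=300
    filename=/dev/vdb

    ; low-latency case: a single outstanding I/O
    [latency-test]
    iodepth=1
    numjobs=1

    ; high-concurrency case, run after the latency job finishes
    [iops-test]
    stonewall
    iodepth=64
    numjobs=8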

Efficient CPU Utilization

With a multi-queue design (queues=4), IOPS increased two- to three-fold compared with single-queue mode, while latency remained low even under high throughput.

Database Testing

On MySQL 5.7 with an 8-core, 16 GB configuration, Vhost primary storage delivered a 30% improvement in TPS over traditional distributed storage, and 95th-percentile latency dropped to roughly one-third. With the thread count raised to 96, QPS improved by 38%.

On Oracle 19c with an 8-core, 16 GB configuration, Vhost primary storage achieved roughly a sixfold increase in TPM over traditional distributed storage, reaching 420,000 TPM.

Scenario Practice

Financial Institution Batch Processing Practice

A financial institution in the Yangtze River Delta region ran a POC with Vhost primary storage; compared with a foreign vendor's product under the same configuration, batch processing time was reduced by 62%, significantly improving operational efficiency.

High-End Manufacturing Client Performance Testing Practice

In tests of Vhost primary storage on ZStack Cloud 5.0.0, a vehicle manufacturer in central China measured a 21% improvement in read/write performance over its high-end storage array products.

Tianjin Tertiary Hospital Registration Concurrency Testing

Using Vhost primary storage on ZStack Cloud 5.0.0, a major tertiary hospital in Tianjin increased its concurrent registration capacity by 63% and cut waiting time by 30%, greatly improving the patient experience.

Conclusion

The next-generation Vhost primary storage, with its innovative technical design and clear business value, provides an efficient, reliable, and cost-effective data storage solution for a wide range of industries, helping enterprises cope with modern data challenges and achieve technological breakthroughs.

As a cloud computing company focused on product innovation, ZStack strives to make cloud products easier to use and to lower the barrier to entry for users. The launch of Vhost primary storage is a concrete embodiment of this philosophy. Going forward, we will continue to deliver richer and more practical cloud computing features that create value for users.
