Seizing Over 1 Billion Worth of Opportunities with High-Concurrency Architecture I

1. Challenges Faced by Enterprises for Sustainable Growth

Digital Transformation Trend
Over the past 30 years, digitalization has rapidly flourished, bringing forth not only numerous emerging industries but also the digital transformation of large enterprises. We can observe in our daily lives that many activities are highly integrated with digital systems, making digitalization an indispensable basic operational requirement across various industries.

Food: Ordering meals, reservations, promotional meal vouchers, etc.
Apparel: Going out for shopping is no longer the only choice, as abundant online shopping options provide greater convenience.
Housing: Room reservations, online property viewings, eliminating the need for phone or in-person reservations.
Transportation: Various ticket bookings, online schedule inquiries.
Education: With the booming development of online education, physical classrooms are no longer the sole means of education.
Entertainment: Various online games, mobile games, live broadcasts of large-scale online events.

▲ In 2019, the live broadcast of an online real-time battle game set a record for the highest number of simultaneous online viewers, at 44 million. (Image source: Riot official website)

Warning Signs
As digitalization keeps growing, enterprises handle more orders, more members, and massive transaction volumes. Business operations increasingly depend on systems to complete every task, so system utilization rises sharply as well. The situation looks positive, but it carries hidden risks: enterprise systems can run short of resources, slow down, and, most critically, crash.

For example, an enterprise may plan a long-awaited live streaming event meticulously, only to see it suffer poor performance and interruptions under overwhelming traffic. What looks like a success on the surface then turns into a subpar user experience and significant losses in revenue and reputation.

▲ In 2021, an online bank was offline for nearly 5 hours after excessive traffic caused a service interruption. (Image source: Apple Daily (蘋果新聞網))

2. What Is High Concurrency?

First, concurrency means that multiple requests access a system at the same time. In system architecture, high concurrency refers to the heavy load a running service experiences when usage rises sharply within the same time window. If the system is not designed for it, servers can become overloaded, causing bottlenecks or crashes that leave the service unable to respond normally.

Many aspects of the system are affected by high concurrency, including the operating system, network, hardware resources, web servers, databases, code, and more. As service demand grows, the online user count may no longer be in the thousands but could reach tens of thousands or even hundreds of thousands, surpassing what a single server can handle.
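To illustrate why a burst of simultaneous requests overwhelms a single server, here is a minimal sketch of a deliberately simplified model. It assumes all requests arrive at the same instant, the server has a fixed number of workers, and every request takes the same service time. The function name and the figures (100 workers, 50 ms per request) are hypothetical and chosen only for illustration.

```python
def worst_case_latency_ms(concurrent_requests, workers, service_ms):
    """Latency of the last request served when all requests arrive at once.

    Simplified model: each worker handles one request at a time, so the
    requests are processed in batches of size `workers`.
    """
    if concurrent_requests == 0:
        return 0
    batches = -(-concurrent_requests // workers)  # ceiling division
    return batches * service_ms


# Hypothetical server: 100 workers, 50 ms per request.
for n in (100, 1_000, 100_000):
    latency = worst_case_latency_ms(n, workers=100, service_ms=50)
    print(f"{n} concurrent requests -> last response after {latency} ms")
```

In this model the slowest response grows linearly with the number of simultaneous requests once the workers are saturated, which is the bottleneck behavior described above. Real systems also face timeouts, memory pressure, and connection limits, which this sketch ignores.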

With the internet now ubiquitous and smartphone shopping increasingly replacing visits to physical stores, online business models continue to grow rapidly.

A large-scale practical example is the challenge faced annually by platforms like Taobao and Tmall.

▲ Chart showing the growth of concurrent orders per second during Alibaba's Singles' Day (Taobao Double 11). (Data source: Alibaba Cloud)

How to Plan a Highly Available System?

This question has long concerned enterprise IT personnel, who must keep system spending under control while still preventing bottlenecks.

Before planning, several assessment indicators need to be understood:

QPS (Queries Per Second) indicates how many queries a server can handle per second. As used here, it covers read queries and excludes write operations.
TPS (Transactions Per Second) indicates how many transactions (complete request-and-response cycles) a service can finish per second. Stress-testing tools commonly report it as the rate of client requests the server completes each second.
RT (Response Time) is the total time from a client sending a request to receiving the response data from the server. It directly reflects how fast a system is and is usually measured in milliseconds.
Throughput is the amount of work the system can process per unit of time. It is bounded by how much load the CPU, memory, and disk I/O (IOPS) can carry, so it is closely tied to hardware performance.
PV (Page Views) is the number of page views recorded on the server side. Each page load or refresh by a client counts as one view.
UV (Unique Visitors) is the number of distinct clients accessing the service within a period, often approximated by counting different IP addresses.
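As a rough illustration of how these indicators relate to raw traffic, the sketch below derives them from a small, made-up request log. The `Request` record, its field names, and the sample values are all hypothetical. Treating every completed request as one transaction and every distinct IP as one visitor are simplifying assumptions, not a definitive measurement method.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client_ip: str       # used here as a stand-in for visitor identity
    response_ms: float   # time from request sent to response received
    is_write: bool       # True for write operations
    is_page_view: bool   # True if the request loaded or refreshed a page


def summarize(requests, window_seconds):
    """Compute the indicators over one observation window."""
    total = len(requests)
    reads = sum(1 for r in requests if not r.is_write)
    return {
        "qps": reads / window_seconds,   # read queries only
        "tps": total / window_seconds,   # every completed request
        "avg_rt_ms": sum(r.response_ms for r in requests) / total if total else 0.0,
        "pv": sum(1 for r in requests if r.is_page_view),
        "uv": len({r.client_ip for r in requests}),
    }


# Made-up sample: four requests observed within a 2-second window.
sample = [
    Request("10.0.0.1", 120.0, is_write=False, is_page_view=True),
    Request("10.0.0.2", 80.0, is_write=False, is_page_view=True),
    Request("10.0.0.1", 100.0, is_write=True, is_page_view=False),
    Request("10.0.0.3", 100.0, is_write=False, is_page_view=True),
]
metrics = summarize(sample, window_seconds=2.0)
print(metrics)
```

In practice these figures usually come from monitoring tools or access logs rather than hand-written code; the point of the sketch is only to show which raw fields each indicator depends on.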

Using the indicators above, planning a highly available system typically starts with observing the system's state over a period of time or running stress tests with dedicated software. The monitoring data collected this way then guides adjustments to capacity and configuration.
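Once monitoring or stress testing has produced numbers, a first-pass capacity estimate can be sketched as follows. The first function applies Little's Law (average requests in flight = arrival rate × average response time). The second assumes a per-server QPS figure obtained from a stress test and a headroom factor. The function names, the 70% headroom, and all sample figures are hypothetical.

```python
import math


def in_flight_requests(qps, avg_rt_ms):
    """Little's Law: average number of requests being processed at once."""
    return qps * avg_rt_ms / 1000.0


def servers_needed(peak_qps, per_server_qps, headroom=0.7):
    """Servers required if each one is loaded to at most `headroom` of its tested limit."""
    usable_qps = per_server_qps * headroom
    return math.ceil(peak_qps / usable_qps)


# Hypothetical figures: 5,000 QPS at peak, 200 ms average response time,
# and a stress test showing one server sustains about 800 QPS.
concurrent = in_flight_requests(qps=5000, avg_rt_ms=200)
servers = servers_needed(peak_qps=5000, per_server_qps=800, headroom=0.7)
print(f"about {concurrent:.0f} requests in flight, {servers} servers needed")
```

An estimate like this is only a starting point. It assumes load spreads evenly across servers and ignores shared bottlenecks such as the database, so it should be checked against further stress tests.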

Stay tuned for the next part, where we will continue to explore high concurrency architecture.

Author

CTO
藍國豪 Levi Lan
