Stay in the middle: Tencent's road to chip-making

In hindsight, Lynda, a chip verification engineer, thought it was a bit "hasty" to enter Tencent.

As a senior engineer who had worked in the semiconductor industry for many years, Lynda was slightly surprised when she first saw Tencent hiring for chip positions. In 2019 she joined the internet giant out of curiosity, ready to roll up her sleeves and do something big.

During the interview, Henry, who is in charge of chip design, gave her a heads-up: "We are making chips from scratch." Lynda tried to read this sentence as the usual understatement of the "Goose Factory" (Tencent's nickname), but was shocked by a conversation with her colleagues on her very first day of work:

-"Where are our simulation tools?" -"Not yet, still under negotiation."

-"What about the verification environment?" -"Not yet ... none."

-"And ... the verification process?" -"That ... none either."

For a chip verification engineer, simulation tools, a verification environment and a verification process are essential productivity tools. Lynda had always wanted to take part in a chip's entire research and development cycle, and she was not afraid to start from scratch. She just didn't expect that even these necessities would be "three noes".

When an Internet company sets foot in semiconductors, the lack of tools is not the biggest problem. "Chip-making" is not a simple extension of the business; it often means a more complex industrial chain, a longer time to build up talent, and a very different ecosystem, culture and technical mindset.

For example, chip development is not like software development, where bugs can keep being fixed later. If a design problem is not caught by verification early on, then once the chip is taped out it can only end up as a "brick". A verification engineer like Lynda is the goalkeeper who prevents all the earlier effort from being in vain.

The importance of this position is self-evident. In many chip companies, the ratio of design engineers to verification engineers reaches 1:3. However, Lynda looked around after joining the company and found that she not only had no colleague to fight alongside, but didn't even have a single line of verification code.

Only then did Lynda begin to understand what Henry meant by "starting from scratch", and what a difficult battle she was facing.

01

As strong as iron, you don't have a good start.

According to Xie Ming, vice president of Tencent Cloud and general manager of cloud architecture platform department, there are more twists and turns behind "starting from scratch".

Xie Ming's cloud architecture platform department stands behind Tencent's various front-end applications and bears the brunt of Tencent's massive business data, supporting nationally used applications such as QQ, email, WeChat, Weiyun, streaming video and so on.

By 2013, QQ photo album had grown into Tencent's largest storage business. Users urgently needed faster access to their albums and a smoother experience. Translated into technical problems: can picture transcoding be faster? Can pictures be compressed without affecting image quality? Can they be stored at a lower cost?

They asked again and again.

The team deeply understood how innovation in the underlying technology amplifies the value of the applications built on top. Software architecture would of course keep improving, but they were keenly aware that only innovation in hardware could deliver deeper breakthroughs.

The question is: how does a software team make hardware?

After a round of research, they decided to test the water first with FPGA (field-programmable gate array). Compared with the general-purpose chips in our computers and mobile phones, an FPGA is a reprogrammable, "semi-custom" circuit from the application-specific integrated circuit (ASIC) world, which allows flexible development.

Compared with a custom chip, an FPGA is more forgiving of mistakes, and it strikes a good balance among throughput, latency, power consumption and flexibility. Especially when dealing with massive data, FPGA has a clear ultra-low-latency advantage over GPU, which suits specific business scenarios very well.

The facts bore out this judgment. In 2015, the team devoted itself to developing an image-coding FPGA, which achieved a higher compression rate and lower latency than CPU coding and software coding, and also helped QQ photo album greatly reduce storage costs. They saw the possibility of going deeper in the FPGA direction.

Around 2016, the AI craze triggered by AlphaGo brought FPGA into the mainstream. After the team used FPGA to accelerate the CNN algorithm of a deep learning model, processing performance reached four times that of a general-purpose CPU, at only 1/3 of the unit cost.

Although FPGA works well, its technical threshold is relatively high. "If FPGA is moved to the cloud, could that be a way to expand its application?"

With this expectation, Tencent Cloud launched the first FPGA cloud server in China on January 20, 2017, hoping to extend FPGA capability to more enterprises by means of cloud computing.

In terms of results, an enterprise that programs FPGA hardware on the FPGA cloud server can indeed raise performance to more than 30 times that of a general-purpose CPU server while paying only about 40% of the cost of the general-purpose CPU. Take a well-known genetic testing company as an example: gene sequence analysis that traditionally takes a CPU a week can be compressed to several hours with FPGA.
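To make the cost-performance claim concrete, the two figures quoted above (about 30 times the performance at about 40% of the cost) can be combined into a single performance-per-cost ratio. The snippet below is only an illustrative sketch: the function name and the normalization of the CPU server to 1.0 are assumptions made for this example, and both inputs are the article's rounded figures rather than measured data.

```python
def perf_per_cost(relative_performance: float, relative_cost: float) -> float:
    """Performance delivered per unit of money, relative to a baseline."""
    return relative_performance / relative_cost


# Baseline: a general-purpose CPU server, normalized to 1.0 for both values.
cpu_baseline = perf_per_cost(1.0, 1.0)

# Figures quoted in the article: ~30x performance at ~40% of the cost.
fpga_cloud = perf_per_cost(30.0, 0.40)

# How many times more performance each unit of spending buys on FPGA.
advantage = fpga_cloud / cpu_baseline

print(f"Performance per unit cost vs. CPU: about {advantage:.0f}x")
```

Under these rounded inputs the ratio comes out at roughly 75 times, which is the scale of advantage that later eroded as general-purpose chips became cheaper.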

However, cloud-based FPGA has not swept the whole industry as quickly as expected.

On the one hand, FPGA is after all a "semi-custom" circuit, and many enterprises are still unable to develop for FPGA on their own and need more upper-layer services; on the other hand, the rapid decline in the cost of general-purpose chips gradually eroded the cost-performance advantage of FPGA.

The setback in cloud commercialization poured cold water on the team, and its enthusiasm plunged from peak to trough. At the same time, two questions were laid bare in front of the whole team: How valuable is FPGA to the business? Should the FPGA work continue?

Affected by this, the team almost fell apart in 2018, and people began to leave. Tencent's first exploration of "chip-making" ended on a regrettable comma.

02

The future is bright, Penglai comes out.

After the setback of the FPGA cloud server, Tencent needed to rethink how to continue down the hardware road.

In 2018, just as the team almost disbanded, China's chip industry ushered in a warm spring: Sino-US trade friction made the whole public aware of the importance of chips, the establishment of the STAR Market (Science and Technology Innovation Board) opened the door for semiconductor companies to go public, and the entry of state funds put the whole industry in full swing.

However, for Internet companies, making chips is like doing cloud computing, databases, storage systems and so on: it needs to be supported by specific business scenarios and cannot be "done for the sake of doing". After one unsuccessful exploration, Tencent had to wait for the opportunity brought by the next real demand.

Then came 2019. This was the first year of large-scale application of artificial intelligence, and both internal and external businesses put forward strong demand for AI chips. An AI chip: to do it or not?

When this question was raised, there were voices of opposition within Tencent's management, who feared that the engineers were merely hot-headed and chasing a trend. But management also left enough gray area and did not explicitly prohibit exploration at the small-team level.

Testing the water first, at small scale, low cost and in a specific application scenario, became the consensus.

The cloud architecture platform department settled on AI inference as the direction of the first chip and named it "Penglai", hoping that this chip could stand firm in stormy waves like the overseas fairy mountain of ancient Chinese mythology.

This hardware breakthrough team was also officially named "Penglai Lab".

With the experience accumulated from its FPGA exploration, Penglai Lab had become quite proficient in hardware programming languages and had also built up some platform-level designs for standard interfaces and buses. However, the research and development requirements of FPGA and chips are not the same.

If working with FPGA is like building with ready-made blocks, then making chips is like making the blocks yourself, starting from felling the trees. An FPGA can be reprogrammed if something goes wrong, but a chip gets only one chance at tape-out. Once something goes wrong, all the effort is wasted.

In addition, the resources of an FPGA are ready-made and fixed, whereas the resources of a chip are defined by the team itself. In a word, it comes down to being frugal: doing the most with the fewest resources.

Rick, a chip architecture engineer, described the whole Penglai project as going from "renovation" to "reconstruction". At first, the team thought it could simply turn its previous FPGA technology into a chip. Once the work began, they found this had been wishful thinking: not much of the FPGA architecture could be reused directly in the chip, so the team could only tear down the original architecture as a whole, and as much as 85% of the code was rewritten.

For critical modules such as DDR memory, chip manufacturers generally have dedicated verification engineers, but Penglai Lab, which had only just started, had no such luxury and could only make up the missing lessons as it went. Lynda later recalled, "I wished I had 48 hours a day."

In early 2020, the tape-out of the Penglai chip was completed and the chips were delivered to Shenzhen by the partner. The COVID-19 epidemic had just broken out nationwide, and the company had started collective telecommuting.

Henry, the project leader, received the package wearing gloves, carefully disinfected it with alcohol and took it to an empty office building with the windows and fans wide open. Amid the smell of disinfectant, he and several colleagues began the crucial "lighting-up" operation.

"Lighting up" means powering on the chip, first checking for short circuits or smoke, and then testing some basic functions. Whether it is a chip or a "brick", success or failure hangs on this moment.

But the chip's clock signal never came out. The clock is the "metronome" of the chip. Without it, the chip's different modules are like people who have not synchronized their watches and cannot work together.

Was it this particular chip? The engineers swapped in another one, but there was still no signal output.

They swapped in yet another, and still nothing. The room fell silent.

The engineers hardly dared to go on. Someone couldn't help joking: is it time to go home and update our résumés?

But more than dejected, everyone was confused. Although the project had few people and few resources and had started almost from scratch, everyone on the Penglai team, from designers to verification engineers, was confident that every step had been done properly. What had gone wrong?

In a very heavy atmosphere, they kept mounting chips on the board, powering on and reading the signal ...

The fourth chip lit up. So did all the remaining chips.

The reason was actually very simple. The chip defect rate of the 28 nm process is only 3%, but the first three chips picked at random for testing all happened to be bad; the team had simply run into a low-probability event. It gave them a real taste of the nerves of "delivering a baby".
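Just how unlikely was that run of bad luck? A quick back-of-the-envelope calculation, assuming the quoted 3% defect rate applies independently to each chip picked (an assumption of this sketch, not something the article states), looks like this:

```python
def prob_all_defective(defect_rate: float, n_chips: int) -> float:
    """Probability that n independently chosen chips are all defective."""
    return defect_rate ** n_chips


# Defect rate quoted in the article for the 28 nm process.
DEFECT_RATE = 0.03

# Chance that the first three chips tested are all bad.
p_three_bad = prob_all_defective(DEFECT_RATE, 3)

# Express the same figure as "one in N" for readability.
one_in = round(1 / p_three_bad)

print(f"P(first three chips all bad) = {p_three_bad:.6%}")
print(f"That is roughly 1 in {one_in:,}")
```

Under the independence assumption the probability is 0.03 cubed, or about 0.0027%, on the order of one in 37,000.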

Amid the applause that followed the false alarm, Tencent's first chip was declared a success.

03

By going up one flight of stairs, "Zixiao" Lingyun

After mass production, the actual performance of the Penglai chip also lived up to expectations. It helped Tencent launch the first intelligent microscope in China approved for clinical use in hospitals, which automatically recognizes medical images, counts cells and displays the results directly in the field of view. The performance fully met the design requirements.

This swept away the gloom of the FPGA cloud server project and showed that Tencent could make a chip that is directly application-oriented and performs excellently.

Penglai, a chip for end devices, only completed the step from 0 to 1. The team could not wait to go from 1 to N and move on to large-scale cloud chips. Alex, the head of Penglai Lab, jokingly called the application for the big-chip project a "Series A financing round".

After this first trial, the team needed to explain to the company why more money should be invested in making large chips. Could it stay ahead in both the short and the long term? How would it combine with internal and external business to create value?

This time, Tencent's decision was much easier to make.

The first reason was the maturity of Penglai Lab. Growing as it marched, Penglai Lab had transformed itself again and again and had established a complete, rigorous and standardized chip research and development system and process. It was now a "regular army" with a hard-core presence.

More importantly, the team had proved Tencent's advantages and position in chip-making.

Xie Ming explained that from an industry perspective, apart from process and technology, the biggest difficulty in making chips lies in defining the chip. The advantage of traditional chip manufacturers lies in the former, but they match the chip to the demand only after it is made, and lose real-world performance in many scenarios. The advantage of technology companies such as Google and Tencent is that they are the demand side and have the deepest and most thorough understanding of the demand.

With direction, process and technology all in order, Lu Shan, Senior Executive Vice President of Tencent and President of TEG (Technical Engineering Division), gave full support and secured more headcount and funding through the head office.

With the support of the company's strategy, the team set out ambitiously for a bigger battlefield. Austin, deputy director of Penglai Lab, decided to split the forces into two tracks and advance AI inference and video encoding and decoding in parallel.

The AI team went on to build the 2.0 version of Penglai, "Zixiao". This is the name of the palace where Hongjun Laozu lives in the Investiture of the Gods. Building "Zixiao" on the solid fairy mountain represents a new ambition:

This time, they directly set the goal as the first in the industry.

The whole architecture of Zixiao is built around effective computing power. The team optimized the on-chip cache design, abandoned the GDDR6 memory commonly used in competing products, and adopted advanced 2.5D packaging technology to package HBM2e memory together with the AI chip, which improved memory bandwidth by nearly 40%.

Technology iterates at breakneck speed. After the Zixiao project was launched, the highest performance in the industry was refreshed by a competing product. Although Zixiao's design performance was still "safe" enough compared with this new record, the team decided to raise the bar further.

After research, they innovatively added a computer vision (CV) accelerator and a video codec accelerator inside the chip, which greatly reduces the interaction and waiting between the AI chip and the x86 CPU.

Even though two complex self-developed modules were added, the team still completed the whole process from architecture definition to verification and tape-out within the planned 6 months.

On September 10, 2021, Zixiao lit up smoothly.

In application scenarios such as image and video processing, natural language processing, and search and recommendation, this chip breaks the bottleneck that keeps computing power from being fully used, and its performance in actual business scenarios ultimately reaches twice the industry level.

04

Independent research, "the sea" smiles.

The AI team named their chip "Zixiao", while the video codec chip was named "Canghai" (the vast sea), so that sea and sky share one color.

Unlike Penglai and Zixiao, which focus on AI, Canghai is a video transcoding chip. If the problem of transcoding QQ photo album pictures was the earliest spark for the Penglai team to do hardware, then the video codec team's continued exploration in this direction is an answer to that original aspiration.

The difference is that Canghai's application scenarios have gone far beyond the scope of those early days.

As multimedia services evolved from the picture era to the era of live audio and video, massive amounts of 4K/8K ultra-high-definition content have been pounding cloud computing infrastructure like a flood. Every additional bit of data brings a corresponding cost in transcoding computing power and CDN bandwidth.

This is an intuitive and severe math problem, and the goal of the Canghai team was also very clear: to build the strongest video transcoding chip in the industry and maximize the compression rate.

Fortunately, Tencent's rich multimedia application scenarios and the many leading live-streaming and interactive customers served by Tencent Cloud provided unique conditions for analysis and verification during Canghai's research and development.

The team first released Canghai's core self-developed module, the hardware video encoder "Yaochi", and decided to put Yaochi through a major test before Canghai's development was complete.

This test was the 2020 MSU world video codec competition, hosted by Moscow State University (MSU). For more than ten years it has been the most influential top-level competition in the field of video compression, attracting well-known technology companies at home and abroad, including Intel, NVIDIA, Google, Huawei, Alibaba and Tencent.

In the end, Yaochi achieved real-time video encoding at 1080P@60Hz and took first place on objective metrics such as SSIM (structural similarity), PSNR (peak signal-to-noise ratio) and VMAF (Video Multimethod Assessment Fusion), as well as first place in the subjective human-eye evaluation, a clear length ahead of the runner-up.
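For readers unfamiliar with these objective metrics, PSNR is the simplest of the three: it compares a reference image with its compressed version through the mean squared error. The sketch below uses the standard textbook definition for 8-bit samples; the function names and the tiny four-pixel example are illustrative assumptions, not the competition's actual measurement tooling.

```python
import math
from typing import Sequence


def mean_squared_error(reference: Sequence[int], distorted: Sequence[int]) -> float:
    """Average squared difference between two equally sized pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = mean_squared_error(reference, distorted)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit luma samples: an original and a slightly distorted copy.
original = [52, 55, 61, 59]
compressed = [54, 55, 60, 57]

print(f"PSNR = {psnr(original, compressed):.2f} dB")
```

A higher PSNR means the compressed picture is numerically closer to the original, which is why an encoder is usually judged on quality metrics and bitrate together rather than on either alone.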

After this hard battle, Canghai had been thoroughly validated technically.

On March 5, 2022, Derick and his video codec team received the "Canghai" chips back from tape-out, just as Shenzhen went into comprehensive telecommuting due to the epidemic.

They applied for special permission to enter the empty office building. The scene was much like the one when Penglai was lit up two years earlier.

Unexpectedly, the twists and turns of Penglai's lighting-up also reappeared. After overcoming some accidents during debugging, Tencent's third chip, and its first fully self-developed one, was successfully lit up amid cheers.

Turning a vast sea into a single drop: Canghai ultimately delivers video of the same quality with less data and less bandwidth, and its compression ratio is more than 30% higher than the best performance in the industry.

The journey ran from Penglai to Zixiao and then to Canghai, from the 28 nm process to the 12 nm process, from 8 people to more than 100, from having no simulation tools to formally completing the "Tian Jian verification platform", and from struggling to keep up with partners to building a complete SoC independently.

The two teams had joined forces in victory. The Penglai team had completed an evolution along its road to chip-making.

05

In the era of "100G", Shuang Mu stood tall.

The cloud architecture platform department was not the only one to jump into the chip-making tide.

While multimedia and AI processing are actively seeking changes, the underlying cloud servers are also facing similar problems: when the performance improvement brought by software optimization can't make the products obviously different from competing products, how to make the performance break through the existing ceiling?

In 2019, Tencent reached a milestone in its cloud computing business: the number of cloud servers broke through 1 million. Zou Xianneng, vice president of Tencent Cloud and general manager of Tencent's network platform department, keenly observed that as server access bandwidth keeps increasing, servers spend more and more CPU resources on network processing.

Can server network processing be realized at a lower cost while providing higher network performance? Tencent's network platform department also turned its attention to software and hardware collaboration and hardware acceleration.

Faced with this "want it both ways" challenge, Zou Xianneng decided to do some subtraction for the server: "Take the burden of network data processing off the CPU."

The idea of "smart network card" was born.

A smart network card, on the one hand, handles the server's external network access like an ordinary network card, interconnecting different servers and data centers. On the other hand, it carries additional intelligent units such as CPU, FPGA and memory, which can take over some of the server's virtualization computing tasks and accelerate the server's overall network and storage performance.

In other words, what the network platform department needed to do was to build a new small server inside the network card.

At first, the team hoped to find a ready-made commercial board to reduce the workload.

Hayden, the head of network card hardware, led the evaluation and investigation of solutions, but the acceleration engines of commercial chips did not support private protocols, which became the first challenge and the biggest obstacle at that time. Some well-known network card vendors shook their heads at Tencent's request:

"Now the function of the network card is very simple, and your requirements are too complicated to achieve."

Others asked bluntly: "That many network cards, with such high reliability requirements. Can you really handle it yourselves?"

Was the smart network card project going to be aborted at the very start?

Zou Xianneng pointed out the direction for the team: "Since the smart network card is the key component of the cloud data center to pursue the ultimate performance and cost, if there is no product in the market that meets Tencent's needs, then we will build one ourselves."

Once the direction was clear, the route soon became clear too: first start with a self-developed smart network card based on FPGA, and then develop a smart network card chip.

In September 2020, Tencent's first-generation FPGA-based self-developed smart network card was officially launched. It was named Metasequoia, expressing the team's hope that the product would be as adaptable and fast-growing as this rare tree.

During the epidemic, all kinds of sudden demands struck, and the newborn Metasequoia faced no shortage of challenges.

Hayden recalled that a large customer used a UDP-based audio and video protocol, which is inherently "unreliable" and allows packet loss. The protocol relies heavily on network throughput and stability, yet the customer needed high-concurrency, high-quality audio and video transmission.

The Metasequoia smart network card rose to the challenge. By greatly improving the server's network performance, it helped the customer pass a 24-hour zero-packet-loss extreme stress test and run stably online, handing in an impressive answer sheet.

After Metasequoia was put into use, research and development of the second-generation smart network card "Yinshan" also went into full swing, and it was officially launched in June 2021. The network ports of this generation of smart network cards doubled to 2*100G.

Supported by another towering tree, Tencent Cloud launched the industry's first self-developed sixth generation 100G cloud server. Its computing performance is improved by 220%, and its storage performance is improved by 100%. Compared with the previous generation, the bandwidth of single-node access network is increased by up to 4 times and the delay is reduced by 50%.

The "two trees" achieved a great deal in network hardware offloading, which excited the team.

When the FPGA route gradually approached the bottleneck of performance and power consumption, the network platform department decided to take the initiative in its own hands again. Tencent's fourth chip, the first smart network card chip, was born, and it also has a fairy-like name-"Ling Xuan".

06

Ling Xuan seems to be like this at first glance, but the core problem is not finished.

According to the plan, this 7-nanometer process chip will be taped out by the end of 2022.

Hayden was ordered to quickly set up a chip research and development team for Ling Xuan, which has been taking on one "impossible task" after another.

In terms of performance, the number of devices supported by Ling Xuan will increase to more than 10K, 6 times that of commercial chips. At the same time, its performance can be 4 times that of commercial chips. By offloading virtualization, network/storage IO and other functions that originally ran on the host CPU to the chip, zero host CPU usage can be achieved.

This small but capable chip fully embodies both the "mystery" of ultimate performance for the future and the "spirit" of flexible acceleration for varied business needs.

At present, the Ling Xuan project is in full swing with pre-tape-out verification and testing, in order to build the next-generation high-performance network infrastructure of Tencent Cloud.

Penglai Lab's AI inference chip Zixiao and video transcoding chip Canghai will go into mass production and be deeply integrated with Tencent's business.

Some new chip projects are brewing and growing. Tencent will continue to explore the necessary technical directions and enrich this "Classic of Mountains and Seas".

The new challenges faced by Tencent's massive business and the inevitable requirements of the rapid development of cloud computing pushed Tencent onto this chip-making road. These chips, grounded in business requirements, will surely go deep into practical applications to prove their value.

"We are not making chips out of thin air or on a whim. We knew from the beginning that Tencent's demand was big enough for us to do it," Lu Shan said.

Starting from 2010, Tencent began to open its digital technology and connectivity to the outside world in the form of cloud services, plunging into the era of industrial digital transformation and upgrading. Having entered the game, Tencent saw that the deep integration of the digital and the real is leading the technological trend of the "all-real Internet".

Beyond Tencent, China's technology companies are advancing into the deep waters of innovation, and efforts to break through bottlenecks are becoming more and more important. Whether in digital-real integration or in upstream innovation, a hundred boats are racing across the sea of hard technology, all carried along by the tide of history.

Caught up in this tide, Tencent's chip-making endeavor will surely find its echo in the sea of stars.