Community post originally published on Second State’s blog by Sam Liu, Second State Engineer, CNCF’s WasmEdge Maintainer and Miley Fu, CNCF Ambassador, DevRel at WasmEdge

This article is based on a talk given in the track “The Programming Languages Shaping the Future of Software Development” at QCon 2023 Beijing on Sept 6th, 2023. The session addressed the challenges faced by the current mainstream Python and Docker approach to building infrastructure for large language model (LLM) applications, and introduced the audience to the advantages of the Rust + WebAssembly approach, emphasizing its potential to address the performance, security, and efficiency concerns associated with the traditional approach.

Throughout the session, Sam shared insights from practical projects, showcasing the real-world applications of Rust and WebAssembly in constructing robust AI infrastructures. The talk was enriched with references, code snippets, and visual aids to provide a comprehensive understanding of the topic.

Introduction

In the ever-evolving world of technology, applications driven by large language models, commonly referred to as “LLM applications”, have become a driving force behind technological innovation across various industries. As these applications gain traction, the massive influx of user demand poses new challenges for the performance, security, and reliability of the underlying infrastructure.

Python and Docker have long been the mainstream choice for building machine learning applications. However, when it comes to building infrastructure for Large Language Model (LLM) applications, some of the drawbacks of this combination become more serious, such as Python’s performance issues and Docker’s cold-start problem. In this talk, we will focus on the main scenario of building infrastructure for LLM ecosystems, take a closer look at the problems with the Python and Docker combination, and, more importantly, explain why Rust + WebAssembly (WASM) is superior to Python + Docker. Finally, we will demonstrate how to build a Code Review Bot on the flows.network [1] platform.

The Current Landscape: Python + Docker Approach

In the field of machine learning, Python reigns supreme, mainly due to the following three characteristics:

As one of the most popular container management tools today, Docker containers provide great convenience for application deployment:

For the development and deployment of traditional machine learning applications, the Python + Docker mode has demonstrated its advantages. In the construction of infrastructure for LLM ecosystems, however, it faces challenges.

Challenges with Python + Docker

Things often have two sides, and the strengths of Python and Docker naturally come with shortcomings. In the process of building infrastructure for LLM ecosystems, however, these shortcomings become more prominent and turn into key obstacles. Let us look at Python’s issues first.

Disadvantages of Python

Fig.1 Speedups from performance engineering a program that multiplies two 4096-by-4096 matrices.

Mixed programming: Python + C/C++/Rust

To mitigate the performance issues of the Python language itself, a common approach is to use Python as a front-end language responsible for interacting with users, while selecting a high-performance language such as C/C++/Rust as the back end to handle heavy computing tasks. Many well-known libraries in the Python ecosystem, such as NumPy, use this approach to meet the demand for high-performance computing. However, this mixed-programming approach inevitably requires additional tools (or libraries) as a bridge to “connect” the two different programming languages, and this process can introduce new problems.
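As a minimal sketch of this bridging pattern (not taken from any particular library), the compiled-language side of such a binding can expose a C-compatible function; Python could then load the resulting shared library through `ctypes`, or a binding generator such as PyO3 could wrap it automatically. The function name `dot` is illustrative.

```rust
// A hot loop implemented in Rust and exported with a C ABI. Compiled as a
// cdylib, Python could load it via ctypes; tools like PyO3 automate this
// kind of bridging.
#[no_mangle]
pub extern "C" fn dot(a: *const f64, b: *const f64, len: usize) -> f64 {
    // SAFETY: the caller must pass valid pointers to `len` elements each.
    let (a, b) = unsafe {
        (
            std::slice::from_raw_parts(a, len),
            std::slice::from_raw_parts(b, len),
        )
    };
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // 1*4 + 2*5 + 3*6 = 32
    println!("{}", dot(a.as_ptr(), b.as_ptr(), a.len()));
}
```

The extra glue shown in the `unsafe` block is exactly the kind of cross-language cost the bridge tools must manage for you, and where subtle memory-safety bugs can creep in.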

C++ and Python

Limitations of Docker Containers

These limitations highlight the need for alternative solutions, like Rust + WebAssembly, which promise to address these pain points and offer a more efficient and secure environment for deploying LLM applications.

AGI will be built in Rust and WebAssembly

Why can Rust and WebAssembly be the languages of AGI?

Fig.2 Bojan Tunguz tweets “AGI will be built with Python. Let that sink in”; Elon Musk replies “Rust”.

Rust: The Optimal Choice for the AGI Era

WASM Container: faster, lighter and safer

Shivraj Jadhav compares Docker containers and WASM across multiple dimensions [4].

Table.1 WASM vs. Docker.

WASI-NN Standard

Besides the advantages mentioned above, WebAssembly’s WASI-NN standard for machine learning applications is also a significant factor.
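For illustration, a minimal inference call through the WASI-NN interface looks roughly like the sketch below. The API names follow the bytecodealliance `wasi-nn` Rust bindings; the model encoding, tensor shape, and output size are assumptions, and the module must run inside a WASI-NN-enabled runtime such as WasmEdge.

```rust
// Sketch only: assumes the `wasi-nn` crate and a WASI-NN-enabled runtime
// (e.g. WasmEdge). Model format, tensor shape, and buffer sizes here are
// illustrative assumptions, not a definitive implementation.
use wasi_nn;

fn infer(model: &[u8], weights: &[u8], input: &[u8]) -> Vec<f32> {
    unsafe {
        // Load the model graph for CPU execution (OpenVINO encoding here).
        let graph = wasi_nn::load(
            &[model, weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap();
        let ctx = wasi_nn::init_execution_context(graph).unwrap();

        // Bind the input tensor (an assumed 1x3x224x224 f32 image).
        let tensor = wasi_nn::Tensor {
            dimensions: &[1, 3, 224, 224],
            type_: wasi_nn::TENSOR_TYPE_F32,
            data: input,
        };
        wasi_nn::set_input(ctx, 0, tensor).unwrap();
        wasi_nn::compute(ctx).unwrap();

        // Copy the output tensor (assumed 1001 class scores) out of the runtime.
        let mut out = vec![0f32; 1001];
        wasi_nn::get_output(
            ctx,
            0,
            out.as_mut_ptr() as *mut u8,
            (out.len() * 4) as u32,
        )
        .unwrap();
        out
    }
}
```

Because the wasm module only talks to the standardized WASI-NN interface, the same binary can be backed by different native inference engines on different hosts.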

Use Case: Agent for Code Review

In this section, we will demonstrate how to use the flows.network platform to build an agent for code review. Before diving into the specific example, let’s first look at the conceptual model of an agent and at the flows.network platform.

Concept Model of Agent

This is a conceptual framework for an LLM-powered AI agent proposed by Lilian Weng [5].

Overview of LLM-powered autonomous agent system
Fig.3 Overview of LLM-powered autonomous agent system

In this model, the LLM plays the role of the agent’s brain, responsible for core reasoning and decision-making, but it still needs additional modules to enable key capabilities: planning, long/short-term memory, and tool use.

The flows.network platform is built on an idea similar to Lilian’s model. Fig.4 shows its major components. The entire platform is written in Rust, compiled to wasm modules, and runs on the WasmEdge Runtime.

The major components of flows.network
Fig.4 The major components of flows.network

Agent for Code Review

On the flows.network platform, we provide an agent that helps maintainers of open-source projects on GitHub review PRs. We name it Code Review Bot.

The abstract design of the agent is presented in Fig.5. The red code-review-function block in the center of the diagram defines the core agent functions, while each dashed circle surrounding it matches the counterpart directly connected to the agent block in Fig.3.

Abstract Design of Code Review Bot
Fig.5 Abstract Design of Code Review Bot

Fig.6 depicts the architecture of Code Review Bot. Apart from external resources such as the GitHub service, the agent consists of wasm modules running on the WasmEdge Runtime. Integration wasm modules are responsible for connecting WebAssembly functions to external resources via Web APIs. For example, the code-review-function wasm module extracts the code under review into prompts; the openai-integration wasm module then sends the prompts to the ChatGPT service, waits for the response, and finally sends the comments back to the code-review-function wasm module.
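The prompt-building step performed inside the code-review-function module can be sketched in plain Rust along the following lines. The function name, prompt wording, and sample patch are hypothetical, not the actual flows.network source.

```rust
// Hypothetical sketch of the prompt-building step; the function name and
// prompt wording are illustrative, not the actual code-review-function code.
fn build_review_prompt(pr_title: &str, patch: &str) -> String {
    format!(
        "You are reviewing the pull request \"{pr_title}\".\n\
         Summarize the change, list hidden risks, and highlight major edits.\n\
         --- patch ---\n\
         {patch}"
    )
}

fn main() {
    let prompt = build_review_prompt(
        "Fix off-by-one in parser",
        "- let end = len;\n+ let end = len - 1;",
    );
    // An integration module would send this prompt to the ChatGPT service
    // and relay the response back as a PR comment.
    println!("{prompt}");
}
```

In the real system this string is what crosses the module boundary: the openai-integration module only sees prompts and completions, keeping the GitHub-facing and LLM-facing concerns in separate wasm modules.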

Architecture of Code Review Bot
Fig.6 Architecture of Code Review Bot

Fig.7 shows an example of a PR review summary produced by Code Review Bot. It summarizes the target PR and lists hidden risks, major changes, and more. This information helps reviewers focus on the vital parts and saves them time.

Example of PR review summary by Code Review Bot
Fig.7 Example of PR review summary by Code Review Bot

The Code Review Bot can be deployed in minutes. If you would like to use it in your projects, this guide can help you.

Conclusion

In the realm of AI infrastructure development, while Python and Docker have served us well, it’s essential to explore and adopt newer technologies that promise better performance, security, and efficiency. The combination of Rust and WebAssembly is a testament to this evolution, offering a compelling alternative for developers and organizations alike.

This article provides a comprehensive overview of Sam Liu’s talk on Rust + WebAssembly for building LLM ecosystems. For a deeper dive and to explore the practical projects in detail, readers are encouraged to join the WasmEdge Discord.