Opportunities and Challenges of AI and Web3 Integration: A Full-Stack Revolution from Data to Computing Power

AI+Web3: Towers and Squares

TL;DR

  1. Web3 projects with AI concepts have become targets for capital attraction in the primary and secondary markets.

  2. The opportunities of Web3 in the AI industry are reflected in: using distributed incentives to coordinate potential supply in the long tail, across data, storage, and computing; while establishing open-source models and a decentralized market for AI Agents.

  3. AI is mainly used in the Web3 industry for on-chain finance (crypto payments, trading, data analysis) and assisting in development.

  4. The utility of AI+Web3 is reflected in the complementarity of the two: Web3 is expected to counteract the centralization of AI, while AI is expected to help Web3 break boundaries.


Introduction

In the past two years, AI development has accelerated visibly. The wave set off by ChatGPT has not only opened up a new world of generative artificial intelligence but has also made waves in the Web3 space.

Buoyed by AI concepts, fundraising in the crypto market has picked up noticeably despite the broader slowdown. In the first half of 2024 alone, 64 Web3+AI projects completed financing rounds, with the AI-based operating system Zyber365 raising the largest: a $100 million Series A.

The secondary market is even more buoyant. Coingecko data shows that in just over a year, the total market capitalization of the AI sector reached $48.5 billion, with 24-hour trading volume approaching $8.6 billion. The lift from mainstream AI breakthroughs is plain: after OpenAI released its Sora text-to-video model, the average price of the AI sector rose by 151%. The AI effect has also spread to Meme, one of crypto's biggest money magnets: GOAT, the first MemeCoin built on the AI Agent concept, shot to popularity and a $1.4 billion valuation, kicking off an AI Meme craze.

Research and discussion of AI+Web3 are just as heated, running from AI+DePIN to AI MemeCoins and now to AI Agents and AI DAOs; FOMO can barely keep pace with the rotation of new narratives.

AI+Web3, a pairing filled with hot money, opportunity, and visions of the future, is easily read as a marriage arranged by capital. It is hard to tell, beneath the splendid robe, whether this is a playground for speculators or the eve of a genuine breakout.

To answer that question, the key consideration for both sides is: does each become better with the other? Can each benefit from the other's model? In this article, we attempt to examine the pattern by standing on the shoulders of our predecessors: how can Web3 play a role at each stage of the AI technology stack, and what new vitality can AI bring to Web3?

Part.1 What Opportunities Does Web3 Have Under the AI Stack?

Before diving into this topic, we need to understand the technology stack of large AI models:

To put the whole process in simpler terms: the "large model" is like a human brain. In the early stage, this brain belongs to a newborn that must observe and absorb vast amounts of information from its surroundings to make sense of the world; this is the "data collection" phase. Since computers lack human senses such as vision and hearing, before training, the large-scale unlabelled information from the outside world must be converted through "preprocessing" into a format that computers can understand and use.
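To make "preprocessing" concrete, here is a minimal, illustrative Python sketch: raw text is cleaned and mapped to integer token IDs, the kind of format a model can actually consume. All names and the tokenization scheme are invented for illustration; real pipelines are far more elaborate.

```python
import re

def preprocess(raw_texts):
    """Toy stand-in for a preprocessing pipeline: clean raw text,
    then map each word to an integer token ID a model can consume."""
    vocab = {"<unk>": 0}
    encoded = []
    for text in raw_texts:
        # Cleaning: lowercase and keep only alphabetic words.
        words = re.findall(r"[a-z]+", text.lower())
        ids = []
        for w in words:
            if w not in vocab:
                vocab[w] = len(vocab)  # grow the vocabulary on first sight
            ids.append(vocab[w])
        encoded.append(ids)
    return encoded, vocab

ids, vocab = preprocess(["Hello, world!", "Hello again..."])
print(ids)    # [[1, 2], [1, 3]]
print(vocab)  # {'<unk>': 0, 'hello': 1, 'world': 2, 'again': 3}
```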

After the data is fed in, the AI builds a model with understanding and predictive capability through "training", which can be seen as the process of the baby gradually making sense of and learning about the outside world. The model's parameters are like the language abilities the baby continuously adjusts while learning. When the learning material starts to be specialized, or the model receives feedback from interaction with people and is corrected, it enters the "fine-tuning" phase of the large model.
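As a rough mechanical illustration of "training", the PyTorch sketch below fits a toy one-parameter model by repeatedly measuring its error and nudging its parameters; fine-tuning is the same loop run on a smaller, task-specific dataset starting from already-trained weights. This is an invented example, not any particular model's pipeline.

```python
import torch
from torch import nn

# Toy task: learn y = 2x + 1 from four examples.
x = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
y = 2 * x + 1

model = nn.Linear(1, 1)  # the "brain": one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # how wrong is the model right now?
    loss.backward()              # compute gradients of the error
    optimizer.step()             # nudge the parameters toward the data

print(model.weight.item(), model.bias.item())  # converges to ~2.0 and ~1.0
```

Inference, discussed next, is simply calling the trained `model` on input it has never seen.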

As children grow up and learn to speak, they can understand meaning and express feelings and thoughts in new conversations. This stage resembles "inference" in large AI models: the model can predict and analyze new language and text inputs. Children use language skills to express feelings, describe objects, and solve problems, much as a trained large model, once deployed, applies inference to specific tasks such as image classification and speech recognition.

The AI Agent points toward the next form of the large model: one capable of independently executing tasks and pursuing complex goals, possessing not only the ability to think but also memory, planning, and the use of tools to interact with the world.
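A hypothetical sketch of that agent loop: plan, call a tool, observe the result, remember it, repeat. `call_llm` and the toy tools are invented stand-ins, not any real project's API.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a large-model API call; it finishes
    # immediately so the sketch runs end to end without external services.
    return "FINISH:stub answer"

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),   # toy tool: arithmetic
    "search": lambda q: f"top result for {q!r}",  # toy tool: stub lookup
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    memory = [f"GOAL: {goal}"]  # memory: everything the agent has seen
    for _ in range(max_steps):
        # Plan: ask the model for the next action given the memory so far.
        action = call_llm("\n".join(memory) +
                          "\nNext action (tool:input or FINISH:answer)?")
        kind, _, arg = action.partition(":")
        if kind == "FINISH":
            return arg  # the agent judges the goal met
        # Act, then observe: run the chosen tool, append the result to memory.
        result = TOOLS.get(kind, lambda a: "unknown tool")(arg)
        memory.append(f"{kind}({arg}) -> {result}")
    return "gave up"

print(run_agent("What is 2+2?"))  # -> "stub answer"
```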

In response to AI's pain points across the stack, Web3 has begun to form a multi-layered, interconnected ecosystem spanning every stage of the AI model pipeline.


1. Basic Layer: An Airbnb for Computing Power and Data

Computing Power

Currently, one of the biggest costs in AI is the computing power and energy required to train and run inference on models.

For example, Meta's LLaMA 3 requires 16,000 NVIDIA H100 GPUs (a top-tier graphics processor designed for artificial intelligence and high-performance computing workloads) running for 30 days to complete training. The 80GB version costs between $30,000 and $40,000 per unit, implying a computing-hardware investment (GPUs plus network chips) of $400-700 million, while monthly training consumes 1.6 billion kilowatt-hours, with energy expenditure approaching $20 million per month.
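A quick back-of-the-envelope check of the hardware figure, as a one-off arithmetic sketch:

```python
# Sanity-check the GPU investment quoted above.
gpus = 16_000
price_low, price_high = 30_000, 40_000   # USD per H100 80GB unit
low_m = gpus * price_low / 1e6
high_m = gpus * price_high / 1e6
print(f"GPUs alone: ${low_m:.0f}M to ${high_m:.0f}M")  # $480M to $640M
# GPUs alone already sit inside the quoted $400-700M range;
# network chips and other hardware push toward the upper end.
```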

Unlocking AI computing power was indeed one of the earliest intersections of Web3 and AI: DePIN (Decentralized Physical Infrastructure Networks). The DePIN Ninja data site has listed over 1,400 projects, among which representative GPU computing-power-sharing projects include io.net, Aethir, Akash, Render Network, and others.

The main logic is that the platform allows individuals or entities with idle GPU resources to contribute computing power in a permissionless, decentralized way. Through an online marketplace of buyers and sellers, similar to Uber or Airbnb, it raises the utilization of underused GPUs, and end users obtain more cost-effective compute. Meanwhile, a staking mechanism ensures that resource providers face penalties if they violate quality-control mechanisms or suffer network interruptions (a minimal sketch of this staking logic follows).
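The sketch below illustrates the stake-earn-slash mechanics just described. All class names, rates, and amounts are invented for illustration; real protocols implement this on-chain with far richer dispute logic.

```python
class ComputeMarket:
    """Toy model of a DePIN compute marketplace with staking."""

    def __init__(self, slash_rate=0.10):
        self.stakes = {}          # provider -> locked deposit
        self.balances = {}        # provider -> earned fees
        self.slash_rate = slash_rate

    def register(self, provider, deposit):
        self.stakes[provider] = deposit        # lock collateral to join
        self.balances.setdefault(provider, 0.0)

    def settle_job(self, provider, fee, passed_quality_check):
        if passed_quality_check:
            self.balances[provider] += fee     # pay for good work
        else:
            penalty = self.stakes[provider] * self.slash_rate
            self.stakes[provider] -= penalty   # slash the deposit

market = ComputeMarket()
market.register("gpu_node_1", deposit=1000.0)
market.settle_job("gpu_node_1", fee=50.0, passed_quality_check=True)
market.settle_job("gpu_node_1", fee=50.0, passed_quality_check=False)
print(market.balances["gpu_node_1"], market.stakes["gpu_node_1"])  # 50.0 900.0
```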

Its characteristics are:

  • Gathering idle GPU resources: the suppliers are mainly independent small and medium-sized data centers, surplus computing power from operators such as crypto mining farms, and mining hardware idled by the shift to PoS consensus, as with FileCoin and ETH mining rigs. There are also projects working to lower the entry barrier, such as exolab, which uses local devices like the MacBook, iPhone, and iPad to build a computing network for running large-model inference.

  • Serving the long-tail market of AI computing power:

a. "From a technical perspective," the decentralized computing power market is more suitable for inference steps. Training relies more on the data processing capabilities brought by ultra-large cluster scale GPUs, while inference has relatively lower requirements for GPU computing performance, such as Aethir focusing on low-latency rendering tasks and AI inference applications.

b. "From the demand side perspective," small and medium computing power demanders will not train their own large models individually, but will only choose to optimize and fine-tune around a few leading large models, and these scenarios are inherently suitable for distributed idle computing power resources.

  • Decentralized ownership: blockchain's technical significance here is that resource owners always retain control of their resources, can adjust flexibly to demand, and earn income at the same time.

Data

Data is the foundation of AI. Without data, computation is as rootless as duckweed, and the relationship between data and models echoes the saying "garbage in, garbage out": the quantity of data and the quality of the input determine the final model's output quality. For training today's AI models, data determines a model's language ability, comprehension, and even its values and human-like behavior. Currently, the challenges around AI's data needs center on four aspects:

  • Data hunger: AI model training depends on massive data input. Public information shows that OpenAI trained GPT-4 with a parameter count on the trillion scale.

  • Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, professionalism of vertical data, and the incorporation of emerging data sources such as social media sentiment have also raised new requirements for its quality.

  • Privacy and compliance issues: Currently, various countries and enterprises are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset scraping.

  • High data processing costs: data volumes are large and processing is complex. Public information shows that more than 30% of AI companies' R&D costs go to basic data collection and processing.

Currently, Web3 solutions are reflected in the following four aspects:

  1. Data Collection: the supply of free, scraped real-world data is drying up fast, and AI companies' spending on data rises year by year. Yet this spending has not flowed back to the data's actual contributors; instead, platforms have enjoyed all of the value creation the data brings.

The vision of Web3 is to let users who actually contribute share in the value their data creates, and to obtain more private and more valuable data from users at low cost through distributed networks and incentive mechanisms.

  • Grass is a decentralized data layer and network where users can run Grass nodes, contribute idle bandwidth and relay traffic to capture real-time data from the entire internet, and earn token rewards;

  • Vana introduces a unique Data Liquidity Pool (DLP) concept, allowing users to upload their private data (such as shopping records, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize specific third parties to use this data;

  • On PublicAI, users can tag posts on X with #AI or #Web3 and mention @PublicAI to contribute to data collection.

  2. Data Preprocessing: in the AI data pipeline, collected data is often noisy and error-ridden, so before model training it must be cleaned and converted into a usable format, involving repetitive standardization, filtering, and handling of missing values (a minimal sketch of these cleaning steps follows the list below). This phase is one of the few human-powered steps in the AI industry and has given rise to the data-labeling profession. As models raise their demands on data quality, the bar for data labelers rises too, and this kind of task is naturally suited to Web3's decentralized incentive mechanisms.
  • Currently, Grass and OpenLayer are both considering adding data labeling as a key component.

  • Synesis has proposed the concept of "Train2earn", emphasizing data quality: users earn rewards by providing labeled data, annotations, or other forms of input.

  • The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
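As referenced above, here is a minimal pandas sketch of the cleaning steps named in item 2: filtering duplicates, handling missing values, and standardization. The data and columns are invented for illustration.

```python
import pandas as pd

raw = pd.DataFrame({
    "text":  ["Good product ", None, "BAD!!", "Good product ", "okay item"],
    "score": [4.0, 3.0, None, 4.0, 2.0],
})

clean = (
    raw.drop_duplicates()                 # filtering: drop exact repeats
       .dropna(subset=["text"])           # missing values: no text, no row
       .assign(
           text=lambda d: d["text"].str.lower().str.strip(),      # normalize
           score=lambda d: d["score"].fillna(d["score"].mean()),  # impute gaps
       )
)
# Standardization: rescale the numeric column to zero mean, unit variance.
clean["score"] = (clean["score"] - clean["score"].mean()) / clean["score"].std()
print(clean)
```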

  3. Data Privacy and Security: it is worth clarifying that data privacy and data security are different concepts. Data privacy concerns the handling of sensitive data, while data security protects data against unauthorized access, destruction, and theft. Accordingly, the advantages of Web3 privacy technologies, and their potential application scenarios, show up in two areas: (1) training on sensitive data; (2) data collaboration: multiple data owners can jointly take part in AI training without sharing their raw data (a minimal sketch follows).
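As a sketch of the "data collaboration" idea in its simplest form, the numpy example below performs federated averaging: each owner fits a model on private data and shares only the fitted parameter, never the raw data. This is a generic illustration, not any listed project's protocol.

```python
import numpy as np

def local_fit(x, y):
    # Each owner privately fits y ~ w*x (least-squares slope, no intercept).
    return float(x @ y / (x @ x))

rng = np.random.default_rng(0)
local_weights = []
for _ in range(3):                         # three data owners, three silos
    x = rng.normal(size=50)                # private features, never shared
    y = 2 * x + 0.1 * rng.normal(size=50)  # private labels, never shared
    local_weights.append(local_fit(x, y))  # only this number leaves the silo

global_weight = float(np.mean(local_weights))  # aggregate the parameters
print(round(global_weight, 3))  # ~2.0, learned without pooling any raw data
```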

Privacy technologies commonly used in Web3 today include:

  • Trusted Execution Environments (TEE), such as Super Protocol;

  • Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io or Inco Network;

  • Zero-knowledge technology (ZK): Reclaim Protocol, for example, uses zkTLS to generate zero-knowledge proofs of HTTPS traffic, letting users securely import activity, reputation, and identity data from external websites without exposing sensitive information.

However, the field is still early and most projects remain exploratory. One current dilemma is that computing costs are far too high. For example:

  • The zkML framework EZKL takes about 80 minutes to generate a proof for a 1M-parameter nanoGPT model.

  • According to data from Modulus Labs, the overhead of zkML is over 1000 times higher than pure computation.

  4. Data Storage: once the data exists, a place is needed to store it on-chain, along with the LLMs produced from it. With data availability (DA) as the core issue, Ethereum's throughput before the Danksharding upgrade was 0.08 MB per second, while training AI models and running real-time inference typically require 50 to 100 GB of throughput per second (the gap is worked out below). A difference of this magnitude leaves existing on-chain solutions struggling with "resource-intensive AI applications."
  • 0g.AI is a representative project in this category: a decentralized storage solution designed for high-performance AI needs.
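As promised above, the throughput gap in one line of arithmetic:

```python
# The magnitude gap between on-chain DA and AI data demand.
eth_mb_per_s = 0.08       # Ethereum DA throughput, pre-Danksharding
ai_mb_per_s = 50 * 1024   # 50 GB/s, the low end of the quoted AI demand
print(f"{ai_mb_per_s / eth_mb_per_s:,.0f}x gap")  # -> 640,000x gap
```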