Indexer: A New Paradigm for Web3 Data Access and Analysis of Mainstream Projects

2025-08-05 10:22:58

The Evolution of Data Access in Web3: Overview of Indexers and Related Projects

In blockchain technology, data plays a crucial role and is the foundation for developing decentralized applications. While the current discussion mainly focuses on data availability, data accessibility is equally important but often overlooked.

In the era of modular blockchain, data availability solutions have become indispensable. They ensure that all participants can access transaction data, enabling real-time verification and maintaining network integrity. However, the data availability layer functions more like a billboard than a database; data is not stored indefinitely but is deleted over time.

In contrast, data accessibility focuses on the ability to retrieve historical data, which is crucial for developing decentralized applications and conducting blockchain analysis. Although less discussed, data accessibility is just as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support robust and efficient blockchain applications.

Since its inception, blockchain has fundamentally transformed infrastructure and driven the creation of decentralized applications in areas such as gaming, finance, and social networks. However, building these applications requires access to a large amount of blockchain data, which is both difficult and costly.

For developers, one option is to host and run their own archive RPC nodes. These nodes store all historical blockchain data from the genesis block onward, allowing full access to the data. However, maintaining archive nodes is costly, and their query capabilities are limited. Running cheaper non-archive nodes is another option, but these nodes have limited data retrieval capabilities, which may hinder the normal operation of applications.

Another approach is to use commercial RPC node providers. These providers are responsible for the costs and management of the nodes and provide data via RPC endpoints. Public RPC endpoints are free but come with rate limits, which may negatively impact the user experience of applications. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires a lot of back-and-forth communication, leading to inefficiencies. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
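To make the back-and-forth concrete, here is a minimal sketch of the request plan needed to collect one contract's event logs over a block range using the standard `eth_getLogs` JSON-RPC method. It assumes a provider that caps each call at a fixed block span. The cap of 2,000 blocks, the placeholder address, and the helper name `plan_get_logs_requests` are illustrative assumptions, not any provider's actual limits. No network call is made.

```python
import json

def plan_get_logs_requests(address, from_block, to_block, max_span=2000):
    """Split [from_block, to_block] into eth_getLogs payloads of at most max_span blocks."""
    payloads = []
    start = from_block
    request_id = 1
    while start <= to_block:
        end = min(start + max_span - 1, to_block)
        payloads.append({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_getLogs",
            "params": [{
                "address": address,
                "fromBlock": hex(start),  # JSON-RPC expects hex-encoded block numbers
                "toBlock": hex(end),
            }],
        })
        start = end + 1
        request_id += 1
    return payloads

# Placeholder address, for illustration only.
EXAMPLE_ADDRESS = "0x" + "00" * 20
requests_needed = plan_get_logs_requests(EXAMPLE_ADDRESS, 18_000_000, 18_100_000)
print(len(requests_needed), "round trips")
print(json.dumps(requests_needed[0], indent=2))
```

Under these assumptions, scanning about 100,000 blocks takes 51 separate requests, each of which still returns raw logs that the application has to decode and filter itself.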

Blockchain indexers organize chain data and load it into databases for easier querying, which is why they are often called the "search engine of the blockchain." They index blockchain data and expose it through a structured query language, commonly GraphQL or SQL. With a unified query interface, developers can retrieve the information they need quickly and accurately using a standardized query language, which greatly simplifies the process.
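The idea of loading chain data into a database and then querying it can be sketched in a few lines. The example below is a toy illustration, not how any particular indexer is implemented: the event records, the table layout, and the function names are all assumptions, and an in-memory SQLite database stands in for the indexer's store.

```python
import sqlite3

# Hypothetical decoded transfer events, as an indexer might extract them from blocks.
RAW_EVENTS = [
    {"block": 100, "token": "TKN", "sender": "alice", "recipient": "bob", "amount": 50},
    {"block": 101, "token": "TKN", "sender": "bob", "recipient": "carol", "amount": 20},
    {"block": 102, "token": "OTH", "sender": "alice", "recipient": "carol", "amount": 5},
]

def build_index(events):
    """Load decoded events into a queryable table with an index on the recipient."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers "
        "(block INTEGER, token TEXT, sender TEXT, recipient TEXT, amount INTEGER)"
    )
    db.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    db.executemany(
        "INSERT INTO transfers VALUES (:block, :token, :sender, :recipient, :amount)",
        events,
    )
    return db

def received_by(db, account):
    """Total amount received per token by one account, answered in a single query."""
    rows = db.execute(
        "SELECT token, SUM(amount) FROM transfers "
        "WHERE recipient = ? GROUP BY token ORDER BY token",
        (account,),
    )
    return rows.fetchall()

db = build_index(RAW_EVENTS)
print(received_by(db, "carol"))  # [('OTH', 5), ('TKN', 20)]
```

Once the data is indexed, a question that would need many RPC calls becomes one filtered, aggregated query.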

Different types of indexers optimize data retrieval in various ways:

Full Node Indexer: Extracts data directly from a complete blockchain node, ensuring data completeness and accuracy, but requires significant storage and processing power.
Lightweight Indexer: Relies on full nodes to retrieve specific data as needed, reducing storage requirements but potentially increasing query time.
Dedicated Indexer: Optimizes retrieval for specific types of data or certain blockchains, such as NFT data or DeFi transactions.
Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface, particularly useful for multi-chain applications.

An Ethereum archive node alone requires roughly 3TB of storage space, and that figure will keep growing as the blockchain grows. An indexer protocol deploys multiple indexers that can index and query large amounts of data efficiently, which raw RPC access is not designed to do.

Indexers also allow for complex queries, easy data filtering based on different criteria, and post-extraction data analysis. Some indexers can aggregate data from multiple sources, avoiding the need to integrate multiple APIs in multi-chain applications. Because they are distributed across multiple nodes, indexers provide enhanced security and performance, whereas RPC providers may experience interruptions and downtime due to their centralized nature.

Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval while also reducing the cost of deploying a single node. This makes blockchain indexer protocols the preferred choice for application developers.

Decentralized applications must retrieve and read blockchain data to operate their services. This applies to every type of application, including DeFi, NFT platforms, games, and even social networks, because these platforms need to read data before they can execute other transactions.

DeFi protocols need various data to quote users specific prices, ratios, fees, and more. Automated market makers need price and liquidity information from liquidity pools to calculate swap rates, while lending protocols need utilization rates to determine borrowing rates and liquidation debt ratios. This information must be fed into the application before it can calculate the rates at which users' transactions execute.
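As a rough illustration of why these inputs matter, the sketch below computes a swap quote and a borrow rate from indexed pool data. The text does not name specific formulas, so the choices here are assumptions: a constant-product AMM with a 0.3% fee and a simple linear interest-rate model. All function names and parameter values are hypothetical.

```python
def swap_output(reserve_in, reserve_out, amount_in, fee=0.003):
    """Constant-product quote: reserve_in * reserve_out stays constant after the fee."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)

def utilization(total_borrowed, available_cash):
    """Share of supplied funds that is currently lent out."""
    total = total_borrowed + available_cash
    return 0.0 if total == 0 else total_borrowed / total

def borrow_rate(util, base_rate=0.02, slope=0.20):
    """Linear rate model: the borrow rate rises with utilization."""
    return base_rate + slope * util

# Reserves and balances would come from the indexer; these values are made up.
quote = swap_output(reserve_in=1_000.0, reserve_out=1_000.0, amount_in=10.0)
util = utilization(total_borrowed=800.0, available_cash=200.0)
print(f"swap quote: {quote:.4f}")
print(f"utilization: {util:.2%}, borrow rate: {borrow_rate(util):.2%}")
```

Every input to these functions (reserves, borrowed amounts, available cash) is on-chain state, so the quality of the quote depends directly on how quickly and reliably that state can be read.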

GameFi requires fast indexing and access to data to ensure that users can play games smoothly. Only through lightning-fast data retrieval and execution can Web3 games compete in performance with Web2 games, thereby attracting more users. These games need data such as land ownership, in-game token balances, and in-game actions. By using indexers, they can better ensure stable data flow and stable uptime to guarantee a perfect gaming experience.

NFT markets and lending platforms need to index data to access various information, such as NFT metadata, ownership and transfer data, royalty information, etc. Quickly indexing such data can avoid browsing through each NFT one by one to find ownership or NFT attribute data.
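A minimal sketch of how ownership can be derived once transfer events are indexed, instead of inspecting each NFT individually. The event fields (`block`, `log_index`, `token_id`, `from`, `to`) and the sample data are assumptions for illustration. The approach simply replays transfers in chain order and keeps the latest recipient per token.

```python
def current_owners(transfer_events):
    """Replay transfers in chain order; the last recipient of each token is its owner."""
    owners = {}
    ordered = sorted(transfer_events, key=lambda e: (e["block"], e["log_index"]))
    for event in ordered:
        owners[event["token_id"]] = event["to"]
    return owners

def tokens_owned_by(owners, account):
    """Reverse lookup: all token IDs currently held by one account."""
    return sorted(token_id for token_id, owner in owners.items() if owner == account)

# Hypothetical indexed transfer events ("0x0" marks a mint).
TRANSFERS = [
    {"block": 10, "log_index": 0, "token_id": 1, "from": "0x0", "to": "alice"},
    {"block": 10, "log_index": 1, "token_id": 2, "from": "0x0", "to": "alice"},
    {"block": 12, "log_index": 0, "token_id": 1, "from": "alice", "to": "bob"},
    {"block": 15, "log_index": 3, "token_id": 3, "from": "0x0", "to": "bob"},
]

owners = current_owners(TRANSFERS)
print(tokens_owned_by(owners, "bob"))    # [1, 3]
print(tokens_owned_by(owners, "alice"))  # [2]
```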

Whether it's a DeFi automated market maker that requires price and liquidity information, or a social application that needs to update posts from new users, the ability to quickly retrieve data is crucial for the normal functioning of the application. With the help of an indexer, they can efficiently and accurately retrieve data, thereby providing a smooth user experience.

The indexer provides a way to extract specific data from raw blockchain data. This offers the opportunity for more specific data analysis, thereby providing comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and which tokens incur fees, thus deciding whether to list these tokens as perpetual contracts on their platform. Developers of decentralized exchanges can create dashboards for their products to gain insights into which liquidity pools offer the highest returns or strongest liquidity. They can also create public dashboards, allowing developers to freely and flexibly query any type of data they wish to display on the charts.
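The kind of dashboard query described above amounts to grouping indexed records and ranking the result. The sketch below assumes hypothetical swap records with `pool`, `volume_usd`, and `fee_usd` fields. The pool names and numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical swap records as an indexer might return them.
SWAPS = [
    {"pool": "ETH/USDC", "volume_usd": 1200.0, "fee_usd": 3.6},
    {"pool": "WBTC/ETH", "volume_usd": 5000.0, "fee_usd": 15.0},
    {"pool": "ETH/USDC", "volume_usd": 800.0, "fee_usd": 2.4},
    {"pool": "DAI/USDC", "volume_usd": 300.0, "fee_usd": 0.3},
]

def summarize_pools(swaps):
    """Aggregate volume, fees, and swap count per pool, ranked by volume (highest first)."""
    totals = defaultdict(lambda: {"volume_usd": 0.0, "fee_usd": 0.0, "swaps": 0})
    for swap in swaps:
        entry = totals[swap["pool"]]
        entry["volume_usd"] += swap["volume_usd"]
        entry["fee_usd"] += swap["fee_usd"]
        entry["swaps"] += 1
    return sorted(totals.items(), key=lambda item: item[1]["volume_usd"], reverse=True)

ranking = summarize_pools(SWAPS)
for pool, stats in ranking:
    print(f"{pool}: volume ${stats['volume_usd']:,.0f}, {stats['swaps']} swaps")
```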

As there are multiple blockchain indexers available, identifying the differences between indexing protocols is crucial to ensure that developers choose the indexer that best fits their needs.

The Graph is the first indexing protocol launched on Ethereum, allowing easy access to previously hard-to-reach transaction data. It uses subgraphs to define and filter subsets of data collected from the blockchain. Indexers stake the native GRT token to provide indexing and query services and submit proofs of indexing for their work, while delegators can delegate their tokens to indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the best query fees.
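Subgraph data on The Graph is queried with GraphQL. The sketch below builds the request body for such a query without sending it. The `transfers` entity and its fields belong to a hypothetical subgraph schema and are assumptions, as is the helper name. The `first`, `orderBy`, `orderDirection`, and `where` arguments are standard in The Graph's GraphQL API.

```python
import json

# Query against a hypothetical subgraph that defines a `transfers` entity.
QUERY = """
query RecentTransfers($recipient: String!, $limit: Int!) {
  transfers(
    first: $limit
    orderBy: blockNumber
    orderDirection: desc
    where: { to: $recipient }
  ) {
    id
    from
    to
    amount
    blockNumber
  }
}
"""

def build_subgraph_request(recipient, limit=5):
    """Return the JSON body a client would POST to a subgraph's query endpoint."""
    return json.dumps({
        "query": QUERY,
        "variables": {"recipient": recipient, "limit": limit},
    })

body = build_subgraph_request("0x" + "ab" * 20)
print(body[:80], "...")
```

A single request like this returns filtered, sorted, and shaped results, which is the work a client would otherwise do itself after many raw RPC calls.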

Its infrastructure brings the average cost to about $40 per million queries, which is much lower than the cost of self-hosted nodes. Using file data sources, it also supports parallel indexing of on-chain and off-chain data for efficient data retrieval.
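To put the per-million figure in perspective, a quick back-of-the-envelope estimate helps. The sketch below uses the $40 per million queries quoted in the text. The traffic level of 500,000 queries per day is an assumed workload chosen purely for illustration.

```python
USD_PER_MILLION_QUERIES = 40.0  # average figure quoted in the text

def monthly_query_cost(queries_per_day, days=30, usd_per_million=USD_PER_MILLION_QUERIES):
    """Estimated spend for a steady query load over a period of `days`."""
    total_queries = queries_per_day * days
    return total_queries * usd_per_million / 1_000_000

# Assumed workload: 500,000 queries per day for 30 days = 15 million queries.
print(monthly_query_cost(500_000))  # 600.0
```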

The rewards for The Graph's indexers have steadily increased over the past few quarters. This is partly due to higher query volume, but also to a rise in the token price following the project's plans to integrate AI-assisted queries in the future.

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data while being protected by zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing data from a specific subset of blocks, speeding up the data retrieval process by quickly identifying the nodes that hold the required data.
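The routing idea described here, where each node holds a subset of blocks and a query is directed to the nodes that hold the requested range, can be sketched as a lookup over sorted block ranges. This is a simplified illustration, not Subsquid's actual scheduling logic: the assignment table, the worker names, and both function names are assumptions, and replication of the same range across several workers is ignored.

```python
import bisect

# Hypothetical assignment: (first_block, last_block, worker_id), sorted, non-overlapping.
ASSIGNMENTS = [
    (0, 999_999, "worker-a"),
    (1_000_000, 1_999_999, "worker-b"),
    (2_000_000, 2_999_999, "worker-c"),
]
STARTS = [first for first, _, _ in ASSIGNMENTS]

def worker_for_block(block):
    """Binary-search the range starts to find the worker holding a single block."""
    i = bisect.bisect_right(STARTS, block) - 1
    if i < 0:
        raise LookupError(f"no worker holds block {block}")
    first, last, worker = ASSIGNMENTS[i]
    if block > last:
        raise LookupError(f"no worker holds block {block}")
    return worker

def workers_for_range(from_block, to_block):
    """All workers whose ranges overlap [from_block, to_block], in block order."""
    return [
        worker
        for first, last, worker in ASSIGNMENTS
        if first <= to_block and last >= from_block
    ]

print(worker_for_block(1_500_000))            # worker-b
print(workers_for_range(900_000, 1_100_000))  # ['worker-a', 'worker-b']
```

Because a query only touches the workers that hold the relevant blocks, adding workers increases capacity without every node having to store or scan the full history.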

Subsquid also supports real-time indexing, allowing indexing before the block is finalized. It supports storing data in a format chosen by the developer, making it easier to analyze using various tools. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.

During its testnet phase, Subsquid reported impressive statistics, with over 80,000 testnet users, more than 60,000 Squid indexers deployed, and over 20,000 verified developers on the network. On June 3, Subsquid launched the mainnet of its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analysis, ZK/TEE co-processors, AI agents, and Oracles.

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It initially supported the Polkadot and Substrate networks and has since expanded to over 200 chains. It operates much like The Graph: indexers index data and serve query requests, backed by proofs of indexing, and delegators delegate their tokens to indexers. However, instead of relying on curators, it has consumers submit purchase orders, which guarantees indexers' income.

It will introduce SubQuery data nodes that support sharding to prevent continuous synchronization of new data between each node, thereby optimizing query efficiency and moving towards greater decentralization. Users can choose to pay a computational fee of approximately 1 SQT token for every 1000 requests, or set custom fees for indexers through the protocol.

SubQuery launched its token earlier this year, and since then the issuance rewards for nodes and delegators have increased month over month in USD terms, which also reflects a steady increase in the query services offered on its platform. Since the token generation event, the total amount of staked SQT has grown from 6 million to 125 million, highlighting the growth in network participation.

Covalent is a decentralized indexing network in which a network of Block Specimen Producers creates copies of blockchain data through bulk exports and publishes proofs of them on the Covalent L1 blockchain. Block Results Producer nodes then refine this data according to set rules, extracting the data that meets the requirements.

Through a unified API, developers can easily extract relevant blockchain data in a consistent request and response format without having to write custom complex queries to access the data. The CQT token, which can be settled on a specific blockchain, can be used as a payment method to retrieve these pre-configured datasets from network operators.

The rewards from Covalent seem to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the rise in the price of the Covalent token CQT.

When choosing an indexer, the following factors need to be considered:

Customizability of Data: Some indexers are generic indexers that provide standard pre-configured datasets via API. While they may be fast, they lack the flexibility needed for developers who require custom datasets. Using an indexer framework allows for more customized data processing to meet specific application needs.

Security: Indexed data must be secure; otherwise, applications built on these indexers are also vulnerable to attacks. All of the indexers above secure their networks to some degree through token staking, and some solutions additionally use proofs to further strengthen security.

Speed and Scalability: As the blockchain continues to grow, the volume of transactions increases, making it more cumbersome to index large amounts of data. Maintaining efficiency becomes more challenging, but indexing protocols introduce solutions to meet these growing demands.

Supported Networks: Although most blockchain activity still takes place on Ethereum, other blockchains are gaining popularity over time. An indexing protocol that supports chains other protocols do not cover can capture more market share and fees.

Although indexers are already widely adopted in decentralized application development, their potential remains large, especially when combined with artificial intelligence. As AI continues to proliferate in Web2 and Web3, its ability to improve depends on access to relevant data.
