Pagination Strategies for Large Datasets and Why Offset Pagination Fails
Offset pagination — the pattern where a consumer requests a page by specifying how many records to skip — is the default choice for most APIs because it maps naturally to SQL’s LIMIT and OFFSET clauses and allows consumers to request any page directly by number. It is also the pagination strategy that fails most visibly at scale and produces the most confusing behavior when underlying data changes between page requests.
The failure modes of offset pagination are well understood and predictable. At large offsets, the database must scan and discard millions of records to find the starting position of the requested page — an operation whose cost grows linearly with the page number regardless of page size. An API that performs acceptably when a consumer requests page 1 may time out when the same consumer requests page 10,000 of the same dataset.
The Consistency Problem
Offset pagination produces inconsistent results when the underlying dataset changes between page requests. If a consumer fetches page 1 of a result set and a new record is inserted at the beginning of the result ordering before page 2 is fetched, the record that was at position 10 on page 1 will also appear at position 1 on page 2. The consumer receives duplicate records across pages without any indication that duplication has occurred.
The inverse problem — records disappearing between pages — occurs when records are deleted. A record at position 15 that was on page 2 at the time of the page 1 request may no longer exist when page 2 is fetched, causing the consumer to skip a record that existed when the traversal began.
For read-heavy, relatively static datasets where the consistency problem is rare or tolerable, offset pagination’s simplicity may be acceptable. For datasets with frequent writes or where consumers are processing all records and cannot tolerate gaps or duplicates, offset pagination is a poor choice regardless of its implementation simplicity.
Cursor-Based Pagination
Cursor pagination addresses both the performance and consistency problems of offset pagination. Instead of specifying a position by offset, the consumer specifies a cursor — an opaque value that encodes the position in the result set relative to a specific record. The most common cursor implementation encodes the sort key value of the last record on the previous page, allowing the database to seek directly to the starting position of the next page using an index rather than scanning and discarding rows.
A cursor-based query that uses the record’s ID or timestamp as the seek key executes in O(log n) time regardless of how many pages have been fetched — the same performance as a single-record lookup. This makes cursor pagination uniformly performant across the entire dataset rather than progressively slower for deeper pages.
The consistency guarantee is stronger than offset pagination’s: a cursor encodes a position relative to a specific record, so insertions before the current position do not cause that record to be shifted into a previously returned page. The cursor traversal is stable relative to the records that existed when it began.
The limitation is that cursor pagination does not support random access — a consumer cannot jump directly to page 47 without traversing pages 1 through 46. This limitation is acceptable for most API use cases, where consumers process pages sequentially, and is a real constraint for use cases that require non-sequential access.
Keyset Pagination
Keyset pagination is a specific implementation of cursor pagination where the cursor is the value of the sort key column rather than an encoded position. A consumer requests records with created_at > 2026-01-15T10:30:00Z rather than specifying a page number or an opaque cursor. The query uses the index on created_at directly. Performance is consistent across all pages.
The advantage over opaque cursors is transparency — the consumer can construct the next page request without the API providing a cursor value, because the sort key value is present in the last record of each page. The disadvantage is that the pagination is tied to the sort key schema, which is an implementation detail that the consumer should not need to depend on.
The Right Choice by Use Case
Small, static, user-navigable datasets where random access to arbitrary page numbers is valuable: offset pagination is acceptable. The failure modes at scale do not apply to small datasets and the user experience benefit of direct page navigation is real.
Large, dynamic datasets where consumers process records sequentially — data exports, feed readers, event log processing: cursor or keyset pagination. The performance and consistency properties are essential at this scale and the random access limitation does not matter for sequential processing.
Any dataset large enough that deep offset queries would be slow: cursor or keyset pagination, regardless of whether random access is needed. The performance cliff of offset pagination at large offsets is a production reliability issue that the cursor approach prevents by design.