Introduce Pagination

 Note: This is a preview release and subject to change. Feedback welcome! Contact Information and Background (PDF)

also known as: Slice Response Message, Paginate Responses

Context and Motivation

An API operation returns a large sequence of data elements. For example, such a sequence may enumerate posts in a social media site or list products in an e-commerce shop. The API clients are interested in all data elements in the sequence, but have reported that processing a large amount of data at once is challenging for them.

As the API provider, I want to return data sets in manageable chunks so that clients are not overwhelmed by huge amounts of data.

Stakeholder Concerns (including Quality Attributes and Design Forces)

#performance
Transferring all data elements at once can lead to huge response messages that burden receiving clients and the underlying infrastructure (i.e., network and application frameworks as well as the database) with high workload. For instance, single page applications that receive several megabytes of JSON might freeze until the data has been decoded.
#data-access-characteristics
In principle, the client wants to access all data elements, but not all of these have to be received at once or every time. For example, older posts to a social media site are less relevant than recent ones and can be retrieved separately.

Initial Position Sketch

The API provider currently returns a large sequence of data elements in the response messages of the operation [1]:

This refactoring targets an API operation and its request and response messages.

Smells / Drivers

High latency/poor response time
Responses take a long time to arrive at the client because a lot of data has to be assembled/transmitted. This might become evident in a provider-side log file analysis or from client-side metrics.

Instructions (Steps)

  1. Decide on the variant of Pagination that best fits your API: Page-Based, Offset-Based, Cursor-Based, or Time-Based Pagination. Clients request the data differently in each variant, so the first step is to choose one. See the Pagination pattern for details on the variants and their pros and cons.
  2. All variants involve certain metadata, so if the current response message is a direct representation of the underlying domain model elements, possibly contained in a list, wrap the structure in a Data Transfer Object (DTO) first by applying the Introduce Data Transfer Object refactoring.
  3. Add response attributes to the DTO (or specification) to hold the metadata required for Pagination (for instance, page size, page number, and total number of pages for the Page-Based pattern variant).
  4. Adjust the expected parameters in the request message to give the client control over the number of results returned. Provide default values so that existing clients continue to work.
  5. Enhance the unit and integration tests to check for these additional attributes.
  6. Update API description, sample code, tutorials, etc. with the information about Pagination (for instance, variant, metadata syntax and semantics, session management concerns). Increase the version number as suggested under Semantic Versioning.
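
Steps 2 to 4 can be sketched as follows for the Page-Based variant. This is a minimal illustration in Python, not code from any particular service: the names CustomerPage, get_customers, page_number, and page_size are hypothetical, and the default (everything in one page when no page size is given) is one assumed way to keep existing clients working.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class CustomerPage:
    """Hypothetical DTO: wraps the data elements and adds Page-Based metadata."""
    page_number: int
    page_size: int
    total_pages: int
    customers: list


def get_customers(all_customers, page_number=0, page_size=None):
    """Return one page of customers (hypothetical operation, zero-based pages)."""
    # Assumed default: without a page size, return everything as one page,
    # so clients that do not send the new parameters still get all elements.
    if page_size is None:
        page_size = max(len(all_customers), 1)
    if page_size < 1 or page_number < 0:
        raise ValueError("page_size must be >= 1 and page_number >= 0")
    total_pages = ceil(len(all_customers) / page_size)
    start = page_number * page_size
    return CustomerPage(
        page_number=page_number,
        page_size=page_size,
        total_pages=total_pages,
        customers=all_customers[start:start + page_size],
    )
```

The unit tests from Step 5 would then assert on page_number, page_size, and total_pages in addition to the returned elements.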

Target Solution Sketch (Evolution Outline)

After the refactoring, the client indicates the desired amount and position of data in its request messages (depending on the Pagination variant). In the following figure, this metadata (the number of elements, the offset of the desired first data element, and so on) is represented by the Metadata Element.

More but smaller messages are exchanged after the refactoring has been applied.
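
From the client's perspective, "more but smaller messages" means a loop over chunks. The following Python sketch assumes an offset-style variant; fetch_page stands in for the actual remote call, and the response keys total and elements are hypothetical placeholders for whatever metadata the chosen variant defines.

```python
def fetch_all(fetch_page, limit=2):
    """Collect all data elements by requesting one chunk after another.

    fetch_page(limit, offset) is a hypothetical callable that performs the
    request and returns a dict with the keys "total" and "elements".
    """
    elements = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        elements.extend(page["elements"])
        offset += limit
        # Stop once the offset has passed the total, or the page was empty.
        if offset >= page["total"] or not page["elements"]:
            break
    return elements


# In-memory stand-in for the remote operation, for illustration only.
DATA = list(range(1, 8))


def fake_fetch_page(limit, offset):
    return {"total": len(DATA), "elements": DATA[offset:offset + limit]}


print(fetch_all(fake_fetch_page, limit=3))  # [1, 2, 3, 4, 5, 6, 7]
```

A client that only needs the most recent elements would simply stop after the first chunk instead of looping to the end.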

Example(s)

In this example, we will add Offset-Based Pagination to the Customer Core service of the Lakeside Mutual sample application. The customers endpoint in this service returns a list of customer representations:

$ curl http://localhost:8110/customers
[ {
  "customerId" : "bunlo9vk5f",
  "firstname" : "Ado",
  "lastname" : "Kinnett",
  ...
}, {
  "customerId" : "bd91pwfepl",
  "firstname" : "Bel",
  "lastname" : "Pifford",
  ...

Note that the response is a JSON array of objects. To transmit the Pagination metadata, we first wrap the response in a JSON object, with a customers property to hold the entities:

$ curl http://localhost:8110/customers
{
  "customers" : [ {
    "customerId" : "bunlo9vk5f",
    "firstname" : "Ado",
    "lastname" : "Kinnett",
    ...
  }, {
    "customerId" : "bd91pwfepl",
    "firstname" : "Bel",
    "lastname" : "Pifford",
    ...

Unfortunately, this change breaks backward compatibility because existing clients expect a JSON array at the top level. This is why API guidelines (e.g., from Zalando) recommend always returning an object as the top-level data structure.

With the basic structure in place, we may now add HTTP query parameters (limit, offset) and return the Pagination metadata (limit, offset, size) in our response. Here is a request for the next chunk of elements (including the JSON response to it):

$ curl 'http://localhost:8110/customers?limit=2&offset=2'
{
  "limit" : 2,
  "offset" : 2,
  "size" : 50,
  "customers" : [ {
    "customerId" : "qpa66qpilt",
    "firstname" : "Devlin",
    "lastname" : "Daly",
    ...
  }, {
    "customerId" : "en2fzxutxm",
    "firstname" : "Dietrich",
    "lastname" : "Cordes",
    ...
  } ]
}
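
The provider-side handling of the two query parameters might look roughly like the following Python sketch. It is not the Lakeside Mutual implementation: the function name paginate_request and the default limit of 10 are assumptions, and size is interpreted here as the total number of customers, which the sample response suggests but does not state.

```python
from urllib.parse import parse_qs, urlsplit


def paginate_request(url, customers, default_limit=10):
    """Build an Offset-Based response for the given request URL (hypothetical)."""
    query = parse_qs(urlsplit(url).query)
    # Fall back to defaults when the client omits the parameters.
    limit = int(query.get("limit", [default_limit])[0])
    offset = int(query.get("offset", [0])[0])
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset >= 0")
    return {
        "limit": limit,
        "offset": offset,
        "size": len(customers),  # assumption: total number of elements
        "customers": customers[offset:offset + limit],
    }
```

Called with the URL from the request above and a list of 50 customers, this returns the third and fourth customer together with limit 2, offset 2, and size 50.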

For the full Spring Boot implementation, including HATEOAS links and filtering, see the Lakeside Mutual repository.

Hints and Pitfalls to Avoid

  • The data elements returned by the operation typically have an identical structure, as in our example above, but Pagination can also be used if the structures of the individual data elements differ from each other. If the structure of the response is not repetitive, the Extract Information Holder refactoring offers an alternative solution to reduce the amount of data to be transferred.
  • When already following the API best practice of always returning an object as top-level data structure, it is possible to implement Pagination in a backwards-compatible manner by returning all results as a single, large page.
  • The order of elements should always be specified when implementing Pagination. Otherwise, clients might receive inconsistent results.
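
To illustrate the last hint: if the underlying store returns rows in an arbitrary order, two requests for the same page can yield different elements. A minimal Python sketch, assuming an in-memory list and a sort key of lastname with customerId as tie-breaker (both the function name page_of and the choice of key are hypothetical):

```python
def page_of(customers, limit, offset):
    """Slice a page from a deterministically ordered view of the data."""
    # customerId as tie-breaker keeps the order total even for equal last names.
    ordered = sorted(customers, key=lambda c: (c["lastname"], c["customerId"]))
    return ordered[offset:offset + limit]


rows = [
    {"customerId": "bunlo9vk5f", "lastname": "Kinnett"},
    {"customerId": "bd91pwfepl", "lastname": "Pifford"},
    {"customerId": "qpa66qpilt", "lastname": "Daly"},
    {"customerId": "en2fzxutxm", "lastname": "Cordes"},
]

# The same page results no matter in which order the rows arrive.
first_page = page_of(rows, limit=2, offset=0)
same_page = page_of(list(reversed(rows)), limit=2, offset=0)
print([c["lastname"] for c in first_page])  # ['Cordes', 'Daly']
print(first_page == same_page)              # True
```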

If the API deployment infrastructure involves load balancers and failover/standby configurations, keep the following in mind [2]:

  • The request for a follow-up page (Step 3 of the “Target Solution Sketch”) could go to a different service instance than the initial request. In that case, the second instance has to perform another database request to retrieve the second page. However, the data of that second page could have changed in the repository between the two page requests, so this approach only works for data that is static or rarely changes.
  • Data consistency/transaction mechanism: Assuming we are dealing with highly dynamic repository data (in our case, the backend database is constantly changing), we need to either make sure that all page requests reach the same service instance that initially retrieved the data from the database (effectively making the service stateful), or develop a caching mechanism in the repository so that data changes between page requests do not cause data inconsistencies in the client.
  • If the service instance fails between two page requests (assuming the service is now stateful and a routing rule directs each page request to the same instance), the provider has to notify the client that pagination has failed entirely, and the client must then retrieve the first page again.
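
The effect of repository changes between two page requests can be reproduced with a small simulation. The Python sketch below assumes Offset-Based Pagination over a plain list without any snapshot or caching; get_page and read_two_pages are hypothetical helper names.

```python
def get_page(repository, limit, offset):
    """Offset-based page over the current repository state (no snapshot)."""
    return repository[offset:offset + limit]


def read_two_pages(repository, change):
    """Fetch page 1, apply a concurrent change, then fetch page 2."""
    first = get_page(repository, limit=2, offset=0)
    change(repository)  # simulates another client writing between the requests
    second = get_page(repository, limit=2, offset=2)
    return first + second


# A new element is inserted at the front: "d" is delivered twice.
with_insert = read_two_pages(["c", "d", "e", "f"], lambda repo: repo.insert(0, "b"))
print(with_insert)  # ['c', 'd', 'd', 'e']

# The first element is deleted: "e" is never delivered.
with_delete = read_two_pages(["c", "d", "e", "f"], lambda repo: repo.remove("c"))
print(with_delete)  # ['c', 'd', 'f']
```

Both outcomes are data inconsistencies from the client's point of view, which is why the hints above suggest either sticky routing to a stateful instance or a caching mechanism for highly dynamic data.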

The Introduce Data Transfer Object refactoring prepares request and response messages for the introduction of the pagination metadata.

The Microservice API Patterns describe Pagination and its variants in detail and point to additional information.

  1. Note that the icons in the response message represent Data Elements.

  2. Thanks to Andrei Furda for suggesting this advice.