Extract Information Holder

Updated: October 7, 2024 Published: EuroPLoP 2024

also known as: Extract Resource Representation, Provide Information Holder Link, Share Load, Split Load

Context and Motivation

An API operation returns multiple related, possibly deeply nested data structures to provide clients with a rich dataset in a single response. We call such data elements Embedded Entities [Zimmermann et al. 2020]. For example, in an e-commerce application, the request for the profile of a customer might also return their complete purchasing history. This API design is very convenient for clients that require all the information at once. However, it might not suit all use cases; some API clients might prefer to retrieve selected purchasing data through subsequent individual requests, only when they need it.

As an API client, I prefer to retrieve related data elements step by step rather than process large structured data sets in a single response message, so that I can process individual responses, and the data in them, quickly and on demand.

Stakeholder Concerns

#performance, #green-software: Assembling, transferring, and processing a response consumes resources on both the provider and the client side. These resources should not be wasted but used with care, respecting the environment and the energy consumed. Bandwidth and computing power are examples of valuable and costly resources.
#evolvability, #coupling: Systems and components evolve at different speeds. Hence, they should not depend on each other unless this is justified by the business requirements. Data dependencies often introduce unwanted coupling that is difficult to detect and resolve.
#data-currentness: Data returned by an API might age at different rates. In the e-commerce shop scenario, for instance, the master data of customers (e.g., names, shipping addresses) will change less frequently than transactional data (such as orders). API clients might want to cache some of the data retrieved, which is harder if faster-changing data is embedded in slower-changing data.
#security: Not all API clients have the same access privileges. More fine-grained data Retrieval Operations make it easier to enforce related controls and rules, avoiding the risk that restricted data “slips through” accidentally. To revisit the e-commerce scenario, what if the shop software also includes public ratings of products that show the name and picture of the rating customer? Here, only limited and carefully selected information about the customer should be returned.
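To make the #security concern more concrete, here is a minimal Python sketch, assuming an in-memory data store and a hypothetical "support" role; the function names, fields, and roles are illustrative and not part of the pattern. The public rating view and the purchase history are served by separate Retrieval Operations, so each one can enforce its own access rule, and restricted data cannot slip into the public view by accident.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical in-memory data; names, fields, and roles are illustrative assumptions.
PROFILES = {"c1": {"id": "c1", "name": "Ada", "picture": "ada.png", "email": "ada@example.org"}}
PURCHASES = {"c1": [{"orderId": "o1", "total": 42.0}]}


@dataclass(frozen=True)
class Caller:
    customer_id: Optional[str]  # set when the caller is a logged-in customer
    roles: FrozenSet[str]


def get_rater_info(customer_id: str) -> dict:
    """Public Retrieval Operation for product ratings: name and picture only."""
    profile = PROFILES[customer_id]
    return {"name": profile["name"], "picture": profile["picture"]}


def get_purchase_history(caller: Caller, customer_id: str) -> list:
    """Separate Retrieval Operation guarded by its own, stricter access rule."""
    if caller.customer_id != customer_id and "support" not in caller.roles:
        raise PermissionError("caller may not read this purchase history")
    return PURCHASES[customer_id]
```

Because the rater view never embeds the purchase history, no field-level filtering is needed there; the access decision for orders lives in exactly one place.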

Initial Position Sketch

The API implementation shown in Figure 1 returns Data Elements that contain further nested data.


Figure 1: Extract Information Holder: Initial Position Sketch: An API provider responds to a request from a client (1) with a message (2) that contains several, possibly nested, Data Elements. The client does not require all the received data.

The refactoring targets response messages in API operations that return rich data representation elements.
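The initial position can be sketched in Python as follows, assuming JSON-like dictionaries and hypothetical sample data for the customer profile scenario. The operation always embeds the complete purchase history, whether or not the client needs it.

```python
import json

# Hypothetical sample data standing in for the provider's backends.
CUSTOMERS = {"c1": {"id": "c1", "name": "Ada"}}
ORDERS = {"c1": [{"orderId": f"o{i}", "total": 10.0 * i} for i in range(1, 101)]}


def get_customer_profile(customer_id: str) -> dict:
    """Initial position: the purchase history is an Embedded Entity."""
    profile = dict(CUSTOMERS[customer_id])  # copy, so the sample data stays intact
    profile["purchaseHistory"] = ORDERS[customer_id]  # the whole history, every time
    return profile


response = get_customer_profile("c1")
payload = json.dumps(response)
# A client that only displays the customer's name still receives all 100 orders.
print(response["name"], len(response["purchaseHistory"]), len(payload))
```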

Design Smells

God endpoint: The endpoint offering this operation might have to access many data sources or backend systems to assemble the response. Derived from the “God Class” smell in object-oriented design, the term describes a class or an object that controls numerous other system parts [Riel 1996]. Many such dependencies on external systems and data make the API implementation harder to operate and evolve.
Data lifetime mismatches: Conflating Data Elements with different lifetimes makes caching, especially cache invalidation, harder. This may happen when slow-changing master data contains fast-changing transactional data (for example, in an Operational Data Holder), but also when transactional data that clients refresh often contains embedded master data that changes infrequently.
Overfetching: Clients throw away parts of the received data because the API design follows a one-size-fits-all approach, and the provider includes all data in responses that any present or future client might be interested in. For example, in an e-commerce API, product procurement information might only interest a few clients, while most want to learn about current prices and items in stock.
Sell what is on the truck: Implementation data is exposed just because it is there, without any client-side use case.
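The data lifetime mismatch can be quantified with a small Python sketch, assuming illustrative freshness requirements per Data Element (the values are not taken from any real system). A combined response can only be cached as long as its most volatile part allows, so embedding transactional data shortens the cache lifetime of the master data as well.

```python
from datetime import timedelta
from typing import Iterable

# Assumed freshness requirements per Data Element; the values are illustrative only.
MAX_AGE = {
    "profile": timedelta(days=1),  # slow-changing master data
    "purchaseHistory": timedelta(minutes=5),  # fast-changing transactional data
}


def max_age_for(parts: Iterable[str]) -> timedelta:
    """A cached response may only be kept as long as its most volatile part allows."""
    return min(MAX_AGE[part] for part in parts)


embedded = max_age_for(["profile", "purchaseHistory"])  # one combined response
profile_only = max_age_for(["profile"])  # after extracting the purchase history
history_only = max_age_for(["purchaseHistory"])
```

With the Embedded Entity, clients have to refetch the profile every five minutes; once the purchase history is extracted, the profile can be cached for a day while only the history is refreshed frequently.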

Instructions

As a preparation for the refactoring, make sure that the following preconditions are met:

Decide which parts of the message to extract. See the Embedded Entity and Linked Information Holder patterns for advice [Zimmermann et al. 2022].
Ensure the API offers a dedicated Retrieval Operation for the data that is currently embedded and will be extracted. If this is not already the case, apply the Split Operation refactoring first. An Extract Operation or Segregate Commands from Queries refactoring might also be appropriate to avoid the god endpoint smell.
(Optional) If the API operation does not already use a dedicated Data Transfer Object (DTO), apply the Introduce Data Transfer Object refactoring to decouple the API response message from the internal data model. The presence of a DTO allows changing the response message structure without affecting the internal data model. Depending on how deeply the Embedded Entity is nested in the response data structure, the Introduce Data Transfer Object refactoring may have to be applied several times. If your programming language or framework does not require this step, you can skip it, as long as you have some other means to modify the response message structure.
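The DTO precondition can be illustrated with a minimal Python sketch, assuming hypothetical domain classes (Customer, PurchaseOrder) and a plain-dictionary DTO; neither the class names nor the field names come from the refactoring itself. The mapper is the single place where the response structure is defined, so the message can later change without touching the domain model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PurchaseOrder:  # internal domain model (hypothetical)
    order_id: str
    total_cents: int


@dataclass
class Customer:  # internal domain model (hypothetical)
    customer_id: str
    full_name: str
    orders: List[PurchaseOrder] = field(default_factory=list)


def to_profile_dto(customer: Customer) -> dict:
    """Maps the domain model to the response message (CustomerProfileDTO).

    Message field names are chosen independently of the domain model,
    so either side can change without breaking the other.
    """
    return {
        "id": customer.customer_id,
        "name": customer.full_name,
        # Nested mapping: the Embedded Entity gets its own DTO as well.
        "purchaseHistory": [
            {"orderId": o.order_id, "total": o.total_cents / 100}
            for o in customer.orders
        ],
    }
```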

Replace an Embedded Entity with a Linked Information Holder in the following steps:

Add a Link Element to the response message that points clients to a Retrieval Operation in an Information Holder Resource. This link realizes the Linked Information Holder pattern; if a DTO is present, place the link in it.
Adjust the tests to the new response structure and run them to observe the changed responses.
(Optional) Deprecate or remove the Embedded Entity in the original response message.
Clean up the implementation code. For example, services, utilities, or repositories previously used to retrieve the embedded data might not be required anymore here; hence, they should either be moved or removed.
Check security policies to ensure that clients can access the linked data.
Adjust API clients under your control to issue additional API calls to retrieve the data available at the endpoint referenced in the new Link Element as needed.
Update API Description [Lübke et al. 2019a], version number, sample code, tutorials, etc., as required. API directories and gateways might have to be updated as well.
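The first two steps can be sketched in Python as follows, assuming a plain-dictionary DTO and a hypothetical URI template /customers/{id}/purchase-history; the refactoring does not prescribe any particular URI layout. The mapper now emits a Link Element instead of the Embedded Entity, and the adjusted test checks the new response structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:  # hypothetical internal domain model
    customer_id: str
    full_name: str
    orders: List[str] = field(default_factory=list)


# Assumed URI template for the new Retrieval Operation.
PURCHASE_HISTORY_TEMPLATE = "/customers/{id}/purchase-history"


def to_profile_dto(customer: Customer) -> dict:
    """Step 1: replace the Embedded Entity with a Link Element."""
    return {
        "id": customer.customer_id,
        "name": customer.full_name,
        "purchaseHistory": PURCHASE_HISTORY_TEMPLATE.format(id=customer.customer_id),
    }


def test_profile_links_to_purchase_history() -> None:
    """Step 2: the adjusted test checks the new response structure."""
    dto = to_profile_dto(Customer("c1", "Ada", ["o1", "o2"]))
    assert dto["purchaseHistory"] == "/customers/c1/purchase-history"
    assert not isinstance(dto["purchaseHistory"], list)  # no longer embedded


test_profile_links_to_purchase_history()
```

Note that the domain model still holds the orders; only the DTO changed, which is what the optional Introduce Data Transfer Object precondition makes possible.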

Target Solution Sketch (Evolution Outline)

The client can use the Link Element returned in response to the initial request to retrieve the related data in a follow-up call, as shown in Step 3 in Figure 2.

Figure 2: Extract Information Holder: Target Solution Sketch: An API client requests (1) a resource from a provider, which responds with a message (2) containing a Linked Information Holder. The client can then request (3) this data when it needs it. The provider responds (4) with a Data Element that was embedded in the response in the Initial Position Sketch.
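The client side of the target solution can be sketched in Python, assuming a hypothetical fake transport (a dictionary of paths to payloads) in place of real HTTP calls. The client issues the follow-up request (3) only when the purchase history is actually needed and reuses the result afterwards.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical provider responses keyed by path; stands in for HTTP calls.
FAKE_API: Dict[str, object] = {
    "/customers/c1": {"id": "c1", "name": "Ada",
                      "purchaseHistory": "/customers/c1/purchase-history"},
    "/customers/c1/purchase-history": [{"orderId": "o1", "total": 42.0}],
}


class CustomerProfileClient:
    """Fetches the profile first and follows the link only when needed."""

    def __init__(self, get: Callable[[str], object]) -> None:
        self._get = get
        self.profile: dict = {}
        self._history: Optional[List[dict]] = None

    def load(self, customer_id: str) -> None:
        self.profile = self._get(f"/customers/{customer_id}")  # request (1), response (2)

    def purchase_history(self) -> List[dict]:
        if self._history is None:  # follow-up request (3), response (4)
            self._history = self._get(self.profile["purchaseHistory"])
        return self._history


calls: List[str] = []


def fake_get(path: str) -> object:
    calls.append(path)
    return FAKE_API[path]


client = CustomerProfileClient(fake_get)
client.load("c1")
print(client.profile["name"], len(calls))  # prints: Ada 1
client.purchase_history()
client.purchase_history()  # served from the client-side cache
print(len(calls))  # prints: 2
```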

To reap the full benefits of this refactoring, backward compatibility has to be given up. As a first step, the Embedded Entity could be marked as deprecated to give clients time to adjust. At a time defined and announced when applying the refactoring, the Embedded Entity is removed from the message payload. The Limited Lifetime Guarantee pattern in Lübke et al. [2019b] describes this lifecycle management strategy in detail.
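The transition period can be sketched in Python, assuming a hypothetical announced removal date and a new field name (purchaseHistoryLink) for the Link Element; while the deprecated Embedded Entity is still delivered, the link cannot reuse its field name without breaking existing clients. Both the date and the field name are illustrative assumptions.

```python
from datetime import date
from typing import List

# Hypothetical removal date announced when the refactoring was applied.
EMBEDDED_HISTORY_REMOVAL = date(2025, 6, 30)


def to_profile_dto(customer_id: str, name: str, orders: List[dict], today: date) -> dict:
    dto = {
        "id": customer_id,
        "name": name,
        # New Link Element under a new name, so old clients keep working.
        "purchaseHistoryLink": f"/customers/{customer_id}/purchase-history",
    }
    if today < EMBEDDED_HISTORY_REMOVAL:
        # Transition period: keep the deprecated Embedded Entity for old clients.
        dto["purchaseHistory"] = orders
    return dto
```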

Example(s)

The following API Description shows an endpoint to retrieve a CustomerProfileDTO, which includes the Embedded Entity PurchaseOrderDTOs.

API description ECommerceAPI

data type CustomerProfileId {"id": ID<string>}

data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "name": Data<string>,
  <<Embedded_Entity>> "purchaseHistory": PurchaseOrderDTO*
} 

data type PurchaseOrderDTO "DTODesignToBeContinued" 

endpoint type CustomerProfileEndpoint 
serves as INFORMATION_HOLDER_RESOURCE
exposes 
  operation getCustomerProfile 
    with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload CustomerProfileDTO
    
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint

API client ECommerceClient
  consumes CustomerProfileEndpoint

This example uses the MDSL notation introduced in Zimmermann et al. [2022].

Having applied the refactoring, the client will now receive a link (notice the purchaseHistory link in CustomerProfileDTO):


data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "name": Data<string>,
---  <<Embedded_Entity>> "purchaseHistory": PurchaseOrderDTO*
+++  <<Linked_Information_Holder>> 
+++    "purchaseHistory": Link<string>
} 

data type PurchaseOrderDTO "DTODesignToBeContinued"

+++ endpoint type PurchaseHistoryEndpoint
+++ serves as INFORMATION_HOLDER_RESOURCE
+++ exposes
+++   operation getPurchaseHistory
+++     with responsibility RETRIEVAL_OPERATION
+++     expecting payload CustomerProfileId
+++     delivering payload PurchaseOrderDTO*
    
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint 
+++   offers PurchaseHistoryEndpoint

Hints and Pitfalls to Avoid

A comparison of the Target Solution Sketch in Figure 2 with the Initial Position Sketch in Figure 1 shows that the first resource now accesses fewer repositories to assemble its response message. This enables further architectural refactorings such as Split Application Backend.

Monitor the API to maintain and challenge the rationale for pattern usage. If most or all client calls follow the given Linked Information Holder, consider embedding the target element in the original representation again using the Inline Information Holder refactoring. A deeper discussion of the benefits and liabilities of the two patterns involved in this refactoring and its inverse, Embedded Entity and Linked Information Holder, can be found in the pattern texts in Zimmermann et al. [2022].
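Such monitoring could start from a simple follow rate, sketched here in Python under stated assumptions: the access log is a list of request paths in a hypothetical /customers/{id} and /customers/{id}/purchase-history layout, and the 0.9 threshold for "most or all client calls" is an arbitrary example. The sketch only compares aggregate counts; it does not correlate calls per client or session, and the ratio can exceed 1 if clients fetch the history repeatedly.

```python
from collections import Counter
from typing import Iterable


def classify(path: str) -> str:
    """Maps a request path to the operation it calls (hypothetical URI layout)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "customers":
        return "profile"
    if len(parts) == 3 and parts[0] == "customers" and parts[2] == "purchase-history":
        return "history"
    return "other"


def follow_rate(paths: Iterable[str]) -> float:
    """Ratio of purchase history retrievals to profile retrievals."""
    counts = Counter(classify(p) for p in paths)
    return counts["history"] / counts["profile"] if counts["profile"] else 0.0


log = [
    "/customers/c1", "/customers/c1/purchase-history",
    "/customers/c2", "/customers/c2/purchase-history",
    "/customers/c3",
]
INLINE_THRESHOLD = 0.9  # assumed cut-off for "most or all client calls"
rate = follow_rate(log)
print(f"follow rate {rate:.2f}, consider inlining: {rate >= INLINE_THRESHOLD}")
# prints: follow rate 0.67, consider inlining: False
```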

For the specific question of whether it is preferable to exchange several small messages or a few larger ones, please refer to our article "What is the Right Service Granularity in APIs?"

The inverse API refactoring is Inline Information Holder.

If there is no operation to retrieve the linked data yet, the Split Operation refactoring can be used to create one.

After a Split Operation refactoring, Extract Information Holder can be used to further “split” the response messages of the operations.

The Wish List and Wish Template patterns (and related Add Wish List and Add Wish Template refactorings) offer alternative solutions to the problem of how an API client can inform the API provider at runtime about the data it is interested in.

Context Mapper [Kapferer and Zimmermann 2020], a modeling framework and Domain-specific Language (DSL) for Domain-Driven Design (DDD), implements a refactoring called Split Aggregate by Entity. A DDD Aggregate [Evans 2003] establishes a transactional boundary around a group of Entities that are persisted together; a data-centric Aggregate could be exposed via an Information Holder Resource on the API level. Splitting such an Aggregate can therefore be seen as corresponding to splitting or extracting parts of an API-level Information Holder Resource.

As another example, not related to APIs but to Web application frontend design, consider the difference between single-page and multi-page websites. In a single-page design, all information is available on one page, regardless of whether it is relevant to each reader. In a multi-page design, the home page gives an overview, and additional information is provided via hyperlinks that can be followed on demand.

References

Evans, Eric. 2003. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley.

Kapferer, Stefan, and Olaf Zimmermann. 2020. “Domain-Driven Service Design.” In Service-Oriented Computing, edited by Schahram Dustdar, 189–208. Springer International Publishing. https://doi.org/10.1007/978-3-030-64846-6_11.

Lübke, Daniel, Olaf Zimmermann, Cesare Pautasso, Uwe Zdun, and Mirko Stocker. 2019a. “Interface Evolution Patterns: Balancing Compatibility and Extensibility Across Service Life Cycles.” In Proceedings of the 24th European Conference on Pattern Languages of Programs. EuroPLoP ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3361149.3361164.

———. 2019b. “Interface Evolution Patterns: Balancing Compatibility and Extensibility Across Service Life Cycles.” In Proceedings of the 24th European Conference on Pattern Languages of Programs. EuroPLoP ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3361149.3361164.

Riel, Arthur J. 1996. Object-Oriented Design Heuristics. Reading, MA: Addison-Wesley.

Zimmermann, Olaf, Mirko Stocker, Daniel Lübke, Cesare Pautasso, and Uwe Zdun. 2020. “Introduction to Microservice API Patterns (MAP).” In Joint Post-Proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019), edited by Luís Cruz-Filipe, Saverio Giallorenzo, Fabrizio Montesi, Marco Peressotti, Florian Rademacher, and Sabine Sachweh, 78:4:1–17. OpenAccess Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. https://doi.org/10.4230/OASIcs.Microservices.2017-2019.4.

Zimmermann, Olaf, Mirko Stocker, Daniel Lübke, Uwe Zdun, and Cesare Pautasso. 2022. Patterns for API Design: Simplifying Integration with Loosely Coupled Message Exchanges. Addison-Wesley Signature Series (Vernon). Addison-Wesley Professional.