Extract Information Holder

Note: This is a preview release and subject to change. Feedback welcome!

also known as: Extract Resource Representation

Context and Motivation

An API operation returns multiple related, possibly deeply nested data structures to provide clients with a rich dataset in a single response. We call such data elements Embedded Entities [Zimmermann et al. 2020]. For example, in an e-commerce application, the request for the profile of a customer might also return their complete purchasing history. This API is very convenient for clients that require all the information at once. However, it is not appropriate for all use cases; the alternative pattern Linked Information Holder suits some clients better as they can retrieve selected data on demand through subsequent individual requests.

As an API client, I prefer to retrieve related data elements step-by-step over having to process large structured data sets appearing in a single response message so that I can process individual responses and the data in them quickly and on demand.

Stakeholder Concerns (including Quality Attributes and Design Forces)

#performance
Assembling, transferring, and processing a response utilizes resources both on the provider and client side. These resources should not be wasted but treated with care. Bandwidth and computing power are examples of precious and costly resources.
#usability including #developer-experience
The implementation effort on the client side decreases if fewer requests and less client-side state management are required to fetch the desired data.
#evolvability
Systems and components evolve at different speeds. Hence, they should not depend on each other unless this is justified in the business requirements. Data dependencies often introduce undesired, hard-to-spot coupling.
#data-currentness
Data returned by an API might age at different rates. In the e-commerce shop scenario, for instance, the master data of customers (e.g., names, shipping addresses) will change less frequently than transactional data (such as orders). API clients might want to cache some of the data retrieved, which is harder if faster-changing data is embedded in slower-changing data.
#security
Not all API clients have the same access privileges. More fine-grained data Retrieval Operations make it easier to enforce related controls and rules, avoiding the risk that restricted data “slips through” accidentally. To revisit the e-commerce scenario, what if the shop software also includes public ratings of products that show the name and picture of the rating customer? Here, only limited and carefully selected information about the customer should be returned.
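
The #data-currentness concern can be illustrated with a small client-side cache. The following is only a sketch under stated assumptions: the `TtlCache` class, the cache keys, and both lifetime values are hypothetical and not part of the refactoring itself. It shows that separately retrieved master data and transactional data can each be cached with a lifetime of their own, which is not possible while one is embedded in the other.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class TtlCache:
    """Tiny time-to-live cache; the clock is injected so the sketch is testable."""

    now: Callable[[], float]
    _entries: Dict[str, Tuple[float, Any]] = field(default_factory=dict)

    def put(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self.now() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.now() >= expires_at:
            del self._entries[key]  # expired: drop it so the caller refetches
            return None
        return value


# Hypothetical lifetimes: master data ages slowly, transactional data quickly.
PROFILE_TTL_SECONDS = 24 * 60 * 60
PURCHASE_HISTORY_TTL_SECONDS = 60


def demo() -> Tuple[Optional[Any], Optional[Any]]:
    clock = [0.0]  # mutable fake clock
    cache = TtlCache(now=lambda: clock[0])
    cache.put("profile:42", {"givenName": "Ada"}, PROFILE_TTL_SECONDS)
    cache.put("history:42", [{"orderId": "o-1"}], PURCHASE_HISTORY_TTL_SECONDS)
    clock[0] = 120.0  # two minutes later
    return cache.get("profile:42"), cache.get("history:42")
```

Two minutes after both entries were stored, `demo()` still finds the profile in the cache, while the purchase history has expired and would have to be fetched again.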

Initial Position Sketch

The API implementation returns Data Elements that contain further nested data, represented by the various icons from “Patterns for API Design” [Zimmermann et al. 2022].

Extract Information Holder: Initial Position Sketch


The refactoring targets the response messages of operations and their data representation elements.

Design Smells

God endpoint
The endpoint offering this operation might have to access many data sources or backend systems to assemble the response. A large number of such dependencies on external systems and data makes the API implementation harder to operate and evolve.
Data lifetime mismatches
Conflating Data Elements with different lifetimes makes caching, and especially cache invalidation, harder. This may happen when slow-changing master data contains fast-changing transactional data (for example, in an Operational Data Holder), but also if transactional data that is often refreshed by clients contains embedded master data that changes infrequently.
Overfetching
Clients receive more data than they require because the operation does not offer any way to define the targeted representation elements; it publishes parts or all of a domain model’s entities and their attributes.
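
To get a feeling for the cost of an Embedded Entity, the serialized size of both response variants can be compared. This is a rough sketch only: the sample customer, the order structure, and the URI in `linked_profile` are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def make_orders(count: int) -> List[Dict[str, Any]]:
    """Generate hypothetical purchase orders for the comparison."""
    return [
        {"orderId": f"o-{i}", "items": [{"sku": f"sku-{i}", "quantity": 1}]}
        for i in range(count)
    ]


def embedded_profile(orders: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Initial position: the complete purchase history travels with every profile.
    return {"id": "42", "givenName": "Ada", "familyName": "Lovelace",
            "purchaseHistory": orders}


def linked_profile() -> Dict[str, Any]:
    # Target solution: only a link is returned; the URI layout is an assumption.
    return {"id": "42", "givenName": "Ada", "familyName": "Lovelace",
            "purchaseHistory": "/customers/42/purchase-history"}


def payload_size(payload: Dict[str, Any]) -> int:
    """Size of the JSON-serialized payload in bytes."""
    return len(json.dumps(payload).encode("utf-8"))
```

The size of the embedded variant grows with every order a customer places, whereas the linked variant stays constant no matter how long the purchase history becomes.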

Instructions (Steps)

As a preparation, make sure that the following preconditions hold:

  1. Ensure that the API offers a dedicated Retrieval Operation for the currently embedded data. If this is not already the case, apply the Split Operation refactoring first. An Extract Endpoint or Segregate Commands from Queries refactoring might also be appropriate to prevent the responsibility mishmash smell.
  2. If the API operation does not already use a dedicated Data Transfer Object (DTO), apply the Introduce Data Transfer Object refactoring.

Depending on how deep the Embedded Entity is nested in the response data structure, the refactoring may have to be applied several times.

To replace an Embedded Entity with a Linked Information Holder:

  1. Add a Link Element (the Linked Information Holder) to the DTO, referring clients to another endpoint, typically an Information Holder Resource.
  2. Adjust the tests to the new response structure and run them to observe the changed responses.
  3. Deprecate or remove the Embedded Entity.
  4. Clean up the implementation code. For example, service/utility classes or repositories previously used to retrieve the embedded data might not be required anymore.
  5. Check security policies to ensure that clients can still access the linked data.
  6. If under your control, adjust API clients to issue additional API calls to retrieve the data available at the endpoint referenced in the new Link Element.
  7. Update the API Description, version number, sample code, tutorials, etc. as required. API directories and gateways might have to be updated as well.
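
The provider-side result of these steps could look as follows. This is a minimal sketch and not a definitive implementation: the in-memory dictionaries stand in for real repositories, and the URI layout produced by `purchase_history_link` is a hypothetical choice.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PurchaseOrderDTO:
    order_id: str


@dataclass(frozen=True)
class CustomerProfileDTO:
    id: str
    given_name: str
    family_name: str
    # Step 1: a Link Element replaces the formerly embedded list of orders.
    purchase_history: str


# In-memory stand-ins for the repositories behind the two endpoints.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "42": {"given_name": "Ada", "family_name": "Lovelace"},
}
ORDERS: Dict[str, List[PurchaseOrderDTO]] = {
    "42": [PurchaseOrderDTO("o-1"), PurchaseOrderDTO("o-2")],
}


def purchase_history_link(customer_id: str) -> str:
    # Hypothetical URI layout; use whatever the API actually exposes.
    return f"/customers/{customer_id}/purchase-history"


def get_customer_profile(customer_id: str) -> CustomerProfileDTO:
    # After step 4, this operation no longer touches the order repository.
    customer = CUSTOMERS[customer_id]
    return CustomerProfileDTO(
        id=customer_id,
        given_name=customer["given_name"],
        family_name=customer["family_name"],
        purchase_history=purchase_history_link(customer_id),
    )


def get_purchase_history(customer_id: str) -> List[PurchaseOrderDTO]:
    # The dedicated Retrieval Operation required by the preconditions.
    return ORDERS.get(customer_id, [])
```

Note that `get_customer_profile` only reads `CUSTOMERS`; access to `ORDERS` has moved entirely into `get_purchase_history`.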

Target Solution Sketch (Evolution Outline)

The client can use the link returned in the initial request to retrieve the related data:

Extract Information Holder: Target Solution Sketch
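
On the client side, following the link might be sketched as shown next. The `CustomerClient` class, the `fetch` callback, and the URIs are assumptions for illustration; a real client would issue HTTP requests where this sketch reads from an in-memory fake.

```python
from typing import Any, Callable, Dict, List, Tuple

Fetch = Callable[[str], Any]  # maps a URI to a decoded response body


class CustomerClient:
    """Client-side sketch: the profile is loaded first, the history on demand."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch

    def load_profile(self, customer_id: str) -> Dict[str, Any]:
        # Hypothetical URI layout for the profile endpoint.
        return self._fetch(f"/customers/{customer_id}")

    def load_purchase_history(self, profile: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Follow the Link Element instead of constructing the URI by hand.
        return self._fetch(profile["purchaseHistory"])


def make_fake_api() -> Tuple[Fetch, List[str]]:
    """In-memory stand-in for the provider that records every requested URI."""
    requested: List[str] = []
    resources: Dict[str, Any] = {
        "/customers/42": {
            "id": "42",
            "givenName": "Ada",
            "familyName": "Lovelace",
            "purchaseHistory": "/customers/42/purchase-history",
        },
        "/customers/42/purchase-history": [{"orderId": "o-1"}, {"orderId": "o-2"}],
    }

    def fetch(uri: str) -> Any:
        requested.append(uri)
        return resources[uri]

    return fetch, requested
```

A client that only displays the customer name stops after `load_profile` and causes a single request; only clients that need the orders pay for the second call.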


To reap the full benefits of this refactoring, backward compatibility has to be given up. In a first step, the Embedded Entity could be marked as deprecated to give the clients time to adjust. At a time defined and announced when applying the refactoring, the Embedded Entity is removed from the message payload. This lifecycle management strategy is described in the Limited Lifetime Guarantee pattern [Lübke et al. 2019].
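
One way to implement such a transition phase is sketched below. All specifics are assumptions: the sunset date is invented, and the link is delivered under a separate, hypothetical key `purchaseHistoryLink` so that it can coexist with the deprecated Embedded Entity until the latter is removed.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical removal date, announced when the refactoring is applied.
EMBEDDED_HISTORY_SUNSET = date(2025, 1, 1)


def build_profile_response(
    customer: Dict[str, str],
    orders: List[Dict[str, Any]],
    today: date,
) -> Dict[str, Any]:
    response: Dict[str, Any] = {
        "id": customer["id"],
        "givenName": customer["givenName"],
        "familyName": customer["familyName"],
        # New Link Element; the key and the URI layout are assumptions.
        "purchaseHistoryLink": f"/customers/{customer['id']}/purchase-history",
    }
    if today < EMBEDDED_HISTORY_SUNSET:
        # Deprecated Embedded Entity, still delivered during the transition.
        response["purchaseHistory"] = orders
    return response
```

Before the sunset date, old and new clients are both served; from that date on, the response only carries the link.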

Example(s)

The following API Description shows an endpoint to retrieve a CustomerProfileDTO, which includes the Embedded Entity PurchaseOrderDTOs.

API description ECommerceAPI

data type CustomerProfileId {"id": ID<string>}

data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "givenName": Data<string>,
  "familyName": Data<string>,
  <<Embedded_Entity>> "purchaseHistory": PurchaseOrderDTO*
} 

data type PurchaseOrderDTO "ToBeContinued" // incomplete specification (placeholder)

endpoint type CustomerProfileEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes 
  operation getCustomerProfile with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload CustomerProfileDTO
    
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint

API client ECommerceClient
  consumes CustomerProfileEndpoint

Having applied the refactoring, the client will now receive a link (notice the purchaseHistory link in CustomerProfileDTO).

API description ECommerceAPI

data type CustomerProfileId {"id": ID<string>}

data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "givenName": Data<string>,
  "familyName": Data<string>,
  <<Linked_Information_Holder>> "purchaseHistory": Link<string>
} 

data type PurchaseOrderDTO "ToBeContinued"

endpoint type CustomerProfileEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes 
  operation getCustomerProfile with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload CustomerProfileDTO // now contains <<Linked_Information_Holder>>
    
// new API endpoint:
endpoint type PurchaseHistoryEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes
  operation getPurchaseHistory with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload PurchaseOrderDTO*
    
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint 
  offers PurchaseHistoryEndpoint

API client ECommerceClient
  consumes CustomerProfileEndpoint
  consumes PurchaseHistoryEndpoint

Hints and Pitfalls to Avoid

Comparing the Target Solution sketch with the Initial Position shows that the first resource now accesses fewer repositories to assemble the response message. This enables further architectural refactorings such as Split Application Kernel.

The Embedded Entity and Linked Information Holder pattern descriptions discuss the benefits and liabilities of the two options in more depth.

The inverse API refactoring is Inline Information Holder.

If there is no operation to retrieve the linked data, the Split Operation refactoring can be used to create one.

After a Split Operation refactoring, Extract Information Holder can be used to “split” the response messages of the operations.

The Wish List and Wish Template patterns in MAP (and related refactorings) offer alternative solutions to the problem of how an API client can inform the API provider at runtime about the data it is interested in.
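
For comparison, a Wish List could be sketched like this. The function signature and the sample data are hypothetical; the sketch merely illustrates that the client names the desired elements in the request while the provider keeps a single operation.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical full representation held by the provider.
PROFILE: Dict[str, Any] = {
    "id": "42",
    "givenName": "Ada",
    "familyName": "Lovelace",
    "purchaseHistory": [{"orderId": "o-1"}, {"orderId": "o-2"}],
}


def get_customer_profile(wish_list: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Return only the wished-for elements; without a wish list, return all."""
    if wish_list is None:
        return dict(PROFILE)
    wanted = set(wish_list)
    # Unknown names are ignored in this sketch; a real API might reject them.
    return {name: value for name, value in PROFILE.items() if name in wanted}
```

Calling `get_customer_profile(["id", "givenName"])` returns just these two elements and leaves the purchase history out of the response.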

References

Lübke, Daniel, Olaf Zimmermann, Cesare Pautasso, Uwe Zdun, and Mirko Stocker. 2019. “Interface Evolution Patterns: Balancing Compatibility and Extensibility Across Service Life Cycles.” In Proceedings of the 24th European Conference on Pattern Languages of Programs. EuroPLoP ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3361149.3361164.

Zimmermann, Olaf, Mirko Stocker, Daniel Lübke, Cesare Pautasso, and Uwe Zdun. 2020. “Introduction to Microservice API Patterns (MAP).” In Joint Post-Proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019), edited by Luís Cruz-Filipe, Saverio Giallorenzo, Fabrizio Montesi, Marco Peressotti, Florian Rademacher, and Sabine Sachweh, 78:4:1–17. OpenAccess Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. https://doi.org/10.4230/OASIcs.Microservices.2017-2019.4.

Zimmermann, Olaf, Mirko Stocker, Daniel Lübke, Uwe Zdun, and Cesare Pautasso. 2022. Patterns for API Design: Simplifying Integration with Loosely Coupled Message Exchanges. Addison-Wesley Signature Series (Vernon). Addison-Wesley Professional.