Extract Information Holder
also known as: Extract Resource Representation
Context and Motivation
An API operation returns multiple related, possibly deeply nested data structures to provide clients with a rich dataset in a single response. The Microservice API Patterns (MAP) call such data elements Embedded Entities. For example, in an e-commerce application, the request for the profile of a customer might also return their complete purchasing history. While this API design is very convenient for clients that process all the information at once, it is not appropriate for all use cases. A Linked Information Holder might suit some clients better as they can retrieve selected data on demand through subsequent individual requests.
As an API client, I want to be able to retrieve related data elements on demand instead of large structured data sets arriving in a single message so that I can process subsequent responses and the data in them more quickly.
Stakeholder Concerns (including Quality Attributes and Design Forces)
- #performance
- Assembling, transferring, and processing a response utilizes resources both on the provider and on the client side. These resources should not be wasted but treated with care. Bandwidth and computing power are examples of precious (and sometimes costly) resources.
- #usability including #developer-experience
- The implementation effort on the client side decreases if fewer requests and less client-side state management are required to fetch the desired data.
- #evolvability
- Systems and components evolve at different speeds. Hence, they should not depend on each other unless this is justified in the business requirements. Data dependencies often introduce undesired, hard-to-spot coupling.
- #data-currentness
- Data returned by an API might age at different rates. In the e-commerce shop scenario, for instance, the master data of customers (e.g., names, shipping addresses) will change less frequently than transactional data such as orders. API clients might want to cache some of the data retrieved, which is harder if faster-changing data is embedded in slower-changing data.
- #security
- Not all API clients have the same access privileges. More fine-grained data retrieval operations make it easier to enforce related controls and rules, avoiding the risk that restricted data “slips through” accidentally. To revisit the e-commerce scenario, what if the shop software also includes public ratings of products that show the name and picture of the rating customer? Here, only limited and carefully selected information about the customer should be returned.
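The data-currentness concern can be made concrete with a small sketch. It assumes, for illustration only, that each data element has a known maximum cache lifetime; the element names and lifetime values are hypothetical. A response that embeds several elements can safely be cached only as long as its fastest-changing part, whereas linked elements can each keep their own lifetime.

```python
# Hypothetical cache lifetimes in seconds; the values are illustrative assumptions.
MAX_AGE_SECONDS = {
    "customerProfile": 24 * 60 * 60,  # master data, changes rarely
    "purchaseHistory": 60,            # transactional data, changes often
}


def cacheable_for(embedded_elements):
    """Return how long a response embedding all given elements may be cached.

    The response becomes stale as soon as any embedded element does,
    so the shortest lifetime wins.
    """
    return min(MAX_AGE_SECONDS[name] for name in embedded_elements)


# Embedded Entity: the whole response inherits the short lifetime.
embedded = cacheable_for(["customerProfile", "purchaseHistory"])

# Linked Information Holder: each response keeps its own lifetime.
profile_only = cacheable_for(["customerProfile"])
history_only = cacheable_for(["purchaseHistory"])

print(embedded, profile_only, history_only)
```

With these assumed values, the combined response would have to be refreshed every minute even though the customer's master data rarely changes.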
Initial Position Sketch
The API implementation returns Data Elements that contain further nested data (represented in the sketch by the various icons from MAP [Zimmermann et al. 2020]).
The refactoring targets the response messages of operations and their data representation elements.
Smells / Drivers
- God endpoint
- The endpoint offering this operation might have to access many data sources or backend systems to assemble the response. A large number of such dependencies on external systems and data makes the API implementation harder to evolve.
- Data lifetime mismatches
- Conflating data elements with different lifetimes makes caching, and especially cache invalidation, harder. This may happen when slow-changing master data contains fast-changing transactional data, but also if the relation is reversed and transactional data that is often refreshed by clients contains embedded master data that changes infrequently.
- Overfetching
- Clients receive more data than they require, and discard much of it, because the operation does not offer any way to define the targeted representation elements (publishing parts or all of a domain model’s entities and their attributes).
Instructions (Steps)
Preparation/Preconditions:
- Ensure that the API offers a separate Retrieval Operation for the data that is to be extracted. If this is not already the case, apply the Split Operation refactoring first.
- Add a Linked Information Holder to the refactored response message so that clients know how to fetch the linked data [1].
- If the API operation does not already use a dedicated Data Transfer Object (DTO), apply the Introduce Data Transfer Object refactoring.
Depending on how deep the Embedded Entity is nested in the response data structure, the refactoring may have to be applied several times.
To replace an Embedded Entity with a Linked Information Holder:
- Add a Link Element to the DTO.
- Adjust the tests to the new response structure and run them to observe the changed responses.
- Deprecate or remove the Embedded Entity.
- Clean up the implementation code. For example, service/utility classes or repositories previously used to retrieve the embedded data might not be required anymore.
- Check security policies to ensure that clients can still access the linked data.
- If the API clients are under your control, adjust them to issue additional API calls to retrieve the data available at the endpoint referenced in the new Link Element.
- Update API description, version number, sample code, tutorials, etc. as required. API directories and gateways might have to be updated as well.
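The steps above can be sketched from the provider's perspective. This is a minimal illustration, not a definitive implementation: the in-memory repositories, the URI layout, the `include_deprecated` switch, and the field name `purchaseHistoryLink` (chosen so that link and deprecated Embedded Entity can coexist during a transition period) are all assumptions made for this example.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical in-memory stand-ins for the provider's repositories.
CUSTOMERS = {"c1": {"givenName": "Ada", "familyName": "Lovelace"}}
ORDERS = {"c1": [{"orderId": "o1", "total": 42}]}


@dataclass
class CustomerProfileDTO:
    id: str
    givenName: str
    familyName: str
    # New Link Element (the Linked Information Holder).
    purchaseHistoryLink: str
    # Former Embedded Entity, kept only while it is deprecated.
    purchaseHistory: Optional[list] = None


def get_customer_profile(customer_id, include_deprecated=False):
    """Retrieval Operation for the profile; embeds orders only on request."""
    customer = CUSTOMERS[customer_id]
    dto = CustomerProfileDTO(
        id=customer_id,
        givenName=customer["givenName"],
        familyName=customer["familyName"],
        purchaseHistoryLink=f"/customers/{customer_id}/purchase-history",
    )
    if include_deprecated:
        dto.purchaseHistory = ORDERS[customer_id]
    # Omit unset fields so that the removed entity does not appear as null.
    return {k: v for k, v in asdict(dto).items() if v is not None}


def get_purchase_history(customer_id):
    """Separate Retrieval Operation that the Link Element points to."""
    return ORDERS[customer_id]


print(get_customer_profile("c1"))
print(get_customer_profile("c1", include_deprecated=True))
```

Once the deprecation period ends, the `purchaseHistory` field and the `include_deprecated` switch can be deleted, and `get_customer_profile` no longer needs access to the order repository.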
Target Solution Sketch (Evolution Outline)
The client can use the link returned in the initial response to retrieve the related data.
To reap the full benefits of this refactoring, backwards compatibility has to be given up. In a first step, the Embedded Entity could be marked as deprecated to give the clients time to adjust.
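From the client's perspective, following the link on demand might look like the sketch below. The canned responses and the `http_get` helper are hypothetical stand-ins for a real HTTP client and API; the point is only that the second request is issued when, and only when, the linked data is needed.

```python
# Hypothetical canned responses standing in for a real HTTP API.
RESPONSES = {
    "/customers/c1": {
        "id": "c1",
        "givenName": "Ada",
        "familyName": "Lovelace",
        "purchaseHistory": "/customers/c1/purchase-history",  # Link Element
    },
    "/customers/c1/purchase-history": [{"orderId": "o1"}, {"orderId": "o2"}],
}

requested_paths = []


def http_get(path):
    """Stand-in for an HTTP GET; records each call and returns canned data."""
    requested_paths.append(path)
    return RESPONSES[path]


def greeting(profile):
    """Needs master data only, so no second request is made."""
    return f"Hello, {profile['givenName']} {profile['familyName']}"


def order_count(profile):
    """Needs the purchase history, so the client follows the link."""
    return len(http_get(profile["purchaseHistory"]))


profile = http_get("/customers/c1")
print(greeting(profile))
print(order_count(profile))
print(requested_paths)
```

A client that only renders the greeting never pays for transferring the purchase history, while a client that needs the orders issues one additional request.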
Example(s)
The following API description shows an endpoint to retrieve the CustomerProfileDTO, which includes the Embedded Entity PurchaseOrderDTOs.
API description ECommerceAPI
data type CustomerProfileId {"id": ID<string>}
data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "givenName": Data<string>,
  "familyName": Data<string>,
  <<Embedded_Entity>> "purchaseHistory": PurchaseOrderDTO*
}
data type PurchaseOrderDTO "ToBeContinued"
endpoint type CustomerProfileEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes
  operation getCustomerProfile with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload CustomerProfileDTO
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint
API client ECommerceClient
  consumes CustomerProfileEndpoint
Having applied the refactoring, the client will now receive a link (notice the purchaseHistory link in CustomerProfileDTO).
API description ECommerceAPI
data type CustomerProfileId {"id": ID<string>}
data type CustomerProfileDTO {
  "id": CustomerProfileId,
  "givenName": Data<string>,
  "familyName": Data<string>,
  <<Linked_Information_Holder>> "purchaseHistory": Link<string>
}
data type PurchaseOrderDTO "ToBeContinued"
endpoint type CustomerProfileEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes
  operation getCustomerProfile with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload CustomerProfileDTO
endpoint type PurchaseHistoryEndpoint serves as INFORMATION_HOLDER_RESOURCE
exposes
  operation getPurchaseHistory with responsibility RETRIEVAL_OPERATION
    expecting payload CustomerProfileId
    delivering payload PurchaseOrderDTO*
API provider ECommerceAPIProvider
  offers CustomerProfileEndpoint
  offers PurchaseHistoryEndpoint
API client ECommerceClient
  consumes CustomerProfileEndpoint
Hints and Pitfalls to Avoid
Comparing the Target Solution Sketch with the Initial Position Sketch shows that the first resource now accesses fewer repositories to assemble the response message. This enables further architectural refactorings such as Split Application Kernel.
A deeper discussion of the benefits and liabilities of embedding versus linking can be found in the descriptions of the Embedded Entity and Linked Information Holder patterns.
The Hypertext Application Language (HAL) proposes a notation for representing hyperlinks in response messages.
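As an illustration of such a notation, the Hypertext Application Language (HAL) places links in a reserved `_links` property whose keys are link relations and whose values are link objects with an `href`. The sketch below renders the customer profile in that style. The URI layout and the `to_hal` helper are hypothetical, and using the plain relation name `purchaseHistory` is a simplification (HAL suggests URIs for custom relation types).

```python
import json


def to_hal(customer_id, given_name, family_name):
    """Build a HAL-style representation of a customer profile (sketch)."""
    base = f"/customers/{customer_id}"  # assumed URI layout
    return {
        "_links": {
            "self": {"href": base},
            # Linked Information Holder replacing the Embedded Entity.
            "purchaseHistory": {"href": f"{base}/purchase-history"},
        },
        "givenName": given_name,
        "familyName": family_name,
    }


print(json.dumps(to_hal("c1", "Ada", "Lovelace"), indent=2))
```

Keeping all links under one well-known property lets generic clients discover related resources without knowing the rest of the payload structure.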
Related Content
The inverse API refactoring is Inline Information Holder.
If there is no operation to retrieve the linked data, the Split Operation refactoring can be used to create one.
After a Split Operation refactoring, Extract Information Holder can be used to “split” the response messages of the operations.
The Wish List and Wish Template patterns in MAP (and related refactorings) offer alternative solutions to the problem of how an API client can inform the API provider at runtime about the data it is interested in.
References
Zimmermann, Olaf, Mirko Stocker, Daniel Lübke, Cesare Pautasso, and Uwe Zdun. 2020. “Introduction to Microservice API Patterns (MAP).” In Joint Post-Proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019), edited by Luís Cruz-Filipe, Saverio Giallorenzo, Fabrizio Montesi, Marco Peressotti, Florian Rademacher, and Sabine Sachweh, 78:4:1–17. OpenAccess Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. https://doi.org/10.4230/OASIcs.Microservices.2017-2019.4.
[1] A Linked Information Holder comes as a Link Element pointing at another endpoint, typically an Information Holder Resource.