Skip to content

Latest commit

 

History

History
324 lines (260 loc) · 10.4 KB

best-practices.adoc

File metadata and controls

324 lines (260 loc) · 10.4 KB

Best practices

The best practices presented in this section are not part of the actual guidelines, but should provide guidance for common challenges we face when implementing RESTful APIs.

Cursor-based pagination in RESTful APIs

Cursor-based pagination is a very powerful and valuable technique, that allows to efficiently provide a stable view on changing data. This is obtained by using an anchor element that allows to retrieve all page elements directly via an ordering combined-index, usually based on created_at or modified_at. Simple said, the cursor is the information set needed to reconstruct the database query to retrieves the minimal page information from the data storage.

The {cursor} itself is an opaque string, transmitted forth and back between service and clients, that must never be inspected or constructed by clients. Therefore, it is good practice to encode (encrypt) its content in a non-human-readable form.

The {cursor} content usually consists of a pointer to the anchor element defining the page position in the collection, a flag whether the element is included or excluded into/from the page, the retrieval direction, and a hash over the applied query filters (or the query filter itself) to safely re-create the collection. It is important to note, that a {cursor} should be always defined in relation to the current page to anticipate all occurring changes when progressing.

The {cursor} is usually defined as an encoding of the following information:

Cursor:
  descriptions: >
    Cursor structure that contains all necessary information to efficiently
    retrieve a page from the data store.
  type: object
  properties:
    position:
      description: >
        Object containing the keys pointing to the anchor element that is
        defining the collection resource page. Normally the position is given
        by the first or the last page element. The position object contains all
        values required to access the element efficiently via the ordered,
        combined index, e.g `modified_at`, `id`.
      type: object
      properties: ...

    element:
      description: >
        Flag whether the anchor element, which is pointed to by the `position`,
        should be *included* or *excluded* from the result set. Normally, only
        the current page includes the pointed to element, while all others are
        exclude it.
      type: string
      enum: [ INCLUDED, EXCLUDED ]

    direction:
      description: >
        Flag for the retrieval direction that is defining which elements to
        choose from the collection resource starting from the anchor elements
        position. It is either *ascending* or *descending* based on the
        ordering combined index.
      type: string
      enum: [ ASCENDING, DESCENDING ]

    query_hash:
      description: >
        Stable hash calculated over all query filters applied to create the
        collection resource that is represented by this cursor.
      type: string

    query:
      description: >
        Object containing all query filters applied to create the collection
        resource that is represented by this cursor.
      type: object
      properties: ...

  required:
    - position
    - element
    - direction

Note: In case of complex and long search requests, e.g. when {GET-with-body} is already required, the {cursor} may not be able to include the query because of common HTTP parameter size restrictions. In this case the query filters should be transported via body - in the request as well as in the response, while the pagination consistency should be ensured via the query_hash.

Remark: It is also important to check the efficiency of the data-access. You need to make sure that you have a fully ordered stable index, that allows to efficiently resolve all elements of a page. If necessary, you need to provide a combined index that includes the id to ensure the full order and additional filter criteria to ensure efficiency.

Optimistic locking in RESTful APIs

Introduction

Optimistic locking might be used to avoid concurrent writes on the same entity, which might cause data loss. A client always has to retrieve a copy of an entity first and specifically update this one. If another version has been created in the meantime, the update should fail. In order to make this work, the client has to provide some kind of version reference, which is checked by the service, before the update is executed. Please read the more detailed description on how to update resources via {PUT} in the HTTP Requests Section.

A RESTful API usually includes some kind of search endpoint, which will then return a list of result entities. There are several ways to implement optimistic locking in combination with search endpoints which, depending on the approach chosen, might lead to performing additional requests to get the current version of the entity that should be updated.

ETag with If-Match header

An {ETag} can only be obtained by performing a {GET} request on the single entity resource before the update, i.e. when using a search endpoint an additional request is necessary.

Example:

< GET /orders

> HTTP/1.1 200 OK
> {
>   "items": [
>     { "id": "O0000042" },
>     { "id": "O0000043" }
>   ]
> }

< GET /orders/BO0000042

> HTTP/1.1 200 OK
> ETag: osjnfkjbnkq3jlnksjnvkjlsbf
> { "id": "BO0000042", ... }

< PUT /orders/O0000042
< If-Match: osjnfkjbnkq3jlnksjnvkjlsbf
< { "id": "O0000042", ... }

> HTTP/1.1 204 No Content

Or, if there was an update since the {GET} and the entity’s {ETag} has changed:

> HTTP/1.1 412 Precondition failed

Pros

  • RESTful solution

Cons

  • Many additional requests are necessary to build a meaningful front-end

ETags in result entities

The ETag for every entity is returned as an additional property of that entity. In a response containing multiple entities, every entity will then have a distinct {ETag} that can be used in subsequent {PUT} requests.

In this solution, the etag property should be readonly and never be expected in the {PUT} request payload.

Example:

< GET /orders

> HTTP/1.1 200 OK
> {
>   "items": [
>     { "id": "O0000042", "etag": "osjnfkjbnkq3jlnksjnvkjlsbf", "foo": 42, "bar": true },
>     { "id": "O0000043", "etag": "kjshdfknjqlowjdsljdnfkjbkn", "foo": 24, "bar": false }
>   ]
> }

< PUT /orders/O0000042
< If-Match: osjnfkjbnkq3jlnksjnvkjlsbf
< { "id": "O0000042", "foo": 43, "bar": true }

> HTTP/1.1 204 No Content

Or, if there was an update since the {GET} and the entity’s {ETag} has changed:

> HTTP/1.1 412 Precondition failed

Pros

  • Perfect optimistic locking

Cons

  • Information that only belongs in the HTTP header is part of the business objects

Version numbers

The entities contain a property with a version number. When an update is performed, this version number is given back to the service as part of the payload. The service performs a check on that version number to make sure it was not incremented since the consumer got the resource and performs the update, incrementing the version number.

Since this operation implies a modification of the resource by the service, a {POST} operation on the exact resource (e.g. POST /orders/O0000042) should be used instead of a {PUT}.

In this solution, the version property is not readonly since it is provided at {POST} time as part of the payload.

Example:

< GET /orders

> HTTP/1.1 200 OK
> {
>   "items": [
>     { "id": "O0000042", "version": 1,  "foo": 42, "bar": true },
>     { "id": "O0000043", "version": 42, "foo": 24, "bar": false }
>   ]
> }

< POST /orders/O0000042
< { "id": "O0000042", "version": 1, "foo": 43, "bar": true }

> HTTP/1.1 204 No Content

or if there was an update since the {GET} and the version number in the database is higher than the one given in the request body:

> HTTP/1.1 409 Conflict

Pros

  • Perfect optimistic locking

Cons

  • Functionality that belongs into the HTTP header becomes part of the business object

  • Using {POST} instead of PUT for an update logic (not a problem in itself, but may feel unusual for the consumer)

Last-Modified / If-Unmodified-Since

In HTTP 1.0 there was no {ETag} and the mechanism used for optimistic locking was based on a date. This is still part of the HTTP protocol and can be used. Every response contains a {Last-Modified} header with a HTTP date. When requesting an update using a {PUT} request, the client has to provide this value via the header {If-Unmodified-Since}. The server rejects the request, if the last modified date of the entity is after the given date in the header.

This effectively catches any situations where a change that happened between {GET} and {PUT} would be overwritten. In the case of multiple result entities, the {Last-Modified} header will be set to the latest date of all the entities. This ensures that any change to any of the entities that happens between {GET} and {PUT} will be detectable, without locking the rest of the batch as well.

Example:

< GET /orders

> HTTP/1.1 200 OK
> Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
> {
>   "items": [
>     { "id": "O0000042", ... },
>     { "id": "O0000043", ... }
>   ]
> }

< PUT /block/O0000042
< If-Unmodified-Since: Wed, 22 Jul 2009 19:15:56 GMT
< { "id": "O0000042", ... }

> HTTP/1.1 204 No Content

Or, if there was an update since the {GET} and the entities last modified is later than the given date:

> HTTP/1.1 412 Precondition failed

Pros

  • Well established approach that has been working for a long time

  • No interference with the business objects; the locking is done via HTTP headers only

  • Very easy to implement

  • No additional request needed when updating an entity of a search endpoint result

Cons

  • If a client communicates with two different instances and their clocks are not perfectly in sync, the locking could potentially fail

Conclusion

We suggest to either use the {ETag} in result entities or {Last-Modified} / {If-Unmodified-Since} approach.