Qore DataStreamUtil Module Reference  1.1

Introduction to the DataStreamUtil Module

The DataStreamUtil module provides client and server support for YAML-encoded HTTP 1.1 (RFC-2616) chunked transfers where each chunk is a unique data entity, allowing data to be streamed from remote servers and used as soon as it is received. The module handles encoding and serialization on both the sending and receiving ends, so that serialized data can be sent with optional compression over standard HTTP chunked transfers and used immediately on receipt at the remote end.

In Qore, DataStream support is implemented on top of and is designed to extend the REST infrastructure provided by the Qore library.

This module is used automatically by the DataStreamClient and DataStreamRequestHandler user modules; to use this module directly for low-level DataStream protocol support, use "%requires DataStreamUtil" in your code.

All the public symbols in the module are defined in the DataStreamUtil namespace.

Functions:

DataStream Protocol

The DataStream protocol is based on HTTP 1.1 (RFC-2616) chunked transfers where each chunk contains UTF-8 encoded YAML-serialized data with optional compression and where each chunk is an independently decodable and parsable entity. This differs from standard HTTP chunked transfers, where content encoding and semantic completeness of a message are defined over the entire message body. By using DataStream instead of standard HTTP chunked transfer, data can be streamed from one server to another and be usable immediately on receipt on the remote end.
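
The per-chunk independence described above can be illustrated with a minimal Python sketch. It is not the module's implementation: the function names are hypothetical, gzip is used as one of the optional compression methods, and json.dumps is assumed as a stand-in for a YAML emitter (its output can also be parsed as YAML flow-style data).

```python
import gzip
import json


def encode_chunk(record, compress=False):
    """Serialize one record into a self-contained chunk payload."""
    # Assumption: json.dumps stands in for a YAML emitter.
    payload = json.dumps(record).encode("utf-8")
    # Compression is applied per chunk, after serialization.
    return gzip.compress(payload) if compress else payload


def decode_chunk(payload, compressed=False):
    """Reverse encode_chunk() for one chunk, without needing any other chunk."""
    raw = gzip.decompress(payload) if compressed else payload
    return json.loads(raw.decode("utf-8"))


def frame_chunk(payload):
    """Wrap a payload in HTTP/1.1 chunked framing: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
payloads = [encode_chunk(r, compress=True) for r in records]
# A zero-size chunk followed by an empty line terminates the body.
wire = b"".join(frame_chunk(p) for p in payloads) + b"0\r\n\r\n"
# The second chunk decodes without the first one having been seen.
second = decode_chunk(payloads[1], compressed=True)
```

Because every payload is serialized and compressed on its own, a receiver can hand each record to application code as soon as its chunk arrives.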

A DataStream transfer with streaming involves at least one chunked transfer; either the request or the reply must be sent with chunked transfer encoding to have the DataStream protocol applied. Non-chunked, monolithic requests and responses are also supported, but in these cases standard HTTP encoding and decoding rules are applied.

A DataStream request-response pair without streaming is equivalent to a standard HTTP request-response pair but with the addition of DataStream headers which are ignored in the case that no chunked transfers are made.

DataStream runs over HTTP 1.1 (RFC-2616) and uses standard HTTP features with custom headers to identify the data serialization, character encoding, and content encoding applied to each chunk. DataStream is currently defined using UTF-8 encoded YAML for data serialization, but was designed to be extensible for future use with other data serialization methods through the use of appropriate headers.

Note
RFC 6648 deprecates the use of an "X-" prefix in non-standardized HTTP headers and trailers, therefore no such prefix exists for DataStream headers and trailers.

DataStream Data Serialization

DataStream uses UTF-8 encoded YAML for data serialization to allow for maximum data fidelity over the HTTP link.

For non-chunked messages, the Content-Type header is "text/x-yaml;charset=utf8" ("utf8" is case-insensitive and may contain a hyphen before the 8), and the data is sent in a normal HTTP message body. Data compression is supported with standard HTTP content encoding, applied to the serialized YAML data before sending; on receipt, the receiver applies the reverse operation before deserializing the YAML data to native data structures.

For chunked DataStream requests and responses, the Content-Type header is set to "application/octet-stream" (to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk) and the content-type of each atomic chunk is given by the DataStream-Content-Type header which should be set to "text/x-yaml;charset=utf8" ("utf8" is case-insensitive and may contain a hyphen before the 8). Data compression is supported and is applied to each chunk atomically as DataStream-specific content encoding after chunk data serialization; the reverse operation is applied before YAML data deserialization to native data structures by the receiver.
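
The charset matching rule above ("utf8" is case-insensitive and may contain a hyphen) can be sketched as follows. The function name is hypothetical, and rejecting a value with no charset parameter is an assumption of this sketch rather than something the protocol description states.

```python
def is_datastream_yaml(content_type):
    """Return True for text/x-yaml with charset utf8 or utf-8 (any case)."""
    parts = [p.strip() for p in content_type.split(";")]
    if parts[0].lower() != "text/x-yaml":
        return False
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower() in ("utf8", "utf-8")
    # Assumption: a missing charset parameter is treated as a mismatch.
    return False
```

The same check applies to the Content-Type header of non-chunked messages and to the DataStream-Content-Type header of chunked ones.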

DataStream Chunked Request Headers

The following headers are set with DataStream chunked requests:

Header: Content-Type
Value: application/octet-stream
Description: MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk

Header: Accept
Value: text/x-yaml,application/octet-stream
Description: other media types MAY be included, but at least the following MUST be included:
- text/x-yaml: MUST be included in case a non-chunked response is returned
- application/octet-stream: MUST be included in case of a DataStream chunked response

Header: [Accept-Encoding]
Value: gzip,bzip2,deflate
Description: optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the DataStream-Accept-Encoding header)

Header: DataStream-Content-Type
Value: text/x-yaml;charset=utf8
Description: MUST be present to identify the content type of each chunk as YAML-encoded data ("utf8" is case-insensitive and may contain a hyphen before the 8)

Header: DataStream-Accept
Value: text/x-yaml
Description: clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 "Not Acceptable" error must be returned

Header: [DataStream-Accept-Encoding]
Value: gzip,bzip2,deflate
Description: optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the Accept-Encoding header)

Header: [DataStream-Content-Encoding]
Value: one of identity, bzip2, gzip, or deflate
Description: optional header; MUST be included if DataStream data compression is used in the request body; if present, it MUST NOT contain more than one value

Header: Transfer-Encoding
Value: chunked
Description: MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding

Note
"Content-Encoding" and "Content-Length" headers MUST NOT be included in DataStream chunked transfers

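As an illustration of the chunked request header rules above, the following Python sketch assembles a header set. The function and constant names are hypothetical and do not belong to the DataStreamUtil API; the header names and values are taken from the table.

```python
SUPPORTED_ENCODINGS = ("gzip", "bzip2", "deflate")


def chunked_request_headers(content_encoding=None):
    """Build headers for a DataStream chunked request (hypothetical helper)."""
    accept_encoding = ",".join(SUPPORTED_ENCODINGS)
    headers = {
        "Content-Type": "application/octet-stream",
        "Accept": "text/x-yaml,application/octet-stream",
        # Both accept-encoding headers must carry the same value.
        "Accept-Encoding": accept_encoding,
        "DataStream-Accept-Encoding": accept_encoding,
        "DataStream-Content-Type": "text/x-yaml;charset=utf8",
        "DataStream-Accept": "text/x-yaml",
        "Transfer-Encoding": "chunked",
    }
    if content_encoding is not None:
        # Exactly one encoding value is allowed in this header.
        if content_encoding not in SUPPORTED_ENCODINGS + ("identity",):
            raise ValueError("unsupported encoding: %r" % (content_encoding,))
        headers["DataStream-Content-Encoding"] = content_encoding
    # Content-Encoding and Content-Length are deliberately never set here.
    return headers
```

Calling chunked_request_headers("gzip") yields a header set in which the request chunks are declared as gzip-compressed.
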
DataStream Non-Chunked Request Headers

The following headers are set with DataStream non-chunked requests:

Header: Accept
Value: text/x-yaml,application/octet-stream
Description: other media types MAY be included, but at least the following MUST be included:
- text/x-yaml: MUST be included in case a non-chunked response is returned
- application/octet-stream: MUST be included in case of a DataStream chunked response

Header: [Accept-Encoding]
Value: gzip,bzip2,deflate
Description: optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the DataStream-Accept-Encoding header)

Header: DataStream-Accept
Value: text/x-yaml
Description: clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 "Not Acceptable" error must be returned

Header: [DataStream-Accept-Encoding]
Value: gzip,bzip2,deflate
Description: optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the Accept-Encoding header)

Header: [Content-Type]
Value: text/x-yaml;charset=utf8
Description: MUST be included in requests with a message body; this reflects the content type of the body as YAML-encoded data; MAY be included in requests without a message body, in which case it MUST be ignored by the server ("utf8" is case-insensitive and may contain a hyphen before the 8)

Header: [Content-Encoding]
Value: one of identity, bzip2, gzip, or deflate
Description: optional header; MUST be included if data compression is used in the request body

Header: [Content-Length]
Value: number
Description: required in non-chunked requests with a message body
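
For comparison with the chunked case, here is a minimal Python sketch of a non-chunked request, where compression is applied to the message body as a whole and Content-Length reflects the bytes actually sent. The function name is hypothetical, and json.dumps is assumed as a stand-in for a YAML emitter.

```python
import gzip
import json


def non_chunked_request(data=None, compress=False):
    """Return (headers, body) for a DataStream non-chunked request."""
    headers = {
        "Accept": "text/x-yaml,application/octet-stream",
        "DataStream-Accept": "text/x-yaml",
    }
    body = b""
    if data is not None:
        # Assumption: json.dumps stands in for YAML serialization.
        body = json.dumps(data).encode("utf-8")
        headers["Content-Type"] = "text/x-yaml;charset=utf8"
        if compress:
            # Standard HTTP content encoding covers the entire body.
            body = gzip.compress(body)
            headers["Content-Encoding"] = "gzip"
        # Content-Length counts the encoded (possibly compressed) bytes.
        headers["Content-Length"] = str(len(body))
    return headers, body
```

A request without a body carries only the two accept headers in this sketch; the body-related headers appear only when there is data to send.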

DataStream Chunked Response Headers

The following headers are set with DataStream chunked responses:

Header: Content-Type
Value: application/octet-stream
Description: MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk

Header: DataStream-Content-Type
Value: text/x-yaml;charset=utf8
Description: MUST be present to identify the content type of each chunk as YAML-encoded data ("utf8" is case-insensitive and may contain a hyphen before the 8)

Header: [DataStream-Content-Encoding]
Value: one of identity, bzip2, gzip, or deflate
Description: optional header; MUST be included if DataStream data compression is used in the response body; if present, it MUST NOT contain more than one value

Header: Transfer-Encoding
Value: chunked
Description: MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding

Header: Trailer
Value: DataStream-Error
Description: MUST be included, as this trailer record will be sent after the chunked data is transferred if an error occurs on the sending side, in which case the trailer will be assigned a string giving information about the error that occurred

Note
"Content-Encoding" and "Content-Length" headers MUST NOT be included in DataStream chunked transfers

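A client can use the rules above to decide whether a response should be handled as a DataStream chunked transfer. The following Python sketch shows one way to do so; the function name is hypothetical, and header names are compared case-insensitively as in standard HTTP.

```python
def is_datastream_chunked(headers):
    """Return True if the headers describe a DataStream chunked response."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    # These headers must not appear in DataStream chunked transfers.
    if "content-encoding" in h or "content-length" in h:
        return False
    return (
        h.get("transfer-encoding", "").lower() == "chunked"
        and h.get("content-type", "").lower() == "application/octet-stream"
        and "datastream-content-type" in h
        and "datastream-error" in h.get("trailer", "").lower()
    )
```

A response that fails this check would be processed with standard HTTP decoding rules instead.
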
DataStream Non-Chunked Response Headers

The following headers are set with DataStream non-chunked responses:

Header: Content-Type
Value: text/x-yaml;charset=utf8
Description: MUST be included in responses with a message body; this reflects the content type of the body as YAML-encoded data; MAY be included in responses without a message body, in which case it MUST be ignored by the client ("utf8" is case-insensitive and may contain a hyphen before the 8)

Header: Content-Length
Value: number
Description: MUST be included in non-chunked responses with a message body

Header: [Content-Encoding]
Value: one of identity, bzip2, gzip, or deflate
Description: optional header; MUST be included if data compression is used in the response body

DataStream Trailers

The following trailer may be sent with DataStream chunked responses after all data has been transferred:
DataStream Chunked Response Trailers

Trailer: DataStream-Error
Description: This trailer is sent when the chunked data transfer is complete if there were any errors on the sending side. If so, the value will be a string describing the error
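
To show where the trailer sits on the wire, here is a minimal Python sketch that reads a chunked body from a byte stream and collects any trailers that follow the terminating zero-size chunk. The function name and the sample error text are hypothetical.

```python
import io


def read_chunked(stream):
    """Return (chunks, trailers) parsed from an HTTP/1.1 chunked body."""
    chunks = []
    while True:
        # The size line is hexadecimal and may carry ";extension" data.
        size = int(stream.readline().split(b";")[0].strip(), 16)
        if size == 0:
            break
        chunks.append(stream.read(size))
        stream.readline()  # consume the CRLF that follows the chunk data
    trailers = {}
    while True:
        line = stream.readline().strip()
        if not line:
            break  # an empty line (or end of stream) ends the trailers
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip().lower()] = value.decode("utf-8").strip()
    return chunks, trailers


body = b"3\r\nabc\r\n2\r\nde\r\n0\r\nDataStream-Error: disk full\r\n\r\n"
chunks, trailers = read_chunked(io.BytesIO(body))
error = trailers.get("datastream-error")
```

A receiver would check for the trailer only after the last chunk, since the sender cannot report an error in the headers once streaming has started.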

DataStream Compression

Compression of chunked message bodies is supported by applying DataStream content encoding, as specified by the DataStream-Content-Encoding header, to each chunk individually (after YAML data serialization) before sending, and then applying the reverse operation to each chunk immediately after reception and before YAML deserialization. This is analogous to standard HTTP content encoding (which is applied to the message body as a whole) but is applied to each chunk separately.

Data compression is identified in a DataStream transfer by the following header:

  • "DataStream-Content-Encoding": set to one of "identity", "bzip2", "gzip", or "deflate" if data compression is used

DataStream server implementations MUST support at least the above content encoding methods. This allows clients to include DataStream compression with the first request in case of streaming data to the server.

DataStream clients claim support for these content encoding methods by including them in the Accept-Encoding and DataStream-Accept-Encoding headers in the request; both of these headers must contain the same values in client requests.
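
The encoding methods listed above map onto Python standard library codecs, which makes a per-chunk sketch straightforward. The table and function names below are hypothetical; the "deflate" entry assumes the zlib format that RFC-2616 specifies for that content coding.

```python
import bz2
import gzip
import zlib

# Maps a DataStream-Content-Encoding value to (compress, decompress).
CODECS = {
    "identity": (lambda data: data, lambda data: data),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    # HTTP "deflate" is the zlib format, which the zlib module handles.
    "deflate": (zlib.compress, zlib.decompress),
}


def encode_chunks(chunks, encoding="identity"):
    """Compress each already-serialized chunk on its own."""
    compress, _ = CODECS[encoding]
    return [compress(chunk) for chunk in chunks]


def decode_chunks(chunks, encoding="identity"):
    """Decompress each received chunk on its own, before deserialization."""
    _, decompress = CODECS[encoding]
    return [decompress(chunk) for chunk in chunks]
```

Since each chunk is compressed separately, a single encoding value applies to every chunk in a message, and any chunk can be decoded without the others.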

See also
RFC-2616

Example DataStream Request

PUT /api/system?action=dataStream HTTP/1.1
Accept: text/x-yaml,application/x-yaml,text/xml,application/xml,application/json,application/octet-stream
User-Agent: Qore-DataStreamClient/1.1
Content-Type: application/octet-stream
DataStream-Content-Type: text/x-yaml;charset=utf8
DataStream-Accept: text/x-yaml
DataStream-Accept-Encoding: gzip,bzip2,deflate
DataStream-Content-Encoding: gzip
Transfer-Encoding: chunked
Accept-Encoding: gzip,bzip2,deflate
Connection: Keep-Alive
Host: localhost:8001

Example DataStream Response

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Datastream-Content-Type: text/x-yaml;charset=utf8
Datastream-Content-Encoding: bzip2
Trailer: DataStream-Error
Connection: Keep-Alive
Date: Sun, 20 Apr 2014 07:49:51 GMT
Server: Qorus-HTTP-Server/0.3.7

Release Notes

DataStreamUtil v1.1

  • minor updates for complex types

DataStreamUtil v1.0.1

  • fixed a bug handling chunked non-DataStream messages (issue 1438)

DataStreamUtil v1.0

  • initial release of the module