// -*- mode: c++; indent-tabs-mode: nil -*-
// @file DataStreamUtil.qm Qore user module implementing support for the DataStream protocol: YAML-encoded HTTP chunked transfers where each chunk is a unique data entity
/* DataStreamUtil.qm Copyright (C) 2014 - 2016 Qore Technologies, s.r.o.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*/
// this module requires Qore 0.8.12 or better
// require type definitions everywhere
// enable all warnings
// do not use $ signs in declarations
/* Version History: see docs below
*/
/** @mainpage DataStreamUtil Module
@tableofcontents
@section datastreamutilintro Introduction to the DataStreamUtil Module
The %DataStreamUtil module provides client and server support for YAML-encoded HTTP 1.1 (RFC-2616) chunked transfers where each chunk is a unique data entity, allowing data to be streamed from remote servers and used as soon as it is received. The module takes care of data serialization and content encoding on both the sending and receiving ends, so that %Qore code can send serialized data with optional data compression over standard HTTP chunked transfers and use it immediately on receipt on the remote end.
In %Qore, DataStream support is implemented on top of and is designed to extend the REST infrastructure provided by the %Qore library.
This module is used automatically by the DataStreamClient and DataStreamRequestHandler user modules; to use this module directly for low-level DataStream protocol support, use \c "%requires DataStreamUtil" in your code.
All the public symbols in the module are defined in the DataStreamUtil namespace.
Functions:
- @ref DataStreamUtil::ds_get_content_decode() "ds_get_content_decode()": returns a @ref call_reference "call reference" (or @ref nothing) for decoding content encoded data
- @ref DataStreamUtil::ds_get_content_encode() "ds_get_content_encode()": returns a @ref call_reference "call reference" (or @ref nothing) for encoding content encoded data
- @ref DataStreamUtil::ds_get_send() "ds_get_send()": returns a @ref call_reference "call reference" for serializing and encoding data for sending DataStream chunked data
- @ref DataStreamUtil::ds_get_recv() "ds_get_recv()": returns a @ref call_reference "call reference" for decoding and deserializing data for receiving DataStream chunked data
- @ref DataStreamUtil::ds_set_chunked_headers() "ds_set_chunked_headers()": sets up HTTP headers for DataStream chunked data transfers
- @ref DataStreamUtil::ds_set_non_chunked_headers() "ds_set_non_chunked_headers()": sets up HTTP headers for DataStream non-chunked data transfers
@section datastreamprotocol DataStream Protocol
The DataStream protocol is based on HTTP 1.1 (RFC-2616) chunked transfers where each chunk contains UTF-8 encoded YAML-serialized data with optional compression and where each chunk is an independently decodable and parsable entity. This differs from standard HTTP chunked transfers, where content encoding and semantic completeness of a message are defined over the entire message body rather than over each chunk. By using DataStream instead of standard HTTP chunked transfer, data can be streamed from one server to another and be usable immediately on receipt on the remote end.
A DataStream transfer with streaming involves at least one chunked transfer; either the request or the reply must be sent with chunked transfer encoding to have the DataStream protocol applied. Non-chunked, monolithic requests and responses are also supported, but in these cases standard HTTP encoding and decoding rules are applied.
A DataStream request-response pair without streaming is equivalent to a standard HTTP request-response pair but with the addition of DataStream headers which are ignored in the case that no chunked transfers are made.
DataStream runs over HTTP 1.1 (RFC-2616) and uses standard HTTP features with custom headers to identify the data serialization, character encoding, and content encoding applied to each chunk. DataStream is currently defined using UTF-8 encoded YAML for data serialization, but was designed to be extensible for future use with other data serialization methods through the use of appropriate headers.
@note RFC 6648 deprecates the use of an \c "X-" prefix in non-standardized HTTP headers and trailers, therefore no such prefix exists for DataStream headers and trailers.
@subsection datastreamprotocoldata DataStream Data Serialization
DataStream uses UTF-8 encoded YAML for data serialization to allow for maximum data fidelity over the HTTP link.
For non-chunked messages, the \c Content-Type header is \c "text/x-yaml;charset=utf8" (\c "utf8" is case-insensitive and may contain a hyphen before the 8), and the data is sent in a normal HTTP message body. @ref datastreamprotocolcompression "Data compression" is supported with normal HTTP content encoding: it is applied to the serialized YAML data before sending, and the reverse operation is applied on receipt before the receiver deserializes the YAML data to native data structures.
For chunked DataStream requests and responses, the \c Content-Type header is set to \c "application/octet-stream" (to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk) and the content-type of each atomic chunk is given by the \c DataStream-Content-Type header which should be set to \c "text/x-yaml;charset=utf8" (\c "utf8" is case-insensitive and may contain a hyphen before the 8). Data compression is supported and is applied to each chunk atomically as DataStream-specific content encoding after chunk data serialization; the reverse operation is applied before YAML data deserialization to native data structures by the receiver.
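The choice between per-chunk and whole-body rules described above can be sketched in a few lines. This is a language-neutral illustration in Python, not part of the module's API: the function name \c decoding_rules is hypothetical, and treating a chunked message without a \c DataStream-Content-Type header as a standard HTTP message is an assumption of this sketch.

```python
def decoding_rules(headers):
    """Return (scope, content_type, content_encoding) for a received message.

    `headers` maps header names to values; names are matched case-insensitively.
    `scope` is "chunk" when each chunk is decoded on its own (DataStream) and
    "body" when standard HTTP rules apply to the message body as a whole.
    """
    h = {name.lower(): value for name, value in headers.items()}
    chunked = h.get("transfer-encoding", "").strip().lower() == "chunked"
    if chunked and "datastream-content-type" in h:
        # DataStream transfer: type and encoding describe each chunk
        return ("chunk",
                h["datastream-content-type"],
                h.get("datastream-content-encoding", "identity"))
    # Assumption: anything else is handled as a standard HTTP message
    return ("body", h.get("content-type"), h.get("content-encoding", "identity"))
```

With the headers of a chunked DataStream message this returns the \c "chunk" scope together with the \c DataStream-Content-Type and \c DataStream-Content-Encoding values; for a monolithic message it returns the \c "body" scope with the standard headers.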
@subsection datastreamprocotolchunkedrequests DataStream Chunked Request Headers
The following headers are set with DataStream chunked requests:\n
DataStream Chunked Request Headers

| Header | Value | Description |
| ------ | ----- | ----------- |
| \c Content-Type | \c application/octet-stream | MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk |
| \c Accept | text/x-yaml,application/octet-stream | other media types MAY be included, but at least the following MUST be included:\n - \c text/x-yaml: MUST be included in case a non-chunked response is returned\n - \c application/octet-stream: MUST be included in case of a DataStream chunked response |
| [\c Accept-Encoding] | gzip,bzip2,deflate | optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c DataStream-Accept-Encoding header) |
| \c DataStream-Content-Type | text/x-yaml;charset=utf8 | MUST be present to identify the content type of each chunk as YAML-encoded data (\c "utf8" is case-insensitive and may contain a hyphen before the 8) |
| \c DataStream-Accept | \c text/x-yaml | clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 \c "Not Acceptable" error must be returned |
| [\c DataStream-Accept-Encoding] | gzip,bzip2,deflate | optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c Accept-Encoding header) |
| [\c DataStream-Content-Encoding] | one of identity, bzip2, gzip, or deflate | this header is optional; it MUST be included if DataStream data compression is used in the request body. If present, this header MUST NOT contain more than one value |
| \c Transfer-Encoding | \c chunked | MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding |
@note \c "Content-Encoding" and \c "Content-Length" headers MUST NOT be included in DataStream chunked transfers
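The header rules for chunked requests can be summarized as a small builder function. This is a hedged sketch in Python for illustration only: \c chunked_request_headers and \c SUPPORTED_ENCODINGS are hypothetical names, not part of this module (in %Qore code, @ref DataStreamUtil::ds_set_chunked_headers() "ds_set_chunked_headers()" performs the header setup).

```python
SUPPORTED_ENCODINGS = ("gzip", "bzip2", "deflate", "identity")

def chunked_request_headers(content_encoding=None,
                            accept_encodings=("gzip", "bzip2", "deflate")):
    """Build the headers for a DataStream chunked request.

    `content_encoding` is the single per-chunk encoding used for the request
    body, or None when the chunks are sent uncompressed.
    """
    if content_encoding is not None and content_encoding not in SUPPORTED_ENCODINGS:
        raise ValueError("unsupported content encoding: %r" % (content_encoding,))
    accepted = ",".join(accept_encodings)
    hdr = {
        "Content-Type": "application/octet-stream",
        "Accept": "text/x-yaml,application/octet-stream",
        "DataStream-Content-Type": "text/x-yaml;charset=utf8",
        "DataStream-Accept": "text/x-yaml",
        "Transfer-Encoding": "chunked",
        # both headers MUST carry the same value when present
        "Accept-Encoding": accepted,
        "DataStream-Accept-Encoding": accepted,
    }
    if content_encoding is not None:
        # exactly one value; Content-Encoding and Content-Length are never set
        hdr["DataStream-Content-Encoding"] = content_encoding
    return hdr
```

The sketch always emits the two optional \c Accept-Encoding headers; a client that does not support compressed responses would simply leave both out.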
@subsection datastreamprocotolnonchunkedrequests DataStream Non-Chunked Request Headers
The following headers are set with DataStream non-chunked requests:\n
DataStream Non-Chunked Request Headers

| Header | Value | Description |
| ------ | ----- | ----------- |
| \c Accept | text/x-yaml,application/octet-stream | other media types MAY be included, but at least the following MUST be included:\n - \c text/x-yaml: MUST be included in case a non-chunked response is returned\n - \c application/octet-stream: MUST be included in case of a DataStream chunked response |
| [\c Accept-Encoding] | gzip,bzip2,deflate | optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c DataStream-Accept-Encoding header) |
| \c DataStream-Accept | \c text/x-yaml | clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 \c "Not Acceptable" error must be returned |
| [\c DataStream-Accept-Encoding] | gzip,bzip2,deflate | optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c Accept-Encoding header) |
| [\c Content-Type] | text/x-yaml;charset=utf8 | MUST be included in requests with a message body; this reflects the content type of the body as YAML encoded data; MAY be included in requests without a message body in which case it MUST be ignored by the server (\c "utf8" is case-insensitive and may contain a hyphen before the 8) |
| [\c Content-Encoding] | one of identity, bzip2, gzip, or deflate | this header is optional; it MUST be included if data compression is used in the request body |
| [\c Content-Length] | number | this header is required in non-chunked requests with a message body |
@subsection datastreamprocotolchunkedresponses DataStream Chunked Response Headers
The following headers are set with DataStream chunked responses:\n
DataStream Chunked Response Headers

| Header | Value | Description |
| ------ | ----- | ----------- |
| \c Content-Type | \c application/octet-stream | MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk |
| \c DataStream-Content-Type | \c text/x-yaml;charset=utf8 | MUST be present to identify the content type of each chunk as YAML-encoded data (\c "utf8" is case-insensitive and may contain a hyphen before the 8) |
| [\c DataStream-Content-Encoding] | one of identity, bzip2, gzip, or deflate | this header is optional; it MUST be included if DataStream data compression is used in the response body. If present, this header MUST NOT contain more than one value |
| \c Transfer-Encoding | \c chunked | MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding |
| \c Trailer | \c DataStream-Error | MUST be included as this trailer record will be sent after chunked data is transferred if an error occurs on the sending side, in which case the trailer will be assigned a string giving information about the error that occurred |
@note \c "Content-Encoding" and \c "Content-Length" headers MUST NOT be included in DataStream chunked transfers
@subsection datastreamprocotolnonchunkedresponses DataStream Non-Chunked Response Headers
The following headers are set with DataStream non-chunked responses:\n
DataStream Non-Chunked Response Headers

| Header | Value | Description |
| ------ | ----- | ----------- |
| \c Content-Type | text/x-yaml;charset=utf8 | MUST be included in responses with a message body; this reflects the content type of the body as YAML encoded data; MAY be included in responses without a message body in which case it MUST be ignored by the client (\c "utf8" is case-insensitive and may contain a hyphen before the 8) |
| \c Content-Length | number | MUST be included in non-chunked responses with a message body |
| [\c Content-Encoding] | one of identity, bzip2, gzip, or deflate | this header is optional; it MUST be included if data compression is used in the response body |
@subsection datastreamtrailers DataStream Trailers
The following trailer may be sent with DataStream chunked responses after all data has been transferred:\n
DataStream Chunked Response Trailers

| Trailer | Description |
| ------- | ----------- |
| \c DataStream-Error | This trailer is sent when the chunked data transfer is complete if there were any errors on the sending side. If so, the value will be a string describing the error |
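To show where the trailer sits on the wire, the following Python sketch splits a raw HTTP chunked body into its chunks and trailer records. It is an illustration under simplifying assumptions (the whole body is already in memory and well formed); \c read_chunked is a hypothetical helper and the error text in the sample is made up.

```python
def read_chunked(raw):
    """Split a raw chunked message body into (chunks, trailers)."""
    chunks, trailers, pos = [], {}, 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # chunk size is hexadecimal; anything after ";" is a chunk extension
        size = int(raw[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break  # the zero-size chunk ends the data
        chunks.append(raw[pos:pos + size])
        pos += size + 2  # skip the chunk data and its trailing CRLF
    # trailer section: header lines terminated by an empty line
    for line in raw[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip().lower()] = value.decode("utf-8").strip()
    return chunks, trailers

# two YAML chunks followed by an error trailer (sample data, made up)
raw = (b"5\r\na: 1\n\r\n"
       b"5\r\nb: 2\n\r\n"
       b"0\r\nDataStream-Error: source query failed\r\n\r\n")
chunks, trailers = read_chunked(raw)
error = trailers.get("datastream-error")  # None when the sender reported no error
```

A receiver can use every chunk as soon as it arrives, but only learns from the trailer whether the sender finished without errors.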
@subsection datastreamprotocolcompression DataStream Compression
Compression of chunked message bodies is supported by applying DataStream content encoding, as specified by the \c DataStream-Content-Encoding header, to each chunk individually (after YAML data serialization) before sending, and then applying the reverse operation to each chunk immediately after reception and before YAML deserialization. This is analogous to standard HTTP content encoding, except that standard content encoding is applied to the message body as a whole, while DataStream content encoding is applied to each chunk separately.
Data compression is identified in a DataStream transfer by the following header:
- \c "DataStream-Content-Encoding": set to one of \c "identity", \c "bzip2", \c "gzip", or \c "deflate" if data compression is used
DataStream server implementations MUST support at least the above content encoding methods. This allows clients to include DataStream compression with the first request in case of streaming data to the server.
DataStream clients claim support for these content encoding methods by including them in the \c Accept-Encoding and \c DataStream-Accept-Encoding headers in the request; both of these headers must contain the same values in client requests.
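The per-chunk nature of DataStream content encoding can be demonstrated with Python's standard compression modules. This is a sketch, not the module's implementation: the helper names are hypothetical, the chunks are assumed to be already serialized YAML text, and \c "deflate" is assumed to mean the zlib format, as in standard HTTP content encoding.

```python
import bz2
import gzip
import zlib

# encoder and decoder for each DataStream-Content-Encoding value
ENCODERS = {
    "identity": lambda data: data,
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "deflate": zlib.compress,
}
DECODERS = {
    "identity": lambda data: data,
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "deflate": zlib.decompress,
}

def encode_chunks(yaml_docs, encoding):
    """Encode each serialized YAML document as its own compressed chunk."""
    encode = ENCODERS[encoding]
    return [encode(doc.encode("utf-8")) for doc in yaml_docs]

def decode_chunk(chunk, encoding):
    """Decode a single chunk without needing any other chunk."""
    return DECODERS[encoding](chunk).decode("utf-8")
```

Because every chunk is compressed separately, \c decode_chunk() can be applied to any chunk on its own, which is what allows the receiver to use data as soon as it arrives.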
@see RFC-2616
@subsection datastreamrequestexample Example DataStream Request
@verbatim
PUT /api/system?action=dataStream HTTP/1.1
Accept: text/x-yaml,application/x-yaml,text/xml,application/xml,application/json,application/octet-stream
User-Agent: Qore-DataStreamClient/1.0.1
Content-Type: application/octet-stream
DataStream-Content-Type: text/x-yaml;charset=utf8
DataStream-Accept: text/x-yaml
DataStream-Accept-Encoding: gzip,bzip2,deflate
DataStream-Content-Encoding: gzip
Transfer-Encoding: chunked
Accept-Encoding: gzip,bzip2,deflate
Connection: Keep-Alive
Host: localhost:8001
@endverbatim
@subsection datastreamresponseexample Example DataStream Response
@verbatim
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Datastream-Content-Type: text/x-yaml;charset=utf8
Datastream-Content-Encoding: bzip2
Trailer: DataStream-Error
Connection: Keep-Alive
Date: Sun, 20 Apr 2014 07:49:51 GMT
Server: Qorus-HTTP-Server/0.3.7
@endverbatim
@section datastreamutilrelnotes Release Notes
@subsection datastreamutil_v1_0_1 DataStreamUtil v1.0.1
- fixed a bug handling chunked non-DataStream messages (issue 1438)
@subsection datastreamutil_v1_0 DataStreamUtil v1.0
- initial release of the module
*/
//! the DataStreamUtil namespace contains all the public objects in the DataStreamUtil module
namespace DataStreamUtil {
//! @defgroup DataStramHeaders Data Stream Headers
/** These are the data stream header values for HTTP chunked transfers where each chunk is encoded separately
*/
//@{
//! HTTP header for the data stream Content-Type header equivalent where each HTTP chunk is encoded/decoded separately
const DataStreamContentType = "DataStream-Content-Type";
//! HTTP header for the data stream Content-Encoding header equivalent where each HTTP chunk is encoded/decoded separately
const DataStreamContentEncoding = "DataStream-Content-Encoding";
//! HTTP header for the data stream Accept header equivalent where each HTTP chunk is encoded/decoded separately
const DataStreamAccept = "DataStream-Accept";
//! HTTP header for the data stream Accept-Encoding header equivalent where each HTTP chunk is encoded/decoded separately
const DataStreamAcceptEncoding = "DataStream-Accept-Encoding";
//! HTTP trailer to be sent after chunked data has been transferred in case of an error on the sending side, giving a string describing the error
const DataStreamError = "DataStream-Error";
//! supported values for the DataStream-Accept-Encoding header
const DataStreamContentEncodingHash = (
"gzip": True,
"bzip2": True,
"deflate": True,
"identity": True,
);
//@}
const DataStreamDeserializeYaml = (
"code": "yaml",
"in": \parse_yaml(),
);
const DataStreamDeserializeXmlRpc = (
"code": "xml",
"in": \parse_xmlrpc_value(),
);
const DataStreamDeserializationSupport = (
MimeTypeYamlRpc: DataStreamDeserializeYaml,
MimeTypeYaml: DataStreamDeserializeYaml,
MimeTypeJson: (
"code": "json",
"in": \parseJSON(),
),
MimeTypeXml: DataStreamDeserializeXmlRpc,
MimeTypeXmlApp: DataStreamDeserializeXmlRpc,
);
//! returns a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for decoding HTTP encoded data
/** @par Example:
@code
*code decode = ds_get_content_decode(hdr."content-encoding");
@endcode
The following \c "Content-Encoding" values are recognized:
- \c "deflate", \c "x-deflate": returns a call reference to @ref Qore::uncompress_to_string() "uncompress_to_string()"
- \c "gzip", \c "x-gzip": returns a call reference to @ref Qore::gunzip_to_string() "gunzip_to_string()"
- \c "bzip2", \c "x-bzip2": returns a call reference to @ref Qore::bunzip2_to_string() "bunzip2_to_string()"
- \c "identity", @ref nothing: returns @ref nothing
@return a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for decoding HTTP encoded data
@throw DESERIALIZATION-ERROR this exception is thrown if the content encoding is not recognized
@note HTTP headers in requests are converted to lower-case only on receipt
*/
__7_ code ds_get_content_decode(__7_ string ce);
//! returns a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for encoding HTTP encoded data
/** @par Example:
@code
*code encode = ds_get_content_encode("bzip2");
@endcode
The following \c "Content-Encoding" values are recognized:
- \c "deflate": returns a call reference to @ref Qore::compress() "compress()"
- \c "gzip": returns a call reference to @ref Qore::gzip() "gzip()"
- \c "bzip2": returns a call reference to @ref Qore::bzip2() "bzip2()"
- \c "identity", @ref nothing: returns @ref nothing
@return a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for encoding HTTP encoded data
@throw SERIALIZATION-ERROR this exception is thrown if the content encoding is not recognized
*/
__7_ code ds_get_content_encode(__7_ string ce);
//! returns a @ref call_reference "call reference" useful for sending HTTP chunked data with %Qore methods taking send callbacks; data is encoded with YAML and optionally a @ref ds_get_content_encode "content encoding" @ref call_reference "call reference" or @ref closure "closure"
/** @par Example:
@code
httpclient.sendWithSendCallback(ds_get_send(scb, enc_func), method, path, hdr, timeout_ms, False, \info);
@endcode
@param scb the send data callback that should return data for sending
@param enc_func an optional @ref call_reference "call reference" or @ref closure "closure" for performing content encoding after YAML serialization; see @ref ds_get_content_encode()
@return a @ref call_reference "call reference" useful for sending HTTP chunked data with %Qore methods taking send callbacks; data is encoded with YAML and optionally a @ref ds_get_content_encode "content encoding" @ref call_reference "call reference" or @ref closure "closure"; the return value of this function is designed to be used as the send callback parameter \a scb in the following methods:
- @ref Qore::HTTPClient::sendWithSendCallback() "HTTPClient::sendWithSendCallback()"
- @ref Qore::HTTPClient::sendWithCallbacks() "HTTPClient::sendWithCallbacks()"
- @ref Qore::Socket::sendHTTPMessageWithCallback() "Socket::sendHTTPMessageWithCallback()"
- @ref Qore::Socket::sendHTTPResponseWithCallback() "Socket::sendHTTPResponseWithCallback()"
@note if using content encoding, the appropriate \c "DataStream-Content-Encoding" header will be added by @ref ds_set_chunked_headers() if the \a content_encoding argument is used
*/
code ds_get_send(code scb, __7_ code enc_func);
//! returns a @ref call_reference "call reference" useful for receiving HTTP chunked data with %Qore methods taking receive callbacks; YAML data is deserialized and passed as %Qore data to the given callback argument
/** @par Example:
@code
httpclient.sendWithRecvCallback(ds_get_recv(rcb, ecb), body, method, path, hdr, timeout_ms, False, \info);
@endcode
@param rcb the data callback; this is called once for each deserialized chunked transfer from the remote end with the deserialized data, and then once with @ref nothing when all data has been received
@param ecb the "end of data" callback; this must accept a @ref string_or_nothing "*string" argument; this is called with no arguments once all data has been received if the sender does not report a send error, otherwise it's called with a single string giving the send error reported by the sending side in the \c DataStream-Error trailer record
@param bcb an optional "body callback"; if this argument is passed, it must take a string argument which will be the content-decoded raw message body before any data deserialization and a second string argument giving the content-type of the body
@return a @ref call_reference "call reference" useful for receiving HTTP chunked data with %Qore methods taking receive callbacks; when HTTP headers are received, the closure sets up content decoding by calling @ref ds_get_content_decode() on the \c "Content-Encoding" or \c "DataStream-Content-Encoding" header values; when raw chunked data is received, first the chunked body is decoded (if content encoding was indicated in the message headers), then YAML data is deserialized and passed as %Qore data to the given callback argument; the return value of this function is designed to be used as the receive callback parameter \a rcb in the following methods:
- @ref Qore::HTTPClient::sendWithRecvCallback() "HTTPClient::sendWithRecvCallback()"
- @ref Qore::HTTPClient::sendWithCallbacks() "HTTPClient::sendWithCallbacks()"
- @ref Qore::Socket::readHTTPChunkedBodyBinaryWithCallback() "Socket::readHTTPChunkedBodyBinaryWithCallback()"
- @ref Qore::Socket::readHTTPChunkedBodyWithCallback() "Socket::readHTTPChunkedBodyWithCallback()"
@note the callback returned here can throw a \c "DESERIALIZATION-ERROR" if the header's \c "Content-Type" or \c "DataStream-Content-Type" is not \c "text/x-yaml" or the \c "Content-Encoding" or \c "DataStream-Content-Encoding" headers give unrecognized content encodings.
*/
code ds_get_recv(code rcb, code ecb, __7_ code bcb);
//! sets up HTTP headers for DataStream chunked data transfers
/** @par Example:
@code{.py}
ds_set_chunked_headers(\hdr, "bzip2");
@endcode
@param hdr a reference to a hash of message headers when sending chunked DataStream data; the following headers are set:
- \c "Content-Type: application/octet-stream" (set if header not already present in hash)
- \c "DataStream-Content-Type: text/x-yaml;charset=utf8" (set if header not already present in hash)
- \c "DataStream-Content-Encoding": set to the \c content_encoding argument, if present
- \c "Transfer-Encoding: chunked"
.
For requests, the following headers are processed:
- \c "Accept: text/x-yaml,application/octet-stream" (set if header not already present in hash)
- \c "DataStream-Accept: text/x-yaml" (set if header not already present in hash)
@param content_encoding an optional string giving the \c "DataStream-Content-Encoding" header to set (must be a recognized content encoding as recognized by @ref ds_get_content_encode())
@param req set to @ref Qore::True "True" if the headers are required for a request
*/
nothing ds_set_chunked_headers(reference hdr, __7_ string content_encoding, __7_ softbool req);
//! sets up HTTP headers for DataStream non-chunked data transfers
/** @par Example:
@code{.py}
ds_set_non_chunked_headers(\hdr, "bzip2");
@endcode
@param hdr a reference to a hash of message headers when sending non-chunked DataStream data; the following headers are set:
- \c "Content-Type: text/x-yaml" (set if header not already present in hash)
.
For requests, the following headers are processed:
- \c "Accept: text/x-yaml,application/octet-stream" (set if header not already present in hash)
- \c "DataStream-Accept: text/x-yaml" (set if header not already present in hash)
@param content_encoding an optional string giving the \c "Content-Encoding" header to set (must be a recognized content encoding as recognized by @ref ds_get_content_encode())
@param req set to @ref Qore::True "True" if the headers are required for a request
*/
nothing ds_set_non_chunked_headers(reference hdr, __7_ string content_encoding, __7_ softbool req);
//! returns the \c "DataStream-Accept-Encoding" header value corresponding to the \c "Accept-Encoding" header passed as an argument, or \c "identity" if none present
/** @par Example:
@code
string dae = ds_get_ds_accept_enc_header(hdr."Accept-Encoding");
@endcode
@param ae the value of the \c "Accept-Encoding" header, if present
@return the \c "DataStream-Accept-Encoding" value corresponding to the \c "Accept-Encoding" header passed as an argument, or \c "identity" if none is present
*/
string ds_get_ds_accept_enc_header(__7_ string ae);
};
// private namespace for non-exported definitions
namespace DataStreamUtilPrivate {
// private function
ds_do_request_headers(reference hdr);
};