// -*- mode: c++; indent-tabs-mode: nil -*-
// @file DataStreamUtil.qm Qore user module implementing support for the DataStream protocol: YAML-encoded HTTP chunked transfers where each chunk is a unique data entity

/*  DataStreamUtil.qm Copyright (C) 2014 - 2016 Qore Technologies, s.r.o.

    Permission is hereby granted, free of charge, to any person obtaining a
    copy of this software and associated documentation files (the "Software"),
    to deal in the Software without restriction, including without limitation
    the rights to use, copy, modify, merge, publish, distribute, sublicense,
    and/or sell copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in
    all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.
*/

// this module requires Qore 0.8.12 or better

// require type definitions everywhere

// enable all warnings

// do not use $ signs in declarations

/*  Version History: see docs below
*/

/** @mainpage DataStreamUtil Module

    @tableofcontents

    @section datastreamutilintro Introduction to the DataStreamUtil Module

    The %DataStreamUtil module provides client and server support for YAML-encoded HTTP 1.1 (RFC-2616) chunked transfers where each chunk is a unique data entity, allowing data to be streamed from remote servers and used as soon as it is received.
    The module takes care of serialization and encoding on both the sending and receiving ends, so that %Qore code can send serialized data with optional data compression over standard HTTP chunked transfers and use it immediately on receipt on the remote end.

    In %Qore, DataStream support is implemented on top of and is designed to extend the REST infrastructure provided by the %Qore library.

    This module is used automatically by the DataStreamClient and DataStreamRequestHandler user modules; to use this module directly for low-level DataStream protocol support, use \c "%requires DataStreamUtil" in your code.

    All the public symbols in the module are defined in the DataStreamUtil namespace.

    Functions:
    - @ref DataStreamUtil::ds_get_content_decode() "ds_get_content_decode()": returns a @ref call_reference "call reference" (or @ref nothing) for decoding content encoded data
    - @ref DataStreamUtil::ds_get_content_encode() "ds_get_content_encode()": returns a @ref call_reference "call reference" (or @ref nothing) for encoding content encoded data
    - @ref DataStreamUtil::ds_get_send() "ds_get_send()": returns a @ref call_reference "call reference" for serializing and encoding data for sending DataStream chunked data
    - @ref DataStreamUtil::ds_get_recv() "ds_get_recv()": returns a @ref call_reference "call reference" for decoding and deserializing data for receiving DataStream chunked data
    - @ref DataStreamUtil::ds_set_chunked_headers() "ds_set_chunked_headers()": sets up HTTP headers for DataStream chunked data transfers
    - @ref DataStreamUtil::ds_set_non_chunked_headers() "ds_set_non_chunked_headers()": sets up HTTP headers for DataStream non-chunked data transfers

    @section datastreamprotocol DataStream Protocol

    The DataStream protocol is based on HTTP 1.1 (RFC-2616) chunked transfers where each chunk contains UTF-8 encoded YAML-serialized data with optional compression and where each chunk is an independently decodable and parsable entity. This differs from standard HTTP chunked transfers, where content encoding and the semantic completeness of a message are defined over the entire message body.

    By using DataStream instead of standard HTTP chunked transfer, data can be streamed from one server to another and be usable immediately on receipt on the remote end.

    A DataStream transfer with streaming involves at least one chunked transfer; either the request or the reply must be sent with chunked transfer encoding to have the DataStream protocol applied. Non-chunked, monolithic requests and responses are also supported, but in these cases standard HTTP encoding and decoding rules are applied. A DataStream request-response pair without streaming is equivalent to a standard HTTP request-response pair with the addition of DataStream headers, which are ignored when no chunked transfers are made.

    DataStream runs over HTTP 1.1 (RFC-2616) and uses standard HTTP features with custom headers to identify the data serialization, character encoding, and content encoding applied to each chunk.

    DataStream is currently defined using UTF-8 encoded YAML for data serialization, but was designed to be extensible for future use with other data serialization methods through the use of appropriate headers.

    @note RFC 6648 deprecates the use of an \c "X-" prefix in non-standardized HTTP headers and trailers, therefore no such prefix exists for DataStream headers and trailers.

    @subsection datastreamprotocoldata DataStream Data Serialization

    DataStream uses UTF-8 encoded YAML for data serialization to allow for maximum data fidelity over the HTTP link.

    For non-chunked messages, the \c Content-Type header is \c "text/x-yaml;charset=utf8" (\c "utf8" is case-insensitive and may contain a hyphen before the 8), and the data is sent in a normal HTTP message body.
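
The content-type rule above (\c "utf8" is case-insensitive and may contain a hyphen) can be illustrated with a small language-neutral sketch in Python. The helper name \c is_datastream_yaml is hypothetical and not part of this module, and accepting a value without any \c charset parameter is an assumption made for the sketch.

```python
import re

# Hypothetical helper, not part of DataStreamUtil.
# Assumption: a bare "text/x-yaml" without a charset parameter is also accepted.
_YAML_CONTENT_TYPE = re.compile(
    r"^text/x-yaml(?:\s*;\s*charset\s*=\s*utf-?8)?$", re.IGNORECASE
)

def is_datastream_yaml(content_type: str) -> bool:
    """Return True if the header value names UTF-8 encoded YAML."""
    return _YAML_CONTENT_TYPE.match(content_type.strip()) is not None
```

A receiver could apply such a check to \c Content-Type for non-chunked messages and to \c DataStream-Content-Type for chunked ones.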
    @ref datastreamprotocolcompression "Data compression" is supported with normal HTTP content encoding as described in the previous link, applied to the serialized YAML data before sending, and the reverse operation is applied on receipt before YAML data deserialization to native data structures by the receiver.

    For chunked DataStream requests and responses, the \c Content-Type header is set to \c "application/octet-stream" (to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk) and the content-type of each atomic chunk is given by the \c DataStream-Content-Type header which should be set to \c "text/x-yaml;charset=utf8" (\c "utf8" is case-insensitive and may contain a hyphen before the 8). Data compression is supported and is applied to each chunk atomically as DataStream-specific content encoding after chunk data serialization; the reverse operation is applied before YAML data deserialization to native data structures by the receiver.

    @subsection datastreamprocotolchunkedrequests DataStream Chunked Request Headers

    The following headers are set with DataStream chunked requests:\n
    DataStream Chunked Request Headers
    <table>
      <tr><th>Header</th><th>Value</th><th>Description</th></tr>
      <tr><td>\c Content-Type</td><td>\c application/octet-stream</td><td>MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk</td></tr>
      <tr><td>\c Accept</td><td>text/x-yaml,application/octet-stream</td><td>other media types MAY be included, but at least the following MUST be included:\n - \c text/x-yaml: MUST be included in case a non-chunked response is returned\n - \c application/octet-stream: MUST be included in case of a DataStream chunked response</td></tr>
      <tr><td>[\c Accept-Encoding]</td><td>gzip,bzip2,deflate</td><td>optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c DataStream-Accept-Encoding header)</td></tr>
      <tr><td>\c DataStream-Content-Type</td><td>text/x-yaml;charset=utf8</td><td>MUST be present to identify the content type of each chunk as YAML-encoded data (\c "utf8" is case-insensitive and may contain a hyphen before the 8)</td></tr>
      <tr><td>\c DataStream-Accept</td><td>\c text/x-yaml</td><td>clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 \c "Not Acceptable" error must be returned</td></tr>
      <tr><td>[\c DataStream-Accept-Encoding]</td><td>gzip,bzip2,deflate</td><td>optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c Accept-Encoding header)</td></tr>
      <tr><td>[\c DataStream-Content-Encoding]</td><td>one of identity, bzip2, gzip, or deflate</td><td>this header is optional; it MUST be included if DataStream data compression is used in the request body. If present, this header MUST NOT contain more than one value</td></tr>
      <tr><td>\c Transfer-Encoding</td><td>\c chunked</td><td>MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding</td></tr>
    </table>
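
As a rough illustration of the chunked request header table above, the following language-neutral Python sketch assembles such a header set. The function \c chunked_request_headers and its parameters are hypothetical; in %Qore the module's own @ref DataStreamUtil::ds_set_chunked_headers() "ds_set_chunked_headers()" fills this role.

```python
# Hypothetical sketch; names are illustrative and not part of DataStreamUtil.
SUPPORTED_ENCODINGS = ("gzip", "bzip2", "deflate", "identity")

def chunked_request_headers(content_encoding=None,
                            accept_encoding="gzip,bzip2,deflate"):
    """Build the headers for a DataStream chunked request."""
    if content_encoding is not None and content_encoding not in SUPPORTED_ENCODINGS:
        raise ValueError("unsupported content encoding: %r" % content_encoding)
    headers = {
        # makes the chunked body opaque to plain HTTP processing
        "Content-Type": "application/octet-stream",
        "Accept": "text/x-yaml,application/octet-stream",
        # both Accept-Encoding headers carry the same value
        "Accept-Encoding": accept_encoding,
        "DataStream-Accept-Encoding": accept_encoding,
        "DataStream-Content-Type": "text/x-yaml;charset=utf8",
        "DataStream-Accept": "text/x-yaml",
        "Transfer-Encoding": "chunked",
    }
    # a single value only, and only when per-chunk compression is used
    if content_encoding is not None:
        headers["DataStream-Content-Encoding"] = content_encoding
    # Content-Encoding and Content-Length are deliberately never set here
    return headers
```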
    @note \c "Content-Encoding" and \c "Content-Length" headers MUST NOT be included in DataStream chunked transfers

    @subsection datastreamprocotolnonchunkedrequests DataStream Non-Chunked Request Headers

    The following headers are set with DataStream non-chunked requests:\n
    DataStream Non-Chunked Request Headers
    <table>
      <tr><th>Header</th><th>Value</th><th>Description</th></tr>
      <tr><td>\c Accept</td><td>text/x-yaml,application/octet-stream</td><td>other media types MAY be included, but at least the following MUST be included:\n - \c text/x-yaml: MUST be included in case a non-chunked response is returned\n - \c application/octet-stream: MUST be included in case of a DataStream chunked response</td></tr>
      <tr><td>[\c Accept-Encoding]</td><td>gzip,bzip2,deflate</td><td>optional header declaring the content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c DataStream-Accept-Encoding header)</td></tr>
      <tr><td>\c DataStream-Accept</td><td>\c text/x-yaml</td><td>clients MUST include this header with this value to indicate that the requestor can accept DataStream responses; the server MAY still reply with a non-chunked response; if a DataStream server receives a request without this header, then no DataStream reply can be returned; either a monolithic HTTP reply must be returned or a 406 \c "Not Acceptable" error must be returned</td></tr>
      <tr><td>[\c DataStream-Accept-Encoding]</td><td>gzip,bzip2,deflate</td><td>optional header declaring the DataStream content encoding methods supported by the sender (if present, clients MUST set this to the same value as the \c Accept-Encoding header)</td></tr>
      <tr><td>[\c Content-Type]</td><td>text/x-yaml;charset=utf8</td><td>MUST be included in requests with a message body; this reflects the content type of the body as YAML encoded data; MAY be included in requests without a message body in which case it MUST be ignored by the server (\c "utf8" is case-insensitive and may contain a hyphen before the 8)</td></tr>
      <tr><td>[\c Content-Encoding]</td><td>one of identity, bzip2, gzip, or deflate</td><td>this header is optional; it MUST be included if data compression is used in the request body</td></tr>
      <tr><td>[\c Content-Length]</td><td>number</td><td>This header is required in non-chunked requests with a message body</td></tr>
    </table>
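
For the non-chunked case, standard HTTP content encoding applies to the whole body, so \c Content-Length has to describe the body after compression. A minimal language-neutral Python sketch, assuming the YAML text is already serialized and using the hypothetical helper name \c non_chunked_request:

```python
import gzip

def non_chunked_request(yaml_text, compress=False):
    """Return (headers, body) for a DataStream non-chunked request.

    Hypothetical helper; yaml_text is assumed to be serialized YAML already.
    """
    body = yaml_text.encode("utf-8")
    headers = {
        "Accept": "text/x-yaml,application/octet-stream",
        "Accept-Encoding": "gzip,bzip2,deflate",
        "DataStream-Accept": "text/x-yaml",
        "DataStream-Accept-Encoding": "gzip,bzip2,deflate",
        "Content-Type": "text/x-yaml;charset=utf8",
    }
    if compress:
        # whole-body content encoding, as in plain HTTP
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # the length is computed last so that it covers the encoded body
    headers["Content-Length"] = str(len(body))
    return headers, body
```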
    @subsection datastreamprocotolchunkedresponses DataStream Chunked Response Headers

    The following headers are set with DataStream chunked responses:\n
    DataStream Chunked Response Headers
    <table>
      <tr><th>Header</th><th>Value</th><th>Description</th></tr>
      <tr><td>\c Content-Type</td><td>\c application/octet-stream</td><td>MUST be present to make the chunked data opaque to the standard HTTP protocol since the semantic completeness of the message body is not defined over the entire body but rather over each chunk</td></tr>
      <tr><td>\c DataStream-Content-Type</td><td>\c text/x-yaml;charset=utf8</td><td>MUST be present to identify the content type of each chunk as YAML-encoded data (\c "utf8" is case-insensitive and may contain a hyphen before the 8)</td></tr>
      <tr><td>[\c DataStream-Content-Encoding]</td><td>one of identity, bzip2, gzip, or deflate</td><td>this header is optional; it MUST be included if DataStream data compression is used in the response body. If present, this header MUST NOT contain more than one value</td></tr>
      <tr><td>\c Transfer-Encoding</td><td>\c chunked</td><td>MUST be included for HTTP chunked transfers: RFC-2616 3.6.1 Chunked Transfer Encoding</td></tr>
      <tr><td>\c Trailer</td><td>\c DataStream-Error</td><td>MUST be included as this trailer record will be sent after chunked data is transferred if an error occurs on the sending side, in which case the trailer will be assigned a string giving information about the error that occurred</td></tr>
    </table>
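
The chunked response rules above can be checked mechanically. The following language-neutral Python sketch is one way to do so; \c check_chunked_response is a hypothetical name, and the wording of the returned messages is purely illustrative.

```python
# Hypothetical validator; not part of DataStreamUtil.
ENCODINGS = {"identity", "bzip2", "gzip", "deflate"}

def check_chunked_response(headers):
    """Return a list of problems; an empty list means the headers conform."""
    # header names are case-insensitive, so normalize them first
    h = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    if h.get("content-type", "").lower() != "application/octet-stream":
        problems.append("Content-Type must be application/octet-stream")
    if "datastream-content-type" not in h:
        problems.append("DataStream-Content-Type is missing")
    if h.get("transfer-encoding", "").lower() != "chunked":
        problems.append("Transfer-Encoding must be chunked")
    trailer_names = [t.strip().lower() for t in h.get("trailer", "").split(",")]
    if "datastream-error" not in trailer_names:
        problems.append("Trailer must announce DataStream-Error")
    for name in ("content-encoding", "content-length"):
        if name in h:
            problems.append(name + " must not be present")
    enc = h.get("datastream-content-encoding")
    # a comma-separated list fails this test, which enforces the single-value rule
    if enc is not None and enc.lower() not in ENCODINGS:
        problems.append("DataStream-Content-Encoding must be one supported value")
    return problems
```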
    @note \c "Content-Encoding" and \c "Content-Length" headers MUST NOT be included in DataStream chunked transfers

    @subsection datastreamprocotolnonchunkedresponses DataStream Non-Chunked Response Headers

    The following headers are set with DataStream non-chunked responses:\n
    DataStream Non-Chunked Response Headers
    <table>
      <tr><th>Header</th><th>Value</th><th>Description</th></tr>
      <tr><td>\c Content-Type</td><td>text/x-yaml;charset=utf8</td><td>MUST be included in responses with a message body; this reflects the content type of the body as YAML encoded data; MAY be included in responses without a message body in which case it MUST be ignored by the client (\c "utf8" is case-insensitive and may contain a hyphen before the 8)</td></tr>
      <tr><td>\c Content-Length</td><td>number</td><td>MUST be included in non-chunked responses with a message body</td></tr>
      <tr><td>[\c Content-Encoding]</td><td>one of identity, bzip2, gzip, or deflate</td><td>this header is optional; it MUST be included if data compression is used in the response body</td></tr>
    </table>
    @subsection datastreamtrailers DataStream Trailers

    The following trailer may be sent with DataStream chunked responses after all data has been transferred:\n
    DataStream Chunked Response Trailers
    <table>
      <tr><th>Trailer</th><th>Description</th></tr>
      <tr><td>\c DataStream-Error</td><td>This trailer is sent when the chunked data transfer is complete if there were any errors on the sending side. If so, the value will be a string describing the error</td></tr>
    </table>
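
To show where the \c DataStream-Error trailer sits on the wire, the following language-neutral Python sketch splits a raw HTTP 1.1 chunked body (RFC-2616 3.6.1) into its chunks and trailers. \c parse_chunked is a hypothetical helper that assumes the complete body is already in memory; a real receiver would read incrementally from the socket.

```python
def parse_chunked(raw):
    """Split a complete chunked body (bytes) into (chunks, trailers)."""
    chunks, trailers, pos = [], {}, 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # the chunk size is hexadecimal; anything after ';' is a chunk extension
        size = int(raw[pos:eol].split(b";")[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # the zero-sized last chunk ends the data
        chunks.append(raw[pos:pos + size])
        pos += size + 2  # skip the chunk data and its trailing CRLF
    # trailer lines follow the last chunk, up to an empty line
    for line in raw[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip().lower()] = value.decode("utf-8").strip()
    return chunks, trailers
```

A receiver would treat the transfer as failed whenever the \c "datastream-error" key is present in the returned trailers, even though each chunk received before the error was individually usable.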
    @subsection datastreamprotocolcompression DataStream Compression

    Compression of chunked message bodies is supported by applying DataStream content encoding as specified by the \c DataStream-Content-Encoding header on each chunk individually (after YAML data serialization) before sending and then applying the reverse operation to each chunk immediately after reception and before YAML deserialization; this is analogous to standard HTTP content encoding (which is applied to the message body as a whole) but is applied to each chunk separately.

    Data compression is identified in a DataStream transfer by the following header:
    - \c "DataStream-Content-Encoding": set to one of \c "identity", \c "bzip2", \c "gzip", or \c "deflate" if data compression is used

    DataStream server implementations MUST support at least the above content encoding methods. This allows clients to include DataStream compression with the first request in case of streaming data to the server.

    DataStream clients claim support for these content encoding methods by including them in the \c Accept-Encoding and \c DataStream-Accept-Encoding headers in the request; both of these headers must contain the same values in client requests.
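
The per-chunk encoding described above can be sketched in a language-neutral way in Python. The names \c encode_chunks and \c decode_chunk are hypothetical, the input is assumed to be already-serialized YAML text, and mapping \c "deflate" to the zlib format is an assumption based on common HTTP usage.

```python
import bz2
import gzip
import zlib

# Hypothetical tables keyed by DataStream-Content-Encoding value.
# Assumption: "deflate" means the zlib format, as is usual for HTTP.
ENCODERS = {
    "identity": lambda data: data,
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "deflate": zlib.compress,
}
DECODERS = {
    "identity": lambda data: data,
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "deflate": zlib.decompress,
}

def encode_chunks(yaml_docs, encoding="identity"):
    """Encode each serialized YAML document as its own chunk."""
    encode = ENCODERS[encoding]
    return [encode(doc.encode("utf-8")) for doc in yaml_docs]

def decode_chunk(chunk, encoding="identity"):
    """Decode one chunk without needing any other chunk."""
    return DECODERS[encoding](chunk).decode("utf-8")
```

Because every chunk is compressed on its own, a receiver can decode and use chunk N before chunk N+1 arrives, which is the property that whole-body HTTP content encoding does not provide.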
    @see RFC-2616

    @subsection datastreamrequestexample Example DataStream Request

    @verbatim
PUT /api/system?action=dataStream HTTP/1.1
Accept: text/x-yaml,application/x-yaml,text/xml,application/xml,application/json,application/octet-stream
User-Agent: Qore-DataStreamClient/1.0.1
Content-Type: application/octet-stream
DataStream-Content-Type: text/x-yaml;charset=utf8
DataStream-Accept: text/x-yaml
DataStream-Accept-Encoding: gzip,bzip2,deflate
DataStream-Content-Encoding: gzip
Transfer-Encoding: chunked
Accept-Encoding: bzip2,deflate
Connection: Keep-Alive
Host: localhost:8001
    @endverbatim

    @subsection datastreamresponseexample Example DataStream Response

    @verbatim
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Datastream-Content-Type: text/x-yaml;charset=utf8
Datastream-Content-Encoding: bzip2
Trailer: DataStream-Error
Connection: Keep-Alive
Date: Sun, 20 Apr 2014 07:49:51 GMT
Server: Qorus-HTTP-Server/0.3.7
    @endverbatim

    @section datastreamutilrelnotes Release Notes

    @subsection datastreamutil_v1_0_1 DataStreamUtil v1.0.1
    - fixed a bug handling chunked non-DataStream messages (issue 1438)

    @subsection datastreamutil_v1_0 DataStreamUtil v1.0
    - initial release of the module
*/

//! the DataStreamUtil namespace contains all the public objects in the DataStreamUtil module
namespace DataStreamUtil {
    //! @defgroup DataStramHeaders Data Stream Headers
    /** These are the data stream header values for HTTP chunked transfers where each chunk is encoded separately
    */
    //@{
    //! HTTP header for the data stream Content-Type header equivalent where each HTTP chunk is encoded/decoded separately
    const DataStreamContentType = "DataStream-Content-Type";

    //! HTTP header for the data stream Content-Encoding header equivalent where each HTTP chunk is encoded/decoded separately
    const DataStreamContentEncoding = "DataStream-Content-Encoding";

    //! HTTP header for the data stream Accept header equivalent where each HTTP chunk is encoded/decoded separately
    const DataStreamAccept = "DataStream-Accept";

    //! HTTP header for the data stream Accept-Encoding header equivalent where each HTTP chunk is encoded/decoded separately
    const DataStreamAcceptEncoding = "DataStream-Accept-Encoding";

    //! HTTP trailer to be sent after chunked data has been transferred in case of an error on the sending side, giving a string describing the error
    const DataStreamError = "DataStream-Error";

    //! supported values for the DataStream-Accept-Encoding header
    const DataStreamContentEncodingHash = (
        "gzip": True,
        "bzip2": True,
        "deflate": True,
        "identity": True,
    );
    //@}

    const DataStreamDeserializeYaml = (
        "code": "yaml",
        "in": \parse_yaml(),
    );

    const DataStreamDeserializeXmlRpc = (
        "code": "xml",
        "in": \parse_xmlrpc_value(),
    );

    const DataStreamDeserializationSupport = (
        MimeTypeYamlRpc: DataStreamDeserializeYaml,
        MimeTypeYaml: DataStreamDeserializeYaml,
        MimeTypeJson: (
            "code": "json",
            "in": \parseJSON(),
        ),
        MimeTypeXml: DataStreamDeserializeXmlRpc,
        MimeTypeXmlApp: DataStreamDeserializeXmlRpc,
    );

    //! returns a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for decoding HTTP encoded data
    /** @par Example:
        @code
*code decode = ds_get_content_decode(hdr."content-encoding");
        @endcode

        The following \c "Content-Encoding" values are recognized:
        - \c "deflate", \c "x-deflate": returns a call reference to @ref Qore::uncompress_to_string() "uncompress_to_string()"
        - \c "gzip", \c "x-gzip": returns a call reference to @ref Qore::gunzip_to_string() "gunzip_to_string()"
        - \c "bzip2", \c "x-bzip2": returns a call reference to @ref Qore::bunzip2_to_string() "bunzip2_to_string()"
        - \c "identity", @ref nothing: returns @ref nothing

        @return a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for decoding HTTP encoded data

        @throw DESERIALIZATION-ERROR this exception is thrown if the content encoding is not recognized

        @note HTTP headers in requests are converted to lower-case only on receipt
    */
    *code ds_get_content_decode(*string ce);

    //! returns a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for encoding HTTP encoded data
    /** @par Example:
        @code
*code encode = ds_get_content_encode("bzip2");
        @endcode

        The following \c "Content-Encoding" values are recognized:
        - \c "deflate": returns a call reference to @ref Qore::compress() "compress()"
        - \c "gzip": returns a call reference to @ref Qore::gzip() "gzip()"
        - \c "bzip2": returns a call reference to @ref Qore::bzip2() "bzip2()"
        - \c "identity", @ref nothing: returns @ref nothing

        @return a @ref call_reference "call reference" (or @ref nothing) based on an optional \c "Content-Encoding" header value for encoding HTTP encoded data

        @throw SERIALIZATION-ERROR this exception is thrown if the content encoding is not recognized
    */
    *code ds_get_content_encode(*string ce);

    //! returns a @ref call_reference "call reference" useful for sending HTTP chunked data with %Qore methods taking send callbacks; data is encoded with YAML and optionally a @ref ds_get_content_encode "content encoding" @ref call_reference "call reference" or @ref closure "closure"
    /** @par Example:
        @code
httpclient.sendWithSendCallback(ds_get_send(scb, enc_func), method, path, hdr, timeout_ms, False, \info);
        @endcode

        @param scb the send data callback that should return data for sending
        @param enc_func an optional @ref call_reference "call reference" or @ref closure "closure" for performing content encoding after YAML serialization; see @ref ds_get_content_encode()

        @return a @ref call_reference "call reference" useful for sending HTTP chunked data with %Qore methods taking send callbacks; data is encoded with YAML and optionally a @ref ds_get_content_encode "content encoding" @ref call_reference "call reference" or @ref closure "closure"; the return value of this function is designed to be used as the send callback parameter \a scb in the following methods:
        - @ref Qore::HTTPClient::sendWithSendCallback() "HTTPClient::sendWithSendCallback()"
        - @ref Qore::HTTPClient::sendWithCallbacks() "HTTPClient::sendWithCallbacks()"
        - @ref Qore::Socket::sendHTTPMessageWithCallback() "Socket::sendHTTPMessageWithCallback()"
        - @ref Qore::Socket::sendHTTPResponseWithCallback() "Socket::sendHTTPResponseWithCallback()"

        @note if using content encoding, the appropriate \c "DataStream-Content-Encoding" header will be added by @ref ds_set_chunked_headers() if the \a content_encoding argument is used
    */
    code ds_get_send(code scb, *code enc_func);

    //! returns a @ref call_reference "call reference" useful for receiving HTTP chunked data with %Qore methods taking receive callbacks; YAML data is deserialized and passed as %Qore data to the given callback argument
    /** @par Example:
        @code
httpclient.sendWithRecvCallback(ds_get_recv(rcb, ecb), body, method, path, hdr, timeout_ms, False, \info);
        @endcode

        @param rcb the data callback; this is called once for each deserialized chunked transfer from the remote end with the deserialized data, and then once with @ref nothing when all data has been received
        @param ecb the "end of data" callback; this must accept a @ref string_or_nothing "*string" argument; this is called with no arguments once all data has been received if the sender does not report a send error, otherwise it's called with a single string giving the send error reported by the sending side in the \c DataStream-Error trailer record
        @param bcb an optional "body callback"; if this argument is passed, it must take a string argument which will be the content-decoded raw message body before any data deserialization and a second string argument giving the content-type of the body

        @return a @ref call_reference "call reference" useful for receiving HTTP chunked data with %Qore methods taking receive callbacks; when HTTP headers are received, the closure sets up content decoding by calling @ref ds_get_content_decode() on the \c "Content-Encoding" or \c "DataStream-Content-Encoding" header values; when raw chunked data is received, first the chunked body is decoded (if content encoding was indicated in the message headers), then YAML data is deserialized and passed as %Qore data to the given callback argument; the return value of this function is designed to be used as the receive callback parameter \a rcb in the following methods:
        - @ref Qore::HTTPClient::sendWithRecvCallback() "HTTPClient::sendWithRecvCallback()"
        - @ref Qore::HTTPClient::sendWithCallbacks() "HTTPClient::sendWithCallbacks()"
        - @ref Qore::Socket::readHTTPChunkedBodyBinaryWithCallback() "Socket::readHTTPChunkedBodyBinaryWithCallback()"
        - @ref Qore::Socket::readHTTPChunkedBodyWithCallback() "Socket::readHTTPChunkedBodyWithCallback()"

        @note the callback returned here can throw a \c "DESERIALIZATION-ERROR" if the header's \c "Content-Type" or \c "DataStream-Content-Type" is not \c "text/x-yaml" or the \c "Content-Encoding" or \c "DataStream-Content-Encoding" headers give unrecognized content encodings.
    */
    code ds_get_recv(code rcb, code ecb, *code bcb);

    //! sets up HTTP headers for DataStream chunked data transfers
    /** @par Example:
        @code{.py}
ds_set_chunked_headers(\hdr, "bzip2");
        @endcode

        @param hdr a reference to a hash of message headers when sending chunked DataStream data; the following headers are set:
        - \c "Content-Type: application/octet-stream" (set if header not already present in hash)
        - \c "DataStream-Content-Type: text/x-yaml;charset=utf8" (set if header not already present in hash)
        - \c "DataStream-Content-Encoding": set to the \c content_encoding argument, if present
        - \c "Transfer-Encoding: chunked"
        .
        For requests, the following headers are processed:
        - \c "Accept: text/x-yaml,application/octet-stream" (set if header not already present in hash)
        - \c "DataStream-Accept: text/x-yaml" (set if header not already present in hash)
        @param content_encoding an optional string giving the \c "DataStream-Content-Encoding" header to set (must be a content encoding recognized by @ref ds_get_content_encode())
        @param req set to @ref Qore::True "True" if the headers are required for a request
    */
    nothing ds_set_chunked_headers(reference hdr, *string content_encoding, *softbool req);

    //! sets up HTTP headers for DataStream non-chunked data transfers
    /** @par Example:
        @code{.py}
ds_set_non_chunked_headers(\hdr, "bzip2");
        @endcode

        @param hdr a reference to a hash of message headers when sending non-chunked DataStream data; the following headers are set:
        - \c "Content-Type: text/x-yaml" (set if header not already present in hash)
        .
        For requests, the following headers are processed:
        - \c "Accept: text/x-yaml,application/octet-stream" (set if header not already present in hash)
        - \c "DataStream-Accept: text/x-yaml" (set if header not already present in hash)
        @param content_encoding an optional string giving the \c "Content-Encoding" header to set (must be a content encoding recognized by @ref ds_get_content_encode())
        @param req set to @ref Qore::True "True" if the headers are required for a request
    */
    nothing ds_set_non_chunked_headers(reference hdr, *string content_encoding, *softbool req);

    //! returns the \c "DataStream-Accept-Encoding" header value corresponding to the \c "Accept-Encoding" header passed as an argument, or \c "identity" if none present
    /** @par Example:
        @code
string dae = ds_get_ds_accept_enc_header(hdr."Accept-Encoding");
        @endcode

        @param ae the value of the \c "Accept-Encoding" header or \c "identity" if not present

        @return the \c "DataStream-Accept-Encoding" value corresponding to the \c "Accept-Encoding" header passed as an argument, or @ref nothing if none should be sent
    */
    string ds_get_ds_accept_enc_header(*string ae);
};

// private namespace for non-exported definitions
namespace DataStreamUtilPrivate {
    // private function
    ds_do_request_headers(reference hdr);
};