TIP #400: SETTING THE COMPRESSION DICTIONARY AND OTHER 'ZLIB' UPDATES
=======================================================================
 Version:      $Revision: 1.7 $
 Author:       Donal K. Fellows
 State:        Final
 Type:         Project
 Tcl-Version:  8.6
 Vote:         Done
 Created:      Friday, 30 March 2012
 URL:          https://tip.tcl-lang.org/400.html
 Post-History:

-------------------------------------------------------------------------

 ABSTRACT
==========

Sometimes it is necessary to set the compression dictionary so that a
sequence of bytes may be compressed more efficiently (and decompressed as
well). This TIP exposes that functionality. It also reduces the number of
inconsistencies in the *zlib* command.

 RATIONALE
===========

The SPDY protocol extensions to HTTP require the seeding of the zlib
compression dictionary (which greatly improves the performance of
compression on small amounts of data, such as HTTP headers). In order to
allow a pure Tcl implementation of the SPDY protocol, it is therefore
necessary to provide a mechanism whereby the compression dictionary (a
byte-array, normally up to 32kB long according to the zlib documentation)
can be set. There is to be no mechanism for retrieving the compression
dictionary generated by the compression engine; there is no API for doing
that.

A side issue discovered while working on this TIP was that there was
considerable variation in what could be achieved by various parts of the
API. In particular, some features were accessible from the "simplified"
parts of the API but could not be controlled from the "advanced" parts
(e.g., there was no way to set the GZIP header descriptor with *zlib
stream gzip*).
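
The use case described in the rationale can be sketched as follows. This
is a minimal illustration only, assuming a Tcl 8.6 interpreter in which
the *-dictionary* option proposed in this TIP is available on *zlib
stream compress* and *zlib stream decompress*. The dictionary and payload
strings are hypothetical sample data, not the dictionary defined by SPDY.

```tcl
# Hypothetical sample data: a preset dictionary holding strings that are
# likely to recur, and a short header block to compress.
set dictBytes [encoding convertto utf-8 \
        "content-type: text/html\r\naccept-encoding: gzip\r\n"]
set payload [encoding convertto utf-8 \
        "accept-encoding: gzip\r\ncontent-type: text/html\r\n"]

# Baseline: zlib-format compression with no preset dictionary.
set plain [zlib compress $payload]

# Compression with the dictionary seeded through the proposed option.
set zs [zlib stream compress -dictionary $dictBytes]
$zs put -finalize $payload
set seeded [$zs get]
$zs close

# Decompression must be given the same dictionary.
set zs [zlib stream decompress -dictionary $dictBytes]
$zs put $seeded
set roundTrip [$zs get]
$zs close

puts "without dictionary: [string length $plain] bytes"
puts "with dictionary:    [string length $seeded] bytes"
puts "round trip intact:  [expr {$roundTrip eq $payload}]"
```

Because nearly every byte of the payload also appears in the dictionary,
the seeded output is expected to be noticeably shorter than the baseline;
the exact sizes depend on the zlib build in use.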

 PROPOSED CHANGES: TCL
=======================

 CHANGES TO THE CHANNEL TRANSFORMS
-----------------------------------

The *zlib push* command will gain two extra options, *-dictionary* and
*-limit*:

 *-dictionary* /bytes/

     This option will provide a compression dictionary to be used.
     /bytes/ is a byte-array used to initialize the compression engine;
     it will be supplied to the zlib engine at the correct moment during
     compression, or provided when the engine requests it during
     decompression. The /bytes/ argument must be non-empty if given (we
     will not enforce a limit on the length of the dictionary, but using
     an excessively long one may cause the zlib engine to issue errors).
     This option will be illegal to use with *gzip* and *gunzip* streams,
     and its use with raw (*deflate*) streams will not be recommended due
     to the difficulty of detecting whether a compression dictionary was
     applied; the zlib-format header adds very little overhead. This
     value can also be set with *chan configure*, though doing so after
     data has started to be pushed through the compression engine (except
     if an error requesting a compression dictionary was received) is not
     recommended.

 *-limit* /size/

     This option (valid on the three decompressing transforms only, and
     where /size/ must be a positive integer of no more than 0x10000)
     allows for control over the size of chunks read from the underlying
     channel for feeding into the decompression engine. Its default is 1,
     which makes for the correct behavior under the widest range of
     conditions, but at a significant cost in performance: when the
     underlying data source is known to never block for long and to have
     complete data, a larger value can be used, which will greatly
     improve performance. This value can be set at runtime using *chan
     configure*.

 CHANGES TO THE STREAMS
------------------------

The *zlib stream* command will also gain some complexity.
In particular, the *compress*, *decompress*, *deflate* and *inflate*
subcommands will gain the ability to take an extra *-dictionary* /bytes/
option/value pair (same interpretation as above), as will the *add* and
*put* subcommands of the stream instance command.

In addition (as a correction to the functionality originally proposed in
[TIP #234]) the *zlib stream gzip* subcommand will also gain the ability
to take:

     *-header* /dict/

(where /dict/ is a Tcl dictionary such as is passed to the *-header*
option to *zlib gzip* and not a compression dictionary), and the stream
instance command will gain a *header* subcommand to retrieve the gzip
header (it will be an error to use it on a stream not produced by *zlib
stream gunzip*).

In order to facilitate the above change, the compression level used by
*zlib stream gzip* will instead be specified via an option:

     *-level* /compressionLevel/

 PROPOSED CHANGE: C
====================

At the C level, one additional function will be provided:

     void *Tcl_ZlibStreamSetCompressionDictionary*(Tcl_ZlibStream
     /zshandle/, Tcl_Obj */compressionDictionaryObj/)

This sets the compression dictionary for a particular stream to the given
(byte-array) Tcl_Obj, which will be duplicated. It is the caller's
responsibility to dispose of the object passed in if they allocated it;
they may do so immediately after calling this function.

 COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------

 TIP AutoGenerator - written by Donal K. Fellows