TIP #287: Add a Command for Determining Size of Buffered Data

TIP:	287
Title:	Add a Command for Determining Size of Buffered Data
Version:	$Revision: 1.12 $
Author:	Michael A. Cleverly <michael at cleverly dot com>
State:	Final
Type:	Project
Tcl-Version:	8.5
Vote:	Done
Created:	Thursday, 26 October 2006
Keywords:	Tcl, channel, chan, pendinginput, pendingoutput

Abstract

Many network servers programmed in Tcl (including the venerable tclhttpd) are vulnerable to DoS (denial of service) attacks because they lack any way to introspect the amount of buffered data on a non-blocking socket that is in line buffering mode. This TIP proposes a new subcommand to chan to allow the amount of buffered input and buffered output (for symmetry) to be inspected from Tcl.

Rationale

Many network protocols are inherently line-oriented (HTTP, SMTP, etc.). The natural approach to implementing servers for these protocols in Tcl is to configure each incoming client socket for non-blocking I/O with line buffering and then register a readable fileevent callback.

    proc accept {sock addr port} {
        fconfigure $sock -buffering line -blocking 0
        fileevent $sock readable [list callback $sock ...]
    }
    socket -server accept $listenPort

Recall that a readable fileevent will be called even when there is an incomplete line buffered. As the fileevent manual page states:

A channel is considered to be readable if there is unread data available on the underlying device. A channel is also considered to be readable if there is unread data in an input buffer, except in the special case where the most recent attempt to read from the channel was a gets call that could not find a complete line in the input buffer.

The fblocked (and in 8.5 chan blocked) command provides the Tcl programmer a means to test whether:

the most recent input operation ... returned less information than requested because all available input was exhausted.
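
The interaction between gets and fblocked on a partial line can be reproduced without a network connection. The following is a minimal sketch, assuming Tcl 8.6 or later for chan pipe (a command that postdates this TIP and is used here only as a stand-in for a client socket); the request text is purely illustrative.

```tcl
# Assumption: [chan pipe] (Tcl 8.6+) stands in for a client socket.
lassign [chan pipe] rd wr
fconfigure $rd -blocking 0 -buffering line
fconfigure $wr -buffering none          ;# hand each write to the OS at once

# The "client" sends a line fragment with no end-of-line sequence.
puts -nonewline $wr "GET /index.html"
set count   [gets $rd line]             ;# -1: no complete line yet
set blocked [fblocked $rd]              ;# 1: input exhausted mid-line

# Once the rest of the line arrives, gets succeeds.
puts -nonewline $wr " HTTP/1.0\n"
set count2   [gets $rd line2]
set blocked2 [fblocked $rd]             ;# 0 after a successful gets

puts [list $count $blocked $line2 $blocked2]
close $wr
close $rd
```

Between the two writes the fragment sits in Tcl's input buffer, and the script has no way to ask how large that fragment is.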

There is currently no way at the Tcl level to see how much data is buffered and could be read safely (via read instead of gets).

There is also no way to specify any kind of upper limit on the length of a line; when in line-buffering mode all input is buffered until an end-of-line sequence is encountered or the EOF on the channel is reached.

The practical result is that all network daemons written in Tcl using line-oriented I/O (gets) can be fed repeated input lacking an end-of-line sequence until all physical memory is exhausted.

This vulnerability has been recognized since at least 2001. See, for example, the discussion between George Peter Staplin and Donald Porter on the gets page on the Tcl'ers Wiki [1].

Proposed Change

At the C level, Tcl already has a function, Tcl_InputBuffered, which returns the number of unread bytes buffered for a channel, and a corresponding function, Tcl_OutputBuffered, which returns the number of bytes buffered for output that have not yet been flushed.

This TIP proposes to implement a new chan pending command which will take two arguments: a mode and a channelId. The mode argument can be either input or output.

When the mode is input, the command returns the value of Tcl_InputBuffered() if the channel is open for input, or -1 otherwise.

When the mode is output, the command returns the value of Tcl_OutputBuffered() if the channel is open for output, or -1 otherwise.
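
The return values described above can be observed with a short script. This is a sketch under assumptions rather than part of the reference implementation: it relies on chan pipe from Tcl 8.6 or later to obtain a read-only and a write-only channel, and the three-byte payload is arbitrary.

```tcl
# Assumption: [chan pipe] (Tcl 8.6+) supplies one read-only and one
# write-only channel so that both directions can be inspected.
lassign [chan pipe] rd wr
chan configure $rd -blocking 0 -buffering line
chan configure $wr -buffering full

# With full buffering the data stays in Tcl's output buffer until flushed.
puts -nonewline $wr "abc"
set outBefore [chan pending output $wr]   ;# bytes waiting to be flushed
flush $wr
set outAfter  [chan pending output $wr]   ;# nothing left after the flush

# A failed gets leaves the unterminated fragment in the input buffer.
gets $rd line
set inPending [chan pending input $rd]    ;# bytes buffered but unread

# Asking about a direction the channel is not open for yields -1.
set wrongIn  [chan pending input  $wr]
set wrongOut [chan pending output $rd]

puts [list $outBefore $outAfter $inPending $wrongIn $wrongOut]
close $wr
close $rd
```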

This allows a programmer developing network daemons at the Tcl level to implement their own policy decisions based on the size of the unread line. Potential DoS situations can then be avoided, in an application-specific manner, long before all memory is exhausted.

    if {[chan blocked $sock] && [chan pending input $sock] > $limit} {
        # Take application-specific steps (e.g., [close $sock], or
        # [read $sock] to process a partial line and drain the buffer).
    }
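
The check above might be folded into a complete readable callback along the following lines. This is a hedged sketch, not the method used by tclhttpd or any other server: readLine, handleRequest and the 8192-byte limit are hypothetical names and values chosen for illustration, and closing the connection is only one of the possible policies.

```tcl
# Hypothetical helper: read one line from $sock, refusing to buffer more
# than $limit bytes of an unterminated line. $handler is a command prefix
# that receives the channel and the completed line.
proc readLine {sock limit handler} {
    if {[chan gets $sock line] >= 0} {
        # A complete line arrived; pass it on.
        {*}$handler $sock $line
    } elseif {[chan eof $sock]} {
        chan close $sock
    } elseif {[chan blocked $sock] && [chan pending input $sock] > $limit} {
        # Policy chosen for this sketch: drop the connection.
        chan close $sock
    }
}

# Wiring it into a server (handleRequest and the limit are placeholders).
proc accept {sock addr port} {
    chan configure $sock -buffering line -blocking 0
    chan event $sock readable [list readLine $sock 8192 handleRequest]
}
```

The limit is only examined when the callback runs, so the buffer can exceed it by however much data arrived before that point; it bounds growth rather than enforcing an exact maximum line length.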

Rejected Alternatives

Adding a flag to fblocked to return the number of unread bytes instead of just 0 or 1 (since fblocked is now considered deprecated as per TIP #208).
Polluting the global namespace with a new favailable, fpending or fqueued command.
A chan unread because of potential confusion as to whether it performed ungetch() type functionality (un-reed vs un-red).
Any sort of -maxchars or -maxbytes flag to gets, in order not to complicate the semantics of gets. Additionally, without complicating gets semantics even further, one could not distinguish input of exactly $limit characters from the case where only $limit characters were returned (with some input remaining unread).
The initial version of this TIP called for a chan available command. This was changed to pendinginput (and pendingoutput added for symmetry's sake) following suggestions on news:comp.lang.tcl from Donald Arseneau and Donal Fellows, and later to chan pending that takes a mode argument (input or output) based on suggestions from Donald Porter and Joe English.

Reference Implementation

[RFE 1586860] at SourceForge now contains a patch implementing chan pendinginput and chan pendingoutput (including updated chan man page and corresponding test cases) [2].

Copyright

This document is in the public domain.
