TIP:            287
Title:          Add a Command for Determining Size of Buffered Data
Version:        $Revision: 1.12 $
Author:         Michael A. Cleverly
State:          Final
Type:           Project
Vote:           Done
Created:        26-Oct-2006
Post-History:
Keywords:       Tcl,channel,chan,pendinginput,pendingoutput
Tcl-Version:    8.5

~ Abstract

Many network servers programmed in Tcl (including the venerable tclhttpd) are vulnerable to DoS (denial of service) attacks because they lack any way to introspect the amount of buffered data on a non-blocking socket that is in line buffering mode. This TIP proposes a new '''chan''' subcommand that lets Tcl scripts inspect the amount of buffered input and, for symmetry, buffered output.

~ Rationale

Many network protocols are inherently line-oriented (HTTP, SMTP, etc.) and the natural approach to implementing servers for these protocols in Tcl is to configure the incoming client sockets for non-blocking I/O with ''line'' buffering and then define a readable fileevent callback.

| proc accept {sock addr port} {
|     fconfigure $sock -buffering line -blocking 0
|     fileevent $sock readable [list callback $sock ...]
| }
| socket -server accept $listenPort

Recall that a readable fileevent callback fires even when only an incomplete line is buffered. As the '''fileevent''' manual page states:

> A channel is considered to be readable if there is unread data available on the underlying device. A channel is also considered to be readable if there is unread data in an input buffer, except in the special case where the most recent attempt to read from the channel was a gets call that could not find a complete line in the input buffer.

The '''fblocked''' (and in 8.5 '''chan blocked''') command provides the Tcl programmer a means to test whether:

> the most recent input operation ... returned less information than requested because all available input was exhausted.
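To make the pattern concrete, the following is a minimal sketch of what the ''callback'' procedure registered above might look like. It is an illustration only, not code taken from tclhttpd or any other server; the ''handleLine'' helper and the ''::lines'' variable are hypothetical names introduced for this example.

| # Hypothetical sink for complete lines; a real server would parse them.
| set ::lines {}
| proc handleLine {sock line} {
|     lappend ::lines $line
| }
|
| # Readable-event handler for a non-blocking, line-buffered channel.
| proc callback {sock} {
|     if {[gets $sock line] >= 0} {
|         # A complete line was available.
|         handleLine $sock $line
|     } elseif {[eof $sock]} {
|         # The peer closed the connection.
|         close $sock
|     } elseif {[fblocked $sock]} {
|         # Only a partial line is buffered. The script can tell that
|         # it is waiting, but not how many bytes Tcl is holding.
|     }
| }

The last branch has nothing useful to do: the handler simply returns and waits for the next readable event.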
There is currently no way at the Tcl level to see how much data is buffered and could be read safely (via '''read''' instead of '''gets'''). There is also no way to specify any kind of upper limit on the length of a line; when in line-buffering mode all input is buffered until an end-of-line sequence is encountered or the EOF on the channel is reached.

The practical result is that any network daemon written in Tcl using line-oriented I/O ('''gets''') can be fed input lacking an end-of-line sequence until all physical memory is exhausted.

This vulnerability has been recognized since at least 2001. See, for example, the discussion between George Peter Staplin and Donald Porter on the ''gets'' page on the Tcl'ers Wiki [http://wiki.tcl.tk/gets].

~ Proposed Change

At the C level Tcl already has a function, ''Tcl_InputBuffered'', which returns the number of unread bytes buffered for a channel, and a corresponding ''Tcl_OutputBuffered'', which returns the number of bytes buffered for output that have not yet been flushed out.

This TIP proposes to implement a new ''chan pending'' command which will take two arguments: a ''mode'' and a ''channelId''. The mode argument can be either ''input'' or ''output''. When the mode is ''input'' the command returns the value of ''Tcl_InputBuffered()'' if the channel was opened for input, or -1 otherwise. When the mode is ''output'' the command returns the value of ''Tcl_OutputBuffered()'' if the channel was opened for output, or -1 otherwise.

This allows a programmer developing network daemons at the Tcl level to implement their own policy decisions based on the size of the unread line. Potential DoS situations can be avoided (in an application-specific manner) long before all memory is exhausted.

| if {[chan blocked $sock] && [chan pending input $sock] > $limit} {
|     # Take application-specific steps (e.g., [close $sock], or
|     # [read $sock] to process a partial line and drain the buffer)
| }

~ Rejected Alternatives

* Adding a flag to '''fblocked''' to return the number of unread bytes instead of just 0 or 1 (since '''fblocked''' is now considered deprecated as per [208]).

* Polluting the global namespace with a new '''favailable''', '''fpending''' or '''fqueued''' command.

* A '''chan unread''' because of potential confusion as to whether it performed ''ungetch()'' type functionality (''un-reed'' vs ''un-red'').

* Any sort of '''-maxchars''' or '''-maxbytes''' flag to '''gets''' in order to not complicate the semantics of '''gets'''. Additionally without even further complicating '''gets''' semantics one could not distinguish input of exactly $limit characters from the case where only $limit characters were returned (with some input remaining unread).

* The initial version of this TIP called for a ''chan available'' command. This was changed to ''pendinginput'' (and ''pendingoutput'' added for symmetry's sake) following suggestions on news:comp.lang.tcl from Donald Arseneau and Donal Fellows, and later to ''chan pending'' that takes a ''mode'' argument (''input'' or ''output'') based on suggestions from Donald Porter and Joe English.

~ Reference Implementation

[[RFE 1586860]] at SourceForge now contains a patch implementing '''chan pendinginput''' and '''chan pendingoutput''' (including updated '''chan''' man page and corresponding test cases) [http://sourceforge.net/tracker/index.php?func=detail&aid=1586860&group_id=10894&atid=360894].

~ Copyright

This document is in the public domain.
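~ Illustrative Example

The check shown under Proposed Change can be expanded into a complete readable-event handler. The following is only a sketch under stated assumptions: the procedure name ''limitedCallback'', its status words and the choice to pass ''limit'' as an argument are all hypothetical, and closing the channel is just one possible policy.

| # Returns a two-element list: a status word and the line (if any).
| # Status is one of: line, eof, overflow, wait (hypothetical names).
| proc limitedCallback {sock limit} {
|     if {[gets $sock line] >= 0} {
|         return [list line $line]
|     }
|     if {[eof $sock]} {
|         close $sock
|         return [list eof {}]
|     }
|     if {[chan blocked $sock] && [chan pending input $sock] > $limit} {
|         # The unfinished line already exceeds the limit: drop the peer.
|         close $sock
|         return [list overflow {}]
|     }
|     # A partial line within the limit: wait for the next readable event.
|     return [list wait {}]
| }

A server might register it with something like ''fileevent $sock readable [[list limitedCallback $sock 8192]'', where 8192 is an arbitrary example limit. '''fileevent''' ignores the return value; it is there so that the policy can be exercised directly from a test script.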