TIP #10000: Dummy Proposal for Testing Editing Interfaces


TIP:10000
Title:Dummy Proposal for Testing Editing Interfaces
Version:$Revision: 1.216 $
Authors: Don Porter <dgp at users dot sourceforge dot net>
Andreas Kupries <a dot kupries at westend dot com>
Richard Suchenwirth <richard dot suchenwirth at kst dot siemens dot de>
Kevin B KENNY <kennykb at acm dot org>
Jeff Hobbs <hobbs at users dot sourceforge dot net>
Vince Darley <vincentdarley at users dot sourceforge dot net>
Fabrice Pardo <Fabrice dot Pardo at l2m dot cnrs dot fr>
Joe Mistachkin <joe at mistachkin dot com>
Donal K. Fellows <donal dot k dot fellows at manchester dot ac dot uk>
Mark Janssen <mpc dot janssen at gmail dot com>
Reinhard Max <max at tclers dot tk>
Andreas Leitgeb <avl at logic dot at>
Andreas Kupries <akupries at shaw dot ca>
Francois Vogel <fvogelnew1 at free dot fr>
KEVIN KENNY <kevin dot b dot kenny at gmail dot com>
State:Draft
Type:Informative
Vote:Pending
Created:Sunday, 03 December 2000

Abstract

This TIP proposes to complete the separation between string and numeric comparison operations in [expr] and related commands ([for], [if], [while], etc.). It introduces new comparison operators ge, gt, le, and lt, (along with the corresponding commands in the ::tcl::mathop namespace), and to restrict the six operators ==, >=, >, <=, < and != to comparisons of numeric values.

Rationale

Tcl throughout its history has had comparison operators that freely compare numeric and string values. These operators behave as expected if both their arguments are numeric: they compare values on the real number line. Hence, 15 < 0x10 < 0b10001. Similarly, if presented with non-numeric strings, they compare the strings in lexicographic order, as a programmer might expect: "bambam" < "barney" < "betty" < "fred".

Trouble arises, however, when numeric and non-numeric strings are compared. The rule for comparison is that mixed-type comparisons like this are treated as string comparisons. The result is that < does not induce an order. There are inconsistent comparison results, rendering < and friends worthless for sorting. 0x10 < 0y < 1 < 0x10.

The problems with this inconsistency prompted changes in May of 2000, introducing eq and ne operators that always perform string comparison. For whatever reason, the four inequality operations never followed. This leads to pitfalls for the unwary. It's fairly well entrenched in the Tcl folklore that comparisons other than eq and ne should be reserved for numeric arguments only, and experienced Tcl programmers know to write:

 if {[string compare $x $y] < 0} { ... }

in place of

 if {$x < $y} { ... }

Proposal

Two things are proposed.

  1. [8.x] Four new bareword operators, ge, gt, le and lt shall be added to the expression parser and to the ::tcl::mathop command set. They will have precedence identical to the existing operators >=, >, <= and <. They will accept string values, and return 0 or 1 according to lexicographic string comparison of their operators. This change is entirely backward compatible (it uses syntax that would previously have been erroneous), and should go in as soon as possible - no later than the next point release, but ideally even in a patchlevel - so that programmers can begin conversion as soon as possible. Use of the ==, >=, >, <=, <, and != for comparing non-numeric values shall immediately be deprecated.

  2. [9.0] Passing of non-numeric values to the ==, >=, >, <=, <, and != operators (or to their tcl::mathop equivalents) shall be forbidden and result in an error being thrown.

Discussion

Forcing numeric comparisons in today's Tcl

Programmers who wish to prepare for the change, once the four new operators are in place, can adapt places in their code where they wish to force numeric comparisons by replacing expressions of the form:

 if {$x < $y} { ... }

with

 if {+$x < +$y} { ... }

The second comparison will have the effect of forcing both operands to be numeric, and the existing comparison code will then provide the correct semantics.

Deprecation and compatibility

It may be possible to introduce some sort of per-interpreter or per-namespace option to control the behaviour of numeric comparisons when evaluated in the given interpreter or namespace. The author of this TIP has not investigated how such an option might be implemented, and encourages those who propose it to do so. Until and unless a concrete implementation plan emerges from their investigation, the plan is to leave a "backward compatibility" setting out of scope.

In addition, at least one reader of this TIP has requested a setting whereby a warning can be delivered that functionality is deprecated. Since in the past, we have not been able to identify and standardize a mechanism whereby such warnings could be delivered, this functionality is also considered to be out of scope.

Rejected alternatives

One possible alternative to excluding non-numeric arguments from the comparison operators is to change their semantics so that all non-numeric strings are greater than all numbers. This change would at least yield a consistent ordering. The ordering that it yields would, however, be somewhat surprising, and not terribly useful. (It would at least be compatible with today's scheme for numeric comparisons.)

Objections (and rebuttals)

In out-of-band discussions, several objections were raised. This section attempts to address them.

  1. Tcl's expression parser has a hard limit of 64 different binary operators. This proposal consumes four of them, leaving only 28. There is a concern that this is a less-than-effective use of a limited resource.

The limit is self-imposed, in an effort to make the nodes of an expression parse tree fit in exactly 16 bytes (or four int's). It is far from obvious that this pretty size is actually useful. Few expressions are more than a few dozen parse nodes, and typical expressions are not parsed multiple times. It appears that neither the speed of the parse nor the size of the tree will be critical issues in most applications. In any case, we still have nearly half the operators left.

  1. There is some concern that using barewords for operators was a bad idea in the first place. The fact that

                   expr {"foo"}
    

and

               set x foo; expr {$x}

both work, while

               expr {foo}

is an invalid bareword is arguably surprising.

Nevertheless, we have committed to the approach with the 'eq', 'ne', 'in' and 'ni' operators. These are unlikely to go away. Adding 'lt', 'le', 'gt' and 'ge' will make this problem no better nor worse.

Moreover, the language of [expr] is not the same as Tcl. It does not strip comments, parse into words, and apply Tcl's precise substitution rules - and it would be surprising if it did! There are other 'little languages' throughout Tcl - regular expressions, glob patterns, assembly code, and so on. [expr] is one among many.

  1. There is concern that [expr], which was originally intended almost exclusively for numeric calculations, is being abused with string arguments and possibly string results.

The author of this TIP contends that we introduced string values to [expr] a long time ago, certainly by the time that the eq, ne, in and ni operations were introduced. It is true that the use of numeric conversions in [expr] is incoherent, as seen in:

               % proc tcl::mathfunc::cat {args} { join $args {} }
               % expr {cat(0x1,0x2,"a")}
               0x10x2a
               % expr {cat(0x1)}
               1

(Bug [e7c21ed678] is another manifestation of this general problem.) Once again, adding additional string operations that behave, with respect to data types, exactly the same as ones that are already there will neither fix nor exacerbate the general problem.

  1. Because [expr] has no interpreted form, the operations must have bytecode representations. The space of available bytecodes is under even more pressure than the space of available operators, and must not be squandered on operations that are duplicative of already-available functionality such as [string compare]

The obvious rebuttal is that [string compare] is already bytecoded. There are no new operations required, merely a compiler that is smart enough to emit a short codeburst rather than a single bytecode. As an example, the code for the expression

               {$x lt $y}

could be:

    (0) loadScalar1 %v0        # var "x"
    (2) loadScalar1 %v1        # var "y"
    (4) strcmp 
    (5) push1 0        # "0"
    (7) lt 

For the other string operators, only the last bytecode in the burst would change. No new bytecode operations are needed. In fact, this codeburst is identical code to that generated for

               {[string compare $x $y] < 0}

Powered by Tcl[Index] [History] [HTML Format] [Source Format] [LaTeX Format] [Text Format] [XML Format] [*roff Format (experimental)] [RTF Format (experimental)]

TIP AutoGenerator - written by Donal K. Fellows