TIP #90: ENABLE [RETURN -CODE] IN CONTROL STRUCTURE PROCS
===========================================================
Version:      $Revision: 1.39 $
Author:       Don Porter
              Donal K. Fellows
State:        Final
Type:         Project
Tcl-Version:  8.5
Vote:         Done
Created:      Friday, 15 March 2002
URL:          https://tip.tcl-lang.org/90.html
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

This TIP analyzes existing limitations on the coding of control
structure commands as /proc/s, and presents expanded forms of /catch/
and /return/ to remove those limitations.

BACKGROUND
============

It is a distinguishing feature of Tcl that everything is a command,
including control structure functionality that in many other languages
is part of the language itself, such as /if/, /for/, and /switch/. The
command interface of Tcl, including both a return code and a result,
allows extensions to create their own control structure commands.

Control structure commands have the feature that one or more of their
arguments is a script, often called a /body/, meant to be evaluated in
the caller's context. The control structure command exists to control
whether, when, in what context, or how many times that script is
evaluated. When the body is evaluated, however, it is intended to
behave as if it were interpreted directly in the place of the control
structure command.

The built-in commands of Tcl provide the ability for scripts themselves
to define new commands. Notably, the /proc/ command makes this
possible. In addition, other commands such as /catch/, /return/,
/uplevel/, and /upvar/ offer enough control and access to the caller's
context that it is possible to create new control structure commands
for Tcl, entirely at the script level.

Almost.

There is one limitation that separates control structure commands
created by /proc/ from those created in C by a direct call to
/Tcl_Create(Obj)Command/.
It is most easily seen in the following example that compares the
built-in command /while/ to the command /control::do/ created by /proc/
in the control package of tcllib.

    % package require control
    % proc a {} {while 1 {return -code error}}
    % proc b {} {control::do {return -code error} while 1}
    % catch a
    1
    % catch b
    0

The control structure command /control::do/ fails to evaluate /return
-code error/ in such a way that it acts the same as if /return -code
error/ was evaluated directly within proc /b/.

ANALYSIS
==========

There are two deficiencies in Tcl's built-in commands that lead to this
incapacity in control structure commands defined by /proc/. First,
/catch/ is not able to capture the information. Consider:

    % set code [catch {
          return -code error -errorinfo foo -errorcode bar baz
      } message]

After evaluation, /code/ contains "2" (/TCL_RETURN/), and /message/
contains "baz", but the other values are locked away in internal fields
of the /Tcl_Interp/ structure as /interp->returnCode/,
/interp->errorCode/, and /interp->errorInfo/. The "-errorcode" and
"-errorinfo" values will be copied to the global variables
"::errorCode" and "::errorInfo", respectively, but there will be no way
at the script level to get at the /interp->returnCode/ value which was
the value of the original "-code" option.

Second, even if the information were available, there is no built-in
command in Tcl that can be evaluated within the body of a proc to make
the proc itself act as if it were the command /return -code/. Stated
another way, it is not possible to create a command with /proc/ that
behaves exactly the same as /return -code/. Because of that, it is also
not possible to create a command with /proc/ that behaves exactly the
same as /while/, /if/, etc. - any command that evaluates any of its
arguments as a script in the caller's context.

This is a curious, and likely unintentional, limitation. Tcl goes to
great lengths to be sure I can create my own /break/ replacement with
/proc/.

    proc myBreak {} {return -code break}

It would be a welcome completion of Tcl's set of built-in commands to
be able to create a replacement for every one of them using /proc/.

SPECIFICATION
===============

The /return/ command shall have syntax:

    return ?option value ...? ?result?

There can be any number of /option value/ pairs, and any value at all
is acceptable for an /option/ argument. The legal values of a /value/
argument are limited for some /option/s, as follows:

 * The /value/ after a "-code" must be either an integer (32-bit only),
   or one of the strings, "ok", "error", "return", "break", or
   "continue", just as in the 8.4 spec for /return/. The default
   /value/ for the "-code" option is "0".

 * The /value/ after a "-level" must be a non-negative integer. The
   default /value/ for the "-level" option is "1".

 * The /value/ after a "-options" must be a dictionary ([TIP #111]).
   The default /value/ for the "-options" option is an empty
   dictionary.

The keys and values in the dictionary /value/ of the "-options" option
are pulled out and treated as additional /option value/ arguments to
the /return/ command. Note that this "-options" option for option
expansion is offered only because Tcl itself has no syntax for argument
expansion, as observed many, many times before (for example, [TIP
#103]).

The /result/ argument, if any, is stored in the interp as the result of
the /return/ command. In default operation, this becomes the result of
the procedure in which the /return/ command is evaluated.

The return code of the /return/ command is determined by the /value/s
of the "-code" and "-level" options. If the /value/ of the "-level"
option is non-zero, then the return code of /return/ is TCL_RETURN. If
the /value/ of the "-level" option is "0", then the return code of
/return/ is the /value/ of the "-code" option, translated from string,
as needed.
In this way,

    return -level 0 -code break

is a synonym for

    break

while

    return -code break

spelled out with defaults filled in as:

    return -level 1 -code break

continues to function as before, causing the procedure in which the
/return/ is evaluated to return the TCL_BREAK return code.

All /option value/ arguments to /return/ are stored in a return options
dictionary kept in the interp, just as the /result/ argument gets
stored in the result of the interp.

The TclUpdateReturnInfo() function is modified, so that each level of
procedure returning decrements the value of the "-level" key in the
return options dictionary. When the value of the "-level" key reaches
"0", the return code from the current procedure will be the value of
the "-code" key in the return options dictionary. Otherwise, the return
code of the current procedure will be TCL_RETURN. In this way,

    return -level 2 -code ok

is equivalent to

    return -code return

and should (absent some intervening /catch/) cause a normal return to
the caller's caller. Likewise,

    return -level 3 -code ok

would cause a normal return to the caller's caller's caller (again
absent an intervening /catch/), something that can't currently be
accomplished.

The /catch/ command shall have syntax:

    catch script ?resultVar? ?optionsVar?

The new argument /optionsVar/, if present, will be the name of a
variable in which a dictionary of return options should be stored. The
return options stored in that dictionary are exactly those needed so
that the evaluation of

    catch $script result options
    return -options $options $result

is completely indistinguishable (except for the existence and values of
variables "result" and "options") from the direct evaluation of
/$script/ by the interpreter. In particular, any values of the
"::errorCode" and "::errorInfo" variables are the same as if there were
never a /catch/ in the first place.
In addition, when the result of /catch/ is TCL_ERROR, the value in the
/errorLine/ field of the /Interp/ struct will be stored as the value of
the "-errorline" key in the return options dictionary.

This specification may seem a bit complex, but it makes possible very
simple solutions to the problems posed above.

EXAMPLES
==========

First let's revisit the analysis:

    % set code [catch {
          return -code error -errorinfo foo -errorcode bar baz
      } message options]

After evaluation, /code/ contains "2" (/TCL_RETURN/), /message/
contains "baz", and now /options/ contains:

    -errorcode bar -errorinfo foo -code 1 -level 1

So, the /options/ variable now contains the information that was
previously inaccessible. We can now

    return -options $options $message

to get the same results as if the /catch/ had never been there in the
first place.

In 8.4 Tcl, it is not possible to implement a replacement for the
/return/ command as a proc. After this proposal, such a replacement is:

    proc myReturn args {
        set result ""
        if {[llength $args] % 2} {
            set result [lindex $args end]
            set args [lrange $args 0 end-1]
        }
        set options [eval [list dict create -level 1] $args]
        dict incr options -level
        return -options $options $result
    }

In every way /myReturn/ should be equivalent to /return/.

The new ability to exactly reproduce stack traces makes a /catch/ of
large scripts more attractive. Consider, for example, a procedure that
allocates some resource, then performs operations, and finally frees
the resource before returning. In order to be sure the resource is
freed, we must /catch/ any errors that might cause the procedure to
return before the freeing of the resource.
The solution looks like:

    proc doSomething {} {
        set resource [allocate]
        catch {
            # Arbitrarily long script of operations
        } result options
        deallocate $resource
        return -options $options $result
    }

With that structure, we are confident the resource is always freed, but
any error or exception will be presented to the caller exactly as if it
had never been caught in the first place.

Here are two examples of how to use the new features in a control
structure proc. The essence of a control structure command is its
ability to evaluate a script in the caller's context, preserving the
illusion that no additional stack frame was ever used. So, a proc
replacement for /eval/ illustrates the technique.

The first approach assumes one knows the internal details of how the
/uplevel/ command adds to the stack trace. This is straightforward, but
will require a rewrite if /uplevel/ ever changes how it manipulates the
stack trace.

    proc myEval script {
        if {[catch {uplevel 1 $script} result options] == 1} {
            set stack [dict get $options -errorinfo]
            regsub {\s+invoked from within\s+"uplevel 1 \$script"$} $stack {} stack
            regsub {\("uplevel" body line (\d+)\)$} $stack [subst -nobackslashes \
                    {("[lindex [info level 0] 0]" body line \1)}] stack
            dict set options -errorinfo $stack
        }
        dict incr options -level
        return -options $options $result
    }

A second, more robust solution is possible, but requires a bit more
context gymnastics.

    namespace eval control {
        proc eval script {
            variable result
            variable options
            set code [uplevel 1 \
                    [list ::catch $script [namespace which -variable result] \
                            [namespace which -variable options]]]
            if {$code == 1} {
                set line [dict get $options -errorline]
                dict append options -errorinfo \
                        "\n    (\"[lindex [info level 0] 0]\" body line $line)"
            }
            dict incr options -level
            return -options $options $result
        }
    }

Note that in the second solution we did not have to strip away the
contributions of /uplevel/ to the stack trace, because we captured the
stack trace before /uplevel/ added anything. Then we could add our own
information (drawing in part on the new "-errorline" value available to
us now at the script level).

We confirm that either approach solves the original problem:

    % proc a {} {eval {return -code error}}
    % proc b {} {myEval {return -code error}}
    % proc c {} {control::eval {return -code error}}
    % catch a
    1
    % catch b
    1
    % catch c
    1

Finally, the new features make possible a utility command that can be
of use to people making simple control structure commands, or doing
simple wrapping, where there is no need to augment the stack trace, or
to treat any return codes in a special way:

    namespace eval control {
        proc ascaller script {
            if {[info level] < 2} {
                return -code error \
                        "[lindex [info level 0] 0] called outside a proc"
            }
            variable result
            variable options
            set code [uplevel 2 \
                    [list ::catch $script [namespace which -variable result] \
                            [namespace which -variable options]]]
            if {$code == 0} {
                return $result
            }
            dict incr options -level 2
            return -options $options $result
        }
    }

Within a proc, /ascaller $script/ will take care of all aspects of
evaluating /$script/ in the caller context, and exiting as appropriate
for all non-TCL_OK return codes.
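As one more illustration of the same technique, here is a minimal
sketch of a loop command written as a proc. The name /repeat/ and its
argument order are hypothetical, chosen only for this example; it is
not part of tcllib or of this proposal. It assumes the three-argument
/catch/ and the "-level" and "-options" options to /return/ specified
above, and it makes no attempt to tidy the contribution of /uplevel/ to
the stack trace.

```tcl
# Hypothetical control structure: evaluate body in the caller's
# context count times, honoring break and continue like a built-in loop.
proc repeat {count body} {
    for {set i 0} {$i < $count} {incr i} {
        set code [catch {uplevel 1 $body} result options]
        if {$code == 0 || $code == 4} {
            # TCL_OK or TCL_CONTINUE: go on to the next pass
            continue
        } elseif {$code == 3} {
            # TCL_BREAK: leave the loop quietly
            break
        }
        # TCL_ERROR, TCL_RETURN, or a custom code: add one level for
        # this proc's own frame and pass everything on to the caller.
        dict incr options -level
        return -options $options $result
    }
    return
}
```

With this definition, a proc whose body is /repeat 3 {return -code
error}/ should yield "1" from /catch/, matching the behavior of /while/
in the example from the Background section.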

EXTENSIBILITY
===============

The /return -code/ command has always accepted any integer value as a
valid argument, allowing package and application authors to define
their own new return codes as needed by their own control structure
commands. Now that /return/ will accept any /option/ argument, and
/catch/ can capture all /option value/ argument pairs passed to the
caught /return/ command, package and application authors can augment
their custom return codes with additional data. Some prefix convention
should be established to avoid key name conflicts in the return options
dictionary.

POTENTIAL CONCERNS
====================

Reviewers of drafts of this TIP wondered whether the new "-level"
option to /return/ raised the possibility of trouble with an attempt to
return past the top of the call stack.

It should be understood that /return -level N/ does not take any
shortcut past the intervening levels. Each level of the call stack gets
a TCL_RETURN return code, and a "-level" value, dropping by one each
step up the stack. Any level in the stack might choose to /catch/ the
TCL_RETURN and treat it as it wishes. This is exactly the way the
existing /return -code return/ is handled. Normally, it would cause a
normal return to the caller's caller, but if the caller chooses to
/catch/ it, then the caller has control.

At the toplevel we run out of callers. Then the question becomes: how
is a TCL_RETURN code at toplevel handled?

    % return -level 0  ;# same as a TCL_OK at toplevel
    % return -level 1  ;# same as [return]
    % return -level 2  ;# same as [return -code return]
    command returned bad code: 2

From the C level, /Tcl_AllowExceptions()/ can be used to modify this
toplevel behavior.

The following proc will produce the same results as above, but from any
level in the call stack (absent an intervening /catch/):

    % proc escape level {
          set x [info level]
          incr x $level
          return -level $x
      }
    % escape 0
    % escape 1
    % escape 2
    command returned bad code: 2

Another concern was whether this proposal gave slave interpreters any
new powers over their masters. The return code from evaluation of an
untrusted script in a slave interpreter should always be wrapped in a
/catch/ already, lest a TCL_ERROR in the script blow the stack. Given
that, the only thing this proposal does is give the /catch/ command
more information to use to decide how to handle the misbehaving script.

COMPATIBILITY
===============

It is the author's belief that this proposal is completely compatible
with prior Tcl 8.X releases. Any error-free script that ran before
should continue to run with the same results.

At the C level, only internal changes are made, and no new interfaces
are defined. Any extension or embedding C program that sticks to the
public stubs interface should see no visible change.

PROTOTYPE
===========

This proposal is implemented by Tcl Patch 531640 at SourceForge.

The prototype covers all described functionality, but might be further
improved with more substantial bytecompiling of [return].

FUTURE CONSIDERATIONS
=======================

The main reason the global variables /::errorInfo/ and /::errorCode/
exist is to give the script level access to stack and error code
information following the /catch/ of a script that raises an error.
After this proposal, the /catch/ command itself provides access to that
information, so the global variables are not required. One can imagine
deprecating them, asking users of Tcl 8.5 to stop writing code that
accesses them. They could still have apparent existence, to satisfy the
needs of scripts written for earlier Tcl 8.X releases, by means of read
traces.
In time, Tcl 9 could either continue the read trace scheme, or not
provide these global variables at all.

One part of Tcl itself that currently makes use of the /::errorCode/
and /::errorInfo/ variables is the /bgerror/ command. Currently,
/bgerror/ accepts exactly one argument, the error message. To make use
of stack or error code information, /bgerror/ must retrieve them from
the global variables. The proper values of these global variables are
reset by /Tcl_BackgroundError()/ prior to evaluation of /bgerror/. As
an alternative, /Tcl_BackgroundError()/ could first attempt to call
/bgerror/ with /two/ arguments, first the message, then a dictionary of
options. If that call returned TCL_ERROR, then a second attempt could
be made with a single message argument. In that way, cleaner /bgerror/
commands that get all data from arguments could be supported, while
still keeping support for those /bgerror/ commands that were defined
for single argument use.

It has been noted several times that the processing of the value of
/::errorInfo/ is rather difficult because it is an arbitrary string
with no documented structure. A different, more structured way of
representing stack trace information would be an improvement. This
proposal does not propose an alternative, but because it offers an
extensible dictionary for storing arbitrary return options data, it
does provide an infrastructure where such approaches might be tried
out.

ACKNOWLEDGMENTS
=================

This proposal is a synthesis of ideas from many sources. As best I can
recall, major contributions came from Joe English, Andreas Leitgeb,
Reinhard Max, and Kevin Kenny. If you like the idea, give them some
credit; if you don't, blame me for combining the ideas badly.

SEE ALSO
==========

Documentation for tcllib's control package:

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows