TIP #463: COMMAND-DRIVEN SUBSTITUTIONS FOR REGSUB
===================================================
Version:      $Revision: 1.6 $
Author:       Donal Fellows
State:        Final
Type:         Project
Tcl-Version:  8.7
Vote:         Done
Created:      Saturday, 11 February 2017
URL:          https://tip.tcl-lang.org/463.html
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

The *regsub* command can only do substitutions of a limited complexity.
This TIP adds an option to generate substitution text using another Tcl
command, allowing a more complex range of substitutions to be performed
easily and safely.

RATIONALE AND OUTLINE PROPOSAL
================================

Many scripts wish to perform substitutions on a string where the text to
be substituted can be described by a regular expression, but where the
text to be substituted in cannot easily be generated by the *regsub*
command. There are workarounds for this, as seen in this example (from
the Wiki):

    set text [subst [regsub -all {[a-zA-Z]} [\
        regsub -all "\[\[$\\\\\]" $text {\\&}] {[
        set c [scan & %c]
        format %c [expr {$c\&96|(($c\&31)+12)%26+1}]
    ]}]]

But it is not at all trivial to write such things! Instead, we should be
able to do this:

    set text [regsub -all -command {[a-zA-Z]} $text {apply {c {
        scan $c %c c
        format %c [expr {$c&96|(($c&31)+12)%26+1}]
    }}}]

This form is both safer (as there is no required, non-obvious
preprocessing step to defang metacharacters) and faster (as the
substitution is a command call rather than a *subst* that needs separate
bytecode compilation).

The parallels with Perl's "e" flag to its regular expression
substitution operator should be obvious.

PROPOSED CHANGE
=================

My proposal is that we add a flag to the *regsub* command, *-command*,
that changes the interpretation and processing of the substitution
argument.
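
To make the two interpretations of the substitution argument concrete,
here is a minimal sketch that runs the same regular expression once with
the existing template processing and once with *-command*. The sample
string and the helper procedure *double* are illustrative assumptions,
not part of this TIP. The second call is wrapped in *catch* on the
assumption that interpreters without this TIP reject the unknown switch.

```tcl
set s "width=10 height=20"

# Existing behaviour: the argument is a template in which \1 and \2
# are replaced by the text of the capturing groups.
set templated [regsub -all {(\w+)=(\d+)} $s {\1:<\2>}]
puts $templated   ;# width:<10> height:<20>

# Hypothetical helper: receives the whole match and then each captured
# substring as ordinary arguments, and returns the replacement text.
proc double {whole name value} {
    return "$name=[expr {$value * 2}]"
}

# Proposed behaviour: the argument is a command prefix that is invoked
# once per match. Assumption: an interpreter lacking -command raises an
# error here, in which case computed is left empty.
if {[catch {regsub -all -command {(\w+)=(\d+)} $s double} computed]} {
    set computed ""
}
puts $computed    ;# width=20 height=40 where -command is supported
```

The template form can only rearrange matched text, whereas the command
form can compute a replacement from it, here by doing arithmetic on the
captured number.
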
When the flag is passed, instead of that argument being a string that
is processed for *&* and backslash-number sequences, it is interpreted
as a command prefix. The captured substrings (minimally the entire
matched text, but also any substrings captured by groups in the RE) are
appended as extra arguments, the resulting command is evaluated, and the
result of that evaluation is used as the string to substitute in.

If the *-all* option is not given, the substitution command will be
called at most once, whereas if *-all* is given, the substitution
command will be called once for each match of the regular expression.
The indices in the original string that matched will not be available.
Non-OK results will be passed through to the surrounding script.

Substitutions too complex to be described by a simple command can be
done by using a procedure or *apply*/lambda-term (as in the example
above).

The arguments received by the command invoked by *regsub -command* will
be exactly the substrings that were matched, with no other substitutions
performed on them.

EXAMPLES
----------

The command:

    regsub -all -command {\w} "ab-cd-ef-gh" { puts }

will give *---* as its result and print the letters *a* to *h*, one per
line in that order.

The command:

    regsub -command {\W(\W)} "ab cd,{ef gh,} ij" {apply {{x y} {
        scan $y %c c
        format %%%02x $c
    }}}

will produce this result:

    ab cd%7bef gh,} ij

IMPLEMENTATION
================

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows