TIP #111: DICTIONARY VALUES AND MANIPULATORS ============================================== Version: $Revision: 1.12 $ Author: Donal K. Fellows David S. Cargo State: Final Type: Project Tcl-Version: 8.5 Vote: Done Created: Saturday, 05 October 2002 URL: https://tip.tcl-lang.org111.html Post-History: ------------------------------------------------------------------------- ABSTRACT ========== This TIP proposes adding a standard value format (and supporting commands) to Tcl that implements a value-to-value mapping, just as Tcl's list values can be regarded as implementing a number-to-value mapping. RATIONALE =========== What is a dictionary? It is a translation from arbitrary values to arbitrary values, often also known as an associative map. Many computer languages, especially higher-level ones, have them as part of the language or the standard library. It would be nice to have them in Tcl too. Now, I realise that Tcl already contains arrays which provide dictionary functionality, but they are not quite the same thing. Tcl's arrays are collections of variables indexable by name, and not collections of values. This has some far-reaching implications; it is possible to set traces on individual elements of the array, but it is not possible to pass the array by value. However, one of the main concerns is the sheer cost of arrays in terms of memory space; aside from the hash table used as the core of the implementation (and the representations of the keys and values, of course) there is a substantial overhead for each array to support traces on the array as a whole, plus a similar overhead /per element/ that stems from the fact that elements are variables in their own right. By contrast, a dictionary value should be a lot more frugal. VALUE SYNTAX AND SEMANTICS ============================ Naturally, it is desirable for dictionary values to have human-readable forms that are similar to those that currently exist. I propose using /key value key value .../ form with list-style quoting for keys and values that contain characters that are significant to Tcl, which should be immediately familiar to users of the *array get* and *array set* commands. No special interpretation will be placed on the amount of whitespace separating keys and values, just as with lists (indeed, any list with an even number of elements can be regarded as a dictionary.) For example, the following value represents a mapping from selected languages to a possible program to invoke to compile them: C gcc C++ g++ FORTRAN f77 Java javac Empty dictionaries are those that contain no mappings from keys to values. Any representation of an empty list will also be a representation of an empty dictionary. There is no upper bound on the number of items that a dictionary may hold. It should be specially noted that dictionary values have copy-on-write semantics just like lists. This means that if I hand a dictionary value into a procedure as an argument, and that procedure updates the variable containing that value, the value as seen by the caller will not have changed. This is in complete contrast with arrays which cannot (currently) be passed by value other than through using *array get* to convert the array to a list form and *array set* to convert back again. This specification does not state what order the keys and values are listed in. That depends on the implementation. COMMAND SYNTAX AND SEMANTICS ============================== I propose that all operations that work with dictionary values (where not done through adaptations of existing commands) will go through the *dict* command. The alternatives are "array" which is already in use, "dictionary" which is rather long for what I believe will be a fairly commonly used command, "alist" (association list) which is probably too easy to confuse with existing commands, and "map" which is probably better reserved for future use as something for applying an operation to a list (or other collection of values), "hash" (which is perhaps too common), and "table" (which is used for this type of data structure in the Icon programming language). Most subcommands operate on either a dictionary value (/exists/, /for/, /get/, /info/, /keys/, /remove/, /replace/, /size/, and /values/), or on a variable containing a dictionary value (/append/, /incr/, /lappend/, /set/, and /unset/). Proposed subcommands: dict create: Make a dictionary. *dict create* ?/key1 value1 key2 value2 .../? This will create a new dictionary from the given keys and values and return it as the result. The command will take an even number of arbitrary strings (or other objects, naturally) and will use the first, third, fifth, etc. as keys and the second, fourth, sixth, etc. as values. From the point of view of string representations, this command will behave the same as the *list* command with an even number of arguments. There is no restriction on the possible representations of keys or values. It is legal to call this command with no arguments at all, which creates an empty dictionary. dict get: Get value for given key. *dict get* /dictionaryValue/ ?/key .../? Given a dictionary value (first argument) and a key (second argument), this will retrieve the value for that key. Where several keys are supplied, the behaviour of the command shall be as if the result of *dict get* /dictVal key/ was passed as the first argument to *dict get/ with the remaining arguments as second (and possibly subsequent) arguments. This facilitates lookups in nested dictionaries. For example, the following two commands are equivalent: dict get $dict foo bar spong dict get [dict get [dict get $dict foo] bar] spong If no keys are provided, dict would return a list containing pairs of elements in a manner similar to *array get*. That is, the first element of each pair would be the key and the second element would be the value for that key. It is an error to attempt to retrieve a value for a key that is not present in the dictionary. dict replace: Create a new dictionary that is a copy of an old one except with some values different or some extra key/value pairs added. *dict replace* /dictionaryValue/ ?/key value .../? This is very much the analogue of *lreplace*, taking a dictionary value as its first argument and then a list of key/value pairs. The result of the command is a new dictionary value that is a copy of the supplied dictionary other than that whenever a key is one of those supplied to this command, the returned dictionary will map that key to the associated value. It is legal for this command to be called with no key/value pairs, but illegal for this command to be called with a key but no value. dict remove: Create a new dictionary that is a copy of an old one except without the key/value mappings whose keys are listed. *dict remove* /dictionaryValue/ ?/key key .../? This operation does what *dict replace* can't do; removes keys and values. The result of the command is a new dictionary value that does not contain mappings for any of the keys listed; it is not an error if either there are no keys listed, or if any of the listed keys does not exist in the supplied dictionary. dict set: Set value for given key in a dictionary in a variable. *dict set* /dictionaryVar key/ ?/key .../? /value/ This operation takes the name of a variable containing a dictionary value and places an updated dictionary value in that variable containing a mapping from the given key to the given value. In a manner analogous to *lset*, where multiple keys are present, they do indexing into nested dictionaries. dict unset: Remove association for given key in a dictionary in a variable. *dict unset* /dictionaryVar key/ ?/key .../? This operation takes the name of a variable containing a dictionary value and places an updated dictionary value in that variable that does not contain a mapping for the given key. Where multiple keys are present, this describes a path through nested dictionaries to the mapping to remove. At least one key must be specified. dict keys: List all keys (with optional criteria matching) in dictionary. *dict keys* /dictionaryValue/ ?/globPattern/? Return a list of all keys in the given dictionary value. If a pattern is supplied, only those keys that match it (according to the rules of *string match*) will be returned. The returned keys will be in an arbitrary implementation-specific order. dict values: List all values (with optional criteria matching) in the dictionary. *dict values* /dictionaryValue/ ?/globPattern/? Return a list of all values in the given dictionary value. If a pattern is supplied, only those values that match it (according to the rules of *string match*) will be returned. The returned keys will be in an arbitrary implementation-specific order, though where no pattern is supplied the /i/th key returned by *dict keys* will be the key for the /i/th value returned by *dict values* applied to the same dictionary value. dict for: Iterate across all key/value mappings in the dictionary. *dict for* {/keyVar valueVar/} /dictionaryValue body/ This takes three arguments, the first a pair of variable names (for the key and value respectively of each mapping in the dictionary), the second the dictionary value to iterate across, and the third a script to be evaluated for each mapping with the key and value variables set appropriately (in the manner of *foreach*.) The result of the command is an empty string. If any evaluation of the body generates a /TCL_BREAK/ result, no further pairs from the dictionary will be iterated over and the *dict for* command will terminate successfully immediately. If any evaluation of the body generates a /TCL_CONTINUE/ result, this shall be treated exactly like a normal /TCL_OK/ result. dict filter: Create a new dictionary from an old one containing just a selection of key/value pairs. *dict filter* /dictionaryValue/ *key* /globPattern/ *dict filter* /dictionaryValue/ *value* /globPattern/ *dict filter* /dictionaryValue/ *script* {/keyVar valueVar/} /script/ This takes a dictionary value and returns a new dictionary that contains just those key/value pairs that match the specified rule. Three rules are outlined above. The *key* rule only matches those key/value pairs whose keys match the given glob-style pattern. The *value* rule only matches those key/value pairs whose values match the given glob-style pattern. The *script* rule tests for matching by assigning the key to the /keyVar/ and the value to the /valueVar/, and then evaluating the given script which should return a boolean value (with the key/value pair only being included in the result of the *dict filter* when a true value is returned.) dict append: Append a string to the value for a particular key in the dictionary. *dict append* /dictionaryVar key/ ?/string .../? This appends the given string (or strings) to the value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to an empty string. dict incr: Increment the value for a particular key in the dictionary. *dict incr* /dictionaryVar key/ ?/increment/? This adds the given increment value (an integer that defaults to 1 if not specified) to the value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to 0. It is an error to increment a value for an existing key if that value is not an integer. dict lappend: Append an item to the list-value for a particular key in the dictionary. *dict lappend* /dictionaryVar key/ ?/item .../? This appends the given items to the list value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to an empty list, and it is legal for there to be no items to append to the list. It is an error for the value that the key maps to to not be representable as a list. dict exists: Test whether a mapping exists for a key. *dict exists* /dictionaryValue key/ ?/key .../? This returns a boolean value indicating whether the given key (or path of keys through a set of nested dictionaries) exists in the given dictionary value. This returns a true value exactly when *dict get* on that path will succeed. dict size: Get the number of key/value mappings in a dictionary. *dict size* /dictionaryValue/ This returns the size of the dictionary, which will be exactly half the value that *llength* /dictionaryValue/ would return. It is an error to apply this command to a non-dictionary value. dict info: Get implementation-specific information about the dictionary value. *dict info* /dictionaryValue/ This returns information (intended for display to people) about the given dictionary though the format of this data is dependent on the implementation of the dictionary. For dictionaries that are implemented by hash tables, it is expected that this will return the string produced by /Tcl_HashStats()/. OTHER RELATED CHANGES ======================= There are a few other commands that change: * *array set* will take a dictionary instead of (or as well as) a list as its final argument. * *array get* will return a dictionary. * *string map* will take a dictionary instead of (or as well as) a list as its map argument. Naturally, dictionary handling will form its own maintenance area. [TIP #16] and [TIP #24] will be updated as necessary. C API ======= There will be a new public structure and a few new public functions to allow C-level access to dictionary values: The new structure (called /Tcl_DictSearch/) will be there to allow for searches (i.e. traversals) of a dictionary. This TIP does not specify the fields of the structure; the declaration is just to allow for allocation of these structures on the C stack. Many public API functions are capable of generating error messages; these generally indicate some type-conversion failure. Sharing constraint violations (where applicable) cause panics as they indicate basic programming errors which should not be causable by scripts. The public API functions are: Tcl_Obj **Tcl_NewDictObj*(void); Tcl_Obj **Tcl_DbNewDictObj*(CONST char */file/, int /line/); These functions (in non-debug and debug versions) create a new dictionary object and return it. int *Tcl_DictObjPut*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, Tcl_Obj */keyPtr/, Tcl_Obj */valuePtr/); This function inserts a new key/value pair into a dictionary, or updates a key/value pair already in the dictionary. The dictionary object must be unshared. Note that both the key and value objects will have their reference count increased. The return value is a normal TCL_OK/TCL_ERROR result, with the /interp/ for error reporting. int *Tcl_DictObjGet*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, Tcl_Obj */keyPtr/, Tcl_Obj **/valuePtrPtr/); This function looks up the value for a key in a dictionary. The variable pointed to by the /valuePtrPtr/ argument is updated to contain a reference to the value, or a NULL if there is no mapping for the key in the dictionary. No reference counts are manipulated by this function. The return value is a normal TCL_OK/TCL_ERROR result, with the /interp/ for error reporting. int *Tcl_DictObjRemove*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, Tcl_Obj */keyPtr/); This function removes the key/value pair with the given key from the dictionary. It is not an error if the key is not present in the dictionary. The dictionary must be unshared. The return value is a normal TCL_OK/TCL_ERROR result, with the /interp/ for error reporting. int *Tcl_DictObjSize*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, int */sizePtr/); This function updates the integer variable pointed to by /sizePtr/ with the number of key/value pairs in the dictionary. The return value is a normal TCL_OK/TCL_ERROR result, with the /interp/ for error reporting. int *Tcl_DictObjFirst*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, Tcl_DictSearch */searchPtr/, Tcl_Obj **/keyPtrPtr/, Tcl_Obj **/valuePtrPtr/, int */donePtr/); This function starts a search of (i.e. iteration over) the given dictionary, using the structure pointed to by /searchPtr/ as context. The return value is a normal TCL_OK/TCL_ERROR result, with the /interp/ for error reporting. Three variables are updated to indicate what was found; /keyPtrPtr/ is used for reporting the key of a key/value pair and /valuePtrPtr/ is used for reporting the corresponding value. Finally, /donePtr/ is used for indicating whether the search has found all the values; if the variable it points to is set to 1 there are no key/value pairs in the dictionary (i.e. the variables pointed to by /keyPtrPtr/ and /valuePtrPtr/ were not updated), but if the variable is set to 0, a key/value pair was found and /Tcl_DictObjNext()/ should be called to discover whether that was the last value or if there are further ones in the dictionary. Note that if this function indicates that the search is not done but the calling code wishes to extract no further values from the dictionary, /Tcl_DictObjDone()/ /must/ be called to release the internal locks on the representation of the value. void *Tcl_DictObjNext*(Tcl_DictSearch */searchPtr/, Tcl_Obj **/keyPtrPtr/, Tcl_Obj **/valuePtrPtr/, int */donePtr/); This function gets the next key/value pair from the search of a dictionary, using the search referenced by /searchPtr/. The meaning of the /keyPtrPtr/, /valuePtrPtr/ and /donePtr/ variables is much the same as in /Tcl_DictObjFirst()/, along with the restriction that if the function indicates that the search is not done but the calling code wishes to extract no further values from the dictionary, /Tcl_DictObjDone()/ /must/ be called to release the internal locks on the representation of the value. void *Tcl_DictObjDone*(Tcl_DictSearch */searchPtr/); This function terminates a search of a dictionary before all the values in the dictionary have been iterated over, releasing the internal locks on the dictionary representation. int *Tcl_DictObjPutKeyList*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, int /keyc/, Tcl_Obj *CONST */keyv/, Tcl_Obj */valuePtr/); This function is a variant on /Tcl_DictObjPut()/ that takes a list of keys so as to work with nested dictionaries. int *Tcl_DictObjRemoveKeyList*(Tcl_Interp */interp/, Tcl_Obj */dictPtr/, int /keyc/, Tcl_Obj *CONST */keyv/); This function is a variant on /Tcl_DictObjRemove()/ that takes a list of keys so as to work with nested dictionaries. EXAMPLES ========== Counting the number of unique words in a file and the number of times each word occurs: set f [open someFile.txt] set contents [read $f] close $f foreach word [regexp -all -inline {\w+} $contents] { dict incr count $word } puts "There are [dict size $count] unique words." foreach word [lsort -dictionary [dict keys $count]] { puts "${word}: [dict get $count $word] occurrences" } A localisable *string toupper* implementation: set capital [dict create C [dict create]] foreach c {abcdefghijklmnopqrstuvwxyz} { dict set capital C $c [string toupper $c] } dict set capital en [dict get $capital C] # ... and so on for other supported languages ... set upperCase [string map [dict get $capital $env(LANG)] $string COPYRIGHT =========== This document has been placed in the public domain. ------------------------------------------------------------------------- APPENDICES ============ /These appendices are not formally part of the proposal and exist merely to help understanding./ IMPLEMENTATION NOTES ---------------------- Implement using hash tables (of course.) Need efficient ways to convert to/from lists, perhaps making lists know what's going on underneath the covers? FUTURE DIRECTIONS ------------------- Alternate implementations of mappings, like trees or disk-backed databases? ------------------------------------------------------------------------- TIP AutoGenerator - written by Donal K. Fellows