TIP #155: Fix Some of the Text Widget's Limitations

TIP:	155
Title:	Fix Some of the Text Widget's Limitations
Version:	$Revision: 1.23 $
Author:	Vince Darley <vince at santafe dot edu>
State:	Final
Type:	Project
Tcl-Version:	8.5
Vote:	Done
Created:	Monday, 08 September 2003

Abstract

Tk's text widget is very powerful, but has a number of known limitations. In particular the entire handling of wrapped lines and 'display/visual entities' versus 'logical entities' is quite limited. The most obvious side-effect of these inadequacies is the 'scrollbar problem' (in which, particularly when there are long wrapped lines in the widget, the vertical scrollbar slider changes in size depending on the number of logical lines currently displayed, see http://mini.net/tcl/896 for example). This TIP overhauls the widget to provide consistent, complete support for 'display lines', 'display indices' and as a consequence smooth, pixel-based scrolling. A few other small bugs/issues have also been resolved.

Proposal

The text widget has a number of limitations:

The aforementioned scrollbar interaction is flawed
To count the number of characters between index positions $idx1 and $idx2, one can only really do string length [.text get $idx1 $idx2]. There is no easy way to determine the number of visible (non-elided) characters between these two index positions, nor the number of valid index positions between them (remember that embedded windows or images always take up one unit of index position, but don't correspond to any characters). A similar difficulty exists in counting the number of display lines between two index positions, and in counting the number of pixels between two index positions (or in the entire widget).
Performing a correct text "replace" operation (as used by a text editor, for example) is difficult, because combinations of insert/delete tend to make the window scroll and/or leave the insertion cursor in an unnatural place.
There is no way to configure the widget to get an acceptable block-cursor.
When long lines are wrapped there is no easy way to get the beginning or end of a possible display line, or move up or down by display lines, unless the line is actually currently displayed (and even then the code is rather complex).
Even though 'search' can operate optionally on all text or just non-elided text, there is no easy way to retrieve the actual string which matched in the latter case, if the match spans characters on either side of an elided range.
.text search -backwards -all returns subsets of indices in forwards rather than backwards order; with simple multi-line greedy searches (like -nolinestop -- .*) it fails to match multiple lines; it can return backwards matches which fully enclose each other, etc.

This TIP is, therefore, to fix these limitations, as follows:

Make internal changes to the text widget so it keeps track of the number of vertical display pixels in each logical line, and uses that information to calculate scrollbar interactions, to provide a better user experience, including smooth scrolling. This requires an extension to the text widget's yview command to do smooth scrolling: .text yview scroll N pixels
Add a count widget subcommand, which calculates the number of characters or index positions or non-elided characters/index positions or lines or displaylines or x/y-pixels between two given indices. Similar extend +N chars to allow index calculations by chars, indices and elided or not.
Add a replace widget subcommand, which performs a combined delete/insert operation while ensuring that the insertion position, vertical display position, and undo/redo information is correctly set.
Add a -blockcursor configuration option which makes the widget use a rectangular flashing "block" rather than a thin beam for the insertion point.
Add displaylinestart and displaylineend and +/- N displaylines index offsets which work like linestart, lineend, +/- N lines but with display lines, and which work whether the relevant indices are currently displayed or not. And make tk::TextUpDownLine operate in terms of display lines.
Add an option -displaychars to get which, if given, restricts the returned characters to be just those which are not elided.
Fix 'search -all -backwards' so that it returns the matches in the correct backwards sequence, even within each individual line. Also add the options '-overlap' (to allow matches which overlap each other to be returned) and '-strictlimits' to prevent matches which would extend beyond the strict [pos,limit] range. Fix code for correct greedy forwards and backwards searches using correct non-overlapping matching unless explicitly requested.

Each of these proposals is discussed and presented in full detail below.

In addition, the current implementation provides for a bit more text widget objectification and fixes the "very slow deletion of lots of text with lots of tags" bug (there was a non-linear slowdown which has been removed). One crashing bug in the text widget has also been fixed, and a bug in searches with '-all'.

Scrollbar Interaction

This is the most complex of the proposed changes, even though solving it results in very limited changes to the actual text-widget's public Tcl or C interfaces. The following example illustrates the problem:

 pack [scrollbar .s -command {.t yview}] -side right -fill y
 pack [text .t -yscrollcommand {.s set}] -side left

 for {set i 0} {$i < 300} {incr i} {
   .t insert insert $i
 }

 for {set i 0} {$i < 20} {incr i} {
   .t insert insert "$i\n"
 }

 for {set i 0} {$i < 300} {incr i} {
   .t insert insert $i
 }

Solving this problem perfectly must require the text widget to know exactly how many vertical pixels each line contains. However, for a widget which contains a great deal (megabytes) of text, images, embedded windows and numerous tags (and on which certain tags could have their font configuration changed on the fly), it is clearly impractical to have the widget calculate every line's pixel-height requirements whenever anything changes (for bad cases the response-time would be unacceptable).

The solution proposed is an asynchronous update mechanism where the structure representing each logical line caches the last known pixel height and an epoch counter. As changes (global or local) are made, individual logical lines recalculate their height either immediately (for small, local changes such as inserting a few characters) or are scheduled for recalculation (for larger changes such as resizing the window or changing size-influencing tag settings). When blocks of lines (or, indeed, all lines) are scheduled for recalculation, each asynchronous callback should only recalculate the pixel heights of a relatively small number of lines, so that user-responsiveness is maintained.

As line pixel-height calculations are updated, a second asynchronous mechanism is triggered, this one to update any scrollbar attached to the text widget (i.e. call the -yscrollcommand callback). Again it is undesirable to call this every single time a single line's height is updated, so this is called with a timer mechanism so it is updated every 200ms.

Both of these asynchronous mechanisms should be designed so they do not need to be run if nothing relevant has changed in the widget.

It may require some experimentation to determine the most appropriate usage patterns of these asynchronous callbacks. In particular, the current implementation uses timer callbacks (idle callbacks cannot be used because idle callbacks are not allowed to reschedule themselves). A thread-based implementation may have some advantages (although it certainly has disadvantages too).

Given the pixel calculations are available, they have been used to provide for smooth scrolling of the text widget. This means even with large images (possibly even larger than the height of the widget itself), smooth scrolling off top and bottom of the widget are automatic. In order for smooth scrolling to be used by Tk's Tcl library, the pixels unit has been added to the .text yview scroll command. This is used to ensure mouse-wheel and scan-drag scrolling are smooth. It is important that the existing pages and units (the latter being display lines) do not change in meaning so that, for example, clicking on a scrollbar's arrow still scrolls the widget by 1 display line.

Count Subcommand

A new subcommand .text count ?options? startIndex endIndex is added for all text widgets. Valid options (any combination of which can be specified) are -chars, -indices, -displaychars, -displayindices, -lines, -displaylines, -xpixels, -ypixels. The default value, if no option is specified, is -indices. In addition -update can be specified. If given this option ensures all subsequent options operate on a range of indices for which all metric calculations (e.g. line pixel heights) are up to date. This option is of particular importance to the -ypixels option.

The subcommand counts the number of relevant things between the two indices. If startIndex is after endIndex, the result will be a negative number. The actual items which are counted depend on the option given, as follows:

-indices: count all characters and embedded windows or images (i.e. everything which counts in text-widget index space), whether they are elided or not.
-chars: count all characters, whether elided or not. Do not count embedded windows or images.
-displaychars: count all non-elided characters.
-displayindices: count all non-elided characters, windows and images.
-lines: count all logical lines (this option is only included for completeness since it can be achieved pretty easily in Tk already)
-displaylines: count all displaylines.
-xpixels: count the number of horizontal pixels between the two indices.
-ypixels: count the number of vertical pixels between the two indices.

If more than one of these counting options is given, the result is a list with one entry for each such option given (-update does not contribute to this result, of course).

Similar, the index modifiers are extended so the following are now valid: "+N display chars", "+N indices", "+N display indices", "+N any indices", "+N any chars". The "display" and "any" can be abbreviated if they are followed by whitespace.

In particular, this means that:

 string length [.text get $i1 $i2] == [.text count -chars $i1 $i2]

provided $i1 is not after $i2 in the widget. It also means that

 .text compare "$i1 + [.text count -indices $i1 $i2]indices" == $i2

and

 .text compare "$i1 + [.text count -displayindices $i1 $i2] display indices" == $i2

is true under all circumstances.

Lastly, for those who wish to know the required number of vertical pixels a text widget needs, .text count -update -ypixels 1.0 end will give this (once any borderwidth, highlightthickness and ypad have been added on). For example, the following procedure will give the smallest possible size in pixels which a given text widget would desire to have no scrollbars:

proc textDetermineDimensions {text} {
    set border [$text cget -highlightthickness]
    incr border [$text cget -borderwidth]
    set y [expr {2* ($border + [$text cget -pady])}]
    set x [expr {2* ($border + [$text cget -padx])}]
    
    incr y [$text count -update -ypixels 1.0 end]
    if {[$text cget -wrap] eq "none"} {
	set max_x 0
	for {set i 0} {$i < int([$text index end])} {incr i} {
	    set xpix [$text count -xpixels ${i}.0 "${i}.0 linend"]
	    if {$xpix > $max_x} {
		set max_x $xpix
	    }
	}
	incr max_x $x
	return [list $x $y]
    } else {
	return [list [winfo reqwidth $text] $y]
    }
}

Replace Subcommand

A new subcommand

 .text replace index1 index2 chars ?tagList chars tagList ...?

is added for all text widgets. This subcommand is approximately equivalent to a combined:

 .text delete index1 index2
 .text insert index1 chars ?tagList chars tagList ...?

but also ensures that the current window display position (e.g. what line is currently displayed at the top left corner of the text widget), the current insertion position and the undo/redo stack are all correctly set up. This subcommand could be implemented in pure Tcl, but is quite complex to get right. The C-level implementation shares most of the code with insert/delete/count and can also be made more efficient in its handling of window-scrolling issues.

Block Cursor

A new text widget configuration option -blockcursor <boolean> is added. If set to any true value, instead of a thin flashing vertical bar being used for the insertion cursor, a full rectangular block is used instead.

Display-Line Handling

Currently one can discover the beginning or end of a given logical line and move between logical lines with:

 .text index "$idx linestart"
 .text index "$idx lineend"
 .text index "$idx +1lines"
 .text index "$idx -5lines"

However, when a given logical line may be wrapped over numerous display lines it is not so simple to find the beginning or end of a display line, or move up or down by display lines. (In fact is is currently possible with the @ syntax if the logical line and the $idx are both currently displayed on screen, but is not possible(*) if the logical line is not currently displayed). Therefore a new index manipulation set are added:

 .text index "$idx display linestart"
 .text index "$idx display lineend"
 .text index "$idx +1display lines"
 .text index "$idx -5display lines"

These work whether or not $idx is currently displayed (when it is not displayed, the index is calculated by laying out the geometry of the line behind the scenes so this operation is certainly more time-consuming than determining the logical line start or end). The "display" in these items can be abbreviated if it is followed by whitespace (if it is not abbreviated then the whitespace is optional).

In addition the single Tcl proc tk::TextUpDownLine (in text.tcl) has been updated to operate in terms of display lines and thereby to retain the current x position as accurately as possible across multiple up/down arrow keypresses.

(*) Perhaps it would be possible, if horribly cumbersome, by copying the relevant contents into another text widget which is unmapped and making sure the desired lines are visible in that widget, and finally performing the desired operations on the copy before deleting it all!

Get -displaychars

This is a simple new option to the get subcommand, which now has the syntax .text get ?-displaychars? ?--? index1 ?index2....?. This means the following code now makes sense:

 set found [.text search -elide -count num $pattern $pos]
 set match [.text get -displaychars $found "$found + $num chars"]

Previously, achieving something like the second line was quite complex.

Search subcommand

In Tk8.5a0 at present search -all -backwards ... returns the list of indices backwards from line to line, but forwards within each line (a side-effect of backwards matching being implemented as repeated forward searches). Large backwards or forwards regexp searches for, say, -nolinestop -- .* would only match a single line. Various other overlap vs non-overlap problems too. All of these glitches (my own code ;-) have been fixed and the test suite for search hugely extended.

Backward Compatibility

All of the above changes simply extend the functionality of the text widget in new ways, and therefore have no significant backward compatibility problems. It is possible that some existing Tk code may notice some minor behavioural differences:

Since the interaction between text widget and vertical scrollbar is now slightly different, any code which assumed that a particular scrollbar position (e.g. 0.5) corresponds to a particular line of text will find that that line of text may now be different (and will of course depend on the actual line heights). This change is considered a bug fix, not an incompatibility! In particular, however, any code which performs a calculation like:

 set num_lines [lindex [split [.text index end] .] 0]
 set last_visible [expr {int($num_lines *[lindex [.text yview] 1])}]

(under the assumption that the scrollbar's units are effectively measured in logical lines) will now get a different answer (since the scrollbar operates with pixels), and that different answer will not correspond to the last visible line. Of course the code should always have used:

 set last_visible
   [expr {int([.text index "@[winfo width .text],[winfo height .text]"])}]

which works correctly no matter how the scrollbar operates.

In addition, with the new pixels unit for the scrollbar, the command .text yview scroll 1 p is now ambiguous and will throw an error. This incompatibility is similar to those which have previously been introduced in, e.g., Tk 8.4 (with -padx, -pady).

Lastly, "+N chars" is a synonym for "+N indices" for backwards compatibility reasons. This may be different to "+N any chars".

Implementation

A complete implementation is available at:

http://sf.net/tracker/?func=detail&aid=791292&group_id=12997&atid=312997

This passes all Tk text widget tests (on Windows XP, at least), including a significant number of new tests. The memory requirements of the text widget have increased marginally to support the correct vertical scrolling behaviour: two new integers must be stored for each logical line of text.

Out of Scope

A number of other text widget enhancements might be nice. Some of these are listed here for completeness:

Printing support (at least to postscript like the canvas widget -- in fact I'm sure 95% of the canvas printing code could be re-used for this, with the only new stuff being handling of multiple pages. It might be preferable to support pdf instead, however)
Cloning (e.g. reading in the result of .text -dump, although handling of tag definitions and tag-bindings would also be nice)
Synchronizing/backing the widget contents with a Tcl variable (actually not too hard to implement quite efficiently now, given the count functionality). This can be done without too much trouble with pure Tcl.
Loading data into the widget asynchronously from a channel (easy to implement with Tcl procedures, although core support might make for better scrollbar handling and thereby make such a feature almost invisible to the user).
Modifications to the horizontal scrollbar. The horizontal scrollbar's position/status depends only on the currently-displayed contents of the text widget. This means as one scrolls the widget vertically, the horizontal scrollbar changes. No attempt has been made to fix this problem in this TIP.
A shorthand (display start, display end?) for the rather less obvious .text index "@0,0" and .text index "@[winfo width .text],[winfo height .text]"
The attachment of user-defined data to each logical line in the widget (e.g. a code parser's current internal state), which might make true parsing for syntax colouring significantly easier.
The pre-existing behaviour of +/- N lines uses byte-indices instead of x-pixel calculations (thereby having the cursor bobbling around when there are multi-byte characters or even worse when there are images, proportional fonts, tabs, etc). Further, the behaviour is quite strange when wrapping is enabled in the widget. This TIP does not propose any changes in this area other than to suggest that Tcl coders make use of 'displaylines' instead, for more consistent behaviour (as has been done to text.tcl).
To do word-matching with search requires the use of a regexp pattern and something like .text search -regexp -- "\\m[quote::Regfind $string]\\M" $pos. It might be nice to add a flag to control word-matching without the need for such manipulations.

None of these is included in the current TIP or current implementation. If interested members of the community wish to extend this TIP or submit further TIPs to handle any of these enhancements, they are very welcome (and the author is happy to help coordinate where possible).

Copyright

This document has been placed in the public domain.

[Index] [History] [HTML Format] [Source Format] [LaTeX Format] [Text Format] [XML Format] [*roff Format (experimental)] [RTF Format (experimental)]

TIP AutoGenerator - written by Donal K. Fellows