The work on the modularized core is currently on hiatus. Before it stopped, we briefly investigated the following candidates for additional optional features and estimated, for each, the amount of work and the possible gain. The sources of the modularized core are available through the Tcl CVS at SourceForge, under the branch tag mod-8-3-4-branch. The license is the same as for the unmodified Tcl core itself.
The estimates are given both in lines of code (LOC) for the sources and as a percentage of the total size of the static library. The lines of code were counted using "wc -l"; nothing was done to take comments into account. Given the extensive commenting of the Tcl core code, the line counts and the percentages derived from them can therefore be seriously off (overestimated). The binary percentages are based on the contents of Table 7 and Table 8, which list the sizes of the various object files.
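The counting method described above can be sketched as follows. This is a minimal illustration, not the script used for the estimates: the function names `count_lines` and `total_loc` are hypothetical, and the default patterns assume the tcl/{generic,unix}/*.c layout of the source tree. Like "wc -l", it counts newline characters, so comment and blank lines are included.

```python
from pathlib import Path


def count_lines(path):
    """Count newline characters in a file, which is what `wc -l` reports."""
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        return sum(chunk.count(b"\n")
                   for chunk in iter(lambda: handle.read(65536), b""))


def total_loc(root, patterns=("generic/*.c", "unix/*.c")):
    """Sum the raw line counts of all files below root matching the patterns."""
    root = Path(root)
    return sum(count_lines(path)
               for pattern in patterns
               for path in sorted(root.glob(pattern)))
```

Calling `total_loc("tcl")` on a checked-out source tree would give the raw total; stripping comments first would require an additional preprocessing step that this sketch does not attempt.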
Object file | #byte | % of Total |
regcomp.o | 40368 | 8.30 |
tclExecute.o | 26540 | 5.45 |
tclIO.o | 25296 | 5.20 |
tclCmdMZ.o | 17160 | 3.53 |
tclBasic.o | 16656 | 3.42 |
tclVar.o | 16584 | 3.41 |
tclCompCmds.o | 16256 | 3.34 |
tclCompile.o | 16024 | 3.29 |
tclCmdAH.o | 14228 | 2.92 |
tclNamesp.o | 13712 | 2.82 |
tclUtf.o | 13396 | 2.75 |
tclDate.o | 12752 | 2.62 |
tclCmdIL.o | 12660 | 2.60 |
tclFileName.o | 12020 | 2.47 |
tclPosixStr.o | 10972 | 2.25 |
tclInterp.o | 10196 | 2.10 |
tclEncoding.o | 10008 | 2.06 |
regexec.o | 9296 | 1.91 |
tclParse.o | 9252 | 1.90 |
tclUnixChan.o | 9196 | 1.89 |
tclUtil.o | 8600 | 1.77 |
tclIOCmd.o | 7828 | 1.61 |
tclBinary.o | 7804 | 1.60 |
tclScan.o | 7380 | 1.52 |
tclProc.o | 7288 | 1.50 |
tclUnixFCmd.o | 6604 | 1.36 |
tclParseExpr.o | 6452 | 1.33 |
tclUnixInit.o | 6040 | 1.24 |
tclPipe.o | 6040 | 1.24 |
tclPkg.o | 5544 | 1.14 |
tclObj.o | 5520 | 1.13 |
tclCompExpr.o | 5512 | 1.13 |
tclFCmd.o | 5272 | 1.08 |
tclStringObj.o | 4728 | 0.97 |
tclUnixPipe.o | 4196 | 0.86 |
tclTimer.o | 4088 | 0.84 |
tclEvent.o | 4036 | 0.83 |
Total | 486648 | 100.00 |
Object file | #byte | % of Total |
tclListObj.o | 3760 | 0.77 |
tclIOGT.o | 3760 | 0.77 |
tclRegexp.o | 3700 | 0.76 |
tclResult.o | 3608 | 0.74 |
tclLoad.o | 3556 | 0.73 |
tclUnixFile.o | 3324 | 0.68 |
tclIOUtil.o | 3300 | 0.68 |
tclMain.o | 3296 | 0.68 |
tclHash.o | 3296 | 0.68 |
tclStubInit.o | 2960 | 0.61 |
tclLiteral.o | 2784 | 0.57 |
tclNotify.o | 2596 | 0.53 |
tclEnv.o | 2396 | 0.49 |
tclClock.o | 2304 | 0.47 |
tclLink.o | 2240 | 0.46 |
tclGet.o | 2164 | 0.44 |
regerror.o | 1972 | 0.41 |
tclUnixNotfy.o | 1784 | 0.37 |
tclIndexObj.o | 1668 | 0.34 |
tclPreserve.o | 1500 | 0.31 |
tclResolve.o | 1228 | 0.25 |
tclThread.o | 1216 | 0.25 |
tclUnixTime.o | 1160 | 0.24 |
tclCkalloc.o | 1116 | 0.23 |
tclLoadDl.o | 1044 | 0.21 |
tclAsync.o | 1028 | 0.21 |
tclIOSock.o | 992 | 0.20 |
tclStubLib.o | 980 | 0.20 |
tclHistory.o | 920 | 0.19 |
tclPanic.o | 876 | 0.18 |
tclAppInit.o | 776 | 0.16 |
tclUnixSock.o | 760 | 0.16 |
tclUnixEvent.o | 752 | 0.15 |
tclAlloc.o | 620 | 0.13 |
tclMtherr.o | 616 | 0.13 |
regfree.o | 560 | 0.12 |
tclUnixThrd.o | 532 | 0.11 |
Total | 486648 | 100.00 |
The whole interpreter (115 files, matching the glob pattern tcl/{generic,unix}/*.c) comes in at 3214256 LOC and 486648 bytes. These totals are the 100 % against which all percentages below are computed.
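As a quick check of how the "% of Total" column relates to the byte counts, the following sketch recomputes the share for the three largest object files. The helper name `share` is hypothetical; the sizes and the total are taken from the tables above.

```python
TOTAL_BYTES = 486648  # total size of all object files, from the tables above


def share(size_bytes, total=TOTAL_BYTES):
    """Return the size as a percentage of the total, rounded to two decimals."""
    return round(100.0 * size_bytes / total, 2)


# Sizes of the three largest object files, as listed in the first table.
largest = {"regcomp.o": 40368, "tclExecute.o": 26540, "tclIO.o": 25296}

for name, size in largest.items():
    print(f"{name:<14} {size:>6} {share(size):5.2f}")
```

The three values come out as 8.30, 5.45 and 5.20, matching the table entries.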
File | Touched | |
tclBasic.c | 1 line | |
tclInt.h | 2 lines | |
tclBinary.c | 1552 lines (all) | |
1555 lines | 0.04 % | |
binary | 1.60 % |
Alternative: Leave Tcl_ObjType ``tclByteArray'' in.
File | Touched | |
tclBinary.c | 1027 lines (2/3 of file) | |
1030 lines | 0.03 % | |
binary | 1.06 % |
File | Touched | |
tclBasic.c | 1 line | |
tclInt.h | 2 lines | |
tclClock.c | 377 lines (all) | |
tclDate.c | 1873 lines (all) | |
2253 lines | 0.07 % | |
binary | 3.09 % |
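The two summary rows of each of these tables follow from the totals given earlier: the line count is the sum of the lines touched, and the binary share is the combined size of the affected object files relative to the whole library. A minimal sketch for the table directly above, assuming that tclClock.o and tclDate.o are the object files behind the 3.09 % figure (the name `feature_cost` is hypothetical):

```python
TOTAL_LOC = 3214256   # whole interpreter, lines as counted with wc -l
TOTAL_BYTES = 486648  # whole interpreter, size of all object files


def feature_cost(lines_touched, object_sizes):
    """Return (lines, % of sources, % of binary) for one optional feature."""
    loc = sum(lines_touched.values())
    size = sum(object_sizes.values())
    return loc, 100.0 * loc / TOTAL_LOC, 100.0 * size / TOTAL_BYTES


# Lines touched, from the table above.
clock_lines = {"tclBasic.c": 1, "tclInt.h": 2, "tclClock.c": 377, "tclDate.c": 1873}
# Object file sizes, from the size tables (assumed to be the files removed).
clock_objects = {"tclClock.o": 2304, "tclDate.o": 12752}

loc, loc_pct, bin_pct = feature_cost(clock_lines, clock_objects)
print(f"{loc} lines, {loc_pct:.2f} % of sources, {bin_pct:.2f} % of binary")
```

This reproduces the 2253 lines, 0.07 % and 3.09 % shown in the table.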
File | Touched | |
tclBasic.c | 1 line | |
tclInt.h | 2 lines | |
tclPkg.c | 979 lines | |
982 lines | 0.03 % | |
binary | 1.14 % |
File | Touched | |
tclCmdMZ.c | 1331 lines | |
Lines at most | 0.04 % | |
binary | 1.58 % |
File | Touched | |
tclCmdIL.c | 678 lines (lsort) | |
Lines at most | 0.02 % | |
binary | 0.54 % |
File | Touched | |
tclCompCmds.c | 2043 lines (all) | |
tclCompExpr.c | 1051 lines (all) | |
tclCompile.c | 3414 lines (all) | |
Entrypoints ... | 300 lines (estim.) | |
6808 lines | 0.21 % | |
binary | 7.76 % |
File | Touched | |
tclCompCmds.c | 2043 lines (all) | |
tclCompExpr.c | 1051 lines (all) | |
tclCompile.c | 3414 lines (all) | |
tclExecute.c | 6412 lines (all) | |
Entrypoints ... | 300 lines (estim.) | |
13220 lines | 0.41 % | |
binary | 13.21 % |
File | Touched | |
regc_color.c | 17775 lines | |
regc_cvec.c | 5094 lines | |
regc_lex.c | 24495 lines | |
regc_locale.c | 34453 lines | |
regc_nfa.c | 36234 lines | |
regcomp.c | 59492 lines | |
rege_dfa.c | 17820 lines | |
regerror.c | 3515 lines | |
regexec.c | 28360 lines | |
regfree.c | 2086 lines | |
regfronts.c | 2394 lines | |
tclRegexp.c | 1029 lines | |
232747 lines | 7.24 % | |
binary | 11.5 % |
File | Touched |
tclNamesp.c | 3916 lines (mostly) |
tclVar.c | 4813 lines |
tclParse.c | 2357 lines |
tclParseExpr.c | 1870 lines |
various (set, proc) | 1000 lines |
A simple cut-out of this feature is not possible. Instead, we will have to rewrite parts of the parser, and of commands like set and proc, to remove the special handling of the colon (:), the namespace separator character, from the system.
About 13956 LOC have to be touched for this, which is about 0.43 % of the sources. About 2.82 % of the binary (tclNamesp.o) is definitely removed.
File | Touched |
tclEncoding.c | 2871 lines |
How much is removed from the file above depends on the chosen model, of which we have two:
Not everything might be removed because we believe that the best way to remove UTF8 completely is to rewrite the Utf <-> External converter functions and throw away the rest. That way we don't have to think about all the other places which do UTF8.
If we remove this completely, we have to touch many more places throughout the whole code, most notably the channel system. This would raise the number of LOC removed or rewritten, but would also take much longer. We currently have no good LOC estimate for this scenario.
As a first approximation we grep'ped the sources for ``Tcl_Utf'', which gives us 314 locations in 40 files. We guesstimate that each location translates into 1-4 lines of code touched directly, plus, depending on context, maybe 5-20 other lines around each location that have to change too. That would be between 314 and 6280 LOC changed, i.e. replaced with different, non-UTF code.
The figure of 5-20 other lines per location is probably an underestimate for the channel system. This part of the core will very likely need a complete reorganization to allow usage both with and without encodings. That would mean 8389 lines changed in tclIO.c. Changed, not cut!
Note, however, that at 5.20 % of the binary tclIO.c is also the third-largest file right now.
Summary: About 17540 LOC have to be touched for this, which is about 0.54 % of the whole sources.
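The arithmetic behind this summary can be retraced as follows. The sketch rests on two assumptions that the text does not state explicitly: the quoted bounds of 314 and 6280 correspond to 1 and 20 lines per grep location, and percentages of the source total are truncated, not rounded, to two decimals. The summary figure then equals the size of tclEncoding.c plus the upper bound plus the size of tclIO.c.

```python
import math

TOTAL_LOC = 3214256           # whole interpreter, as stated earlier
UTF_LOCATIONS = 314           # grep hits for "Tcl_Utf"
LINES_PER_LOCATION = (1, 20)  # assumption: range behind the quoted bounds
ENCODING_LINES = 2871         # tclEncoding.c
IO_LINES = 8389               # tclIO.c


def change_bounds(locations, per_location):
    """Return the (low, high) number of lines changed for all locations."""
    low, high = per_location
    return locations * low, locations * high


def truncate(value, digits=2):
    """Cut a value off after the given number of decimals (no rounding)."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


low, high = change_bounds(UTF_LOCATIONS, LINES_PER_LOCATION)
touched = ENCODING_LINES + high + IO_LINES
print(low, high, touched, truncate(100.0 * touched / TOTAL_LOC))
```

With these inputs the sketch yields 314, 6280, 17540 and 0.54, the figures quoted in the text.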
While this subsystem is, at 8.46 % of the binary code, the third-largest part of the core after regular expressions and the engine for the execution of bytecodes, it is also a tangled web and, in our opinion at least, very difficult to unravel.
This is especially so because it is heavily influenced by the choice of whether to use encodings or not, and by whether it has to support the notifier, i.e. file events, or not.
No estimates were made for this part of the core.
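The text does not list the object files that make up the 8.46 % figure. One grouping that reproduces it, offered here only as an assumption, is the set of generic I/O object files from the size tables:

```python
TOTAL_BYTES = 486648  # total size of all object files

# Assumed composition of the I/O subsystem; sizes are from the size tables.
io_objects = {
    "tclIO.o": 25296,
    "tclIOCmd.o": 7828,
    "tclIOGT.o": 3760,
    "tclIOUtil.o": 3300,
    "tclIOSock.o": 992,
}

io_bytes = sum(io_objects.values())
io_share = 100.0 * io_bytes / TOTAL_BYTES
print(f"{io_bytes} bytes, {io_share:.2f} % of the binary")
```

Under this assumption the subsystem comes to 41176 bytes, or 8.46 % of the binary. Platform-specific files such as tclUnixChan.o are not part of this grouping.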
It was noted before that Source Navigator crashed when processing the Tcl sources. This is no longer the case with the newest version, 5.1. Our ability to determine which parts of the code have to be made conditional, or depend on more than one feature, is therefore greatly enhanced. Of course, we will have to write special scripts that mine the dependency database for the information we need. This, however, is less difficult and less error-prone than searching through the sources ourselves.
Such help is especially important for a future up-port of the modularization changes to 8.4. The internal organization of the code has changed so much that the patches we could generate by comparing an unmodified with a modularized 8.3.4 core are essentially useless. The only parts that can be carried over relatively easily are the changes that reduce the consumption of stack space.
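What such a mining script might compute can be sketched independently of the tool. The example below does not use Source Navigator's database format or any of its interfaces; it assumes that caller/callee pairs have already been exported as plain tuples. The caller names and the feature labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def features_per_function(references, feature_of):
    """Map each caller to the set of optional features it refers to.

    references: iterable of (caller, callee) pairs.
    feature_of: dict mapping a callee name to the feature that owns it.
    """
    used = defaultdict(set)
    for caller, callee in references:
        feature = feature_of.get(callee)
        if feature is not None:
            used[caller].add(feature)
    return dict(used)


def multi_feature(used):
    """Keep only the callers that depend on more than one feature."""
    return {name: feats for name, feats in used.items() if len(feats) > 1}


# Hypothetical input: callers are invented, callees are public Tcl functions.
feature_of = {"Tcl_RegExpCompile": "regexp", "Tcl_UtfToExternal": "encoding"}
references = [
    ("CallerA", "Tcl_RegExpCompile"),
    ("CallerB", "Tcl_RegExpCompile"),
    ("CallerB", "Tcl_UtfToExternal"),
    ("CallerC", "Tcl_Alloc"),
]

print(multi_feature(features_per_function(references, feature_of)))
```

Functions reported by `multi_feature` are the ones whose conditional compilation needs the most care, since they must build correctly under every combination of the features they touch.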