dwheeler.com

Estimating Linux's Size

Assembly languages vary greatly in the comment character they use, so my counter had to handle this variance. I wrote a program which first examined the file to determine if C-style ``/*'' comments and C preprocessor commands (e.g., ``#include'') were used. If both ``/*'' and ``*/'' were in the file, it was assumed that C-style comments were used, since it is unlikely that both would be used as something else (e.g., as string data) in the same assembly language file. Determining if a file used the C preprocessor was trickier, since many assembly files do use ``#'' as a comment character and some preprocessor directives are ordinary words that might be included in a human comment. The heuristic used was: if #ifdef, #endif, or #include are used, the preprocessor is used; if at least three lines have either #define or #else, then the preprocessor is used. No doubt other heuristics are possible, but this at least seemed to produce reasonable results. The program then determined what the comment character was, by identifying which punctuation mark (from a set of possible marks) was the most common non-space initial character on a line (ignoring ``/'' and ``#'' if C comments or preprocessor commands, respectively, were used). Once the comment character had been determined, and it had been determined if C-style comments were also allowed, the lines of code could be counted in the file.
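The heuristic described above can be sketched in a few lines of Python. This is only an illustration, not the counter actually used: the function names are hypothetical, the set of candidate marks in CANDIDATE_MARKS is an assumption (the text does not list the marks that were considered), and the sketch assumes preprocessor directives start a line.

```python
import re
from collections import Counter

# Hypothetical candidate set; the paper does not enumerate the marks it tried.
CANDIDATE_MARKS = ";#!|*/@"

# Directives that settle the question on a single occurrence.
STRONG = re.compile(r"^\s*#\s*(ifdef|endif|include)\b")
# Directives that are ordinary words, so at least three lines are required.
WEAK = re.compile(r"^\s*#\s*(define|else)\b")


def uses_c_comments(text):
    """C-style comments are assumed if both delimiters appear in the file."""
    return "/*" in text and "*/" in text


def uses_preprocessor(text):
    """Apply the #ifdef/#endif/#include rule, then the three-line rule."""
    lines = text.splitlines()
    if any(STRONG.match(line) for line in lines):
        return True
    return sum(1 for line in lines if WEAK.match(line)) >= 3


def guess_comment_char(text):
    """Return the most common leading punctuation mark, or None if none found."""
    skip_slash = uses_c_comments(text)
    skip_hash = uses_preprocessor(text)
    counts = Counter()
    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped or stripped[0] not in CANDIDATE_MARKS:
            continue
        first = stripped[0]
        if (first == "/" and skip_slash) or (first == "#" and skip_hash):
            continue
        counts[first] += 1
    return counts.most_common(1)[0][0] if counts else None
```

With the comment character and the C-comment flag in hand, a line counter can then discard blank lines and comment-only lines before counting.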

Although their values are not used in estimating effort, I also counted the number of files; summaries of these values are included in appendix B.

Since the Linux kernel was the largest single component, and I had questions about the various inconsistencies in the ``Halloween'' documents, I made additional measures of the Linux kernel.

Some have objected because the counting approach used here includes lines not compiled into code in this Linux distribution. However, the primary objective of these measures was to estimate total effort to develop all of these components. Even if some lines are not normally enabled on Linux, it still required effort to develop that code. Code for other architectures still has value, for example, because it enables users to port to other architectures while using the component. Even if that code is no longer being maintained (e.g., because the architecture has become less popular), nevertheless someone had to invest effort to create it, the results benefitted someone, and if it is needed again it's still there (at least for use as a starting point). Code that is only enabled by compile-time options still has value, because if the options were desired the user could enable them and recompile. Code that is only used for testing still has value, because its use improves the quality of the software directly run by users. It is possible that there is some ``dead code'' (code that cannot be run under any circumstance), but it is expected that this amount of code is very small and would not significantly affect the results. Andi Kleen (of SuSE) noted that if you wanted to only count compiled and running code, one technique (for some languages) would be to use gcc's ``-g'' option and use the resulting .stabs debugging information with some filtering (to exclude duplicated inline functions). I determined this to be out-of-scope for this paper, but this approach could be used to make additional measurements of the system.

A.4 Estimating Effort and Costs

For each build directory, I totalled the source lines of code (SLOC) for each language, then totalled those values to determine the SLOC for each directory. I then used the basic Constructive Cost Model (COCOMO) to estimate effort. The basic model is the simplest (and least accurate) model, but I simply did not have the additional information necessary to use the more complex (and more accurate) models. COCOMO is described in depth by Boehm [1981].

Basic COCOMO is designed to estimate the time from product design (after plans and requirements have been developed) through detailed design, code, unit test, and integration testing. Note that plans and requirements development are not included. COCOMO is designed to include management overhead and the creation of documentation (e.g., user manuals) as well as the code itself. Again, see Boehm [1981] for a more detailed description of the model's assumptions.

In the basic COCOMO model, estimated man-months of effort, design through test, equals 2.4*(KSLOC)^1.05, where KSLOC is the total physical SLOC divided by 1000.
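As a worked illustration of the formula, the following Python sketch computes the estimate for a single component. The function name is hypothetical, and the conversion to person-years assumes 12 person-months per person-year, which this section does not state explicitly.

```python
def basic_cocomo_person_months(physical_sloc):
    """Basic COCOMO effort, design through test, in person-months."""
    ksloc = physical_sloc / 1000.0
    return 2.4 * ksloc ** 1.05


# Example input: the SLOC reported for the "linux" build directory in B.1.
months = basic_cocomo_person_months(1526722)
years = months / 12.0  # assumption: 12 person-months per person-year
print(f"{months:.1f} person-months, about {years:.1f} person-years")
```

Because the exponent is slightly greater than 1, doubling the size of a component more than doubles its estimated effort.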

I assumed that each package was built completely independently and that there were no efforts necessary for integration not represented in the code itself. This almost certainly underestimates the true costs, but for most packages it's actually true (many packages don't interact with each other at all). I wished to underestimate (instead of overestimate) the effort and costs, and having no better model, I assumed the simplest possible integration effort. This meant that I applied the model to each component, then summed the results, as opposed to applying the model once to the grand total of all software.
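The effect of this choice can be seen in a small Python sketch. It applies the basic COCOMO formula to three build directories taken from B.1, once per component and once to their combined size; the helper name is hypothetical and the three packages are just a convenient sample.

```python
def effort_months(sloc):
    """Basic COCOMO: 2.4 * KSLOC^1.05 person-months."""
    return 2.4 * (sloc / 1000.0) ** 1.05


# SLOC values for three build directories, as listed in B.1.
packages = {"linux": 1526722, "XFree86-3.3.6": 1291745, "egcs-1.1.2": 720112}

# Approach used in this paper: estimate each component, then sum.
per_package = sum(effort_months(sloc) for sloc in packages.values())

# Alternative: treat everything as one large system.
as_one_system = effort_months(sum(packages.values()))

print(f"summed per package: {per_package:.0f} person-months")
print(f"as a single system: {as_one_system:.0f} person-months")
```

Since the exponent exceeds 1, the per-component sum is always the smaller of the two figures, which is consistent with the stated preference for underestimating.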

Note that the only input to this model is source lines of code, so some factors simply aren't captured. For example, creating some kinds of data (such as fonts) can be very time-consuming, but this isn't directly captured by this model. Some programs are intentionally designed to be data-driven, that is, they're designed as small programs which are driven by specialized data. Again, this data may be as complex to develop as code, but this is not counted.

Another example of uncaptured factors is the difficulty of writing kernel code. It's generally acknowledged that writing kernel-level code is more difficult than most other kinds of code, because this kind of code is subject to subtle timing and race conditions, hardware interactions, a small stack, and none of the normal error protections. In this paper I do not attempt to account for this. You could use the Intermediate COCOMO model to try to account for this, but again this requires knowledge of other factors that can only be guessed at. Again, the effort estimation probably significantly underestimates the actual effort represented here.

It's worth noting that there is an update to COCOMO, COCOMO II. However, COCOMO II requires as its input logical (not physical) SLOC, and since this measure is much harder to obtain, I did not pursue it for this paper. More information about COCOMO II is available at the web site http://sunset.usc.edu/research/COCOMOII/index.html. A nice overview paper where you can learn more about software metrics is Masse [1997].

I assumed that an average U.S. programmer/analyst salary in the year 2000 was $56,286 per year; this value was from ComputerWorld's September 4, 2000 Salary Survey. Overhead is much harder to estimate; I did not find a definitive source for information on overheads. After informal discussions with several cost analysts, I determined that an overhead of 2.4 would be representative of the overhead sustained by a typical software development company. Should you disagree with these figures, I've provided all the information necessary to recalculate your own cost figures; just start with the effort estimates and recalculate cost yourself.
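For readers who want to redo the cost calculation with their own figures, a minimal Python sketch follows. The function name is hypothetical; the defaults are the salary and overhead given above, and the person-years figure in the example is the total reported in the B.1 summary. Because that total is rounded, the result may differ slightly from a figure computed from unrounded effort.

```python
def development_cost(person_years, annual_salary=56286, overhead=2.4):
    """Estimated cost in dollars: effort x salary x overhead multiplier."""
    return person_years * annual_salary * overhead


# Total estimated person-years of development, from the B.1 summary.
cost = development_cost(4548.36)
print(f"${cost:,.2f}")

# Recalculating with different assumptions only requires new arguments.
alternative = development_cost(4548.36, annual_salary=60000, overhead=2.0)
```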

Appendix B. More Detailed Results

This appendix provides some more detailed results. B.1 lists the SLOC found in each build directory; B.2 shows counts of files for each category of file; B.3 presents some additional measures about the Linux kernel. B.4 presents some SLOC totals of putatively ``minimal'' systems. You can learn more at http://www.dwheeler.com/sloc.

B.1 SLOC in Build Directories

The following is a list of all build directories, and the source lines of code (SLOC) found in them, followed by a few statistics counting files (instead of SLOC).

Remember that duplicate files are only counted once, with the build directory ``first in ASCII sort order'' receiving any duplicates (to break ties). As a result, some build directories have a smaller number than might at first make sense. For example, the ``kudzu'' build directory does contain code, but all of it is also contained in the ``Xconfigurator'' build directory, and since that directory sorts first, the kudzu package is considered to have ``no code''.
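The tie-breaking rule can be illustrated with a short Python sketch. It is not the tool used for this paper: the function name and the in-memory input format are hypothetical, and identifying duplicates by a SHA-256 digest of the file contents is an assumption made for the example.

```python
import hashlib


def assign_duplicates(files_by_dir):
    """Credit each distinct file content to the first directory in ASCII order.

    files_by_dir maps a build directory name to {file name: file bytes}.
    Returns a dict mapping each directory to the file names it keeps.
    """
    owner = {}
    # Python compares str by code point, so uppercase sorts before
    # lowercase, which matches ASCII order for these names.
    for directory in sorted(files_by_dir):
        for name, content in files_by_dir[directory].items():
            digest = hashlib.sha256(content).hexdigest()
            owner.setdefault(digest, (directory, name))  # first claim wins
    kept = {directory: [] for directory in files_by_dir}
    for directory, name in owner.values():
        kept[directory].append(name)
    return kept


example = {
    "kudzu-0.36": {"pci.c": b"int probe;\n"},
    "Xconfigurator-4.3.5": {"pci.c": b"int probe;\n", "main.c": b"int main;\n"},
}
kept = assign_duplicates(example)
```

In this toy input the only file under ``kudzu-0.36'' also appears under ``Xconfigurator-4.3.5'', which sorts first, so the kudzu directory keeps nothing.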

The columns are SLOC (total physical source lines of code), Directory (the name of the build directory, usually the same or similar to the package name), and SLOC-by-Language (Sorted). This last column lists languages by name and the number of SLOC in that language; zeros are not shown, and the list is sorted from largest to smallest in that build directory. Similarly, the directories are sorted from largest to smallest total SLOC.
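Rows in this three-column layout can be read mechanically. The Python sketch below is one hedged way to do so: the function name is hypothetical, and it assumes that a line beginning with whitespace continues the language list of the previous row and that a directory with no code is shown as ``(none)''. The header line should be removed before parsing.

```python
def parse_rows(text):
    """Parse 'SLOC Directory lang=n,...' rows into (total, directory, dict)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: more language counts for the previous row.
            rows[-1][2] += line.strip()
            continue
        total, directory, langs = line.split(None, 2)
        rows.append([int(total), directory, langs.strip()])

    parsed = []
    for total, directory, langs in rows:
        by_language = {}
        if langs != "(none)":
            for item in langs.split(","):
                if item:
                    name, count = item.split("=")
                    by_language[name] = int(count)
        parsed.append((total, directory, by_language))
    return parsed


sample = """\
1526722 linux           ansic=1462165,asm=59574,sh=2860,perl=950,tcl=414,
                        yacc=324,lex=230,awk=133,sed=72
235702  gimp-1.0.4      ansic=225211,lisp=8497,sh=1994
0       kudzu-0.36      (none)
"""

for total, directory, by_language in parse_rows(sample):
    # Each row's per-language counts should add up to its SLOC total.
    assert sum(by_language.values()) == total, directory
```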

SLOC    Directory       SLOC-by-Language (Sorted)
1526722 linux           ansic=1462165,asm=59574,sh=2860,perl=950,tcl=414,
                        yacc=324,lex=230,awk=133,sed=72
1291745 XFree86-3.3.6   ansic=1246420,asm=14913,sh=13433,tcl=8362,cpp=4358,
                        yacc=2710,perl=711,awk=393,lex=383,sed=57,csh=5
720112  egcs-1.1.2      ansic=598682,cpp=75206,sh=14307,asm=11462,yacc=7988,
                        lisp=7252,exp=2887,fortran=1515,objc=482,sed=313,perl=18
652087  gdb-19991004    ansic=587542,exp=37737,sh=9630,cpp=6735,asm=4139,
                        yacc=4117,lisp=1820,sed=220,awk=142,fortran=5
625073  emacs-20.5      lisp=453647,ansic=169624,perl=884,sh=652,asm=253,
                        csh=9,sed=4
467120  binutils-2.9.5.0.22 ansic=407352,asm=27575,exp=12265,sh=7398,yacc=5606,
                        cpp=4454,lex=1479,sed=557,lisp=394,awk=24,perl=16
415026  glibc-2.1.3     ansic=378753,asm=30644,sh=2520,cpp=1704,awk=910,
                        perl=464,sed=16,csh=15
327021  tcltk-8.0.5     ansic=240093,tcl=71947,sh=8531,exp=5150,yacc=762,
                        awk=273,perl=265
247026  postgresql-6.5.3 ansic=207735,yacc=10718,java=8835,tcl=7709,sh=7399,
                        lex=1642,perl=1206,python=959,cpp=746,asm=70,csh=5,sed=2
235702  gimp-1.0.4      ansic=225211,lisp=8497,sh=1994
231072  Mesa            ansic=195796,cpp=17717,asm=13467,sh=4092
222220  krb5-1.1.1      ansic=192822,exp=19364,sh=4829,yacc=2476,perl=1528,
                        awk=393,python=348,lex=190,csh=147,sed=123
206237  perl5.005_03    perl=94712,ansic=89366,sh=15654,lisp=5584,yacc=921
205082  qt-2.1.0-beta1  cpp=180866,ansic=20513,yacc=2284,sh=538,lex=464,
                        perl=417
200628  Python-1.5.2    python=100935,ansic=96323,lisp=2353,sh=673,perl=342,
                        sed=2
199982  gs5.50          ansic=195491,cpp=2266,asm=968,sh=751,lisp=405,perl=101
193916  teTeX-1.0       ansic=166041,sh=10263,cpp=9407,perl=3795,pascal=1546,
                        yacc=1507,awk=522,lex=323,sed=297,asm=139,csh=47,lisp=29
155035  bind-8.2.2_P5   ansic=131946,sh=10068,perl=7607,yacc=2231,cpp=1360,
                        csh=848,awk=753,lex=222
140130  AfterStep-APPS-20000124 ansic=135806,sh=3340,cpp=741,perl=243
138931  kdebase         cpp=113971,ansic=23016,perl=1326,sh=618
138118  gtk+-1.2.6      ansic=137006,perl=479,sh=352,awk=274,lisp=7
138024  gated-3-5-11    ansic=126846,yacc=7799,sh=1554,lex=877,awk=666,csh=235,
                        sed=35,lisp=12
133193  kaffe-1.0.5     java=65275,ansic=62125,cpp=3923,perl=972,sh=814,
                        asm=84
131372  jade-1.2.1      cpp=120611,ansic=8228,sh=2150,perl=378,sed=5
128672  gnome-libs-1.0.55 ansic=125373,sh=2178,perl=667,awk=277,lisp=177
127536  pine4.21        ansic=126678,sh=766,csh=62,perl=30
121878  ImageMagick-4.2.9 ansic=99383,sh=11143,cpp=8870,perl=2024,tcl=458
119613  lynx2-8-3       ansic=117385,sh=1860,perl=340,csh=28
116951  mc-4.5.42       ansic=114406,sh=1996,perl=345,awk=148,csh=56
116615  gnumeric-0.48   ansic=115592,yacc=600,lisp=191,sh=142,perl=67,python=23
113272  xlispstat-3-52-17 ansic=91484,lisp=21769,sh=18,csh=1
113241  vim-5.6         ansic=111724,awk=683,sh=469,perl=359,csh=6
109824  php-3.0.15      ansic=105901,yacc=1887,sh=1381,perl=537,awk=90,cpp=28
104032  linuxconf-1.17r2 cpp=93139,perl=4570,sh=2984,java=2741,ansic=598
102674  libgr-2.0.13    ansic=99647,sh=2438,csh=589
100951  lam-6.3.1       ansic=86177,cpp=10569,sh=3677,perl=322,fortran=187,
                        csh=19
99066   krb4-1.0        ansic=84077,asm=5163,cpp=3775,perl=2508,sh=1765,
                        yacc=1509,lex=236,awk=33
94637   xlockmore-4.15  ansic=89816,cpp=1987,tcl=1541,sh=859,java=285,perl=149
93940   kdenetwork      cpp=80075,ansic=7422,perl=6260,sh=134,tcl=49
92964   samba-2.0.6     ansic=88308,sh=3557,perl=831,awk=158,csh=110
91213   anaconda-6.2.2  ansic=74303,python=13657,sh=1583,yacc=810,lex=732,
                        perl=128
89959   xscreensaver-3.23 ansic=88488,perl=1070,sh=401
88128   cvs-1.10.7      ansic=68303,sh=17909,perl=902,yacc=826,csh=181,lisp=7
87940   isdn4k-utils    ansic=78752,perl=3369,sh=3089,cpp=2708,tcl=22
85383   xpdf-0.90       cpp=60427,ansic=21400,sh=3556
81719   inn-2.2.2       ansic=62403,perl=10485,sh=5465,awk=1567,yacc=1547,
                        lex=249,tcl=3
80343   kdelibs         cpp=71217,perl=5075,ansic=3660,yacc=240,lex=116,
                        sh=35
79997   WindowMaker-0.61.1 ansic=77924,sh=1483,perl=371,lisp=219
78787   extace-1.2.15   ansic=66571,sh=9322,perl=2894
77873   apache_1.3.12   ansic=69191,sh=6781,perl=1846,cpp=55
75257   xpilot-4.1.0    ansic=68669,tcl=3479,cpp=1896,sh=1145,perl=68
73817   w3c-libwww-5.2.8 ansic=64754,sh=4678,cpp=3181,perl=1204
72726   ucd-snmp-4.1.1  ansic=64411,perl=5558,sh=2757
72425   gnome-core-1.0.55 ansic=72230,perl=141,sh=54
71810   jikes           cpp=71452,java=358
70260   groff-1.15      cpp=59453,ansic=5276,yacc=2957,asm=1866,perl=397,
                        sh=265,sed=46
69265   fvwm-2.2.4      ansic=63496,cpp=2463,perl=1835,sh=723,yacc=596,lex=152
69246   linux-86        ansic=63328,asm=5276,sh=642
68997   blt2.4g         ansic=58630,tcl=10215,sh=152
68884   squid-2.3.STABLE1 ansic=66305,sh=1570,perl=1009
68560   bash-2.03       ansic=56758,sh=7264,yacc=2808,perl=1730
68453   kdegraphics     cpp=34208,ansic=29347,sh=4898
65722   xntp3-5.93      ansic=60190,perl=3633,sh=1445,awk=417,asm=37
62922   ppp-2.3.11      ansic=61756,sh=996,exp=82,perl=44,csh=44
62137   sgml-tools-1.0.9 cpp=38543,ansic=19185,perl=2866,lex=560,sh=532,
                        lisp=309,awk=142
61688   imap-4.7        ansic=61628,sh=60
61324   ncurses-5.0     ansic=45856,ada=8217,cpp=3720,sh=2822,awk=506,perl=103,
                        sed=100
60429   kdesupport      ansic=42421,cpp=17810,sh=173,awk=13,csh=12
60302   openldap-1.2.9  ansic=58078,sh=1393,perl=630,python=201
57217   xfig.3.2.3-beta-1 ansic=57212,csh=5
56093   lsof_4.47       ansic=50268,sh=4753,perl=856,awk=214,asm=2
55667   uucp-1.06.1     ansic=52078,sh=3400,perl=189
54935   gnupg-1.0.1     ansic=48884,asm=4586,sh=1465
54603   glade-0.5.5     ansic=49545,sh=5058
54431   svgalib-1.4.1   ansic=53725,asm=630,perl=54,sh=22
53141   AfterStep-1.8.0 ansic=50898,perl=1168,sh=842,cpp=233
52808   kdeutils        cpp=41365,ansic=9693,sh=1434,awk=311,sed=5
52574   nmh-1.0.3       ansic=50698,sh=1785,awk=74,sed=17
51813   freetype-1.3.1  ansic=48929,sh=2467,cpp=351,csh=53,perl=13
51592   enlightenment-0.15.5 ansic=51569,sh=23
50970   cdrecord-1.8    ansic=48595,sh=2177,perl=194,sed=4
49370   tin-1.4.2       ansic=47763,sh=908,yacc=699
49325   imlib-1.9.7     ansic=49260,sh=65
48223   kdemultimedia   ansic=24248,cpp=22275,tcl=1004,sh=621,perl=73,awk=2
47067   bash-1.14.7     ansic=41654,sh=3140,yacc=2197,asm=48,awk=28
46312   tcsh-6.09.00    ansic=43544,sh=921,lisp=669,perl=593,csh=585
46159   unzip-5.40      ansic=40977,cpp=3778,asm=1271,sh=133
45811   mutt-1.0.1      ansic=45574,sh=237
45589   am-utils-6.0.3  ansic=33389,sh=8950,perl=2421,lex=454,yacc=375
45485   guile-1.3       ansic=38823,lisp=4626,asm=1514,sh=310,awk=162,csh=50
45378   gnuplot-3.7.1   ansic=43276,lisp=661,asm=539,objc=387,csh=297,perl=138,
                        sh=80
44323   mgetty-1.1.21   ansic=33757,perl=5889,sh=3638,tcl=756,lisp=283
42880   sendmail-8.9.3  ansic=40364,perl=1737,sh=779
42746   elm2.5.3        ansic=32931,sh=9774,awk=41
41388   p2c-1.22        ansic=38788,pascal=2499,perl=101
41205   gnome-games-1.0.51 ansic=31191,lisp=6966,cpp=3048
39861   rpm-3.0.4       ansic=36994,sh=1505,perl=1355,python=7
39160   util-linux-2.10f ansic=38627,sh=351,perl=65,csh=62,sed=55
38927   xmms-1.0.1      ansic=38366,asm=398,sh=163
38548   ORBit-0.5.0     ansic=35656,yacc=1750,sh=776,lex=366
38453   zsh-3.0.7       ansic=36208,sh=1763,perl=331,awk=145,sed=6
37515   ircii-4.4       ansic=36647,sh=852,lex=16
37360   tiff-v3.5.4     ansic=32734,sh=4054,cpp=572
36338   textutils-2.0a  ansic=18949,sh=16111,perl=1218,sed=60
36243   exmh-2.1.1      tcl=35844,perl=316,sh=49,exp=34
36239   x11amp-0.9-alpha3 ansic=31686,sh=4200,asm=353
35812   xloadimage.4.1  ansic=35705,sh=107
35554   zip-2.3         ansic=32108,asm=3446
35397   gtk-engines-0.10 ansic=20636,sh=14761
35136   php-2.0.1       ansic=33991,sh=1056,awk=89
34882   pmake           ansic=34599,sh=184,awk=58,sed=41
34772   xpuzzles-5.4.1  ansic=34772
34768   fileutils-4.0p  ansic=31324,sh=2042,yacc=841,perl=561
33203   strace-4.2      ansic=30891,sh=1988,perl=280,lisp=44
32767   trn-3.6         ansic=25264,sh=6843,yacc=660
32277   pilot-link.0.9.3 ansic=26513,java=2162,cpp=1689,perl=971,yacc=660,
                        python=268,tcl=14
31994   korganizer      cpp=23402,ansic=5884,yacc=2271,perl=375,lex=61,sh=1
31174   ncftp-3.0beta21 ansic=30347,cpp=595,sh=232
30438   gnome-pim-1.0.55 ansic=28665,yacc=1773
30122   scheme-3.2      lisp=19483,ansic=10515,sh=124
30061   tcpdump-3.4     ansic=29208,yacc=236,sh=211,lex=206,awk=184,csh=16
29730   screen-3.9.5    ansic=28156,sh=1574
29315   jed             ansic=29315
29091   xchat-1.4.0     ansic=28894,perl=121,python=53,sh=23
28897   ncpfs-2.2.0.17  ansic=28689,sh=182,tcl=26
28449   slrn-0.9.6.2    ansic=28438,sh=11
28261   xfishtank-2.1tp ansic=28261
28186   texinfo-4.0     ansic=26404,sh=841,awk=451,perl=256,lisp=213,sed=21
28169   e2fsprogs-1.18  ansic=27250,awk=437,sh=339,sed=121,perl=22
28118   slang           ansic=28118
27860   kdegames        cpp=27507,ansic=340,sh=13
27117   librep-0.10     ansic=19381,lisp=5385,sh=2351
27040   mikmod-3.1.6    ansic=26975,sh=55,awk=10
27022   x3270-3.1.1     ansic=26456,sh=478,exp=88
26673   lout-3.17       ansic=26673
26608   Xaw3d-1.3       ansic=26235,yacc=247,lex=126
26363   gawk-3.0.4      ansic=19871,awk=2519,yacc=2046,sh=1927
26146   libxml-1.8.6    ansic=26069,sh=77
25994   xrn-9.02        ansic=24686,yacc=888,sh=249,lex=92,perl=35,awk=31,
                        csh=13
25915   gv-3.5.8        ansic=25821,sh=94
25479   xpaint          ansic=25456,sh=23
25236   shadow-19990827 ansic=23464,sh=883,yacc=856,perl=33
24910   kdeadmin        cpp=19919,sh=3936,perl=1055
24773   pdksh-5.2.14    ansic=23599,perl=945,sh=189,sed=40
24583   gmp-2.0.2       ansic=17888,asm=5252,sh=1443
24387   mars_nwe        ansic=24158,sh=229
24270   gnome-python-1.0.51 python=14331,ansic=9791,sh=148
23838   kterm-6.2.0     ansic=23838
23666   enscript-1.6.1  ansic=22365,lex=429,perl=308,sh=291,yacc=164,lisp=109
22373   sawmill-0.24    ansic=11038,lisp=8172,sh=3163
22279   make-3.78.1     ansic=19287,sh=2029,perl=963
22011   libpng-1.0.5    ansic=22011
21593   xboard-4.0.5    ansic=20640,lex=904,sh=41,csh=5,sed=3
21010   netkit-telnet-0.16 ansic=14796,cpp=6214
20433   pam-0.72        ansic=18936,yacc=634,sh=482,perl=321,lex=60
20125   ical-2.2        cpp=12651,tcl=6763,sh=624,perl=60,ansic=27
20078   gd1.3           ansic=19946,perl=132
19971   wu-ftpd-2.6.0   ansic=17572,yacc=1774,sh=421,perl=204
19500   gnome-utils-1.0.50 ansic=18099,yacc=824,lisp=577
19065   joe             ansic=18841,asm=224
18885   X11R6-contrib-3.3.2 ansic=18616,lex=161,yacc=97,sh=11
18835   glib-1.2.6      ansic=18702,sh=133
18151   git-4.3.19      ansic=16166,sh=1985
18020   xboing          ansic=18006,sh=14
17939   sh-utils-2.0    ansic=13366,sh=3027,yacc=871,perl=675
17765   mtools-3.9.6    ansic=16155,sh=1602,sed=8
17750   gettext-0.10.35 ansic=13414,lisp=2030,sh=1983,yacc=261,perl=53,sed=9
17682   bc-1.05         ansic=9186,sh=7236,yacc=967,lex=293
17271   fetchmail-5.3.1 ansic=13441,python=1490,sh=1246,yacc=411,perl=321,
                        lex=238,awk=124
17259   sox-12.16       ansic=16659,sh=600
16785   control-center-1.0.51 ansic=16659,sh=126
16266   dhcp-2.0        ansic=15328,sh=938
15967   SVGATextMode-1.9-src ansic=15079,yacc=340,sh=294,lex=227,sed=15,
                        asm=12
15868   kpilot-3.1b9    cpp=8613,ansic=5640,yacc=1615
15851   taper-6.9a      ansic=15851
15819   mpg123-0.59r    ansic=14900,asm=919
15691   transfig.3.2.1  ansic=15643,sh=38,csh=10
15638   mod_perl-1.21   perl=10278,ansic=5124,sh=236
15522   console-tools-0.3.3 ansic=13335,yacc=986,sh=800,lex=291,perl=110
15456   rpm2html-1.2    ansic=15334,perl=122
15143   gnotepad+-1.1.4 ansic=15143
15108   GXedit1.23      ansic=15019,sh=89
15087   mm2.7           ansic=8044,csh=6924,sh=119
14941   readline-2.2.1  ansic=11375,sh=1890,perl=1676
14912   ispell-3.1      ansic=8380,lisp=3372,yacc=1712,cpp=585,objc=385,
                        csh=221,sh=157,perl=85,sed=15
14871   gnuchess-4.0.pl80 ansic=14584,sh=258,csh=29
14774   flex-2.5.4      ansic=13011,lex=1045,yacc=605,awk=72,sh=29,sed=12
14587   multimedia      ansic=14577,sh=10
14516   libgtop-1.0.6   ansic=13768,perl=653,sh=64,asm=31
14427   mawk-1.2.2      ansic=12714,yacc=994,awk=629,sh=90
14363   automake-1.4    perl=10622,sh=3337,ansic=404
14350   rsync-2.4.1     ansic=13986,perl=179,sh=126,awk=59
14299   nfs-utils-0.1.6 ansic=14107,sh=165,perl=27
14269   rcs-5.7         ansic=12209,sh=2060
14255   tar-1.13.17     ansic=13014,lisp=592,sh=538,perl=111
14105   wmakerconf-2.1  ansic=13620,perl=348,sh=137
14039   less-346        ansic=14032,awk=7
13779   rxvt-2.6.1      ansic=13779
13586   wget-1.5.3      ansic=13509,perl=54,sh=23
13504   rp3-1.0.7       cpp=10416,ansic=2957,sh=131
13241   iproute2        ansic=12139,sh=1002,perl=100
13100   silo-0.9.8      ansic=10485,asm=2615
12657   macutils        ansic=12657
12639   libungif-4.1.0  ansic=12381,sh=204,perl=54
12633   minicom-1.83.0  ansic=12503,sh=130
12593   audiofile-0.1.9 sh=6440,ansic=6153
12463   gnome-objc-1.0.2 objc=12365,sh=86,ansic=12
12313   jpeg-6a         ansic=12313
12124   ypserv-1.3.9    ansic=11622,sh=460,perl=42
11790   lrzsz-0.12.20   ansic=9512,sh=1263,exp=1015
11775   modutils-2.3.9  ansic=9309,sh=1620,lex=484,yacc=362
11721   enlightenment-conf-0.15 ansic=6232,sh=5489
11633   net-tools-1.54  ansic=11531,sh=102
11404   findutils-4.1   ansic=11160,sh=173,exp=71
11299   xmorph-1999dec12 ansic=10783,tcl=516
10958   kpackage-1.3.10 cpp=8863,sh=1852,ansic=124,perl=119
10914   diffutils-2.7   ansic=10914
10404   gnorpm-0.9      ansic=10404
10271   gqview-0.7.0    ansic=10271
10267   libPropList-0.9.1 sh=5974,ansic=3982,lex=172,yacc=139
10187   dump-0.4b15     ansic=9422,sh=760,sed=5
10088   piranha         ansic=10048,sh=40
10013   grep-2.4        ansic=9852,sh=103,awk=49,sed=9
9961    procps-2.0.6    ansic=9959,sh=2
9942    xpat2-1.04      ansic=9942
9927    procmail-3.14   ansic=8090,sh=1837
9873    nss_ldap-105    ansic=9784,perl=89
9801    man-1.5h1       ansic=7377,sh=1802,perl=317,awk=305
9741    Xconfigurator-4.3.5 ansic=9578,perl=125,sh=32,python=6
9731    ld.so-1.9.5     ansic=6960,asm=2401,sh=370
9725    gpm-1.18.1      ansic=8107,yacc=1108,lisp=221,sh=209,awk=74,sed=6
9699    bison-1.28      ansic=9650,sh=49
9666    ash-linux-0.2   ansic=9445,sh=221
9607    cproto-4.6      ansic=7600,lex=985,yacc=761,sh=261
9551    pwdb-0.61       ansic=9488,sh=63
9465    rdist-6.1.5     ansic=8306,sh=553,yacc=489,perl=117
9263    ctags-3.4       ansic=9240,sh=23
9138    gftp-2.0.6a     ansic=9138
8939    mkisofs-1.12b5  ansic=8939
8766    pxe-linux       cpp=4463,ansic=3622,asm=681
8572    psgml-1.2.1     lisp=8572
8540    xxgdb-1.12      ansic=8540
8491    gtop-1.0.5      ansic=8151,cpp=340
8356    gedit-0.6.1     ansic=8225,sh=131
8303    dip-3.3.7o      ansic=8207,sh=96
7859    libglade-0.11   ansic=5898,sh=1809,python=152
7826    xpm-3.4k        ansic=7750,sh=39,cpp=37
7740    sed-3.02        ansic=7301,sed=359,sh=80
7617    cpio-2.4.2      ansic=7598,sh=19
7615    esound-0.2.17   ansic=7387,sh=142,csh=86
7570    sharutils-4.2.1 ansic=5511,perl=1741,sh=318
7427    ed-0.2          ansic=7263,sh=164
7255    lilo            ansic=3522,asm=2557,sh=740,perl=433,cpp=3
7227    cdparanoia-III-alpha9.6 ansic=6006,sh=1221
7095    xgammon-0.98    ansic=6506,lex=589
7041    newt-0.50.8     ansic=6526,python=515
7030    ee-0.3.11       ansic=7007,sh=23
6976    aboot-0.5       ansic=6680,asm=296
6968    mailx-8.1.1     ansic=6963,sh=5
6877    lpr             ansic=6842,sh=35
6827    gnome-media-1.0.51 ansic=6827
6646    iputils         ansic=6646
6611    patch-2.5       ansic=6561,sed=50
6592    xosview-1.7.1   cpp=6205,ansic=367,awk=20
6550    byacc-1.9       ansic=5520,yacc=1030
6496    pidentd-3.0.10  ansic=6475,sh=21
6391    m4-1.4          ansic=5993,lisp=243,sh=155
6306    gzip-1.2.4a     ansic=5813,asm=458,sh=24,perl=11
6234    awesfx-0.4.3a   ansic=6234
6172    sash-3.4        ansic=6172
6116    lslk            ansic=5325,sh=791
6090    joystick-1.2.15 ansic=6086,sh=4
6072    kdoc            perl=6010,sh=45,cpp=17
6043    irda-utils-0.9.10 ansic=5697,sh=263,perl=83
6033    sysvinit-2.78   ansic=5256,sh=777
6025    pnm2ppa         ansic=5708,sh=317
6021    rpmfind-1.4     ansic=6021
5981    indent-2.2.5    ansic=5958,sh=23
5975    ytalk-3.1       ansic=5975
5960    isapnptools-1.21 ansic=4394,yacc=1383,perl=123,sh=60
5744    gdm-2.0beta2    ansic=5632,sh=112
5594    isdn-config     cpp=3058,sh=2228,perl=308
5526    efax-0.9        ansic=4570,sh=956
5383    acct-6.3.2      ansic=5016,cpp=287,sh=80
5115    libtool-1.3.4   sh=3374,ansic=1741
5111    netkit-ftp-0.16 ansic=5111
4996    bzip2-0.9.5d    ansic=4996
4895    xcpustate-2.5   ansic=4895
4792    libelf-0.6.4    ansic=3310,sh=1482
4780    make-3.78.1_pvm-0.5 ansic=4780
4542    gpgp-0.4        ansic=4441,sh=101
4430    gperf-2.7       cpp=2947,exp=745,ansic=695,sh=43
4367    aumix-1.30.1    ansic=4095,sh=179,sed=93
4087    zlib-1.1.3      ansic=2815,asm=712,cpp=560
4038    sysklogd-1.3-31 ansic=3741,perl=158,sh=139
4024    rep-gtk-0.8     ansic=2905,lisp=971,sh=148
3962    netkit-timed-0.16 ansic=3962
3929    initscripts-5.00 sh=2035,ansic=1866,csh=28
3896    ltrace-0.3.10   ansic=2986,sh=854,awk=56
3885    phhttpd-0.1.0   ansic=3859,sh=26
3860    xdaliclock-2.18 ansic=3837,sh=23
3855    pciutils-2.1.5  ansic=3800,sh=55
3804    quota-2.00-pre3 ansic=3795,sh=9
3675    dosfstools-2.2  ansic=3675
3654    tcp_wrappers_7.6 ansic=3654
3651    ipchains-1.3.9  ansic=2767,sh=884
3625    autofs-3.1.4    ansic=2862,sh=763
3588    netkit-rsh-0.16 ansic=3588
3438    yp-tools-2.4    ansic=3415,sh=23
3433    dialog-0.6      ansic=2834,perl=349,sh=250
3415    ext2ed-0.1      ansic=3415
3315    gdbm-1.8.0      ansic=3290,cpp=25
3245    ypbind-3.3      ansic=1793,sh=1452
3219    playmidi-2.4    ansic=3217,sed=2
3096    xtrojka123      ansic=3087,sh=9
3084    at-3.1.7        ansic=1442,sh=1196,yacc=362,lex=84
3051    dhcpcd-1.3.18-pl3 ansic=2771,sh=280
3012    apmd            ansic=2617,sh=395
2883    netkit-base-0.16 ansic=2883
2879    vixie-cron-3.0.1 ansic=2866,sh=13
2835    gkermit-1.0     ansic=2835
2810    kdetoys         cpp=2618,ansic=192
2791    xjewel-1.6      ansic=2791
2773    mpage-2.4       ansic=2704,sh=69
2758    autoconf-2.13   sh=2226,perl=283,exp=167,ansic=82
2705    autorun-2.61    sh=1985,cpp=720
2661    cdp-0.33        ansic=2661
2647    file-3.28       ansic=2601,perl=46
2645    libghttp-1.0.4  ansic=2645
2631    getty_ps-2.0.7j ansic=2631
2597    pythonlib-1.23  python=2597
2580    magicdev-0.2.7  ansic=2580
2531    gnome-kerberos-0.2 ansic=2531
2490    sndconfig-0.43  ansic=2490
2486    bug-buddy-0.7   ansic=2486
2459    usermode-1.20   ansic=2459
2455    fnlib-0.4       ansic=2432,sh=23
2447    sliplogin-2.1.1 ansic=2256,sh=143,perl=48
2424    raidtools-0.90  ansic=2418,sh=6
2423    netkit-routed-0.16 ansic=2423
2407    nc              ansic=1670,sh=737
2324    up2date-1.13    python=2324
2270    memprof-0.3.0   ansic=2270
2268    which-2.9       ansic=1398,sh=870
2200    printtool       tcl=2200
2163    gnome-linuxconf-0.25 ansic=2163
2141    unarj-2.43      ansic=2141
2065    units-1.55      ansic=1963,perl=102
2048    netkit-ntalk-0.16 ansic=2048
1987    cracklib,2.7    ansic=1919,perl=46,sh=22
1984    cleanfeed-0.95.7b perl=1984
1977    wmconfig-0.9.8  ansic=1941,sh=36
1941    isicom          ansic=1898,sh=43
1883    slocate-2.1     ansic=1802,sh=81
1857    netkit-rusers-0.16 ansic=1857
1856    pump-0.7.8      ansic=1856
1842    cdecl-2.5       ansic=1002,yacc=765,lex=75
1765    fbset-2.1       ansic=1401,yacc=130,lex=121,perl=113
1653    adjtimex-1.9    ansic=1653
1634    netcfg-2.25     python=1632,sh=2
1630    psmisc          ansic=1624,sh=6
1621    urlview-0.7     ansic=1515,sh=106
1604    fortune-mod-9708 ansic=1604
1531    netkit-tftp-0.16 ansic=1531
1525    logrotate-3.3.2 ansic=1524,sh=1
1473    traceroute-1.4a5 ansic=1436,awk=37
1452    time-1.7        ansic=1395,sh=57
1435    ncompress-4.2.4 ansic=1435
1361    mt-st-0.5b      ansic=1361
1290    cxhextris       ansic=1290
1280    pam_krb5-1      ansic=1280
1272    bsd-finger-0.16 ansic=1272
1229    hdparm-3.6      ansic=1229
1226    procinfo-17     ansic=1145,perl=81
1194    passwd-0.64.1   ansic=1194
1182    auth_ldap-1.4.0 ansic=1182
1146    prtconf-1.3     ansic=1146
1143    anacron-2.1     ansic=1143
1129    xbill-2.0       cpp=1129
1099    popt-1.4        ansic=1039,sh=60
1088    nag             perl=1088
1076    stylesheets-0.13rh perl=888,sh=188
1075    authconfig-3.0.3 ansic=1075
1049    kpppload-1.04   cpp=1044,sh=5
1020    MAKEDEV-2.5.2   sh=1020
1013    trojka          ansic=1013
987     xmailbox-2.5    ansic=987
967     netkit-rwho-0.16 ansic=967
953     switchdesk-2.1  ansic=314,perl=287,cpp=233,sh=119
897     portmap_4       ansic=897
874     ldconfig-1999-02-21 ansic=874
844     jpeg-6b         sh=844
834     ElectricFence-2.1 ansic=834
830     mouseconfig-4.4 ansic=830
816     rpmlint-0.8     python=813,sh=3
809     kdpms-0.2.8     cpp=809
797     termcap-2.0.8   ansic=797
787     xsysinfo-1.7    ansic=787
770     giftrans-1.12.2 ansic=770
742     setserial-2.15  ansic=742
728     tree-1.2        ansic=728
717     chkconfig-1.1.2 ansic=717
682     lpg             perl=682
657     eject-2.0.2     ansic=657
616     diffstat-1.27   ansic=616
592     netscape-4.72   sh=592
585     usernet-1.0.9   ansic=585
549     genromfs-0.3    ansic=549
548     tksysv-1.1      tcl=526,sh=22
537     minlabel-1.2    ansic=537
506     netkit-bootparamd-0.16 ansic=506
497     locale_config-0.2 ansic=497
491     helptool-2.4    perl=288,tcl=203
480     elftoaout-2.2   ansic=480
463     tmpwatch-2.2    ansic=311,sh=152
445     rhs-printfilters-1.63 sh=443,ansic=2
441     audioctl        ansic=441
404     control-panel-3.13 ansic=319,tcl=85
368     kbdconfig-1.9.2.4 ansic=368
368     vlock-1.3       ansic=368
367     timetool-2.7.3  tcl=367
347     kernelcfg-0.5   python=341,sh=6
346     timeconfig-3.0.3 ansic=318,sh=28
343     mingetty-0.9.4  ansic=343
343     chkfontpath-1.7 ansic=343
332     ethtool-1.0     ansic=332
314     mkbootdisk-1.2.5 sh=314
302     symlinks-1.2    ansic=302
301     xsri-1.0        ansic=301
294     netkit-rwall-0.16 ansic=294
290     biff+comsat-0.16 ansic=290
288     mkinitrd-2.4.1  sh=288
280     stat-1.5        ansic=280
265     sysreport-1.0   sh=265
261     bdflush-1.5     ansic=202,asm=59
255     ipvsadm-1.1     ansic=255
255     sag-0.6-html    perl=255
245     man-pages-1.28  sh=244,sed=1
240     open-1.4        ansic=240
236     xtoolwait-1.2   ansic=236
222     utempter-0.5.2  ansic=222
222     mkkickstart-2.1 sh=222
221     hellas          sh=179,perl=42
213     rhmask          ansic=213
159     quickstrip-1.1  ansic=159
132     rdate-1.0       ansic=132
131     statserial-1.1  ansic=121,sh=10
107     fwhois-1.00     ansic=107
85      mktemp-1.5      ansic=85
82      modemtool-1.21  python=73,sh=9
67      setup-1.2       ansic=67
56      shaper          ansic=56
52      sparc32-1.1     ansic=52
47      intimed-1.10    ansic=47
23      locale-ja-9     sh=23
16      AnotherLevel-1.0.1 sh=16
11      words-2         sh=11
7       trXFree86-2.1.2 tcl=7
0       install-guide-3.2.html (none)
0       caching-nameserver-6.2 (none)
0       XFree86-ISO8859-2-1.0 (none)
0       rootfiles       (none)
0       ghostscript-fonts-5.50 (none)
0       kudzu-0.36      (none)
0       wvdial-1.41     (none)
0       mailcap-2.0.6   (none)
0       desktop-backgrounds-1.1 (none)
0       redhat-logos    (none)
0       solemul-1.1     (none)
0       dev-2.7.18      (none)
0       urw-fonts-2.0   (none)
0       users-guide-1.0.72 (none)
0       sgml-common-0.1 (none)
0       setup-2.1.8     (none)
0       jadetex         (none)
0       gnome-audio-1.0.0 (none)
0       specspo-6.2     (none)
0       gimp-data-extras-1.0.0 (none)
0       docbook-3.1     (none)
0       indexhtml-6.2   (none)
ansic:    14218806 (80.55%)
cpp:       1326212 (7.51%)
lisp:       565861 (3.21%)
sh:         469950 (2.66%)
perl:       245860 (1.39%)
asm:        204634 (1.16%)
tcl:        152510 (0.86%)
python:     140725 (0.80%)
yacc:        97506 (0.55%)
java:        79656 (0.45%)
exp:         79605 (0.45%)
lex:         15334 (0.09%)
awk:         14705 (0.08%)
objc:        13619 (0.08%)
csh:         10803 (0.06%)
ada:          8217 (0.05%)
pascal:       4045 (0.02%)
sed:          2806 (0.02%)
fortran:      1707 (0.01%)
Total Physical Source Lines of Code (SLOC) = 17652561
Total Estimated Person-Years of Development = 4548.36
Average Programmer Annual Salary = 56286
Overhead Multiplier = 2.4
Total Estimated Cost to Develop = $ 614421924.71
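
The cost line can be reproduced from the three lines above it. The following is a minimal sketch, assuming the cost is simply the product of effort, salary, and overhead (the printed figures agree with that to within rounding). The function name development_cost is illustrative and not part of the counting tools. Because the person-years value is printed to only two decimal places, the recomputed cost differs from the printed one by a few hundred dollars.

```python
def development_cost(person_years, annual_salary, overhead):
    """Estimated cost = effort (person-years) x annual salary x overhead multiplier."""
    return person_years * annual_salary * overhead


# Values copied from the summary above.
PERSON_YEARS = 4548.36        # printed to two decimal places
ANNUAL_SALARY = 56286
OVERHEAD = 2.4
REPORTED_COST = 614421924.71

cost = development_cost(PERSON_YEARS, ANNUAL_SALARY, OVERHEAD)
relative_difference = abs(cost - REPORTED_COST) / REPORTED_COST

print(f"recomputed cost:     ${cost:,.2f}")
print(f"reported cost:       ${REPORTED_COST:,.2f}")
print(f"relative difference: {relative_difference:.1e}")
```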

B.2 Counts of Files For Each Category

There were 181,679 ordinary files in the build directory. The following are counts of the number of files (not the SLOC) for each language:

ansic:       52088 (71.92%)
cpp:          8092 (11.17%)
sh:           3381 (4.67%)
asm:          1931 (2.67%)
perl:         1387 (1.92%)
lisp:         1168 (1.61%)
java:         1047 (1.45%)
python:        997 (1.38%)
tcl:           798 (1.10%)
exp:           472 (0.65%)
awk:           285 (0.39%)
objc:          260 (0.36%)
sed:           112 (0.15%)
yacc:          110 (0.15%)
csh:            94 (0.13%)
ada:            92 (0.13%)
lex:            57 (0.08%)
fortran:        50 (0.07%)
pascal:          7 (0.01%)
Total Number of Source Code Files = 72428

In addition, when counting the number of files (not SLOC), some files were identified as source code files but nevertheless were not counted for other reasons (and thus not included in the file counts above). Of these source code files, 5,820 files were identified as duplicating the contents of another file, 817 files were identified as files that had been automatically generated, and 65 files were identified as zero-length files.
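
As a rough cross-check of the figures in this section, the sketch below recomputes the total and the percentage shares from the per-language file counts, and adds up the source files that were excluded. The names FILE_COUNTS, EXCLUDED, and share are illustrative only; the numbers are copied from the text above.

```python
# Per-language file counts, copied from the table above.
FILE_COUNTS = {
    "ansic": 52088, "cpp": 8092, "sh": 3381, "asm": 1931, "perl": 1387,
    "lisp": 1168, "java": 1047, "python": 997, "tcl": 798, "exp": 472,
    "awk": 285, "objc": 260, "sed": 112, "yacc": 110, "csh": 94,
    "ada": 92, "lex": 57, "fortran": 50, "pascal": 7,
}

# Source files identified but not counted, by reason.
EXCLUDED = {"duplicate": 5820, "auto_generated": 817, "zero_length": 65}

total_files = sum(FILE_COUNTS.values())


def share(language):
    """Share of all counted source files, formatted like the table."""
    return f"{FILE_COUNTS[language] / total_files:.2%}"


for language, count in FILE_COUNTS.items():
    label = language + ":"
    print(f"{label:<10}{count:>8} ({share(language)})")

print("Total Number of Source Code Files =", total_files)
print("Source files excluded from the counts =", sum(EXCLUDED.values()))
```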

B.3 Additional Measures of the Linux Kernel

I also made additional measures of the Linux kernel. This kernel is Linux kernel version 2.2.14 as patched by Red Hat. The Linux kernel's design is reflected in its directory structure. Only 8 lines of source code are in its main directory; the rest are in descendant directories. Counting the physical SLOC in each subdirectory (or its descendants) yielded the following:
BUILD/linux/Documentation/      765
BUILD/linux/arch/            236651
BUILD/linux/configs/              0
BUILD/linux/drivers/         876436
BUILD/linux/fs/               88667
BUILD/linux/ibcs/             16619
BUILD/linux/include/         136982
BUILD/linux/init/              1302
BUILD/linux/ipc/               1757
BUILD/linux/kernel/            7436
BUILD/linux/ksymoops-0.7c/     3271
BUILD/linux/lib/               1300
BUILD/linux/mm/                6771
BUILD/linux/net/             105549
BUILD/linux/pcmcia-cs-3.1.8/  34851
BUILD/linux/scripts/           8357
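
As a consistency check, the subdirectory counts plus the 8 lines in the main directory should add up to the kernel's total physical SLOC across all languages. A minimal sketch follows (the dictionary and variable names are illustrative; the values are copied from the table above):

```python
# Physical SLOC per kernel subdirectory, copied from the table above.
KERNEL_SLOC = {
    "Documentation": 765,
    "arch": 236651,
    "configs": 0,
    "drivers": 876436,
    "fs": 88667,
    "ibcs": 16619,
    "include": 136982,
    "init": 1302,
    "ipc": 1757,
    "kernel": 7436,
    "ksymoops-0.7c": 3271,
    "lib": 1300,
    "mm": 6771,
    "net": 105549,
    "pcmcia-cs-3.1.8": 34851,
    "scripts": 8357,
}
MAIN_DIRECTORY_SLOC = 8  # lines of source code in the kernel's main directory

kernel_total = sum(KERNEL_SLOC.values()) + MAIN_DIRECTORY_SLOC
drivers_share = KERNEL_SLOC["drivers"] / kernel_total

print("Kernel physical SLOC, all languages:", kernel_total)
print(f"Share of kernel SLOC in drivers/: {drivers_share:.1%}")
```

The sum comes to 1,526,722, and well over half of it is in the drivers directory.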

I separately ran the CodeCount tools on the entire Linux operating system kernel. Using the CodeCount definition of C logical lines of code, CodeCount determined that this version of the Linux kernel included 673,627 logical SLOC in C. This is obviously much smaller than the 1,462,165 physical SLOC in C, or the 1,526,722 physical SLOC when all languages are combined for Linux.

However, this included non-i386 code. To make a more reasonable comparison with the Halloween documents, I needed to ignore non-i386 code.

First, I looked at the linux/arch directory, which contained architecture-specific code. This directory had the following subdirectories (architectures): alpha, arm, i386, m68k, mips, ppc, s390, sparc, sparc64. I then computed the total for all of ``arch'', which was 236651 SLOC, and subtracted out the linux/arch/i386 code, which totalled 26178 SLOC; this gave me a total of non-i386 code in linux/arch of 210473 physical SLOC. I then looked through the ``drivers'' directory to see if there were sets of drivers which were non-i386. I identified the following directories, with the SLOC totals as shown:

linux/drivers/sbus/       22354
linux/drivers/macintosh/   6000
linux/drivers/sgi/         4402
linux/drivers/fc4/         3167
linux/drivers/nubus/        421
linux/drivers/acorn/      11850
linux/drivers/s390/        8653
Driver Total:              56847

Thus, I had a grand total of non-i386 code (including drivers and architecture-specific code) of 267320 physical SLOC. This is, of course, another approximation, since there are certainly other architecture-specific lines, but I believe that is most of it. Running the CodeCount tool on just the C code, once these architectural and driver directories are removed, yields 570,039 logical SLOC of C code.
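
The arithmetic behind the non-i386 estimate can be retraced with a short sketch (the variable names are illustrative; the values are the ones quoted above):

```python
# Architecture-specific code under linux/arch (physical SLOC).
ARCH_TOTAL = 236651
ARCH_I386 = 26178

# Driver directories judged to be non-i386 (physical SLOC).
NON_I386_DRIVERS = {
    "sbus": 22354,
    "macintosh": 6000,
    "sgi": 4402,
    "fc4": 3167,
    "nubus": 421,
    "acorn": 11850,
    "s390": 8653,
}

non_i386_arch = ARCH_TOTAL - ARCH_I386
driver_total = sum(NON_I386_DRIVERS.values())
non_i386_total = non_i386_arch + driver_total

print("non-i386 code in linux/arch:", non_i386_arch)
print("non-i386 driver total:      ", driver_total)
print("grand total of non-i386:    ", non_i386_total)
```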

B.4 Minimum System SLOC

Most of this paper worries about counting an ``entire'' system. However, what's the SLOC size of a ``minimal'' system? Here's an attempt to answer that question.

Red Hat Linux 6.2, CD-ROM #1, file RedHat/base/comps, defines the ``base'' (minimum) Red Hat Linux 6.2 installation as a set of packages. The following are the build directories corresponding to this base (minimum) installation, along with the SLOC counts (as shown above). Note that this creates a text-only system:

Component                SLOC
anacron-2.1              1143
apmd                     3012
ash-linux-0.2            9666
at-3.1.7                 3084
authconfig-3.0.3         1075
bash-1.14.7             47067
bc-1.05                 17682
bdflush-1.5               261
binutils-2.9.5.0.22    467120
bzip2-0.9.5d             4996
chkconfig-1.1.2           717
console-tools-0.3.3     15522
cpio-2.4.2               7617
cracklib,2.7             1987
dev-2.7.18                  0
diffutils-2.7           10914
dump-0.4b15             10187
e2fsprogs-1.18          28169
ed-0.2                   7427
egcs-1.1.2             720112
eject-2.0.2               657
file-3.28                2647
fileutils-4.0p          34768
findutils-4.1           11404
gawk-3.0.4              26363
gd1.3                   20078
gdbm-1.8.0               3315
getty_ps-2.0.7j          2631
glibc-2.1.3            415026
gmp-2.0.2               24583
gnupg-1.0.1             54935
gpm-1.18.1               9725
grep-2.4                10013
groff-1.15              70260
gzip-1.2.4a              6306
hdparm-3.6               1229
initscripts-5.00         3929
isapnptools-1.21         5960
kbdconfig-1.9.2.4         368
kernelcfg-0.5             347
kudzu-0.36                  0
ldconfig-1999-02-21       874
ld.so-1.9.5              9731
less-346                14039
lilo                     7255
linuxconf-1.17r2       104032
logrotate-3.3.2          1525
mailcap-2.0.6               0
mailx-8.1.1              6968
MAKEDEV-2.5.2            1020
man-1.5h1                9801
mingetty-0.9.4            343
mkbootdisk-1.2.5          314
mkinitrd-2.4.1            288
mktemp-1.5                 85
modutils-2.3.9          11775
mouseconfig-4.4           830
mt-st-0.5b               1361
ncompress-4.2.4          1435
ncurses-5.0             61324
net-tools-1.54          11633
newt-0.50.8              7041
pam-0.72                20433
passwd-0.64.1            1194
pciutils-2.1.5           3855
popt-1.4                 1099
procmail-3.14            9927
procps-2.0.6             9961
psmisc                   1630
pump-0.7.8               1856
pwdb-0.61                9551
quota-2.00-pre3          3804
raidtools-0.90           2424
readline-2.2.1          14941
redhat-logos                0
rootfiles                   0
rpm-3.0.4               39861
sash-3.4                 6172
sed-3.02                 7740
sendmail-8.9.3          42880
setserial-2.15            742
setup-1.2                  67
setup-2.1.8                 0
shadow-19990827         25236
sh-utils-2.0            17939
slang                   28118
slocate-2.1              1883
stat-1.5                  280
sysklogd-1.3-31          4038
sysvinit-2.78            6033
tar-1.13.17             14255
termcap-2.0.8             797
texinfo-4.0             28186
textutils-2.0a          36338
time-1.7                 1452
timeconfig-3.0.3          346
tmpwatch-2.2              463
utempter-0.5.2            222
util-linux-2.10f        39160
vim-5.6                113241
vixie-cron-3.0.1         2879
which-2.9                2268
zlib-1.1.3               4087

Thus, the contents of the build directories corresponding to the ``base'' (minimum) installation total 2,819,334 SLOC.

A few notes are in order about this build directory total:

  1. Some of the packages listed by a traditional package list aren't shown here because they don't contain any code. Package "basesystem" is a pseudo-package for dependency purposes. Package redhat-release is just a package for keeping track of the base system's version number. Package "filesystem" contains a directory layout.
  2. ntsysv's source is in chkconfig-1.1.2; kernel-utils and kernel-pcmcia-cs are part of "linux". Package shadow-utils is in build directory shadow-19990827. Build directory util-linux includes losetup and mount. "dump" is included because it provides rmt.
  3. Sometimes the build directories contain more code than is necessary to create just the parts for the ``base'' system; this is a side-effect of how things are packaged. ``info'' is included in the base, so we count all of texinfo. The build directory termcap is counted, because libtermcap is in the base. Possibly most important, gcc (egcs) is there because libstdc++ is in the base.
  4. Sometimes a large component is included in the base, even though most of the time little of its functionality is used. In particular, the mail transfer agent ``sendmail'' is in the base, even though for many users most of sendmail's functionality isn't used. However, for this paper's purposes this isn't a problem. After all, even if sendmail's functionality is often underused, clearly that functionality took time to develop and that functionality is available to those who want it.
  5. My tools intentionally eliminated duplicates; it may be that a few files aren't counted here because they're considered duplicates of another build directory not included here. I do not expect this factor to materially change the total.
  6. Red Hat Linux is not optimized to be an ``as small as possible'' distribution; its emphasis is on functionality, not small size. A working Linux distribution could include much less code, depending on its intended application. For example, ``linuxconf'' simplifies system configuration, but the system can be configured by editing its system configuration files directly, which would reduce the base system's size. The base also includes vim, a full-featured text editor; a simpler editor with fewer functions would be smaller as well.

Many people prefer some sort of graphical interface; here is a minimal configuration of a graphical system, adding the X server, a window manager, and a few tools:

Component                SLOC
XFree86-3.3.6         1291745
Xconfigurator-4.3.5      9741
fvwm-2.2.4              69265
X11R6-contrib-3.3.2     18885

These additional graphical components add 1,389,636 SLOC. Due to oddities of the way the initialization system xinitrc is built, it isn't shown here in the total, but xinitrc has so little code that its omission does not significantly affect the total.

Adding these numbers together, we now have a total of 4,208,970 SLOC for a ``minimal graphical system.'' Many people would want to add more components. For example, this doesn't include a graphical toolkit (necessary for running most graphical applications). We could add gtk+-1.2.6 (a toolkit needed for running GTK+ based applications), adding 138,118 SLOC. This would now total 4,347,088 for a ``basic graphical system,'' one able to run basic GTK+ applications.

Let's add a web server to the mix. Adding apache_1.3.12 adds only 77,873 SLOC. We now have 4,424,961 physical SLOC for a basic graphical system plus a web server.

We could then add a graphical desktop environment, but there are so many different options and possibilities that trying to identify a ``minimal'' system is hard to do without knowing the specific uses intended for the system. Red Hat defines a standard ``GNOME'' and ``KDE'' desktop, but these are intended to be highly functional (not ``minimal''). Thus, we'll stop here, with a total of 2.8 million physical SLOC for a minimal text-based system, and total of 4.4 million physical SLOC for a basic graphical system plus a web server.
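
The running totals in this section can be retraced step by step. The following is a small sketch with illustrative variable names, using the figures quoted above:

```python
# Base (text-only) installation, as totalled above.
BASE_TEXT_SYSTEM = 2819334

# Components added for a minimal graphical system (physical SLOC).
GRAPHICAL_COMPONENTS = {
    "XFree86-3.3.6": 1291745,
    "Xconfigurator-4.3.5": 9741,
    "fvwm-2.2.4": 69265,
    "X11R6-contrib-3.3.2": 18885,
}
GTK_SLOC = 138118      # gtk+-1.2.6
APACHE_SLOC = 77873    # apache_1.3.12

graphical_added = sum(GRAPHICAL_COMPONENTS.values())
minimal_graphical = BASE_TEXT_SYSTEM + graphical_added
basic_graphical = minimal_graphical + GTK_SLOC
with_web_server = basic_graphical + APACHE_SLOC

print("graphical components add:      ", graphical_added)
print("minimal graphical system:      ", minimal_graphical)
print("basic graphical system (GTK+): ", basic_graphical)
print("... plus a web server:         ", with_web_server)
print(f"rounded: {BASE_TEXT_SYSTEM / 1e6:.1f} million (text), "
      f"{with_web_server / 1e6:.1f} million (graphical plus web server)")
```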

References

[Boehm 1981] Boehm, Barry. 1981. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice-Hall, Inc. ISBN 0-13-822122-7.

[Dempsey 1999] Dempsey, Bert J., Debra Weiss, Paul Jones, and Jane Greenberg. October 6, 1999. UNC Open Source Research Team. Chapel Hill, NC: University of North Carolina at Chapel Hill. http://www.ibiblio.org/osrt/develpro.html.

[DSMC] Defense Systems Management College (DSMC). Indirect Cost Management Guide: Navigating the Sea of Overhead. Defense Systems Management College Press, Fort Belvoir, VA 22060-5426. Available as part of the ``Defense Acquisition Deskbook.'' http://portal.deskbook.osd.mil/reflib/DTNG/009CM/004/009CM004DOC.HTM.

[FSF 2000] Free Software Foundation (FSF). What is Free Software?. http://www.gnu.org/philosophy/free-sw.html.

[Halloween I] Valloppillil, Vinod, with interleaved commentary by Eric S. Raymond. Aug 11, 1998. "Open Source Software: A (New?) Development Methodology" v1.00. http://www.opensource.org/halloween/halloween1.html.

[Halloween II] Valloppillil, Vinod and Josh Cohen, with interleaved commentary by Eric S. Raymond. Aug 11, 1998. "Linux OS Competitive Analysis: The Next Java VM?". v1.00. http://www.opensource.org/halloween/halloween2.html

[Kalb 1990] Kalb, George E. "Counting Lines of Code, Confusions, Conclusions, and Recommendations". Briefing to the 3rd Annual REVIC User's Group Conference, January 10-12, 1990. http://sunset.usc.edu/research/CODECOUNT/documents/3rd_REVIC.pdf

[Kalb 1996] Kalb, George E. October 16, 1996. "Automated Collection of Software Sizing Data". Briefing to the International Society of Parametric Analysts, Southern California Chapter. http://sunset.usc.edu/research/CODECOUNT/documents/ispa.pdf

[Masse 1997] Masse, Roger E. July 8, 1997. Software Metrics: An Analysis of the Evolution of COCOMO and Function Points. University of Maryland. http://www.python.org/~rmasse/papers/software-metrics.

[Miller 1995] Miller, Barton P., David Koski, Cjin Pheow Lee, Vivekananda Maganty, Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl. 1995. Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities and Services. http://www.cs.wisc.edu/~bart/fuzz/fuzz.html.

[Moody 2001] Moody, Glyn. 2001. Rebel Code. ISBN 0713995203.

[NAS 1996] National Academy of Sciences (NAS). 1996. Statistical Software Engineering. http://www.nap.edu/html/statsoft/chap2.html

[OSI 1999] Open Source Initiative. 1999. The Open Source Definition. http://www.opensource.org/osd.html.

[Park 1992] Park, R. 1992. Software Size Measurement: A Framework for Counting Source Statements. Technical Report CMU/SEI-92-TR-020. http://www.sei.cmu.edu/publications/documents/92.reports/92.tr.020.html

[Perens 1999] Perens, Bruce. January 1999. Open Sources: Voices from the Open Source Revolution. "The Open Source Definition". ISBN 1-56592-582-3. http://www.oreilly.com/catalog/opensources/book/perens.html

[Raymond 1999] Raymond, Eric S. January 1999. ``A Brief History of Hackerdom''. Open Sources: Voices from the Open Source Revolution. http://www.oreilly.com/catalog/opensources/book/raymond.html.

[Schneier 2000] Schneier, Bruce. March 15, 2000. ``Software Complexity and Security''. Crypto-Gram. http://www.counterpane.com/crypto-gram-0003.html

[Shankland 2000a] Shankland, Stephen. February 14, 2000. "Linux poses increasing threat to Windows 2000". CNET News.com. http://news.cnet.com/news/0-1003-200-1549312.html.

[Shankland 2000b] Shankland, Stephen. August 31, 2000. "Red Hat holds huge Linux lead, rivals growing". CNET News.com. http://news.cnet.com/news/0-1003-200-2662090.html

[Stallman 2000] Stallman, Richard. October 13, 2000. "By any other name...". http://www.anchordesk.co.uk/anchordesk/commentary/columns/0,2415,7106622,00.html.

[Vaughan-Nichols 1999] Vaughan-Nichols, Steven J. Nov. 1, 1999. Can you Trust this Penguin? ZDnet. http://www.zdnet.com/sp/stories/issue/0,4537,2387282,00.html

[Wheeler 2000a] Wheeler, David A. 2000. Open Source Software / Free Software References. http://www.dwheeler.com/oss_fs_refs.html.

[Wheeler 2000b] Wheeler, David A. 2000. Quantitative Measures for Why You Should Consider Open Source / Free Software. http://www.dwheeler.com/oss_fs_why.html.

[Zoebelein 1999] Zoebelein. April 1999. http://leb.net/hzo/ioscount.

This paper is (C) Copyright 2000 David A. Wheeler. All rights reserved. You may download and print it for your own personal use, and of course you may link to it. When referring to the paper, please refer to it as ``Estimating GNU/Linux's Size'' by David A. Wheeler, located at http://www.dwheeler.com/sloc. Please give credit if you refer to any of its techniques or results.