
spreadsheet: Definition and Much More from Answers.com

Fri Apr 22 2005

A spreadsheet is a rectangular table (or grid) of information, often financial information. The word came from "spread" in its sense of a newspaper or magazine item (text and/or graphics) that covers two facing pages, extending across the center fold and treating the two pages as one large one. The compound word "spread-sheet" came to mean the format used to present bookkeeping ledgers, with columns for categories of expenditures across the top, invoices listed down the left margin, and the amount of each payment in the cell where its row and column intersect. This layout was traditionally a "spread" across the facing pages of a bound ledger (a book for keeping accounting records) or on oversized sheets of paper ruled into rows and columns in that format, approximately twice as wide as ordinary paper.

History

Early implementations

Batch spreadsheets

One of the first commercial uses of computers was in processing payroll and other financial records, so the programs (and, indeed, the programming languages themselves) were designed to generate reports in the standard "spreadsheet" format bookkeepers and accountants used. As computers became more available and affordable in the last quarter of the 20th century, more software became available for them, and programs to keep financial records and generate spreadsheet reports were always in demand. Those spreadsheet programs can be used to tabulate many kinds of information, not just financial records, so the term "spreadsheet" has developed a more general meaning as information presented in a rectangular table, usually generated by a computer.

The concept of an electronic spreadsheet was outlined in the 1961 paper "Budgeting Models and System Simulation" by Richard Mattessich. Some credit for the computerized spreadsheet may belong to Rene K. Pardo and Remy Landau, who filed a U.S. patent on some of the related algorithms in 1970. Although the patent office initially rejected the patent as a purely mathematical invention, Pardo and Landau won a court case in 1983 establishing that "something does not cease to become patentable merely because the point of novelty is in an algorithm." This case helped establish the viability of software patents.

AutoPlan/AutoTab

In 1968, three former employees of the General Electric computer company headquartered in Phoenix, Arizona set out to start their own software development house. A. Leroy Ellison, Harry N. Cantrell, and Russell E. Edwards found themselves doing a large number of calculations when making tables for the business plans that they were presenting to venture capitalists. They decided to save themselves a lot of effort and wrote a computer program that produced their tables for them. This program, originally conceived as a simple utility for their personal use, would turn out to be the first software product offered by the company that would become known as Capex Corporation. The program ran on GE’s time-sharing service and was dubbed "AutoPlan". Soon afterward, a version that ran on IBM mainframes was introduced under the name "AutoTab". (National CSS offered a similar product, CSSTAB, which had a moderate time-sharing user base by the early 1970s. A major application was opinion research tabulation.)

AutoPlan/AutoTab was not a WYSIWYG interactive spreadsheet program; it was more like a simple scripting language for spreadsheets. The user defined the names and labels for the rows and columns, then the formulas that defined each row or column. The basic processing was as follows: if row formulas were defined, the program looped through the formulas for each column from left to right; if column formulas were defined, the program looped through the formulas for each row from top to bottom. Many refinements were available.
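The row-formula pass described above can be sketched in Python. This is a minimal illustration under assumed data structures, not AutoTab's actual implementation: `run_row_formulas` and the row labels `Revenue`, `Costs` and `Profit` are hypothetical, and each row formula is modelled as a function of the other rows' values in the column currently being processed.

```python
from typing import Callable

# Hypothetical model: a row formula maps the values of one column
# (row label -> value) to the value of its own row in that column.
RowFormula = Callable[[dict[str, float]], float]


def run_row_formulas(
    data: dict[str, list[float]],
    formulas: dict[str, RowFormula],
    n_columns: int,
) -> dict[str, list[float]]:
    """Batch-evaluate row formulas one column at a time, left to right.

    `data` holds the user-entered rows (label -> one value per column).
    Formulas run in the order they were defined, so a formula may use
    rows computed by earlier formulas.
    """
    result = {label: list(values) for label, values in data.items()}
    for label in formulas:
        result[label] = [0.0] * n_columns
    for col in range(n_columns):
        # Values of every row in the current column
        column = {label: values[col] for label, values in result.items()}
        for label, formula in formulas.items():
            column[label] = formula(column)
            result[label][col] = column[label]
    return result


table = run_row_formulas(
    data={"Revenue": [100.0, 120.0, 150.0], "Costs": [80.0, 90.0, 100.0]},
    formulas={"Profit": lambda c: c["Revenue"] - c["Costs"]},
    n_columns=3,
)
print(table["Profit"])  # [20.0, 30.0, 50.0]
```

Column formulas would be handled symmetrically, sweeping the rows from top to bottom instead.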

Capex Corporation was swallowed up by Computer Associates in 1982, the first link in CA’s long chain of acquisitions. AutoPlan had largely disappeared along with the GE time-sharing service, and AutoTab was at best a minor product by then. AutoTab was never offered under the CA company name.

Interactive spreadsheets

It was not until visual display units (VDUs) became readily available that fully interactive spreadsheets became possible. Earlier implementations were mainly designed around batch programs. In the early 1970s, text-based VDUs began to be used as input/output devices for interactive transaction processing. It was several more years before full-function graphical user interfaces were widely available for new user interface paradigms such as spreadsheets.

A number of innovative time-sharing applications built in the 1960s, 1970s, and early 1980s anticipated some of the user interface elements eventually popularized in PC spreadsheets. Some were developed by the commercial computer time-sharing industry, others were academic projects, and still others were built by large computer users to meet in-house needs.^[1] The lack of online historical material relating to such systems, and their limited coverage in academic and commercial publications, make it hard to assess their level of innovation and ultimate impact. Throughout the industry's history there have always been clever engineers working to build better user interfaces, and few development projects have occurred in a vacuum, without inspiration from prior art. Nevertheless, the history of spreadsheets seems most strongly influenced by the handful of products and technologies that became well known.

An example of an early "industrial weight" spreadsheet was APLDOT, developed in 1976 at the United States Railway Association on an IBM 360/91, running at The Johns Hopkins University Applied Physics Laboratory in Laurel, MD.^[2] The application was used successfully for many years to develop applications such as financial and costing models for the US Congress and for Conrail. APLDOT was dubbed a "spreadsheet" because financial analysts and strategic planners used it to solve the same problems they addressed with paper spreadsheet pads. All software development was in the public domain; the software system underwent a court challenge in US Government vs Penn Central et al. in 1978 and 1979.[citation needed]

VisiCalc

The spreadsheet concept became widely known in the late 1970s and early 1980s because of Dan Bricklin's implementation of VisiCalc.

Bricklin has spoken of watching his university professor create a table of calculation results on a blackboard. When the professor found an error, he had to tediously erase and rewrite a number of sequential entries in the table, triggering Bricklin to think that he could replicate the process on a computer, using the blackboard as the model to view results of underlying formulas. His idea became VisiCalc, the first application that turned the personal computer from a hobby for computer enthusiasts into a business tool.

Screenshot of VisiCalc, the first PC spreadsheet.

VisiCalc went on to become the first "killer app", an application so compelling that people would buy a particular computer just to run it. In this case the computer was the Apple II, and VisiCalc played no small part in that machine's success. The program was later ported to a number of other early computers, notably CP/M machines, the Atari 8-bit family and various Commodore platforms. Nevertheless, VisiCalc remains best known as "an Apple II program".

Acceptance of the IBM PC after its introduction in August 1981 was slow at first, because most of the programs available for it were ports from 8-bit platforms. Things changed dramatically with the introduction of Lotus 1-2-3 in November 1982 and its release for sale in January 1983. It became that platform's killer app and drove sales of the PC, thanks to its improvements in speed and graphics over VisiCalc. VisiCorp was unable to respond competitively and disappeared within a few years.

Lotus 1-2-3, Quattro, and Microsoft Excel

Lotus 1-2-3, along with its erstwhile competitor Borland Quattro, soon displaced VisiCalc, but they in turn faced a similar fate as Microsoft expanded its control of the PC desktop. By then Microsoft had been developing Excel on the Macintosh platform for several years, and it had become a fairly powerful system there. A port of Excel to Windows 2.0 resulted in a fully functional Windows spreadsheet. The more robust Windows 3.x platforms of the early 1990s made it possible for Excel to take market share from Lotus. By the time Lotus responded with usable Windows products, Microsoft had started assembling its Office suite. To this day, Microsoft continues to dominate the industry.

OpenOffice

OpenOffice.org Calc is an open-source alternative to Microsoft Excel. Calc can open Excel files, and Calc spreadsheets can be converted to Excel format.

Other products

A number of companies have attempted to break into the spreadsheet market with programs based on very different paradigms. Lotus introduced what is likely the most successful example, Lotus Improv, which saw some commercial success, notably in the financial world where its powerful data mining capabilities remain well respected to this day. Spreadsheet 2000 attempted to dramatically simplify formula construction, but was generally not successful. Stories attempted to make it easier to deal with 3-D blocks of data (as opposed to the 2-D nature of most spreadsheets), but appears to have seen little or no use.

A list of old spreadsheet software
- Boeing Calc 3D
- Improv
- Javlin
- Lotus Jazz for Macintosh
- Lucid 3D
- MultiPlan
- PowerStep for NeXT Step
- Quattro Pro
- Silk
- SuperCalc
- Surpass
- Symphony
- TWIN
- VP Planner
- Wingz for Macintosh


Concepts

Cells

A "cell" can be thought of as a box or "pigeon hole" for holding data. A single cell is usually referenced by its column and row (for example, A2 refers to the cell containing the value 10 in the table below). Its size can usually be adjusted to fit its content by dragging its height or width at the box boundaries (or, for entire columns or rows, by dragging the column or row headers).

My Spreadsheet

	A	B	C	D
1	value1	value2	added	multiplied
2	10	20	30	200

An array of cells is called a "sheet" or "worksheet". It is analogous to an array of variables in a conventional computer program (although certain unchanging values, once entered, could be considered, by the same analogy, constants). In most implementations, many worksheets may be located within a single spreadsheet. A worksheet is simply a subset of the spreadsheet divided for the sake of clarity. Functionally, the spreadsheet operates as a whole and all cells operate as global variables within the spreadsheet.

A cell may contain a value or a formula (which by convention usually begins with an = sign), or it may simply be left empty.

Values

A value can be entered from the computer keyboard by directly typing into the cell itself. Alternatively, a value can be based on a formula (see below), which might perform a calculation, display the current date or time, or retrieve external data such as a stock quote or a database value.

The Spreadsheet Value Rule
Computer scientist Alan Kay used the term value rule to summarize a spreadsheet's operation: a cell's value relies solely on the formula the user has typed into the cell.^[3] The formula may rely on the values of other cells, but those cells are likewise restricted to user-entered data or formulas. There are no "side effects" to calculating a formula: the only output is the calculated result displayed inside the cell that holds the formula. There is no natural mechanism for permanently modifying the contents of a cell unless the user manually modifies them. In the context of programming languages, this yields a limited form of first-order functional programming.^[4]
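The value rule can be illustrated with a small Python model in which a formula is a function that may read other cells through a lookup argument but has no way to write to any cell. The representation (`evaluate`, cells stored as floats or lambdas) is an assumption made for illustration, and the sketch assumes there are no circular references.

```python
from typing import Callable, Union

# A formula receives a read-only lookup function: it can read other
# cells but cannot modify any of them (no side effects).
Lookup = Callable[[str], float]
Formula = Callable[[Lookup], float]
Cell = Union[float, Formula]


def evaluate(sheet: dict[str, Cell], name: str) -> float:
    """Return the value of cell `name`, derived only from its own content.

    Assumes the sheet contains no circular references.
    """
    content = sheet[name]
    if callable(content):
        # Referenced cells are evaluated on demand; nothing is written back.
        return content(lambda ref: evaluate(sheet, ref))
    return content  # a user-entered value


sheet: dict[str, Cell] = {
    "A2": 10.0,
    "B2": 20.0,
    "C2": lambda get: get("A2") + get("B2"),
    "D2": lambda get: get("A2") * get("B2"),
}
print(evaluate(sheet, "C2"), evaluate(sheet, "D2"))  # 30.0 200.0
```

Because evaluation never writes to `sheet`, evaluating a cell again gives the same result until the user edits some cell, which is the property the value rule describes.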

Real time update

A standard feature of spreadsheets since the mid-1980s, real-time update eliminates the need to tell the spreadsheet manually to recalculate values. Earlier spreadsheets required manual recalculation because calculation times slowed down data entry.

Formula

When a cell contains a formula, it often contains references to other cells. Such a cell reference is a type of variable. Its value is the value of the referenced cell or some derivation of it. If that cell in turn references other cells, the value depends on the values of those.

By convention, the left-hand side of what would normally be considered a formula is omitted and assumed to be the cell itself.

In the example above, the formula in cell C2 might be either of the following:

=A2+B2
=SUM(A2:B2)   (A2 is the start of the cell range and B2 its end)
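To illustrate the convention that a formula's left-hand side is the cell itself, here is a minimal Python sketch that evaluates formulas built only from cell references, numbers and `+`; the caller stores the result in the cell that holds the formula. The function name `eval_plus_formula` is hypothetical, and real formula languages support far more syntax.

```python
import re

# A cell reference: one or two column letters followed by a row number
CELL_REF = re.compile(r"[A-Z]{1,2}[0-9]+")


def eval_plus_formula(formula: str, cells: dict[str, float]) -> float:
    """Evaluate a formula such as "=A2+B2" that uses only '+'.

    The formula names no target: the caller stores the result in the
    cell that holds the formula, which plays the role of the left-hand side.
    """
    if not formula.startswith("="):
        raise ValueError("a formula must start with '='")
    total = 0.0
    for term in formula[1:].split("+"):
        term = term.strip().upper()
        if CELL_REF.fullmatch(term):
            total += cells[term]  # value of the referenced cell
        else:
            total += float(term)  # a numeric constant
    return total


cells = {"A2": 10.0, "B2": 20.0}
cells["C2"] = eval_plus_formula("=A2+B2", cells)  # C2 is the implicit target
print(cells["C2"])  # 30.0
```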

A formula identifies the calculation needed to place the result in the cell that contains it. A cell containing a formula therefore has two display components: the formula itself and the resulting value. The formula is normally shown only when the cell is selected by clicking on it; otherwise the cell displays the result of the calculation (in this case 30). (A common error in spreadsheet use is accidentally overwriting a cell that holds a formula by typing a value directly into it. Most modern spreadsheets allow selected cells to be "locked" to prevent this, though many users do not take advantage of this feature.)

The formulas available depend on the particular spreadsheet implementation, but in general most of today's commercial spreadsheets can perform most arithmetic operations and quite complex nested conditional operations. Modern implementations also offer functions to access remote data and applications.

A formula may contain a condition (or nested conditions), with or without an actual calculation, and is sometimes used purely to identify and highlight errors. In the example below, the sum of a column of percentages (A1 through A6) is tested for validity; if the test fails, an explicit message, with a simple pointer graphic toward the total on the left, is placed in the adjacent right-hand cell.

=IF(SUM(A1:A6) > 100%, "<==More than 100%", SUM(A1:A6))

This shows an error message in the right-hand column if the total percentage exceeds 100%; otherwise it shows the total.
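The same validation can be sketched in Python as a function of the six percentage values. It assumes, as most spreadsheets do, that percentages are stored as fractions (so 100% is 1.0); the name `check_percent_total` is hypothetical.

```python
def check_percent_total(percentages: list[float]) -> object:
    """Mirror of =IF(SUM(A1:A6) > 100%, "<==More than 100%", SUM(A1:A6)).

    Percentages are stored as fractions, so 100% is 1.0.
    """
    total = sum(percentages)
    return "<==More than 100%" if total > 1.0 else total


print(check_percent_total([0.25, 0.25, 0.5]))  # 1.0
print(check_percent_total([0.5, 0.6]))  # <==More than 100%
```

With binary floating point, sums of values such as 0.1 may not be exactly 1.0, so a practical check might compare against 1.0 with a small tolerance.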

A spreadsheet does not, in fact, have to contain any formulas at all, in which case it can be considered merely a collection of data arranged in rows and columns (a simple database), like a calendar, timetable or list. Because of their ease of use and their formatting and hyperlink capabilities, many spreadsheets are used solely for this purpose.

Locked cell

Once entered, selected cells (or the entire spreadsheet) can optionally be "locked" to prevent accidental overwriting. Typically this would apply to cells containing formulae but might be applicable to cells containing "constants" such as a kilogram/pounds conversion factor (2.20462262 to eight decimal places).
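Cell locking can be modelled as a set of protected addresses that every write is checked against. The following is a minimal Python sketch; the class `ProtectedSheet` and its methods are hypothetical, and real spreadsheets usually combine locking with a separate "protect sheet" switch.

```python
class ProtectedSheet:
    """Minimal model of cell locking: writes to locked cells are refused."""

    def __init__(self) -> None:
        self.cells: dict[str, object] = {}
        self.locked: set[str] = set()

    def lock(self, *addresses: str) -> None:
        """Protect the given cells against further writes."""
        self.locked.update(addresses)

    def write(self, address: str, content: object) -> None:
        """Store content in a cell unless that cell is locked."""
        if address in self.locked:
            raise PermissionError(f"cell {address} is locked")
        self.cells[address] = content


sheet = ProtectedSheet()
sheet.write("B1", 2.20462262)  # kilogram/pound conversion factor
sheet.lock("B1")
try:
    sheet.write("B1", 2.2)  # accidental over-keying is rejected
except PermissionError as err:
    print(err)  # cell B1 is locked
```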

Data type

In addition, a cell or group of cells can optionally be assigned a data type for the data it holds, or expects to hold, when a value is entered. This may determine the format in which a value is displayed and (theoretically, at least) the operations allowed on it. In practice, however, most commercial spreadsheets allow invalid operations, carrying out illogical calculations without an appropriate warning.

If no data type has been set explicitly, the default is usually determined by the cell's initial content; for example, "31/12/2007" or "31 Jan 2007" would default to the "date" data type. Similarly, adding a % sign after a numeric value would tag the cell with the percentage data type.

Common data types include "numeric" and "currency"; in these cases the cell can be further specified with, for example, the number of decimal places to display and, where applicable, a currency symbol such as $ or £. These attributes do not change the cell contents, only the displayed value.
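A minimal Python sketch of the two ideas above: guessing a data type from a cell's initial content, and formatting a number for display without changing the stored value. The accepted date layouts, the type names and the functions `infer_type` and `display` are assumptions for illustration; real spreadsheets recognise many more formats, and month abbreviations such as "Jan" depend on the locale.

```python
from datetime import datetime


def infer_type(text: str) -> tuple[str, object]:
    """Guess a cell's data type from the first content typed into it."""
    text = text.strip()
    for layout in ("%d/%m/%Y", "%d %b %Y"):  # e.g. 31/12/2007, 31 Jan 2007
        try:
            return "date", datetime.strptime(text, layout).date()
        except ValueError:
            pass
    if text.endswith("%"):
        try:
            return "percentage", float(text[:-1]) / 100  # 15% is stored as 0.15
        except ValueError:
            pass
    try:
        return "numeric", float(text)
    except ValueError:
        return "text", text


def display(value: float, decimals: int = 2, currency: str = "") -> str:
    """Format a number for display; the stored value itself is unchanged."""
    return f"{currency}{value:,.{decimals}f}"


print(infer_type("31 Jan 2007"))  # ('date', datetime.date(2007, 1, 31))
print(display(1234.5, decimals=2, currency="£"))  # £1,234.50
```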

Named cells

In most implementations, a cell can be "named" so that even if the cell is "cut and pasted" to a new location within the spreadsheet, its reference always remains intact. Names must be unique within the spreadsheet and, once defined, can then be used instead of a "normal" cell reference.
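Named cells can be modelled as a table that maps each unique name to a cell address; moving a cell updates the table, so formulas that use the name keep working. This is a minimal Python sketch; the class `NamedSheet`, its methods, and the case-insensitivity of names are assumptions for illustration.

```python
class NamedSheet:
    """Cells plus a table of unique names that point at cell addresses."""

    def __init__(self) -> None:
        self.cells: dict[str, float] = {}
        self.names: dict[str, str] = {}

    def define_name(self, name: str, address: str) -> None:
        key = name.upper()  # assumption: names are case-insensitive
        if key in self.names:
            raise ValueError(f"name {name!r} is already defined")
        self.names[key] = address

    def move(self, source: str, target: str) -> None:
        """Cut and paste a cell; names that pointed at it follow it."""
        self.cells[target] = self.cells.pop(source)
        for key, address in self.names.items():
            if address == source:
                self.names[key] = target

    def lookup(self, ref: str) -> float:
        """Resolve a name (or a plain address) to the cell's value."""
        address = self.names.get(ref.upper(), ref)
        return self.cells[address]


sheet = NamedSheet()
sheet.cells["B1"] = 2.20462262
sheet.define_name("KG_TO_LB", "B1")
sheet.move("B1", "D5")
print(sheet.lookup("KG_TO_LB"))  # 2.20462262
```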

Format

Each cell (like its counterpart the "word" in a word processor) can be separately defined in terms of its displayed format. Any cell or range of cells can be highlighted in several different ways such as use of bold text, colour, font, text size and so on.

These attributes typically do not alter the data content in any way and some formatting may be lost or altered when copying spreadsheet data between different implementations or software versions. In some implementations, the format may be conditional upon the data within the cell - for example, a value may be displayed red if it is negative.

Sheets

In the earliest spreadsheets, cells were a simple two-dimensional grid. Over time, the model has been expanded to include a third dimension, and in some cases a series of named grids, called sheets. The most advanced examples allow inversion and rotation operations which can slice and project the data set in various ways.

Cell reference

A cell reference may be to a cell in a different sheet within the same spreadsheet, or (depending on the implementation) to a cell in another spreadsheet entirely or a value from a remote application.

A typical cell reference in "A1" style consists of one or two case-insensitive letters identifying the column (allowing up to 256 columns: A-Z and AA-IV) followed by a row number (e.g. in the range 1-65536). Either part can be relative, meaning it is adjusted when the formula containing it is copied to another cell, or absolute, indicated by a $ in front of that part of the reference (for example $A$1 or A$1). The older "R1C1" reference style consists of the letter R, the row number, the letter C, and the column number; relative row or column numbers are indicated by enclosing the number in square brackets. Most current spreadsheets use the A1 style, and some provide the R1C1 style as a compatibility option.
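The relative/absolute distinction can be sketched in Python as a function that adjusts an A1-style reference when its formula is copied by a given number of rows and columns, leaving $-marked parts unchanged. The function names are hypothetical, and the sketch does not handle references shifted off the edge of the sheet.

```python
import re

# One or two column letters and a row number, each optionally preceded by $
A1_REF = re.compile(r"(\$?)([A-Z]{1,2})(\$?)([0-9]+)")


def col_to_index(letters: str) -> int:
    """Column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27, IV -> 256."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_col(index: int) -> str:
    """Inverse of col_to_index: 1 -> A, 27 -> AA."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_reference(ref: str, d_rows: int, d_cols: int) -> str:
    """Adjust an A1-style reference for a formula copied d_rows down and
    d_cols right. Parts marked absolute with $ are left unchanged."""
    match = A1_REF.fullmatch(ref.upper())
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    col_abs, col, row_abs, row = match.groups()
    col_num = col_to_index(col) + (0 if col_abs else d_cols)
    row_num = int(row) + (0 if row_abs else d_rows)
    return f"{col_abs}{index_to_col(col_num)}{row_abs}{row_num}"


# Copying a formula from C1 to D2 (one row down, one column right):
print([shift_reference(r, 1, 1) for r in ("A1", "$A1", "A$1", "$A$1")])
# ['B2', '$A2', 'B$1', '$A$1']
```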

A cell on the same sheet is usually addressed as:

=A1

A cell on a different sheet of the same spreadsheet is usually addressed as:

=SHEET2!A1             (that is, the first cell in sheet 2 of the same spreadsheet)

Some spreadsheet implementations allow cell references to another spreadsheet (not the currently open and active file) on the same computer or a local network. A reference may also point to a cell in another open and active spreadsheet on the same computer or network that is defined as shareable. These references contain the complete filename, such as:

='C:\Documents and Settings\Username\My spreadsheets\[main sheet]Sheet1'!A1

In a spreadsheet, references to cells are automatically updated when rows or columns are inserted or deleted. Care must be taken, however, when adding a row immediately before a set of column totals, to ensure that the totals include the new row's values; often they do not.
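The pitfall can be reproduced with a simplified Python model in which references shift down when rows are inserted at or above them. This model, and the names `adjust_row` and `adjust_range`, are assumptions for illustration; actual products may behave differently in this situation.

```python
def adjust_row(row: int, insert_at: int, count: int = 1) -> int:
    """Row number after inserting `count` rows before row `insert_at`."""
    return row + count if row >= insert_at else row


def adjust_range(
    first: int, last: int, insert_at: int, count: int = 1
) -> tuple[int, int]:
    """Adjust both ends of a row range such as A1:A6 for a row insertion."""
    return adjust_row(first, insert_at, count), adjust_row(last, insert_at, count)


# A total in A7 holds =SUM(A1:A6).
print(adjust_range(1, 6, insert_at=3))  # (1, 7): new row 3 is inside the range
print(adjust_range(1, 6, insert_at=7))  # (1, 6): new row 7 is left out
print(adjust_row(7, insert_at=7))  # 8: the total itself moves down
```

In the second case the total moves to A8 but still sums only A1:A6, so a value typed into the new row 7 is silently excluded.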

Cell Ranges

A reference to a range of cells typically takes the form A1:A6, which specifies all the cells from A1 through A6. A formula such as "=SUM(A1:A6)" adds all the cells specified and puts the result in the cell containing the formula itself.
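Range expansion can be sketched in Python: convert both corners to column and row numbers, list every cell in between, and add up their values. The function names are hypothetical, the sketch assumes the first corner is the top-left one, and blank (missing) cells are counted as zero.

```python
import re

CELL = re.compile(r"([A-Z]{1,2})([0-9]+)")


def col_to_index(letters: str) -> int:
    """Column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_col(index: int) -> str:
    """Inverse of col_to_index: 1 -> A, 27 -> AA."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def parse_cell(name: str) -> tuple[int, int]:
    """Split "B3" into (column index, row number) = (2, 3)."""
    match = CELL.fullmatch(name)
    if match is None:
        raise ValueError(f"not a cell reference: {name!r}")
    return col_to_index(match.group(1)), int(match.group(2))


def expand_range(ref: str) -> list[str]:
    """Expand "A1:A6", or a rectangle such as "A1:B3", into cell names,
    row by row."""
    start, end = ref.upper().split(":")
    (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
    return [
        f"{index_to_col(c)}{r}"
        for r in range(r1, r2 + 1)
        for c in range(c1, c2 + 1)
    ]


def range_sum(ref: str, cells: dict[str, float]) -> float:
    """SUM over a range; blank (missing) cells count as zero."""
    return sum(cells.get(name, 0) for name in expand_range(ref))


cells = {"A1": 5, "A2": 10, "A3": 15, "A5": 20, "A6": 25}  # A4 is blank
print(range_sum("A1:A6", cells))  # 75
```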

Programming issues

Just as the early programming languages were designed to generate spreadsheet printouts, programming techniques themselves have evolved to process tables (also known as spreadsheets or matrices) of data more efficiently in the computer itself.

Spreadsheets have evolved into powerful programming languages; specifically, they are functional, visual, and multiparadigm languages.

Many people find it easier to perform calculations in spreadsheets than by writing the equivalent sequential program. This is due to two traits of spreadsheets.

- They use spatial relationships to define program relationships. Humans have highly developed intuitions about space and about dependencies between items. Sequential programming usually requires typing line after line of text, which must be read slowly and carefully to be understood and changed.
- They are forgiving, allowing partial results and functions to work. One or more parts of a program can work correctly even if other parts are unfinished or broken. This makes writing and debugging programs easier and faster. Sequential programming usually needs every program line and character to be correct for a program to run; one error usually stops the whole program and prevents any result.

A spreadsheet program is designed to perform general computation tasks using spatial relationships rather than time as the primary organizing principle. Many programs designed to perform general computation use timing, the ordering of computational steps, as their primary way to organize a program. A well defined entry point is used to determine the first instructions, and all other instructions must be reachable from that point.

In a spreadsheet, however, a set of cells is defined with a spatial relation to one another.

It is often convenient to think of a spreadsheet as a mathematical graph, where the nodes are spreadsheet cells, and the edges are references to other cells specified in formulas. This is often called the dependency graph of the spreadsheet. References between cells can take advantage of spatial concepts such as relative position and absolute position, as well as named locations, to make the spreadsheet formulas easier to understand and manage.
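The dependency graph can be built by scanning each formula for cell references. A minimal Python sketch follows; the regular expression and `dependency_graph` are assumptions for illustration. It handles only plain or $-anchored references on the same sheet, and for a range such as A1:A6 it records only the two endpoints rather than every cell in between.

```python
import re

# Plain or $-anchored A1 references (e.g. A2, $B$7), not preceded by a
# letter or digit, so that function names such as LOG10 are not mistaken
# for references.
REF = re.compile(r"(?<![A-Z0-9])\$?([A-Z]{1,2})\$?([0-9]+)")


def dependency_graph(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Map each formula cell to the set of cells its formula references.

    Simplification: the endpoints of a range such as A1:A6 are recorded,
    but the cells between them are not.
    """
    return {
        cell: {col + row for col, row in REF.findall(formula.upper())}
        for cell, formula in formulas.items()
    }


graph = dependency_graph({"C2": "=A2+B2", "D2": "=A2*B2", "E2": "=SUM(C2:D2)"})
print(graph["E2"] == {"C2", "D2"})  # True
```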

Spreadsheets usually attempt to update cells automatically when the cells they depend on have changed. The earliest spreadsheets used simple tactics such as evaluating cells in a particular order, but modern spreadsheets compute a minimal recomputation order from the dependency graph. Later spreadsheets also include a limited ability to propagate values in reverse, altering source values so that a particular answer is reached in a certain cell. Since spreadsheet formulas are not generally invertible, however, this technique is of somewhat limited value.
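A minimal recomputation order can be sketched with Python's standard-library `graphlib`: first collect every formula cell downstream of the changed cell, then sort just those cells topologically so each is recomputed after the cells it depends on. The function `recalc_order` and the graph format (cell mapped to the set of cells it references) are assumptions for illustration; `TopologicalSorter` raises `CycleError` if the formulas contain a circular reference.

```python
from graphlib import TopologicalSorter


def recalc_order(graph: dict[str, set[str]], changed: str) -> list[str]:
    """Return the formula cells affected by `changed`, ordered so that
    every cell comes after all the cells it depends on."""
    # Reverse the edges: cell -> cells whose formulas reference it
    dependents: dict[str, set[str]] = {}
    for cell, refs in graph.items():
        for ref in refs:
            dependents.setdefault(ref, set()).add(cell)
    # Collect everything downstream of the changed cell
    dirty: set[str] = set()
    stack = [changed]
    while stack:
        for dep in dependents.get(stack.pop(), ()):
            if dep not in dirty:
                dirty.add(dep)
                stack.append(dep)
    # Sort only the dirty cells, keeping edges among them
    subgraph = {cell: graph[cell] & dirty for cell in dirty}
    return list(TopologicalSorter(subgraph).static_order())


graph = {
    "C2": {"A2", "B2"},
    "D2": {"A2", "B2"},
    "E2": {"C2", "D2"},
    "F9": {"Z1"},
}
print(recalc_order(graph, "A2"))  # C2 and D2 (in either order), then E2
```

Changing A2 leaves F9 untouched, which is the saving a minimal recomputation order provides over re-evaluating every cell.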

Many of the concepts common to sequential programming models have analogues in the spreadsheet world. For example, the sequential model of the indexed loop is usually represented as a table of cells, with similar formulas (normally differing only in which cells they reference).
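The correspondence can be made concrete in Python: the hypothetical helper below writes out, as one formula per cell, the column of near-identical formulas that stands in for a sequential loop computing C[i] = A[i] * B[i].

```python
def loop_as_formulas(first_row: int, last_row: int) -> dict[str, str]:
    """Write out the sequential loop
        for i in first_row..last_row: C[i] = A[i] * B[i]
    as one spreadsheet formula per cell. The formulas are identical
    except for the row they reference."""
    return {f"C{i}": f"=A{i}*B{i}" for i in range(first_row, last_row + 1)}


print(loop_as_formulas(1, 3))
# {'C1': '=A1*B1', 'C2': '=A2*B2', 'C3': '=A3*B3'}
```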

Shortcomings

While extremely popular, spreadsheets are not without their downsides. Some of the problems associated with spreadsheets include^[5]^[6]:

- While spreadsheets are effective at certain tasks, they are sometimes used for tasks to which they are not suited.^[7]^[8]^[9]
- Lack of auditing and revision control. This makes it difficult to determine who changed what and when, which can cause problems with regulatory compliance, among other things.
- Lack of security. Generally, anyone who has permission to open a spreadsheet has permission to modify any part of it. Combined with the lack of auditing, this can make it easy for someone to commit fraud.
- Lack of concurrency. Unlike databases, spreadsheets typically allow only one user to make changes at any given time.
- Because they are loosely structured, it is easy for someone to introduce an error, either accidentally or intentionally, by entering information in the wrong place or by expressing dependencies among cells (such as in a formula) incorrectly.^[10]^[11]
- The result of a formula (for example "=A1*B1") applies only to a single cell (the cell where the formula is located, in this case perhaps C1), even though the formula can draw data from many other cells and even the current date and time. This means that to perform a similar calculation on an array of cells, an almost identical formula (each residing in its own "output" cell) must be repeated for each row of the "input" array. This differs from a formula in a conventional computer program, which would typically be written once and then applied to each input in turn. With current spreadsheets, this forced repetition of near-identical formulas can have detrimental consequences from a quality assurance standpoint and is often the cause of spreadsheet errors. This last problem could conceptually be solved by permitting a new category of "spatially independent" formula, allowing the left-hand side (target) of the formula to be entered, combined with "indexed cell addressing" of the generic form:

WHILE COUNT(A1:A20) > 0, C(i) = A(i)*B(i)     where i is the row number, incremented from 1 to 20

This theoretical category of formula could reside anywhere within the spreadsheet since its target cell(s) are specified independently of their location in the spreadsheet. (However, for clarity, the "cloned" formula could optionally be shown in each target cell, any change to one affecting all its clones automatically, thereby reducing errors).

Or, to conform more closely to current spreadsheet syntax, perhaps:

=IF(COUNT(A1:A20) > 0, A(i)*B(i), "")

where the second parameter represents the formula to be applied to each occurrence. It would be entered only in the first cell, with the remaining cells displaying the cloned formula.

With the recent advent of remote data updates to cells, the need for conditional formulas of this type becomes more urgent, since the precise contents and extent of external spreadsheets may not be fully discernible before execution.
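The proposed "spatially independent" formula can be modelled in Python as a single function applied to every row index, which is roughly what a conventional program does. This is a hedged sketch of the idea, not an existing spreadsheet feature: `apply_to_rows` is hypothetical, blank cells are represented as `None`, and the `COUNT(A1:A20) > 0` condition is simplified to a per-row check that yields "" for incomplete rows.

```python
from typing import Callable, Optional


def apply_to_rows(
    a: list[Optional[float]],
    b: list[Optional[float]],
    formula: Callable[[float, float], float],
) -> list[object]:
    """Evaluate one formula for every row index i, like C(i) = A(i)*B(i).

    The formula is stored once; every target cell uses the same function,
    so changing `formula` changes all rows together. Rows where either
    input is blank (None) yield "" instead of a number.
    """
    return [
        "" if x is None or y is None else formula(x, y)
        for x, y in zip(a, b)
    ]


print(apply_to_rows([1.0, 2.0, None], [3.0, 4.0, 5.0], lambda x, y: x * y))
# [3.0, 8.0, '']
```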

While there are built-in and third-party tools for desktop spreadsheet applications that address some of these shortcomings, awareness of these is generally low, and usage lower still. However, many of these earlier shortcomings can be handled by online spreadsheets such as EditGrid and Google Docs.

Web based spreadsheets

The advent of advanced web technologies, such as Ajax and XUL, circa 2005 has propelled the emergence of a new generation of online spreadsheets. Equipped with a rich Internet application user experience, many of the web based online spreadsheets boast the same features seen in desktop spreadsheet applications. Some already surpass them, offering real time updates from remote sources such as stock prices and currency exchange rates.

References

^ One example cited recently in Wikipedia is the CICS-based Works Records System built by Imperial Chemical Industries in the early 1970s. Other innovative approaches to end-user computing were being pursued at Xerox PARC, MIT, Citibank, National CSS, and IBM.
^ portal.acm.org – APLDOT
^ Kay, Alan (September 1984). "Computer Software". Scientific American 251 (3): 52-59. – Value Rule
^ Burnett, Margaret; Atwood, J., Walpole Djang, R., Reichwein, J., Gottfried, H., and Yang, S. (March 2001). "Forms/3: A first-order visual language to explore the boundaries of the spreadsheet paradigm". Journal of Functional Programming 11 (2): 155-206. – spreadsheets as functional programming
^ Philip Howard (2005-04-22). Managing spreadsheets. IT-Directors.com. Retrieved on 2006-06-29.
^ Raymond R. Panko (2005-01). What We Know About Spreadsheet Errors. Retrieved on 2006-09-22.
^ Is Excel Budgeting a Mistake? Excel's critics say that Excel is fundamentally unsuited for budgeting, forecasting, and other activities that involve collaboration or consolidation. Are they correct?
^ Problems With Using Microsoft Excel for Statistics. http://www.cs.uiowa.edu/~jcryer/JSMTalk2001.pdf
^ Spreadsheet Addiction
^ Excel spreadsheets in School budgeting - a cautionary tale (2001)
^ Public reports of spreadsheet errors collated by the European Spreadsheet Risks Interest Group (EuSpRIG).

External links

General information

A Brief History of Spreadsheets by D.J. Power
A Spreadsheet Programming article on DevX
comp.apps.spreadsheets FAQ by Russell Schulz
Develop Training Simulations with Excel
Extending the Concept of Spreadsheet by Jocelyn Paine
Linux Spreadsheets by Christopher Browne; much general information on spreadsheets, and some on related Linux issues
Spreadsheets category on the Open Directory Project
Spreadsheet - Its First Computerization (1961-1964) by Richard Mattessich
"Spreadsheet Wars" - A classic video showing spreadsheet vendors going head-to-head in the late 80's .
CICS history and introduction of IBM 3270 by Bob Yelavich
Autoplan & Autotab article by Creative Karma

Research organisations

European Spreadsheet Risks Interest Group (EuSpRIG)

This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors.