XLF: The Extensible Log Format
Version 1.0
NOTE-XLF-19980721
XLF Working Group
21 July, 1998
Note: This draft is for review by XLF mailing list members.
This version
Latest version
Editors
Lisa Rein, finetuning.com
Gavin Nicol, Inso EPS
Principal Contributors
Don Park, Docuverse
Status
This document is a product of the members of the XLF mailing list. We will update this draft specification on a regular basis.
Please send detailed comments on this document to xlf-owner@cybercom.net. We cannot guarantee a personal response, but we will try to reply when it is appropriate.
Abstract
XLF (Extensible Log Format) is a set of DTD fragments, recommendations, and APIs intended to provide a complete, open, interoperable, and extensible logging infrastructure. (ED: Need more language here)
Languages used
English, OMG IDL, Java, XML
Table of Contents
1. Introduction
   1.1. Dumb data, smart scripts
   1.2. Server log interchange
   1.3. Objectives
2. DTD Fragments
   2.1. Notes on fragment design
      2.1.1. Verbosity
      2.1.2. Log Context
      2.1.3. Element Renaming
   2.2. Defined Fragments
      2.2.1. Declaration of time base
3. Recommendation for HTTP Log Files
Appendix A: Glossary
1. Introduction
Logging is part of almost every modern operating system in one form or another. Logs are used for tracking events that occur at runtime, and are often later used in the analysis of system performance, security breaches, and access patterns, to name but a few uses.
This specification defines the Extensible Log Format, an XML-based format for log data that is intended to make log information smarter and easier to work with than ever before. A brief analysis of the need for XLF follows.
1.1. Dumb data, smart scripts
Currently, administrators use any number of methods to derive information from their server logs, usually in the form of custom-built scripts. These scripts take "dumb" data and extract "intelligent" results from it (this is often talked about as "adding intelligence to data").
However, logs also have the potential of holding "intelligent" data, such that far more and better information can be logged than is currently possible. Intelligent log data combined with intelligent processing will lead to far more powerful analysis and reporting capabilities than ever before.
For example: a wealth of information can be (and often is!) obtained from HTTP server logs. Common usages today include counting the number of hits, deriving access patterns, and finding what percentage of downloads were broken off before they completed. Often, deep analysis of HTTP log data requires that sophisticated heuristics be applied to the data. For example, deriving access patterns necessitates analysis of the access times and hostid of every log entry. With more intelligent data, such heuristics would be unnecessary.
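As a purely illustrative sketch (none of the element or attribute names below are defined by this specification), an entry carrying such intelligent data might record the session, timing, and outcome explicitly, so that access patterns and broken downloads can be read directly from the data rather than inferred:
<request session="S-42" host="foo.example.com" tick="21353221"
         method="GET" uri="/spec.txt" status="200"
         bytes-sent="10240" bytes-expected="10240"/>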
Another example might be electronic commerce: a transaction page written in XML (say, the order page from amazon.com) might have its <total.price>, <customer.name>, <customer.address>, and other pieces of information logged, especially if that information can then be entered into a database automatically. XLF could play a key role in defining the model for distributed data-driven processes on the Web.
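A minimal sketch of such a logged transaction follows; the <order> wrapper, the currency attribute, and the sample values are assumptions for illustration only, and only the three element names mentioned above come from this discussion:
<order tick="21353221">
  <customer.name>Jane Doe</customer.name>
  <customer.address>123 Main Street, Anytown</customer.address>
  <total.price currency="USD">42.50</total.price>
</order>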
1.2. Server log interchange
One major problem alluded to earlier is that most tools for processing log data are custom built. This is partly due to the differing requirements for analysis, but is also certainly partly due to the myriad log formats found today. A single extensible log format, with a single syntax (XML), will at least result in a common infrastructure upon which log analysis tools can be built.
In addition, the log format could help coordinate distributed systems: the types of messages sent to a log are similar to those used to coordinate processes. Administration is another potential area where the format could play a role: SNMP and other such protocols involve the exchange of messages similar to those sent to logs.
1.3. Objectives
Work on XLF began based upon the conviction that the needs outlined in the introduction are real, and need to be addressed. To that end, the objectives of XLF are:
- To identify commonly occurring pieces of information in log files, and to model them as elements and attribute types. These elements and attributes will form part of the XLF Core specification, which will be the basis for log file formats based on XLF.
- To define the models such that they can easily be included in other formats (open containment), using only XML (XLF will be an XML application; no extensions to XML will be required).
- To model a sufficient set of data such that common Internet server logs (HTTP, FTP, proxies, etc.) can be modelled in XLF. Other logs, such as Yamaguchi logs, should also be considered.
- To provide guidelines on how to extend the specification to support specific logging applications (e.g. HTTP). Some recommendations for specific protocols might also be made.
- To provide, as far as possible, backward compatibility. For example, it should be possible to write log producer modules (like plugins) that convert legacy log formats into XLF and feed them into an XLF log service framework; a sketch of such a conversion follows this list. Under a language like Java, it should even be possible to migrate XLF plugins (producers, filters, and consumers) from the server to the client on demand.
- To specify APIs for adding data to a log and accessing data within a log.
- To provide a "proof of concept" implementation: Docuverse will be building an implementation in Java and will make it available freely (and for free ;-) like Free-DOM, and several of the initiative members have already expressed interest in designing a logging framework based on XLF.
- To encourage server companies as well as log analyzer companies to support the specification, so that we will have a truly universal log format.
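As a purely illustrative sketch of the backward-compatibility objective, a log producer module might translate a Common Log Format entry into an XLF-style entry roughly as follows (the <request> element and its attributes are placeholders invented for this example, not part of the specification):
<!-- legacy entry:
     foo.example.com - - [21/Jul/1998:12:00:00 +0000] "GET /spec.txt HTTP/1.0" 200 10240 -->
<request host="foo.example.com" method="GET" uri="/spec.txt"
         status="200" bytes="10240" tick="21353221"/>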
2. DTD Fragments
This section defines a number of DTD fragments that can be used to build an application-specific log file format. It also defines recommended log file formats for the more common log files, such as those for HTTP.
The fragments defined herein are all assumed to be defined within the XLF namespace.
2.1. Notes on fragment design
As with most things in software design, XLF is faced with a number of (at times) contradictory requirements. This section discusses some of the more important areas that affected the design of the XLF DTD fragments.
2.1.1. Verbosity
It is true that XML is more verbose than "normal" log formats at first glance. However, in a typical log format, each log entry must stand alone and thus specify all fields of an entry. Logs in XML can use inheritance so that repetitive information need not be duplicated. In addition, simple hyperlinks (ID/IDREF pairs) can be used to allow grouping of log information.
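For instance, information shared by many entries can be factored out into a single element and referred to through an ID/IDREF pair; the element and attribute names below are illustrative only, not defined by this specification:
<client id="C-1" host="foo.example.com" agent="Mozilla/4.0"/>
<request client="C-1" uri="/index.html" tick="21353221"/>
<request client="C-1" uri="/spec.txt" tick="21353950"/>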
Log size is also not a major concern if we assume XLF to be a data interchange format rather than a data storage format. For example, XLF may be used by log producers to send log information to log servers which host log filters and consumers, some of which could be smart enough to strip out unnecessary information according to administrator preference and compress it once a day or once an hour.
In addition, even as a raw storage format, compressed XML data generally takes up less than 20% more space than data formatted in a less verbose form. This is because the large amount of redundancy in an XML file compresses efficiently. This redundancy also has the positive effect of making an XLF log more robust in the face of data corruption.
One possible result of XLF will be the creation of a Log Server market. Server products inevitably generate logs, but most companies cannot afford to dedicate significant resources to log management and analysis, even though most administrators rely heavily on logs to keep tabs on systems. The Log Server market could be created with a standard Log Service Framework which allows plug-and-play log producers, filters, and consumers. Server companies benefit because they will be able to license quality log servers rather than having to build them. Network administrators benefit because they will not have to write custom scripts anymore.
2.1.2. Log Context
One problem with using XML as a log format is that log events are often asynchronous, while XML documents are not. For example, it is often the case that the start and end of a system action are interspersed with start and end events from other parts of the system. If the start and end correspond to XML start and end tags respectively, the generated log will not be well-formed XML, or, if it is, it will be semantically incorrect:
<xlf:event id="ID-1447410289"> ... <xlf:event id="ID-1980373498"> ... </xlf:event> <!-- id="ID-1447410289" --> </xlf:event> <!-- id="ID-1980373498" -->
There are a number of possible solutions to this problem, one of which is to model Log Events. In this case, all events are modelled as discrete elements that carry a session identifier along with them. Using the session identifier allows one to later process the log file to create more structured logs. The example above would look like the following using this method:
<xlf:start id="ID-1447410289"/>
...
<xlf:start id="ID-1980373498"/>
...
<xlf:end id="ID-1447410289"/>
<xlf:end id="ID-1980373498"/>
In general, this is not much better than existing log formats, except for the unification of syntax, so the XLF specification defines structured fragments. There are two primary assumptions behind this decision:
- That something akin to a log server will exist.
- That XLF is primarily to be used for interchange, not for direct storage. In cases where asynchronous storage is required, data will most likely be stored in a binary format that can then be converted to XLF.
2.1.3. Element Renaming
In order to provide a certain degree of flexibility in fragment reuse, a means of aliasing elements is provided (much in the spirit of architectural forms). XLF uses the xlf:fragment attribute on an element, not the element name, to decide what type of fragment it is. For example:
<xlf:resource filename="spec.txt"/>
is equivalent to
<file xlf:fragment="xlf:resource" filename="spec.txt"/>
This is accomplished by having each defined DTD fragment declare a #FIXED xlf:fragment attribute, so that if the fragments are used verbatim, the attribute is already declared. The same technique can be used to obviate the need to specify the attribute value in an application-specific log format too:
<!DOCTYPE log [
  <!ATTLIST file xlf:fragment CDATA #FIXED "xlf:resource">
]>
<log>
  <file filename="spec.txt"/>
</log>
This example is exactly equivalent to the earlier example using <file>.
2.2. Defined Fragments
2.2.1. Declaration of time base
<!ELEMENT xlf:timebase EMPTY>
<!ATTLIST xlf:timebase
  xlf:fragment CDATA "xlf:timebase"
  id     CDATA   #REQUIRED
  zone   CDATA   #REQUIRED
  year   NMTOKEN #REQUIRED
  month  NMTOKEN #REQUIRED
  day    NMTOKEN #REQUIRED
  hour   NMTOKEN #REQUIRED
  minute NMTOKEN #REQUIRED
  second NMTOKEN #REQUIRED
  tick   NMTOKEN #REQUIRED
  tps    NMTOKEN #REQUIRED>
The xlf:timebase element is used to declare the base time for the system, and should occur as one of the first parts of a log file. In a log format that includes a timebase, other elements can simply use ticks as the unit of measurement:
<download file="spec.txt" tick="21353221"/>
The attributes on xlf:timebase have the following meanings:
Name           Description
xlf:fragment   This is a #FIXED attribute that provides the basis for element renaming for this fragment.
id             TBD
zone           TBD
year           TBD
month          TBD
day            TBD
hour           TBD
minute         TBD
second         TBD
tick           TBD
tps            TBD
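As an illustration, a log using this fragment might begin as follows. The attribute values, the assumption that tps denotes ticks per second, and the <download> element are illustrative only and are not defined by this specification:
<log>
  <xlf:timebase id="TB-1" zone="GMT" year="1998" month="7" day="21"
                hour="0" minute="0" second="0" tick="0" tps="1000"/>
  <!-- 21353221 ticks at 1000 ticks per second is roughly 5.9 hours after the base time -->
  <download file="spec.txt" tick="21353221"/>
</log>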
3. Recommendation for HTTP Log Files
Appendix A: Glossary
log
    (ED: TBD)