XFLAIM Concepts

Welcome to XFLAIM. XFLAIM is a fast, flexible and reliable cross-platform database engine derived from FLAIM. The original FLAIM database engine was conceived with a view toward the flexibility and adaptability that is offered by an XML-like data model. As such, it was a logical step to move to a full XML-based engine. Various products have used FLAIM for nearly 15 years. For instance, Novell’s scalable, reliable directory and collaboration products, eDirectory and GroupWise, both use FLAIM as the data store, with user licenses totaling well into the hundreds of millions (as of December 2005). XFLAIM may well be considered the next generation of the FLAIM database engine. Most of the concepts that existed in FLAIM also exist in XFLAIM, with variations to support the XML/DOM model.

In the Summary of XFLAIM Features and XFLAIM Concepts sections of this document, we discuss a number of concepts that are specific to XFLAIM. Those who are familiar with the FLAIM database engine will notice close similarities between the two database engines. Also included is a more general discussion of what an XML database is. We also discuss XFLAIM and the DOM model, which is the API chosen for applications to store and retrieve XML documents.

XFLAIM is supported on the Windows NT4, Windows 2000, Windows XP, Red Hat Linux, AIX, Solaris, Mac OS X, and HP/UX platforms.

Summary of XFLAIM Features

The following is a brief summary of the features available in XFLAIM:

DOM Nodes and Documents

Documents are stored as DOM nodes.
All element, attribute, and data nodes have a name id tag.
Each DOM node can contain up to 4 gigabytes of data.
Data types include text (Unicode and UTF-8), numeric, and binary.

Collections

Documents are stored in collections.
There may be multiple collections per database.
Collections allow data to be logically partitioned.

Indexing

Compound indexes. Key components may be any XFLAIM data type.
Optional and/or required nodes in compound indexes (a key is not generated if required nodes are missing).
Presence indexes (indexes the existence of a node rather than its content).
Case insensitive and case sensitive collation.
White space compression and other special key-generation rules.
Cross-document type indexes.
Substring indexing.
Each-word indexing.
Approximate indexing (Metaphone).
Support for many international languages and collating sequences, including Arabic, Hebrew, and Asian (Japanese, Korean, Chinese).
Each index in a database can have its own international language.
Keys up to 1024 bytes long, key truncation supported.
Multiple indexes per collection.
APIs for reading indexes directly.
Indexes are dynamically updated when nodes are added, modified, or deleted.
Indexes can be built in the background.
Indexes can be taken off-line (suspend) and later resumed.

Dynamic Dictionary

Add, modify, and drop index, collection, element, attribute, prefix, and encryption definitions.

Query Capabilities

XPATH is used as the query language.
Rich set of query expression operators:

Comparison operators (equal, not equal, less than, less than or equal, greater than, greater than or equal). Text comparison operators include wild card matching, allowing for match begin, match end, and substring (contains) searching.
Arithmetic operators (unary minus, multiply, divide, mod, plus, minus).
Logical operators (not, and, or).
Parentheses (used to alter normal operator precedence).

Advanced query optimization (XFLAIM will automatically select indexes, etc. based on least cost estimation).
Index specification. The application may explicitly specify an index to use.
Powerful navigational calls for retrieving and browsing through query results (first, last, next, previous, and current node/document).

Read and Update Operations

Ability to retrieve nodes directly from collections by 64-bit node ID.
Index keys can be read directly.
Advanced querying capabilities are supported via XPATH.
Add, modify, and delete operations are supported.

Transactions

Transaction begin, commit, abort. Use of rollback log for transaction abort and for recovery after a crash.
Transaction types:

Update. Update, read, and query operations allowed.
Read. Only read and query operations allowed. Read transactions provide a read consistent snapshot of the database as of the point in time the transaction is started.

Automatic rollback of failed transactions (due to program aborts or server crashes).
Periodic checkpoints to minimize recovery time after a system crash.
No limit on size of update transactions.
ACID principles are fully supported (Atomicity, Consistency, Isolation, Durability).
Group commit allows multiple update transactions to be committed to disk at once, which improves update performance.

Roll-forward Logging

Roll-forward logging is used to minimize data that is written to commit a transaction.
The roll-forward log is also used in automatic recovery after a crash.
Multiple roll-forward log files may be used to support the “continuous backup” feature. Files are numbered sequentially and are also identified with serial numbers to guarantee proper sequencing. Up to 4 billion log files supported, resulting in practically unlimited capacity.
Option to use only a single roll-forward log file for applications that do not need continuous backup.
Roll-forward log files may be stored on a separate disk volume.
Minimal transaction logging. Only deltas logged for modifies and only node identifiers are logged for deletes.
Optionally, aborted transactions can be logged for debugging purposes.

Database Reliability and Recovery

Database recovery after a system crash is automatic. The rollback log is used to return the database to last checkpointed state. The roll-forward log is used to “redo” transactions that were committed since the last checkpoint.
Recovery is idempotent. That is, if a crash occurs during recovery, recovery is resumed when the database is subsequently opened.
Reliability has been tested using an automated “pull-the-plug” test, which randomly cycles the power on the server during high volume updates. Thousands of “pull-the-plug” iterations have been performed on Windows, Unix, and Linux.
Disk-full conditions and other disk errors are handled gracefully. The database will “stall” new update transactions until the disk error is resolved, thus avoiding a shut down of the database engine.
Customers can take hot backups and put roll-forward logs on a separate volume to protect against media failures. With this protection, two simultaneous disk failures would be required to lose any data.
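
The two-phase recovery described above (undo to the last checkpoint, then redo committed transactions) can be sketched with a toy model. This is only an illustration, not XFLAIM code: the function name recover, the block map, and the log layouts are all hypothetical, and the roll-forward log here holds whole block values for simplicity, whereas the real log records logical operations and deltas.

```python
def recover(blocks, rollback_log, roll_forward_log):
    """Return the recovered block map (toy model, not XFLAIM code).

    rollback_log: (block_id, before_image) pairs written since the last checkpoint.
    roll_forward_log: committed transactions, each a list of (block_id, new_value).
    """
    recovered = dict(blocks)
    # Step 1: undo. Restore before-images, newest first, so that the oldest
    # before-image of each block (its checkpointed state) wins.
    for block_id, before_image in reversed(rollback_log):
        recovered[block_id] = before_image
    # Step 2: redo. Re-apply every transaction committed after the checkpoint.
    for txn in roll_forward_log:
        for block_id, new_value in txn:
            recovered[block_id] = new_value
    return recovered


# Checkpointed state was {1: "a", 2: "b"}. One transaction committed (block 1),
# and a second one was still in progress (block 2) when the power failed.
on_disk_at_crash = {1: "a1", 2: "b-partial"}
rollback_log = [(1, "a"), (2, "b")]
roll_forward_log = [[(1, "a1")]]

once = recover(on_disk_at_crash, rollback_log, roll_forward_log)
twice = recover(once, rollback_log, roll_forward_log)  # crash during recovery, then retry
```

In this model, running recovery a second time over an already recovered state produces the same result, which is the idempotence property mentioned in the list.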

Concurrency

One writer, multiple readers.
Readers don't block writers because they NEVER lock items in the database.
Writers don't block readers.
Readers get a virtual snapshot of the database. The rollback log is used to provide block multi-versioning.
Uncommitted data is not visible to other transactions.
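
As a rough illustration of how readers can see a stable snapshot without locking, the sketch below keeps every committed version of a block and lets a reader pick the newest version that is not newer than its snapshot. The class VersionedStore and its methods are hypothetical; XFLAIM obtains older block versions from the rollback log rather than from an in-memory list.

```python
class VersionedStore:
    """Toy block store with multi-versioning (not XFLAIM code)."""

    def __init__(self):
        self.committed_txn = 0
        self.versions = {}  # block_id -> list of (txn_id, value), oldest first

    def write(self, changes):
        """Commit one update transaction (there is only ever one writer)."""
        self.committed_txn += 1
        for block_id, value in changes.items():
            self.versions.setdefault(block_id, []).append((self.committed_txn, value))
        return self.committed_txn

    def begin_read(self):
        """Start a read transaction; the snapshot is the last committed txn."""
        return self.committed_txn

    def read(self, snapshot, block_id):
        """Return the newest version of the block visible to the snapshot."""
        for txn_id, value in reversed(self.versions.get(block_id, [])):
            if txn_id <= snapshot:
                return value
        return None


store = VersionedStore()
store.write({1: "v1"})
old_snapshot = store.begin_read()
store.write({1: "v2"})            # the writer is not blocked by the reader
new_snapshot = store.begin_read()
```

The reader holding old_snapshot keeps seeing "v1" for block 1, while a reader that starts later sees "v2".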

Caching

A block cache is shared by all threads in a process. XFLAIM supports up to 4 GB of cache on 32-bit platforms and much more on 64-bit platforms.
Document node cache.
Cache poisoning prevention.
Memory fragmentation prevention via smart management of cache and node allocations.
Cache statistics can be queried, and include hits, faults, hit looks, and fault looks.

Optimized Disk Reading / Writing

Direct I/O allows file system cache to be bypassed.
Asynchronous writes.
Cache blocks are written in ascending order to optimize disk head movements. Adjacent blocks are coalesced into larger write buffers for improved performance.
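
The ordering and coalescing of writes can be pictured with a small sketch. The function coalesce_writes is a hypothetical helper, not part of XFLAIM; it sorts dirty block numbers and merges adjacent ones into (first, last) runs, each of which could then be issued as a single larger write.

```python
def coalesce_writes(dirty_block_numbers):
    """Group dirty blocks into ascending runs of adjacent block numbers."""
    runs = []
    for block in sorted(set(dirty_block_numbers)):
        if runs and block == runs[-1][1] + 1:
            runs[-1][1] = block          # extend the current run
        else:
            runs.append([block, block])  # start a new run
    return [tuple(run) for run in runs]


write_plan = coalesce_writes([7, 3, 4, 9, 8, 5])
```

Here six dirty blocks collapse into two writes, covering blocks 3 through 5 and blocks 7 through 9.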

Database Validation and Repair

Routines for checking the physical and logical structure of a database are provided. Links between blocks, the B-Tree structure, block checksums, node/document structure, index keys/reference sets, and data in nodes are verified. Damaged indexes can be fixed on-line if problems are encountered during the check.
Routines for repairing a database allow data recovery from severely damaged databases.
Progress and status callbacks are possible with all check and repair routines. This allows the application to display progress and cancel the operation if desired. Corruptions are also reported via the callbacks so that an application can create a detailed log of corruptions found if desired.

Backup / Restore

Hot backup. Backups can be performed without taking the database offline and without stopping updates.
Continuous backup. Roll-forward logs can be managed in a way that allows them to serve as a “continuous” backup of the database. No committed transaction will be lost.
Incremental backups. This minimizes what must be backed up - only blocks changed since last backup.
Backup and restore use flexible streaming interfaces to allow the application to efficiently select and manage the backup media. For example, an application could even choose to send backup data across a network to be stored on a remote device. XFLAIM uses double buffering so that an output device can be kept busy while XFLAIM is fetching the next set of blocks to backup. This helps prevent streaming devices (such as tape drives) from stalling.
All blocks in backup include a checksum to ensure that data is reliable when restored.
Simple block compression used to minimize size of backup.
Use of serial numbers in roll-forward log files and backups to ensure “identifiability” when restoring. Database also has a serial number.
Restore from full backup, multiple incremental backups, and/or roll-forward logs - all in one call.
Status callbacks are supported during backup and restore operations, allowing the application to report progress and/or abort the backup or restore operation.
Partial restore of a database is supported. An application has the option of stopping a restore operation after either: 1) a full backup or incremental has been restored, or 2) after a particular transaction in the roll-forward log has been re-played.

Database Monitoring / Statistics Collection

Ability to collect detailed statistics on disk I/O activity and transaction activity.
Ability to monitor cache utilization, including bytes used, number of blocks and nodes cached, cache hits, faults, etc.
Ability to collect detailed information about queries. This includes the ability to see which indexes are used, how many keys are fetched, how many nodes are fetched, how many nodes failed the criteria, etc. This allows analyzing of query efficiency and troubleshooting of query performance problems.

Checksums

Block checksums are set on all blocks in the database when writing to disk and are verified whenever blocks are read from disk.
The checksums are used to automatically detect database inconsistencies.
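
A minimal sketch of the set-on-write, verify-on-read pattern follows. The checksum algorithm XFLAIM actually uses is not stated here, so CRC-32 from Python's zlib module stands in as an assumption, and the helper names write_block and read_block are hypothetical.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte checksum before it goes to disk."""
    checksum = zlib.crc32(payload)  # CRC-32 is an assumption, not XFLAIM's algorithm
    return checksum.to_bytes(4, "big") + payload


def read_block(stored: bytes) -> bytes:
    """Verify the stored checksum and return the payload."""
    expected = int.from_bytes(stored[:4], "big")
    payload = stored[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("block checksum mismatch")
    return payload


good = write_block(b"node data")
damaged = good[:-1] + bytes([good[-1] ^ 0xFF])  # flip the bits of the last byte
```

Reading good returns the original payload, while reading damaged raises an error instead of silently returning corrupt data.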

Cross Platform

Database files are binary portable across ALL supported platforms. There is no need for explicit conversions when moving a database from one platform to another. The platform where the database is created determines whether a little-endian or big-endian storage format will be used for database metadata. If a database is moved to a platform with a different endian format, conversions happen automatically as needed. Thus, a database that was originally created on a little-endian platform and subsequently moved to a big-endian platform can gradually migrate to the big-endian format over time.
Supported platforms include Windows (NT, 2000, XP-64 bit), UNIX (Solaris, AIX), Tru64, Linux, and Mac OS X.
Source code is written in the C++ programming language (one source for all platforms), which makes it easy to build XFLAIM libraries for other platforms. The 64-bit Windows port of XFLAIM was completed in less than a week.
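
To make the byte-order point above concrete, the sketch below reads the same 32-bit value from little-endian and big-endian storage by consulting a flag that records how the value was written. The function read_u32 and the flag are hypothetical; the actual layout of XFLAIM metadata is not described here.

```python
import struct


def read_u32(raw: bytes, stored_little_endian: bool) -> int:
    """Decode a 4-byte unsigned integer using the byte order it was stored in."""
    fmt = "<I" if stored_little_endian else ">I"
    return struct.unpack(fmt, raw)[0]


value = 0x01020304
little = struct.pack("<I", value)  # bytes as written on a little-endian platform
big = struct.pack(">I", value)     # bytes as written on a big-endian platform
```

Because the reader knows which format was used, both byte strings decode to the same number on any platform, and no up-front conversion of the file is needed.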

Utilities

Database checking utility (checkdb).
Database rebuild utility (rebuild).
Database browser and editor utility (xshell, DOMEdit). Provides support for retrieving, adding, modifying, and deleting documents and individual nodes.
Low-level physical structure viewer/editor (view).
Automated test utility (dart).
All utilities build and work on all platforms and have the same look and feel.

Database Size

Database may grow up to 8 terabytes or 4 terabytes (depends on platform). Up to 4096 files may be created. Each file is limited to approximately 4GB.
Number of documents up to 2⁶⁴ - 1 (64 bits) per collection.
Database grows as needed. No need to preallocate disk space.
Routine for re-claiming unused database blocks and log areas and returning them to the host file system. Space may be reclaimed without taking the database off-line.
Database block size can be set on database creation to 4, 8, 16, or 32 KB.
Sophisticated block splitting and block combining to maximize block utilization.
Roughly 80% utilization in index blocks.
Roughly 80-95% utilization in data blocks.

Testing

Automated testing randomly varies parameters and calls to aggressively test millions of possible combinations of usage.
Simulations involving a large variety of random combinations of operations and data.
Multiple continuous runs of days and weeks on multiple machines for high volume concurrency testing.
Automated power failure testing to test database reliability and recovery.

XFLAIM Concepts

Nodes

The most basic unit of information in an XFLAIM database is a node. A node consists of a node type (element, attribute, data, annotation, comment, etc.), an optional name identifier (for elements and attributes), a data type (this is an extension to standard XML, which only supports text data), and an optional value. A node’s name identifier typically conveys the creator’s intended meaning (or semantics) for the node. It provides the context for interpreting and clarifying the content. For example, city is the context for Denver (the content). Every node in an XFLAIM database collection is identified by a 64-bit unsigned integer called the NodeId. The NodeId is guaranteed to be unique within a collection. Zero is not a valid NodeId.

Node Type

In XML there are many node types (element, attribute, comment, etc.). XFLAIM currently supports document, element, attribute, data, comment, cdata section, and annotation nodes. The data node type is, in reality, an expansion of the XML text type, allowing types other than text (numbers and binary) to be stored.

Name Identifier

In XFLAIM, a node’s name and namespace are mapped to a number to allow compact node storage. XFLAIM provides interfaces that allow an application to retrieve the name and namespace strings associated with the numeric ID.

Node Data Type

A node’s data type (e.g., text, number, binary, etc.) defines intrinsic characteristics that are applicable to the node’s value. As such, a node’s data type tells XFLAIM how to store, index, validate, and manipulate the node’s value.

Node Value

A node’s value is always variable length, regardless of the data type. When storing a node’s value, only the actual space required is allocated. Although XFLAIM stores information per node not normally found in traditional databases (including node ID, etc.), great care has been taken to minimize the per-node overhead. The savings in disk space translates directly into performance benefits, because it takes fewer disk accesses to retrieve nodes.

Node Identifier

Within a collection, nodes are uniquely identified by unsigned 64-bit IDs (NodeId). All numbers from 1 through 18,446,744,073,709,551,615 (0xFFFFFFFFFFFFFFFF) are valid candidate NodeIds. Note that zero is not valid. Once assigned, a node will always be associated with the same NodeId. When creating a node, XFLAIM assigns the NodeId. XFLAIM keeps track of the highest NodeId ever assigned within a collection and assigns the next NodeId following the highest NodeId. Even if ALL of the nodes in a collection are deleted, previously consumed NodeIds are NOT reused by XFLAIM.
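
The assignment rule can be modelled with a simple high-water-mark counter. The class below is a hypothetical illustration, not XFLAIM code; it only shows that IDs start at 1 and that deleting nodes never makes an ID available again.

```python
class NodeIdAllocator:
    """Toy model of per-collection NodeId assignment (not XFLAIM code)."""

    MAX_NODE_ID = 0xFFFFFFFFFFFFFFFF  # largest unsigned 64-bit value

    def __init__(self):
        self.highest_assigned = 0  # zero itself is never handed out

    def next_id(self):
        if self.highest_assigned >= self.MAX_NODE_ID:
            raise OverflowError("NodeId space exhausted")
        self.highest_assigned += 1
        return self.highest_assigned


alloc = NodeIdAllocator()
live_nodes = {alloc.next_id() for _ in range(3)}  # NodeIds 1, 2, and 3
live_nodes.clear()                                # delete every node in the collection
id_after_delete = alloc.next_id()                 # continues after the high-water mark
```

After all three nodes are deleted, the next node still receives NodeId 4 rather than 1.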

Documents

Application data stored in XFLAIM is organized into XML documents. A document is generally used to represent an object or a concept in the real world (a product, a customer, an employee, a business division, etc.). As such, a document consists of a collection of nodes that represent information about the object.

In XFLAIM, as in basic XML, there is no requirement that documents conform to a pre-defined template. XFLAIM supports the creation of arbitrarily structured documents. The creator of an arbitrarily structured document is not only allowed to determine the contents of attributes within the document but is allowed to determine the structure of the document as well.

Hierarchical Structure

In an XFLAIM document, as with XML, a node can be placed subordinate to another node. The nodes are then said to have a parent/child relationship. A node may have at most one parent node. Nodes that have the same parent are said to be siblings.
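
The same parent, child, and sibling relationships can be explored with any DOM implementation. The sketch below uses Python's standard xml.dom.minidom module purely as a stand-in; it is not the XFLAIM API, and the sample document is made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<customer><name>Ann</name><city>Denver</city></customer>"
)
root = doc.documentElement        # the <customer> element
name, city = root.childNodes      # its two child elements, in document order

parent_of_name = name.parentNode       # <customer>
sibling_of_name = name.nextSibling     # <city>
city_text = city.firstChild.data       # text content of <city>
```

Here name and city share the parent customer, so they are siblings, and each has exactly one parent.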

Collections

Most databases provide some means for defining collections of records. For example, in relational databases, a collection of records is represented as a table. In XFLAIM, documents are organized into collections. Each collection may hold a heterogeneous group of documents, meaning that a collection may store many different types of documents. Collections may be added to a database at any time. A collection may also be deleted at any time.

Document Identifier

Every document in a collection has a root node, with the NodeId of the root serving as both the root's ID and the document ID. It is interesting to note that, unlike traditional XML, XFLAIM does not require a document to have a document node as its root node. A document in XFLAIM can have either a document node or an element node as its root.

Predefined Collections

When a new database is created, two default collections are added automatically. These have special purposes and cannot be removed.

Dictionary Collection

Within every database, XFLAIM maintains a dictionary collection that keeps track of all element, attribute, index, collection, prefix, and encryption definitions. The collection number for the local dictionary is defined in xflaim.h as XFLM_DICT_COLLECTION.

Default Data Collection

The default data collection is provided as a default place for storing documents. XFLAIM makes no internal use of this collection. The collection number of the default data collection is defined in xflaim.h as XFLM_DATA_COLLECTION.

Indexes

The task of finding documents or nodes can be accomplished by sequentially scanning a collection until the desired documents or nodes are located. However, in collections with large numbers of nodes, this may be extremely slow. Indexes are provided as a means for finding documents more efficiently.

In XFLAIM, an index is associated with one collection, but each collection may have many indexes. Indexes may be created or dropped at any time. An index is essentially a set of keys that are arranged in a way that significantly speeds up the task of finding any particular key within the index. Index keys are constructed by extracting the contents of one or more nodes from documents. Each key in an index references the nodes from which it was constructed. Note that a compound key references more than one node, one for each component of the key. Every key also has associated with it the particular document identifier of the document from which it was constructed.

Background Indexing

XFLAIM allows indexes to be added in the background. When this option is selected, a thread is created that scans documents within the target collection. If foreground update activity is occurring, only a small number of documents are selected for each iteration of the background thread. If foreground activity is minimal, the background thread is much more aggressive. Once all of the collection's documents have been indexed, the new index will come on-line automatically and will be available for use.

Suspending Indexes

XFLAIM also allows indexes to be taken off-line (suspended). This makes an index unavailable for use and causes XFLAIM to stop updating the index's keys when new documents and nodes are added to the database. Suspending indexes can be useful during a batch load of documents because update overhead is reduced.

Resuming Indexes

Of course, a suspended index can be resumed. When a resume of a suspended index is requested, XFLAIM starts a background indexing thread to bring the index up-to-date. Only documents that were added to the database after the index was suspended are scanned. All other documents will have been indexed. When the background thread completes its work, the index is brought back on-line automatically.
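
One way to picture the resume step is a scan that starts from a remembered high-water mark. The sketch below assumes, purely for illustration, that the suspended index records the highest document ID it had indexed; the function resume_index and its data layout are hypothetical and say nothing about how XFLAIM tracks this internally.

```python
def resume_index(index, documents, high_water_mark):
    """Index only documents whose ID is above high_water_mark.

    index: key -> set of document IDs; documents: document ID -> indexed value.
    Returns the new high-water mark.
    """
    for doc_id in sorted(documents):
        if doc_id > high_water_mark:
            index.setdefault(documents[doc_id], set()).add(doc_id)
            high_water_mark = doc_id
    return high_water_mark


documents = {1: "Denver", 2: "Provo"}
index = {"Denver": {1}, "Provo": {2}}
mark = 2                    # the index is suspended at this point

documents[3] = "Orem"       # batch load while the index is off-line
documents[4] = "Denver"

mark = resume_index(index, documents, mark)  # only documents 3 and 4 are scanned
```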

Dictionary

The overall design and logical structure of a database is often called the database schema. In XFLAIM, database schemas are specified by a set of definition documents, which are stored in a special collection called the dictionary collection. Thus, a dictionary is a repository of definitive and descriptive metadata (i.e., “data about data”) that provides information about the overall design and logical structure of the database. It provides the information needed by XFLAIM to properly store, retrieve, and index application data. Dictionary definition documents may be constructed using the same methods that are used to construct user data documents, but their specific structures, syntaxes, and semantics are predefined.

Dictionary definition documents can be dynamically added, modified, and deleted (with some restrictions) using the same APIs that are used to add, modify, and delete user data documents. Indexes, for example, may be dynamically added or deleted by simply adding or deleting the appropriate index definition documents.

Dictionary Definition Name Attribute

The dictionary defines XML elements, attributes, prefixes, indexes, collections, and encryption schemes. Every definition document within the dictionary has a numeric identifier and a corresponding name. Each definition document is assigned a numeric attribute, referred to as its dictNum, when it is added to the collection (either by XFLAIM or by the application). A definition document’s dictNum becomes the definition document’s name attribute or nameId. The numeric identifier is a 32-bit integer. For collection and index definitions, valid dictNum values range from 1 to 65500 inclusive. For all other definition documents, the dictNum value ranges from 1 to 4294967295 (0xFFFFFFFF) inclusive.

A definition document’s name is always assigned by the application. It is part of the syntax of the specific type of definition being created (see Appendix A for a complete specification of dictionary definition syntaxes).

A dictionary definition's assigned numeric identifier is used in XFLAIM’s APIs to reference the defined item. For example, an element whose nameId is 12 is referenced using the numeric value 12; an index whose nameId is 15 is referenced using the numeric value 15; and so forth. Some applications may desire to use names instead of numbers. To support this method of access, XFLAIM provides interfaces that allow an application to list names of items in the database (element names, index names, etc.), as well as interfaces that map a name to the corresponding numeric identifier and vice versa.
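
The name-to-number mapping and the dictNum ranges given earlier can be sketched as a small lookup table. The class NameTable and its methods are hypothetical and are not XFLAIM interfaces; keeping a separate number space per definition type is also an assumption made for the example.

```python
class NameTable:
    """Toy two-way map between definition names and dictNum values."""

    MAX_COLLECTION_OR_INDEX_NUM = 65500
    MAX_OTHER_NUM = 0xFFFFFFFF

    def __init__(self):
        self._by_name = {}
        self._by_num = {}

    def define(self, def_type, name, dict_num):
        # Collections and indexes have a smaller valid range than other types.
        limit = (self.MAX_COLLECTION_OR_INDEX_NUM
                 if def_type in ("collection", "index")
                 else self.MAX_OTHER_NUM)
        if not 1 <= dict_num <= limit:
            raise ValueError(f"dictNum {dict_num} out of range for {def_type}")
        self._by_name[(def_type, name)] = dict_num
        self._by_num[(def_type, dict_num)] = name

    def num_for(self, def_type, name):
        return self._by_name[(def_type, name)]

    def name_for(self, def_type, dict_num):
        return self._by_num[(def_type, dict_num)]


table = NameTable()
table.define("element", "city", 12)
table.define("index", "city_ix", 15)
```

An application that prefers names would call something like num_for once and then use the number in subsequent calls.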

All elements, attributes, prefixes, indexes, collections, and encryption schemes must be defined in the dictionary before they can be used. However, if desired, XFLAIM can automatically add the necessary definitions to the dictionary when importing XML files into a collection.

Definition Document Types

The types of definition documents that are available in XFLAIM are: 1) element definitions, 2) attribute definitions, 3) prefix definitions, 4) collection definitions, 5) index definitions, and 6) encryption definitions.

Data Types

A data type defines intrinsic characteristics that are applicable to an element or attribute value. As such, the data type tells XFLAIM how to store, index, validate, and otherwise manipulate the value. XFLAIM intrinsically defines certain data types to enable it to perform essential operations on that type of data. Currently, there are three fundamental data types in XFLAIM.

Number

This data type encompasses 64-bit signed and unsigned integers.

Text

The text data type provides support for UTF-8, 16-bit Unicode, and platform-native text (ASCII). The size of a text value is limited to 4 gigabytes.

Binary

The binary data type is used when storing raw binary data. XFLAIM makes no attempt to interpret binary data. An application can choose to have XFLAIM index binary data, but sorting is done by performing a simple byte-for-byte comparison of the data. The size of the binary data is limited to 4 gigabytes.
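
A byte-for-byte comparison orders values by the first differing byte, treating each byte as an unsigned number, with a shorter value sorting before any longer value that it prefixes. Python's bytes type happens to compare the same way, so it can illustrate the resulting order; the sample keys are made up.

```python
keys = [b"\x02", b"\x01\xff", b"\x01", b"\x00\x7f"]

# sorted() on bytes compares one byte at a time, left to right.
ordered = sorted(keys)
```

The result places b"\x00\x7f" first, then b"\x01", then b"\x01\xff", and finally b"\x02".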

Element Definitions

An element’s name and data type are specified in an element definition. The element’s data type tells XFLAIM how an element’s data is to be stored, used, converted, and collated (if indexed). XFLAIM also ensures that the data type of an element in a document matches the type specified in the element definition.

An element definition document may be added or modified at any time. When modifying an element definition, only the element’s name may be changed. It is illegal to change the element’s data type. An element definition may be deleted after XFLAIM has verified that there are no instances of the element in the database.

Attribute Definitions

An attribute’s name and data type are specified in an attribute definition. The attribute’s data type tells XFLAIM how an attribute’s data is to be stored, used, converted, and collated (if used in an index). XFLAIM also ensures that the data type of an attribute matches the type specified in the attribute definition.

An attribute definition document may be added or modified at any time. When modifying an attribute definition, only the attribute’s name may be changed. It is illegal to change the attribute’s data type. An attribute definition may be deleted after XFLAIM has verified that there are no instances of the attribute in the database.

Prefix Definitions

Prefix definitions define XML prefixes that may be referenced by elements and attributes. The prefix's name is specified in the prefix definition. A prefix definition document may be added or modified at any time. When modifying a prefix definition, only the prefix’s name may be changed. A prefix definition may be deleted after XFLAIM has verified that there are no references to the prefix in the database.

Collection Definitions

Collection definitions are used to create additional collections for application data. At present, the only information specified in a collection definition is the collection’s name.

A collection definition document may be added, modified, or deleted at any time. The only thing that can be changed in a collection definition is the collection’s name. A collection definition document may not be deleted if there are collection-specific indexes still defined on the collection. When a collection is deleted, all documents in the collection are automatically deleted.

Index Definitions

An index definition specifies the nodes to be indexed and various indexing options.

An index definition may be added, modified, or deleted at any time. Adding or modifying an index definition causes the index to be generated or re-generated in the foreground or background. If the index is generated in the foreground, the add or modify operation will not return until the index has been built. If the data set being indexed is large, the operation could take a significant amount of time. The disadvantage to allowing a large index to be built in the foreground is that all other update operations are held off until the index has been generated.

If the index is generated in the background, XFLAIM will create a background thread that will build the index by starting and committing a series of small update transactions. Each of these transactions will scan a portion of the documents being indexed and will generate the corresponding index keys. Once the thread has visited all documents within the scope of the index, the index is automatically brought on-line. Until the index comes on-line it is unavailable for use by the application.

Deleting an index definition from the dictionary causes the index to be removed and all blocks previously allocated to the index are put into a free list for re-use.

It is important to note that once an index is on-line, XFLAIM automatically keeps it up-to-date as documents are added, modified, or deleted. When a document is added to a collection, XFLAIM scans the nodes in the document and adds the necessary keys and references to all appropriate indexes. When a document is modified, XFLAIM scans the old version of the document and the new version. After scanning the old and new documents, XFLAIM adds or deletes keys and references in the appropriate indexes. When a document is deleted from a collection, XFLAIM scans the nodes in the document and deletes the necessary keys and references from the appropriate indexes.
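
The modify case amounts to comparing the keys produced by the old and new versions of a document. The sketch below is a simplified model with hypothetical names (update_index, a dictionary from key to document IDs); it is not how XFLAIM stores its indexes.

```python
def update_index(index, doc_id, old_values, new_values):
    """Apply the key differences between two versions of one document.

    index: key -> set of document IDs that reference the key.
    """
    old_keys, new_keys = set(old_values), set(new_values)
    for key in old_keys - new_keys:       # keys that no longer apply
        index[key].discard(doc_id)
        if not index[key]:
            del index[key]
    for key in new_keys - old_keys:       # keys that are newly required
        index.setdefault(key, set()).add(doc_id)


index = {"Denver": {1}, "Provo": {1, 2}}
update_index(index, 1, old_values=["Denver", "Provo"], new_values=["Provo", "Orem"])
```

Keys that appear in both versions are left untouched. Adding a document is the special case of an empty old version, and deleting one is the special case of an empty new version.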

Index Types

XFLAIM supports indexing of all element and attribute nodes. Several different types of indexes are supported.

Single Node Index

A single node index indexes all occurrences of either an attribute or element. This type of index is useful when documents need to be retrieved based on the value of a single node. The index can help when using operators such as <, <=, >, or >=.

Compound (Multi-Node) Index

A compound index is one in which the values of multiple nodes are effectively concatenated to create a single key in the index. The nodes are concatenated in the order they are specified in the index definition. In a compound index, each component node is said to be either required or optional. A required component is one that must be present for a key to be generated. An optional component is one that is not required to be present.

A very useful property of a compound index in a query is that the result set produced by the query will be streamed back to the application according to the order of the index keys. For example, an index created on last name followed by first name would be useful in a phone book or directory application. Although XFLAIM has the ability to perform a cost-based analysis to determine the best index(es) for use in a query, it is sometimes advantageous to use a less optimal index if it provides the desired sort order. For this reason, XFLAIM allows the application to override the index selection during query optimization.
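
The required and optional component rules, and the ordering that a compound key provides, can be sketched as follows. The function compound_key and the dictionary-based documents are hypothetical simplifications; real keys are built from node values according to the index definition.

```python
def compound_key(doc, components):
    """Build a key tuple, or return None if a required component is missing.

    doc: name -> value; components: list of (name, required) in key order.
    """
    parts = []
    for name, required in components:
        value = doc.get(name)
        if value is None and required:
            return None          # no key is generated for this document
        parts.append(value)      # a missing optional component is stored as None
    return tuple(parts)


components = [("last", True), ("first", True)]
people = [
    {"last": "Smith", "first": "Bob"},
    {"last": "Jones", "first": "Zed"},
    {"first": "Ann"},                     # no last name, so no key
    {"last": "Smith", "first": "Al"},
]
keys = sorted(k for k in (compound_key(p, components) for p in people) if k)
```

Iterating over keys yields entries ordered by last name and then by first name, which is the phone-book order described above.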

Index Options

There are several options that may be specified when defining an index.

Node Paths

When specifying the nodes that are to be indexed, the user may opt to specify a node path rather than a simple node identifier. A node path is a list of nodes that defines a more specific context for a node being indexed. When a node path is specified, the node is indexed only when it is found in the specific context defined by the path. When determining if a document should be referenced from a particular index, XFLAIM checks the entire path. This allows an index definition to be specific about exactly when a particular node should be indexed, thus allowing it to be indexed only when it appears in certain contexts.
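
As an illustration of context checking, the sketch below represents a node's ancestry as a list of names from the document root down to the node and tests whether that ancestry ends with the path named in the index definition. The function path_matches is hypothetical, and matching on the trailing portion of the ancestry is an assumption; the exact matching rules are given by the index definition syntax.

```python
def path_matches(node_path, index_path):
    """True when the node's ancestry ends with the (non-empty) index path."""
    if not index_path:
        raise ValueError("index_path must not be empty")
    return node_path[-len(index_path):] == index_path


index_path = ["customer", "city"]
in_context = path_matches(["order", "customer", "city"], index_path)
out_of_context = path_matches(["order", "supplier", "city"], index_path)
```

With this definition, a city under customer is indexed, while a city under supplier is ignored.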

International Languages

XFLAIM provides support for 38 international text collations (see Appendix B for a complete list). This feature allows applications to support multiple languages within a single database simply by specifying the desired collation language on the index definition.

Each Word

This option indicates that key values should be generated from each of the individual words contained within the text value of the node, as opposed to using the full text value to generate a single key.

Substring

This option indicates that key values should be generated by using the text string to produce a set of sub-string values. The set of sub-strings is generated by removing the left-most character of the text value in an iterative process until the string is empty.
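The iterative process described above can be sketched in a few lines of Python. This is only an illustration of the idea: the function name is hypothetical, and including the full text value as the first sub-string is an assumption about where the iteration starts.

```python
from typing import List


def substring_keys(text: str) -> List[str]:
    """Generate sub-strings by repeatedly dropping the left-most character."""
    keys = []
    while text:
        keys.append(text)
        text = text[1:]  # remove the left-most character
    return keys


print(substring_keys("flaim"))  # ['flaim', 'laim', 'aim', 'im', 'm']
```

With keys like these, a "contains" search for "aim" can be answered by looking for index keys that begin with "aim".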

Metaphone

This option indicates that keys should be built by generating metaphone values for each of the words of a text value. This allows applications to efficiently perform “sounds like” queries.

Case-Insensitive Collation

For indexes that include text nodes, XFLAIM allows the collation to be performed with or without sensitivity to case.

Node Name Identifier Indexing

Keys in an index are normally constructed using the indexed node’s value. XFLAIM also allows a node’s numeric name identifier to be indexed instead of its value. This allows the creation of a “presence” index that is useful for optimizing queries that have criteria on the existence of a particular node.

Encryption Definitions

Encryption definitions define encryption schemes that may be used to encrypt data in the database. The encryption scheme specifies an encryption algorithm (AES or DES3) and a key size. For AES, the key size may be 128, 192, or 256 bits. For DES3, only one key size is allowed: 168 bits. When an encryption definition document is first created, XFLAIM automatically creates an encryption key (using NICI) and wraps the key inside the database key. The database key is always an AES key of the largest possible size allowed by the NICI that is running on the platform where the database was created. Generally, that will be 256 bits. An application is not allowed to modify the node in the document that stores the encryption key. NOTE: It is possible to build the XFLAIM product without support for encryption.

Querying the Database

Any application that relies on a database system to store its data obviously also needs mechanisms for finding and retrieving that data. In brief, a few of XFLAIM’s query capabilities include:

Specification of complex selection criteria via XPATH.

A result set interface, which includes methods to move to the first, last, next, and previous nodes in the result set.

In most cases, the result set does not have to be fully generated before XFLAIM can start returning results to the application.

Cost-based, multiple-index optimization of the query is performed automatically.

Query

In XFLAIM, an application poses a query by creating and configuring a query object (see documentation on IF_Query). A query object collects the selection criteria for an XPATH query and interprets and optimizes this information so that the requested nodes can be retrieved efficiently.

Result Sets

The answer to a query is the set of nodes or documents that satisfy the selection criteria. This is often called a result set. From a conceptual point of view, a query’s result set exists the instant its selection criteria have been defined. All that remains from the application’s point of view is to start retrieving the individual nodes of the result set.

XFLAIM provides a set of functions for navigating through a result set and retrieving nodes from it. This includes the ability to position to and optionally retrieve the first, last, next, previous, and current nodes in the result set.

Database Files

An XFLAIM database consists of five types of files: 1) a control file, 2) a lock file, 3) data files, 4) rollback log files, and 5) roll-forward log files. The name of the database is the name of the control file. The names of all other files are based on the name of the control file. The naming convention and usage of each type of file are explained below.

Control File

All other file names that make up a database are derived from the name of the control file. If the control file name has a .db extension, then the part of the name that appears before the .db extension is used to form all other file names. Otherwise, the entire control file name is used to form all other file names. In the following discussion, we will refer to this as the <dbname>. If the control file name is abc.db, then the <dbname> is abc. If the control file name is myname.dat, then the <dbname> is myname.dat.
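The rule for deriving <dbname> can be sketched as follows. The function name is hypothetical, and treating the .db comparison as case-sensitive is an assumption; the text above does not say how case is handled.

```python
def db_base_name(control_file_name: str) -> str:
    """Derive <dbname> from the control file name."""
    if control_file_name.endswith(".db"):
        # Strip the .db extension; the remainder forms all other file names.
        return control_file_name[:-len(".db")]
    # Any other name is used in its entirety.
    return control_file_name


print(db_base_name("abc.db"))      # abc
print(db_base_name("myname.dat"))  # myname.dat
```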

The first block of the control file is reserved for a database header. The header is described in the next section. The rest of the file is actually part of the rollback log space in the database. Because the rollback log can grow and shrink, it is common to see the control file change its size.

Lock File

The lock file name is <dbname>.lck. It resides in the same directory as the control file and is used to prevent multiple processes from opening a database at the same time.

On Windows platforms, the lock file is created and opened in exclusive mode. When XFLAIM first opens a database, it will attempt to create and open this file. When XFLAIM finally closes a database, the lock file will be deleted. The mere existence of the lock file does not mean that the database is currently open by some process. It may be that the process has aborted without shutting down XFLAIM, or the system crashed before XFLAIM could close the database. Thus, when opening a database for the first time, if the file already exists, XFLAIM will attempt to delete the file first. If it cannot delete the file, it knows that another process is currently accessing the database, and it will return an access denied error.

On Unix platforms, XFLAIM uses the lock file in a slightly different way. Instead of deleting and re-creating the file every time it opens a database, the file is created when the database is first created, and remains as long as the database remains. To prevent multiple processes from accessing the database, XFLAIM will put a byte lock on byte zero of the file. If it cannot obtain the byte lock, it knows that another process has already obtained the byte lock and is accessing the database.

Data Files

The data files are used to store all of the blocks of the database, including data blocks, index blocks, available blocks, etc. Data files reside in the same directory as the control file, and have the <dbname> with various extensions. Each data file has a number that is encoded into the extension.

Naming Convention

The maximum number of data files is 2047 (file numbers 1 through 2047). For file numbers 1 through 511, the extension for a data file is its file number encoded as a two-digit base-24 number. For file numbers 512 through 2047, the file number mod 512 is encoded as a two-digit base-24 number to form the first two digits, and then an additional third digit is added to the extension, as follows:

Data File Numbers      Additional Third Digit
512 through 1023       ‘r’
1024 through 1535      ‘s’
1536 through 2047      ‘t’

The following examples illustrate:

Data File Number       Data File Name
1                      <dbname>.01
2                      <dbname>.02
512                    <dbname>.00r     (512 mod 512 is 0)
513                    <dbname>.01r     (513 mod 512 is 1)
1024                   <dbname>.00s     (1024 mod 512 is 0)
1025                   <dbname>.01s     (1025 mod 512 is 1)
1536                   <dbname>.00t     (1536 mod 512 is 0)
1537                   <dbname>.01t     (1537 mod 512 is 1)
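The data file naming convention can be expressed as a short Python sketch. The function name is hypothetical, and the base-24 digit alphabet (0 through 9 followed by a through n) is an assumption, since this document does not list the digits; the examples in the table above use only the digits 0 and 1 and are unaffected by that choice.

```python
# Assumed base-24 digit alphabet; the document does not specify it.
BASE24_DIGITS = "0123456789abcdefghijklmn"


def data_file_name(dbname: str, file_number: int) -> str:
    """Build the name of a data file from <dbname> and the file number."""
    if not 1 <= file_number <= 2047:
        raise ValueError("data file numbers run from 1 through 2047")
    # No third digit for 1-511, then 'r', 's', 't' for each further block of 512.
    third_digit = ["", "r", "s", "t"][file_number // 512]
    # Encode (file number mod 512) as two base-24 digits.
    high, low = divmod(file_number % 512, 24)
    return f"{dbname}.{BASE24_DIGITS[high]}{BASE24_DIGITS[low]}{third_digit}"


print(data_file_name("abc", 2))     # abc.02
print(data_file_name("abc", 513))   # abc.01r
print(data_file_name("abc", 1536))  # abc.00t
```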

Rollback Log Files

The rollback log files are used to log blocks of the database. The control file is actually also a rollback log file, except for its very first block (see explanation above). It is considered to be rollback log file number zero. If this file fills up because of a very large transaction (a circumstance that will be very rare), additional rollback log files will be created. These additional rollback log files reside in the same directory as the control file, and will have the <dbname> with various extensions. Each additional rollback log file has a number that is encoded into its extension.

Naming Convention

The maximum number of rollback log files is 2049: file number 0 (the control file), and file numbers 2048 through 4095. The file name for file number zero is simply the name of the control file (for example, abc.db). Additional rollback log files (2048 through 4095) use the file number mod 512 to encode a two-digit extension (base-24 format described above), and then add on a third digit as follows:

Rollback Log File Number      Additional Third Digit
2048 through 2559             ‘v’
2560 through 3071             ‘w’
3072 through 3583             ‘x’
3584 through 4095             ‘z’

Below are some examples:

Rollback Log File Number      Rollback Log File Name
2048                          <dbname>.00v     (2048 mod 512 is 0)
2049                          <dbname>.01v     (2049 mod 512 is 1)
2560                          <dbname>.00w     (2560 mod 512 is 0)
2561                          <dbname>.01w     (2561 mod 512 is 1)
3072                          <dbname>.00x     (3072 mod 512 is 0)
3073                          <dbname>.01x     (3073 mod 512 is 1)
3584                          <dbname>.00z     (3584 mod 512 is 0)
3585                          <dbname>.01z     (3585 mod 512 is 1)
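The rollback log naming convention can be sketched the same way. The function name is hypothetical, the base-24 digit alphabet (0 through 9 followed by a through n) is an assumption not stated in this document, and the .db comparison is assumed to be case-sensitive.

```python
# Assumed base-24 digit alphabet; the document does not specify it.
BASE24_DIGITS = "0123456789abcdefghijklmn"


def rollback_log_file_name(control_file_name: str, file_number: int) -> str:
    """Build the name of a rollback log file from the control file name."""
    if file_number == 0:
        # Rollback log file number zero is the control file itself.
        return control_file_name
    if not 2048 <= file_number <= 4095:
        raise ValueError("rollback log file numbers are 0 or 2048 through 4095")
    # Derive <dbname>: strip a trailing .db, otherwise use the whole name.
    if control_file_name.endswith(".db"):
        dbname = control_file_name[:-3]
    else:
        dbname = control_file_name
    # Third digit: 'v', 'w', 'x', 'z' for each block of 512 file numbers.
    third_digit = "vwxz"[(file_number - 2048) // 512]
    high, low = divmod(file_number % 512, 24)
    return f"{dbname}.{BASE24_DIGITS[high]}{BASE24_DIGITS[low]}{third_digit}"


print(rollback_log_file_name("abc.db", 0))     # abc.db
print(rollback_log_file_name("abc.db", 2049))  # abc.01v
print(rollback_log_file_name("abc.db", 3584))  # abc.00z
```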

Maximum Data File and Rollback File Sizes

The maximum file size for data and rollback files is 0xFFFC0000 bytes (almost 4 gigabytes). Because databases allow up to 2047 data files, database capacity is almost 8 terabytes. This will be the case for most databases. However, some Linux platforms do not allow file sizes to exceed 2 gigabytes. In those cases, XFLAIM sets the maximum file size to a lower limit of 0x7FFF0000 bytes (just under 2 gigabytes). With 2047 data files, this still allows a database capacity of almost 4 terabytes.
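The capacity figures can be checked with a little arithmetic. The sketch below multiplies the number of data files by the per-file limit, taking the reduced limit to be 0x7FFF0000 bytes, which is an assumption chosen to agree with "just under 2 gigabytes". The constant and function names are hypothetical.

```python
MAX_DATA_FILES = 2047
NORMAL_MAX_FILE_SIZE = 0xFFFC0000   # almost 4 gigabytes
REDUCED_MAX_FILE_SIZE = 0x7FFF0000  # assumed value, just under 2 gigabytes


def capacity_in_terabytes(max_file_size: int) -> float:
    """Total data file capacity, in units of 2**40 bytes."""
    return MAX_DATA_FILES * max_file_size / 2**40


print(capacity_in_terabytes(NORMAL_MAX_FILE_SIZE))   # slightly under 8
print(capacity_in_terabytes(REDUCED_MAX_FILE_SIZE))  # slightly under 4
```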

Roll Forward Log Files

XFLAIM logs the operations of transactions to a roll-forward log. Roll-forward log files are used to recover transactions after a system failure and when restoring a database from backup.

Naming Convention

Roll-forward log files are stored in a subdirectory called <dbname>.rfl. Unless otherwise specified by an administrator, this subdirectory is located in the same directory as the other database files (<dbname>.db, <dbname>.01, etc.). If an administrator specifies a different directory for the roll-forward log files, a <dbname>.rfl subdirectory will still be created within the specified directory. For example, if an administrator specified sys:\rflfiles as the directory for roll-forward log files, XFLAIM would create a <dbname>.rfl subdirectory:

sys:\rflfiles\<dbname>.rfl

Roll forward log files in the <dbname>.rfl subdirectory will be named as nnnnnnnn.log, where nnnnnnnn is a hex number that is the log file's sequence number. Thus, log file number 1 is named 00000001.log, log file number 2 is named 00000002.log, and so forth.
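The file name is simply the sequence number formatted as eight hexadecimal digits. The sketch below assumes lowercase hex digits, which this document does not specify, and uses a hypothetical function name.

```python
def rfl_file_name(sequence_number: int) -> str:
    """Format a roll-forward log file name from its sequence number."""
    if not 0 <= sequence_number <= 0xFFFFFFFF:
        raise ValueError("sequence number must fit in eight hex digits")
    # Eight zero-padded hex digits; lowercase is an assumption.
    return f"{sequence_number:08x}.log"


print(rfl_file_name(1))   # 00000001.log
print(rfl_file_name(26))  # 0000001a.log
```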

Data Integrity and Transactions

It is desirable that database operations be performed in such a way as to preserve logical database integrity. However, it is not always possible to leave the database in a logically consistent state after a single update. Multiple update operations may be required before consistency is restored. Thus, in order to preserve consistency, a multi-operation transaction must be atomic; that is, either all of the operations in the transaction complete or none of them do. This allows the database system to support a more complex notion of database integrity than it otherwise could.

Checkpoint

A checkpoint brings the on-disk version of the database up to the same coherent state as the in-memory (cached) database. XFLAIM attempts to do a checkpoint whenever there are periods of minimal update activity on the database. In this case, XFLAIM acquires a lock on the database and does as much work as possible until either the checkpoint completes or another thread wants to update the database.

To prevent the on-disk database from becoming too out of sync, there are conditions under which a checkpoint will be forced even if threads are waiting to update the database. First, if the checkpoint thread has not been able to complete a checkpoint within a specified time interval (default is three minutes), a checkpoint will be forced. Second, a checkpoint will always be forced when XFLAIM is told to shut down. Third, I/O errors or out-of-disk conditions on the RFL volume will cause a checkpoint to be forced. Forcing a checkpoint helps to shorten the amount of time it takes to recover the database after a system failure.

Transactions

XFLAIM provides two types of transactions, update and read.

Update Transaction

An update transaction allows an application to read and update data. Until an update transaction has been committed, none of the operations performed during the transaction are made permanent in the database. Furthermore, changes to the database are not visible to other concurrent transactions. If an update transaction is aborted, the changes made to the database during the transaction are undone (rolled back).

Read Transaction

A read transaction is a transaction where only read operations are allowed. This type of transaction provides a read-consistent view of the database, which can be logically viewed as a snapshot of the database taken at the start of the transaction. In effect, updates made by other concurrent processes that have not committed before the start of the read transaction are not visible from within the transaction. In a concurrent environment, a read transaction is executed so that it never blocks other read or update transactions.

Maintaining a read-consistent view of the database requires XFLAIM to keep multiple versions of database blocks in the database cache. Each prior version of a block is kept until it is no longer needed by any active read transaction.

Transaction Failures

There are two types of transaction failures. The first type of failure occurs when the application executing the transaction discovers an error that makes it impossible to continue the transaction. Upon detecting the error, the application can request that XFLAIM abort the transaction. XFLAIM will then undo (or roll back) all operations that were performed within the transaction.

The other type of transaction failure occurs when the application that is performing the transaction terminates before committing or aborting the transaction, thus leaving the effects of a partially completed transaction in the database. Such transactions are sometimes called “dead” transactions because the application that created the transaction has terminated without specifying a final disposition for the transaction. Dead transactions may be the result of external events over which the application has no control (CPU failures, etc.), or they may be the result of faulty application code. Whatever the reason, XFLAIM provides for the automatic detection and rollback of dead transactions.

Rollback Logging

When updated blocks are written to disk, XFLAIM must first write the prior versions of the blocks to a rollback log. Rollback logging has three primary purposes: 1) to undo a transaction when it aborts, 2) to recover a database to its last checkpointed state when doing database recovery after a system crash, and 3) to maintain read-consistent views of the database for read transactions.

To ensure that the rollback log can be used for recovery after a system failure, the state of the database and the rollback log after any single write must be such that a consistent (checkpointed) state can be restored if a failure were to occur during or after that write.

Roll-Forward Logging

XFLAIM logs the operations of each update transaction to a roll-forward log. Roll-forward log files are used to recover transactions after a system failure and when restoring a database from backup.

XFLAIM is able to operate in two modes with respect to the roll-forward log. In the default mode, the log is truncated every time a checkpoint is completed, since the log is no longer needed for recovery. This mode allows applications that do not need continuous backup capabilities to conserve disk space.

The other mode allows transactions logged to the roll-forward log to be kept indefinitely. When this mode is employed, multiple log files are utilized instead of just one. Roll-forward log files are not reset and reused when checkpoints are performed. Instead, the roll-forward log continually grows.

For all practical purposes, a single file with a 64-bit address space would be more than adequate for thousands of years' worth of transactions, given the transaction rate we can realistically sustain. However, there are a couple of reasons it is not practical or useful to simply keep growing a single file, even one with 64-bit capacity. First, not all operating systems support 64-bit files. Second, in the design of hot continuous backup, it was desirable that an administrator be allowed to move older portions of the roll-forward log to tape or some other backup media, thus conserving disk space on the volume where the roll-forward log files are kept. To achieve this, the roll-forward log is broken into multiple files. Each log file has a sequence number. The sequence number is written into a header within the file and is also encoded into the log file's name.

For recovery after a non-catastrophic event, only the RFL entries since the last checkpoint are needed. For recovery after a media failure, requiring a backup and the RFL to be used, only the RFL entries logged since the backup are needed. In short, only a subset of the RFL is needed to allow recovery in either case, thus allowing obsolete portions of the RFL to be removed as needed to reduce its footprint. XFLAIM provides mechanisms for an application to identify and remove sections of the log that are no longer relevant.

Recovery

In order to recover from a system failure, a mechanism for undoing the effects of partially completed transactions is required. When XFLAIM performs recovery, it uses the rollback log first to recover the database to its last checkpoint. Subsequently, the transactions in the roll-forward log are replayed to recover the database up to the last committed transaction.

Database recovery is idempotent. This means that if a crash occurs during the recovery, the process can be repeated until the database is successfully recovered. During recovery, occasional checkpoints will be performed so that if a failure happens during the recovery process, the recovery can be resumed without having to re-start from the beginning.

Concurrency Control

The goal of concurrency control is to ensure that operations being executed at the same time by different applications do not interleave in such a way as to compromise database integrity. Because transactions are defined as the unit of work that transforms a database from one consistent state to another, it is necessary to address concurrency issues in the context of transaction processing.

Individual transactions that run in isolation should always leave the database in a consistent state. In practice, it is usually desirable to allow many transactions to run concurrently. However, if the various operations of the different transactions were allowed to interleave indiscriminately, serious errors may result that could leave the database in an inconsistent state. The fundamental concern of database concurrency control is to ensure that concurrent execution of transactions does not result in a loss of database consistency. This means that the effect of interleaving the operations of multiple concurrent transactions should be the same as running the transactions serially.

Locking

In XFLAIM, locking is the technique used to coordinate multiple update transactions. Locking is not used for read transactions. Update transactions do not block read transactions, read transactions do not block update transactions, and read transactions do not block each other. The only transactions that block each other are update transactions. At present, the locking granularity is at the database level. Thus, when an update transaction is started, other updates will be held off until the transaction commits or aborts.

Lock Wait Period

When two update transactions contend for the database lock, one is granted the lock and the other is put into a queue to wait for the lock. An application may specify a lock wait period at the beginning of a transaction. The lock wait period indicates the number of seconds that XFLAIM should allow the transaction to wait for the lock. If the transaction does not obtain the lock within the specified amount of time, the transaction is removed from the lock wait queue and automatically aborted.
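The effect of a lock wait period can be imitated with a standard Python lock and a timeout. This is only an analogy under stated assumptions: the class and method names are hypothetical, and threading.Lock does not guarantee the queue ordering described above.

```python
import threading


class DatabaseLockSketch:
    """Toy stand-in for a database-level update lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def begin_update(self, lock_wait_seconds: float) -> bool:
        """Wait up to lock_wait_seconds for the lock.

        Returns True if the lock was obtained. False means the wait period
        expired, in which case the caller treats the transaction as aborted.
        """
        return self._lock.acquire(timeout=lock_wait_seconds)

    def end_update(self) -> None:
        """Commit or abort: either way the lock is released."""
        self._lock.release()


db_lock = DatabaseLockSketch()
print(db_lock.begin_update(0.05))  # True: the lock was free
print(db_lock.begin_update(0.05))  # False: still held, wait period expired
db_lock.end_update()
```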

Deadlock Prevention

A deadlock can occur when two or more threads try to obtain locks that are already held by each other. XFLAIM prevents deadlock by aborting an update transaction whenever it is denied a lock request.

Many Readers / One Writer

XFLAIM places no restrictions on the number of concurrent readers that can access a database. It is impossible for readers to interfere with each other because they do not modify the database. Whenever an application knows that a transaction will only perform read operations, a read transaction should be used instead of an update transaction. This improves concurrent access to the database.

Backup and Restore

A basic, no-frills backup solution requires that all updates to the database be held off while the backup runs. This could be as simple as shutting down the database server and copying the files to a backup location, or to be slightly more sophisticated, the database server could continue to run in a read-only mode (after all dirty cache is flushed to disk) while the files are copied to a backup location. For most database deployments, this type of backup is generally not acceptable.

The next level of sophistication, hot backup, refers to a backup that is performed while other concurrent operations are allowed to execute against the database. This type of backup results in a snapshot in time of the database, capturing all committed transactions at the time of the backup. All modifications made to the database during the backup are excluded.

A hot backup allows for reasonable protection of the data in the database, while also allowing the database to remain fully on-line for the duration of the backup. The drawback is that changes made to the database between backups are not protected against catastrophic failure. This could mean the loss of several hours, or even days, of database updates depending on when the last backup was made. For some deployments, this risk of partial data loss is unacceptable.

Hot, continuous backup extends the concept of a hot backup by providing a mechanism for protecting changes to the database made between backups. Typically, this is accomplished by preserving roll-forward log (RFL) files, thus maintaining a complete record of changes made to the database since the last hot backup. These log files are typically stored on a device (disk, tape, etc.) separate from the device that hosts the database.

Backup

XFLAIM supports three different types of backups: Full, Incremental, and Continuous. All backup operations take place while the database is on-line, without blocking concurrent transactions.

Full Backup

A full backup makes a complete copy of all data in the database that is committed as of the start of the backup. It does this by starting a single read transaction (thus guaranteeing a read-consistent view of the database) and then streaming each of the blocks in the database out to the backup utility. Since this type of database scan is a classic example of a cache-poisoning operation, the read transaction is started with a special flag that prevents it from using cache in a way that would cause it to be poisoned. It is interesting to note that since block reads are done from cache when possible, it is likely that some of the blocks in the backup set will be newer than the corresponding database blocks on disk.

Incremental Backup

An incremental backup is similar to a full backup in that it is done within a single read transaction that scans every block in the database. The difference is that an incremental backup only copies those blocks that have changed since the last backup (either full or incremental).

Continuous Backup

As discussed above, full and incremental backups are essentially snapshots of the database at the time of the backup. Thus, transactions posted to the database after the start of the backup will not be recorded in the backup set. Continuous backup overcomes this shortcoming by preserving the transactions written to the roll-forward log. During a database restore, the transactions recorded in the roll-forward log can be applied to the newly restored database to bring it up to date with the last committed transaction.

Restore

An XFLAIM database restore is done via a callback mechanism which allows the application to stream bytes from the backup media into XFLAIM. During a restore, XFLAIM will first request data from a full backup. Subsequently, XFLAIM will request data from any incremental backups that are available. And finally, if roll-forward logs are available, XFLAIM will replay transactions until the database is up-to-date or until the restore is terminated.

Caching

XFLAIM uses a two-level caching system: a block cache and a node cache.

Block Cache

The block cache stores in-memory images of the database blocks. Each cached block is linked into a list of older and/or newer cached versions of the same block. This is essential for providing read consistency.

Node Cache

The node cache operates at a logically higher level than the block cache. The items in the node cache are XFLAIM DOM nodes that have been extracted from database blocks. Without the node cache, every node access would require XFLAIM to reconstruct the node from its corresponding elements in the data blocks of the database. Because of the obvious inefficiency of reconstituting nodes every time they are needed, the nodes are placed in the node cache after their first non-cached access. Once in cache, nodes can be returned by XFLAIM without having to access database blocks. Like the block cache, items in the node cache are linked in a list of older and/or newer versions.

Cache Poisoning

Cache poisoning occurs when an item is inserted into cache and is subsequently removed from cache before any cache hits occur on that item. Cache poisoning degrades performance, to the point that a severely poisoned cache usually performs slower than running without any cache at all.

The types of access patterns that poison a cache will depend on the algorithm used to determine which items to remove from a full cache. Typical access patterns that cause cache poisoning in XFLAIM are scans and cycles. For example, database scans can iterate over more documents and blocks than could fit in cache, while only visiting each item once.

XFLAIM offers a non-poisoning mode for transactions. In the case of a read transaction, newly added cache blocks are added to the least recently used (LRU) end of the cache. Since the item at the LRU end may simply be replaced over and over, the rest of the cache remains undisturbed. In the case of an update transaction, new blocks that are read from disk are also added at the LRU end. However, if a block becomes dirty during an update, it is relocated to the most recently used (MRU) end, since replacing a dirty block is more expensive than replacing a non-dirty block. Also, whenever a cache hit occurs in non-poisoning mode, the item is transposed with its neighbor toward the MRU end. This way, cache hits are promoted incrementally toward the MRU end without poisoning the cache.
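The following toy model illustrates the non-poisoning behavior described above: new items enter at the LRU end, hits move one position toward the MRU end, and dirty blocks jump to the MRU end. The class, its list-based layout, and the behavior of the normal (poisoning) mode are assumptions for this sketch, not a description of XFLAIM's cache implementation.

```python
class SketchCache:
    """Toy cache; index 0 is the LRU end, the last index is the MRU end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def access(self, item, non_poisoning=False):
        """Touch an item. Returns True on a cache hit, False on a fault."""
        if item in self.items:
            i = self.items.index(item)
            if not non_poisoning:
                # Assumed normal behavior: a hit jumps straight to the MRU end.
                self.items.append(self.items.pop(i))
            elif i + 1 < len(self.items):
                # Non-poisoning hit: swap with the neighbor toward the MRU end.
                self.items[i], self.items[i + 1] = self.items[i + 1], self.items[i]
            return True
        if len(self.items) >= self.capacity:
            self.items.pop(0)  # replace the item at the LRU end
        if non_poisoning:
            self.items.insert(0, item)  # new item enters at the LRU end
        else:
            self.items.append(item)  # assumed normal behavior: enter at MRU end
        return False

    def mark_dirty(self, item):
        """A block dirtied by an update is relocated to the MRU end."""
        self.items.remove(item)
        self.items.append(item)


cache = SketchCache(capacity=3)
for block in ("a", "b", "c"):
    cache.access(block)
for block in ("x", "y", "z"):  # a one-pass scan in non-poisoning mode
    cache.access(block, non_poisoning=True)
print(cache.items)  # ['z', 'b', 'c']
```

In this run the scan only ever replaces the slot at the LRU end, so the blocks "b" and "c" stay in cache no matter how long the scan is.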

Cache Performance Measures

There are four types of cache measurements we can make: cache hits (how many times we have reused items from cache), cache looks (how many links we follow on the bucket collision chain to end up with a hit), faults (how many times we could not find an item in cache and had to read from disk), and fault looks (how many links we follow on the bucket collision chain only to end up with a fault). The formula (cache looks/cache hits) * 2 gives us an average of how long our collision chains are (the factor of 2 comes from the fact that we would seek an average of halfway down the collision chain to reach a hit). The formula (fault looks/faults) also gives us an average of how long our collision chains are, since every fault results in as many fault looks as there are collision links.
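The two estimates of average collision chain length can be computed directly from the four counters. The function names and the sample counter values below are hypothetical.

```python
def chain_length_from_hits(cache_looks: int, cache_hits: int) -> float:
    """Estimate chain length as (cache looks / cache hits) * 2."""
    if cache_hits == 0:
        raise ValueError("no cache hits recorded")
    # A hit is found, on average, halfway down the chain, hence the factor of 2.
    return (cache_looks / cache_hits) * 2


def chain_length_from_faults(fault_looks: int, faults: int) -> float:
    """Estimate chain length as fault looks / faults."""
    if faults == 0:
        raise ValueError("no cache faults recorded")
    # A fault walks the entire chain before giving up.
    return fault_looks / faults


print(chain_length_from_hits(cache_looks=1500, cache_hits=1000))  # 3.0
print(chain_length_from_faults(fault_looks=2900, faults=1000))    # 2.9
```

If the two estimates differ widely, the counters were probably gathered over different workloads or time windows.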

The primary metric by which cache performance should be measured is the number of faults per unit of throughput. Each cache fault results in an expensive I/O operation. The cost of the I/O operation may vary from platform to platform and may become more expensive as CPU speeds increase and disk speeds stagnate, but will probably be equivalent to at least thousands if not hundreds of thousands of CPU instructions. Efforts spent in reconfiguring a system's cache should be spent in trying to reduce the number of faults. Increasing the cache hits on a system will net little gain unless there is a corresponding decrease in the number of faults over the same operation.

An administrator has many variables to work with when trying to optimize XFLAIM database performance, including total amount of system memory, cache configuration, and access patterns. Adding more memory may not always help performance if the access pattern results in cache poisoning. Allowing XFLAIM to occupy more cache may not help either, depending on the access pattern. The best recommendation for administrators is to experiment with various tuning variables in a production environment.

Configuration

The XFLAIM cache size can be configured to limit the amount of memory used. The size can be specified as either a hard limit or a dynamically adjusting limit.

Hard Limit

A hard limit, put simply, is a fixed maximum number of bytes that XFLAIM may use for cache. The number, once set, will not change unless a new cache size is explicitly set. A disadvantage to using a hard cache limit is that if the system RAM availability changes for some reason (e.g., a memory upgrade on the server), the cache size will not adapt automatically. A new limit would have to be specified to take advantage of the additional memory.

Dynamic Limit

In an attempt to avoid problems associated with a static cache size, dynamically adjusting limits were developed. Dynamically adjusting limits allow the user to specify a certain percentage of available memory to be used for cache. Available memory is defined as RAM that is not currently allocated to any process plus the RAM that XFLAIM is using for cache at that point in time. In addition to specifying the percentage of available memory to use, the user indicates a lower and upper bound for how many bytes cache should consume. The lower bound is expressed as a number of bytes. The upper bound is expressed either in terms of a maximum number of bytes to use or in terms of a minimum number of bytes to leave available on the system.

In order to calculate the actual cache size with the dynamically adjusting limit, the amount of available memory is computed, and the user-specified percentage of that number is computed. Next, that result is compared with the upper bound, and the smaller of those two numbers is used. The final step is to compare that result with the lower bound, and the larger of those two numbers is used as the cache size.
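The calculation described above can be sketched as follows. This is a minimal illustration, not XFLAIM code: the structure, field names, and function name are hypothetical, and the handling of the "bytes to leave available" form of the upper bound (available memory minus the bytes to leave) is an assumption about how that option would be applied.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical settings for a dynamically adjusting cache limit.
struct DynamicCacheSettings {
    unsigned percent;          // share of available memory to use, 0-100
    std::uint64_t minBytes;    // lower bound on the cache size
    bool upperIsBytesToLeave;  // selects how upperBytes is interpreted
    std::uint64_t upperBytes;  // maximum cache bytes, or minimum bytes to leave free
};

// unallocatedBytes:  RAM not currently allocated to any process.
// currentCacheBytes: RAM the cache is using right now.
std::uint64_t computeCacheLimit(const DynamicCacheSettings& s,
                                std::uint64_t unallocatedBytes,
                                std::uint64_t currentCacheBytes) {
    // Available memory = unallocated RAM + RAM already held by the cache.
    const std::uint64_t available = unallocatedBytes + currentCacheBytes;

    // User-specified percentage of available memory (split to avoid overflow).
    const std::uint64_t wanted =
        (available / 100) * s.percent + (available % 100) * s.percent / 100;

    // Upper bound, in either of its two forms (assumed interpretation).
    std::uint64_t upper = s.upperBytes;
    if (s.upperIsBytesToLeave) {
        upper = available > s.upperBytes ? available - s.upperBytes : 0;
    }

    // Take the smaller of (wanted, upper), then the larger of that and the lower bound.
    return std::max(std::min(wanted, upper), s.minBytes);
}
```

For example, with 50 percent, a 16 MB lower bound, and a 512 MB upper bound, 1200 MB of available memory yields a 512 MB limit (600 MB capped by the upper bound), while 20 MB of available memory yields 16 MB (10 MB raised to the lower bound).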

At a certain time interval, known as a cache adjust interval, XFLAIM will perform the above calculations again, and compute a new cache limit. The default cache adjust interval is 15 seconds but the user may configure it differently if desired.

The primary disadvantage of a dynamically adjusting limit is its complexity, which results in a larger user support cost. Users want a simple explanation for how the dynamically adjusting limit works, and a simple formula to compute the optimum configuration values. Unfortunately, the system is inherently complex, and the optimum values for any given system can only be learned by trial and error. Therefore, users wanting to use this feature must be willing to spend adequate time and resources learning about and tinkering with dynamically adjusting cache limits.

Distribution

Cache is divided between document node cache and block cache. The default split is 50% document node cache and 50% block cache but the user may modify this, if desired. We wish to note that our performance tests have yet to reveal any document node cache/block cache divisions that are clearly superior to the default 50/50 split.

Issues

The maximum amount of memory that can be used for cache on a system is determined by several factors. Obviously, a certain amount of physical RAM will be consumed by the OS, other processes running on the system, and parts of the XFLAIM system unrelated to cache. The size of the platform pointer type and of the FLMUINT type may confine the addressable memory space. For example, if a system's void * or FLMUINT is 32 bits wide, the maximum addressable memory will be 2³² bytes, or approximately 4 GB. In addition, the OS may impose limitations on how much virtual address space it will allow a process to use. For example, most 32-bit versions of Windows limit a process to 2 GB of address space by default.

Finally, paging cache to disk in a virtual memory environment will degrade performance. Setting the cache size to some amount less than the amount of physical RAM may be the only effective means of eliminating this problem if the platform does not allow the pages to be pinned in memory.

Memory Fragmentation, Cache Preallocation

Cache is a "long-term" memory allocation - meaning that the allocated memory is potentially held on to for long periods of time. A "short-term" allocation is one that is used only temporarily. Because cache is allocated "as needed", the allocations can be mixed with short-term allocations. This can cause cache to be scattered all over the virtual address space. Over time, it has been observed that this can lead to fragmentation of the virtual address space. Fragmentation, if severe enough, can lead to memory allocation failures. This occurs when the address space is so fragmented that it is impossible to find a contiguous chunk of memory of the required size. There may be lots of "available" memory fragments, but none large enough to satisfy a given allocation request.

To help prevent memory fragmentation, XFLAIM has a sophisticated memory management scheme that does the following:

Slab Allocation. All allocations are done in 64K slabs and managed by a slab manager. If larger allocations are needed, they are managed separately and are not allowed to be long-term allocations. Within the 64K chunks, sub-allocations are performed for specific items that need to be cached - blocks, nodes, and values belonging to nodes, as well as supporting structures and other objects.

Automatic Defragmentation. XFLAIM has a background thread that periodically attempts to move cached items to lower memory addresses. This involves packing slabs with lower addresses with cached items from slabs with higher addresses. All cached items know how to "be moved" and whether or not they are currently being accessed and cannot be moved.

Preallocation of Slabs. An application using XFLAIM can tell XFLAIM to pre-allocate slabs. If an application does this when it first starts up, the slabs will be in lower memory addresses, and will not be intermixed with short-term allocations that occur later. This may all but eliminate memory fragmentation problems. However, it has the disadvantage of tying up lots of the address space that cannot be used for other allocations.
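To make the slab idea more concrete, here is a minimal sketch of sub-allocating fixed-size cells from a single 64K slab. It is not XFLAIM's slab manager: the class name, the fixed cell size per slab, and the free-list representation are all assumptions made for illustration, and defragmentation and preallocation are not shown. The sketch assumes the cell size is greater than zero and no larger than the slab.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical 64K slab that hands out fixed-size cells.
class Slab {
public:
    static constexpr std::size_t kSlabSize = 64 * 1024;

    explicit Slab(std::size_t cellSize)
        : m_cellSize(cellSize),
          m_cellCount(kSlabSize / cellSize),
          m_memory(new unsigned char[kSlabSize]) {
        // Every cell starts out free; push in reverse so cell 0 is used first.
        m_freeCells.reserve(m_cellCount);
        for (std::size_t i = m_cellCount; i > 0; --i) {
            m_freeCells.push_back(i - 1);
        }
    }

    // Returns a free cell, or nullptr when the slab is full.
    void* allocCell() {
        if (m_freeCells.empty()) {
            return nullptr;
        }
        const std::size_t index = m_freeCells.back();
        m_freeCells.pop_back();
        return m_memory.get() + index * m_cellSize;
    }

    // Returns a cell previously obtained from allocCell() to the free list.
    void freeCell(void* p) {
        const std::size_t offset = static_cast<std::size_t>(
            static_cast<unsigned char*>(p) - m_memory.get());
        m_freeCells.push_back(offset / m_cellSize);
    }

    std::size_t cellCount() const { return m_cellCount; }
    std::size_t freeCount() const { return m_freeCells.size(); }

private:
    std::size_t m_cellSize;
    std::size_t m_cellCount;
    std::unique_ptr<unsigned char[]> m_memory;
    std::vector<std::size_t> m_freeCells;  // indexes of cells not in use
};
```

Because every long-term item lives inside a slab of the same size, a slab that becomes empty can be returned or reused as a whole, which is what keeps the address space from fragmenting into unusable pieces.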

Cache Cleanup

A background thread, known as the monitor thread, periodically scans the XFLAIM cache looking for items that are no longer needed. These items are released and the memory allocated to them is returned to the system - unless the memory belongs to the pool of pre-allocated slabs (see above under memory fragmentation).

Database Maintenance

For a variety of reasons, computer systems are subject to failures. These include disk crashes, power failures, software errors, and even sabotage. Despite XFLAIM’s proven stability, extensive experience has shown that there are many factors beyond XFLAIM’s control that can cause database corruptions. These include faulty disk array controller firmware, file system bugs, etc. No database will ever be able to fully isolate itself from external problems that can cause corruptions. Because of this, XFLAIM provides mechanisms that allow corruptions to be detected and repaired.

Run-Time Data Verification

In the normal course of processing database operations, XFLAIM provides capabilities for verifying data.

Block Checksumming

Whenever XFLAIM writes a database block to disk, it calculates a checksum for the data in the block and stores the checksum in the block header. As blocks are read from disk, this checksum is verified. If the checksum is bad, an error is reported.
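The write-then-verify pattern can be sketched as shown below. The text does not say which checksum algorithm XFLAIM uses or how its block header is laid out, so this example substitutes Adler-32 and a hypothetical 4-byte header that holds nothing but the checksum; all names are invented for the illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: the first 4 bytes of a block hold the checksum.
constexpr std::size_t kHeaderSize = 4;

// Adler-32, used here only as a stand-in checksum algorithm.
std::uint32_t computeChecksum(const std::uint8_t* data, std::size_t len) {
    std::uint32_t a = 1;
    std::uint32_t b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// Called before a block is written: store the checksum in the header.
void sealBlock(std::vector<std::uint8_t>& block) {
    if (block.size() < kHeaderSize) {
        return;  // too small to carry a header
    }
    const std::uint32_t sum =
        computeChecksum(block.data() + kHeaderSize, block.size() - kHeaderSize);
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        block[i] = static_cast<std::uint8_t>(sum >> (8 * i));
    }
}

// Called after a block is read: recompute and compare with the stored value.
bool verifyBlock(const std::vector<std::uint8_t>& block) {
    if (block.size() < kHeaderSize) {
        return false;
    }
    std::uint32_t stored = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        stored |= static_cast<std::uint32_t>(block[i]) << (8 * i);
    }
    return stored ==
           computeChecksum(block.data() + kHeaderSize, block.size() - kHeaderSize);
}
```

A block that is sealed and read back unchanged verifies successfully; if any payload byte changes between the write and the read, the recomputed value no longer matches the stored one and the read can be reported as an error.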

Database Check

In addition to the continuous run-time data verification mechanisms that are built into XFLAIM, an API for performing a comprehensive on-line database check is provided. There are two levels of checking available: physical checking and index checking. Both can be performed without requiring exclusive access to the database; both update and read transactions may operate concurrently with a database check.

Physical Check

The physical check performs various sanity checks. A comprehensive physical check is able to verify relationships between blocks as well as information within blocks.

Index Check

A structurally sound database may still have logical errors, generally due to code errors in the indexing code (which are rare). The index check is used to verify that all nodes that are referenced from indexes are present and generate the correct keys, and that there are no extraneous keys in the indexes.

Database Rebuild (Salvage)

A database rebuild operation attempts to salvage data from a damaged database. The first thing that a rebuild must do is determine the database block size. Once determined, the rebuild will create an empty destination database for storing the recovered documents. The dictionary collection in the source database is dredged to extract all usable dictionary definitions, which are added to the destination database. Finally, the rebuild tries to extract documents from the source database and adds them to the destination. Note that the rebuild does not try to recover index keys; these are re-created automatically in the destination database by virtue of the fact that the index definitions from the source database were added to the destination’s dictionary.

Space Reclamation

Whenever a block becomes empty, XFLAIM links the block into an available block list (or “avail” list). Subsequently, if XFLAIM needs to create a new block, it will first look in the avail list for a block before extending the database. In certain instances, it may be desirable to have blocks in the avail list returned to the file system to reduce the footprint of a database. XFLAIM provides a function (IF_Db::reduceSize) for reorganizing blocks so that free space can be returned to the file system.
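The avail-list behavior described above can be modeled in a few lines. This is a simplified in-memory sketch with hypothetical names: blocks are identified by index, the avail list is a plain vector rather than a chain of linked on-disk blocks, and returning space to the file system is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a database file that reuses emptied blocks.
class BlockFile {
public:
    // Returns the index of a block to use for new data.
    std::uint32_t allocateBlock() {
        if (!m_avail.empty()) {
            // Reuse a previously emptied block before extending the file.
            const std::uint32_t addr = m_avail.back();
            m_avail.pop_back();
            return addr;
        }
        return m_blockCount++;  // extend the file by one block
    }

    // Called when a block becomes empty: link it into the avail list.
    void releaseBlock(std::uint32_t addr) { m_avail.push_back(addr); }

    std::uint32_t blockCount() const { return m_blockCount; }
    std::size_t availCount() const { return m_avail.size(); }

private:
    std::uint32_t m_blockCount = 0;      // current size of the file in blocks
    std::vector<std::uint32_t> m_avail;  // blocks that are free for reuse
};
```

Note that in this model the file never shrinks on its own: released blocks stay inside the file until they are reused, which is why a separate reclamation step is needed to reduce the footprint.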

The space reclamation function can be performed on-line, without requiring exclusive access to the database. Update operations, but not reads, are prevented while a reclamation operation is in progress. However, the reduceSize function allows the specification of the maximum amount of unused space to be reclaimed. Typically, it is best to reclaim small chunks at a time by making successive calls to the reclamation function instead of trying to reclaim all unused space in one call. This helps to minimize interference with normal update operations.
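The "small chunks, successive calls" approach might look like the loop below. The FakeDb type and its reclaim method are stand-ins written for this example; they do not reproduce the signature of IF_Db::reduceSize, which should be taken from the programming interface documentation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a database with unused space; NOT the XFLAIM API.
struct FakeDb {
    std::uint64_t unusedBytes;

    // Reclaims at most maxBytes and returns the number of bytes reclaimed.
    std::uint64_t reclaim(std::uint64_t maxBytes) {
        const std::uint64_t n = std::min(maxBytes, unusedBytes);
        unusedBytes -= n;
        return n;
    }
};

// Reclaims all unused space in calls of at most chunkBytes each.
// Returns the number of calls that were made.
unsigned reclaimInChunks(FakeDb& db, std::uint64_t chunkBytes) {
    unsigned calls = 0;
    while (db.unusedBytes > 0 && chunkBytes > 0) {
        db.reclaim(chunkBytes);
        ++calls;
        // A real application would pause here so that pending update
        // operations can run between reclamation calls.
    }
    return calls;
}
```

Each individual call blocks updates only for as long as it takes to reclaim one chunk, so the chunk size is the knob that trades total reclamation time against interference with update traffic.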

An XML Database

There is some question in the industry as to what exactly a native XML database should be. There are two possible definitions: a text-based native XML database and a model-based native XML database. The following is an excerpt from a document by Ronald Bourret, 1999-2001, titled ‘XML And Databases’.

“A text-based native XML database is one that stores XML as text. This might be a file in a file system, a BLOB in a relational database, or a proprietary text format. (It is worth noting that a relational database that has added XML-aware processing of CLOB (Character Large OBject) columns is, in fact, a native XML database with respect to these abilities.)

“The second category of native XML databases is model-based native XML databases. Rather than storing the XML document as text, they build an internal object model from the document and store this model. How the model is stored depends on the database. Some databases store the model in a relational or object-oriented database. For example, storing the DOM in a relational database might result in tables such as Elements, Attributes, PCDATA, Entities, and EntityReferences. Other databases use a proprietary storage format optimized for their model.”

XFLAIM falls into the second category of native XML databases, i.e. it is a model-based native XML database.

Another point to note here is the concept of round-tripping. The idea behind round-tripping is that you should be able to store an XML document in a database and get that “same” document back again.

“All native XML databases can round-trip documents at the level of elements, attributes, PCDATA, and document order. How much more they can round-trip depends on the database. As a general rule, text-based native XML databases round-trip XML documents exactly, while model-based native XML databases round-trip XML documents at the level of their document model. In the case of particularly minimal document models, this means round-tripping at a level less than canonical XML” (XML and Databases/Ronald Bourret/1999-2001).

To XFLAIM the canonical level means that it is necessary to preserve elements, attributes, values, and relationships between elements and attributes, which includes preserving document order.

In the following XML excerpt, there could be multiple interpretations as to the value of the title element. It may or may not include the leading spaces and new-line characters. In a text-based system, the element is stored and retrieved “as-is” without any alterations. In a model-based system, a parser decides what is significant and feeds canonical XML to the database. Whether or not you get perfect round-tripping depends on the parser and the re-composer.

<TITLE>

" Gone With the Wind"

</TITLE>

The input text XML may not be byte-for-byte identical to the output text XML. But the canonical output from the database will be identical to the canonical input to the database.
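The difference between byte-for-byte and canonical round-tripping can be illustrated with a small sketch. It assumes one possible parser policy, namely that leading and trailing whitespace in element content is insignificant; this policy and the function name are assumptions made for the example and do not describe how XFLAIM's own parser treats whitespace.

```cpp
#include <string>

// Hypothetical parser policy: strip leading and trailing whitespace
// from element content before the value is stored.
std::string normalizeValue(const std::string& raw) {
    const char* whitespace = " \t\r\n";
    const std::size_t first = raw.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // the content was all whitespace
    }
    const std::size_t last = raw.find_last_not_of(whitespace);
    return raw.substr(first, last - first + 1);
}
```

Under this policy, the content "\n Gone With the Wind\n" and the content "Gone With the Wind" are stored as the same value, so exporting either document produces identical canonical output even though the two input files differ byte for byte.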

XFLAIM, XML and the Dictionary

XML (Extensible Markup Language) is a simplified subset of SGML (Standard Generalized Markup Language). It uses a series of 'markup' tags to identify the parts of a document. It is a meta language that allows users to create and format their own document markups. Since there are no standard XML markups as such, there needs to be a way to store whatever markup tags come along in the database. XFLAIM does not store XML as plain text. Instead, documents are stored in a format optimized for storage and retrieval, so that queries can find entries in the database quickly. Rather than store the same XML tag every time it is encountered in a document, XFLAIM stores an identifier that corresponds to the XML tag. To convert the identifier back to its text-based value, a dictionary is employed.

The dictionary is where all XML tags are defined and given their unique identifier. The dictionary reference to the XML tags also includes the data type that is associated with the tag and a status indicating whether the dictionary entry is active etc.
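The tag-to-identifier mapping can be pictured with the following sketch. It is a toy model with hypothetical names: it keeps only the name and the identifier, omits the data type and status that a real dictionary entry carries, and assigns identifiers sequentially, which is an assumption rather than a description of how XFLAIM numbers its definitions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy dictionary that maps XML tag names to numeric identifiers and back.
class TagDictionary {
public:
    // Returns the existing id for a tag, or assigns the next unused id.
    std::uint32_t define(const std::string& tag) {
        const auto it = m_idByName.find(tag);
        if (it != m_idByName.end()) {
            return it->second;
        }
        const std::uint32_t id = m_nextId++;
        m_idByName.emplace(tag, id);
        m_nameById.emplace(id, tag);
        return id;
    }

    // Returns the tag for an id, or an empty string if the id is undefined.
    std::string nameOf(std::uint32_t id) const {
        const auto it = m_nameById.find(id);
        return it == m_nameById.end() ? std::string() : it->second;
    }

private:
    std::uint32_t m_nextId = 1;
    std::unordered_map<std::string, std::uint32_t> m_idByName;
    std::unordered_map<std::uint32_t, std::string> m_nameById;
};
```

A document that uses the tag "person" a thousand times then stores the same small number a thousand times, and the text form is recovered through nameOf only when the document is exported or displayed.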

The XFLAIM dictionary also stores a number of other important pieces of information. For example, any time a new data collection is created, an entry is made in the dictionary to record important details about the collection.

When working with the XFLAIM API, the IF_Db interface provides a method, createElementDef, that interacts directly with the dictionary. Other methods often require the tag identifier to perform their functions. The parameter that represents the tag identifier is variously called the name id, the dictionary number, or the tag number.

XFLAIM and the Document Object Model (DOM)

The Document Object Model (DOM) is a programming interface for XML documents (see Document Object Model, Level 1 Specification, W3C Recommendation 1 October 1998). The XFLAIM database provides access to XML documents by way of a DOM-like API. Because it is an XML database, it has many extensions not found in a typical DOM API - such as support for data types other than text (number, binary), indexing, queries, transactions, backup, and other database features.

In XFLAIM, there are two interfaces that have methods for creating DOM nodes: the database object (IF_Db) and the DOM node object (IF_DOMNode). For the IF_Db object, the two methods for creating nodes are createDocument and createRootElement.

For the IF_DOMNode object, the node-creation methods include createNode and createAttribute.

There are also convenience methods on the IF_DOMNode object, such as setAttributeValueUTF8 and setAttributeValueUnicode, that allow an attribute to be created and its value set in a single call. These methods may only be used if the DOM node is an element node.

The first node in any XFLAIM document is called the root node. It cannot have a parent node, nor can it have any sibling nodes. Root nodes may be either a Document DOM node or an Element Root Node. The Element Root Node is a variation of the Element DOM node type that is specific to XFLAIM. It allows us to use an element node as a root node in a document.

Documents may be created in XFLAIM in one of two ways. The first, and perhaps the easiest way, is to import the document using an XML file. This may be done by calling the import method on an IF_Db object.

It is important to remember that if you are creating a document using the API, any element, attribute, prefix tags etc. that you intend to use must first be defined in the dictionary.

The other means of creating documents in XFLAIM is to use the APIs that are documented here to create documents programmatically as needed. The first node created is either an Element Root node or a Document node. Once you have the root of the document, you can build up your document by adding child element nodes and attribute nodes as needed.

For example, consider a simple XML document:

<person>
    <name first="John" last="Doe"/>
    <age>23</age>
</person>

This document is made up of the element node <person> and two child element nodes, <name> and <age>. The <name> element has two attributes, <first> and <last>. To create this document in the XFLAIM database, we will assume that the element definitions for "person", "name", and "age" and the attribute definitions for "first" and "last" are already defined in the dictionary.

The first thing that must be done is to begin an update transaction using the transBegin method provided by the IF_Db object. Then a root node must be created. The application can use either the createRootElement or createDocument methods that are provided by the IF_Db object. These calls will return a DOM node that is the root of the document. If the root node is created using the createRootElement method, one of the parameters is the element name id, which is a number that represents the element definition for the "person" element in the dictionary.

If the root node is created using the createDocument method, it is necessary to create a child element node with the "person" element name id before proceeding to build the rest of the document. To do this, the createNode method would be called using the IF_DOMNode object that was returned from the createDocument method. The name id parameter would be the number corresponding to the element definition for the "person" element.

Using the <person> DOM node, it is now necessary to create either the <age> child element or the <name> child element. It is not important which one is created next. When creating the child element for either <name> or <age>, the createNode method would be called from the DOM node object that represented the <person> node. In a similar manner, the <first> and <last> attributes of the <name> element could be created by calling the createAttribute method of the DOM node object that represented the <name> node. Or, alternatively, they could be created and their values set simultaneously by calling the setAttributeValueUTF8 method or the setAttributeValueUnicode method of the DOM object that represented the <name> node.

When finished adding all of the nodes needed to represent this document and there are no more documents to add, the transaction would be committed by calling the transCommit method of the IF_Db object.

There are several steps that have been omitted from this explanation. These include opening the database (if not already opened), obtaining the element and attribute name ids from the dictionary, etc. For a more detailed understanding of the various steps involved, please refer to the coding example (sample.cpp).
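The order of operations described above (root element first, then child elements, then attributes and values) can be mimicked with a toy in-memory model. The ToyNode type below is invented for this illustration and is not the IF_DOMNode interface: it identifies nodes by name strings instead of dictionary name ids, and it has no transactions, so the transBegin and transCommit steps have no counterpart here.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DOM element node; NOT the XFLAIM IF_DOMNode interface.
struct ToyNode {
    std::string name;
    std::string value;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<std::unique_ptr<ToyNode>> children;

    // Appends a new child element and returns a pointer to it.
    ToyNode* createChild(const std::string& childName) {
        children.push_back(std::make_unique<ToyNode>());
        children.back()->name = childName;
        return children.back().get();
    }

    // Creates an attribute and sets its value in a single call.
    void setAttribute(const std::string& attrName, const std::string& attrValue) {
        attributes.emplace_back(attrName, attrValue);
    }
};

// Writes the node and its subtree as XML text without extra whitespace.
std::string serialize(const ToyNode& n) {
    std::string out = "<" + n.name;
    for (const auto& a : n.attributes) {
        out += " " + a.first + "=\"" + a.second + "\"";
    }
    if (n.children.empty() && n.value.empty()) {
        return out + "/>";
    }
    out += ">" + n.value;
    for (const auto& c : n.children) {
        out += serialize(*c);
    }
    return out + "</" + n.name + ">";
}

// Builds the <person> document in the same order as the steps above.
std::unique_ptr<ToyNode> buildPerson() {
    auto person = std::make_unique<ToyNode>();  // root element
    person->name = "person";

    ToyNode* name = person->createChild("name");  // first child element
    name->setAttribute("first", "John");
    name->setAttribute("last", "Doe");

    ToyNode* age = person->createChild("age");  // second child element
    age->value = "23";
    return person;
}
```

Calling serialize(*buildPerson()) reproduces the example document on a single line. In the real API, each of these steps would take place between the transBegin and transCommit calls on the IF_Db object.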

XFLAIM Concepts

Summary of XFLAIM Features

DOM Nodes and Documents

Collections

Indexing

Dynamic Dictionary

Query Capabilities

Read and Update Operations

Transactions

Roll-forward Logging

Database Reliability and Recovery

Concurrency

Caching

Optimized Disk Reading / Writing

Database Validation and Repair

Backup / Restore

Database Monitoring / Statistics Collection

Checksums

Database Size

Cross Platform

Utilities

Checksumming

Database Size

Testing

XFLAIM Concepts

Nodes

Node Type

Name Identifier

Node Data Type

Node Value

Node Identifier

Documents

Hierarchical Structure

Other Features of Documents

Repeating Nodes

Non-Occurring Nodes

Flexible Node Ordering

Collections

Document Identifier

Predefined Collections

Dictionary Collection

Default Data Collection

Indexes

Background Indexing

Suspending Indexes

Resuming Indexes

Dictionary

Dictionary Definition Name Attribute

Definition Document Types

Data Types

Number

Text

Binary

Element Definitions

Attribute Definitions

Prefix Definitions

Collection Definitions

Index Definitions

Index Types

Single Node Index

Compound (Multi-Node) Index

Index Options

Node Paths

International Languages

Each Word

Substring

Metaphone

Case-Insensitive Collation

Node Name Identifier Indexing

Encryption Definitions

Querying the Database

Query

Result Sets

Database Files

Control File

Lock File

Data Files

Naming Convention

Rollback Log Files

Naming Convention