Why LTFS Is a Bad Choice for Tape-Based Object Storage Systems

The Linear Tape File System (LTFS) is an ISO/IEC standardized file system format for tape media. LTFS drivers are available for all major operating system environments and provide read and write access to LTFS-formatted tapes.

Some object storage systems that can also store data on tape media use LTFS as their on-tape data format. In theory, this should make it possible to read and restore the data without the object storage system.

This article evaluates whether this is feasible in practice and analyzes the advantages and disadvantages of LTFS from the perspective of an object storage system.

LTFS Concept

LTFS was introduced with LTO-5 to simplify data storage on LTO tapes, and in particular to simplify the exchange of large amounts of data. The goal was to make access to tapes as similar as possible to hard drives. It should be possible to mount a tape like a hard disk and the data should be visible and accessible as a file structure. LTFS was intended to increase the acceptance of tape as a storage medium by suggesting that a tape could be used like a hard disk, e.g. by copying files in Explorer using “drag & drop”.

For this purpose, a tape must contain a table of contents, i.e. an index, which can be read in a short time. For this reason, a partitioning function for LTO tapes was introduced at the same time as the LTFS specification. A small index partition (partition 0) is reserved for the index and metadata. The large data partition (partition 1) stores the actual user data including one or more copies of the index data.

LTFS Indexing

When an LTFS-formatted tape is loaded into a tape drive on a computer system with a compatible LTFS driver, the mount command first reads the complete index. This index is then held either in memory or on the local hard disk. The data on the tape can then be viewed in a file browser as a file and directory structure, similar to a hard disk.
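To make the role of the index more concrete, the following minimal sketch walks a hand-written, heavily simplified index fragment and lists each file with the tape block at which it starts. The sample XML, the volume name and the helper names `walk` and `list_files` are assumptions for illustration. The element names are modeled on the LTFS index format, but a real index carries many more fields and should be checked against the specification.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified index fragment (illustrative only).
SAMPLE_INDEX = """<ltfsindex version="2.4.0">
  <directory>
    <name>VOL001</name>
    <contents>
      <file>
        <name>a.dat</name>
        <length>1048576</length>
        <extentinfo>
          <extent>
            <partition>b</partition>
            <startblock>120</startblock>
          </extent>
        </extentinfo>
      </file>
      <directory>
        <name>sub</name>
        <contents>
          <file>
            <name>b.dat</name>
            <length>10</length>
            <extentinfo>
              <extent>
                <partition>b</partition>
                <startblock>5</startblock>
              </extent>
            </extentinfo>
          </file>
        </contents>
      </directory>
    </contents>
  </directory>
</ltfsindex>"""


def walk(directory, prefix=""):
    """Yield (path, start_block, length) for every file below `directory`."""
    path = prefix + "/" + directory.findtext("name", default="")
    contents = directory.find("contents")
    if contents is None:
        return
    for child in contents:
        if child.tag == "file":
            extent = child.find("extentinfo/extent")
            # Empty files have no extent and therefore no start block.
            start = int(extent.findtext("startblock")) if extent is not None else None
            yield (path + "/" + child.findtext("name"), start, int(child.findtext("length")))
        elif child.tag == "directory":
            yield from walk(child, path)


def list_files(index_xml):
    """Parse the index text and return all files in directory order."""
    root = ET.fromstring(index_xml)
    return list(walk(root.find("directory")))


for entry in list_files(SAMPLE_INDEX):
    print(entry)
```

Note that the listing comes back in directory order, not in the order in which the data lies on the tape.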

When you try to work with tape in the same way that you are used to working with hard disks, the differences between the two media technologies become clear.

Data Access

Hard disk drives allow blocks of data to be read at random with access times of a few milliseconds (“random access”). In contrast, tape is a sequential medium with access times ranging from seconds to minutes. If a large number of small files is to be read from a tape, the files must therefore be read sequentially, in the order in which they were written, so that a long access time is not incurred for every single file. Otherwise, the reading process can take days.

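A standard tool like Explorer cannot do this, because it processes files in directory order rather than in the order in which they lie on the tape. A special tool would be needed to rearrange the directory-based index accordingly. Currently, no such tool exists.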
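The effect of read order can be illustrated with a small sketch. It assumes that each file is described by the tape block at which it starts, and it uses the distance between block numbers as a crude stand-in for positioning effort. Real LTO tapes are written in a serpentine pattern, so block distance is only a rough proxy. The type `TapeFile` and both functions are hypothetical.

```python
from typing import NamedTuple


class TapeFile(NamedTuple):
    path: str
    start_block: int  # block number at which the file starts on tape


def read_order(files):
    """Return the files sorted by their position on tape."""
    return sorted(files, key=lambda f: f.start_block)


def head_travel(files):
    """Sum of block distances between consecutive reads, starting at block 0."""
    position = 0
    total = 0
    for f in files:
        total += abs(f.start_block - position)
        position = f.start_block
    return total


# Directory order, as a file browser would process the files.
by_directory = [
    TapeFile("/a/1", 9000),
    TapeFile("/a/2", 100),
    TapeFile("/b/1", 8000),
    TapeFile("/b/2", 200),
]

print(head_travel(by_directory))              # 33600
print(head_travel(read_order(by_directory)))  # 9000
```

Even in this tiny example, reading in tape order reduces the travelled distance to roughly a quarter. With many thousands of small files the difference grows accordingly.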

Therefore, using LTFS with an object storage system makes little sense. It is much more effective for the object storage system to write and manage the data on the tape media itself.

Thanks to the object storage system’s S3 REST API, applications can take the characteristics of tape into account. Unlike a file system with its limited capabilities, S3 can handle the high latencies that occur when reading from tape, so applications can work with both the strengths and the limitations of the medium. More and more vendors, such as Veeam and Hammerspace, now support S3-to-Tape in their applications.
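One way an application can cope with tape latency over S3 is the asynchronous restore pattern known from Amazon S3 Glacier: request a restore, then poll the object's status until the data is available. The sketch below assumes that the S3-to-Tape system implements this pattern and that `client` offers `restore_object` and `head_object` with the same parameters as the boto3 S3 client. Whether a particular product does so is an assumption, not something stated in this article. The function names and default values are illustrative.

```python
import time


def restore_in_progress(head_response):
    """True while the 'Restore' field reports an ongoing request."""
    return 'ongoing-request="true"' in head_response.get("Restore", "")


def restore_and_wait(client, bucket, key, days=1, poll_seconds=60,
                     max_polls=120, sleep=time.sleep):
    """Request a restore and poll until the object is readable.

    Returns True once the restore has completed and False if it is still
    running after `max_polls` status checks.
    """
    client.restore_object(Bucket=bucket, Key=key,
                          RestoreRequest={"Days": days})
    for _ in range(max_polls):
        head = client.head_object(Bucket=bucket, Key=key)
        if "Restore" in head and not restore_in_progress(head):
            return True
        sleep(poll_seconds)  # tape access takes seconds to minutes
    return False
```

With boto3, `client` would typically be created with `boto3.client("s3", endpoint_url=...)`, where the endpoint is that of the object storage system. The application stays responsive while the tape is mounted and positioned, which a blocking file system read cannot offer.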

Performance and Efficiency

A major drawback of the LTFS format is the significant loss of read and write performance. These losses are caused by the additional file markers, the mandatory alignment of files to block boundaries, and the forced updating of the index. These additional elements result in a very inefficient use of a tape’s storage capacity, especially for smaller files.

When using tape media in an object-based storage system, performance and efficiency are critical factors. Data must be written and read as efficiently as possible and at the maximum speed of the tape drives.

Metadata

LTFS was designed as a file system and therefore has typical file system restrictions on the structure and length of file names and the characters used in file names. Apart from a maximum length, object storage systems generally have no such restrictions on object names. As a result, many common object names cannot be mapped to file names.
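The mapping problem can be sketched as a simple check. The code treats "/" in an object name as a directory separator and flags path components that a file system could not store. The limit of 255 characters per component and the individual rules are assumptions made for this example. The exact constraints should be taken from the LTFS specification and the driver in use.

```python
MAX_COMPONENT_LENGTH = 255  # assumed per-name limit, counted in code points


def file_name_problems(object_key):
    """Return the reasons why `object_key` cannot be mapped to a file path."""
    problems = []
    for part in object_key.split("/"):
        if part == "":
            problems.append("empty path component")
        elif part in (".", ".."):
            problems.append("reserved name: " + part)
        elif len(part) > MAX_COMPONENT_LENGTH:
            problems.append("component longer than %d characters" % MAX_COMPONENT_LENGTH)
        elif "\0" in part:
            problems.append("NUL character in name")
    return problems


for key in ["photos/2024/img.jpg", "logs//app.log", "backup/", "x" * 300]:
    print(repr(key[:20]), file_name_problems(key))
```

All four keys are valid S3 object names, but only the first one maps cleanly to a file path in this model.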

In addition, applications that use object storage systems increasingly rely on custom metadata and tags.

[Figure: An object consists of content, metadata and an identifier]

Although this metadata can be stored in the LTFS index, it is then proprietary: the LTFS drivers ignore it, and it can drastically increase the size of the index. A larger index can overflow the index partition, with the result that a tape’s storage capacity cannot be fully utilized, especially when storing smaller objects.

Without the object storage software that wrote the metadata, the data cannot be fully read and interpreted. The argument of software independence in relation to LTFS therefore loses its validity. Using LTFS in conjunction with object storage systems is then not only unjustified but actually disadvantageous.

Versioning

As is typical for file systems, LTFS does not support file versions. However, versioning is a common feature in object storage systems and their applications, especially in combination with S3 Object Lock.

The only way the object storage system could get around this disadvantage would be to add a proprietary extension to LTFS. This in turn means that the data, including version information, would only be readable in conjunction with the object storage software. The interchangeability and software independence of LTFS would be lost.

A much more effective solution is to keep version management solely in the object storage software, which directly supports the tape media without LTFS overhead.

Data Spanning

LTFS supports spanning of data segments across multiple tapes. Such spanning occurs when applications transfer large objects to the object storage system in parts (“S3 Multipart Upload”).

However, a simple LTFS driver cannot read and reassemble file segments that span multiple tapes. This requires either the object storage system or a specialized LTFS tool, which is currently not available.

Object storage systems with S3-to-Tape support can manage the distribution of data across many thousands of segments. This includes spanning very large files (objects) across multiple media.
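How such segment management might look can be sketched with a small catalog structure. Each uploaded part is recorded with the tape and the block at which it was written. For a restore, the parts are grouped by tape and sorted by position, so that every tape is mounted once and read sequentially. The `Segment` type, the barcodes and both functions are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    part_number: int   # S3 multipart part number
    tape: str          # tape barcode (hypothetical values below)
    start_block: int   # position of the part on that tape
    length: int        # size of the part in bytes


def restore_plan(segments):
    """Group segments by tape and order each group by position on tape."""
    plan = {}
    for seg in segments:
        plan.setdefault(seg.tape, []).append(seg)
    for tape_segments in plan.values():
        tape_segments.sort(key=lambda s: s.start_block)
    return plan


def assemble(parts):
    """Join the part payloads (part_number -> bytes) in part order."""
    return b"".join(parts[number] for number in sorted(parts))


segments = [
    Segment(1, "TAPE01", 500, 3),
    Segment(2, "TAPE02", 40, 3),
    Segment(3, "TAPE01", 20, 2),
]
plan = restore_plan(segments)
print({tape: [s.part_number for s in segs] for tape, segs in plan.items()})
print(assemble({2: b"def", 3: b"gh", 1: b"abc"}))
```

The read order per tape differs from the part order of the object. Reassembly therefore depends on the catalog, which is exactly the information a plain LTFS driver does not have.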

Tape Management

In larger enterprises, tape autoloaders and libraries are often used to manage and utilize large numbers of tapes. In large libraries, the number of tapes being managed can run into the thousands. Without management software and its database, it is practically impossible to find the tape that holds a specific file, even if each individual tape could be read in another environment.
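The role of the management database can be illustrated with a minimal catalog that maps each object to the tape and block where it is stored. The sketch uses an in-memory SQLite database. The table layout, the column names and the barcodes are assumptions for illustration.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory catalog from (bucket, object_key, tape, start_block) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE objects ("
        " bucket TEXT, object_key TEXT, tape TEXT, start_block INTEGER,"
        " PRIMARY KEY (bucket, object_key))"
    )
    db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
    return db


def locate(db, bucket, object_key):
    """Return (tape, start_block) for an object, or None if it is unknown."""
    return db.execute(
        "SELECT tape, start_block FROM objects"
        " WHERE bucket = ? AND object_key = ?",
        (bucket, object_key),
    ).fetchone()


catalog = build_catalog([
    ("archive", "2023/report.pdf", "TAPE0417", 88210),
    ("archive", "2024/report.pdf", "TAPE1932", 1204),
])
print(locate(catalog, "archive", "2024/report.pdf"))
print(locate(catalog, "archive", "missing.bin"))
```

A single indexed lookup names the tape to mount. Without such a catalog, the only alternative is to mount tapes one after another and read their indexes.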

Erasure Coding

Erasure coding is a common method used in object storage systems to protect data on storage media. Data is split into fragments, which are then expanded and encoded with redundant pieces. The resulting data segments are stored on multiple media. The simultaneous use of multiple tapes, such as in tape libraries, makes this process very efficient.

[Figure: PoINT Archival Gateway, write path]

LTFS precludes the use of erasure coding techniques with tape media. Since LTFS specifies a self-contained file system per tape, files with redundant data cannot be distributed across multiple tapes.

The following graphic shows an example of using Erasure Coding (EC) in an object storage system on a tape group with four media and an EC rate of 3/4. This EC rate allows any one of the four media to fail without data loss. The capacity overhead is only a factor of 4/3 ≈ 1.33, i.e. about 33% additional capacity. In addition, simultaneous, parallel writing to four tape media significantly increases not only data protection, but also performance.
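The principle of a 3/4 rate can be sketched with simple XOR parity: three data fragments plus one parity fragment, each destined for a different tape. XOR parity is used here only because it is the simplest code with this rate. Production systems typically use more general codes such as Reed-Solomon, and the article does not state which code is actually used. All function names are hypothetical.

```python
def xor_all(fragments):
    """Byte-wise XOR of equally long fragments."""
    out = bytearray(len(fragments[0]))
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            out[i] ^= byte
    return bytes(out)


def encode(data, k=3):
    """Split data into k fragments and append one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [xor_all(fragments)]


def decode(fragments, length, k=3):
    """Rebuild the data; at most one fragment may be None (a lost tape)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = xor_all(present)
    return b"".join(fragments[:k])[:length]


payload = b"tape object payload"
stored = encode(payload)            # four fragments, one per tape
damaged = list(stored)
damaged[1] = None                   # simulate the loss of one tape
print(decode(damaged, len(payload)) == payload)
print(len(stored) / 3)              # capacity factor 4/3
```

Because each fragment goes to a different tape, any single tape can be lost and rebuilt from the other three, and all four drives can write in parallel.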

Conclusion

The use of LTFS has functional limitations and drawbacks that can only be circumvented, if at all, by proprietary extensions to LTFS and special reading tools. In addition, the performance and efficiency of LTFS are clearly inferior to those of a proprietary but optimized format.

Finally, the supposed advantage of interchangeability and independence is practically meaningless when using a large number of tapes, as is common with object storage systems, since the data on the individual media can no longer be found or assigned without the associated management software.

For these reasons, using LTFS in object storage systems would be a bad decision. Instead, the management software should manage the data distribution and support the tape drives as directly as possible. This is the only way to take advantage of the full read/write performance of the drives, the full capacity of the tapes, and the full functionality of S3 and object storage.