Introduction: What is IOFS?
Motivation
Analyzing I/O requests and their performance is very hard, as many parameters determine request time, such as:
- Page alignment
- VFS scheduling
- Process scheduling and priority
- Network congestion
- Running programs
- File system configuration
and many more. Conventional analysis tools often require specialized knowledge, elevated privileges, and/or custom code changes.
For example, here is an overview of Linux's highly complex I/O architecture:
As a result, most programmers, especially interdisciplinary researchers, are unable to sufficiently optimize their I/O accesses. We try to solve this problem with a purely black-box approach.
Our goals with our solution are the following:
- Give all HPC users a tool to monitor and perform rudimentary analysis of their I/O
- Require no code changes, no specific compiler, no specific linking
- Easy to set up, no proprietary software or specific knowledge required
- Run completely in user space, no administrator privileges required
- Integrate with common tooling for further analysis
- Require no assumptions about the underlying server topology
Our Approach
As mentioned above, we developed `iofs` as a black-box approach to report and, together with `blackheap`, classify all I/O requests without requiring any further information about the I/O hardware or software configuration.
FUSE
We use the Linux userspace filesystem framework FUSE (Filesystem in Userspace) to intercept all I/O requests. FUSE consists of two parts: the FUSE kernel module and the `libfuse` library for building file systems. We use FUSE to insert our monitoring logic.
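To make this concrete, here is a minimal sketch of a passthrough file system built against the high-level `libfuse` (fuse3) API. It is illustrative only, not `iofs`'s actual code; the comments mark where monitoring logic can be wrapped around each operation.

```c
/* Minimal passthrough sketch using the high-level fuse3 API.
 * Build (illustrative): gcc passthrough.c `pkg-config fuse3 --cflags --libs`
 */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <sys/stat.h>

static int pt_getattr(const char *path, struct stat *st,
                      struct fuse_file_info *fi)
{
    (void) fi;
    /* Monitoring logic would go here (start timer, record metadata)... */
    int res = lstat(path, st); /* forward to the underlying file system */
    /* ...and here (stop timer, aggregate the measurement). */
    return res == -1 ? -errno : 0;
}

static const struct fuse_operations pt_ops = {
    .getattr = pt_getattr,
};

int main(int argc, char *argv[])
{
    /* libfuse registers our callbacks; the FUSE kernel module routes
     * VFS requests on the mountpoint to them. */
    return fuse_main(argc, argv, &pt_ops, NULL);
}
```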
Grafana / Elasticsearch / InfluxDB
In order to allow for easier aggregated analysis of multiple clusters, we support data streaming.
Our in-house monitoring setup is structured as follows:
Monitoring
Although we use Grafana as our monitoring platform, `iofs` is platform agnostic, as it inserts the data directly into the underlying database.
Database
We use the InfluxDB time series database (TSDB) as our data source for Grafana. `iofs` currently supports only the InfluxQL API, not the Flux query language. Furthermore, `iofs` is only tested against InfluxDB 1.x.
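For illustration, the following sketch writes one aggregated data point to the InfluxDB 1.x HTTP write endpoint using libcurl. The database name, measurement, and field names here are hypothetical, not `iofs`'s actual schema.

```c
/* Sketch: stream one aggregated metric to InfluxDB 1.x via line protocol.
 * Build (illustrative): gcc influx_write.c -lcurl
 */
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    /* InfluxDB 1.x write endpoint; "iofs" is an example database name. */
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8086/write?db=iofs");
    /* One line-protocol point: measurement, tags, fields (names made up). */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                     "io_ops,host=node01,op=read count=42i,total_ns=153000i");

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "write failed: %s\n", curl_easy_strerror(res));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```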
We also have some streaming logic to support Elasticsearch as a TSDB, although this is neither tested nor actively maintained.
How it works
In order to track and monitor the I/O requests, the user is required to "proxy" all requests through our pseudo-filesystem. Once this is done, the high-level workflow is as follows:
- The user performs a request on our pseudo-filesystem
- The request goes to the Linux VFS
- The Linux VFS sees that this is a FUSE filesystem and lets the FUSE kernel module manage the request
- The FUSE kernel module calls the appropriate `iofs` method for the requested operation (a sketch of such a handler follows this list). `iofs` then:
  - extracts all request information
  - maps the path to the real path
  - performs the underlying file system operation
  - tracks how long the request takes
  - aggregates all measured data into a global data structure
  - returns the result of the underlying operation
- The result gets passed to the FUSE kernel module, then to the VFS, and finally back to the caller
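The per-request steps above can be sketched as a single FUSE read handler. This is an illustration under the same fuse3 passthrough assumptions as the earlier skeleton, not `iofs`'s actual implementation; `backing_root` and `record_read` are hypothetical names.

```c
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static const char *backing_root = "/mnt/real-fs"; /* assumed backing path */

/* Aggregation hook; a threaded version is sketched further below. */
static void record_read(long ns)
{
    fprintf(stderr, "read took %ld ns\n", ns);
}

static int monitored_read(const char *path, char *buf, size_t size,
                          off_t offset, struct fuse_file_info *fi)
{
    (void) fi;

    /* 1. Map the pseudo-filesystem path to the real path. */
    char real[PATH_MAX];
    snprintf(real, sizeof(real), "%s%s", backing_root, path);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* 2. Perform the underlying file system operation. */
    int fd = open(real, O_RDONLY);
    if (fd == -1)
        return -errno;
    ssize_t res = pread(fd, buf, size, offset);
    int saved_errno = errno;
    close(fd);

    /* 3. Track how long the request took and aggregate the measurement. */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    record_read((t1.tv_sec - t0.tv_sec) * 1000000000L
                + (t1.tv_nsec - t0.tv_nsec));

    /* 4. Return the result of the underlying operation. */
    return res == -1 ? -saved_errno : (int) res;
}
```

Plugged into the operations struct of the earlier skeleton (e.g. `.read = monitored_read`), every read on the mountpoint is timed and transparently forwarded to the real file system.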
At every time interval, all aggregated metrics are streamed to either of the two supported TSDBs. Note that this streaming is non-blocking.
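One common way to keep the streaming off the request path is a dedicated flusher thread that periodically snapshots and resets the aggregated counters. The sketch below assumes such a pthreads-based design; the names and the ten-second interval are illustrative, not `iofs`'s actual configuration.

```c
/* Sketch: non-blocking metric aggregation with a periodic flusher thread.
 * Build (illustrative): gcc flusher.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t metrics_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long read_count;    /* aggregated since last flush */
static unsigned long read_total_ns;

/* Called from the FUSE handlers on the request path; only a short
 * critical section, so requests never wait on a database write. */
void record_read(unsigned long ns)
{
    pthread_mutex_lock(&metrics_lock);
    read_count++;
    read_total_ns += ns;
    pthread_mutex_unlock(&metrics_lock);
}

/* Runs on its own thread so flushing never blocks I/O requests. */
void *flush_loop(void *arg)
{
    (void) arg;
    for (;;) {
        sleep(10); /* example flush interval */
        pthread_mutex_lock(&metrics_lock);
        unsigned long count = read_count, total = read_total_ns;
        read_count = read_total_ns = 0;
        pthread_mutex_unlock(&metrics_lock);
        /* Ship the snapshot to the TSDB here (e.g. the curl sketch above). */
        printf("flushed: %lu reads, %lu ns total\n", count, total);
    }
    return NULL;
}

int main(void)
{
    pthread_t flusher;
    pthread_create(&flusher, NULL, flush_loop, NULL);
    record_read(12000); /* simulate requests on the I/O path */
    record_read(8000);
    sleep(15);          /* let one flush happen, then exit */
    return 0;
}
```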