Introduction: What is iofs?

Motivation

Analyzing I/O requests and their performance is very hard, as many parameters determine request time, such as

  • Page alignment
  • VFS scheduling
  • Process scheduling and priority
  • Network congestion
  • Concurrently running programs
  • File system configuration

and many more. Conventional analysis tools often require sophisticated knowledge, elevated access, and/or custom code adjustments.

For example, here is an overview of Linux's highly complex I/O architecture:

[Figure: Overview of the Linux storage stack]

As a result, most programmers, especially interdisciplinary researchers, are unable to sufficiently optimize their I/O accesses. We try to solve this problem with a pure blackbox approach.

Our goals with our solution are the following:

  • Give all HPC users a tool to monitor and perform a rudimentary analysis of their I/O
  • Require no code changes, no specific compiler, no specific linking
  • Be easy to set up, with no proprietary software or specific knowledge required
  • Run completely in user space, with no administration required
  • Integrate with common tooling for further analysis
  • Require no assumptions about the underlying server topology

Our Approach

As mentioned above, we developed iofs as a blackbox approach to report and, together with blackheap, classify all I/O requests without requiring any further information about the I/O hardware or software configuration.

FUSE

We use FUSE (Filesystem in Userspace), the Linux userspace file system framework, to intercept all I/O requests. FUSE works as follows:

[Figure: How FUSE uses the VFS to support userspace filesystems]

FUSE consists of two parts: the FUSE kernel module and the libfuse library for building file systems. We use FUSE to insert our monitoring logic.
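To make the interception point concrete, here is a minimal sketch, not the actual iofs source, of how a libfuse-based file system registers its handlers. It assumes libfuse 3; the `sketch_*` names and the direct passthrough of `path` are illustrative:

```c
/* Minimal libfuse 3 sketch (not the actual iofs code); compile with
 * `pkg-config fuse3 --cflags --libs`. */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Called for every stat(); a real passthrough file system would first
 * map `path` into its backing directory. */
static int sketch_getattr(const char *path, struct stat *st,
                          struct fuse_file_info *fi)
{
    (void) fi;
    return lstat(path, st) == -1 ? -errno : 0;
}

/* Called by the FUSE kernel module when a file is opened. */
static int sketch_open(const char *path, struct fuse_file_info *fi)
{
    int fd = open(path, fi->flags);
    if (fd == -1)
        return -errno;
    fi->fh = fd;  /* remember the backing file descriptor */
    return 0;
}

/* Called for every read(); this is the natural place to wrap the
 * underlying operation with timing and monitoring logic. */
static int sketch_read(const char *path, char *buf, size_t size,
                       off_t offset, struct fuse_file_info *fi)
{
    (void) path;
    ssize_t res = pread(fi->fh, buf, size, offset);
    return res == -1 ? -errno : (int) res;
}

static const struct fuse_operations sketch_ops = {
    .getattr = sketch_getattr,
    .open    = sketch_open,
    .read    = sketch_read,
    /* .write, .release, ... are registered the same way. */
};

int main(int argc, char *argv[])
{
    /* fuse_main() mounts the file system and dispatches requests
     * from the FUSE kernel module to the callbacks above. */
    return fuse_main(argc, argv, &sketch_ops, NULL);
}
```

Every handler registered in `fuse_operations` is invoked by the FUSE kernel module on behalf of the original caller, which is exactly where a monitoring file system can wrap the real operation with measurement logic.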

Grafana / Elasticsearch / InfluxDB

To allow for easier aggregated analysis across multiple clusters, we support data streaming.

Our in-house monitoring setup looks as follows:

[Figure: Monitoring setup]

Although we use Grafana as our monitoring platform, iofs is platform-agnostic, as it inserts the data directly into the underlying database.

Database

We use the InfluxDB TSDB (time series database) as our data source for Grafana. iofs currently only supports the InfluxQL API, not the Flux syntax. Furthermore, iofs is only tested with InfluxDB 1.x.
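For illustration, here is a minimal sketch, assuming InfluxDB 1.x and libcurl, of how aggregated metrics can be pushed via the HTTP `/write` endpoint using the line protocol. The measurement name `iofs_metrics` and the field keys are assumptions for the example, not the actual iofs schema:

```c
/* Hypothetical helper that writes one aggregated data point to an
 * InfluxDB 1.x instance; link with `-lcurl`. */
#include <curl/curl.h>
#include <stdio.h>

int push_metrics(const char *host, const char *db,
                 const char *op, double latency_ms, long count)
{
    char url[256], body[256];

    /* InfluxDB 1.x write endpoint: POST /write?db=<database> */
    snprintf(url, sizeof url, "%s/write?db=%s", host, db);

    /* Line protocol: measurement,tag=value field=value[,field=value] */
    snprintf(body, sizeof body,
             "iofs_metrics,op=%s latency_ms=%f,count=%ldi",
             op, latency_ms, count);

    CURL *curl = curl_easy_init();
    if (!curl)
        return -1;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    CURLcode rc = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : -1;
}
```

A call like `push_metrics("http://localhost:8086", "iofs", "read", 0.42, 128)` would then be issued periodically by the streaming logic; Grafana queries the same database via InfluxQL.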

We also have some streaming logic to support Elasticsearch as a TSDB, although this is neither tested nor actively maintained.

How it works

In order to track and monitor the I/O requests, the user is required to "proxy" all requests through our pseudo-filesystem. Once this is done, the high-level workflow is as follows:

  • The user issues a request on our pseudo-filesystem
  • The request goes to the Linux VFS
  • The Linux VFS sees that this is a FUSE filesystem and lets the FUSE kernel module manage the request
  • The FUSE kernel module calls the appropriate iofs method for the requested operation
  • iofs extracts all request information (a sketch of these steps follows the list):
    • It maps the path to the real path
    • Performs the underlying file system operation
    • Tracks how long the request takes
    • Aggregates all measured data into a global data structure
    • Returns the result of the underlying operation
  • The result is passed to the FUSE kernel module, then to the VFS, then to the caller
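Put together, the per-request steps might look like the following sketch; the `/backing` prefix, the global aggregate, and all names are hypothetical, not the actual iofs implementation:

```c
/* Hypothetical sketch of the per-request work: map the path, run the
 * real operation, time it, and fold the result into a global aggregate. */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t agg_lock = PTHREAD_MUTEX_INITIALIZER;
static double total_read_ns;    /* global aggregate, streamed periodically */
static long   total_read_count;

/* Step 1: map the FUSE path to the real path in the backing directory.
 * "/backing" is an illustrative mount configuration. */
static void map_path(char *out, size_t n, const char *fuse_path)
{
    snprintf(out, n, "/backing%s", fuse_path);
}

/* Steps 2-5: perform the underlying operation (here: a read on the
 * already-opened backing fd), track its duration, aggregate, return. */
static int timed_read(int backing_fd, char *buf, size_t size, off_t off)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t res = pread(backing_fd, buf, size, off);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    pthread_mutex_lock(&agg_lock);
    total_read_ns += ns;
    total_read_count++;
    pthread_mutex_unlock(&agg_lock);

    return res == -1 ? -errno : (int) res;  /* FUSE error convention */
}
```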

At every time interval, all aggregated metrics get streamed to either of the two supported TSDBs. Note that this is non-blocking.
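One way to realize such non-blocking streaming is a dedicated flush thread that periodically snapshots and resets the aggregate before pushing it out, so request handlers are never blocked on network I/O. A sketch, reusing the hypothetical globals and the `push_metrics()` helper from above:

```c
/* Hypothetical flush thread: started once at mount time, e.g. with
 * pthread_create(&tid, NULL, flush_thread, &interval_s). */
static void *flush_thread(void *arg)
{
    const int interval_s = *(const int *) arg;
    for (;;) {
        sleep(interval_s);

        /* Snapshot and reset the aggregate under the lock ... */
        pthread_mutex_lock(&agg_lock);
        double ns  = total_read_ns;
        long   cnt = total_read_count;
        total_read_ns = 0;
        total_read_count = 0;
        pthread_mutex_unlock(&agg_lock);

        /* ... then stream it without holding the lock, so request
         * handlers never wait on the TSDB. */
        if (cnt > 0)
            push_metrics("http://localhost:8086", "iofs",
                         "read", ns / cnt / 1e6, cnt);  /* mean latency, ms */
    }
    return NULL;
}
```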