add prophet-zenfs
attack204 committed Apr 20, 2024
1 parent 724cfc5 commit c0fc6a9
Showing 15 changed files with 1,057 additions and 289 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
util/zenfs
util/zenfs.dbg
fs/*.o
fs/*.cc.d
tests/results
201 changes: 2 additions & 199 deletions README.md
@@ -1,200 +1,3 @@
# ZenFS: RocksDB Storage Backend for ZNS SSDs and SMR HDDs
# Prophet

ZenFS is a file system plugin that utilizes [RocksDB](https://github.com/facebook/rocksdb)'s FileSystem interface to
place files into zones on a raw zoned block device. By separating files into
zones and utilizing write life time hints to co-locate data of similar life
times the system write amplification is greatly reduced compared to
conventional block devices. ZenFS ensures that there is no background
garbage collection in the file system or on the disk, improving performance
in terms of throughput, tail latencies and disk endurance.
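The reduction in write amplification comes from placing data that is likely to be deleted around the same time into the same zones, so whole zones tend to become empty together and can be reset without copying live data. The C++17 sketch below illustrates that placement idea only. `Lifetime`, `Zone` and `PickZone` are hypothetical names, not ZenFS's allocator, and the preference order (same-lifetime partial zone first, then an empty zone) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical lifetime classes, loosely modeled on RocksDB's write lifetime
// hints (this is not the RocksDB enum).
enum class Lifetime { kShort, kMedium, kLong, kExtreme };

// Hypothetical, simplified zone state.
struct Zone {
  uint64_t capacity;             // bytes writable in the zone
  uint64_t used;                 // bytes written so far (write pointer offset)
  std::optional<Lifetime> hint;  // lifetime of the data already in the zone
};

// Returns the index of a zone that can take `size` bytes of data with
// lifetime `hint`, or -1 if none fits. A partially written zone that already
// holds data of the same lifetime is preferred; otherwise the first empty
// zone is chosen.
int PickZone(const std::vector<Zone>& zones, Lifetime hint, uint64_t size) {
  int first_empty = -1;
  for (int i = 0; i < static_cast<int>(zones.size()); ++i) {
    const Zone& z = zones[i];
    assert(z.used <= z.capacity);                // write pointer stays in the zone
    if (z.capacity - z.used < size) continue;    // not enough room left
    if (z.used > 0 && z.hint == hint) return i;  // co-locate same lifetime
    if (z.used == 0 && first_empty < 0) first_empty = i;
  }
  return first_empty;
}
```

With one zone holding long-lived data and another holding short-lived data, a short-lived write lands next to the other short-lived data, while a medium-lived write opens an empty zone instead of mixing lifetimes.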

## Community
For help or questions about zenfs usage (e.g. "how do I do X?") see below, or join us on [Matrix](https://app.element.io/#/room/#zonedstorage-general:matrix.org) or on [Slack](https://join.slack.com/t/zonedstorage/shared_invite/zt-uyfut5xe-nKajp9YRnEWqiD4X6RkTFw).

To report a bug, file a documentation issue, or submit a feature request, please open a GitHub issue.

For release announcements and other discussions, please subscribe to this repository or join us on Matrix or Slack.

## Dependencies

ZenFS depends on [libzbd](https://github.com/westerndigitalcorporation/libzbd)
and Linux kernel 5.4 or later to perform zone management operations. To use
ZenFS on SSDs with Zoned Namespaces, Linux kernel 5.9 or later is required.
ZenFS works with RocksDB version v6.19.3 or later.

# Getting started

## Build

Download, build and install libzbd. See the libzbd [README](https://github.com/westerndigitalcorporation/libzbd/blob/master/README.md)
for instructions.

Download rocksdb and the zenfs projects:
```
$ git clone https://github.com/facebook/rocksdb.git
$ cd rocksdb
$ git clone https://github.com/westerndigitalcorporation/zenfs plugin/zenfs
```

Build and install rocksdb with zenfs enabled:
```
$ DEBUG_LEVEL=0 ROCKSDB_PLUGINS=zenfs make -j48 db_bench install
```

Build the zenfs utility:
```
$ pushd plugin/zenfs/util
$ make
$ popd
```

## Configure the IO Scheduler for the zoned block device

The IO scheduler must be set to deadline to prevent writes from being reordered.
This must be done every time the zoned namespace is enumerated (e.g. at boot).

```
echo deadline > /sys/class/block/<zoned block device>/queue/scheduler
```
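Because the setting is lost whenever the namespace is re-enumerated, a tool may want to check it before opening the device. The sketch below is a hypothetical helper, not part of ZenFS. It parses the contents of the sysfs `scheduler` file, in which the kernel lists the available schedulers and marks the active one with square brackets. It assumes that on multi-queue kernels the deadline scheduler appears as `mq-deadline`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the active scheduler from the contents of
// /sys/class/block/<dev>/queue/scheduler, e.g. "none [mq-deadline] kyber"
// yields "mq-deadline". Returns an empty string if no entry is bracketed.
std::string ActiveScheduler(const std::string& sysfs_line) {
  std::istringstream in(sysfs_line);
  std::string token;
  while (in >> token) {  // tokens are whitespace separated
    if (token.size() >= 2 && token.front() == '[' && token.back() == ']') {
      return token.substr(1, token.size() - 2);
    }
  }
  return "";
}

// True if the active scheduler is a deadline variant: "mq-deadline" on
// blk-mq kernels, "deadline" on older single-queue kernels.
bool IsDeadlineScheduler(const std::string& sysfs_line) {
  const std::string active = ActiveScheduler(sysfs_line);
  return active == "deadline" || active == "mq-deadline";
}
```

A caller would read the file once (for example with `std::ifstream`) and refuse to start if `IsDeadlineScheduler` returns false.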

## Creating a ZenFS file system

Before ZenFS can be used in RocksDB, the file system metadata and superblock must be set up.
This is done with the zenfs utility, using the mkfs command. A ZenFS filesystem can be created
on either a raw zoned block device or on a zonefs filesystem on a zoned block device. For a raw
zoned block device, the device is specified using `--zbd=<zoned block device>`:

```
./plugin/zenfs/util/zenfs mkfs --zbd=<zoned block device> --aux_path=<path to store LOG and LOCK files>
```

If using zonefs, the zonefs file system mountpoint is specified instead using `--zonefs=<zonefs mountpoint>`:

```
./plugin/zenfs/util/zenfs mkfs --zonefs=<zonefs mountpoint> --aux_path=<path to store LOG and LOCK files>
```

In general, all operations of the zenfs utility can target either a raw block device or a zonefs mountpoint.

When using zonefs, the zonefs volumes should be mounted with the option "explicit-open":

```
sudo mount -o explicit-open <zoned block device> <zonefs mountpoint>
```

## ZenFS on-disk file formats

ZenFS Version 1.0.0 and earlier uses version 1 of the on-disk format.
ZenFS Version 2.0.0 introduces breaking on-disk-format changes (inline extents, support for zones larger than 4GB).

To migrate between different versions of the on-disk file format, use the zenfs backup/restore commands.

```
# Backup the disk contents to the host file system using the version of zenfs that was used to store the current database
./plugin/zenfs/util/zenfs backup --path=<path to store backup> --zbd=<zoned block device>
# Switch to the new version of ZenFS you want to use (e.g. 1.0.2 -> 2.0.0), rebuild and create a new file system
# Remove the current aux folder if needed.
./plugin/zenfs/util/zenfs mkfs --force --zbd=<zoned block device> --aux_path=<path to store LOG and LOCK files>
# Restore the database files to the new version of the file system
./plugin/zenfs/util/zenfs restore --path=<path to backup> --zbd=<zoned block device>
```

Likewise, it is possible to migrate between a raw zoned block device and a zonefs filesystem by using backup on one
and restore on the other. Be aware that, for a given block device, zonefs exposes one fewer zone to ZenFS,
because zonefs formatting consumes one zone for the zonefs superblock.

## Testing with db_bench

To instruct db_bench to use ZenFS on a specific zoned block device, use the `--fs_uri` parameter.
Specify `--fs_uri=zenfs://dev:<zoned block device name>` for a raw block device,
`--fs_uri=zenfs://zonefs:<zonefs mountpoint>` for a zonefs mountpoint, or
`--fs_uri=zenfs://uuid:<UUID>` to select the created file system by its unique identifier. UUIDs
can be listed using `./plugin/zenfs/util/zenfs ls-uuid`.

```
./db_bench --fs_uri=zenfs://dev:<zoned block device name> --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
```
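All three `--fs_uri` forms share the shape `zenfs://<kind>:<value>`. The sketch below shows how such a URI splits into its parts. `ZenfsTarget` and `ParseFsUri` are hypothetical names used only for illustration; this is not ZenFS's own parsing code, and rejecting unknown kinds and empty values is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical representation of a parsed --fs_uri value.
struct ZenfsTarget {
  std::string kind;   // "dev", "zonefs" or "uuid"
  std::string value;  // device name, mountpoint or UUID
};

// Splits "zenfs://<kind>:<value>" at the first ':' after the scheme.
// Returns std::nullopt for a wrong scheme, an unknown kind or an empty value.
std::optional<ZenfsTarget> ParseFsUri(const std::string& uri) {
  const std::string scheme = "zenfs://";
  if (uri.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
  const std::string rest = uri.substr(scheme.size());
  const std::size_t colon = rest.find(':');
  if (colon == std::string::npos) return std::nullopt;
  ZenfsTarget t{rest.substr(0, colon), rest.substr(colon + 1)};
  if (t.kind != "dev" && t.kind != "zonefs" && t.kind != "uuid") {
    return std::nullopt;
  }
  if (t.value.empty()) return std::nullopt;
  return t;
}
```

Splitting at the first colon keeps mountpoint paths such as `/mnt/zonefs` intact in the value.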

## Performance testing

If you want to use db_bench for testing zenfs performance, there is a convenience script
that runs the 'long' and 'quick' performance test sets with a good set of parameters
for the drive.

`cd tests; ./zenfs_base_performance.sh <zoned block device name> [ <zonefs mountpoint> ]`


## Crashtesting
To run the crashtesting scripts, Python3 is required.
Crashtesting is done through the modified db_crashtest.py
(original [db_crashtest.py](https://github.com/facebook/rocksdb/blob/main/tools/db_crashtest.py)).
It kills the DB at a random point in time (blackbox) or at predefined places
in the RocksDB code (whitebox) and checks for recovery.
For further reading, see the RocksDB [wiki](https://github.com/facebook/rocksdb/wiki/Stress-test).
However, the goal for ZenFS crashtesting is to cover a specified set of
parameters rather than to run randomized continuous testing.

The convenience script can be used to run all crashtest sets defined in `tests/crashtest`.
```
cd tests; ./zenfs_base_crashtest.sh <zoned block device name>
```

## Prometheus Metrics Exporter

To export performance metrics to Prometheus, do the following:

Set the environment variable `ZENFS_EXPORT_PROMETHEUS=y` when building to enable
Prometheus export of metrics. The exporter listens on 127.0.0.1:8080.

**Requires prometheus-cpp-pull == 1.1.0**

# ZenFS Internals

## Architecture overview
![zenfs stack](https://user-images.githubusercontent.com/447288/84152469-fa3d6300-aa64-11ea-87c4-8a6653bb9d22.png)

ZenFS implements the FileSystem API and stores all data files on a raw
zoned block device. Log and lock files are stored on the default file system
under a configurable directory. Zone management is done through libzbd, and
ZenFS I/O is done through normal pread/pwrite calls.

## File system implementation

Files are mapped into a set of extents:

* Extents are block-aligned, contiguous regions on the block device
* Extents do not span across zones
* A zone may contain more than one extent
* Extents from different files may share zones
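The rules above can be expressed as a small data structure with an invariant check, plus the lookup that turns a file offset into a device offset by walking the extent list. This is a simplified sketch, not ZenFS's extent type: `Extent`, `ExtentValid` and `FileToDeviceOffset` are hypothetical, it assumes equally sized zones starting at device offset 0, and it treats "block-aligned" as "the extent starts on a block boundary".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified extent: a contiguous byte range on the device.
struct Extent {
  uint64_t start;   // device byte offset
  uint64_t length;  // bytes
};

// Zone index of a device offset, assuming equally sized zones from offset 0.
uint64_t ZoneOf(uint64_t offset, uint64_t zone_size) {
  return offset / zone_size;
}

// Checks that an extent is non-empty, starts on a block boundary and does
// not span across zones (first and last byte fall in the same zone).
bool ExtentValid(const Extent& e, uint64_t zone_size, uint64_t block_size) {
  assert(zone_size > 0 && block_size > 0);
  if (e.length == 0) return false;
  if (e.start % block_size != 0) return false;
  return ZoneOf(e.start, zone_size) ==
         ZoneOf(e.start + e.length - 1, zone_size);
}

// Translates a byte offset within a file into a device offset by walking the
// file's extents in order. Returns false if the offset is past end of file.
bool FileToDeviceOffset(const std::vector<Extent>& extents,
                        uint64_t file_offset, uint64_t* device_offset) {
  for (const Extent& e : extents) {
    if (file_offset < e.length) {
      *device_offset = e.start + file_offset;
      return true;
    }
    file_offset -= e.length;  // skip past this extent
  }
  return false;
}
```

With 4096-byte zones, an extent at offset 3584 of length 1024 is rejected because it would cross into the next zone. Nothing in the check prevents two files' extents from sitting in the same zone, which matches the last rule.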

### Reclaim

At the current state of implementation, ZenFS is exceptionally lazy and does
not do any garbage collection whatsoever. As files get deleted, a zone's
used-capacity counter drops, and when it reaches zero, the zone can be reset
and reused.
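The bookkeeping this implies is small: each zone tracks how many bytes are still referenced by live files, deletes only decrement that counter, and a written zone becomes resettable once it reaches zero. The sketch below models that under stated assumptions. `ZoneState` and `ZoneTracker` are hypothetical names, and the actual device zone reset (done through libzbd in ZenFS) is only indicated by a comment. A consequence worth noting: a zone with even one live byte cannot be reclaimed, since no data is ever moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-zone counters for lazy reclaim.
struct ZoneState {
  uint64_t write_pointer = 0;  // bytes written since the last reset
  uint64_t used_capacity = 0;  // bytes still referenced by live files
};

class ZoneTracker {
 public:
  explicit ZoneTracker(std::size_t num_zones) : zones_(num_zones) {}

  // An extent of `length` bytes was appended to zone `z`.
  void OnWrite(std::size_t z, uint64_t length) {
    zones_[z].write_pointer += length;
    zones_[z].used_capacity += length;
  }

  // A deleted file owned an extent of `length` bytes in zone `z`.
  // No data is moved; only the counter drops.
  void OnDelete(std::size_t z, uint64_t length) {
    assert(zones_[z].used_capacity >= length);
    zones_[z].used_capacity -= length;
  }

  // Resets every written zone that holds no live data and returns how many
  // were reset. A real implementation would issue the device reset here.
  std::size_t ResetEmptyZones() {
    std::size_t reset = 0;
    for (ZoneState& zs : zones_) {
      if (zs.write_pointer > 0 && zs.used_capacity == 0) {
        zs.write_pointer = 0;
        ++reset;
      }
    }
    return reset;
  }

  uint64_t used(std::size_t z) const { return zones_[z].used_capacity; }

 private:
  std::vector<ZoneState> zones_;
};
```

For example, after writing 100 and 50 bytes to zone 0 and deleting only the first file, `ResetEmptyZones()` resets nothing. Once the second file is deleted as well, zone 0 is reset.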

### Metadata

Metadata is stored in a rolling log in the first zones of the block device.

Each valid metadata zone contains:

* A superblock with the current sequence number and global file system metadata
* At least one snapshot of all files in the file system
* Incremental file system updates (new files, new extents, deletes, renames etc)
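Recovery from such a log follows from its layout: pick the metadata zone whose superblock has the newest sequence number, load its most recent snapshot, then replay the incremental updates written after it. The in-memory model below sketches that replay. It is not ZenFS's on-disk record format: `RecordType`, `Record`, `MetaZone`, `Recover` and the factory helpers are hypothetical, a file is reduced to a name and a size, and "highest sequence number wins" is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record kinds in one metadata zone.
enum class RecordType { kSnapshot, kCreate, kDelete, kRename };

struct Record {
  RecordType type;
  std::map<std::string, uint64_t> snapshot;  // kSnapshot: name -> size
  std::string name;                          // file the update applies to
  std::string new_name;                      // kRename: destination name
  uint64_t size;                             // kCreate: file size
};

Record Snap(std::map<std::string, uint64_t> files) {
  return {RecordType::kSnapshot, std::move(files), "", "", 0};
}
Record Create(std::string name, uint64_t size) {
  return {RecordType::kCreate, {}, std::move(name), "", size};
}
Record Delete(std::string name) {
  return {RecordType::kDelete, {}, std::move(name), "", 0};
}
Record Rename(std::string from, std::string to) {
  return {RecordType::kRename, {}, std::move(from), std::move(to), 0};
}

struct MetaZone {
  uint64_t sequence;            // from the superblock; higher is newer
  std::vector<Record> records;  // in write order
};

// Rebuilds the file table from the newest metadata zone.
std::map<std::string, uint64_t> Recover(const std::vector<MetaZone>& zones) {
  const MetaZone* newest = nullptr;
  for (const MetaZone& z : zones)
    if (!newest || z.sequence > newest->sequence) newest = &z;
  std::map<std::string, uint64_t> files;
  if (!newest) return files;
  for (const Record& r : newest->records) {
    switch (r.type) {
      case RecordType::kSnapshot: files = r.snapshot; break;  // start over
      case RecordType::kCreate: files[r.name] = r.size; break;
      case RecordType::kDelete: files.erase(r.name); break;
      case RecordType::kRename: {
        auto it = files.find(r.name);
        if (it == files.end()) break;
        const uint64_t size = it->second;  // copy before erasing
        files.erase(it);
        files[r.new_name] = size;
        break;
      }
    }
  }
  return files;
}
```

Because each valid zone begins with a full snapshot, recovery never needs to read older zones, which is what allows the log to roll over and reuse them.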

# Contribution Guide

ZenFS uses clang-format with Google code style. You may run the following commands
before submitting a PR.

```bash
clang-format-11 -n -Werror --style=file fs/* util/zenfs.cc # Check for style issues
clang-format-11 -i --style=file fs/* util/zenfs.cc # Auto-fix the style issues
```

Prophet-ZenFS should be built together with Prophet-RocksDB. The build steps are described in the [Prophet-RocksDB README](https://github.com/asu-idi/prophet-rocksdb/blob/main/README.md).