Using Data Deduplication and Compression with VDO on RHEL 7 and 8

Storage deduplication technology has been on the market for quite some time now. Unfortunately all of the implementations have been vendor specific proprietary software. With VDO, there is now an open source Linux native solution available.

Red hat has introduced VDO (Virtual Data Optimizer) in RHEL 7.5, a storage deduplication technology bough with Permabit in 2017. Of course it has been open sourced since then.

In contrast to ZFS which provides the same functionality on the file system level, VDO is an inline data reduction which works on block device level, it is file system agnostic.

Use cases

There are basically two major use cases: VM Storage and Object Storage Backends.

VM Storage

The main use case is storage for virtual machines where a lot of data is redundant, i.e. the base operating system of the VMs. This allows to deduplicate the data on disk on a large scale, think about 100 VMs where the operating system takes about 5Gbyte each will be reduced to approx. 5 Gbyte instead of 500 Gbyte.

Typically VM storage can be over committed by factor 10.

Object- and Block storage backends

As a backend for CEPH and Glusterfs, it is recommended to not over commit more than factor 3. The reason for the lower over commitment is that the storage administrator usually does not know what kind of data will be stored on it.

Availability

VDO is available since RHEL 7.5, it is included in the base subscription. At the moment it is not available for Fedora (yet).

The source code is available on github:

At the moment the Kernel code is not yet in the upstream Mainline Kernel, it is ongoing work to get it into the Mainstream Kernel.

Typical setup

Physical disk -> VDO -> Volumegroup -> Logical volume -> file system.

Block device can be a physical disk (or a partition on it), multi path device, LUKS disk, or a software RAID device (md or LVM RAID).

Restrictions

You can not use LVM cache, LVM snapshots and thin provisioned logical volumes on top of VDO. Theoretically you can use LUKS on top of VDO, but it makes no sense because there is nothing to deduplicate. Needless to say that VDO on top of a VDO device does not make any sense as well. Be aware that you can not make use of partitioning or (LVM) Raid on top of VDO devices, all that things should be done in the underlying layer of VDO.

When using SAN, check if your storage box already does deduplication. In this case VDO is useless for you.

Installation

Its straight forward:

[root@vdotest ~]# yum -y install vdo kmod-kvdo

Create the VDO volume

In this test case, I attached a 110Gbyte disk, created a 100 GByte partition and will over commit it by factor 10.

Warning! As of writing this article, never use a whole physical disk, use a partition instead and leave some spare space in the disk to avoid data loss! (see further below)

[root@vdotest ~]# vdo create --name=vdo1 --device=/dev/vdb1 --vdoLogicalSize=1T

Creating volume group, logical volume and file system on top of the VDO volume

[root@vdotest ~]# pvcreate /dev/mapper/vdo1
[root@vdotest ~]# vgcreate vg_vdo /dev/mapper/vdo1
[root@vdotest ~]# lvcreate -n lv_vdo vg_vdo -L 900G
[root@vdotest ~]# mkfs.xfs -K /dev/vg_vdo/lv_vdo
[root@vdotest ~]# echo "/dev/mapper/vg_vdo-lv_vdo       /mnt    xfs     defaults,x-systemd.requires=vdo.service 0 0" >> /etc/fstab

Display the whole stack

[root@vdotest ~]# lsblk /dev/vdb
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb                 252:16   0  110G  0 disk 
└─vdb1              252:17   0  100G  0 part 
  └─vdo1            253:7    0    1T  0 vdo  
    └─vg_vdo-lv_vdo 253:8    0  900G  0 lvm  /mnt
[root@vdotest ~]#

Populate the disk with data

The ideal test for VDO is to put some real-life VM-Images to the file system on top of it. In this case I scp’ed three IPA server and some instances to that file system. This kind of systems are all quite similar, the disk space saved is tremendous. The total size of the vm images is 105G

Lets have a look:

[root@vdotest ~]# df -h /mnt
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg_vdo-lv_vdo  900G  105G  800G  12% /mnt
[root@vdotest ~]# 
[root@vdotest ~]# ll -h /mnt
total 101G
-rw-------. 1 root root 21G Dec 17 09:41 ipa1.lab.delouw.ch.qcow2
-rw-------. 1 root root 21G Dec 17 09:47 ipa1.ldelouw.ch
-rw-------. 1 root root 21G Dec 17 09:54 ipa2.ldelouw.ch
-rw-r--r--. 1 root root 21G Dec 17 09:58 ipaclient-rhel6.home.delouw.ch
-rw-------. 1 root root 21G Dec 17 10:03 ipatest.delouw.ch.qcow2
[root@vdotest ~]#

Lets use the vdostats utility to display the actually used storage on disk:

[root@vdotest ~]# vdostats --si
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo1        107.4G     15.2G     92.2G  14%           89%
[root@vdotest ~]#

Performance Tuning

There are a lot of parameters that can be changed. Unfortunately the documentation available at the moment is rudimentary, thus its more a guesswork than facts.

  • Number of worker threads of different kind
  • Enable or Disable compression

On machines with a lot of CPUs using more threads than the defaults can dramatically boost performance. man 8 vdo gives a glimpse of the different parameters related to threads.

Compression is a quite expensive operation. On top of that, depending on the kind of data you are storing, it does not make much sense to use compression (Well, deduplication is kind of compression as well).

Pitfalls

Be aware! With every storage deduplication solution there comes a big pitfall: The logical volume on top of VDO shows free disk space while the actual disk space on the physical disk can be (almost) exhausted. You need to carefully monitor the actual disk usage.

The fill grade can rapidly change if the data to be stored contains a lot of non-deduplicatable and/or compressible data. A good example is virtual machine images containing a LUKS encrypted disk, In such a case, use LUKS on the storage, not on the VM level.

Even if you update one virtual machine, the delta to other machine images will grow and less physical space is available.

VDO comes with a few Nagios plugins which are very useful for alerting administrators in the cause the available physical disk is filling up. They are located in /usr/share/doc/vdo/examples/nagios

According to df -h, on my test system there is still 800 Gbyte available. What happens if I store my 700 Gbyte Satellite 6 image? The data is mostly RPMs which are already compressed quite well. Lets see….

After a transfer of approx 155 Gbyte, the physical disk got full and the file system is inaccessible. I was hitting the worst case that can happen: Complete and unrecoverable data loss.

The df command shows some 241 Gbyte free.

[root@vdotest ~]# df -h |grep mnt
/dev/mapper/vg_vdo-lv_vdo                900G  241G  660G  27% /mnt
[root@vdotest ~]#

The vdostat command tells a different story, like expected.

[root@vdotest ~]# vdostats --si
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo1        107.4G    107.4G      0.0B 100%           59%
[root@vdotest ~]# 

When attempting to access the data, there will be an I/O error.

[root@vdotest ~]# ll -h /mnt
ls: cannot access /mnt: Input/output error
[root@vdotest ~]# 

Thats bad. I mean really bad. The device is not accessible anymore.

xfs_repair does not work. Do not attempt to make use of the -L option! Your file system will be gone.

Recovering from a full physical disk

Lets resize the partition instead. First unmount the file system

[root@vdotest ~]# umount /mnt

Delete and recreate the partition using fdisk

[root@vdotest ~]# fdisk /dev/vdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Selected partition 1
Partition 1 is deleted

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): 
Using default response p
Partition number (1-4, default 1): 
First sector (2048-41943039, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): 
Using default value 41943039
Partition 1 of type Linux and of size 20 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[root@vdotest ~]# partprobe
[root@vdotest ~]#
[root@vdotest ~]# vdo growPhysical -n vdo1

Run a file system check.

Now you are able to mount the file system again and your data is available again.

Documentation

Red Hat maintains a nice documentation about storage administration, VDO is covered by an own chapter. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/storage_administration_guide/#vdo

Conclusion

The technology is very interesting and will kick some ass. Storage deduplication will be more and more important, with VDO there is now a Linux native solution for that.

At the moment it is quite dangerous to use VDO in production. Filling up a physical disk without spare space is an unrecoverable error, a complete data loss. That means: Always create the VDO device on top of a partition that is not using the whole disk or another device that can grow in size to prevent data loss.

If you plan to use VDO in production make sure you have a proper monitoring in place that alerts quite ahead of time to be able to take corrective action.

Nevertheless: Its cool stuff and I’m sure the current situation will be fixed soon.

Leave a Reply

Your email address will not be published. Required fields are marked *