Storage deduplication technology has been on the market for quite some time now. Unfortunately, all of the implementations have been vendor-specific proprietary software. With VDO, there is now an open-source, Linux-native solution available.
Red Hat introduced VDO (Virtual Data Optimizer) in RHEL 7.5, a storage deduplication technology acquired with Permabit in 2017. Of course it has been open-sourced since then.
In contrast to ZFS, which provides the same functionality at the file system level, VDO performs inline data reduction at the block device level and is therefore file system agnostic.
Use cases
There are basically two major use cases: VM Storage and Object Storage Backends.
VM Storage
The main use case is storage for virtual machines, where a lot of data is redundant, i.e. the base operating system of the VMs. This allows deduplicating the data on disk at a large scale: think of 100 VMs whose operating systems take about 5 GByte each; instead of 500 GByte, they will be reduced to approximately 5 GByte on disk.
Typically, VM storage can be over-committed by a factor of 10.
Object- and Block storage backends
As a backend for Ceph and GlusterFS, it is recommended not to over-commit by more than a factor of 3. The reason for the lower over-commitment is that the storage administrator usually does not know what kind of data will be stored on it.
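The over-commit factor translates directly into the logical size you pass to vdo create. A rough sketch, with device paths and volume names chosen purely as examples:

# VM storage on a ~1 TByte device: over-commit by a factor of 10
vdo create --name=vdo_vms --device=/dev/sdb1 --vdoLogicalSize=10T

# Ceph/GlusterFS backend on a ~1 TByte device: over-commit by a factor of 3 at most
vdo create --name=vdo_objects --device=/dev/sdc1 --vdoLogicalSize=3T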
Availability
VDO has been available since RHEL 7.5 and is included in the base subscription. At the moment it is not yet available for Fedora.
The source code is available on GitHub.
At the moment the kernel code is not yet in the upstream mainline kernel; work to get it merged is ongoing.
Typical setup
Physical disk -> VDO -> volume group -> logical volume -> file system.
The underlying block device can be a physical disk (or a partition on it), a multipath device, a LUKS device, or a software RAID device (md or LVM RAID).
Restrictions
You cannot use LVM cache, LVM snapshots, or thin-provisioned logical volumes on top of VDO. Theoretically you can use LUKS on top of VDO, but it makes no sense because encrypted data leaves nothing to deduplicate. Needless to say, VDO on top of another VDO device does not make sense either. Be aware that you cannot use partitioning or (LVM) RAID on top of VDO devices; all of that needs to be done in the layers underneath VDO.
When using a SAN, check whether your storage array already does deduplication. In that case VDO is useless for you.
Installation
It is straightforward:
[root@vdotest ~]# yum -y install vdo kmod-kvdo
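To verify that the packages and the kernel module are in place, a quick check along these lines should work (module names as shipped with kmod-kvdo):

[root@vdotest ~]# rpm -q vdo kmod-kvdo
[root@vdotest ~]# modprobe kvdo
[root@vdotest ~]# lsmod | grep -E 'kvdo|uds'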
Create the VDO volume
In this test case, I attached a 110 GByte disk, created a 100 GByte partition on it, and will over-commit it by a factor of 10.
Warning! As of writing this article, never use a whole physical disk; use a partition instead and leave some spare space on the disk to avoid data loss! (see further below)
[root@vdotest ~]# vdo create --name=vdo1 --device=/dev/vdb1 --vdoLogicalSize=1T
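To double-check that the volume came up as intended, the vdo manager can list and inspect it (a quick sanity check, output omitted here):

[root@vdotest ~]# vdo list
[root@vdotest ~]# vdo status --name=vdo1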
Creating volume group, logical volume and file system on top of the VDO volume
[root@vdotest ~]# pvcreate /dev/mapper/vdo1
[root@vdotest ~]# vgcreate vg_vdo /dev/mapper/vdo1
[root@vdotest ~]# lvcreate -n lv_vdo vg_vdo -L 900G
[root@vdotest ~]# mkfs.xfs -K /dev/vg_vdo/lv_vdo
[root@vdotest ~]# echo "/dev/mapper/vg_vdo-lv_vdo /mnt xfs defaults,x-systemd.requires=vdo.service 0 0" >> /etc/fstab
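Two details worth noting: the -K option tells mkfs.xfs not to discard (TRIM) blocks at creation time, which would otherwise take a long time on such a large thin device, and the x-systemd.requires=vdo.service mount option makes sure the VDO volume is started before systemd tries to mount the file system.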
Display the whole stack
[root@vdotest ~]# lsblk /dev/vdb
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb                 252:16   0  110G  0 disk
└─vdb1              252:17   0  100G  0 part
  └─vdo1            253:7    0    1T  0 vdo
    └─vg_vdo-lv_vdo 253:8    0  900G  0 lvm  /mnt
[root@vdotest ~]#
Populate the disk with data
The ideal test for VDO is to put some real-life VM images on the file system on top of it. In this case I scp'ed three IPA servers and some other instances to that file system. These kinds of systems are all quite similar, so the disk space saved is tremendous. The total size of the VM images is 105 GByte.
Let's have a look:
[root@vdotest ~]# df -h /mnt
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg_vdo-lv_vdo  900G  105G  800G  12% /mnt
[root@vdotest ~]#
[root@vdotest ~]# ll -h /mnt
total 101G
-rw-------. 1 root root 21G Dec 17 09:41 ipa1.lab.delouw.ch.qcow2
-rw-------. 1 root root 21G Dec 17 09:47 ipa1.ldelouw.ch
-rw-------. 1 root root 21G Dec 17 09:54 ipa2.ldelouw.ch
-rw-r--r--. 1 root root 21G Dec 17 09:58 ipaclient-rhel6.home.delouw.ch
-rw-------. 1 root root 21G Dec 17 10:03 ipatest.delouw.ch.qcow2
[root@vdotest ~]#
Let's use the vdostats utility to display the storage actually used on disk:
[root@vdotest ~]# vdostats --si
Device            Size    Used   Available  Use%  Space saving%
/dev/mapper/vdo1  107.4G  15.2G      92.2G   14%            89%
[root@vdotest ~]#
Performance Tuning
There are a lot of parameters that can be changed. Unfortunately the documentation available at the moment is rudimentary, so tuning is more guesswork than following facts.
- Number of worker threads of different kinds
- Enabling or disabling compression
On machines with a lot of CPUs, using more threads than the defaults can dramatically boost performance. man 8 vdo gives a glimpse of the different thread-related parameters.
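As a sketch of what thread tuning could look like, with option names as documented in man 8 vdo (the exact set and defaults may differ between releases): thread counts can be set at creation time or changed later with vdo modify, and modified settings take effect the next time the volume is started.

[root@vdotest ~]# vdo create --name=vdo1 --device=/dev/vdb1 --vdoLogicalSize=1T \
    --vdoLogicalThreads=4 --vdoPhysicalThreads=4 --vdoCpuThreads=6
[root@vdotest ~]# vdo modify --name=vdo1 --vdoCpuThreads=8
[root@vdotest ~]# vdo stop --name=vdo1 && vdo start --name=vdo1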
Compression is quite an expensive operation. On top of that, depending on the kind of data you are storing, it may not make much sense to use compression (well, deduplication is a kind of compression as well).
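Compression can be switched off per volume, either at creation time or on an existing volume. A sketch based on the subcommands documented in man 8 vdo (check your release for the exact names):

[root@vdotest ~]# vdo create --name=vdo1 --device=/dev/vdb1 --vdoLogicalSize=1T --compression=disabled
[root@vdotest ~]# vdo disableCompression --name=vdo1
[root@vdotest ~]# vdo enableCompression --name=vdo1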
Pitfalls
Be aware! With every storage deduplication solution there comes a big pitfall: The logical volume on top of VDO shows free disk space while the actual disk space on the physical disk can be (almost) exhausted. You need to carefully monitor the actual disk usage.
The fill grade can change rapidly if the data to be stored contains a lot of data that cannot be deduplicated or compressed. A good example is virtual machine images containing a LUKS-encrypted disk; in such a case, use LUKS on the storage level, not inside the VM.
Even if you just update one virtual machine, the delta to the other machine images grows and less physical space is available.
VDO comes with a few Nagios plugins which are very useful for alerting administrators in case the physical disk is filling up. They are located in /usr/share/doc/vdo/examples/nagios.
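If you do not use Nagios, even a small cron job that parses vdostats output can help. A minimal sketch, assuming the default vdostats column layout and an 80% threshold (wire the output into your own alerting):

#!/bin/bash
# Warn when any VDO device is more than 80% full on the physical layer.
# Column 5 of the default vdostats output is "Use%"; int() strips the percent sign.
vdostats | awk 'NR > 1 && int($5) >= 80 {print "WARNING: " $1 " is " $5 " full"}'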
According to df -h, on my test system there are still 800 GByte available. What happens if I store my 700 GByte Satellite 6 image? The data is mostly RPMs, which are already compressed quite well. Let's see…
After a transfer of approximately 155 GByte, the physical disk filled up and the file system became inaccessible. I was hitting the worst case that can happen: complete and unrecoverable data loss.
The df command still shows 660 GByte of free space.
[root@vdotest ~]# df -h |grep mnt
/dev/mapper/vg_vdo-lv_vdo  900G  241G  660G  27% /mnt
[root@vdotest ~]#
The vdostats command tells a different story, as expected.
[root@vdotest ~]# vdostats --si
Device            Size    Used    Available  Use%  Space saving%
/dev/mapper/vdo1  107.4G  107.4G       0.0B  100%            59%
[root@vdotest ~]#
When attempting to access the data, there will be an I/O error.
[root@vdotest ~]# ll -h /mnt
ls: cannot access /mnt: Input/output error
[root@vdotest ~]#
That’s bad. I mean really bad. The device is not accessible anymore.
xfs_repair does not work at this point. Do not attempt to use the -L option! Your file system will be gone.
Recovering from a full physical disk
Let’s resize the partition instead. First, unmount the file system
[root@vdotest ~]# umount /mnt
Delete and recreate the partition using fdisk
[root@vdotest ~]# fdisk /dev/vdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): d
Selected partition 1
Partition 1 is deleted

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p):
Using default response p
Partition number (1-4, default 1):
First sector (2048-41943039, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039):
Using default value 41943039
Partition 1 of type Linux and of size 20 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[root@vdotest ~]# partprobe
[root@vdotest ~]#
Then let the VDO volume grow into the newly available physical space:
[root@vdotest ~]# vdo growPhysical -n vdo1
Run a file system check.
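A minimal sketch of that check, assuming the XFS file system created above (again: without the -L option):

[root@vdotest ~]# xfs_repair /dev/mapper/vg_vdo-lv_vdo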
Now you are able to mount the file system again, and your data is available once more.
Documentation
Red Hat maintains good documentation about storage administration; VDO is covered in its own chapter: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/storage_administration_guide/#vdo
Conclusion
The technology is very interesting and will kick some ass. Storage deduplication will become more and more important, and with VDO there is now a Linux-native solution for it.
At the moment it is quite dangerous to use VDO in production. Filling up the physical disk without spare space is an unrecoverable error and means complete data loss. That means: always create the VDO device on top of a partition that does not use the whole disk, or on another device that can grow in size, to prevent data loss.
If you plan to use VDO in production, make sure you have proper monitoring in place that alerts well ahead of time so you can take corrective action.
Nevertheless: it's cool stuff and I'm sure the current situation will be fixed soon.