Leveraging Network-Bound Disk Encryption at Enterprise Scale

Tang and Clevis

Network-Bound Disk Encryption (NBDE) makes LUKS scale by automating disk unlocking on system startup.

Why should you encrypt disks? If you don't want to see your corporate and private data leaked, you should do so as an additional security measure.

Use cases

There are basically two use cases for disk encryption. The first is to prevent data leaks when a device gets stolen or lost (mobile computers, unsecured server rooms, etc.). Theft of devices is usually not a threat to enterprise-grade data centers with physical security.

The second use case applies to exactly these enterprise-grade data centers: at some point, disks get disposed of, either because they are defective or because they are technologically outdated. That means a data leak is possible at the end of a disk's life cycle. A defective disk cannot be wiped at all, and for someone with deep pockets there is still a chance to at least partially access the data. Wiping a six TiB disk takes many hours just to overwrite it with zeros, let alone with random data. An encrypted disk without a passphrase set can simply be disposed of, without considering whether it needs to be wiped or physically destroyed.

Note: Disk encryption does not protect you from data theft by a person who has access to the data, nor does it help against misbehaving software.

As you can imagine, it is a good idea to encrypt your storage. The standard for disk encryption in Linux is LUKS (Linux Unified Key Setup).

Adding Tang and Clevis for scaling

Unfortunately, plain LUKS does not scale at all, because the passphrase must be entered manually on system startup, a no-go for data center operations. Tang and Clevis add the scaling factor to the game.

Tang is the server component; Clevis and luksmeta form the client component. The secret itself is stored on the client; the client asks the server for the data needed to decrypt the key stored in the LUKS metadata. For more information on the crypto algorithms used, please see the slide deck “Tang and Clevis” by Fraser Tweedale.
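
To get a feeling for the mechanism before touching any disk, you can encrypt and decrypt an arbitrary secret against a Tang server. A minimal sketch, reusing the Tang server and thumbprint from the Kickstart example further below:

[root@luksclient ~]# echo "top secret" | clevis encrypt tang '{"url":"http://tang1.example.com","thp":"vkaGTzcBNEeF_X5KX-w9754Gl80"}' > secret.jwe
[root@luksclient ~]# clevis decrypt tang < secret.jwe
top secret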

Availability and support

Tang and Clevis were added to RHEL 7.4 and are supported. The packages tang-nagios and clevis-udisk2 are in technology preview and are not supported. The packages are included in the base subscription.

Both are available in Fedora as well.

Set up the Tang servers

Setting up a Tang server is straightforward. For redundancy, please set up at least two Tang servers; a maximum of seven Tang servers is supported by the client, which corresponds to the number of LUKS key slots (eight) minus the one used for the initial passphrase.

[root@tang1 ~]# yum -y install tang
[root@tang1 ~]# systemctl enable --now tangd.socket
[root@tang1 ~]# jose jwk gen -i '{"alg":"ES512"}' -o /var/db/tang/new_sig.jwk
[root@tang1 ~]# jose jwk gen -i '{"alg":"ECMR"}' -o /var/db/tang/new_exc.jwk

Display the thumbprint to be added to the Kickstart file later on.

[root@tang1 ~]# jose jwk thp -i /var/db/tang/new_sig.jwk
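
Tang also supports key rotation later on: generate fresh keys and hide the old ones by prefixing their file names with a dot. Hidden keys are no longer advertised to new clients but keep serving already-enrolled ones. A sketch, based on my understanding of the upstream documentation:

[root@tang1 ~]# cd /var/db/tang
[root@tang1 tang]# jose jwk gen -i '{"alg":"ES512"}' -o rotated_sig.jwk
[root@tang1 tang]# jose jwk gen -i '{"alg":"ECMR"}' -o rotated_exc.jwk
[root@tang1 tang]# mv new_sig.jwk .new_sig.jwk
[root@tang1 tang]# mv new_exc.jwk .new_exc.jwk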

Automated client setup during Kickstart

Be aware that you can run into problems when re-provisioning a system that contains old LUKS keys. You probably want to wipe them. In the following setup, all the slots are located on the second partition.

# Wipe LUKS keys on the second partition of disk vda
%pre
cryptsetup isLuks /dev/vda2  && dd if=/dev/zero of=/dev/vda2 bs=512 count=2097152
%end

part /boot      --fstype ext2 --size=512 --ondisk=vda
part pv.0       --size=1 --grow --ondisk=vda --encrypted --passphrase=dummy-master-pass

volgroup vg_luksclient pv.0

logvol /        --name=lv_root    --vgname=vg_luksclient --size=4096
logvol /home    --name=lv_home    --vgname=vg_luksclient --size=512 --fsoption=nosuid,nodev
logvol /tmp     --name=lv_tmp    --vgname=vg_luksclient --size=512 --fsoption=nosuid,nodev,noexec
logvol /var     --name=lv_var    --vgname=vg_luksclient --size=2048 --fsoption=nosuid,nodev
logvol /var/log --name=lv_var_log --vgname=vg_luksclient --size=2048 --fsoption=nosuid,nodev
logvol swap     --fstype swap --name=lv_swap    --vgname=vg_luksclient --size=4096

Be aware that the Kickstart file is transferred in clear text, which means this dummy-master-pass is exposed; it should therefore be removed automatically. You can add a real master key in a secure way after the installation, with Ansible, Puppet, or simply manually via SSH.
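
A minimal sketch of such a manual rotation, assuming the real key has been transferred securely to /root/master.key and the dummy passphrase sits in /root/dummy.key (both hypothetical paths):

[root@luksclient ~]# cryptsetup luksAddKey --key-file /root/dummy.key /dev/vda2 /root/master.key
[root@luksclient ~]# cryptsetup luksRemoveKey /dev/vda2 /root/dummy.key
[root@luksclient ~]# shred -u /root/dummy.key /root/master.key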

Ensure you have the clevis-dracut package installed so that the init ramdisk will get created in the right way.

%packages
clevis-dracut
%end

In the %post section of the Kickstart file, add the following to register your system to the Tang servers.

%post
clevis bind luks -f -k- -d /dev/vda2 tang '{"url":"http://tang1.example.com","thp":"vkaGTzcBNEeF_X5KX-w9754Gl80"}' <<< "dummy-master-pass"
clevis bind luks -f -k- -d /dev/vda2 tang '{"url":"http://tang2.example.com","thp":"x_KcDG92bVP3SUL9KOzmzps4sZg"}' <<< "dummy-master-pass"
%end

In case you want to remove the master password, put the following line into your %post section of the Kickstart file:

%post
cryptsetup luksRemoveKey /dev/vda2 - <<<"dummy-master-pass"
%end

Usage of a passphrase

There are pros and cons to doing so. On one hand, if all Tang servers are unavailable and no master password is set, there is no chance at all to access the data. On the other hand, a master password can be leaked, so it should be changed from time to time, and that needs to be automated (e.g. with Ansible) to scale.

I personally tend to use a master password. Choose wisely, depending on your specific use case, whether to set one or not.

Good to know

Be aware that the password prompt will always show up on system startup. It disappears automatically after a few seconds once a Tang server has been reached.

Documentation

The following documents will help you get a better idea of the Tang/Clevis setup:

A nice presentation from a conference is available here: https://www.usenix.org/conference/lisa16/conference-program/presentation/atkisson

Another more technical presentation is available here: http://redhat.slides.com/npmccallum/sad#/

Important commands

There are a few LUKS and clevis related commands you should know about.

cryptsetup

Cryptsetup is used to handle the LUKS key slots, i.e. to add and remove passphrases. More information is available in man 8 cryptsetup.
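
For example, to see which key slots are occupied according to the LUKS header itself (sample output matching the example client used below):

[root@luksclient ~]# cryptsetup luksDump /dev/vda2 | grep "Key Slot"
Key Slot 0: ENABLED
Key Slot 1: ENABLED
Key Slot 2: ENABLED
Key Slot 3: ENABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED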

luksmeta

luksmeta gives you access to the LUKS metadata, e.g. showing which slots are in use:

[root@luksclient ~]# luksmeta show -d /dev/vda2 
0   active empty
1   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
2   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
3   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
4 inactive empty
5 inactive empty
6 inactive empty
7 inactive empty
[root@luksclient ~]#

The following command reads the metadata and writes the encrypted content to the file meta:

luksmeta load -d /dev/vda2 -s 1  > meta

It looks like this:

eyJhbGciOiJFQ0RILUVTIiwiY2xldmlzIjp7InBpbiI6InRhbmciLCJ0YW5nIjp7ImFkdiI6eyJrZXlzIjpbeyJhbGciOiJFUzUxMiIsImNydiI6IlAtNTIxIiwia2V5X29wcyI6WyJ2ZXJpZnkiXSwia3R5IjoiRUMiLCJ4IjoiQVdCeFZSYk9MOXBYNjhRU0lqSEZyNzVuNUVXdDZGblkySmNaNVgxX0s4MldaNW9kMUNQTUJwQ0dsS1ZFZ29LOFQwMERPazFsMHJRQ2kyOEg4SDBsVXlfaCIsInkiOiJBTzNLdmsyc2pqYVpSM3RrbW5KcVQyWGYtd1lnbXZSa0JqNUpmNFgzWmtHTDRHbTYtbE5qemhzVEdraEZLRmZUdnJLUElDTHBSQndCTnNXc0JuZUlVTEViIn0seyJhbGciOiJFQ01SIiwiY3J2IjoiUC01MjEiLCJrZXlfb3BzIjpbImRlcml2ZUtleSJdLCJrdHkiOiJFQyIsIngiOiJBTm05WmQwUDFyT1F6MXhhQVFNTzJxRjRua3ZHMVpKS2VHNkFaWjdPTEo1ejhKS000N0otMUhZWnkyZk0zT29ZQVdiUndQdnJ6aUt4MFJmNWh0QlkzNXBxIiwieSI6IkFTdVZYR3JRQ0c4R3dKTENXbWpVbC1jN0llUUh4TC01cFRGYTJaOU1ESnU4Ym9JZFo3WlNiZHBHZUFWMnhMTzlCTnlqbE5zSzB2ZWJrR3ZDcmU5bDl0aFYifSx7ImFsZyI6IkVTNTEyIiwiY3J2IjoiUC01MjEiLCJrZXlfb3BzIjpbInZlcmlmeSJdLCJrdHkiOiJFQyIsIngiOiJBT01Yc2JqcGd3MUVwMEdmSHd2NFRGTzFhWlNxZlY2NWlURVpWWDc4Y3M1SkI2dlBHaUZwd2RiZnpVWlpzR2FCZVludXdzTHJ0UTZTYm0zWHVTdGNHTFlFIiwieSI6IkFSZUZMckNHZnB6S2tzaTZvVXdLdEZjOW9IbHVHdDJjd3AwNmR1M3dEUGgta0t5N3RfTmZEU1JOSXRuWkZIbWs3eVlwSnkxYlpiRDRUTklTcXFIZDlDbDIifV19LCJ1cmwiOiJodHRwOi8vdGFuZzEuaG9tZS5kZWxvdXcuY2gifX0sImVuYyI6IkEyNTZHQ00iLCJlcGsiOnsiY3J2IjoiUC01MjEiLCJrdHkiOiJFQyIsIngiOiJBQnlMWjZmcWJKVVdzalZVc1ZjN0hwWlhLQ1BIZjJWenhyTExkODdvajBERnhGeTJRUTJHSXNEbFJ6OTg0cmtkNDJVQ3pDVy1VcGE4bG9nTl9BT0hsU0syIiwieSI6IkFBcHJaLUxFMUk3NUxWMXZtTHhkYUl0TmlETnpjUUVpLXJsR1FwVjFnT2IwWU5rbDFyWVgxdU45OE9WcHdiWUowTEpYYnYtRGZnSjU2RjBPMkNFczJOck4ifSwia2lkIjoicTAzQXd4VG5sU3lpQjRnelBTYTBfcXhsVzU4In0..Ws5k2fgQ26yN-mMv.1NwlYoyaUmF5X0jqGDcKO3HWn02StXotqnjZKaZtSUXioyW0-rc8HxH6HkkJTMQJk_EXr8ZXB4hmTXfUqAtRqgpEW4SdzJIw_AsGbJm5h_8lQLPIF4o.fbbNxK51MC14hX46Dgkj6Q

You can decrypt it:

[root@luksclient ~]# clevis decrypt tang < meta 
OTQy6NGfqTjppwIrrM4cc15zr-sxy5PPmKExHul1m-pcMjEHjGdoN5uqD9vcEiuMM56VapPV_LedXYEkktYO-g[root@luksclient ~]#

OTQy6NGfqTjppwIrrM4cc15zr-sxy5PPmKExHul1m-pcMjEHjGdoN5uqD9vcEiuMM56VapPV_LedXYEkktYO-g is the cleartext passphrase returned. It can actually be typed in at the console; I recommend a serial console where you can copy-paste 😉

If you run the same command again when both Tang servers are down, you will get an error:

[root@luksclient ~]# clevis decrypt tang < meta
Error communicating with the server!
[root@luksclient ~]#

As you can see, you don’t need to provide a Tang Server URL.

lsblk

Lsblk is a nice little tool which shows the available storage in a tree. You can see the different layers of the storage subsystem.

[root@luksclient ~]# lsblk 
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
vda                                           252:0    0   20G  0 disk  
├─vda1                                        252:1    0  512M  0 part  /boot
└─vda2                                        252:2    0 19.5G  0 part  
  └─luks-f0a70f08-b745-429f-ba8e-ec07e8953c3d 253:0    0 19.5G  0 crypt 
    ├─vg_luksclient-lv_root                   253:1    0    4G  0 lvm   /
    ├─vg_luksclient-lv_swap                   253:2    0    4G  0 lvm   [SWAP]
    ├─vg_luksclient-lv_var_log                253:3    0    2G  0 lvm   /var/log
    ├─vg_luksclient-lv_var                    253:4    0    2G  0 lvm   /var
    ├─vg_luksclient-lv_tmp                    253:5    0  512M  0 lvm   /tmp
    └─vg_luksclient-lv_home                   253:6    0  512M  0 lvm   /home
[root@luksclient ~]# 

json_reformat

If you want to play with JSON, install the package yajl.

With json_reformat you can minify JSON, and you are in fact required to do so: clevis encrypt sss does not allow spaces and fails otherwise.

Let's reformat this:

[root@luksclient ~]# echo '{"t": 1,"pins": {"tang": [{"url": "http://tang1.example.com"}, {"url": "http://tang2.example.com"}]}}'|json_reformat -m && echo ""
{"t":1,"pins":{"tang":[{"url":"http://tang1.example.com"},{"url":"http://tang2.example.com"}]}}
[root@localhost ~]# 
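
The minified output can then be fed to the sss pin, which with t=1 requires any one of the two Tang servers to be reachable. A sketch, reusing the dummy passphrase from the Kickstart example above:

[root@luksclient ~]# cfg=$(echo '{"t": 1,"pins": {"tang": [{"url": "http://tang1.example.com"}, {"url": "http://tang2.example.com"}]}}' | json_reformat -m)
[root@luksclient ~]# clevis bind luks -f -k- -d /dev/vda2 sss "$cfg" <<< "dummy-master-pass"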

How to figure out which Tang servers the client is enrolled with

I was curious how Clevis figures out which Tang server to connect to. There is nothing written to the initrd, so it must be stored somewhere in the LUKS metadata. It took me some time to figure out how it works.

Just decode the meta data to JSON:

 luksmeta load -d /dev/vda2 -s 1|jose b64 dec -i- |json_reformat 

Unfortunately, the JSON seems to be invalid; json_reformat stops with parse error: premature EOF. The reason is that the stored token is a JWE in compact serialization: several base64url-encoded segments separated by dots, and only the first segment (the protected header, which carries the Tang URL) decodes to JSON. You will still see the URL in the output.
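
To decode just that header cleanly, a sketch:

 luksmeta load -d /dev/vda2 -s 1 | cut -d. -f1 | jose b64 dec -i- | json_reformat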

Test scenarios

I made a few tests to figure out how Tang and Clevis behave when something goes south.

Tang server(s) not available during system installation

If only one Tang server is available, the installation works, but the client gets enrolled with only that one Tang server. It must be enrolled with the second Tang server manually after that server comes back up.

If both servers are down during installation, the installation still finishes successfully; the temporary passphrase stays active, as LUKS denies removing the last available passphrase. Of course, no Clevis metadata gets written. You can enroll the servers manually after one or both Tang servers come back online. Remember to remove the temporary passphrase afterwards.

Tang Server(s) not available during reboot

If one Tang server is not available, the other one is used, no impact.

If both servers are down, Plymouth asks for the LUKS passphrase. If you removed the passphrase, you will not be able to boot the server. After starting one or both Tang servers, the boot continues.

Drawbacks

Tang and Clevis are both very young projects and not yet mature. I’ve figured out the following drawbacks:

Missing Registry

At the moment there is no way to report which clients are enrolled with which Tang server. This makes it hard to check from a central point whether a client is really enrolled with two (or more) Tang servers, to ensure smooth operation in case a Tang server fails.

This is particularly true if one (or more) Tang server is down at install time of the client system. As a workaround, set up a monitoring script that checks whether there are two active slots, e.g.:

if [ $(luksmeta show -d /dev/vda2 | grep " active" | grep -v empty | wc -l) -ne 2 ]; then
        echo "Something is wrong with the LUKS metadata, please check" | mail -s "LUKS metadata failure" monitoring@example.com
fi

Logging

Logging of Tang requests is very basic at the moment; some improvement is needed here as well. Again, the documentation for the return codes is lacking.

Scalability

When using more than one Tang server, the one defined in the first slot will always be accessed; there is no round-robin or similar load-balancing method. This means the sequence of Tang servers must be shuffled on the client, which involves some logic in the Kickstart file, as shown in the sketch below.
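
A possible workaround for the %post section, as a sketch: assuming the installer environment runs bash 4 with GNU coreutils' shuf, and reusing the thumbprints from above, the binding order can be randomized per client.

%post
# Hypothetical: map each Tang server to its advertised thumbprint,
# then bind in random order so the first slot differs per client.
declare -A thp
thp[http://tang1.example.com]=vkaGTzcBNEeF_X5KX-w9754Gl80
thp[http://tang2.example.com]=x_KcDG92bVP3SUL9KOzmzps4sZg
for url in $(shuf -e "${!thp[@]}"); do
    clevis bind luks -f -k- -d /dev/vda2 tang "{\"url\":\"$url\",\"thp\":\"${thp[$url]}\"}" <<< "dummy-master-pass"
done
%end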

One Tang server should be able to handle more than 2k requests per second, so the problem only kicks in for very large environments where more than 2000 servers are booting (or getting installed) at the same time.

Maturity

It's a brand new project using completely new ideas and methods. At the moment there is not much field experience, an issue that will be solved over time.

Documentation

There is almost no documentation that goes beyond a few lines showing how to set up the server and the client. What's missing is how to troubleshoot the environment. Another missing part is how to handle key rotation; it's unclear to me if and what has to be done on the client.

Easy-to-read documentation is important, in particular for Tang and Clevis, which rely on new and rather hard-core cryptographic mathematics.

Conclusion

Both client and server have a very small footprint and perform well. The idea behind Tang and Clevis is brilliant, and a first incarnation is ready to use. Due to the drawbacks mentioned above, I think it is not yet ready for production, and it will take a while until it is.

Due to the nature of the project, stability and reliability are key points; that is why people should test it and provide feedback.

I would like to thank the involved engineers, cool stuff.

Have fun:-)

Building a virtual CEPH storage cluster

This post will guide you through the procedure to build a testbed on RHEL7 for a complete CEPH cluster. At the end you will have an admin server, one monitoring node, and three storage nodes. CEPH is an object and block storage, mostly used for virtual machine images and bulk BLOBs such as video and other media. It is not intended to be used as file storage (yet).

Machine set up
I've set up five virtual machines: one admin server, one monitoring server, and three OSD servers.

  • ceph-admin.example.com
  • ceph-mon01.example.com
  • ceph-osd01.example.com
  • ceph-osd02.example.com
  • ceph-osd03.example.com

Each of them has a 10GB disk for the OS; the OSD servers each have three additional 10GB disks for the storage, 90GB in total. Each virtual machine got 1GB RAM assigned, which is barely good enough for some first tests.

Configure your network
It is recommended to have two separate networks, one public and one for the cluster interconnect (heartbeat, replication, etc.). However, for this testbed only one network is used.

While it is recommended practice to configure your servers with the fully qualified domain name (FQDN), you must also configure the short hostname for CEPH.

Check if this is working as needed:

[root@ceph-admin ~]# hostname
ceph-admin.example.com
[root@ceph-admin ~]# hostname -s
ceph-admin
[root@ceph-admin ~]# 

To be able to resolve the short hostname, edit your /etc/resolv.conf and enter a domain search path

[root@ceph-admin ~]# cat /etc/resolv.conf 
search example.com
nameserver 192.168.100.148
[root@ceph-admin ~]# 

Note: My network is fully IPv6-enabled, and I first tried to set CEPH up with IPv6 only. I was unable to get it working properly with IPv6! Disable IPv6 before you start. Disclaimer: Maybe I made some mistakes.

You also need to keep time in sync. The usage of NTP or chrony is best practice anyway.
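
For example, with chrony:

[root@ceph-admin ~]# yum -y install chrony
[root@ceph-admin ~]# systemctl enable chronyd
[root@ceph-admin ~]# systemctl start chronyd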

Register and subscribe the machines and attach the repositories needed

This procedure needs to be repeated on every node, including the admin server and the monitoring node(s).

[root@ceph-admin ~]# subscription-manager register
[root@ceph-admin ~]# subscription-manager list --available > pools

Search the pools file for the Ceph subscription and attach the pool in question.

[root@ceph-admin ~]# subscription-manager attach --pool=<the-pool-id>

Disable all repositories and enable the needed ones

[root@ceph-admin ~]# subscription-manager repos --disable="*"
[root@ceph-admin ~]# subscription-manager repos --enable=rhel-7-server-rpms \
--enable=rhel-7-server-rhceph-1.2-calamari-rpms \
--enable=rhel-7-server-rhceph-1.2-installer-rpms \
--enable=rhel-7-server-rhceph-1.2-mon-rpms \
--enable=rhel-7-server-rhceph-1.2-osd-rpms

Set up a CEPH user
Of course, you should set a secure password instead of this example 😉

[root@ceph-admin ~]# useradd -d /home/ceph -m -p $(openssl passwd -1 <super-secret-password>) ceph

Creating the sudoers rule for the ceph user

[root@ceph-admin ~]# echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
[root@ceph-admin ~]# chmod 0440 /etc/sudoers.d/ceph

Set up passwordless SSH logins. First, create an SSH key for root. Do not set a passphrase!

[root@ceph-admin ~]# ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

And add the key to ~/.ssh/authorized_keys of the ceph user on the other nodes.

[root@ceph-admin ~]# ssh-copy-id ceph@ceph-mon01
[root@ceph-admin ~]# ssh-copy-id ceph@ceph-osd01
[root@ceph-admin ~]# ssh-copy-id ceph@ceph-osd02
[root@ceph-admin ~]# ssh-copy-id ceph@ceph-osd03

Configure your SSH client.

To make your life easier (not having to provide --username ceph when you run ceph-deploy), set up the SSH client config file. This can be done for the user root in ~/.ssh/config or in /etc/ssh/ssh_config.

Host ceph-mon01
     Hostname ceph-mon01
     User ceph

Host ceph-osd01
     Hostname ceph-osd01
     User ceph

Host ceph-osd02
     Hostname ceph-osd02
     User ceph

Host ceph-osd03
     Hostname ceph-osd03
     User ceph

Set up the admin server

Go to https://access.redhat.com and download the ISO image. Copy the image to your admin server and mount it loop-back.

[root@ceph-admin ~]# mount rhceph-1.2.3-rhel-7-x86_64.iso /mnt -o loop

Copy the required product certificates to /etc/pki/product:

[root@ceph-admin ~]# cp /mnt/RHCeph-Calamari-1.2-x86_64-c1e8ca3b6c57-285.pem /etc/pki/product/285.pem
[root@ceph-admin ~]# cp /mnt/RHCeph-Installer-1.2-x86_64-8ad6befe003d-281.pem /etc/pki/product/281.pem
[root@ceph-admin ~]# cp /mnt/RHCeph-MON-1.2-x86_64-d8afd76a547b-286.pem /etc/pki/product/286.pem
[root@ceph-admin ~]# cp /mnt/RHCeph-OSD-1.2-x86_64-25019bf09fe9-288.pem /etc/pki/product/288.pem

Install the setup files

[root@ceph-admin ~]# yum install /mnt/ice_setup-*.rpm

Set up a config directory:

[root@ceph-admin ~]# mkdir ~/ceph-config
[root@ceph-admin ~]# cd ~/ceph-config

and run the installer

[root@ceph-admin ~]# ice_setup -d /mnt

To initialize, run calamari-ctl:

[root@ceph-admin ceph-config]# calamari-ctl initialize
[INFO] Loading configuration..
[INFO] Starting/enabling salt...
[INFO] Starting/enabling postgres...
[INFO] Initializing database...
[INFO] Initializing web interface...
[INFO] You will now be prompted for login details for the administrative user account.  This is the account you will use to log into the web interface once setup is complete.
Username (leave blank to use 'root'): 
Email address: luc@example.com
Password: 
Password (again): 
Superuser created successfully.
[INFO] Starting/enabling services...
[INFO] Restarting services...
[INFO] Complete.
[root@ceph-admin ceph-config]#

Create the cluster

Ensure you are running the following command in the config directory! In this example it is ~/ceph-config.

[root@ceph-admin ceph-config]# ceph-deploy new ceph-mon01

Edit some settings in ceph.conf

osd_journal_size = 1000
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128

In production, the first value (the journal size) should be bigger, at least 10G. The number of placement groups depends on the number of cluster members, i.e. the OSD servers. For small clusters of up to five nodes, 128 PGs are fine.

Install the CEPH software on the nodes.

[root@ceph-admin ceph-config]# ceph-deploy install ceph-admin ceph-mon01 ceph-osd01 ceph-osd02 ceph-osd03

Add the initial monitor server:

[root@ceph-admin ceph-config]# ceph-deploy mon create-initial

Connect all the nodes to Calamari:

[root@ceph-admin ceph-config]# ceph-deploy calamari connect ceph-mon01 ceph-osd01 ceph-osd02 ceph-osd03 ceph-admin

Make your admin server an actual admin node:

[root@ceph-admin ceph-config]# yum -y install ceph ceph-common
[root@ceph-admin ceph-config]# ceph-deploy admin ceph-mon01 ceph-osd01 ceph-osd02 ceph-osd03 ceph-admin

Purge and add your data disks:

[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd01:vdb
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd01:vdc
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd01:vdd
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd02:vdb
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd02:vdc
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd02:vdd
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd03:vdb
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd03:vdc
[root@ceph-admin ceph-config]# ceph-deploy disk zap ceph-osd03:vdd

[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd01:vdb
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd01:vdc
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd01:vdd
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd02:vdb
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd02:vdc
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd02:vdd
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd03:vdb
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd03:vdc
[root@ceph-admin ceph-config]# ceph-deploy osd create ceph-osd03:vdd

You can now check the health of your cluster:

[root@ceph-admin ceph-config]# ceph health
HEALTH_OK
[root@ceph-admin ceph-config]# 

Or with some more information:

[root@ceph-admin ceph-config]# ceph status
    cluster 117bf1bc-04fd-4ae1-8360-8982dd38d6f2
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=192.168.100.150:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e42: 9 osds: 9 up, 9 in
      pgmap v73: 192 pgs, 3 pools, 0 bytes data, 0 objects
            318 MB used, 82742 MB / 83060 MB avail
                 192 active+clean
[root@ceph-admin ceph-config]# 

What's next?
Storage is worthless if not used. A follow-up post will guide you through using CEPH as storage for libvirt.

Creating and managing iSCSI targets

If you want to create and manage iSCSI targets with Fedora or RHEL, you will stumble upon tgtd and tgtadm. These tools are easy to use but have some obstacles to take care of. This is a quick guide on how to use tgtd and tgtadm.

iSCSI terminology
In the iSCSI world, we are not talking about servers and clients, but about iSCSI targets, which are the servers, and iSCSI initiators, which are the clients.

Install the tool set
There is just one package to install; afterwards, enable and start the service:

target:~# yum install scsi-target-utils
target:~# chkconfig tgtd on
target:~# service tgtd start

Or Systemd style:

target:~# systemctl start tgtd.service
target:~# systemctl enable tgtd.service

Online configuration vs. configuration file
There are basically two ways of configuring iSCSI targets:

  • Online configuration with tgtadm: changes become available instantly, but are not persistent across reboots
  • Configuration files: changes are persistent, but not instantly available

Well, there is the dump parameter for tgtadm, but e.g. passwords are replaced with “PLEASE_CORRECT_THE_PASSWORD”, which makes tgtadm completely useless if you are using CHAP authentication.

If you do not use CHAP authentication and use IP-based ACLs instead, tgtadm can help you: just dump the config to /etc/tgt/conf.d.

Usage of tgtadm

After you have created the storage, such as a logical volume (used in this example), a partition, or even a file, you can add the first target:

target:~# tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2013-07.com.example.storage.ssd1

Then you can add a LUN (Logical Unit Number) to the target:

target:~# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/vg_storage_ssd/lv_storage_ssd

It is always a good idea to restrict access to your iSCSI targets. There are two ways to do so: IP-based ACLs and user-based ACLs (CHAP authentication).

In this example, we first add two IP addresses and later remove one of them again, just as a demo:

target:~# tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address=192.168.0.106
target:~# tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address=192.168.0.107

Go to both initiators with these IP addresses and check whether the targets are visible:

iscsiadm --mode discovery --type sendtargets --portal 192.168.0.1

Let's remove the ACL for the IP address 192.168.0.107:

target:~# tgtadm --lld iscsi --mode target --op unbind --tid 1 --initiator-address=192.168.0.107

Check whether the target is still visible on the host with IP address 192.168.0.107; it is not anymore.

If you want to use CHAP authentication, please be aware that tgtadm --dump does not save passwords, so initiators will not be able to log in after a restart of tgtd.

To add a new user:

target:~# tgtadm --lld iscsi --op new --mode account --user iscsi-user --password secret

And add the ACL to the target:

target:~# tgtadm --lld iscsi --op bind --mode account --tid 2 --user iscsi-user

To remove an account for the target:

target:~# tgtadm --lld iscsi --op unbind --mode account --tid 2 --user iscsi-user

As I wrote further above, configurations done with tgtadm are not persistent across a reboot or restart of tgtd. For basic configurations as described above, the dump parameter works fine. As configuration files in /etc/tgt/conf.d/ are automatically included, you can just dump the config into a separate file:

target:~# tgt-admin --dump |grep -v default-driver > /etc/tgt/conf.d/my-targets.conf

The other way round
If you are using a more sophisticated configuration, you probably want to manage your iSCSI configuration the other way round.

You can edit your configuration file(s) in /etc/tgt/conf.d and invoke tgt-admin with the respective parameters to update the config instantly.

tgt-admin (not to be confused with tgtadm) is a Perl script which basically parses /etc/tgt/targets.conf and updates the targets by invoking tgtadm.

To update your target(s), issue:

tgt-admin --update ALL --force

This updates all your targets, including active ones (--force). Alternatively:

tgt-admin --update --tid=1 --force

This updates target ID 1 only.

SIGKILL is nasty but sometimes needed
tgtd cannot be stopped like a usual daemon; you need the sledgehammer and must send the process a kill -9, followed by a service tgtd start command.

How the start and stop process can be handled in a proper workaround fashion is demonstrated by systemd: have a look at /usr/lib/systemd/system/tgtd.service, which does not actually stop tgtd but just removes the targets.
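
The stop logic in that unit file looks roughly like this (quoted from memory, so treat it as a sketch and check the file on your system): the system state is set offline, all targets are removed, and the system instance is deleted instead of the daemon being killed.

ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline
ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null -f
ExecStop=/usr/sbin/tgtadm --op delete --mode system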

Conclusion
tgtadm can be helpful and sometimes harmful. Carefully consider which is the better way for you: creating config files with tgtadm, or updating the configuration files and activating them with tgt-admin.

Confused about write barriers on file systems…

As ext3 is already known as a very robust file system, why is the default mount option still barrier=0? The problem is LVM and the device mapper: they do not support barriers.

When mounting ext3 on an LV with the option barrier=1, the option should be ignored and a warning written. So far so good. Trying this brings a lot of confusion: according to a Red Hat Bugzilla entry one should get a warning, but there is no sign of it in /var/log/messages or in the dmesg output. Even more confusing, the output of mount shows the LV mounted with the barrier=1 option.
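
A quick way to cross-check both observations, as a sketch:

# What mount claims:
mount | grep barrier
# What the kernel actually logged, if anything (ext3/JBD complains here when it disables barriers):
dmesg | grep -i barrier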

The conclusion: enabling write barriers on physical disk partitions adds reliability to your file system; on LVM setups they are better left disabled for the moment.

Is this fun? No…