Leveraging Network-Bound Disk Encryption at Enterprise Scale

Tang and Clevis
Tang and Clevis

Network-Bound Disk Encryption (NBDE) adds scaling to LUKS by automated disk unlocking on system startup.

Why should I encrypt disks? If you dont want to see your corporate and private data leaked, you should do so as an additional security measure.

Use cases

There are basically two use cases for disk encryption. The first one is to prevent data leaks when a device gets stolen or lost (mobile computers, unsecured server rooms etc.). Theft of devices is usually not a threat for enterprise grade data centers with physical security.

Here comes the second use case for this enterprise grade data centers: At some point in time, disks will get disposed, either because of a defect or they get outdated technology wise. That means a data leak is possible at the end of the disks life cycle. A defect disk can not be wiped at all. For someone with deep pockets, there is still a chance to at least partially access the data. Wiping six TiB disks takes many hours just for overwriting them with zeros, not even with random data. An encrypted disk without a passphrase set can just simply get disposed without considering if it needs to be wiped or physically destroyed.

Note: Disk encryption does not help protecting you from a data theft by a person having access to the data, it also does not help against misbehaving software.

As you can imagine, it is a good idea to encrypt your storage. The standard for disk encryption in Linux is LUKS (Linux Unified Key Setup).

Adding Tang and Clevis for scaling

Unfortunately LUKS does not scale at all, because the passphrase must be entered manually on system startup, a no-go for data center operations. Tang and Clevis adds the scaling factor to the game.

Tang is the server component, Clevis and LUKS-meta the client component. The secret itself is stored on the client, the client asks the server for the data needed for the decryption of the key stored in the LUKS meta data. For more information on the crypto algorithms used, please see the Slide Deck “Tang and Clevis” by Fraser Tweedale

Availability and support

Tang and clevis have been added to RHEL 7.4 and are supported. The packages tang-nagios and clevis-udisk2 are in technical preview phase and are not supported. The packages are included in the base subscription.

It is also available for Fedora as well.

Set up the Tang servers

Setting up a Tang server is straight forward. For redundancy, please set up at least two Tang servers, a maximum of seven Tang Servers are supported by the client, which corresponds to the number of LUKS slots (eight) minus the one used for the initial passphrase.

[root@tang1 ~]# yum -y install tang
[root@tang1 ~]# systemctl enable --now tangd.socket
[root@tang1 ~]# jose jwk gen -i '{"alg":"ES512"}' -o /var/db/tang/new_sig.jwk
[root@tang1 ~]# jose jwk gen -i '{"alg":"ECMR"}' -o /var/db/tang/new_exc.jwk

Display the Thumbprint to be added to the Kickstart later on.

[root@tang1 ~]# jose jwk thp -i /var/db/tang/new_sig.jwk

Automated client setup during Kickstart

Be aware that you can run into problems when re-provisioning a system that contains old LUKS keys. You probably want to wipe them. In the following setup, all the slots are located on the second partition.

# Wipe LUKS keys on the second partition of disk vda
%pre
cryptsetup isLuks /dev/vda2  && dd if=/dev/zero of=/dev/vda2 bs=512 count=2097152
%end

part /boot      --fstype ext2 --size=512 --ondisk=vda
part pv.0       --size=1 --grow --ondisk=vda --encrypted --passphrase=dummy-master-pass

volgroup vg_luksclient pv.0

logvol /        --name=lv_root    --vgname=vg_luksclient --size=4096
logvol /home    --name=lv_home    --vgname=vg_luksclient --size=512 --fsoption=nosuid,nodev
logvol /tmp     --name=lv_tmp    --vgname=vg_luksclient --size=512 --fsoption=nosuid,nodev,noexec
logvol /var     --name=lv_var    --vgname=vg_luksclient --size=2048 --fsoption=nosuid,nodev
logvol /var/log --name=lv_var_log --vgname=vg_luksclient --size=2048 --fsoption=nosuid,nodev
logvol swap     --fstype swap --name=lv_swap    --vgname=vg_luksclient --size=4096

Be aware that the transfer of the Kickstart file will be done in clear text, that means that this dummy-master-pass is exposed. It should be automatically removed. You can add a master key via a secure way after the installation with Ansible, Puppet or simply manually via SSH.

Ensure you have the clevis-dracut package installed so that the init ramdisk will get created in the right way.

%packages
clevis-dracut
%end

In the %post section of the Kickstart file, add the following to register your system to the Tang servers.

%post
clevis bind luks -f -k- -d /dev/vda2 tang '{"url":"http://tang1.example.com","thp":"vkaGTzcBNEeF_X5KX-w9754Gl80"}' <<< "dummy-master-pass"
clevis bind luks -f -k- -d /dev/vda2 tang '{"url":"http://tang2.example.com","thp":"x_KcDG92bVP3SUL9KOzmzps4sZg"}' <<< "dummy-master-pass"
%end

In case you want to remove the master password, put the following line into your %post section of the Kickstart file:

%post
cryptsetup luksRemoveKey /dev/vda2 - <<<"dummy-master-pass"
%end

Usage of a passphrase

There are pros and cons about doing so. On one hand, if all Tang servers are unavailable, there is not a slight chance to access the data if there is no master password set. On the other hand, a master password can be leaked and it should be changed from time to time which needs to be automated (i.e. with Ansible) to scale.

I personally tend to use a master password. Choose wisely depending on your specific use case if you set a master password or not.

Good to know

Be aware that the password prompt on system startup will always show up. It disappears automatically after a few seconds if a Tang server have been reached.

Documentation

The following documents helps you further to get an idea about the Tang/Clevis setup:

A nice presentation from a conference is available here: https://www.usenix.org/conference/lisa16/conference-program/presentation/atkisson

Another more technical presentation is available here: http://redhat.slides.com/npmccallum/sad#/

Important commands

There are a few LUKS and clevis related commands you should know about.

cryptsetup

Cryptsetup is used to handle the LUKS slots, adding and removal of passphrases. More information is available in man 8 cryptsetup

luksmeta

luksmeta gives you access to the meta data of LUKS. I.e. showing which slots are in use:

[root@luksclient ~]# luksmeta show -d /dev/vda2 
0   active empty
1   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
2   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
3   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
4 inactive empty
5 inactive empty
6 inactive empty
7 inactive empty
[root@luksclient ~]#

The following command is reading the meta data and put the encrypted content to the file meta

luksmeta load -d /dev/vda2 -s 1  > meta

It looks like this:

eyJhbGciOiJFQ0RILUVTIiwiY2xldmlzIjp7InBpbiI6InRhbmciLCJ0YW5nIjp7ImFkdiI6eyJrZXlzIjpbeyJhbGciOiJFUzUxMiIsImNydiI6IlAtNTIxIiwia2V5X29wcyI6WyJ2ZXJpZnkiXSwia3R5IjoiRUMiLCJ4IjoiQVdCeFZSYk9MOXBYNjhRU0lqSEZyNzVuNUVXdDZGblkySmNaNVgxX0s4MldaNW9kMUNQTUJwQ0dsS1ZFZ29LOFQwMERPazFsMHJRQ2kyOEg4SDBsVXlfaCIsInkiOiJBTzNLdmsyc2pqYVpSM3RrbW5KcVQyWGYtd1lnbXZSa0JqNUpmNFgzWmtHTDRHbTYtbE5qemhzVEdraEZLRmZUdnJLUElDTHBSQndCTnNXc0JuZUlVTEViIn0seyJhbGciOiJFQ01SIiwiY3J2IjoiUC01MjEiLCJrZXlfb3BzIjpbImRlcml2ZUtleSJdLCJrdHkiOiJFQyIsIngiOiJBTm05WmQwUDFyT1F6MXhhQVFNTzJxRjRua3ZHMVpKS2VHNkFaWjdPTEo1ejhKS000N0otMUhZWnkyZk0zT29ZQVdiUndQdnJ6aUt4MFJmNWh0QlkzNXBxIiwieSI6IkFTdVZYR3JRQ0c4R3dKTENXbWpVbC1jN0llUUh4TC01cFRGYTJaOU1ESnU4Ym9JZFo3WlNiZHBHZUFWMnhMTzlCTnlqbE5zSzB2ZWJrR3ZDcmU5bDl0aFYifSx7ImFsZyI6IkVTNTEyIiwiY3J2IjoiUC01MjEiLCJrZXlfb3BzIjpbInZlcmlmeSJdLCJrdHkiOiJFQyIsIngiOiJBT01Yc2JqcGd3MUVwMEdmSHd2NFRGTzFhWlNxZlY2NWlURVpWWDc4Y3M1SkI2dlBHaUZwd2RiZnpVWlpzR2FCZVludXdzTHJ0UTZTYm0zWHVTdGNHTFlFIiwieSI6IkFSZUZMckNHZnB6S2tzaTZvVXdLdEZjOW9IbHVHdDJjd3AwNmR1M3dEUGgta0t5N3RfTmZEU1JOSXRuWkZIbWs3eVlwSnkxYlpiRDRUTklTcXFIZDlDbDIifV19LCJ1cmwiOiJodHRwOi8vdGFuZzEuaG9tZS5kZWxvdXcuY2gifX0sImVuYyI6IkEyNTZHQ00iLCJlcGsiOnsiY3J2IjoiUC01MjEiLCJrdHkiOiJFQyIsIngiOiJBQnlMWjZmcWJKVVdzalZVc1ZjN0hwWlhLQ1BIZjJWenhyTExkODdvajBERnhGeTJRUTJHSXNEbFJ6OTg0cmtkNDJVQ3pDVy1VcGE4bG9nTl9BT0hsU0syIiwieSI6IkFBcHJaLUxFMUk3NUxWMXZtTHhkYUl0TmlETnpjUUVpLXJsR1FwVjFnT2IwWU5rbDFyWVgxdU45OE9WcHdiWUowTEpYYnYtRGZnSjU2RjBPMkNFczJOck4ifSwia2lkIjoicTAzQXd4VG5sU3lpQjRnelBTYTBfcXhsVzU4In0..Ws5k2fgQ26yN-mMv.1NwlYoyaUmF5X0jqGDcKO3HWn02StXotqnjZKaZtSUXioyW0-rc8HxH6HkkJTMQJk_EXr8ZXB4hmTXfUqAtRqgpEW4SdzJIw_AsGbJm5h_8lQLPIF4o.fbbNxK51MC14hX46Dgkj6Q

You can decrypt it:

[root@luksclient ~]# clevis decrypt tang < meta 
OTQy6NGfqTjppwIrrM4cc15zr-sxy5PPmKExHul1m-pcMjEHjGdoN5uqD9vcEiuMM56VapPV_LedXYEkktYO-g[root@luksclient ~]#

OTQy6NGfqTjppwIrrM4cc15zr-sxy5PPmKExHul1m-pcMjEHjGdoN5uqD9vcEiuMM56VapPV_LedXYEkktYO-g is the cleartext passphrase returned. It actually can be used to type it in the console, I recommend a serial console where you can copy-paste 😉

If you run the same command again when both Tang servers are down, you will get an error:

[root@luksclient ~]# clevis decrypt tang < meta
Error communicating with the server!
[root@luksclient ~]#

As you can see, you don’t need to provide a Tang Server URL.

lsblk

Lsblk is a nice little tool which shows the available storage in a tree. You can see the different layers of the storage subsystem.

[root@luksclient ~]# lsblk 
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
vda                                           252:0    0   20G  0 disk  
├─vda1                                        252:1    0  512M  0 part  /boot
└─vda2                                        252:2    0 19.5G  0 part  
  └─luks-f0a70f08-b745-429f-ba8e-ec07e8953c3d 253:0    0 19.5G  0 crypt 
    ├─vg_luksclient-lv_root                   253:1    0    4G  0 lvm   /
    ├─vg_luksclient-lv_swap                   253:2    0    4G  0 lvm   [SWAP]
    ├─vg_luksclient-lv_var_log                253:3    0    2G  0 lvm   /var/log
    ├─vg_luksclient-lv_var                    253:4    0    2G  0 lvm   /var
    ├─vg_luksclient-lv_tmp                    253:5    0  512M  0 lvm   /tmp
    └─vg_luksclient-lv_home                   253:6    0  512M  0 lvm   /home
[root@luksclient ~]# 

json_reformat

If you want to play with JSON, install the package yajl.

With json_reformat you can minimize JSON and you are required to do so as clevis encrypt sss does not allow spaces, it fails.

Lets reformat this:

[root@luksclient ~]# echo '{"t": 1,"pins": {"tang": [{"url": "http://tang1.example.com"}, {"url": "http://tang2.example.com"}]}}'|json_reformat -m && echo ""
{"t":1,"pins":{"tang":[{"url":"http://tang1.example.com"},{"url":"http://tang2.example.com"}]}}
[root@localhost ~]# 

How to figure out to which servers the client is enrolled

I was curious how clevis figures out what Tang server to connect to. There is nothing written to the initrd, that means it must be stored somewhere in the LUKS metadata. It was taking me some time to figure out how it works.

Just decode the meta data to JSON:

 luksmeta load -d /dev/vda2 -s 1|jose b64 dec -i- |json_reformat 

Unfortunately the JSON seems to be invalid, at least json_reformat brings an error parse error: premature EOF. However, you will see the URL.

Test scenarios

I made a few tests with to figure out how Tang and Clevis works when something is going south.

Tang server(s) not available during system installatioon

If only one Tang server is available, installation work, server gets enrolled to only one Tang server. The server must be enrolled to the second Tang server manually after it came up again.

If both servers are down during installation, the installation finished successful, the temporary passphrase is still active as LUKS will deny removing the last passphrase available. Of course, the LUKS metadata is not available. You can enroll the servers manually after one or both servers come back online. Remember to remove the temporary passphrase afterwards.

Tang Server(s) not available during reboot

If one Tang server is not available, the other one is used, no impact.

If both servers are down, Plymouth asks for the LUKS passphrase. If you removed the the passphrase, you will not be able to boot the server. After starting one or both Tang servers, boot continues.

Drawbacks

Tang and Clevis are both very young projects and not yet mature. I’ve figured out the following drawbacks:

Missing Registry

At the moment there is no way to report which servers are registered to what Tang server. This makes it hard to check from a central point if a server is really registered to two (or more) Tang servers to ensure smooth operation in the case of a failed Tang server.

This is particular true if one (or more) Tang server is down during install time of the client system. As a workaround, set up a monitoring script that checks if there are two active slots. I.e.

if [ $(luksmeta show -d /dev/vda2|grep " active"|grep -v empty|wc -l) -ne 2 ] || [ $? -eq 0 ]; then
        echo "Something is wrong with the LUKS metadata, please check"|mail -s "LUKS Metadata failure" monitoring@example.com
fi

Logging

Logging of Tang requests is very basic at the moment, some improvement is needed here as well. Again, the documentation for the return codes is lacking

Scalability

When using more than one Tang server, always that one defined in the first slot be be accessed. There is no round-robin or similar load-balancing method. This means that that the sequence of Tang Server must be shuffled on the client which involves some logic in the Kickstart file.

One Tang server should be able to handle more than 2k requests per second, so the problem only kicks in very large environments, where more than 2000 server are booting (or getting installed) at the same time.

Maturity

Its a brand new project using completely new ideas and methods. At the moment not much experience is there, an issue that will be solved over time.

Documentation

There is almost no documentation available which goes beyond a few lines to show how to set up the server and client. Whats missing is how to troubleshoot the environment. Another missing part is how to handle key rotation, its unclear for me if and what has to be done on the client.

Easy-to-read documentation is important, in particular for Tang and Clevis which is using some new style die-hard cryptographic mathematics.

Conclusion

Both, client and server have a very small footprint and are performing well. The idea of Tang and Clevis is brilliant and a first incarnation is ready to use. Due to the drawbacks mentioned above I think it is not yet ready for production and it will take a while until it is.

Due to the nature of the project, stability and reliability is a key point, that is why people should test it and provide feedback.

I would like to thank the involved engineers, cool stuff.

Have fun:-)

7 thoughts on “Leveraging Network-Bound Disk Encryption at Enterprise Scale

  1. Nathaniel McCallum says:

    Thanks for the write up. I’m glad you like it!

    Please allow me to make a few clarifications:

    1. Clevis and Tang *are* supported in RHEL 7.4 and are not tech preview. However, the GNOME unlocker and Nagios plugin are tech preview.

    2. There is no need to manually generate Tang’s keys. Just run systemctl enable… The keys will be generated automatically.

    3. Rather than registering two bindings (one per Tang server), you can use the SSS pin in Clevis with a threshold of 1. This eliminates your need to check for two slots afterward.

    4. The metadata stored in LUKSMeta is a JWE in Compact Serialization. It is not invalid, you’re just parsing it the wrong way. See for example the “jose jwe fmt” command to convert it to raw JSON. Alternatively, you can parse the fields manually following RFC 7516. See also the “jose fmt” command for handy JSON parsing.

    5. We don’t do central management because everyone already has central management of things like kickstart, puppet, ansible, etc. You should define your policy in whatever system you already use. We don’t want to reinvent the wheel here.

    6. If you are worried about one of the Tang servers being down during installation, then distribute the Tang servers’ advertisements in your central config. This is documented in the clevis-encrypt-tang man page. If you do this, provisioning happens entirely offline and doesn’t need to contact the server at all. See the adv parameter here: https://github.com/latchset/clevis/blob/master/doc/clevis-encrypt-tang.1.md

    7. We do have load balancing! You can use DNS round-robin or the Shamir’s Secret Sharing pin in clevis. See: https://github.com/latchset/clevis/blob/master/doc/clevis-encrypt-sss.1.md

    8. We do have documentation! The man pages are shipped in the RPMs and are also available online here:
    https://github.com/latchset/clevis/tree/master/doc
    https://github.com/latchset/tang/tree/master/doc
    https://github.com/latchset/luksmeta/blob/master/luksmeta.8.md

    9. Our next release will target non-removable volumes and various bits of polish. After that, we hope to land TPM2 support in Clevis.

    Of course, I fully admit this is a new project and may have some rough edges. We welcome everyone’s feedback! Please file issues at the above github projects. Patches are likewise welcome.

    • Luc de Louw says:

      Hi,

      Thanks for your input, very appreciated.

      1. Text corrected, I was probably reading to fast.

      2. Correct, but how to figure out which is the file to fetch the Thumbprint?

      3. and 7. How does clevis encrypt sss work together with clevis bind luks in an easy way? Additionally I did not getting secret sharing to work at all. My cfg looks like this: {“t”: 1,”pins”: {“tang”: [{“url”: “http://tang1.example.com”}, {“url”: “http://tang2.example.com”}]}} which I checked is valid JSON. echo blah| clevis encrypt sss $cfg bails out showing the usage help. [Update]After removing all spaces from the single line JSON it works, but the question about using it with clevis bind luks is still valid [/Update]
      DNS round-robin would require rsycing the keys, I do not like the idea that much. A nice way would be integration with IPA

      4. I tried a lot and did not got to a point to get it done right. Finally I just looked at /usr/bin/clevis-decrypt-tang how its supposed to work.
      5. Sure, there should a an easy way to check if policies are met

      6. In the link mentioned I read: “You can fetch the advertisment with a simple command (just be careful your network isn’t compromised!)”. As the adv URL is not encryped and public reachable, it should not contain any private keys etc. Did I missed a point here?

      8. Yes there is documentation, I even linked them in the article. IMHO, the problem with the documentation is that there are not much examples available which makes it hard to understand how to proceed.

      9. Nice 🙂

      Another question is how the key rollover works client side. Just by contacting the server for an decrypt action?

      Thank you

      • Nathaniel McCallum says:

        2. You can fetch the thumbprint from the signing key (the one with {“alg”:”ECDSA”}). Alternatively, you can just get the advertisement with wget or curl from $URL/adv. Then pass in the advertisement file using the “adv” parameter to the clevis tang policy.

        3 and 7. We also do not recommend DNSRR. I gave a working example of sss in the man page and in my talk which you link to. Yes, more integrations (including possibly IPA) would be great. Patches welcome!

        5. What are you envisioning here? Do you have a particular use case?

        6. Correct. The advertisment contains only public keys and signatures. You can fetch it with wget or curl on $URL/adv. Then you can paste it in a kickstart or something else. If you do this, tang does not require network connectivity at all during encryption.

        8. I’d love some patches to improve this. If you can make them by early next week, I can include them in the next release.

        10. Client side key rotation currently requires reprovisioning (just run clevis bind luks again).

  2. Ade Bradshaw says:

    Luc

    Thanks for the article – one small thing – I found I needed to also add tang to the kickstart of the clients also (as well as clevis-dracut)

    Ade

  3. RJForster says:

    I can’t get this to work for machines with more than one hard drive.
    I run the “clevis bind luks …” command for each drive (and each tang server) and that works successfully. The dracut command also seems to work but upon a reboot the box doesn’t communicate with the tang server(s) (I see nothing with tcpdump)

    If I then go back and remove the lines from my kickstart file that correspond to the extra hard drives then it all works just fine end to end.

Leave a Reply

Your email address will not be published. Required fields are marked *