An InfinityDB Encrypted Java NoSQL Database is 100% encrypted and 100% authenticated 100% of the time, with hashing, password changing and signing with multiple certificates, while retaining up to 10x compression. It is otherwise identical to the InfinityDB Embedded Database.
Transparent Data Encryption for Data at Rest
InfinityDB Encrypted uses ‘Transparent Data Encryption’ or TDE for data at rest to minimize the impact of security. It is necessary only to provide a password on file creation and opening, and all the rest is handled internally. You can combine this TDE with other security measures to increase safety of all data at rest. Because InfinityDB Database uses a single file for all data in a given database, the encryption covers all data at once reliably, even while in use. The encrypted files can be used as backups with no encryption step, and with no decryption on restore. There is no point in time when any unencrypted data reaches storage.
- All stored block data is 100% encrypted 100% of the time according to a password using AES-128 or AES-256
- All stored block data and header is 100% integrity checked 100% of the time according to the password using HMAC-SHA256
- Database size is not practically limited by encryption, because encryption is on-the-fly, not batched
- Per-block Randomization – the block contains data, HMAC, random HMAC salt, and independent random encryption ‘initialization vector’ for maximum strength. Each write re-randomizes.
- Compression up to 10x is preserved, as in the Embedded version
- Passwords may be changed instantly at any time for ‘re-keying’ or ‘key rotation’.
- Encryption levels may be selected – strong or regular for export compliance. Future versions will provide more
- File content is fully dynamic – encrypted data can be updated continuously
- Fast full-file hashing or ‘fingerprinting’ of either encrypted or unencrypted blocks provides strong content and integrity checking
- Optional signing ensures a guaranteed overall file content; backups are therefore authenticated and safe from external modification
- Signing algorithms are selectable, combining a hash such as SHA256, SHA3, MD5, or any other available with RSA, DSA, or ECDSA
- Multiple X509 certificates and their trust chains or bare public keys are stored in the file and organized automatically
- Partial signing by different processes will finally reach fully signed state; not all private keys are needed at once
Compatible with InfinityDB Embedded
- The API is a superset of the InfinityDB Embedded Database API
- Reaches about 50% of embedded mode performance for disk I/O, but full speed in the memory cache
- Compression is preserved at 1 to 10x or more
- Unencrypted files are compatible; they can still be opened for read or write and stay unencrypted
These features of InfinityDB Encrypted provide vital security for the entire set of Items in the database, which means all of the database content is protected. InfinityDB Encrypted data in storage is never in plaintext unprotected form at rest or in motion, but is accessed and modified directly in place in the single encrypted InfinityDB file. Hence, an InfinityDB file can also be used as a secure ‘message’ for communication, or as a backup. There is no ‘metadata’ like internal filenames, database names, schemas, user names or ‘personally identifiable information’ to be leaked.
Avoids Common Leaks
Some DBMS and other systems expose data while it is being used, so it ‘leaks’ in the clear into uncontrollable locations. Full-disk encryption or encrypting file systems can help, but they have no enforcement mechanism to prevent data being copied out into the clear either at rest or in motion, and there is no means for verifying authenticity or detecting corruption. Even if data is only momentarily decrypted in storage, leaks are possible, including at least these:
- Disk or RAID array block caches hold raw data, and they will often be battery-backed up for recovery after hardware failure;
- Disks are often removed from hosts or RAID sets without being destroyed, or are removed as a precaution before they actually fail;
- Entire host systems may be decommissioned or reconfigured or re-purposed without wiping or destroying the storage;
- The file system cache in memory contains clear text;
- Cluster nodes may store and communicate clear text (MongoDB communicates clear text);
- Map/reduce processing, such as by Hadoop, normally only works with clear text;
- Cloud services cannot credibly guarantee security of data at rest or in motion and are not FIPS 140-2 compliant;
- Clear text files in a file system remain on the underlying storage even after being deleted;
- ‘Shredding’ file data by overwriting does not work, because a file system may copy data internally – for example ZFS copies every block that is written;
- ‘Journalling’ file systems like ext3, ext4, jfs, and xfs copy data or metadata to a ‘log’ or ‘journal’ area;
- Defragmentation leaves copies on storage;
- SSDs or Flash storage leave data in un-erased blocks and ‘overprovisioned’ space;
- Some DBMS may generate clear-text sort areas, indexes, logs, and so on; and
- Abrupt shutdown leaves any temporary clear text behind.
InfinityDB Encrypted avoids these leaks because during the lifetime of the database, data never exists unencrypted in storage.
Unlimited Practical Data Size
InfinityDB Encrypted has no slow batch encryption/decryption step that limits the practical database size or causes long service interruptions.
Encryption of course prevents unauthorized reading of the data based on a secret ‘Password-Based-Encryption’ or ‘PBE’ password. This password can be short or long to provide ease of memorization or strong security. The industry standard encryption is used – AES-128 or AES-256.
Authentication and Integrity Protection
The password is also used to protect the integrity of all of the content by means of an ‘HMAC’ hash, which combines the password with a regular secure hash function to identify accidental or intentional corruption of the data every time the relevant portion of the file is read. Because the HMAC is dependent on the PBE password, it cannot be calculated by someone not having the password, preventing impersonation by an attacker. Without the HMAC, encrypted data could be modified externally, and the decrypted data would change in some unpredictable way, with unpredictable results. The HMAC is calculated on each write of a data block and on reading it back, and it is simply verified to have remained unchanged. The industry standard HMAC algorithm is used – HMAC-SHA256.
Hashing for ‘Fingerprinting’
A fingerprint can be calculated quickly using a hash algorithm, with or without the PBE password, to determine that the contents of the database are as expected. This is like a standard file hash, but it covers only the set of encrypted blocks, which encode the ItemSpace, and does not depend on the rest of the encrypted file. If the hash is stored separately, it is easy to recompute it and check that it has not changed. Either the encrypted or the plaintext data blocks can be hashed. Hashing the plaintext blocks is slower and it requires the PBE password, but it verifies the HMAC of every block as a side-effect.
Quickly Changeable Passwords
Passwords can be changed even after the file is created. Unlike most file encryption systems, InfinityDB uses a standard two-step ‘AES key wrapping’ mechanism to convert the PBE password into the final data encryption and HMAC keys. With this feature, one can better isolate and secure production, backup, test, or transmitted databases. Passwords can be changed regularly, for example for ‘key rotation’ or ‘re-keying’, which is required in many environments. Changing the PBE password to a large random number and ‘forgetting’ it is a way to effectively ‘delete’ or ‘shred’ the database completely and permanently – this is called ‘crypto-shredding’. The key wrapping mechanism keeps an instantly alterable ‘wrapped encryption key’ in the file header that can only be decrypted with the PBE password, which functions as a ‘key encryption key’.
Signing can be used to verify that the entire database has a good, trusted state. For example, it can be used with backups, to confirm that a database to be restored by being copied into the active system was not corrupted in any way, and was last modified by a trusted client. The trusted client not only had access to the PBE password, thereby proving that they were authorized, but also left the database in a state that they wanted to preserve, not an intermediate, experimental, incomplete, or suspected incorrect state. A third-party attacker cannot corrupt the signed database either in a random shotgun attack or in a ‘backup attack’ by using blocks read from pairs of backup databases, or even by obtaining the PBE password and altering the database normally. Inadvertent non-malicious file modifications are detected as well. The signing or signature verification processes read the entire database, checking every byte.
Signatures also avoid the weakness of PBE passwords in that the PBE passwords must be provided to all parties that need read or write access to the database, so they are distributed widely. Instead, private/public key pairs can be used for ‘asymmetrical’ cryptography to make key handling far safer. This is a standard route that uses the X509 certificates also used in SSL/TLS (https) security. The private keys are kept safe by individual participants in their own ways. Each signing participant has a private/public key pair, and their public keys are broadcast directly or indirectly to all participants who wish to ‘verify’ the data to determine whether the data is trustworthy. The public keys may either be used alone (‘bare’ public keys) or they can be vouched for with certificates, and a chain of such certificates signing each other can lead to a ‘root’ certificate that is commonly available and trusted by everyone. With such a ‘trust chain’ it is not necessary for trustors to have direct access to any public keys at all, and the trust rules can depend on the set of certificates in the signature in client-implemented custom ways. There can be multiple signatories, with their certificates persisted in the file itself, some signed, some not. This is based on standard technology, and can provide vital, flexible security.
Here are the implementation features.
Encryption and Integrity Checking
Each underlying file block is separately encrypted with a secure random initialization vector using AES-128 or AES-256. Each block is independently integrity checked with HMAC-SHA256 that covers all other block data. The encryption and HMAC keys are independent and securely randomly generated. The block numbers are encrypted and authenticated per-block as well. Every write of a block changes its stored data completely and seemingly-randomly, even for partial block changes or identical block data. Corruption or truncation of a file is immediately detected on read of a corrupted block.
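The per-block scheme described above can be sketched with the standard Java cryptography API. This is only an illustration under stated assumptions: the class name `EncryptedBlockSketch`, the `iv || ciphertext || tag` layout, the choice of AES in CTR mode, and the way the block number is bound into the HMAC are all hypothetical and are not taken from the InfinityDB implementation.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical sketch: encrypt-then-MAC for one block with a fresh random IV per write.
final class EncryptedBlockSketch {
    private static final int IV_LEN = 16;
    private static final int TAG_LEN = 32;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns iv || ciphertext || tag for one block (assumed layout). */
    static byte[] seal(byte[] encKey, byte[] macKey, long blockNumber, byte[] plain)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv); // a new IV on every write changes all stored bytes
        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
        byte[] cipherText = aes.doFinal(plain);
        byte[] tag = computeTag(macKey, blockNumber, iv, cipherText);
        return ByteBuffer.allocate(IV_LEN + cipherText.length + TAG_LEN)
                .put(iv).put(cipherText).put(tag).array();
    }

    /** Checks the HMAC first, then decrypts; throws if the block was altered, moved, or cut short. */
    static byte[] open(byte[] encKey, byte[] macKey, long blockNumber, byte[] stored)
            throws GeneralSecurityException {
        if (stored.length < IV_LEN + TAG_LEN) {
            throw new GeneralSecurityException("block truncated");
        }
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_LEN);
        byte[] cipherText = Arrays.copyOfRange(stored, IV_LEN, stored.length - TAG_LEN);
        byte[] tag = Arrays.copyOfRange(stored, stored.length - TAG_LEN, stored.length);
        // MessageDigest.isEqual compares in constant time
        if (!MessageDigest.isEqual(tag, computeTag(macKey, blockNumber, iv, cipherText))) {
            throw new GeneralSecurityException("block failed integrity check");
        }
        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
        return aes.doFinal(cipherText);
    }

    /** HMAC-SHA256 over block number, IV and ciphertext, so a block cannot be replayed elsewhere. */
    private static byte[] computeTag(byte[] macKey, long blockNumber, byte[] iv, byte[] cipherText)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
        mac.update(ByteBuffer.allocate(Long.BYTES).putLong(blockNumber).array());
        mac.update(iv);
        return mac.doFinal(cipherText);
    }
}
```

With a 16-byte `encKey` this corresponds to AES-128. Sealing the same plaintext twice gives different stored bytes, and opening a block under the wrong block number fails the integrity check.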
Hashing for ‘Fingerprinting’
A global SHA256-based hash can be calculated quickly on demand, dependent on only the block data and the file’s logical length, i.e. the ItemSpace content. Either the encrypted or plaintext blocks may be hashed. The encrypted block hash is what is actually signed. Different initial databases will always have different hashes, but a given database will continue to have the same hash as long as its content is not modified. The encrypted hash is very fast, and does not require the password. The unencrypted hash requires the password and is slower, but it checks the HMAC on every block. Both hashes will detect file truncation.
The hash algorithm is not guaranteed to remain unchanged. If, for example, SHA256 is compromised, or for other reasons, future InfinityDB Encrypted versions will include more algorithms.
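As a rough illustration of a fingerprint that depends only on block data and logical length, the following sketch feeds each block and then the length into a single SHA-256 digest. The class name `FingerprintSketch`, the in-memory list of blocks, and the exact order of the inputs are assumptions made for the example; the actual InfinityDB hash construction is not documented here.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch: one SHA-256 digest over all blocks plus the logical length.
final class FingerprintSketch {
    static byte[] fingerprint(List<byte[]> blocks, long logicalLength)
            throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        for (byte[] block : blocks) {
            sha256.update(block); // header bytes are deliberately not included
        }
        // Including the length makes a truncated file hash differently
        sha256.update(ByteBuffer.allocate(Long.BYTES).putLong(logicalLength).array());
        return sha256.digest();
    }
}
```

Storing the returned 32 bytes separately and recomputing them later is enough to notice a changed or missing block.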
The password technology used is the well-established standard ‘key wrapping’, in which the PBE password is converted to a ‘key-encryption key’ or ‘KEK’ internally. The AES-128-based KEK encrypts the final data encryption and HMAC keys producing a ‘wrapped’ key stored in the file. The actual data encryption and HMAC keys are permanent long secure random numbers but do not occur anywhere in the file. They are derived when needed from the PBE password and some data in the file: a 32-byte random salt plus the wrapped key. The PBE password is only kept in memory momentarily before being zeroed after it is used to determine the data encryption key and HMAC key, which then remain in memory while the file is open. Java cannot guarantee that data in memory will not be copied by the garbage collector, so the PBE password should be zeroed as quickly as possible. In principle, the longer-lived encryption and HMAC keys are at risk of being exposed by a memory dump as well, as with almost any crypto system, so such dumps must be kept secret or zeroed. These internal ‘hidden’ keys can be destroyed by destroySensitiveData() and destroyAllPrivateKeys().
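The key wrapping flow can be sketched with standard JCA algorithms. The sketch assumes PBKDF2 for turning the password into the KEK and the JCA "AESWrap" cipher (RFC 3394) for wrapping; the source does not say which derivation function or iteration count InfinityDB uses, so those, the class name `KeyWrapSketch`, and the single wrapped key (instead of separate encryption and HMAC keys) are simplifying assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical sketch: password -> KEK -> wrapped data key, with instant password change.
final class KeyWrapSketch {
    private static final int ITERATIONS = 100_000; // assumed work factor, not InfinityDB's

    /** Derives a 128-bit key-encryption key from password and salt, then zeroes the password. */
    static SecretKey deriveKek(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 128);
        try {
            byte[] kek = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            return new SecretKeySpec(kek, "AES");
        } finally {
            spec.clearPassword();
            Arrays.fill(password, '\0'); // wipe the caller's copy as soon as it has been used
        }
    }

    /** Wraps the long-lived data key; only this wrapped form would be stored in the header. */
    static byte[] wrap(SecretKey kek, SecretKey dataKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, kek);
        return cipher.wrap(dataKey);
    }

    /** Recovers the data key; fails with an exception if the KEK (password) is wrong. */
    static SecretKey unwrap(SecretKey kek, byte[] wrapped) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    /** Password change: only the small wrapped key is rewritten, no data block is re-encrypted. */
    static byte[] changePassword(char[] oldPassword, char[] newPassword, byte[] salt, byte[] wrapped)
            throws GeneralSecurityException {
        SecretKey dataKey = unwrap(deriveKek(oldPassword, salt), wrapped);
        return wrap(deriveKek(newPassword, salt), dataKey);
    }
}
```

Because the data key itself never changes, re-keying cost is independent of database size. Crypto-shredding corresponds to calling `changePassword` with a random new password that is then discarded.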
A database file can be signed in order to ensure that its entire contents are as expected and not corrupted. Each time the file is signed, a hash of the database content is computed, then the private key matching a certificate or bare public key is used to compute a signature over that hash, and the signature is written into the file in a header. Later, signature verification uses the certificate or public key again to verify the header data, and then the header hash is compared with a re-computed hash.
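The sign-then-verify sequence can be sketched with `java.security.Signature`. The class `FileSigningSketch`, its method names, and the use of a byte array to stand in for the encrypted blocks are hypothetical; the example uses SHA256withRSA as one of the selectable algorithms and does not reproduce the InfinityDB header format or its `sign()` API.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical sketch: sign a content hash, then verify both the signature and the hash.
final class FileSigningSketch {
    /** Stand-in for the full-file hash over the encrypted blocks. */
    static byte[] contentHash(byte[] encryptedBlocks) throws GeneralSecurityException {
        return MessageDigest.getInstance("SHA-256").digest(encryptedBlocks);
    }

    /** Signing: the private key produces a signature over the content hash. */
    static byte[] sign(PrivateKey privateKey, byte[] hash) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(hash);
        return signer.sign();
    }

    /**
     * Verification in two steps: the stored hash must carry a valid signature,
     * and the stored hash must equal a freshly recomputed hash of the blocks.
     */
    static boolean verify(PublicKey publicKey, byte[] headerHash, byte[] headerSignature,
                          byte[] currentBlocks) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(headerHash);
        boolean headerAuthentic = verifier.verify(headerSignature);
        return headerAuthentic && MessageDigest.isEqual(headerHash, contentHash(currentBlocks));
    }
}
```

Only the public key is needed for `verify`, which matches the point that verification does not require the encryption password.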
Multiple certificate chains and bare public keys can co-exist in the file header in ‘SignatureInfo’s. Certificate organization features like duplicate certificate path elimination, trust chain signing sequence checking, and chain sorting on signing sequence are provided. Each SignatureInfo also designates a signing hashing algorithm that further identifies it, such as SHA256 or MD5.
Signing and signature verification hash the full set of encrypted data blocks at high speed. SignatureInfos can be in either signed or unsigned state, and the state persists until block data actually changes, so multiple signers do not need to have the file open at once. Signing requires only providing the signer’s private key or keys and then invoking sign(), and the private key or keys are automatically matched to the public keys of certificates that become signed. If multiple private keys are provided, the signing process shares a single hash computation. Signature verification does not require the encryption password. Signature verification by default requires that all certificates are signed.
Signature Certificate Validation Strategies
Certificate paths in the SignatureInfos can be validated based on a set of trusted certificates. External storage or availability of signing certificate paths after they are put in the file is not necessary: only private keys for signing are needed, and for signature verification, only trusted public keys or trusted intermediate or root certificates are needed.
Signature verification by default requires that all certificates are signed. However, the full set of SignatureInfos or the signed SignatureInfos can be retrieved from the file and enumerated by client code. In the future, verification can use client-implemented strategies like ‘any signature based on this public key is enough’, ‘any N signatures are enough’, or ‘any validated signature certificates having certain distinguished name patterns are enough’. Signature certificates can be validated without the password.
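A client-side rule of the ‘any N signatures’ kind might look like the following sketch. Everything here is hypothetical: `ThresholdPolicySketch`, the `SignedEntry` holder, and the use of ECDSA are invented for illustration and do not correspond to the SignatureInfo API, which is not shown in this text.

```java
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an "at least N trusted signers" verification rule.
final class ThresholdPolicySketch {
    /** Illustrative stand-in for one signature entry read from a file. */
    static final class SignedEntry {
        final PublicKey publicKey;
        final byte[] signature;

        SignedEntry(PublicKey publicKey, byte[] signature) {
            this.publicKey = publicKey;
            this.signature = signature;
        }
    }

    /** True if at least n distinct trusted keys have valid signatures over the content hash. */
    static boolean atLeastN(int n, Set<PublicKey> trusted, List<SignedEntry> entries,
                            byte[] contentHash) throws GeneralSecurityException {
        Set<PublicKey> validSigners = new HashSet<>();
        for (SignedEntry entry : entries) {
            if (!trusted.contains(entry.publicKey)) {
                continue; // signatures from unknown keys do not count
            }
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(entry.publicKey);
            verifier.update(contentHash);
            try {
                if (verifier.verify(entry.signature)) {
                    validSigners.add(entry.publicKey); // each key counts once
                }
            } catch (SignatureException malformed) {
                // a malformed signature is treated the same as an invalid one
            }
        }
        return validSigners.size() >= n;
    }
}
```

Other rules, such as matching distinguished name patterns, would follow the same shape: enumerate the entries, filter, and decide.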
Other databases have tried to retrofit security, but there are so many remaining issues that the security is still weak or non-existent. The big databases like Oracle or SQL Server can be assumed to have succeeded in hardening the internals. Still, applications, users, and database administrators have to pay considerable attention in order to provide credible protection. It is necessary to pay attention to text log files, transaction logs, backups, slaves, temporary files like sort areas, index content, and any other kind of dangling dumps or copies and so on. InfinityDB Encrypted has a trivial single-file architecture that avoids these attack surfaces.
In the ‘Database-as-a-Service’ model, one tries to ‘outsource’ the DBMS from the application so that the DBMS is untrusted. The DBMS stores only externally encrypted data, but the encryption techniques are special, so that for example, order-preserving encryption is used in some cases. There are many attacks possible and more are being discovered. There are ways to reconstruct the data by finding subtle leaks based on even just the size of the results of a series of random meaningless queries. The resulting DBMS will be totally uninterpretable, if it is to be credibly secure, so it will be difficult to deal with in day-to-day use.
Rewriting applications to handle security ad-hoc at the application level can be very complex and delicate and can impose burdens on everyone, while not being credibly secure.
In any case, security can be improved by adding or switching to InfinityDB Encrypted for critical data.
- Enveloping. This allows the database to be accessible only to selected accessors i.e. database recipients based on public/private key pairs. A set of Envelope certificates are kept in the file, and for each, there is a copy of the PBE password in the file encrypted with the Envelope certificate’s public key. The PBE password can be decrypted with the private key for an envelope certificate. So, instead of keeping track of potentially many PBE passwords, one per file, a given private key can be used to access a whole set of files that have the proper envelope certificate. Since the PBE passwords are no longer necessarily exposed externally, they can be very long and strong, even non-user readable. Because a given file can have multiple envelope certificates, access control is flexible and tight, yet simple. The private keys can be kept tightly secure within each recipient, while the PBE password would tend to be more widely available, since it is a symmetric secret key.
- Multi-threaded hashing of encrypted data blocks, also used for signing and signature verification, to reach very high speeds.
- Key Management System integration will allow separate storage and manipulation of encryption keys, so that they can participate in the ‘key life-cycle’. This allows keys to be created, given designated lifetimes, invalidated, archived, and deleted. Central control of keys is possible for enterprise use. The KMIP ‘Key Management Interoperability Protocol’ is being investigated for this. Strong storage using HSM ‘Hardware Security Modules’, possibly on-premises, becomes possible this way.
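The enveloping idea in the list above, a copy of the PBE password encrypted under each recipient's public key, can be illustrated with RSA-OAEP from the standard JCA. This is a sketch of the general technique only: the class `EnvelopeSketch`, the choice of RSA-OAEP, and passing the password as raw bytes are assumptions, since the feature is described as planned and its format is not specified.

```java
import javax.crypto.Cipher;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;

// Hypothetical sketch: one encrypted copy of the password per envelope certificate.
final class EnvelopeSketch {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Encrypts the password bytes so that only the holder of the matching private key can read them. */
    static byte[] seal(PublicKey recipient, byte[] passwordBytes) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(TRANSFORMATION);
        rsa.init(Cipher.ENCRYPT_MODE, recipient);
        return rsa.doFinal(passwordBytes); // input must fit in one RSA block (about 190 bytes at 2048 bits)
    }

    /** Recovers the password bytes; throws if the private key does not match the envelope. */
    static byte[] open(PrivateKey recipientKey, byte[] sealed) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(TRANSFORMATION);
        rsa.init(Cipher.DECRYPT_MODE, recipientKey);
        return rsa.doFinal(sealed);
    }
}
```

Sealing the same password once per recipient gives each of them independent access, and removing one sealed copy revokes that recipient without touching the others.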
The implementation uses an underlying ‘shim’ called EncryptedRandomAccessFile that provides its overlying InfinityDB database with a logical GeneralizedRandomAccessFile, while physically storing the data as encrypted blocks in a normal RandomAccessFile. The InfinityDB-specific GeneralizedRandomAccessFile is necessary instead of a subclass of RandomAccessFile, because the latter cannot be subclassed (this is considered an original mistake in Java – InputStream and OutputStream are OK though).
The EncryptedRandomAccessFile also contains a ‘header’ before the encrypted blocks that describes the file state, and which contains structure for future extensions, signature information and eventually information for ‘enveloping’. The header itself is variable-length but has a limited fixed space at the front of a particular file – if an attempt is made to write too much data into that space, an IOException is thrown, but the file is still usable in its previous state. Currently the size is fixed at 100K but later it will be settable on create(). This should be plenty. The header can change without the hash being changed.
InfinityDB Encrypted has been tested with the Sun security provider as well as Bouncy Castle. Bouncy Castle is the main alternative to Sun, and it adds many features, such as a complete certificate generation capability.
For info and suggestions please email firstname.lastname@example.org.