An InfinityDB Encrypted Java NoSQL Database is 100% encrypted and 100% authenticated 100% of the time, with hashing, password changing and signing with multiple certificates, while retaining up to 10x compression. It is otherwise identical to the InfinityDB Embedded Database.
Transparent Data Encryption for Data at Rest
InfinityDB Encrypted uses ‘Transparent Data Encryption’ or TDE for data at rest to minimize the impact of security. It is necessary only to provide a password on file creation and opening, and all the rest is handled internally. This simple step provides not only encryption to prevent data disclosure, but also client authentication and integrity protection. Integrity protection detects malicious or inadvertent modification of the bytes of the file from outside, providing an Exception on reading bad data. Because InfinityDB Database uses a single file for all data in a given database, these features cover all data at once reliably, even while in use. The encrypted files can be used as backups with no encryption step, and with no decryption on restore. There is no point in time when any unencrypted data reaches storage, and the file data is indistinguishable from random bytes. You can combine this TDE with the other provided security measures to further increase safety of all data.
- All stored data is 100% encrypted 100% of the time according to a password using standard AES-128 or AES-256
- All stored data is 100% integrity checked and authenticated 100% of the time according to the password using standard HMAC-SHA256
- Passwords may be changed easily and instantly at any time for ‘key rotation’ given the previous password
- The password is a ‘Key Encryption Key’ and is not stored in the file and cannot practically be reconstructed
- All secure algorithms and techniques are well-established and standards-based
- Database size is not practically limited by encryption, because encryption is on-the-fly, not batched
- Per-block Randomization – each data block contains data, HMAC, random HMAC salt, and independent random encryption ‘initialization vector’ and block address for maximum strength. Each write re-randomizes.
- Compression up to 10x is preserved, as in the Embedded version
- Encryption levels may be selected – strong or regular for export compliance. Future versions will provide more
- File content is fully dynamic – encrypted data can be updated continuously
- Fast full-file hashing or ‘fingerprinting’ of either encrypted or unencrypted blocks provides strong content and integrity checking of 100% of the data
- Optional signing ensures a guaranteed overall file content; backups are therefore authenticated and safe from external modification
- Signing algorithms are selectable including SHA256, SHA3 or MD5 or any other with RSA, DSA, or ECDSA
- Multiple X509 certificates and their trust chains or bare public keys are stored in the file and organized automatically
- Partial signing by different processes will finally reach fully signed state; not all private keys are needed at once
Compatible with InfinityDB Embedded
- The API is a superset of the InfinityDB Embedded Database API
- Reaches about 50% of embedded mode performance for disk I/O, but full speed in the memory cache
- Compression is preserved at 1 to 10x or more
- Unencrypted files are compatible; they can still be opened for read or write and stay unencrypted
These features of InfinityDB Encrypted provide vital security for the entire set of Items in the database, which means all of the database content is protected. InfinityDB Encrypted data in storage is never in plaintext unprotected form at rest or in motion, but is accessed and modified directly in place in the single encrypted InfinityDB file. Hence, an InfinityDB file can also be used as a secure ‘message’ for communication, or as a backup. There is no ‘metadata’ like internal filenames, database names, schemas, user names or ‘personally identifiable information’ to be leaked.
Avoids Common Leaks
Most DBMS and other systems expose data at least while it is being used, so it ‘leaks’ in the clear into uncontrollable locations. Even if data is only momentarily decrypted in storage, leaks are possible, such as at least these:
- Disk or RAID array block caches hold raw data, and they will often be battery-backed up for recovery after hardware failure;
- Disks are often removed from hosts or RAID sets and not destroyed or else are removed for safety before they actually fail;
- Entire host systems may be decommissioned or reconfigured or re-purposed without wiping or destroying the storage;
- The file system cache in memory contains clear text;
- Cluster nodes may store and communicate clear text (MongoDB communicates clear text);
- Map/reduce processing, such as by Hadoop, normally only works with clear text;
- Cloud virtual servers cannot credibly ensure security of data and are not FIPS 140-2 compliant;
- Deleted files remain on the underlying storage;
- ‘Shredding’ file data by overwriting does not work, because a file system may copy data internally – for example zfs copies every block that is written;
- ‘Journalling’ file systems like ext3, ext4, jfs, and xfs copy data or metadata to a ‘log’ or ‘journal’ area;
- Defragmentation leaves copies on storage that are not apparent in the file system;
- SSDs or Flash storage leave data in un-erased blocks and ‘overprovisioned’ space;
- Archives such as tar or zip may continue to contain clear text copies of data thought to have been erased;
- Many DBMS’ keep sort areas, indexes, logs, rollback segments, and even base tables in the clear in storage; and
- Abrupt shutdown of systems, applications or processes leaves any temporary clear text files behind.
InfinityDB Encrypted avoids these leaks because during the entire lifetime of the database, data never exists unencrypted in storage.
Full-Disk Encryption or encrypting file systems can help, but they have no enforcement mechanism to prevent data being copied out into the clear either at rest or in motion, and there is no means for verifying authenticity or detecting data corruption. Individual users cannot be given fine-grain selective access to data. With FDE there is a single crude security domain, and if the encryption key is disclosed, all data is compromised that may have ever been on any disk that may have used that key, even decommissioned or trashed disks if not properly crypto-shredded or destroyed. Encrypted disks are not routinely re-keyed or given unique keys due to operational awkwardness. FDE may provide a false sense of security.
Unlimited Practical Data Size
InfinityDB Encrypted has no slow batch encryption/decryption step that limits the practical database size or causes long service interruptions.
Encryption prevents unauthorized reading of the data based on a secret ‘Password-Based-Encryption’ or ‘PBE’ password. This password can be short or long to provide ease of memorization or strong security. The industry standard encryption is used – AES-128 or AES-256. The password is not stored in the file and it cannot practically be reconstructed from the file data.
Authentication and Integrity Protection
The password is also used to protect the integrity of all of the content by means of an ‘HMAC’ hash, which combines the password with a regular secure hash function to identify accidental or intentional corruption of the data every time the relevant portion of the file is read. Because the HMAC is dependent on the PBE password, it cannot be calculated by someone not having the password, preventing impersonation by an attacker. Without the HMAC, encrypted data could be modified externally, and the decrypted data would change in some unpredictable way, with unpredictable results. The HMAC is calculated on each write of a data block and on reading it back it is verified to have remained unchanged. The industry standard method is used with HMAC-SHA256.
Hashing for ‘Fingerprinting’
A fingerprint can be calculated quickly using a hash algorithm with or without the PBE password to determine quickly that the contents of the database are as expected. This is like a standard file hash but it avoids being dependent on the part of the encrypted file that does not encode the ItemSpace, which is the set of encrypted blocks. If the hash is stored separately, it is easy to recompute it and check that it has not changed. Either the encrypted or the plaintext data blocks can be hashed. Hashing the plaintext blocks is slower and it requires the PBE password, but it verifies the HMAC of every block as a side-effect.
Instantly Changeable Passwords
Passwords can be changed even after the file is created. Unlike most file encryption systems, InfinityDB uses a standard two-step ‘AES key wrapping’ mechanism to transparently convert the client-supplied PBE password into the internal data encryption and HMAC keys. The PBE password or the internal keys cannot be reconstructed from file data. With this feature, one can better isolate and secure production, backup, test, or transmitted databases. Passwords can be changed regularly, for example for ‘key rotation’ or ‘re-keying’, which is required in many environments. Changing the PBE password to a large random number and ‘forgetting’ it is a way to effectively ‘delete’ or ‘shred’ the database completely and permanently – this is called ‘crypto-shredding’. The key wrapping mechanism keeps an instantly alterable ‘wrapped encryption key’ in the file header that can only be decrypted with the PBE password, which functions as a ‘key encryption key’.
Signing can be used to verify that the entire database has a good, trusted state. For example, it can be used with backups, so that a database to be restored by being copied into the active system was not corrupted in any way, and was last modified by a trusted client. The trusted client not only had access to the PBE password, thereby proving that they were authorized, but also that client left the database in a state that they wanted to preserve, not an intermediate, experimental, incomplete, or suspected incorrect state. A third-party attacker cannot corrupt the signed database either in a random shotgun attack or in a ‘backup attack’ by using blocks read from pairs of backup databases, or even by obtaining the PBE password and altering the database normally. Inadvertent non-malicious file modifications are detected as well. The signing or signature verification processes read the entire database, checking every byte.
Signatures also avoid the weakness of PBE passwords in that the PBE passwords must be provided to all parties that need read or write access to the database, so they are distributed widely. Instead, private/public key pairs can be used for ‘asymmetrical’ cryptography to make key handling far safer. This is a standard route that uses the X509 certificates also used in SSL/TLS (https) security. The private keys are kept safe by individual participants in their own ways. Each signing participant has a private/public key pair, and their public keys are broadcast directly or indirectly to all participants who wish to ‘verify’ the data to determine whether the data is trustworthy. The public keys may either be used alone (‘bare’ public keys) or they can be vouched for with certificates, and a chain of such certificates signing each other can lead to a ‘root’ certificate that is commonly available and trusted by everyone. With such a ‘trust chain’ it is not necessary for trustors to have direct access to any public keys at all, and the trust rules can depend on the set of certificates in the signature in client-implemented custom ways. There can be multiple signatories, with their certificates persisted in the file itself, some signed, some not. This is based on standard technology, and can provide vital, flexible security.
Here are the implementation features.
Encryption and Integrity Checking
Each underlying file block is separately encrypted with a secure random initialization vector using AES-128 or AES-256. Each block is independently integrity checked with HMAC-SHA256 that covers all other block data. The encryption and HMAC keys are independent and securely randomly generated. The block numbers are encrypted and authenticated per-block as well. Every write of a block changes its stored data completely and seemingly-randomly, even for partial block changes or identical block data. Corruption or truncation of a file is immediately detected on read of a corrupted block.
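The per-block scheme described above can be sketched with the standard Java cryptography API. This is only an illustration under stated assumptions, not the InfinityDB Encrypted block format: the class name `BlockSealSketch`, the byte layout (IV, then ciphertext, then HMAC tag), and the choice of AES in CBC mode are all hypothetical, and the random HMAC salt mentioned in the feature list is omitted for brevity.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative only: assumed layout is IV | AES-CBC(blockNumber || data) | HMAC(IV || ciphertext).
class BlockSealSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 16;
    private static final int TAG_LEN = 32;

    private static byte[] tag(byte[] macKey, byte[] iv, byte[] ciphertext)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
        mac.update(iv);
        return mac.doFinal(ciphertext);
    }

    public static byte[] seal(byte[] encKey, byte[] macKey, long blockNumber, byte[] data)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv); // fresh IV on every write, so identical data re-encrypts differently
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
        // The block number is encrypted together with the data.
        byte[] plain = ByteBuffer.allocate(Long.BYTES + data.length)
                .putLong(blockNumber).put(data).array();
        byte[] ciphertext = cipher.doFinal(plain);
        byte[] tag = tag(macKey, iv, ciphertext);
        return ByteBuffer.allocate(IV_LEN + ciphertext.length + TAG_LEN)
                .put(iv).put(ciphertext).put(tag).array();
    }

    public static byte[] open(byte[] encKey, byte[] macKey, long expectedBlockNumber, byte[] sealed)
            throws GeneralSecurityException {
        if (sealed.length < IV_LEN + TAG_LEN) {
            throw new GeneralSecurityException("block truncated");
        }
        byte[] iv = Arrays.copyOfRange(sealed, 0, IV_LEN);
        byte[] ciphertext = Arrays.copyOfRange(sealed, IV_LEN, sealed.length - TAG_LEN);
        byte[] stored = Arrays.copyOfRange(sealed, sealed.length - TAG_LEN, sealed.length);
        // Verify the HMAC (constant-time compare) before decrypting anything.
        if (!MessageDigest.isEqual(stored, tag(macKey, iv, ciphertext))) {
            throw new GeneralSecurityException("HMAC mismatch: block was modified");
        }
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
        ByteBuffer plain = ByteBuffer.wrap(cipher.doFinal(ciphertext));
        if (plain.getLong() != expectedBlockNumber) {
            throw new GeneralSecurityException("block is not at its original address");
        }
        byte[] data = new byte[plain.remaining()];
        plain.get(data);
        return data;
    }

    public static void main(String[] args) throws Exception {
        byte[] encKey = new byte[16];
        byte[] macKey = new byte[32];
        RANDOM.nextBytes(encKey);
        RANDOM.nextBytes(macKey);
        byte[] data = "same data".getBytes(StandardCharsets.UTF_8);
        byte[] first = seal(encKey, macKey, 42L, data);
        byte[] second = seal(encKey, macKey, 42L, data);
        System.out.println("rewrites differ: " + !Arrays.equals(first, second));
        System.out.println("round trip ok: " + Arrays.equals(data, open(encKey, macKey, 42L, first)));
    }
}
```

Because the tag is checked before decryption and the block number travels inside the ciphertext, this sketch rejects both a modified block and a valid block copied to a different address.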
Hashing for ‘Fingerprinting’
A global SHA256-based hash can be calculated quickly on demand, dependent on only the block data and the file’s logical length, i.e. the ItemSpace content. Either the encrypted or plaintext blocks may be hashed. The encrypted block hash is what is actually signed. Different initial databases will always have different hashes, but a given database will continue to have the same hash as long as its content is not modified. The encrypted hash is very fast, and does not require the password. The unencrypted hash requires the password and is slower, but it checks the HMAC of every block. Both hashes will detect file truncation.
The hash algorithm is not guaranteed to remain unchanged. If, for example, SHA256 is compromised, or for other reasons, future InfinityDB Encrypted versions will include more algorithms.
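A fingerprint of this kind can be sketched as a digest over the logical length followed by each block in order. This is an assumption-laden illustration, not the product's actual hash input format: the class `FingerprintSketch`, the method name, and the encoding of the length as eight bytes are all hypothetical. The algorithm name is a parameter so that a different digest could be substituted later.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hashes the logical length, then every block in order.
class FingerprintSketch {
    public static byte[] fingerprint(String algorithm, long logicalLength, List<byte[]> blocks)
            throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        // Including the length means a truncated file cannot produce the same fingerprint.
        digest.update(ByteBuffer.allocate(Long.BYTES).putLong(logicalLength).array());
        for (byte[] block : blocks) {
            digest.update(block);
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] block0 = new byte[1024];
        byte[] block1 = new byte[1024];
        Arrays.fill(block1, (byte) 1);
        byte[] full = fingerprint("SHA-256", 2048L, Arrays.asList(block0, block1));
        byte[] truncated = fingerprint("SHA-256", 1024L, Arrays.asList(block0));
        System.out.println("truncation changes fingerprint: " + !Arrays.equals(full, truncated));
    }
}
```

Storing the result separately and recomputing it later gives the check described above; hashing the encrypted blocks needs no password, while hashing plaintext blocks would first require decrypting each one.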
The password technology used is the well-established standard ‘key wrapping’, in which the PBE password is converted to a ‘key-encryption key’ or ‘KEK’ internally. The AES-128-based KEK encrypts the final data encryption and HMAC keys producing a ‘wrapped’ key stored in the file. The actual data encryption and HMAC keys are large random binary numbers that do not occur anywhere in the file. They are derived when needed from the PBE password and some data in the file header: a 32-byte random salt plus the wrapped key. The PBE, data encryption, and HMAC keys cannot practically be derived from the raw file content. The PBE password is not used for this directly, but instead is processed first using a standard hash iteration step. The iterations repeatedly apply a standard PBE hash enough times that it takes about 1msec to 10msec to finish, so that brute-force attacks are slowed down and become impractical.
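The key wrapping flow can be sketched with standard Java APIs. The following is a minimal illustration under assumptions, not the InfinityDB Encrypted implementation: the class `KeyWrapSketch`, the use of PBKDF2 with HMAC-SHA256 as the iterated hash, the iteration count, and wrapping a single 256-bit key are all choices made for the example. It shows why a password change can be instant: only the small wrapped key is rewritten, while the data key, and therefore every encrypted block, stays the same.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative only: password -> iterated hash -> 128-bit KEK -> AES key wrap of the data key.
class KeyWrapSketch {
    static final int ITERATIONS = 10_000; // assumed; tune so derivation takes a few milliseconds

    public static SecretKey deriveKek(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 128);
        try {
            byte[] kek = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            return new SecretKeySpec(kek, "AES");
        } finally {
            spec.clearPassword(); // clears the spec's internal copy of the password
        }
    }

    public static byte[] wrap(SecretKey kek, SecretKey dataKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, kek);
        return cipher.wrap(dataKey);
    }

    public static SecretKey unwrap(SecretKey kek, byte[] wrapped) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, kek);
        // Fails with an exception if the KEK (and so the password) is wrong.
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    // Re-wraps the unchanged data key under a KEK derived from the new password.
    public static byte[] changePassword(char[] oldPassword, byte[] oldSalt, byte[] wrapped,
                                        char[] newPassword, byte[] newSalt)
            throws GeneralSecurityException {
        SecretKey dataKey = unwrap(deriveKek(oldPassword, oldSalt), wrapped);
        return wrap(deriveKek(newPassword, newSalt), dataKey);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] rawKey = new byte[32];
        byte[] salt1 = new byte[32];
        byte[] salt2 = new byte[32];
        random.nextBytes(rawKey);
        random.nextBytes(salt1);
        random.nextBytes(salt2);
        SecretKey dataKey = new SecretKeySpec(rawKey, "AES");
        byte[] wrapped = wrap(deriveKek("old".toCharArray(), salt1), dataKey);
        byte[] rewrapped = changePassword("old".toCharArray(), salt1, wrapped, "new".toCharArray(), salt2);
        SecretKey recovered = unwrap(deriveKek("new".toCharArray(), salt2), rewrapped);
        System.out.println("data key unchanged: " + Arrays.equals(rawKey, recovered.getEncoded()));
    }
}
```

In this sketch, replacing the wrapped key with one derived from a random, discarded password leaves the data key unrecoverable, which is the crypto-shredding idea mentioned earlier.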
Note that the owner of any process such as a running Java Virtual Machine can obtain a memory dump of it that can expose the PBE key and the derived data encryption and HMAC keys. This is not allowed in production systems of course, which always have a dedicated private user, or at least a trusted user. In a virtual private server in the cloud, the process memory cannot be proven to be securely private. Furthermore, the virtual memory swap space contains copies of process memory pages. All secure processes are subject to this risk. To minimize the risk, the PBE password, which is passed as a char array, may be zeroized by client code after opening. The internal data encryption and HMAC keys stay in memory while the file is open. However, the JVM may copy any data in memory during garbage collection or at any other time, so keys cannot be guaranteed to be erased from the process memory. Hence, the changeability of the PBE key provided by InfinityDB is essential, allowing routine rotation of the PBE password. Compromise of the internal data encryption and HMAC keys is less problematic, because using them would require reverse-engineering of the InfinityDB code and access to the database file. The data encryption and HMAC keys are large random binary numbers generated securely at file creation time, therefore they may be harder to locate in a memory dump.
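Zeroizing the password array in client code can be sketched as follows. The helper class `PasswordScopeSketch` and its `opener` callback are hypothetical; the callback stands in for whatever call actually opens the database with the password, which is not shown here.

```java
import java.util.Arrays;
import java.util.function.Function;

// Illustrative only: guarantees the caller's char[] is overwritten once it has been used.
class PasswordScopeSketch {
    public static <T> T useAndZeroize(char[] password, Function<char[], T> opener) {
        try {
            return opener.apply(password);
        } finally {
            // Runs even if the opener throws, so the array never outlives the call in clear form.
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] password = "example".toCharArray();
        int length = useAndZeroize(password, p -> p.length); // stand-in for opening a database
        System.out.println("used " + length + " chars, first char now zero: " + (password[0] == '\0'));
    }
}
```

As the paragraph above notes, this only clears the one array the client controls; copies the JVM may have made elsewhere are out of reach.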
A database file can be signed in order to ensure that its entire contents are as expected and not corrupted. Each time the file is signed, a hash of the database content is computed, then a certificate or bare public key along with a private key is used to compute a signature over that hash and then the signature is written into the file in a header. Later, signature verification uses the certificate or public key again to verify the header data, and then the header hash is compared with a re-computed hash.
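The sign-then-verify sequence can be sketched with the standard `java.security.Signature` API. This is an illustration under assumptions rather than the product's code: the class `SignSketch`, the `SignedHeader` holder standing in for the file header, and the choice of SHA256 with ECDSA are hypothetical, and a plain byte array stands in for the database content.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Illustrative only: hash the content, sign the hash, keep both as "header" data.
class SignSketch {
    public static final class SignedHeader {
        public final byte[] contentHash;
        public final byte[] signature;

        public SignedHeader(byte[] contentHash, byte[] signature) {
            this.contentHash = contentHash;
            this.signature = signature;
        }
    }

    public static byte[] hash(byte[] content) throws GeneralSecurityException {
        return MessageDigest.getInstance("SHA-256").digest(content);
    }

    public static SignedHeader sign(PrivateKey key, byte[] content) throws GeneralSecurityException {
        byte[] contentHash = hash(content);
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(key);
        signer.update(contentHash);
        return new SignedHeader(contentHash, signer.sign());
    }

    public static boolean verify(PublicKey key, SignedHeader header, byte[] currentContent)
            throws GeneralSecurityException {
        // Step 1: the signature must cover the hash stored in the header.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(key);
        verifier.update(header.contentHash);
        if (!verifier.verify(header.signature)) {
            return false;
        }
        // Step 2: the stored hash must match a freshly recomputed hash of the content.
        return MessageDigest.isEqual(header.contentHash, hash(currentContent));
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(256);
        KeyPair pair = generator.generateKeyPair();
        byte[] content = "database blocks".getBytes(StandardCharsets.UTF_8);
        SignedHeader header = sign(pair.getPrivate(), content);
        System.out.println("verifies: " + verify(pair.getPublic(), header, content));
    }
}
```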
Multiple certificate chains and bare public keys can co-exist in the file header in ‘SignatureInfo’s. Certificate organization features like duplicate certificate path elimination, trust chain signing sequence checking, and chain sorting on signing sequence are provided. Each SignatureInfo also designates a signing hashing algorithm that further identifies it, such as SHA256 or MD5.
The signing and signature verification hash the full set of encrypted data blocks at high speed. SignatureInfos can be in either signed or unsigned state, and the state persists until block data actually changes, so multiple signers do not need to have the file open at once. Signing requires only providing the signer’s private key or keys and then invoking sign(), and the private key or keys are automatically matched to the public keys of certificates that become signed. If multiple private keys are provided, the signing process shares a single hash computation. Signature verification does not require the encryption password. Signature verification by default requires that all certificates are signed.
Signature Certificate Validation Strategies
Certificate paths in the SignatureInfos can be validated based on a set of trusted certificates. External storage or availability of signing certificate paths after they are put in the file is not necessary: only private keys for signing are needed, and for signature verification, only trusted public keys or trusted intermediate or root certificates are needed.
Signature verification by default requires that all certificates are signed. However, the full set of SignatureInfos or the signed SignatureInfos can be retrieved from the file and enumerated by client code. In the future, verification can use client-implemented strategies like ‘any signature based on this public key is enough’ or ‘any N signatures are enough’, or ‘any validated signature certificates having certain distinguished name patterns are enough’. Signature certificates can be validated without the password.
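A client-side verification strategy such as 'all signed' or 'any N signatures' could look roughly like the sketch below. Everything here is hypothetical: the class `VerifyPolicySketch`, the map from public key to signature bytes (with `null` meaning an unsigned entry), and the use of SHA256 with ECDSA are stand-ins, and the sketch does not use or imply any InfinityDB API for enumerating SignatureInfos.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: counts valid signatures over one shared content hash and applies a policy.
class VerifyPolicySketch {
    public static byte[] sign(PrivateKey key, byte[] contentHash) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(key);
        signer.update(contentHash);
        return signer.sign();
    }

    public static int countValid(byte[] contentHash, Map<PublicKey, byte[]> signatures)
            throws GeneralSecurityException {
        int valid = 0;
        for (Map.Entry<PublicKey, byte[]> entry : signatures.entrySet()) {
            if (entry.getValue() == null) {
                continue; // entry exists but has not been signed yet
            }
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(entry.getKey());
            verifier.update(contentHash);
            if (verifier.verify(entry.getValue())) {
                valid++;
            }
        }
        return valid;
    }

    // Mirrors the default described above: every entry must carry a valid signature.
    public static boolean allSigned(byte[] contentHash, Map<PublicKey, byte[]> signatures)
            throws GeneralSecurityException {
        return !signatures.isEmpty() && countValid(contentHash, signatures) == signatures.size();
    }

    // A looser policy: any n valid signatures are enough.
    public static boolean atLeast(int n, byte[] contentHash, Map<PublicKey, byte[]> signatures)
            throws GeneralSecurityException {
        return countValid(contentHash, signatures) >= n;
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(256);
        KeyPair first = generator.generateKeyPair();
        KeyPair second = generator.generateKeyPair();
        byte[] contentHash = new byte[32]; // placeholder for a real content hash
        Map<PublicKey, byte[]> signatures = new LinkedHashMap<>();
        signatures.put(first.getPublic(), sign(first.getPrivate(), contentHash));
        signatures.put(second.getPublic(), null); // second signer has not signed yet
        System.out.println("all signed: " + allSigned(contentHash, signatures));
        System.out.println("at least one: " + atLeast(1, contentHash, signatures));
    }
}
```

Because each signature covers the same hash, signers can add their signatures at different times and the hash only needs to be computed once per signing pass.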
Other Databases have tried to retro-fit security, but there are so many remaining issues that the security is still weak or non-existent. The big databases like ORACLE or SQL Server can be assumed to have succeeded in hardening the internals. Still, applications, users, and database administrators have to pay considerable attention in order to provide credible protection. It is necessary to pay attention to text log files, transaction logs, backups, slaves, temporary files like sort areas, index content, and any other kind of dangling dumps or copies and so on. InfinityDB Encrypted has a trivial single-file architecture that avoids these attack surfaces.
In the ‘Database-as-a-Service’ model, one tries to ‘outsource’ the DBMS from the application so that the DBMS is untrusted. The DBMS stores only externally encrypted data, but the encryption techniques are special, so that for example, order-preserving encryption is used in some cases. There are many attacks possible and more are being discovered. There are ways to reconstruct the data by finding subtle leaks based on even just the size of the results of a series of random meaningless queries. The resulting DBMS will be totally uninterpretable, if it is to be credibly secure, so it will be difficult to deal with in day-to-day use.
Rewriting applications to handle security ad-hoc at the application level can be very complex and delicate and can impose burdens on everyone, while not being credibly secure.
In any case, security can be improved by adding or switching to InfinityDB Encrypted for critical data.
- Enveloping. This allows the database to be accessible only to selected accessors i.e. database recipients based on public/private key pairs. A set of Envelope certificates is kept in the file, and for each, there is a copy of the PBE password in the file encrypted with the Envelope certificate’s public key. The PBE password can be decrypted with the private key for an envelope certificate. So, instead of keeping track of potentially many PBE passwords, one per file, a given private key can be used to access a whole set of files that have the proper envelope certificate. Since the PBE passwords are no longer necessarily exposed externally, they can be very long and strong, even non-user readable. Because a given file can have multiple envelope certificates, access control is flexible and tight, yet simple. The private keys can be kept tightly secure within each recipient, while the PBE password would tend to be more widely available, since it is a symmetric secret key.
- Multi-threaded encrypted data block hashing also for signing and signature verification to reach very high speeds.
- Key Management System integration will allow separate storage and manipulation of encryption keys, so that they can participate in the ‘key life-cycle’. This allows keys to be created, given designated lifetimes, invalidated, archived, and deleted. Central control of keys is possible for enterprise use. The KMIP ‘Key Management Interoperability Protocol’ is being investigated for this. Strong storage using HSM ‘Hardware Security Modules’, possibly on-premises, becomes possible this way.
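The enveloping idea in the list above, one encrypted copy of the PBE password per recipient public key, can be sketched with standard Java RSA-OAEP encryption. This is a hypothetical illustration of the concept only: the class `EnvelopeSketch`, the method names, and the cipher transformation are assumptions and do not describe how InfinityDB Encrypted stores envelopes.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;

// Illustrative only: each recipient gets the same secret encrypted under their own public key.
class EnvelopeSketch {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    public static byte[] seal(PublicKey recipient, byte[] secret) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, recipient);
        return cipher.doFinal(secret);
    }

    public static byte[] open(PrivateKey recipientKey, byte[] envelope) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, recipientKey);
        return cipher.doFinal(envelope); // throws if the envelope was sealed for another key
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair alice = generator.generateKeyPair();
        KeyPair bob = generator.generateKeyPair();
        byte[] password = "long random PBE password".getBytes(StandardCharsets.UTF_8);
        byte[] forAlice = seal(alice.getPublic(), password);
        byte[] forBob = seal(bob.getPublic(), password);
        System.out.println("alice recovers it: " + Arrays.equals(password, open(alice.getPrivate(), forAlice)));
        System.out.println("bob recovers it: " + Arrays.equals(password, open(bob.getPrivate(), forBob)));
    }
}
```

In this sketch the password itself never needs to be handed to a recipient; each one recovers it with a private key that stays under their own control.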
The implementation uses an underlying ‘shim’ called EncryptedRandomAccessFile that provides its overlying InfinityDB database with a logical GeneralizedRandomAccessFile, while physically storing the data as encrypted blocks in a normal RandomAccessFile. The InfinityDB-specific GeneralizedRandomAccessFile is necessary instead of a subclass of RandomAccessFile, because the latter cannot be subclassed (this is considered an original mistake in Java – InputStream and OutputStream are OK though).
The EncryptedRandomAccessFile also contains a ‘header’ before the encrypted blocks that describes the file state, and which contains structure for future extensions, signature information and eventually information for ‘enveloping’. The header itself is variable-length but has a limited fixed space at the front of a particular file – if too much data is attempted to be written in that space, an IOException is thrown, but the file is still usable in its previous state. Currently the size is fixed at 100K but later it will be settable on create(). This should be plenty. The header can change without the hash being changed.
InfinityDB Encrypted has been tested with the Sun security provider as well as Bouncy Castle. Bouncy Castle is the main alternative to Sun, and it adds many features, such as a complete certificate generation capability.