You're about to wind up your day and use your smartphone to check what's in your smart fridge to decide if you need to pass by the store or request delivery before you get home. You quickly pay for the purchase using your credit card registered on your account and promptly receive a push notification confirming the purchase and estimated delivery times.
You use your Metro transit card to jump on the bus or subway train to start making your way home, all the while listening to your favorite podcast on Spotify. Once you get home, you pick up your dinner and jump on to a Zoom call with your loved ones, quickly glancing to confirm the green padlock is active and your call is secure.
Your typical day may resemble the above or some aspects of it, but much of what we take for granted in a typical day relies on some form of cryptography: a tiny bit of code that keeps us safe in the digital world by establishing whom to trust, confirming that we are who we say we are, detecting whether our data was tampered with before delivery, and even deciding whether we are allowed to access a website.
Yet at the mention of 'cryptography,' images of spies (James Bond included), secret messages, covert government agencies, conspiracy theories, and wars flood our minds. In fact, movies like "The Da Vinci Code" and "The Imitation Game" revolve around this fascinating science of concealing information.
What is Cryptography?
Simply put, cryptography is the method of scrambling data so that it looks like gibberish to anyone except those who know the trick to decode it. Regardless of whether the data is being transmitted or at 'rest' in storage, cryptography uses algorithms to encrypt data, so that only the intended recipient can process the data.
As we delve deeper into cryptography, the following keywords will keep cropping up, so it is better to define them now before proceeding further.
- Encrypt - scrambling data to make it incomprehensible.
- Decrypt - unscrambling encrypted data to its original comprehensible format.
- Plaintext - unencrypted or decrypted data; could be text, images, audio, or video.
- Ciphertext - encrypted data.
- Cipher - another word for an encryption algorithm used to scramble data.
- Key - a complex sequence of characters used by the encryption algorithm to scramble and unscramble data.
A textbook cryptography scenario would thus play out as:
Alice wants to communicate with Bob but does not want Eve to read or overhear their conversation. Alice encrypts her message (plaintext) using a cryptographic algorithm with a secret 'key' (only known to Alice and Bob) to create and send the encrypted message – ciphertext. Eve may intercept but will not be able to understand the ciphertext. Bob receives the encrypted message and immediately applies the secret key while reversing the cryptographic algorithm - decrypting the message back into plaintext.
Image: DZone
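The exchange between Alice and Bob can be sketched in a few lines of Python. This is only a minimal illustration, assuming a one-time-pad style XOR cipher in which the key is random and as long as the message; the `xor_bytes` helper and the sample message are hypothetical and not taken from any particular library.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the matching byte of key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Alice and Bob share this secret key in advance (assumption).
plaintext = b"Meet me at noon"
key = secrets.token_bytes(len(plaintext))

# Alice encrypts; Eve only ever sees the ciphertext.
ciphertext = xor_bytes(plaintext, key)

# Bob applies the same key to reverse the operation.
recovered = xor_bytes(ciphertext, key)

print(ciphertext.hex())
print(recovered.decode())
```

Because XOR is its own inverse, the same function both encrypts and decrypts, which is exactly the "single shared key" idea in the scenario above.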
If you are familiar with cryptography, then you have probably come across Alice and Bob. If you have ever wondered how this cryptographic couple came to be, this article provides a quaint timeline.
While we may be content to leave cryptography to the experts and movies, it is all around us, from the moment you unlock your phone in the morning to when you access a website, make an online payment, watch Netflix, or purchase an NFT.
It's hard to believe, but cryptography has been around for thousands of years. Early cryptography focused on protecting messages during transportation between allies. Modern cryptography has matured to verify data integrity, authenticate identities, and implement digital signatures, among many other uses.
The etymology of cryptography traces its roots back to the Greek words 'Kryptos,' meaning 'hidden,' and 'graphein,' meaning 'writing.' Fittingly, American artist Jim Sanborn erected a sculpture aptly named 'Kryptos' on the Central Intelligence Agency (CIA) grounds in Langley, Virginia. Yet to be fully deciphered, the sculpture displays scrambled letters hiding a secret message in plain sight, in harmony with its location, name, and theme.
If cryptography is so old, why don't we know more about it?
"History is written by the victors." It is unlikely that a victorious army or government will publish details of secret weapons used to win wars. Herein lies the reason why little history on this important topic and its evolution exists. But, what do we know for sure?
A Brief History
As soon as humans started living in different groups or tribes, the idea that we had to work against each other surfaced, and the need for secrecy arose. Think military, political, and national affairs crucial for survival and conquest.
As early as 1900 BC, the non-standard use of Egyptian hieroglyphics hid the content of messages from those who did not know the meaning of the symbols. The Greeks developed the 'Scytale,' which consisted of a parchment strip wrapped around a cylinder of a specific diameter; an enemy need only try cylinders of varying diameters to decipher the message.
Remember the secret spy decoder ring prize in cereal boxes? It was based on the Caesar cipher. The Caesar Shift Cipher, to give it its full name, is named in honor of Julius Caesar and was used to encrypt military and official messages in ancient Rome.
The concept behind this type of encryption is simple: shift the alphabet left or right by a set number of positions and rewrite the message using the shifted letters. The recipient of the ciphertext shifts the alphabet back by the same number and deciphers the message.
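The letter shift described above is short enough to sketch directly. The following is a minimal Python illustration, assuming an uppercase English alphabet and a shift of 3; the `caesar` function name and the sample message are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift every letter by `shift` positions; leave other characters alone."""
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


secret = caesar("ATTACK AT DAWN", 3)
print(secret)              # DWWDFN DW GDZQ
print(caesar(secret, -3))  # ATTACK AT DAWN

# With only 26 possible shifts, an eavesdropper can simply try them all.
candidates = [caesar(secret, -guess) for guess in range(26)]
print("ATTACK AT DAWN" in candidates)  # True
```

The last three lines hint at why the cipher is weak: the whole key space can be searched by hand.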
Advancements in cryptology were slow until the Middle Ages, when European governments used encryption in one form or another for communication. During this time, cryptanalysis techniques were developed to decipher encrypted messages, starting with the Caesar Cipher. From about 1500, several notable individuals and governments started working to improve encryption and decryption techniques; the cat and mouse game began!
Alan Turing's work on breaking the Nazi Enigma machine, with its over 15,000,000,000,000,000,000 (you read that correctly, 15 followed by 18 zeros) possible settings used to encrypt plaintext messages, is the most notable historically. Alan Turing's legacy is not limited to contributing to the end of World War II; he also laid the foundation for modern computing and devised the Turing test to evaluate artificial intelligence.
Little did Turing know that his work on decrypting Nazi war messages would lead to the regulation of cryptography by both international and national law. So much so that to date, cryptography is classified under the Military Electronics and Auxiliary Military Equipment sections of the United States Munitions List (USML) and thus subject to the International Traffic in Arms Regulations (ITAR).
Simply put, shortly after the war, the use or export of a device or software program that included cryptography was highly regulated and required a special U.S. government license. These controls were largely successful at slowing the spread of cryptographic technology internationally, but as a result, the U.S. lost its vanguard position.
Governments are concerned about cryptography from national security and military perspectives as well as from a law enforcement perspective. After the tragic San Bernardino shooting, Apple received a court order to break the encryption and unlock the shooter's iPhone as part of the FBI's investigation. Apple never unlocked the phone, but the case did raise concerns about governments willing to circumvent privacy and cybersecurity standards under the guise of the "greater good."
Types of Cryptography
"Necessity is the mother of invention"
Mankind's increasing reliance on technology and need for secrecy pushed the use of cryptography beyond its historical requirement of only concealing information in transit or in storage – Windows BitLocker may come to mind. The Enigma machine provided privacy; any intercepted communication was incomprehensible.
But as part of that equation, we also need to verify the integrity of messages received and ensure that only those authorized could decrypt and read the message. There are three major categories for cryptographic ciphers: hash functions, symmetric, and asymmetric algorithms.
The encryption of a specific set of plaintext with a particular key and cipher will always generate one specific ciphertext. Even if repeated a million times, the ciphertext will remain the same, provided the original plaintext, key, and cipher stay the same.
This side effect of cryptography can be used to verify if data remained unaltered in storage or during transmission. These unique cryptographic digests or hashes provide a means to verify the integrity of data.
Hash Functions
Hash functions, also called message digests or one-way encryption, are ciphers that do not use a key and generate a fixed-length hash value based on the plaintext submitted.
These ciphers are designed to ensure that even small changes in the plaintext will produce a significantly different hash value. Thus, hash functions provide a digital fingerprint of a file's contents and implement a mechanism to verify if a file is altered from the original, that is, its integrity. As one-way encryption, hash functions are not meant to be 'decrypted.'
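To see the "digital fingerprint" idea in practice, the sketch below hashes two messages that differ by a single character using Python's standard `hashlib` module. The messages and the `sha256_hex` helper are illustrative assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


original = b"Pay Bob 100 dollars"
tampered = b"Pay Bob 900 dollars"  # a single character changed

fingerprint = sha256_hex(original)
tampered_fingerprint = sha256_hex(tampered)

print(fingerprint)
print(tampered_fingerprint)

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(fingerprint, tampered_fingerprint))
print(f"{differing} of {len(fingerprint)} hex characters differ")

# Integrity check: recompute the hash and compare it with the stored one.
print(sha256_hex(original) == fingerprint)  # True
print(sha256_hex(tampered) == fingerprint)  # False
```

The digest has the same length regardless of the input size, and there is no key involved and no function to turn a digest back into the message.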
Several hash ciphers exist, but these are the most notable ones:
- Message Digest (MD) ciphers are a series of byte-oriented algorithms (MD2, MD4, MD5) that produce a 128-bit hash value. Although MD5 is the most recent version and was designed to solve weaknesses in MD4, German cryptographer Hans Dobbertin revealed weaknesses in MD5 in 1996.
- Secure Hash Algorithm (SHA) is a series of ciphers (SHA-1, SHA-2, and SHA-3) designed to produce varying hash outputs depending on the version. SHA-2 comprises six algorithms in the Secure Hash Standard (SHS): SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. Each is designed for a specific purpose and generates corresponding bit-sized hash values.
Hash functions are also used for malware detection in cybersecurity products, as well as identifying copyrighted data. Each piece of data or code generates a unique value that can be used to quickly identify and verify files during analysis.
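As a rough sketch of both points, the snippet below prints the digest size of several of the algorithms named above and then identifies a file by looking its hash up in a set of known-bad hashes. The `KNOWN_BAD_HASHES` set, the sample payload, and the helper names are hypothetical; real security products maintain far larger curated databases.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the digest length, in bits, for a hashlib algorithm name."""
    return hashlib.new(algorithm, data).digest_size * 8


sample = b"example file contents"
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    print(f"{name:7s} {digest_bits(name, sample):4d} bits")

# Hypothetical database of hashes of files already known to be malicious.
KNOWN_BAD_HASHES = {hashlib.sha256(b"pretend malware payload").hexdigest()}


def is_known_bad(file_bytes: bytes) -> bool:
    """Identify a file by comparing its SHA-256 hash with the database."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES


print(is_known_bad(b"pretend malware payload"))  # True
print(is_known_bad(sample))                      # False
```

Comparing short digests is much cheaper than comparing whole files, which is why the technique scales to large collections.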
Symmetric Encryption a.k.a. Secret Key Cryptography (SKC)
The type of encryption considered thus far is classified as Symmetric-key (or single-key) encryption and is focused on privacy and confidentiality.
Symmetric algorithms can be further broken down into stream and block ciphers. A good analogy to understand the difference between the two is to consider encrypting a stream of water straight out of a tap versus encrypting fixed-size buckets of water.
A block cipher operates on a fixed-size block of data: imagine filling a data bucket to the brim, encrypting it, and then proceeding to fill the bucket with the next block of data. Block ciphers can operate in one of several different modes. Electronic Codebook (ECB) is the simplest but has a core weakness: a given block of plaintext will always encrypt to the same ciphertext. Cipher Block Chaining (CBC), Cipher Feedback (CFB), Output Feedback (OFB), and Counter (CTR) modes implement some form of feedback mechanism or additional steps to overcome the weaknesses of ECB.
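The ECB weakness is easy to demonstrate. The sketch below is an illustration only: `toy_block_encrypt` is a made-up 4-round Feistel network built on SHA-256 that stands in for a real block cipher such as AES and must not be used for real data. The key, the initialization vector (IV), and the message are likewise assumptions.

```python
import hashlib

BLOCK_SIZE = 8  # bytes per block in this toy example


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Toy 4-round Feistel network standing in for a real block cipher."""
    left, right = block[:4], block[4:]
    for round_no in range(4):
        mask = hashlib.sha256(key + bytes([round_no]) + right).digest()[:4]
        left, right = right, bytes(l ^ m for l, m in zip(left, mask))
    return left + right


def split_blocks(data: bytes) -> list:
    """Cut data into BLOCK_SIZE pieces (no padding in this sketch)."""
    if len(data) % BLOCK_SIZE:
        raise ValueError("data length must be a multiple of BLOCK_SIZE")
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def ecb_encrypt(data: bytes, key: bytes) -> list:
    """ECB: every block is encrypted independently."""
    return [toy_block_encrypt(block, key) for block in split_blocks(data)]


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> list:
    """CBC: each block is XORed with the previous ciphertext block first."""
    encrypted, previous = [], iv
    for block in split_blocks(data):
        mixed = bytes(b ^ p for b, p in zip(block, previous))
        previous = toy_block_encrypt(mixed, key)
        encrypted.append(previous)
    return encrypted


key = bytes.fromhex("0123456789abcdef")
iv = bytes.fromhex("a1b2c3d4e5f60718")
message = b"SAMEDATA" * 2  # two identical plaintext blocks

ecb = ecb_encrypt(message, key)
cbc = cbc_encrypt(message, key, iv)

print(ecb[0] == ecb[1])  # True: the repetition leaks through ECB
print(cbc[0] == cbc[1])  # False: chaining hides the repetition
```

In ECB the repeated plaintext block is visible to anyone watching the ciphertext, while the feedback step in CBC makes the two ciphertext blocks differ.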
Stream ciphers encrypt each drop of water out of the tap rather than capturing buckets. A stream cipher combines each bit or byte of plaintext with the corresponding digit of a generated keystream, a pseudorandom sequence, so that each one is encrypted uniquely. There are two significant categories of stream ciphers: synchronous stream ciphers and self-synchronizing stream ciphers.
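As an illustration of a keystream being combined with data one byte at a time, here is a compact sketch of the RC4 stream cipher in Python. RC4 is considered broken today and is shown purely for teaching purposes; the function names and the sample key and message are assumptions.

```python
def rc4_keystream(key: bytes):
    """Yield an endless pseudorandom keystream derived from key (RC4)."""
    # Key-scheduling: shuffle the state array under control of the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    # Generation: keep shuffling and emit one keystream byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR each data byte with the next keystream byte."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))  # b'Plaintext'
```

Encryption and decryption are the same operation because both sides regenerate the identical keystream from the shared key, which is what makes this a synchronous stream cipher.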
A few symmetric ciphers worth reading further on include:
- Data Encryption Standard (DES) was designed by IBM in the 1970s but, due to several identified weaknesses, was replaced by Triple DES (3DES) and DESX by the early 2000s.
- Advanced Encryption Standard (AES) officially replaced DES in 2001. AES uses the Rijndael block cipher with a fixed 128-bit block and key lengths of 128, 192, or 256 bits; the original Rijndael design also allows other block lengths.
- Rivest Ciphers, named after Ron Rivest, are a series of symmetric ciphers: RC1, RC2, RC3, RC4, RC5, and RC6. Each iteration improves on the previous version; the RC4 stream cipher is the most widely used in commercial products.
- Global System for Mobile Communications (GSM) networks use the A5 family of stream ciphers for over-the-air communication. Despite newer versions, the older A5/1 cipher remains widely deployed on mobile phone networks even though it has been repeatedly broken.
Each category of symmetric encryption is suited for a specific purpose, speed and simplicity of implementation, but in general, symmetric encryption is most suitable for securing large volumes of data. The weakness of symmetric encryption is finding a way to securely share the single encryption/decryption key.
For instance, if Eve somehow managed to get the key, she could decrypt intercepted messages, alter, encrypt, and send her modified messages. Eve could therefore manipulate Alice and Bob, and they would be none the wiser.
Asymmetric encryption a.k.a. Public Key Cryptography (PKC)
There is disagreement about when PKC was invented and by whom. Stanford University professor Martin Hellman and graduate student Whitfield Diffie published their paper "New Directions in Cryptography" in November 1976. However, Diffie and Hellman credit Ralph Merkle with first describing a public key distribution system, though not a two-key system, in 1974.
Declassified documents of the UK's Government Communications Headquarters (GCHQ) reveal initial research started in 1969. By 1975, James Ellis, Clifford Cocks, and Malcolm Williamson had worked out all of PKC's fundamental details but could not publish their work.
However, cryptographers broadly agree that it is one of the most critical cryptographic developments in the past 300 years because it solves the problem of securely distributing encryption keys over insecure communication channels. Its development is so significant that it has led to the development of various other technologies.
PKC uses two keys – a public and a private key – based on one-way mathematical functions that are easy to compute but difficult to reverse. In this way, the two keys are mathematically related, but knowledge of one key does not guarantee someone will easily determine the second key.
Asymmetric keys use two very large prime numbers as their starting point. The two numbers can be passed through an exponential or multiplication function to generate an even larger number. The reverse of either function, calculating logarithms or factorization, is very hard, and that difficulty is the 'magic' behind PKC.
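The exponential one-way function can be illustrated with a toy Diffie-Hellman key exchange. This is a sketch with deliberately tiny numbers (prime 23, generator 5) so the arithmetic can be followed by hand; real deployments use numbers thousands of bits long, and the function and variable names are assumptions.

```python
def dh_public_value(generator: int, prime: int, secret: int) -> int:
    """Easy direction: modular exponentiation."""
    return pow(generator, secret, prime)


def dh_shared_secret(other_public: int, secret: int, prime: int) -> int:
    """Combine the other party's public value with your own secret."""
    return pow(other_public, secret, prime)


prime, generator = 23, 5          # public parameters
alice_secret, bob_secret = 4, 3   # never transmitted

alice_public = dh_public_value(generator, prime, alice_secret)  # 4
bob_public = dh_public_value(generator, prime, bob_secret)      # 10

alice_shared = dh_shared_secret(bob_public, alice_secret, prime)  # 18
bob_shared = dh_shared_secret(alice_public, bob_secret, prime)    # 18
print(alice_shared == bob_shared)  # True

# Hard direction: Eve must solve a discrete logarithm. With a 5-bit prime
# brute force is trivial; with a 2048-bit prime it is not feasible.
guessed = next(x for x in range(1, prime)
               if pow(generator, x, prime) == alice_public)
print(guessed)  # 4
```

Alice and Bob end up with the same value without ever sending it, while an eavesdropper who sees only the public values has to reverse the exponentiation.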
Going back to our example of Alice and Bob...
- Bob publishes his public key, which Alice uses to encrypt her plaintext message and send the corresponding ciphertext to Bob.
- Bob then uses his private key to decrypt and retrieve the original plaintext.
- Eve can intercept Bob's public key and Alice's ciphertext but cannot determine the private key or decrypt the ciphertext.
- Alice can go a step further by encrypting less sensitive plaintext using her private key; Bob decrypts this second ciphertext using Alice's public key to retrieve the original plaintext message.
In the latter instance, PKC implements non-repudiation – Alice cannot deny sending the message.
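The Alice and Bob exchange above can be sketched with textbook RSA. The primes here (61 and 53 for Bob, 89 and 97 for Alice) are tiny and chosen only so the numbers stay readable; there is no padding, and the helper names are hypothetical. Treat it as a sketch of the idea rather than a usable implementation (it needs Python 3.8+ for `pow(e, -1, phi)`).

```python
def make_keypair(p: int, q: int, e: int = 17):
    """Build a toy RSA key pair from two small primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)       # private exponent: inverse of e modulo phi
    return (e, n), (d, n)     # (public key, private key)


def apply_key(value: int, key) -> int:
    """RSA encryption, decryption, signing and verifying are all one pow()."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


bob_public, bob_private = make_keypair(61, 53)
alice_public, alice_private = make_keypair(89, 97)

message = 65  # a message already encoded as a small number

# Confidentiality: anyone can encrypt for Bob, only Bob can decrypt.
ciphertext = apply_key(message, bob_public)
print(ciphertext)                          # 2790
print(apply_key(ciphertext, bob_private))  # 65

# Non-repudiation: only Alice can produce this value, anyone can check it.
signature = apply_key(message, alice_private)
print(apply_key(signature, alice_public) == message)  # True
```

Eve, who holds only the public keys, would have to factor the modulus to recover a private exponent; that is easy for 3233 but not for the 2048-bit values used in practice.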
Some of the public key ciphers in use today for privacy, key exchange, or digital signatures include:
- RSA, named after the three MIT mathematicians Ronald Rivest, Adi Shamir, and Leonard Adleman, is the first and most widely used cipher on the Internet: for key exchange, digital signatures, and encrypting small data sets. Unfortunately, methods such as the General Number Field Sieve and cheaper ever-increasing computing power make breaking RSA keys easier. Fortunately, the RSA key size can be increased.
- Diffie-Hellman is used for key exchange only and not for authentication or digital signatures.
- Digital Signature Algorithm (DSA) enables message authentication via digital signatures.
- Elliptic Curve Cryptography (ECC) is a series of ciphers based on elliptic curves, designed for devices with limited computing power and memory, such as smartcards. The most notable version is the Elliptic Curve Digital Signature Algorithm (ECDSA), used by many cryptocurrencies because it is the equivalent of DSA but stronger for similar parameters.
While PKC solves the problem of securely sharing a key, it does have several weaknesses and is therefore only applicable in specific situations.
Below is a quick summary of the differences between symmetric and asymmetric cryptography:
Key Differences | Symmetric Encryption | Asymmetric Encryption |
Size of ciphertext | Ciphertext is roughly the same size as the original plaintext (plus any padding). | Ciphertext is larger than the original plaintext. |
Data size | Used to encrypt large data sets. | Used to encrypt small data sets. |
Resource utilization | Requires low computing resources. | Requires high computing resources. |
Key lengths | 128 or 256-bit key size. | RSA 2048-bit or higher key size. |
Number of keys | Single key for encryption and decryption. | Two keys: one for encryption and one for decryption. |
Security | The single key must be shared securely, which weakens the overall implementation. | Considered more secure because the private key never has to be shared. |
Maturity | Fairly old technique. | Modern technique, first published in 1976. |
Speed | Fast. | Slower. |
Algorithms | RC4, AES, DES, and 3DES. | RSA, Diffie-Hellman, and ECC algorithms. |
Why 3 Encryption Techniques?
We have barely scratched the surface of the different algorithm types within the three encryption categories - Hash, SKC, and PKC. The ciphers in each category are used for specific purposes, but they are often combined depending on the technological requirements.
Consider the steps below, which use the three cryptographic techniques to secure communication via a digital signature and digital envelope.
- Alice generates a Random Session Key (RSK) and uses it with Secret Key Cryptography (SKC) to create an Encrypted Message.
- Using Bob's public key, Alice encrypts her RSK to generate an Encrypted Session Key, which, together with the Encrypted Message, forms a Digital Envelope.
- To generate the Digital Signature of her message, Alice computes the hash value of her message and encrypts this value with her private key.
- Bob receives the communication, uses his private key to decrypt and retrieve Alice's RSK, and subsequently uses the RSK to decrypt Alice's encrypted message.
- Bob computes the hash value of Alice's decrypted message and compares this with the hash value obtained by decrypting the digital signature with Alice's public key. If the two values are the same, Alice truly sent the message.
These simplified steps demonstrate how hash functions, PKC, and SKC ciphers work together to implement confidentiality, integrity, key exchange, and non-repudiation.
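The digital envelope and digital signature steps can be strung together in one sketch. Everything here is a toy stand-in chosen so the example runs with only the Python standard library: the "SKC" cipher is an XOR against a SHA-256 based keystream, the "PKC" cipher is unpadded textbook RSA built from well-known Mersenne primes, and all function names are hypothetical. None of it is safe for real use.

```python
import hashlib
import secrets


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair (no padding); for illustration only."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


def rsa_apply(value: int, key) -> int:
    exponent, modulus = key
    return pow(value, exponent, modulus)


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy SKC: XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(c ^ p for c, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def digest_as_int(message: bytes, modulus: int) -> int:
    """Hash the message and reduce it so textbook RSA can sign it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


# Mersenne primes keep the toy keys short to write down (assumption).
alice_public, alice_private = toy_keypair(2**107 - 1, 2**127 - 1)
bob_public, bob_private = toy_keypair(2**61 - 1, 2**89 - 1)


def alice_send(message: bytes):
    session_key = secrets.token_bytes(16)                    # the RSK
    encrypted_message = keystream_xor(message, session_key)  # SKC step
    encrypted_session_key = rsa_apply(                       # PKC step
        int.from_bytes(session_key, "big"), bob_public)
    signature = rsa_apply(                                   # hash + sign
        digest_as_int(message, alice_public[1]), alice_private)
    return encrypted_message, encrypted_session_key, signature


def bob_receive(encrypted_message, encrypted_session_key, signature):
    session_key = rsa_apply(encrypted_session_key, bob_private).to_bytes(16, "big")
    message = keystream_xor(encrypted_message, session_key)
    signed_hash = rsa_apply(signature, alice_public)
    return message, signed_hash == digest_as_int(message, alice_public[1])


envelope = alice_send(b"Transfer the deed on Friday")
print(bob_receive(*envelope))  # (b'Transfer the deed on Friday', True)
```

If any byte of the encrypted message is altered in transit, the hash Bob computes no longer matches the signed hash and the second value returned is `False`.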
However, none of the three encryption techniques work without trust. How do we know if Bob's public key is authentic and not published by someone claiming to be Bob?
Alternatively, Mallory (a malicious attacker) may intercept Bob's public key, create her own key pair, and pass her own public key to Alice as if it were Bob's; Mallory would then be able to decrypt all communication between Alice and Bob.
Public Key Infrastructure (PKI): The power behind the matrix!
Cryptography and all online interactions require trust! Whether it is responding to an email, downloading or updating software, or purchasing an item, all need a level of trust. We trust that the servers we connect to will provide legitimate software updates for our systems. If, however, those servers are compromised, attackers can use them to propagate malware (as in the SolarWinds Sunburst attack).
Let's consider the example of a driver's license issued in one state (e.g., California). This license or 'certificate' establishes who you are, the type of vehicles you are allowed to drive, the state that issued the license, and even the issue and expiry dates of the license.
If you go over to other states, their jurisdictions will trust the authority of California to issue the license and trust the information it contains. By extension, depending on the country you visit, that country will trust the US government's authority to issue that license.
Coming back to our example of Alice and Bob, PKI establishes trust by using Digital Certificates along with a 'trust chain.' These digital certificates are issued by a trusted third party. They can be traced back to the issuer, contain a public key, serial number, policies about how it was issued, how it may be used, and an issuing and expiry date.
Most importantly, these digital certificates can be used to verify an entity – device, person, program, website, organization, or something else. This verification promotes confidence that a received message is from a known trusted entity.
X.509 (version 3) defines the standard format for public key certificates associating the key with entities. These certificates are used in SSL/TLS connections to ensure browsers connect to trustworthy websites and services.
Certification Authorities (CAs) such as Verisign, DigiCert, GlobalSign, and SecureTrust, to name a few, are responsible for issuing, managing, and revoking digital certificates. The CAs mentioned are considered Root CAs that sit at the top of the CA hierarchy and self-sign their root certificates. Below the root are the subordinate CAs, which could be Policy or Issuing CAs, all working together to establish the trust chain.
Alice can apply to a publicly trusted CA, go through the verification process, and, if successful, be issued her own X.509 digital certificate. This digital certificate will accompany any message Bob receives from Alice after that. Bob has confidence in the issuing CA and therefore trusts the authenticity of Alice's messages.
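The idea of a trust chain can be sketched as a simple data structure. The `Certificate` class below is a hypothetical, heavily simplified stand-in for an X.509 certificate that keeps only a few of the fields described earlier, and `trust_chain` walks issuer links and checks validity dates only. A real implementation would also verify each certificate's digital signature and revocation status, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    serial_number: int
    public_key: str      # placeholder text instead of real key material
    not_before: date
    not_after: date

    def is_valid_on(self, day: date) -> bool:
        return self.not_before <= day <= self.not_after


def trust_chain(cert, ca_store, trusted_roots, day):
    """Follow issuer links from cert up to a trusted, self-issued root."""
    chain = [cert]
    while True:
        current = chain[-1]
        if not current.is_valid_on(day):
            raise ValueError(f"{current.subject}: outside validity period")
        if current.subject == current.issuer:  # reached a root certificate
            if current.subject not in trusted_roots:
                raise ValueError(f"{current.subject}: untrusted root")
            return chain
        if current.issuer not in ca_store or len(chain) > 10:
            raise ValueError(f"{current.subject}: issuer cannot be resolved")
        chain.append(ca_store[current.issuer])


root = Certificate("Example Root CA", "Example Root CA", 1, "root-key",
                   date(2020, 1, 1), date(2040, 1, 1))
issuing = Certificate("Example Issuing CA", "Example Root CA", 2, "issuing-key",
                      date(2021, 1, 1), date(2031, 1, 1))
alice = Certificate("Alice", "Example Issuing CA", 1001, "alice-key",
                    date(2024, 1, 1), date(2025, 1, 1))

ca_store = {c.subject: c for c in (root, issuing)}
chain = trust_chain(alice, ca_store, {"Example Root CA"}, date(2024, 6, 1))
print(" -> ".join(c.subject for c in chain))
# Alice -> Example Issuing CA -> Example Root CA
```

Bob trusts Alice's certificate only because every link leads back to a root he already trusts; an expired certificate or an unknown root breaks the chain.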
Each time you visit a website, send an email, or digitally sign an online document, X.509 certificates encrypt traffic to and from the server and provide identity assurance. This was the original idea behind PKI, but it quickly evolved in the early 2000s during the rise of the mobile workforce when organizations used PKI to authenticate both staff and their devices connecting over Virtual Private Networks (VPNs) to their office networks and servers.
VPNs use advanced encryption protocols to mask your IP address and network traffic over insecure internet connections. VPN protocols include OpenVPN, IKEv2/IPsec, L2TP/IPsec, SSTP, and WireGuard, all of which rely on a combination of hashing, symmetric, and asymmetric encryption ciphers for implementation. Tor also uses multi-layered encryption to secure data traversing its network.
In the Internet of Things (IoT), tiny devices, sensors, and micro-programs interconnect to create synergistic heterogeneous environments. Countless bits of data are exchanged and used to make decisions for people and even larger systems. Smart homes, factories, offices, self-driving cars, and self-flying drones are a few of the things that were only possible in science fiction in the early 1990s.
These tiny devices securely connect to their cloud servers to relay data, authenticate, and retrieve software updates. The servers, in turn, connect to other servers providing specific services such as authentication, transaction processing, content streaming, or communication, to name a few.
The history of cryptography is shrouded in mystery, and we may never know all the details, but we cannot deny how much we rely on it in our daily lives. More recently, it has been heavily used in blockchains and, by extension, cryptocurrencies and non-fungible tokens (NFTs).
Every coin has two sides, and cryptography is no different. The same cryptography that keeps us and our data safe in a digital world is also misused. Silk Road rose in part due to encryption used by criminals to hide in plain sight from the authorities. Ransomware evolved due to improvements in encryption and is actively used to cripple businesses and critical infrastructure.
Cryptography's Future
What does the future hold for cryptography? Will an uncrackable algorithm be created? Will new cryptanalysis techniques render all encryption ciphers useless?
Will cryptocurrencies give rise to unique security needs that only cryptography can solve? Will quantum cryptography be the new area of focus for researchers and governments?
Encryption exists for data in transit and in storage, but what about during application and database processing? Startups like Baffle are developing 'security meshes' to protect data during processing and storage in databases to mitigate data breaches.
From keeping data secret to creating a digital currency, cryptography has come a long way from shifting letters using leather straps and hieroglyphs. It is difficult to predict the next step in cryptographic evolution. Only time will tell.