Is Encryption the Answer to Your Cloud Storage Woes?

Cloud storage

In today’s world of infrastructure security, data security becomes increasingly important at every “level” of cloud computing: infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS).

In case you are not aware, most of the privacy laws enacted today exempt organizations from reporting the loss of protected data if that data was encrypted. This might prompt someone to urge you to simply encrypt all of your data as it leaves the facility and moves around out there in the Cloud.

That’s like saying we would eliminate war if we could eliminate hunger. It is a simplified view, and I wish reality were that simple. Let me explain. When your data is out there in the Cloud, it is in one of three states: in motion, at rest, or in process.

Data-in-Motion:
Data in motion refers to data as it moves from a stored state, as a file or database entry, to another form in the same location or to a different location. Any time you upload data to be stored in the cloud, the data is considered data-in-motion while the upload is in progress. Data in motion can also be data that is in transit and never permanently stored. The username and password you use to access a Web site or to authenticate yourself to the cloud are sensitive pieces of data in motion that are not actually stored in unencrypted form.

Because data in motion exists only while it is in transit between end points, securing it focuses on preventing tampering and on keeping the data confidential. One risk is a third party observing the data while it is in motion.
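As a rough illustration of protecting data in motion, the sketch below uses Python's standard `ssl` module to build a TLS client context that verifies the server's certificate and hostname before any data is sent. The helper names `make_client_context` and `open_secure_channel`, the TLS 1.2 floor, and the 10-second timeout are choices made for this example, not something the article prescribes.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server before sending data."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_secure_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to host:port and return a socket whose traffic is encrypted in transit."""
    raw = socket.create_connection((host, port), timeout=10)
    return make_client_context().wrap_socket(raw, server_hostname=host)
```

Anything written to the socket returned by `open_secure_channel` is encrypted and integrity-checked on the wire, which addresses both the tampering and the eavesdropping risks described above.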

Data-at-Rest:

Data at rest refers to any data in computer storage, including files on an employee’s computer, corporate files on a server, or copies of these files on off-site tape backup. As with Data-in-Motion, the Cloud extends the places where your data is “resting” and the set of people who have access to it, possibly without your knowledge.

Protecting data at rest in a cloud is not radically different from protecting it outside a cloud. Generally speaking, the same principles apply. Granted, there is an added risk because the enterprise that owns the data does not physically control it. As with on-premises storage, the trick to achieving an actual security advantage is following through with effective security practices.

Data-in-Process:
Data-in-process is the data while it is actually being processed inside a server or  workstation.  The data could be in memory, in cache, or in registers inside the CPU.  Normally we don’t worry very much about that.  This is data that is changing quickly, usually coming and going at microsecond time scales, and data that disappears when power goes away.  Certainly inside your own data center, this state of your data is not a worry.  Inside the Cloud, it does become more complex.  Through virtualization, multiple customers are likely using the same physical memory, cache and CPU over the same short time interval.  Bugs in the virtualization layer or the operating system can allow that information to be inappropriately available to other processes running in the same server.  Memory dumps and other diagnostic tools may collect data from multiple customers, which are then seen by those working on fixing the problem.

We distinguish among these three data states for two reasons:

  1. These states have different characteristics that affect the requirements on encryption.
  2. They require different solutions, as there is no product that handles all three states.

In fact, there is currently no practical way, nor is there likely to be one in the near future, to protect Data-in-Process with encryption. Any encryption algorithm would substantially increase the time it takes to do anything, and wouldn’t solve the problem anyway. The only way to deal with Data-in-Process is through your selection of a cloud service provider (CSP). Make sure they have the processes and procedures to treat any such data, such as dumps, with appropriate care. The CSP should let you read their process documents, and should share with you at least summary results from any periodic audit.

The most obvious attribute that distinguishes between Data-in-Motion and Data-at-Rest is lifetime, sometimes referred to as “retention.”  This is the length of time between when the data is first encrypted and when it is last decrypted.

Data-in-Motion has a lifetime measured in milliseconds. Even getting data from halfway around the world and back to you takes no more than half a second. Once the data gets to the other end, you never need that encrypted message again. If it is later sent somewhere else, it just gets encrypted again. If one of the endpoints loses the encryption key, basic network protocols will automatically re-initialize the connection, probably with a different encryption key, and resend the message. Nothing is lost. Nothing is vulnerable, since what is left on the network is encrypted. Typically, Data-in-Motion encryption keys are maintained only for the duration of a single session (e.g., the length of time you are doing online banking). This time is usually measured in minutes, and some encryption products will automatically create a different key every so many minutes. These keys are usually stored only in the server or workstation memory at the end points. When someone signs off or turns off the workstation, the keys disappear.
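The short-lived, memory-only nature of session keys can be sketched in a few lines. The class name `SessionKey`, the five-minute default, and the injectable `clock` parameter are all hypothetical choices for this illustration; real protocols such as TLS negotiate and refresh keys between both end points rather than generating them on one side.

```python
import secrets
import time


class SessionKey:
    """Holds a random key in memory only and replaces it once it is too old."""

    def __init__(self, max_age_seconds: float = 300.0, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the rotation logic can be exercised
        self._rotate()

    def _rotate(self) -> None:
        # A fresh 256-bit key; the previous one is simply dropped.
        self._key = secrets.token_bytes(32)
        self._created = self._clock()

    def current(self) -> bytes:
        """Return the active key, rotating first if it has reached max age."""
        if self._clock() - self._created >= self._max_age:
            self._rotate()
        return self._key
```

Because the key is never written to disk, it vanishes when the process exits, which mirrors the behaviour described above when someone signs off or powers down the workstation.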

Data-at-Rest may have a lifetime measured in decades. This means that you have to keep the keys available, but secure, for long periods of time. The bad news is that if you lose access to the keys, you lose the data. The good news is that you can effectively destroy the data by simply destroying the key. When data is stored electronically and encrypted, it takes only a few seconds to destroy the keys and render the data useless, with no environmental impact.

Clearly you want the strongest encryption you can find, right? Maybe, but probably not. The stronger the encryption, the longer it takes to encrypt and decrypt the data. Unless you have hardware assistance for encryption and decryption, stronger encryption can have a significant impact on the time it takes data to move through a network or to be retrieved from storage. The impact is usually greater on Data-at-Rest, where you may be accessing thousands or millions of bytes of data at a time. Networks, by their very nature, break large blocks of data into relatively small pieces and transmit them separately.

Encryption techniques also help to ensure the integrity of the data: making sure that the record received at the other end is exactly what you sent, or that the data you retrieved from the database has not been inappropriately changed since it was written ten years ago. Many encryption products add extra information to the encrypted data that is verified upon decryption to ensure that the data was not changed. These keyed hashing algorithms have keys of their own, and again the strength of the hash depends on the length of the key, as does the time it takes to hash the data originally and then again during decryption. The hashing information has to be stored or transmitted with the encrypted record, and that extra data is approximately the size of the hash output. For Data-in-Motion, this can sometimes almost double the size of the message to be transmitted. In most cases, there is enough available bandwidth that it has no real effect on the network, but it is something that must be considered. For Data-at-Rest, because the data is stored and read in very large blocks, the overhead is usually only a few percent of the total size.
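To make the overhead argument concrete, here is a minimal sketch using HMAC-SHA-256 from Python's standard library as the keyed hash. The function names `protect`, `verify`, and `overhead`, and the 40-byte and 1 MiB sizes, are assumptions chosen for illustration. For brevity the tag is computed over the raw payload; a real product would typically compute it over the ciphertext or use an authenticated encryption mode.

```python
import hashlib
import hmac
import secrets

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(key: bytes, payload: bytes) -> bytes:
    """Append a keyed integrity tag to the payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, blob: bytes) -> bytes:
    """Return the payload if the tag matches, otherwise raise ValueError."""
    payload, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("integrity check failed: data was modified")
    return payload


def overhead(payload_len: int) -> float:
    """Fraction by which the tag grows a payload of the given length."""
    return TAG_SIZE / payload_len


key = secrets.token_bytes(32)
small_message_overhead = overhead(40)        # a short network message
large_block_overhead = overhead(1024 * 1024)  # a 1 MiB storage block
```

With these illustrative sizes, the 32-byte tag adds 80% to a 40-byte message but well under 0.1% to a 1 MiB block, which is the asymmetry between Data-in-Motion and Data-at-Rest described above.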

The bottom line is that there are existing solutions to the encryption issue for both Data-in-Motion and Data-at-Rest, but only a few of them are actually designed to work in the Cloud, and some may actually interfere with the CSP’s operation. As one example, many CSPs promise to reduce the size of your storage (and thus its cost) by de-duplication: looking for the same data in multiple files and storing it only once. Think of an email that has been passed back and forth a dozen times, with each person adding a paragraph or two. You end up with most of that email stored a dozen times in each person’s email files. With de-duplication you might reduce the total storage by an order of magnitude. However, de-duplication doesn’t work on encrypted data, because the same “phrase” will look different after being encrypted, depending on exactly where it is in the data. Compression algorithms also do not work well on encrypted data, and may in fact increase the size of the data.
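Both effects are easy to observe. The sketch below stores twelve identical copies of a text, counts how many distinct fingerprints a hash-based de-duplicator would see before and after encryption, and compares `zlib` compression ratios. The function `toy_encrypt` is a deliberately simplified stand-in (XOR with a SHAKE-256 keystream under a random nonce) so that the example needs only the standard library; it is not a vetted cipher and should not be used to protect real data. Here the ciphertexts differ because of the per-copy nonce, which is one common reason identical plaintext encrypts differently.

```python
import hashlib
import secrets
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Illustration only: XOR data with a keystream derived from key and a fresh nonce."""
    nonce = secrets.token_bytes(16)
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, keystream))


def unique_count(blobs) -> int:
    """Number of distinct items a hash-based de-duplicator would have to store."""
    return len({hashlib.sha256(blob).hexdigest() for blob in blobs})


paragraph = b"Please review the attached proposal and reply by Friday.\n" * 200
copies = [paragraph] * 12  # the same text saved a dozen times
key = secrets.token_bytes(32)
encrypted = [toy_encrypt(key, copy) for copy in copies]

plain_unique = unique_count(copies)      # identical copies collapse to one
cipher_unique = unique_count(encrypted)  # every ciphertext looks different

plain_ratio = len(zlib.compress(paragraph)) / len(paragraph)
cipher_ratio = len(zlib.compress(encrypted[0])) / len(encrypted[0])
```

The plaintext copies collapse to a single stored item and compress to a small fraction of their size, while the twelve ciphertexts stay distinct and do not shrink. This is why de-duplication and compression, where they are wanted, generally have to happen before encryption.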

Conclusion

The security concerns around storing data in the cloud are not inherently unique compared to data that is stored within the premises of an organisation.  That is not to say that the risks to the data in these very different environments are alike.  Ultimately, the key is to understand the data protection options you have available and implement a sound encryption strategy for protecting your sensitive data.