Decrypting Open Office documents


Updated 2008-12-12: David LeBlanc has now published the corrected version of the Office Crypto documentation, so I'm making my sample code available under a Creative Commons Share-Alike licence.

Office Open XML decryption code

At the end of June Microsoft released many documents which aim to provide normative information about the various protocols and file formats used by Microsoft products. Among these documents is MS-OFFCRYPTO, which provides information about the algorithms used to encrypt Microsoft Office documents (new and old) and which is available under the terms of Microsoft's Open Specification Promise.

The really useful information in this document is in the algorithms described in sections 2.3.4.7 and 2.3.4.9.  Section 2.3.4.7 describes, in pseudocode, a function to generate an encryption key that can be used to decrypt the encrypted package (the Office 2007 zip file).  Section 2.3.4.9 describes the algorithm to use to determine whether the supplied password is valid.

This post is to provide and document the code I've created to decrypt an encrypted Office 2007 document.

Initially I was unsuccessful in my attempt to write code to decrypt encrypted Office documents and I have to thank David LeBlanc for his help and patience guiding me to a solution.  David wrote the MS-OFFCRYPTO document so I've been very fortunate to have had such an expert guide. 

Update: David LeBlanc has blogged that a new version of MS-OFFCRYPTO is available and corrects the information making the following comment redundant.

Right now I'm not allowed to disclose a key piece of information David provided so I can't post the code yet but as soon as I can, I will.  However I think I will be able to provide a helpful tip that will allow a reader to work through the pseudo-algorithm documented in section 2.3.4.7 and create workable code on their own (see below).  I've also added dumps of key arrays created during the key generation process so you can tell whether your code is working.  Using the set of "input" arrays below and using the given password you should be able to follow along and check your code generates the same byte arrays at each step.

Reading the MS-OFFCRYPTO document

There's a lot of stuff in the MS-OFFCRYPTO document which is necessary in theory, so Microsoft *has* to document it, but which is overkill when considering the needs of just decrypting an Office 2007 document.

While a normal Office 2007 document is a zip file, an encrypted document is an OLE storage file.  While an unencrypted Office 2007 document can be manipulated using the System.IO.Packaging.Package class, encrypted documents must be handled using the Windows API Storage interfaces and functions.  The code that will be attached contains a class that wraps these interfaces and functions so you will be able to open and access the file contents from C#.
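A quick way to tell the two containers apart is to look at the first few bytes of the file. The sketch below is not part of the downloadable code; the class and method names are made up for illustration. It relies on two well-known signatures: zip files start with 'P' 'K', and OLE compound files start with the eight bytes D0 CF 11 E0 A1 B1 1A E1. An OLE signature only says the file is a storage file, not that it is definitely an encrypted package.

```csharp
using System;
using System.IO;
using System.Linq;

static class ContainerSniffer   // hypothetical helper, not from the download
{
    // Signature found at the start of every OLE compound (storage) file.
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static string Classify(byte[] header)
    {
        if (header.Length >= 8 && header.Take(8).SequenceEqual(OleSignature))
            return "ole-storage";   // possibly an encrypted package
        if (header.Length >= 2 && header[0] == (byte)'P' && header[1] == (byte)'K')
            return "zip";           // an ordinary, unencrypted package
        return "unknown";
    }

    public static string ClassifyFile(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var header = new byte[8];
            int read = stream.Read(header, 0, header.Length);
            return Classify(header.Take(read).ToArray());
        }
    }

    static void Main(string[] args)
    {
        if (args.Length == 1)
            Console.WriteLine(ClassifyFile(args[0]));
        else
            Console.WriteLine("usage: ContainerSniffer <file>");
    }
}
```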

The reason the file is a storage file rather than a zip file seems to be that the Office 2007 team tried to use the Microsoft DRM to implement Office encryption.  Microsoft's DRM technology stores both the payload (the encrypted document) and other information describing the encryption algorithms and other transforms used to obfuscate the payload.  A DRM compliant application can use this information to decode the payload (assuming the application knows the password or license).  As a result the DRM technology allows a producer application to use arbitrary encryption algorithms to create an encrypted payload and describe these algorithms so that a capable consumer can decode the payload.

However, Office 2007 doesn't really use the whole DRM infrastructure when encrypting Office documents.  Presumably DRM is used so that if a company wants to use the DRM infrastructure to encrypt documents using some proprietary algorithm they can. 

In some senses the Office team implements a proprietary encryption mechanism but, for some reason, they chose to do so in a way that is not (cannot be?) described in DRM compliant terms.  A measure of the impact of this approach is that the System.IO.Packaging.Package class is unable to open an encrypted Office file.

On the plus side, it does mean there's no need to plough through all the DRM encryption/transform descriptions.  Instead, you can take a shortcut and read just two streams from the storage file and ignore the rest!  The streams to read are EncryptionInfo and EncryptedPackage.

EncryptionInfo stream

The code reads the storage file to access this stream, which is parsed to retrieve information such as the encryption and hashing algorithms to use, key sizes and various byte blocks used to verify the password.

Although the contents of this stream are documented in MS-OFFCRYPTO, to facilitate understanding of the structure, a hex dump of the content of a sample encryption info stream is included in the comments at the beginning of the code file and reproduced below.

The sample content in the dump is taken from a .xlsb storage file encrypted using the password "password" (without the quotes).

The code

The code is a single C# class with a single entry point which can be called as:

Package OfficeCrypto.OpenEncryptedOfficeFile(string filename, string password);

It takes the name of the encrypted file and the password used to encrypt it, and returns a System.IO.Packaging.Package instance.  If there are errors along the way - for example the password may be wrong - it will throw exceptions you need to catch.

I've tried to document it reasonably thoroughly and reference relevant parts of sections 2.3.4.7 and 2.3.4.9. In fact, the code is deliberately not optimized so that its structure and the variable names used can closely resemble the algorithm descriptions. Refactoring this code may make it appreciably quicker.

Because there is a lot of code, I've added *lots* of "regions" so you can start by collapsing all regions then "drill-in" to areas of the code to get a good idea of the structure and functions available.

AES and SHA1 implementations

Being managed code it uses the managed code implementations of SHA1 and AES (in the System.Security.Cryptography namespace). To verify these implementations return expected values when used, there are two test functions: one to test SHA1 and one to test AES. The test functions use known inputs and verify the results against expected return values. These test values are taken from articles on Wikipedia and links to these articles are included in the code comments.
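To give an idea of what such self-tests can look like, here is a minimal sketch. It is not the test code from the download, and the vectors are my own choice rather than the Wikipedia ones mentioned above: the SHA1 digest of "abc" from the FIPS 180 examples, and the AES-128 example from Appendix C.1 of FIPS 197. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class CryptoSelfTest   // hypothetical name
{
    static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    static byte[] FromHex(string hex)
    {
        return Enumerable.Range(0, hex.Length / 2)
                         .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
                         .ToArray();
    }

    // SHA1("abc") checked against the published FIPS 180 example digest.
    public static bool Sha1Ok()
    {
        using (var sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.ASCII.GetBytes("abc"));
            return Hex(digest) == "a9993e364706816aba3e25717850c26c9cd0d89d";
        }
    }

    // One AES-128 block checked against the FIPS 197 Appendix C.1 example.
    public static bool AesOk()
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = FromHex("000102030405060708090a0b0c0d0e0f");
            byte[] plain = FromHex("00112233445566778899aabbccddeeff");
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
                return Hex(cipher) == "69c4e0d86a7b0430d8cdb78070b4c55a";
            }
        }
    }

    static void Main()
    {
        Console.WriteLine("SHA1 ok: " + Sha1Ok());
        Console.WriteLine("AES ok:  " + AesOk());
    }
}
```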

Generating the encryption key

This operation is performed by the function GeneratePasswordHashUsingSHA1 and is the heart of the code. It's also the piece of this code that I could not get to work without David LeBlanc's insight.

The clue I think I'm OK providing is that you need to ignore the strictures of step 3 of section 2.3.4.7 and include step 4(a)  even if the algorithm you are instructed to use is AES128. 

Update: Some of the following comments are no longer required because the new version of MS-OFFCRYPTO is updated to include the same comments.

It seems that, by default, Office 2007 documents are encrypted using AES128 and the encryption key is generated using SHA1.  AES128 needs a 16 (0x10) byte key (128/8).  SHA1 will always generate a 20 (0x14) byte hash.  As I read it, step 3 of 2.3.4.7 says that when the hash is longer than the key that is needed (which it is when AES128 and SHA1 are used) you just take the first 16 (0x10) bytes of the SHA1 hash.  However this doesn't work for me.

Including step 4(a) of section 2.3.4.7 and using the first 16 (0x10) bytes of the hash generated by this step does work for me.  Maybe it will work for you as well.

The other clue is that you should not try to use the CryptoAPI and should instead use the Rijndael (or AES) managed class (in the System.Security.Cryptography namespace).  When you read the documentation about the EncryptionInfo stream contents you will see that the codes defining the encryption and hashing algorithms to use are exactly those defined in WinCrypt.h which might lead you toward the CryptoAPI.

However, although it's not stated, the encryption/decryption algorithms do not use padding.  Maybe it's me, but I can't figure out how to use the CryptoAPI without a padding mode.  So far as I can tell, whenever a block cipher like AES is used the CryptoAPI will *always* use PKCS5 padding.  When I try to use any other padding mode the API always returns an error.

By default the Rijndael (or AES) managed implementation also uses PKCS5-style padding (PaddingMode.PKCS7 in .NET) so you need to explicitly set a padding mode of None.  But at least this is an option with the managed code implementation.
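As an illustration of switching padding off in managed code, here is a small sketch using the Aes class from System.Security.Cryptography. The helper name is made up, and the use of ECB mode is an assumption based on my reading of the specification rather than something stated in this post. With PaddingMode.None the input length has to be a whole number of 16 byte blocks.

```csharp
using System;
using System.Security.Cryptography;

static class NoPaddingAes   // hypothetical helper, not from the download
{
    // Runs a single AES pass over data with padding switched off.
    // data.Length must be a multiple of 16 because nothing pads it for us.
    public static byte[] Transform(byte[] key, byte[] data, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;        // assumption: ECB, per my reading of the spec
            aes.Padding = PaddingMode.None;   // the default would be PaddingMode.PKCS7
            aes.Key = key;
            using (ICryptoTransform transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
            {
                return transform.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    static void Main()
    {
        var key = new byte[16];     // all-zero demo key
        var blocks = new byte[32];  // two blocks of demo data
        byte[] cipher = Transform(key, blocks, true);
        byte[] plain = Transform(key, cipher, false);
        // With no padding the ciphertext is exactly as long as the input.
        Console.WriteLine("{0} bytes in, {1} bytes encrypted, {2} bytes decrypted",
                          blocks.Length, cipher.Length, plain.Length);
    }
}
```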

Hex dumps

While the documentation in MS-OFFCRYPTO does include descriptions of algorithms, it doesn't contain any conformance data so when the decryption process doesn't work and you need to debug your code it's not possible to tell which aspect of the decryption process hasn't worked correctly.  The hex dumps below are provided to try and provide a simple form of conformance data.

!! HEALTH WARNING !!

These dumps are provided as examples and "as-is". They work for me though there's no saying they will work for you so I'm making no promises as to their fitness for any specific purpose.

Let me first reproduce the hex dump I include in the comments of the code.  This is a dump of the EncryptionInfo stream created when I password protected an Office 2007 workbook using the password "password".

/// 00000000  03 00 02 00                                      Version
/// 00000004  24 00 00 00                                      Flags (fCryptoAPI + fAES)
/// 00000008  A4 00 00 00                                      Header length
/// 0000000C  24 00 00 00                                      Flags (again)
/// 00000010  00 00 00 00                                      Size extra
/// 00000014  0E 66 00 00                                      Alg ID
///           AlgID 0x0000660E = 128-bit AES,
///           AlgID 0x0000660F = 192-bit AES,
///           AlgID 0x00006610 = 256-bit AES
/// 00000018  04 80 00 00                                      Alg hash ID 0x00008004 SHA1
/// 0000001C  80 00 00 00                                      Key size
///           0x00000080 = 128-bit
///           0x000000C0 = 192-bit
///           0x00000100 = 256-bit
/// 00000020  18 00 00 00                                      Provider type 0x00000018 AES
/// 00000024  A0 C7 DC 02 00 00 00 00                          Reserved
/// 0000002C  4D 00 69 00                                      M.i.  CSP Name
/// 00000030  63 00 72 00 6F 00 73 00 6F 00 66 00 74 00 20 00  c.r.o.s.o.f.t. .
/// 00000040  45 00 6E 00 68 00 61 00 6E 00 63 00 65 00 64 00  E.n.h.a.n.c.e.d.
/// 00000050  20 00 52 00 53 00 41 00 20 00 61 00 6E 00 64 00   .R.S.A. .a.n.d.
/// 00000060  20 00 41 00 45 00 53 00 20 00 43 00 72 00 79 00   .A.E.S. .C.r.y.
/// 00000070  70 00 74 00 6F 00 67 00 72 00 61 00 70 00 68 00  p.t.o.g.r.a.p.h.
/// 00000080  69 00 63 00 20 00 50 00 72 00 6F 00 76 00 69 00  i.c. .P.r.o.v.i.
/// 00000090  64 00 65 00 72 00 20 00 28 00 50 00 72 00 6F 00  d.e.r. .(.P.r.o.
/// 000000A0  74 00 6F 00 74 00 79 00 70 00 65 00 29 00 00 00  t.o.t.y.p.e.)...
///
/// 000000B0  10 00 00 00                                      Salt size
/// 000000B4  90 AC 68 0E 76 F9 43 2B 8D 13 B7 1D              Salt
/// 000000C0  B7 C0 FC 0D
/// 000000C4  43 8B 34 B2 C6 0A A1 E1 0C 40 81 CE              Encrypted verifier
/// 000000D0  83 78 F4 7A
/// 000000D4  14 00 00 00                                      Hash length
/// 000000D8  48 BF F0 D6 C1 54 5C 40                          EncryptedVerifierHash
/// 000000E0  FE 7D 59 0F 8A D7 10 B4 C5 60 F7 73 99 2F 3C 8F
/// 000000F0  2C F5 6F AB 3E FB 0A D5

OK, it may not look quite the same as in the code file because WordPress strips whitespace padding, though I hope you can still easily see the structure.

This stream tells the code to use AES 128 (offset 0x00000014) and SHA1 (offset 0x00000018). It also specifies the salt size to use (offset 0x000000B0) and the salt used when encrypting the document (offset 0x000000B4-0x000000C3). The 16 (0x10) byte encrypted verifier at (offset 0x000000C4) and the 32 (0x20) byte encrypted hash of the verifier at (offset 0x000000D8) are for use when verifying the password (see below).
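The layout in the dump can be walked with a BinaryReader. The following is only a sketch of how the fields in the sample line up, not the parser from the download. The type and member names are invented, and the fixed sizes of the encrypted verifier (16 bytes) and the encrypted verifier hash (32 bytes) are assumptions taken from the sample above.

```csharp
using System;
using System.IO;
using System.Text;

sealed class EncryptionInfoSketch   // hypothetical type, not from the download
{
    public uint AlgId, AlgIdHash, KeySizeBits, ProviderType, VerifierHashSize;
    public string CspName;
    public byte[] Salt, EncryptedVerifier, EncryptedVerifierHash;

    public static EncryptionInfoSketch Parse(byte[] stream)
    {
        var info = new EncryptionInfoSketch();
        using (var reader = new BinaryReader(new MemoryStream(stream)))
        {
            reader.ReadUInt32();                        // version (03 00 02 00)
            reader.ReadUInt32();                        // flags
            uint headerLength = reader.ReadUInt32();    // 0xA4 in the sample
            long headerStart = reader.BaseStream.Position;

            reader.ReadUInt32();                        // flags (again)
            reader.ReadUInt32();                        // size extra
            info.AlgId = reader.ReadUInt32();           // 0x660E = 128-bit AES
            info.AlgIdHash = reader.ReadUInt32();       // 0x8004 = SHA1
            info.KeySizeBits = reader.ReadUInt32();     // 0x80 = 128 bits
            info.ProviderType = reader.ReadUInt32();
            reader.ReadUInt64();                        // reserved (8 bytes)

            // Whatever is left of the header is the UTF-16 CSP name.
            int nameBytes = (int)(headerLength - (reader.BaseStream.Position - headerStart));
            info.CspName = Encoding.Unicode.GetString(reader.ReadBytes(nameBytes)).TrimEnd('\0');

            uint saltSize = reader.ReadUInt32();        // 0x10 in the sample
            info.Salt = reader.ReadBytes((int)saltSize);
            info.EncryptedVerifier = reader.ReadBytes(16);      // assumption: always 16 bytes
            info.VerifierHashSize = reader.ReadUInt32();        // 0x14 for SHA1
            info.EncryptedVerifierHash = reader.ReadBytes(32);  // assumption: 32 bytes, as in the sample
        }
        return info;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: pass the path of an extracted EncryptionInfo stream");
            return;
        }
        EncryptionInfoSketch info = Parse(File.ReadAllBytes(args[0]));
        Console.WriteLine("AlgID 0x{0:X8}, hash 0x{1:X8}, key size {2} bits, CSP '{3}'",
                          info.AlgId, info.AlgIdHash, info.KeySizeBits, info.CspName);
    }
}
```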

The first step is to hash the salt and password. In this case it's a 16 byte salt, in the order it appears in the stream, followed by 16 bytes of password (the little-endian UTF-16 representation of "password", so 'p' is 70 00 rather than 00 70). The result is the following 20 (0x14) byte hash:

00000000 A1 21 9D 6D 2D 77 A1 92 EA 2F A2 E6 E3 7B C8 60
00000010 CF EF 5F DE
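A minimal sketch of this first step, assuming the salt is used in stream order and the password is encoded with Encoding.Unicode (UTF-16 little-endian). The class and method names are hypothetical. Running it with the sample salt gives you something to compare against the dump above.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class InitialHashStep   // hypothetical name
{
    public static byte[] InitialHash(byte[] salt, string password)
    {
        // Encoding.Unicode is UTF-16 little-endian: 'p' becomes 70 00, not 00 70.
        byte[] passwordBytes = Encoding.Unicode.GetBytes(password);
        using (var sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(salt.Concat(passwordBytes).ToArray());
        }
    }

    static void Main()
    {
        // Salt copied from offset 0xB4 of the sample EncryptionInfo dump.
        byte[] salt =
        {
            0x90, 0xAC, 0x68, 0x0E, 0x76, 0xF9, 0x43, 0x2B,
            0x8D, 0x13, 0xB7, 0x1D, 0xB7, 0xC0, 0xFC, 0x0D
        };
        byte[] h0 = InitialHash(salt, "password");
        Console.WriteLine(BitConverter.ToString(h0).Replace('-', ' '));
    }
}
```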

The algorithm then has to iterate from 0..49999 concatenating the iteration number (4 bytes) and the previous hash result to generate a new hash. After the zeroth iteration (i==0) this is what I see:

00000000 8B 33 F7 48 FA 35 AF BB 34 22 E8 AC D7 C6 DA E1
00000010 8A F1 81 78

At the end of the iteration (after hashing with i==49999) I see:

00000000 7D C5 97 D9 01 2A A3 E0 B8 56 3B 56 69 00 06 10
00000010 CC C3 A6 D4
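The loop can be sketched like this. It assumes the 4 byte iteration number is written least significant byte first and placed in front of the previous hash, which is how I read the description above. The names are hypothetical, and the starting value in Main is simply the first hash from the earlier dump, so you can compare the output after one and after 50000 iterations with the dumps.

```csharp
using System;
using System.Security.Cryptography;

static class HashIterationStep   // hypothetical name
{
    public static byte[] Iterate(byte[] h0, int spinCount)
    {
        byte[] hash = h0;
        var buffer = new byte[4 + h0.Length];
        using (var sha1 = SHA1.Create())
        {
            for (int i = 0; i < spinCount; i++)
            {
                // Iteration number first, least significant byte first...
                buffer[0] = (byte)(i & 0xFF);
                buffer[1] = (byte)((i >> 8) & 0xFF);
                buffer[2] = (byte)((i >> 16) & 0xFF);
                buffer[3] = (byte)((i >> 24) & 0xFF);
                // ...then the hash produced by the previous round.
                Buffer.BlockCopy(hash, 0, buffer, 4, hash.Length);
                hash = sha1.ComputeHash(buffer);
            }
        }
        return hash;
    }

    static void Main()
    {
        // First hash (salt + password) as shown in the earlier dump.
        byte[] h0 =
        {
            0xA1, 0x21, 0x9D, 0x6D, 0x2D, 0x77, 0xA1, 0x92, 0xEA, 0x2F,
            0xA2, 0xE6, 0xE3, 0x7B, 0xC8, 0x60, 0xCF, 0xEF, 0x5F, 0xDE
        };
        Console.WriteLine(BitConverter.ToString(Iterate(h0, 1)).Replace('-', ' '));
        Console.WriteLine(BitConverter.ToString(Iterate(h0, 50000)).Replace('-', ' '));
    }
}
```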

Next the last hash generated by the iterator has to be hashed with four zero bytes (what the documentation calls "block 0"). In the iterator the hash is appended to the iterator count then hashed. Here the four zero bytes are appended to the hash. Anyway here's my result.

00000000 A6 65 59 03 FD 23 94 C8 83 1E 71 62 D7 8B 42 55
00000010 51 B9 14 E4
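For comparison with the loop, this step puts the 4 byte block number after the hash rather than before it. A small sketch, with hypothetical names and the block number written least significant byte first (which makes no difference for block 0):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

static class FinalHashStep   // hypothetical name
{
    public static byte[] FinalHash(byte[] iteratedHash, uint block)
    {
        byte[] blockBytes =
        {
            (byte)(block & 0xFF), (byte)((block >> 8) & 0xFF),
            (byte)((block >> 16) & 0xFF), (byte)((block >> 24) & 0xFF)
        };
        using (var sha1 = SHA1.Create())
        {
            // Hash first, block number second: the reverse of the iteration step.
            return sha1.ComputeHash(iteratedHash.Concat(blockBytes).ToArray());
        }
    }

    static void Main()
    {
        // Hash after the last iteration, as shown in the earlier dump.
        byte[] iterated =
        {
            0x7D, 0xC5, 0x97, 0xD9, 0x01, 0x2A, 0xA3, 0xE0, 0xB8, 0x56,
            0x3B, 0x56, 0x69, 0x00, 0x06, 0x10, 0xCC, 0xC3, 0xA6, 0xD4
        };
        Console.WriteLine(BitConverter.ToString(FinalHash(iterated, 0)).Replace('-', ' '));
    }
}
```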

One of the clues given above is to include step 4(a) of the key derivation algorithm in all cases. After this step I see:

00000000 AC 7C 92 51 7C 31 2F B0 9F E9 32 E9 C0 62 D9 12
00000010 38 29 30 35

AES128 needs a 16 (0x10) byte key so take just the first 16 (0x10) bytes:

00000000 AC 7C 92 51 7C 31 2F B0 9F E9 32 E9 C0 62 D9 12
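Step 4(a) is not spelled out in this post, so the sketch below is my reading of the published specification and should be treated as an assumption: fill a 64 byte buffer with 0x36, XOR the hash from the previous step into the start of it and hash the buffer; do the same again with 0x5C; join the two results and keep as many bytes as the key needs. For AES128 only the first 16 bytes of the first result end up being used. The names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

static class KeyFromHashStep   // hypothetical name
{
    // Assumption (my reading of step 4): 64 bytes of 'fill', with the hash
    // XORed into the leading bytes, then hashed with SHA1.
    static byte[] HashPadded(byte[] hFinal, byte fill)
    {
        byte[] buffer = Enumerable.Repeat(fill, 64).ToArray();
        for (int i = 0; i < hFinal.Length; i++)
        {
            buffer[i] ^= hFinal[i];
        }
        using (var sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(buffer);
        }
    }

    public static byte[] DeriveKey(byte[] hFinal, int keySizeBits)
    {
        byte[] x1 = HashPadded(hFinal, 0x36);
        byte[] x2 = HashPadded(hFinal, 0x5C);
        // 40 bytes are available; AES128 only needs the first 16.
        return x1.Concat(x2).Take(keySizeBits / 8).ToArray();
    }

    static void Main()
    {
        // Hash produced by the "block 0" step, as shown in the earlier dump.
        byte[] hFinal =
        {
            0xA6, 0x65, 0x59, 0x03, 0xFD, 0x23, 0x94, 0xC8, 0x83, 0x1E,
            0x71, 0x62, 0xD7, 0x8B, 0x42, 0x55, 0x51, 0xB9, 0x14, 0xE4
        };
        Console.WriteLine(BitConverter.ToString(DeriveKey(hFinal, 128)).Replace('-', ' '));
    }
}
```

If your reading of the specification matches mine, the bytes printed should agree with the 16 byte key shown above.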

Now you have a key that should successfully decrypt the payload (though don't forget the first 8 bytes specify the length of the unencrypted content and should be removed before the payload is decrypted).
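Handling the payload might look something like the sketch below. It is not the routine from the download: the names are invented, ECB mode is an assumption from my reading of the specification, and BitConverter is assumed to be running on a little-endian machine. The demo in Main builds a tiny fake EncryptedPackage stream so the code can be run without a real document.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

static class PackageDecryptor   // hypothetical name
{
    // Single AES pass with padding off (assumption: ECB mode).
    public static byte[] Ecb(byte[] key, byte[] data, int offset, int count, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (var transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
            {
                return transform.TransformFinalBlock(data, offset, count);
            }
        }
    }

    public static byte[] DecryptPackage(byte[] encryptedPackage, byte[] key)
    {
        // First 8 bytes: length of the unencrypted content.
        long length = BitConverter.ToInt64(encryptedPackage, 0);
        // Only whole 16 byte blocks can be decrypted when padding is off.
        int cipherLength = (encryptedPackage.Length - 8) / 16 * 16;
        byte[] plain = Ecb(key, encryptedPackage, 8, cipherLength, false);
        // Drop the filler bytes after the real content.
        return plain.Take((int)Math.Min(length, plain.Length)).ToArray();
    }

    static void Main()
    {
        var key = new byte[16];        // all-zero demo key
        var content = new byte[32];    // 20 real bytes plus 12 filler bytes
        content[0] = (byte)'P';
        content[1] = (byte)'K';
        byte[] sample = BitConverter.GetBytes(20L)
                                    .Concat(Ecb(key, content, 0, content.Length, true))
                                    .ToArray();
        byte[] result = DecryptPackage(sample, key);
        Console.WriteLine("{0} bytes, starts with {1}{2}",
                          result.Length, (char)result[0], (char)result[1]);
    }
}
```

With a real document, the bytes returned can be wrapped in a MemoryStream and handed to Package.Open once you have checked they start with 'P' 'K'.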

However you can follow the algorithm in 2.3.4.9 to verify the key you've generated.

The first step is to decode the 16 (0x10) byte encrypted verifier which starts at 000000C4 in the hex dump of the encryption info stream above. After using the RijndaelManaged cipher, setting a padding mode of None and specifying a block size of 16 (0x10) bytes or 128 (0x80) bits I see the following decrypted verifier:

00000000 11 92 99 99 FF 00 11 11 22 33 77 88 88 99 CC CC

Using the same cipher and decrypting the verifier hash at 000000D8 in the hex dump above I get:

00000000 A6 D5 6B D6 51 2C E2 01 AC 0E 82 E1 EE 43 79 32
00000010 6D 1C 1C BB

The final step is to hash the decrypted verifier (generated in the step before last) using SHA1.

00000000 A6 D5 6B D6 51 2C E2 01 AC 0E 82 E1 EE 43 79 32
00000010 6D 1C 1C BB

The last two results should be identical (and in my case they are) which confirms the key is valid and can be used to decode the payload.
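Putting the verification steps together, a sketch might look like this. Again the names are invented and ECB mode is an assumption from my reading of the specification. Main feeds in the key, encrypted verifier and encrypted verifier hash from the dumps above so you can see whether the check agrees with what I describe.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

static class PasswordVerifier   // hypothetical name
{
    // Single AES pass with padding off (assumption: ECB mode).
    public static byte[] Ecb(byte[] key, byte[] data, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (var transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
            {
                return transform.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    public static bool VerifyKey(byte[] key, byte[] encryptedVerifier, byte[] encryptedVerifierHash)
    {
        byte[] verifier = Ecb(key, encryptedVerifier, false);
        // 32 bytes are stored, but only the first 20 are the SHA1 hash.
        byte[] storedHash = Ecb(key, encryptedVerifierHash, false).Take(20).ToArray();
        using (var sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(verifier).SequenceEqual(storedHash);
        }
    }

    static void Main()
    {
        // Values copied from the dumps in this post.
        byte[] key =
        {
            0xAC, 0x7C, 0x92, 0x51, 0x7C, 0x31, 0x2F, 0xB0,
            0x9F, 0xE9, 0x32, 0xE9, 0xC0, 0x62, 0xD9, 0x12
        };
        byte[] encryptedVerifier =
        {
            0x43, 0x8B, 0x34, 0xB2, 0xC6, 0x0A, 0xA1, 0xE1,
            0x0C, 0x40, 0x81, 0xCE, 0x83, 0x78, 0xF4, 0x7A
        };
        byte[] encryptedVerifierHash =
        {
            0x48, 0xBF, 0xF0, 0xD6, 0xC1, 0x54, 0x5C, 0x40,
            0xFE, 0x7D, 0x59, 0x0F, 0x8A, 0xD7, 0x10, 0xB4,
            0xC5, 0x60, 0xF7, 0x73, 0x99, 0x2F, 0x3C, 0x8F,
            0x2C, 0xF5, 0x6F, 0xAB, 0x3E, 0xFB, 0x0A, 0xD5
        };
        Console.WriteLine("Key verified: " + VerifyKey(key, encryptedVerifier, encryptedVerifierHash));
    }
}
```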

And that, as they say, is all there is to it.


Reader Comments

Thanks. It’s worked for me.

Do you know a way of determining whether the user has set a Modify password as well as an Open password for a docx?

Giles, take a look at the \settings\documentProtection element which is a root element of WordprocessingML Document Settings part.

See section 2.15.1.28 on page 1158 of “Office Open XML Part 4 - Markup Language Reference”

Hi, im trying to decrypt office documents for fun but i cannot pass over the first step.

I get
SALT={0x0D, 0xFC, 0xC0, 0xB7, 0x1D, 0xB7, 0x13, 0x8D, 0x2B, 0x43, 0xF9, 0x76, 0x0E, 0x68, 0xAC, 0x90};
and
PASSWORD={0x00, 0x70, 0x00, 0x61, 0x00, 0x73, 0x00, 0x73, 0x00, 0x77, 0x00, 0x6f, 0x00, 0x72, 0x00, 0x64};

concat them and hash the result, but i dont get the result you had doing, presumably, the same, wich is:
00000000 A1 21 9D 6D 2D 77 A1 92 EA 2F A2 E6 E3 7B C8 60
00000010 CF EF 5F DE

I dont know where the mistake is.
Thanks.

Olav, I’ve updated the article to include the code. Hopefully you will be able to use the code to decrypt the document then compare what I do and what you are doing to identify any differences.

Good luck

Thanks, i didnt considered to encode the password as ‘reversed’ unicode
0x70 0x00 instead of 0x00 0x70.
Thanks.

Well done.

But I’m having a hard time decrypting a file with NO password.
I’ve read the specification (part 2.15.1.28) and microsoft says: “First, the password shall be hashed using the following algorithm: If the password is empty, return 0.”

I tried, unsuccessfully:
- with password = “0”
- with encrypted key = byte[0]
- with encrypted key = byte[1]{0}
- with GeneratePasswordHashUsingSHA1 which returns “0”
- with GeneratePasswordHashUsingSHA1 which returns byte[0]
- with GeneratePasswordHashUsingSHA1 which returns byte[1]{0}
- …

I’m lost, what should I do?

If there’s no password, you don’t unencrypt the file. What I mean is that the file will not be encrypted without a password. Instead you just read the contents.

Thanks for your answer but the file seems to be encrypted.
I can get the encryption information and when I get the table of bytes and I cut the data (the first 8 bytes and the end bytes), Package.Open() returns an exception.

Source code added in method “DecryptInternal”:
long length = BitConverter.ToInt64(encryptedPackage, 0);
if (!string.IsNullOrEmpty(password))
{
    // Encryption key generation
    // Password verification
    // Decrypt
}
else
{
    Array.Copy(encryptedPackage, 8, encryptedPackage, 0, encryptedPackage.Length - 8);
}

byte[] result = encryptedPackage;
if (encryptedPackage.Length > length)
{
    result = new byte[length];
    Array.Copy(encryptedPackage, result, result.Length);
}

return result;

I’ve just encrypted an Excel 2007 file but supplied no password. The file saved by Excel is not encrypted. I can tell this because the first two bytes of the file are ‘P’ and ‘K’, the signature of a Zip file. Encrypted files are OLE storage files and start with the 8 byte signature for these files.

If you have an encrypted file then either it was encrypted by something other than an Office product (why would my version of Office behave differently to yours) or the file is not encrypted.

Sorry, but I forgot to say that I protected the workbook with password but I didn’t set password protection (read/write) on this file.

My file is encrypted because I’m using OLE Storage. Since I didn’t set a password on the file, there is no way to know the desencryption key to decode the file content (I try with the password of workbook protection).

Thanks a lot

Hi J-B

This comment is being added to the blog post so anyone else reading the comments will be able to see if the same solution works.

J-B’s spreadsheet was encrypted but used the ‘internal’ or default password of VelvetSweatshop. This password is used by Excel 2003 when it encrypts a document. It seems to be retained in Excel 2007 when it saves a .xls to a .xlsx or .xlsm. As a result, the workbook is encrypted but a password is not required when it is opened in Excel 2007.

However this is because Excel 2007 is silently passing this password to the decryption algorithm it uses. Because Excel does it, we must also do it.

When I use the default password to decrypt J-B’s workbook it works as expected.

[…] on Bill Seddon’s rather excellent notes I’ve implemented decrypting MS XML documents, though only tested it on .xlsx files protected […]

Hi, I successfully decrypted the OOXML file but I get an exception whenever I try to read anything from the decrypted package. I get “Unable to perform a read operation in write-only mode”. The file has read/write privileges and the Package.FileOpenAccess value is “Read | Write”. Any ideas?

Thanks, Joe

Hi Joe,

How do you know you decrypted the stream successfully? It could be that the call to decrypt completed without error but the resulting stream contained garbage. OOXML documents are .zip files so if the decryption is successful the first two characters of the stream you try to open should be ‘PK’. What do you see?

It could also be that the encrypted package is a binary file (the Office 2007 equivalent of the 2003 file format), which can’t be opened with the packaging API (because these are OLE Storage files).

If the first two bytes are not PK then check to see if the first 8 bytes are the OLE storage signature bytes.

Hi Bills,

True speaking am a bit lost with MS-OFFCRYPTO is it a program or somethg that we need to modify, i dont know

Can u please help me urgently; as i have an excel doc which has been encrypted in excel 2007 n i have forgotten the password.

i hve downloaded the file liquidity.oleStorage.dll where should i use it or if better i send the file an d decrypt the password.

Hi

MS-OFFCRYPTO is Microsoft’s paper specification of the encryption methods used in Office products.

The downloads on the blog are an implementation of some of the specifications using C# and are only useful to other programmers working with Microsoft’s .NET framework.

You don’t say if you are programmer.

However, it likely will not matter. You will not be able to decrypt a 2007 workbook encrypted using the 2007 encryption options if you have lost or forgotten the password.

Bill

Hi Bill,

I was thinking that may be you will be able to help for my excel document as it is very important for me lots of data will lost.

Can you suggest me a software that i can u to decrypt my excel doc.

This looks like just what I’m looking for, but the code I download doesn’t have the OfficeCrypto.OpenEncryptedOfficeFile entry point described. Am I missing some code?
Thanks