Decrypting Office Open XML documents


Office Open XML decryption code

At the end of June Microsoft released many documents which aim to provide normative information about the various protocols and file formats used by Microsoft products. Among these documents is MS-OFFCRYPTO, which describes the algorithms used to encrypt Microsoft Office documents (new and old) and which is available under the terms of Microsoft’s Open Specification Promise.

The really useful information in this document is the pair of algorithms described in sections 2.3.4.7 and 2.3.4.9. Section 2.3.4.7 describes, in pseudocode, how to generate an encryption key that can be used to decrypt the encrypted package (the Office 2007 zip file). Section 2.3.4.9 describes the algorithm used to determine whether the supplied password is valid.

This post provides and documents the code I’ve created to decrypt an encrypted Office 2007 document.

Initially I was unsuccessful in my attempt to write code to decrypt encrypted Office documents and I have to thank David LeBlanc for his help and patience guiding me to a solution.  David wrote the MS-OFFCRYPTO document so I’ve been very fortunate to have had such an expert guide. 

Update 2008-12-12: David LeBlanc has now published the corrected version of the Office Crypto documentation so I’m making my sample code available under a Creative Commons Share-Alike licence.

Update 2012-04-04: @Webie has created a C implementation of the password validation routines for use on Linux using OpenSSL and libgsf (to read OLE Storage files). At this time, support is provided for 2007 and 2010 Agile encryption. You will find his implementation here: https://github.com/magnumripper/magnum-jumbo

Reading the MS-OFFCRYPTO document

There’s a lot of stuff in the MS-OFFCRYPTO document which is necessary in theory, so Microsoft *has* to document it, but which is overkill when considering the needs of just decrypting an Office 2007 document.

While a normal Office 2007 document is a zip file, an encrypted document is an OLE storage file. A normal Office 2007 document can be manipulated using the System.IO.Packaging.Package class, but encrypted documents must be handled using the Windows API Storage interfaces and functions. The attached code contains a class that wraps these interfaces and functions so you will be able to open and access the file contents from C#.

The reason the file is a storage file rather than a zip file seems to be that the Office 2007 team tried to use Microsoft DRM to implement Office encryption. Microsoft’s DRM technology stores both the payload (the encrypted document) and other information describing the encryption algorithms and other transforms used to obfuscate the payload. A DRM-compliant application can use this information to decode the payload (assuming the application knows the password or license). As a result, the DRM technology allows a producer application to use arbitrary encryption algorithms to create an encrypted payload and to describe these algorithms so that a capable consumer can decode the payload.

However, Office 2007 doesn’t really use the whole DRM infrastructure when encrypting Office documents.  Presumably DRM is used so that if a company wants to use the DRM infrastructure to encrypt documents using some proprietary algorithm they can. 

In some senses the Office team implemented a proprietary encryption mechanism but, for some reason, they chose to do so in a way that is not (cannot be?) described in DRM-compliant terms. One measure of the impact of this approach is that the System.IO.Packaging classes are unable to open an encrypted Office file.

On the plus side, it does mean there’s no need to plough through all the DRM encryption/transform descriptions. Instead, you can take a shortcut, read just two streams from the storage file and ignore the rest! The streams to read are EncryptionInfo and EncryptedPackage.

EncryptionInfo stream

The code reads the storage file to access this stream, which is parsed to retrieve information such as the encryption and hashing algorithms to use, key sizes and various byte blocks used to verify the password.

Although the contents of this stream are documented in MS-OFFCRYPTO, to facilitate understanding of the structure a hex dump of a sample EncryptionInfo stream is included in the comments at the beginning of the code file and reproduced below.

The sample content in the dump is taken from a .xlsb storage file encrypted using the password “password” (without the quotes).

The code

The code is a single C# class with a single entry point which can be called as:

Package OfficeCrypto.OpenEncryptedOfficeFile(string filename, string password);

It takes the name of the encrypted file and the password used to encrypt it, and returns a System.IO.Packaging.Package instance. If there are errors along the way (for example, the password may be wrong) it will throw exceptions you need to catch.

I’ve tried to document it reasonably thoroughly and to reference the relevant parts of sections 2.3.4.7 and 2.3.4.9. In fact, the code is deliberately not optimized, so that its structure and variable names closely resemble the algorithm descriptions. Refactoring this code may make it appreciably quicker.

Because there is a lot of code, I’ve added *lots* of “regions” so you can start by collapsing all regions then “drill-in” to areas of the code to get a good idea of the structure and functions available.

AES and SHA1 implementations

Because it is managed code, it uses the managed implementations of SHA1 and AES (in the System.Security.Cryptography namespace). To verify that these implementations return the expected values, there are two test functions: one to test SHA1 and one to test AES. The test functions use known inputs and check the results against expected values. These test values are taken from articles on Wikipedia, and links to these articles are included in the code comments.
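A minimal sketch of this kind of self-test. The vectors here are not necessarily the ones used in the author's code: SHA1("abc") is the standard FIPS 180 example and the AES-128 block comes from FIPS-197 Appendix C.1. The class and method names (CryptoSelfTest, TestSha1, TestAes128) are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical self-test class; names are illustrative only.
public static class CryptoSelfTest
{
    // Converts a hex string such as "0A1B" into the bytes { 0x0A, 0x1B }.
    public static byte[] FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Formats bytes as upper-case hex with no separators.
    public static string ToHex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", "");

    // SHA1("abc") must equal the published FIPS 180 digest.
    public static bool TestSha1()
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.ASCII.GetBytes("abc"));
            return ToHex(digest) == "A9993E364706816ABA3E25717850C26C9CD0D89D";
        }
    }

    // One AES-128 block from FIPS-197 Appendix C.1, encrypted in ECB mode with no padding.
    public static bool TestAes128()
    {
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = FromHex("000102030405060708090A0B0C0D0E0F");
            byte[] plain = FromHex("00112233445566778899AABBCCDDEEFF");
            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            {
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
                return ToHex(cipher) == "69C4E0D86A7B0430D8CDB78070B4C55A";
            }
        }
    }
}
```

Running both checks once at start-up is cheap and rules out a misconfigured cipher before any Office-specific step is debugged.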

Generating the encryption key

This operation is performed by the function GeneratePasswordHashUsingSHA1 and is the heart of the code. It’s also the piece of the code I could not get to work without David LeBlanc’s insight.

The clue I think I’m OK providing is that you need to ignore the strictures of step 3 of section 2.3.4.7 and include step 4(a) even if the algorithm you are instructed to use is AES128.

Update: Some of the following comments are no longer required because the revised version of MS-OFFCRYPTO now includes the same information.

It seems that, by default, Office 2007 documents are encrypted using AES128 and the encryption key is generated using SHA1. AES128 requires a 16 (0x10) byte key (128/8). SHA1 always generates a 20 (0x14) byte hash. Step 3 of 2.3.4.7 says that if the hash is longer than the required key (which it is when AES128 and SHA1 are used), just take the first 16 (0x10) bytes of the SHA1 hash. However, this doesn’t work for me.

Including step 4(a) of section 2.3.4.7 and using the first 16 (0x10) bytes of the hash generated by this step does work for me.  Maybe it will work for you as well.

The other clue is that you should not try to use the CryptoAPI and should instead use the Rijndael (or AES) managed class (in the System.Security.Cryptography namespace).  When you read the documentation about the EncryptionInfo stream contents you will see that the codes defining the encryption and hashing algorithms to use are exactly those defined in WinCrypt.h which might lead you toward the CryptoAPI.

However, although it’s not stated, the encryption/decryption algorithms do not use padding. Maybe it’s me, but I can’t figure out how to use the CryptoAPI without a padding mode. So far as I can tell, whenever a block cipher like AES is used the CryptoAPI will *always* use PKCS5 padding. When I try to use any other padding mode the API always returns an error.

By default the managed Rijndael (or AES) implementations also use PKCS-style padding (PaddingMode.PKCS7), so you need to explicitly set a padding mode of None. But at least this is an option with the managed code implementation.
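A minimal sketch of an AES helper with padding switched off, using the Aes class rather than RijndaelManaged (with a 128-bit block size the two compute the same cipher). ECB mode is an assumption: the post does not state the cipher mode. The class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class; the name and the ECB mode are assumptions.
public static class AesNoPadding
{
    // Creates an AES instance in ECB mode with padding disabled.
    static Aes CreateAes(byte[] key)
    {
        Aes aes = Aes.Create();
        aes.Mode = CipherMode.ECB;
        aes.Padding = PaddingMode.None; // the default would be PaddingMode.PKCS7
        aes.Key = key;
        return aes;
    }

    // Decrypts whole 16-byte blocks; no padding bytes are checked or removed.
    public static byte[] DecryptEcb(byte[] key, byte[] data)
    {
        if (data.Length % 16 != 0)
            throw new ArgumentException("Length must be a multiple of the 16-byte AES block size.");
        using (Aes aes = CreateAes(key))
        using (ICryptoTransform decryptor = aes.CreateDecryptor())
            return decryptor.TransformFinalBlock(data, 0, data.Length);
    }

    // Matching encryptor for round-trip tests; it does not append a padding block.
    public static byte[] EncryptEcb(byte[] key, byte[] data)
    {
        if (data.Length % 16 != 0)
            throw new ArgumentException("Length must be a multiple of the 16-byte AES block size.");
        using (Aes aes = CreateAes(key))
        using (ICryptoTransform encryptor = aes.CreateEncryptor())
            return encryptor.TransformFinalBlock(data, 0, data.Length);
    }
}
```

With PaddingMode.PKCS7 left in place, the decryptor would try to interpret the last byte of the final block as a padding length and usually throw, which is the behaviour the post describes.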

Hex dumps

While the documentation in MS-OFFCRYPTO does include descriptions of the algorithms, it doesn’t contain any conformance data, so when the decryption process doesn’t work and you need to debug your code, it’s not possible to tell which step of the process has gone wrong. The hex dumps below are provided as a simple form of conformance data.

!! HEALTH WARNING !!

These dumps are provided as examples and “as-is”. They work for me though there’s no saying they will work for you so I’m making no promises as to their fitness for any specific purpose.

Let me first reproduce the hex dump I include in the comments of the code. This is a dump of the EncryptionInfo stream created when I password protected an Office 2007 workbook using the password “password”.

/// 00000000 03 00 02 00 Version
/// 00000004 24 00 00 00 Flags (fCryptoAPI + fAES)
/// 00000008 A4 00 00 00 Header length
/// 0000000C 24 00 00 00 Flags (again)
/// 00000010 00 00 00 00 Size extra
/// 00000014 0E 66 00 00 Alg ID
///                      AlgID 0x0000660E = 128-bit AES,
///                      AlgID 0x0000660F = 192-bit AES,
///                      AlgID 0x00006610 = 256-bit AES
/// 00000018 04 80 00 00 Alg hash ID 0x00008004 SHA1
/// 0000001C 80 00 00 00 Key size
///                      0x00000080 = 128-bit
///                      0x000000C0 = 192-bit
///                      0x00000100 = 256-bit
/// 00000020 18 00 00 00 Provider type 0x00000018 AES
/// 00000024 A0 C7 DC 02 00 00 00 00 Reserved
/// 0000002C 4D 00 69 00 M?i? CSP Name
/// 00000030 63 00 72 00 6F 00 73 00 6F 00 66 00 74 00 20 00 c?r?o?s?o?f?t? ?
/// 00000040 45 00 6E 00 68 00 61 00 6E 00 63 00 65 00 64 00 E?n?h?a?n?c?e?d?
/// 00000050 20 00 52 00 53 00 41 00 20 00 61 00 6E 00 64 00  ?R?S?A? ?a?n?d?
/// 00000060 20 00 41 00 45 00 53 00 20 00 43 00 72 00 79 00  ?A?E?S? ?C?r?y?
/// 00000070 70 00 74 00 6F 00 67 00 72 00 61 00 70 00 68 00 p?t?o?g?r?a?p?h?
/// 00000080 69 00 63 00 20 00 50 00 72 00 6F 00 76 00 69 00 i?c? ?P?r?o?v?i?
/// 00000090 64 00 65 00 72 00 20 00 28 00 50 00 72 00 6F 00 d?e?r? ?(?P?r?o?
/// 000000A0 74 00 6F 00 74 00 79 00 70 00 65 00 29 00 00 00 t?o?t?y?p?e?)
///
/// 000000B0 10 00 00 00 Salt size
/// 000000B4 90 AC 68 0E 76 F9 43 2B 8D 13 B7 1D Salt
/// 000000C0 B7 C0 FC 0D
/// 000000C4 43 8B 34 B2 C6 0A A1 E1 0C 40 81 CE Encrypted verifier
/// 000000D0 83 78 F4 7A
/// 000000D4 14 00 00 00 Hash length
/// 000000D8 48 BF F0 D6 C1 54 5C 40 EncryptedVerifierHash
/// 000000E0 FE 7D 59 0F 8A D7 10 B4 C5 60 F7 73 99 2F 3C 8F
/// 000000F0 2C F5 6F AB 3E FB 0A D5

OK, it doesn’t look quite the same as in the code file because WordPress removes all the nice whitespace padding though I hope you can still easily see the structure.

This stream tells the code to use AES 128 (offset 0x00000014) and SHA1 (offset 0x00000018). It also specifies the salt size (offset 0x000000B0) and the salt used when encrypting the document (offsets 0x000000B4-0x000000C3). The 16 (0x10) byte encrypted verifier at offset 0x000000C4 and the 32 (0x20) byte encrypted hash of the verifier at offset 0x000000D8 are used when verifying the password (see below).
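A minimal sketch of reading these fields with BinaryReader, which reads integers in little-endian order, matching the byte order in the dump. It assumes the AES layout shown above (a 16-byte encrypted verifier and a 32-byte encrypted verifier hash) and skips the reserved fields and CSP name using the header length. The class name StandardEncryptionInfo is hypothetical, and this layout does not apply to the XML-based version 4.4 streams mentioned in the comments.

```csharp
using System;
using System.IO;

// Hypothetical container for the EncryptionInfo fields shown in the hex dump.
public sealed class StandardEncryptionInfo
{
    public ushort VersionMajor, VersionMinor;
    public uint Flags, AlgId, AlgIdHash, KeySizeBits, ProviderType, VerifierHashSize;
    public byte[] Salt, EncryptedVerifier, EncryptedVerifierHash;

    public static StandardEncryptionInfo Parse(byte[] stream)
    {
        var info = new StandardEncryptionInfo();
        using (var reader = new BinaryReader(new MemoryStream(stream)))
        {
            info.VersionMajor = reader.ReadUInt16();          // 0x00: 03 00
            info.VersionMinor = reader.ReadUInt16();          // 0x02: 02 00
            info.Flags = reader.ReadUInt32();                 // 0x04: fCryptoAPI + fAES
            uint headerLength = reader.ReadUInt32();          // 0x08: 0xA4 in the dump
            long headerStart = reader.BaseStream.Position;    // 0x0C

            reader.ReadUInt32();                              // flags (again)
            reader.ReadUInt32();                              // size extra
            info.AlgId = reader.ReadUInt32();                 // 0x660E = AES-128
            info.AlgIdHash = reader.ReadUInt32();             // 0x8004 = SHA1
            info.KeySizeBits = reader.ReadUInt32();           // 0x80 = 128 bits
            info.ProviderType = reader.ReadUInt32();          // 0x18 = AES provider

            // Skip the reserved fields and the CSP name using the header length.
            reader.BaseStream.Position = headerStart + headerLength;

            uint saltSize = reader.ReadUInt32();
            info.Salt = reader.ReadBytes((int)saltSize);
            info.EncryptedVerifier = reader.ReadBytes(16);
            info.VerifierHashSize = reader.ReadUInt32();      // 0x14 = 20 bytes for SHA1
            // Assumes AES: the 20-byte hash is stored encrypted as two 16-byte blocks.
            info.EncryptedVerifierHash = reader.ReadBytes(32);
        }
        return info;
    }
}
```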

The first step is to hash the salt followed by the password. In this case that’s a 16 byte salt and 16 bytes of password (the little-endian UTF-16 representation of “password”, so “p” is 0x70 0x00). The result is the following 20 (0x14) byte hash:

00000000 A1 21 9D 6D 2D 77 A1 92 EA 2F A2 E6 E3 7B C8 60
00000010 CF EF 5F DE
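A minimal sketch of this first step: H0 = SHA1(salt + password), with the password encoded as UTF-16LE (in .NET, Encoding.Unicode). The class name InitialHash is hypothetical. The salt should be passed in the order its bytes appear in the stream (90 AC 68 0E ... for the dump above).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; name is illustrative only.
public static class InitialHash
{
    // H0 = SHA1(salt + UTF-16LE(password)).
    public static byte[] Compute(byte[] salt, string password)
    {
        byte[] pwd = Encoding.Unicode.GetBytes(password); // "p" becomes 0x70 0x00
        byte[] input = new byte[salt.Length + pwd.Length];
        Buffer.BlockCopy(salt, 0, input, 0, salt.Length);
        Buffer.BlockCopy(pwd, 0, input, salt.Length, pwd.Length);
        using (SHA1 sha1 = SHA1.Create())
            return sha1.ComputeHash(input);
    }
}
```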

The algorithm then has to iterate from 0..49999, each time concatenating the iteration number (4 bytes, little-endian) and the previous hash result and hashing that to generate a new hash. After the zeroth iteration (i==0) this is what I see:

00000000 8B 33 F7 48 FA 35 AF BB 34 22 E8 AC D7 C6 DA E1
00000010 8A F1 81 78

At the end of the iteration (after hashing with i==49999) I see:

00000000 7D C5 97 D9 01 2A A3 E0 B8 56 3B 56 69 00 06 10
00000010 CC C3 A6 D4
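A minimal sketch of the iteration: Hn = SHA1(iterator + Hn-1) for iterator = 0..49999. The iterator is written as a 4-byte little-endian integer (assumed to follow the same byte order as the other integers in the stream) and placed before the previous hash. The class name HashIteration is hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; name is illustrative only.
public static class HashIteration
{
    // Applies Hn = SHA1(LE32(i) + Hn-1) for i = 0 .. iterations - 1.
    public static byte[] Iterate(byte[] h0, int iterations = 50000)
    {
        if (h0.Length != 20)
            throw new ArgumentException("Expected a 20-byte SHA1 hash.");
        byte[] hash = h0;
        byte[] buffer = new byte[24]; // 4-byte iterator + 20-byte hash
        using (SHA1 sha1 = SHA1.Create())
        {
            for (int i = 0; i < iterations; i++)
            {
                buffer[0] = (byte)i;
                buffer[1] = (byte)(i >> 8);
                buffer[2] = (byte)(i >> 16);
                buffer[3] = (byte)(i >> 24);
                Buffer.BlockCopy(hash, 0, buffer, 4, 20);
                hash = sha1.ComputeHash(buffer);
            }
        }
        return hash;
    }
}
```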

Next the last hash generated by the iterator has to be hashed with four zero bytes (what the documentation calls “block 0”). In the iterator, the hash is appended to the iteration count and then hashed. Here, the four zero bytes are appended to the hash. Anyway, here’s my result:

00000000 A6 65 59 03 FD 23 94 C8 83 1E 71 62 D7 8B 42 55
00000010 51 B9 14 E4
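A minimal sketch of this step: H_final = SHA1(Hn + block), where the block number (0 here) is appended after the hash as a 4-byte little-endian integer, the reverse of the order used inside the iterator. The class name FinalHash is hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; name is illustrative only.
public static class FinalHash
{
    // Hfinal = SHA1(Hn + LE32(block)); note the block number goes AFTER the hash.
    public static byte[] Compute(byte[] hn, uint block = 0)
    {
        byte[] blockBytes = BitConverter.GetBytes(block);
        if (!BitConverter.IsLittleEndian) Array.Reverse(blockBytes);
        byte[] input = new byte[hn.Length + 4];
        Buffer.BlockCopy(hn, 0, input, 0, hn.Length);
        Buffer.BlockCopy(blockBytes, 0, input, hn.Length, 4);
        using (SHA1 sha1 = SHA1.Create())
            return sha1.ComputeHash(input);
    }
}
```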

One of the clues given above is to include step 4(a) of the key derivation algorithm in all cases. After this step I see:

00000000 AC 7C 92 51 7C 31 2F B0 9F E9 32 E9 C0 62 D9 12
00000010 38 29 30 35

AES128 uses a 128-bit (16 (0x10) byte) key, so take just the first 16 (0x10) bytes:

00000000 AC 7C 92 51 7C 31 2F B0 9F E9 32 E9 C0 62 D9 12
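A minimal sketch of the key derivation, assuming that "step 4(a)" corresponds to the first half of the CryptDeriveKey-style derivation in MS-OFFCRYPTO: fill a 64-byte buffer with 0x36, XOR H_final into its first 20 bytes and hash it (X1). The same is done with 0x5C to get X2, and the key is the leading bytes of X1 + X2. For AES-128 that is simply the first 16 bytes of X1, matching the truncation above. The mapping to the older "4(a)" numbering and the class name KeyDerivation are assumptions.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; the step mapping is an assumption (see lead-in).
public static class KeyDerivation
{
    // key = first keySizeBits / 8 bytes of X1 + X2.
    public static byte[] DeriveKey(byte[] hFinal, int keySizeBits)
    {
        byte[] x1 = HashXorBuffer(hFinal, 0x36);
        byte[] x2 = HashXorBuffer(hFinal, 0x5C);
        byte[] x3 = new byte[x1.Length + x2.Length];
        Buffer.BlockCopy(x1, 0, x3, 0, x1.Length);
        Buffer.BlockCopy(x2, 0, x3, x1.Length, x2.Length);
        int keyBytes = keySizeBits / 8;
        if (keyBytes > x3.Length)
            throw new ArgumentException("Key size too large for SHA1-based derivation.");
        byte[] key = new byte[keyBytes];
        Buffer.BlockCopy(x3, 0, key, 0, keyBytes);
        return key;
    }

    // Fills 64 bytes with 'fill', XORs hFinal into the start, and hashes the buffer.
    public static byte[] HashXorBuffer(byte[] hFinal, byte fill)
    {
        byte[] buffer = new byte[64];
        for (int i = 0; i < buffer.Length; i++) buffer[i] = fill;
        for (int i = 0; i < hFinal.Length; i++) buffer[i] ^= hFinal[i];
        using (SHA1 sha1 = SHA1.Create())
            return sha1.ComputeHash(buffer);
    }
}
```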

Now you have a key that should successfully decrypt the payload (though don’t forget that the first 8 bytes of the EncryptedPackage stream specify the length of the unencrypted content and should be removed before the payload is decrypted).
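A minimal sketch of decrypting the EncryptedPackage stream with the derived key: read the 8-byte little-endian size, decrypt the remainder with AES in ECB mode without padding, then trim to the declared size. ECB mode, decrypting the whole remainder in one pass, and ignoring a trailing partial block are assumptions for the AES/SHA1 scheme in the dump; the Agile scheme discussed in the comments processes the data differently. The class name PackageDecryptor is hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; name and mode are assumptions (see lead-in).
public static class PackageDecryptor
{
    public static byte[] Decrypt(byte[] key, byte[] encryptedPackage)
    {
        // First 8 bytes: size of the decrypted package (BitConverter assumes a little-endian host).
        long size = BitConverter.ToInt64(encryptedPackage, 0);
        int cipherLength = encryptedPackage.Length - 8;
        cipherLength -= cipherLength % 16; // assumption: ignore any trailing partial block
        if (size < 0 || size > cipherLength)
            throw new ArgumentException("Declared size does not fit in the encrypted data.");

        byte[] plain;
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (ICryptoTransform decryptor = aes.CreateDecryptor())
                plain = decryptor.TransformFinalBlock(encryptedPackage, 8, cipherLength);
        }

        // Trim the block padding so the result is exactly the original zip package.
        byte[] result = new byte[size];
        Buffer.BlockCopy(plain, 0, result, 0, (int)size);
        return result;
    }
}
```

If the key is right, the first two bytes of the result should be “PK”, the zip signature.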

However, before decrypting, you can follow the algorithm in 2.3.4.9 to verify the key you’ve generated.

The first step is to decrypt the 16 (0x10) byte encrypted verifier, which starts at 000000C4 in the hex dump of the EncryptionInfo stream above. After using the RijndaelManaged cipher, setting a padding mode of None and specifying a block size of 16 (0x10) bytes, or 128 (0x80) bits, I see the following decrypted verifier:

00000000 11 92 99 99 FF 00 11 11 22 33 77 88 88 99 CC CC

Using the same cipher to decrypt the encrypted verifier hash at 000000D8 in the hex dump above, I get the following (only the first 20 (0x14) bytes, the hash length given at 000000D4, matter):

00000000 A6 D5 6B D6 51 2C E2 01 AC 0E 82 E1 EE 43 79 32
00000010 6D 1C 1C BB

The final step is to hash the decrypted verifier (generated in the step before last) using SHA1.

00000000 A6 D5 6B D6 51 2C E2 01 AC 0E 82 E1 EE 43 79 32
00000010 6D 1C 1C BB

The last two results should be identical (and in my case they are) which confirms the key is valid and can be used to decode the payload.
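A minimal sketch of the whole verification: decrypt the verifier and the verifier hash with the derived key, hash the verifier with SHA1 and compare it with the first 20 bytes of the decrypted hash. As in the earlier steps, AES in ECB mode without padding is assumed, and the class name PasswordVerifier is hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper; names are illustrative, and ECB mode is an assumption.
public static class PasswordVerifier
{
    // True when SHA1(decrypted verifier) equals the leading bytes of the decrypted verifier hash.
    public static bool IsValid(byte[] key, byte[] encryptedVerifier,
                               byte[] encryptedVerifierHash, int verifierHashSize = 20)
    {
        byte[] verifier = DecryptEcb(key, encryptedVerifier);          // 16 bytes
        byte[] decryptedHash = DecryptEcb(key, encryptedVerifierHash); // 32 bytes, first 20 significant
        byte[] computed;
        using (SHA1 sha1 = SHA1.Create())
            computed = sha1.ComputeHash(verifier);
        return computed.SequenceEqual(decryptedHash.Take(verifierHashSize));
    }

    // AES-ECB decryption with padding disabled.
    static byte[] DecryptEcb(byte[] key, byte[] data)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (ICryptoTransform decryptor = aes.CreateDecryptor())
                return decryptor.TransformFinalBlock(data, 0, data.Length);
        }
    }
}
```

A wrong password yields a different key, so the decrypted verifier and hash no longer agree and IsValid returns false.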

And that, as they say, is all there is to it.


Reader Comments

Thanks. It’s worked for me.

Do you know a way of determining whether the user has set a Modify password as well as an Open password for a docx?

Giles, take a look at the \settings\documentProtection element which is a root element of WordprocessingML Document Settings part.

See section 2.15.1.28 on page 1158 of “Office Open XML Part 4 – Markup Language Reference”.

Hi, I’m trying to decrypt Office documents for fun but I cannot get past the first step.

I get
SALT={0x0D, 0xFC, 0xC0, 0xB7, 0x1D, 0xB7, 0x13, 0x8D, 0x2B, 0x43, 0xF9, 0x76, 0x0E, 0x68, 0xAC, 0x90};
and
PASSWORD={0x00, 0x70, 0x00, 0x61, 0x00, 0x73, 0x00, 0x73, 0x00, 0x77, 0x00, 0x6f, 0x00, 0x72, 0x00, 0x64};

I concatenate them and hash the result, but I don’t get the result you got doing, presumably, the same thing, which is:
00000000 A1 21 9D 6D 2D 77 A1 92 EA 2F A2 E6 E3 7B C8 60
00000010 CF EF 5F DE

I don’t know where the mistake is.
Thanks.

Olav, I’ve updated the article to include the code. Hopefully you will be able to use the code to decrypt the document then compare what I do and what you are doing to identify any differences.

Good luck

Thanks, I hadn’t considered encoding the password as ‘reversed’ (little-endian) Unicode:
0x70 0x00 instead of 0x00 0x70.
Thanks.

Well done.

But I’m having a hard time decrypting a file with NO password.
I’ve read the specification (part 2.15.1.28) and Microsoft says: “First, the password shall be hashed using the following algorithm: If the password is empty, return 0.”

I tried, unsuccessfully:
– with password = “0”
– with encrypted key = byte[0]
– with encrypted key = byte[1]{0}
– with GeneratePasswordHashUsingSHA1 which returns “0”
– with GeneratePasswordHashUsingSHA1 which returns byte[0]
– with GeneratePasswordHashUsingSHA1 which returns byte[1]{0}
– …

I’m lost, what should I do?

If there’s no password, you don’t decrypt the file. What I mean is that the file will not be encrypted without a password. Instead, you just read the contents.

Thanks for your answer, but the file seems to be encrypted.
I can get the encryption information, but when I take the byte array and cut off the data (the first 8 bytes and the trailing bytes), Package.Open() throws an exception.

Source code added in method “DecryptInternal”:
long length = BitConverter.ToInt64(encryptedPackage, 0);
if (!string.IsNullOrEmpty(password))
{
    // Encryption key generation
    // Password verification
    // Decrypt
}
else
{
    Array.Copy(encryptedPackage, 8, encryptedPackage, 0, encryptedPackage.Length - 8);
}

byte[] result = encryptedPackage;
if (encryptedPackage.Length > length)
{
    result = new byte[length];
    Array.Copy(encryptedPackage, result, result.Length);
}

return result;

I’ve just encrypted an Excel 2007 file but supplied no password. The file saved by Excel is not encrypted. I can tell this because the first two bytes of the file are ‘P’ and ‘K’, the signature of a Zip file. Encrypted files are OLE storage files and start with the 8 byte signature for these files.

If you have an encrypted file then either it was encrypted by something other than an Office product (why would my version of Office behave differently to yours?) or the file is not encrypted.

Sorry, but I forgot to say that I protected the workbook with a password but I didn’t set password protection (read/write) on the file.

My file is encrypted because I’m using OLE Storage. Since I didn’t set a password on the file, there is no way to know the decryption key to decode the file content (I tried the workbook protection password).

Thanks a lot

Hi J-B

This comment is being added to the blog post so anyone else reading the comments will be able to see if the same solution works.

J-B’s spreadsheet was encrypted but used the ‘internal’ or default password of VelvetSweatshop. This password is used by Excel 2003 when it encrypts a document. It seems to be retained in Excel 2007 when it saves a .xls as .xlsx or .xlsm. As a result, the workbook is encrypted but a password is not required when it is opened in Excel 2007.

However this is because Excel 2007 is silently passing this password to the decryption algorithm it uses. Because Excel does it, we must also do it.

When I use the default password to decrypt J-B’s workbook it works as expected.
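A minimal sketch of applying this in code: when the caller supplies no password, fall back to the default VelvetSweatshop password before attempting key derivation. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class DefaultPassword
{
    // The built-in password Excel uses for workbooks that are encrypted but open without a prompt.
    public const string VelvetSweatshop = "VelvetSweatshop";

    // Use the supplied password, or fall back to the built-in one when none is given.
    public static string Choose(string supplied) =>
        string.IsNullOrEmpty(supplied) ? VelvetSweatshop : supplied;
}
```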

[...] on Bill Seddon’s rather excellent notes I’ve implemented decrypting MS XML documents, though only tested it on .xlsx files protected [...]

Hi, I successfully decrypted the OOXML file but I get an exception whenever I try to read anything from the decrypted package. I get “Unable to perform a read operation in write-only mode”. The file has read/write privileges and the Package.FileOpenAccess value is “Read | Write”. Any ideas?

Thanks, Joe

Hi Joe,

How do you know you decrypted the stream successfully? It could be that the call to decrypt completed without error but the resulting stream contained garbage. OOXML documents are .zip files so if the decryption is successful the first two characters of the stream you try to open should be ‘PK’. What do you see?

It could also be that the encrypted package is a binary file (the Office 2007 equivalent of the 2003 file format), which can’t be opened with the packaging API because such files are OLE Storage files.

If the first two bytes are not PK then check to see if the first 8 bytes are the OLE storage signature bytes.
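A minimal sketch of that check: a zip package starts with “PK”, while an OLE storage (compound file) starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The enum and class names are hypothetical.

```csharp
using System;

// Hypothetical result type for the check.
public enum OfficeContainer { Zip, OleStorage, Unknown }

public static class ContainerSniffer
{
    // Signature at the start of every OLE compound (structured storage) file.
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static OfficeContainer Detect(byte[] header)
    {
        if (header.Length >= 2 && header[0] == (byte)'P' && header[1] == (byte)'K')
            return OfficeContainer.Zip;
        if (header.Length < OleSignature.Length)
            return OfficeContainer.Unknown;
        for (int i = 0; i < OleSignature.Length; i++)
            if (header[i] != OleSignature[i])
                return OfficeContainer.Unknown;
        return OfficeContainer.OleStorage;
    }
}
```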

Hi Bill,

To be honest, I’m a bit lost with MS-OFFCRYPTO: is it a program or something that we need to modify? I don’t know.

Can you please help me urgently? I have an Excel doc which has been encrypted in Excel 2007 and I have forgotten the password.

I have downloaded the file liquidity.oleStorage.dll. Where should I use it? Or, if it’s better, I could send you the file and you could decrypt the password.

Hi

MS-OFFCRYPTO is Microsoft’s paper specification of the encryption methods used in Office products.

The downloads on the blog are an implementation of some of the specifications using C# and are only useful to other programmers working with Microsoft’s .NET framework.

You don’t say if you are programmer.

However, it likely will not matter. You will not be able to decrypt a 2007 workbook encrypted using the 2007 encryption options if you have lost or forgotten the password.

Bill

Hi Bill,

I was thinking that maybe you would be able to help with my Excel document, as it is very important to me and lots of data will be lost.

Can you suggest software that I can use to decrypt my Excel doc?

This looks like just what I’m looking for, but the code I downloaded doesn’t have the OfficeCrypto.OpenEncryptedOfficeFile entry point described. Am I missing some code?
Thanks

Do you know the encryption and verification algorithms for Office 2010?

@kasumi

It’s the same as described here.

There’s no difference between encryption in 2007 (explicitly mentioned here) and 2010.

However *both* are different to 2003.

Thanks Bill,
I think one minor difference is that the spin count became 100,000 in Office 2010. However, the final verification uses EncryptedVerifierHashInput and EncryptedVerifierHashValue, which are new since 2007. Do you know how those values are implemented, or is there an example anywhere? Thanks a lot.

This is a great project but Office 2010 uses Agile Encryption by default. This means the EncryptionInfo data is actually held in an XML string. I’ve parsed this correctly but have got horribly stuck trying to implement password verification for Agile. Any chance you’d look at implementing this?

@Daniel

Where have you read the term ‘Agile Encryption’? There are two types of encryption available to Office 2007 & 2010: the default and DRM style.

All styles are documented in the MS-OFFCRYPTO document authored by David LeBlanc.

If you are seeing an encryption style which is not implemented here, it’s likely that the administrators have applied a group policy to make Excel/Word/etc. use an alternative encryption style.

An example is the use of ADRMS which uses certificates generated by AD and tied to a Microsoft certificate chain rather than passwords as the basis of encryption.

If DRM-style encryption is used, life is simpler because you are able to use the .NET Packaging API and the EncryptedPackageEnvelope class to decrypt the files.

I’ll also add that the encryption scheme Office uses is extensible, so it may be that the document has been encrypted using a third-party tool. However, for Office to be able to open the document, the structure of the streams and files inside the OLE Storage file MUST follow the rules documented by David LeBlanc.

Hi Bill

Have a look at 2.3.4.10 in the MS-OFFCRYPTO doc (page 39) for Agile Encryption

Incidentally, also under this heading the notes on the EncryptionVersionInfo field state that the struct version will be 4.4. Opening one of these spreadsheets using ooxmlcrypto results in a memory overflow because it’s reading values incorrectly for that version. David LeBlanc’s sample code on codeplex(?) checks for major version 2 or 3 and minor version 2 and just throws an exception if this isn’t met.

I’ve created a password-protected Excel doc on 3 PCs in Excel 2010 and they’ve all been persisted with a v4.4 EncryptionInfo struct. The cipher algorithm is AES (128) and the hash algorithm is still SHA-1, but the implementation is quite different.

Our organisation doesn’t use third party encryption/ADRMS so I had assumed this was the default mechanism for 2010.

Bill, I dug up this forum thread where Microsoft have elaborated on the spec for Agile. The developer who asked the question came right with their explanation; thought you might want to see it.

http://social.msdn.microsoft.com/Forums/en-US/os_binaryfile/thread/5cd6fb36-fec8-4f44-bdfd-3178c11f6768/

A worked example showing decryption using the Office Agile method is implemented in the code attached to this post

http://www.lyquidity.com/devblog/?p=85

I am trying to convert your code into C. Is there a way to jump directly to the EncryptionInfo stream in a file? (Essentially I want to bypass your Storage class.) Thanks!

Sure. The file is in a COM Structured Storage (aka OLE Storage) format. You can use this core Windows API to read the stream. My storage class assembly is a P/Invoke wrapper around this API. The headers are in the Windows SDK and the API is documented on MSDN here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa380015(v=vs.85).aspx

I am trying to port your program to C running under *Linux*, so I won’t have access to the Windows libraries. I have found a library (libgsf) for dealing with OLE streams.

My question is given an encrypted file, is EncryptionInfo the only stream I need to extract in order to test the password?

Thanks again!

@webie

Yes, you need only the EncryptionInfo stream. From this stream you will be able to extract the hashes and salts you need in order to follow the instructions contained in the MS-OFFCRYPTO documentation to test the password and/or generate the decryption key. Of course, you will also need the encrypted payload stream itself.

An Office document also contains a bunch of other storages and streams but you do not need these. They are there to support MS-DRM and describe, in DRM terms, the techniques used to encrypt the payload. For example, in one stream you will see it reference the cipher algorithm used. In this way, an organization is able to implement their own encryption/decryption protocols.

However, this is just noise because all Office documents are encrypted using the information contained in MS-OFFCRYPTO.

Thanks! I was able to implement password checking for Office 2007 using libgsf (for extracting EncryptionInfo) and OpenSSL (for SHA1 and AES) under Linux.

Hello, I need to decrypt a password.
http://zalil.ru/33441501
I need this, and will send you $100 after decryption. All payment methods, Visa, etc.
Need your help! Big thanks!
My E-mail:while@xakep.ru

This is *not* a tool to decrypt passwords. It allows a programmer to decrypt an encrypted Office document *IF YOU ALREADY HAVE THE PASSWORD*.