Reverse Engineering Evernote Penultimate (or: When is a picture not a picture?)

In this post Alex Caithness takes a look at “Penultimate” on the iPad and discovers that a picture paints a thousand words… but only once you work out how that picture is stored.

Recently we came up against an iPad note-taking application by the name of “Penultimate”. It’s actually a really nice app to use: you can create multiple “notebooks”, each with as many pages as you require, on which you can jot down notes, draw diagrams or just doodle with a stylus or your finger. For a user it’s nice, simple and intuitive; for an intrepid forensic analyst, it’s not quite as straightforward…

Penultimate “Hello World” screenshot

After installing the application on a test iPad, creating a couple of notebooks and having a doodle in each of them, I performed an extraction of the device. Poking through the application’s folder, this is the layout of the files which we can acquire with a backup-style acquisition:

com.cocoabox.penultimate
└───Library
    ├───Preferences
    │       com.cocoabox.penultimate.plist
    │
    └───Private Documents
        │   notebookList
        │
        └───notebooks
            │   notebookListBackup
            │
            ├───23FDA4FF-CE5E-4353-8D9A-B0C3E3E8AAE7
            │       100F53F2-BD1F-4046-8104-714E42264DAE
            │
            └───DD47D226-0CB3-460F-A0F9-3A7E2A795B3D
                    BBDC9CD6-2685-4AFF-BCB4-D370F975A3F1
                    BEB76173-1F62-472B-B0DC-F69EF039A59F
                    C1766626-9BD2-44F9-A2DD-D7C4F23367A8

Now, ideally what we’d have liked to see at this point is some image files – a nice friendly JPEG, GIF or PNG or two would have been just lovely. But no, it’s not going to be that easy (and, in fairness, if it had been I wouldn’t be bothering with this blog, would I?).

So what do we actually have? Well, we can see the familiar “Library” folder in which we find the (also familiar) “Preferences” folder which contains a property list, the contents of which are mostly mundane (though we do have application launch counts and dates which may be of interest in some situations). More interesting is the “Private Documents” folder and the files and folders which are contained within it.

In the root of the Private Documents folder is a binary property list file named “notebookList”. It is an ‘NSKeyedArchiver’ plist, which means, as we discussed in a previous blog, it needs some additional processing before we can see the actual structure of the data. Luckily I had PIP and the “ccl_bplist” Python module to hand, both of which can perform that transformation. With the data structure untangled I proceeded to dig around to see what, if anything, we could extract from the “notebookList”.
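
(As a minimal sketch, assuming the ccl_bplist module mentioned above, loading the file gets you to the flat NSKeyedArchiver layout – a table of serialised objects plus a pointer to the root. Exactly how the UID references between those objects are surfaced depends on the version of the module, so treat the last few lines as illustrative rather than a complete ‘untangler’.)

import ccl_bplist

with open("notebookList", "rb") as f:
    plist = ccl_bplist.load(f)

# An NSKeyedArchiver plist is a flat table of serialised objects plus a
# pointer to the root object:
print(plist["$archiver"])     # "NSKeyedArchiver"
objects = plist["$objects"]   # the serialised object table
top = plist["$top"]           # UID reference(s) into that table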

Among some less interesting data, the root object contains a list of the notebook objects which are stored by the device, under the “notebooks” key.

notebookList in PIP

Each of the notebook objects (represented by dictionaries in the property list) contains the keys: ‘name’, ‘title’, ‘pageNames’, ‘created’, ‘modified’, ‘changeCount’, ‘blankPages’, ‘creatingDeviceId’, ‘editingPageName’, ‘pageDataIsCooked’, ‘versionMajor’, ‘versionMinor’, ‘pagePaperStyles’, ‘paperStyle’, ‘imported’, ‘coverColor’ and ‘originalName’.

The first few there are of immediate interest: the values of “title” do indeed contain the titles of the notebooks that I created during my testing, and the “created” and “modified” timestamps also ring true. The values of the “name” keys are familiar as well, albeit for a different reason; in my test data we see the “name” values “DD47D226-0CB3-460F-A0F9-3A7E2A795B3D” and “23FDA4FF-CE5E-4353-8D9A-B0C3E3E8AAE7” – matching the directory names that were extracted underneath the “notebooks” folder (see above). Beyond this, the “pageNames” key contains a list of values which also match the names of the files in each of the “name” directories.

So, with the “notebookList” file we have some useful metadata and a helpful guide to how the other files are organised, but there’s still no sign of the content of the notes themselves. Delving deeper into the folder structure, our next stop is one of the files named in the “pageNames” list mentioned above.

Opening one of the “page” files we find another “NSKeyedArchiver” property list. After unravelling the structure of the file we find a top-level object containing further metadata (including a “blankDate”, which appears to match the “created” timestamp reported in the “notebookList”, and the dimensions of the note) along with a list of “layers”. Each of the “layer” objects (again represented by dictionaries) has keys for the layer’s colour (more on that later), the layer’s dimensions and a list of “layerRects” – sections of the layer where the user has drawn their notes; and that’s where we finally find the image itself.

Sort of.

Structure of the page object

Each of the “layerRects” objects is represented by (and this shouldn’t be a surprise by now) a dictionary. There are two keys: firstly “rect”, which contains a string of two sets of “x,y” co-ordinates: the bounds of this layerRect on the page. The second key is “values”, which requires a little extra explanation. As noted in my previous post on NSKeyedArchiver files, their function is to represent a serialisation of objects; the object under the “values” key is a “CBMutableFloatArray”, which is programmer talk for a list of floating-point numbers. With this in mind we quite reasonably expect to see just that – a list of floating-point numbers; but no, instead we get a data field containing binary data! As nothing else in this data had been straightforward this was disappointing, but by this point it didn’t surprise me. Floating-point numbers (like any numerical data) can be represented quite happily in binary, so I set about trying to turn the binary data into a list of floating-point numbers. I extracted one of these data fields and, with a nice Python one-liner, did just that:

>>> import struct
>>> struct.unpack("<18f", data)
(282.0, 589.0, 5.506037712097168, 281.8666687011719, 588.2999877929688, 5.506037712097168, 281.73333740234375, 587.5999755859375, 5.506037712097168, 281.5999755859375, 586.9000244140625, 5.506037712097168, 281.4666442871094, 586.2000122070312, 5.506037712097168, 281.33331298828125, 585.5, 5.506037712097168)

So now we can see the numbers, but what do they mean? The meaning becomes clearer if we group them into threes:

282.0, 589.0, 5.506037712097168
281.8666687011719, 588.2999877929688, 5.506037712097168
281.73333740234375, 587.5999755859375, 5.506037712097168
281.5999755859375, 586.9000244140625, 5.506037712097168
281.4666442871094, 586.2000122070312, 5.506037712097168
281.33331298828125, 585.5, 5.506037712097168

What we have here are sets of x,y co-ordinates plus a third value (the meaning of which will become clearer later). So here, finally, is our drawing; stored as a list of co-ordinates rather than in any conventional graphics format – combining the points from the “layerRects” in each of the layer objects gives you the full picture.
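
(A minimal sketch of that interpretation – not the exact code used here, but the same idea: unpack a layerRect’s “values” blob as little-endian 32-bit floats and group them into triplets.)

import struct

def unpack_points(data):
    # Each value is a 4-byte little-endian float; group the resulting
    # flat list into (x, y, width) triplets as shown above.
    floats = struct.unpack("<{}f".format(len(data) // 4), data)
    return [floats[i:i + 3] for i in range(0, len(floats), 3)]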

But how to present this data? A list of co-ordinates like this is of little to no use; we want to see the image itself. So I got to thinking: how does the application treat these co-ordinates? When the user draws in the application they are essentially ‘painting’: moving a circular ‘brush’ around the screen to make up the lines in the drawing, so if I plotted a circle at each of these co-ordinates perhaps I would see the picture. This thought process also gave me an idea as to what the third value in each of the co-ordinate groupings might represent: part of what makes the drawings look natural is that the “line” that is drawn is not of uniform width; it grows and shrinks with the speed at which the finger or stylus moves, giving an impression of weight – perhaps the final value relates to this?

So how to plot these co-ordinates? First of all I looked at various Python imaging libraries (PIL and PyGame both came on my radar), but there were issues with both (especially with PIL, which still lacks a proper Py3k port), so I turned my attention to an alternative solution: SVG. SVG (Scalable Vector Graphics) is a graphics format specified by the W3C; it uses XML to describe 2D graphics. Because SVG is just XML, no imaging libraries would be required; I could simply generate textual mark-up describing where each of my circular points should be plotted. Taking the data extracted above as an example, the mark-up would be along the lines of:

<!DOCTYPE svg>
<svg height="865" version="1.1" viewBox="0 0 718 865" 
     width="718" xmlns="http://www.w3.org/2000/svg">
    <circle cx="282.0" cy="589.0" 
            fill="#000000" r="5.506037712097168"/>
    <circle cx="281.8666687011719" cy="588.2999877929688" 
            fill="#000000" r="5.506037712097168"/>
    <circle cx="281.73333740234375" cy="587.5999755859375" 
            fill="#000000" r="5.506037712097168"/>
    <circle cx="281.5999755859375" cy="281.5999755859375" 
            fill="#000000" r="5.506037712097168"/>
    <circle cx="281.4666442871094" cy="586.2000122070312" 
            fill="#000000" r="5.506037712097168"/>
    <circle cx="281.33331298828125" cy="585.5" 
            fill="#000000" r="5.506037712097168"/>
</svg>

Each “circle” tag has an x and y co-ordinate (“cx” and “cy”), a fill colour (here expressed as an HTML colour code – black in this case) and a radius (“r”). Running some tests gave some good output, but things weren’t quite right. Firstly, and most importantly, the image was mirrored in the y-axis, presumably caused by a difference of opinion between Penultimate and SVG as to where the y-axis originates – an easy fix (just subtract the ‘y’ value from the height of the image and use that value instead). Also, the lines in the writing all looked very chubby, making the writing tricky to read. Theorising that this was caused by Penultimate storing diameters where SVG requires radii, I halved the value, which improved things; but comparing the output with what I could see on the screen things still weren’t quite right, so on a whim I halved the value again, which made things look right. (I’m not entirely sure why you have to quarter the value – it may be that Penultimate adds a ‘feathered’ fade to the brush which increases its diameter.)
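
(Putting those corrections together, a sketch of the SVG generation might look like the following – the function name is illustrative, and the quarter-the-value step is, as discussed above, empirical.)

def points_to_svg(points, width, height, colour="#000000"):
    # 'points' is a list of (x, y, w) triplets as unpacked earlier.
    circles = []
    for x, y, w in points:
        cy = height - y   # Penultimate's y-axis runs opposite to SVG's
        r = w / 4.0       # the stored value appears to be ~4x the SVG radius
        circles.append('<circle cx="{0}" cy="{1}" r="{2}" fill="{3}"/>'
                       .format(x, cy, r, colour))
    return ('<!DOCTYPE svg>\n'
            '<svg xmlns="http://www.w3.org/2000/svg" version="1.1" '
            'width="{0}" height="{1}" viewBox="0 0 {0} {1}">\n'
            '  {2}\n'
            '</svg>').format(width, height, "\n  ".join(circles))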

I created a proof-of-concept Python script to check that my thinking was correct and I was pleased to see that the output now matched what I could see on the iPad’s screen, save for the fact that the iPad was in glorious Technicolor and my script’s output was monochrome. I mentioned previously that the “layer” objects contained colour information under the unsurprisingly named “color” key.

The object stored in the “color” key has values for red, green and blue – each one a value between 0 and 1. Readers who have had even the briefest dalliance with graphics programs will be familiar with colour sliders: combining red, green and blue in different proportions to generate different colours – and that is what these values represent. My SVG output was working with HTML colour codes, which are made up of three bytes, each representing an amount of red, green or blue using values from 0x00 to 0xFF. To get colour into my output, all I had to do was multiply 0xFF by the appropriate value from the “layer” object’s colour fields and recombine the results into an HTML colour code. I modified my script and the output now reflected both the form and colour displayed by the app on the iPad.
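
(The conversion itself is a one-liner; something along these lines, with illustrative names again:)

def colour_to_hex(red, green, blue):
    # Penultimate stores each component as 0.0-1.0; HTML wants 0x00-0xFF.
    return "#{:02x}{:02x}{:02x}".format(int(round(red * 0xFF)),
                                        int(round(green * 0xFF)),
                                        int(round(blue * 0xFF)))

# e.g. colour_to_hex(1.0, 0.0, 0.0) gives "#ff0000"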

Comparison between iPad and script output (1)

Comparison between iPad and script output (2)

Working on extracting data from Penultimate was a particularly enjoyable experience: it required combining a number of different concepts and technologies to create a solution that automates the extraction of the data – a solution which, when it all boiled down, was satisfyingly simple.

And as to the question: when is a picture not a picture? Well, quite simply: when it’s a series of serialised floating-point co-ordinate triplets representing points on a page, broken up into rectangles on a layer on a page which is stored in a NSKeyedArchiver property list file (obviously!).

If you have any questions, comments or queries regarding this post, as always you can contact us by dropping an email to research@ccl-forensics.com or by leaving a comment below.

Update: As requested I’ve uploaded the script for research purposes. You can find it here: http://pastebin.com/VYenpXUi 

Alex Caithness, CCL Forensics

New version of PIP (XML and plist parser) coming soon

Our ever-popular XML and plist parsing tool, PIP, is coming of age with a new, improved version.

Currently in beta-test, and with the updated version available free to current PIP users (following its official release, obviously!), we’ll be announcing more details over the coming weeks.

As a teaser, this is what you can expect from the new version (v1.1):

  • Improved GUI – the layout of the application has been updated to improve work-flow
  • New Tree View – view your data graphically to see, at-a-glance, the structure of your data
  • Automatic XPath Building – Now you can use the Tree View to show PIP the data that you are interested in and PIP will generate the XPath automatically. This even works with Apple’s Property List ‘dictionary’ structures.
  • Import/Export Batch Jobs – Set-up a batch job of XPaths for a particular folder structure (iOS Library or Application folders for example) and then export the batch so that you, or anyone else in your lab can re-use it when you next come across the same data
  • Command line version – version 1.1 of PIP comes with the “pipcmd” command-line utility, allowing you to integrate PIP into tool chains and other automated tasks

For more information about XML and plist parsing, please visit http://www.ccl-forensics.com/pip or email us at pip@ccl-forensics.com.

Digital forensic software – grab it while it’s hot!

CCL-Forensics is offering its software at introductory prices for just one more week, so take a look at what’s on offer and squeeze as much into tight budgets as you can.

The tried-and-tested software, developed by analysts, for analysts, has been used extensively in the field by CCL-Forensics’ own investigators and by many other digital investigators from around the world.

From March 31st, prices will be increasing, so take advantage of the lower rates now.

Leading research and development in digital forensics

CCL-Forensics’ research and development team has produced a series of forensic software tools to aid them in digital investigations.

epilog allows investigators to recover deleted data from the widely-used database format, SQLite. Whatever the type of device – computers, mobile phones, SatNavs or others – epilog can be used to recover information, regardless of the type of data stored.

PIP allows analysts to present often-complex data from XML files quickly and efficiently. The tool also parses data from Apple’s property list (plist) files – both in XML and binary format. It can be used to look at computers, mobile phones and SatNavs.

dunk! can uncover potential new web activity evidence from locally-stored web cookies, putting web evidence into context and adding an extra dimension to investigations. It also parses Google Analytics cookies, showing how often, from where, and how a user arrived at a particular site, as well as presenting any search terms used to find the page.

Find out more

For more information about what CCL-Forensics can offer or to purchase the software tools, please visit our website, call us on 01789 261200 or email info@ccl-forensics.com.

Free Python module for processing plist files – get ’em while they’re hot, they’re lovely!

As anyone who has examined an iOS device (or an OSX device for that matter) will know, property list files are a major source of potential evidence. Being one of the main data storage formats they might contain anything from configuration details to browsing history to chat logs.

We’ve examined property lists in detail previously, covering their file formats and the challenges they might present (you can download the white paper here). We have also released our tool PIP, which can be used to parse data from plists and other XML data in a structured way.

As regular readers might have gathered, I’m pretty keen on the Python scripting language. While the built-in libraries do support reading the XML property list format (through the plistlib module), there is no support for the binary property list format, which is increasingly becoming the standard.

Recently I had a task where I needed to parse a large number of binary format property list files inside a script I was writing. It was theoretically possible to have the script export them all to a single location and then perhaps run PIP separately but I really wanted to have everything self-contained and besides, I needed to make use of the data extracted from the plist files later on in the script.

One of the beauties of Python is how quickly you can go from concept to “product”, so I decided that rather than waste time finding a work-around I would craft a proper solution. After a few hours’ work I had a fully functioning module for dealing with binary plist files in Python and today we’re excited to be releasing the code so that other practitioners and coders can make use of it.

You can get the source from our Google Code page (while you’re at it, you can also download our module for dealing with BlackBerry IPD backup files here).

I designed the module to provide the parsed data in as native a format as possible (see Table 1), so when writing your code you do not have to deviate from the normal Pythonic constructs – the data structure returned contains everyday Python objects. The data structure returned is also largely interchangeable with that returned by plistlib’s “readPlist” function (the exception being that plistlib wraps “data” fields in a “Data” object rather than giving direct access to the underlying “bytes” object).

Table 1: Converted data types returned by ccl_bplist

The module only has one function that you need to know about in order to use it: “load()”. This function takes a single argument, which should be a binary file-like object*, and it returns the Python representation of the data in the property list.
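
In practice that means usage is as simple as the following sketch (the file name here is just an example):

import ccl_bplist

with open("com.cocoabox.penultimate.plist", "rb") as f:
    parsed = ccl_bplist.load(f)

# 'parsed' is built from everyday Python objects (dict, list, str, bytes, ...)
print(parsed)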

In addition to the ccl_bplist module, the Google Code repository contains an example of the module in use, parsing the “IconState.plist” from an iOS device auditing the Apps and folders present on the Springboard home screen.

Icon state screenshot

Script results

We really hope that the community will be able to make use of this module and if you have any questions please leave a comment or email us at research@ccl-forensics.com.


*When working from a file on disk you should use the open() function with "b" in the mode string, e.g.: open(file_name, "rb").

There may be other times when the bplist has come from another source, e.g. a BLOB field in a database or even a property list embedded in another plist’s data field. In these cases you can wrap a bytes object in a BytesIO object, found in the “io” module, e.g.: io.BytesIO(some_data).

Parsing XML and Plist files the cool way

You probably know that XML is a common format for storing data. It’s found on a wide range of devices and platforms, including PCs, smartphones and sat navs. XML is text-based, but not often user-friendly when served raw, as you can see:

Not so user friendly...

Because of its non-user-friendly nature, analysts or investigators often have to manually manipulate large amounts of data in order to understand its meaning and structure.

There’s a similar situation with Apple’s property list (plist) files, which can be stored in XML or binary format. Either way, they’re not terribly easy to use.

So – what makes it easier for analysts?

XPath is a query language designed for getting data out of XML files in a structured way, and we’ve developed a piece of software called PIP which takes advantage of this in order to simplify the presentation of the often-complex data stored in these files.

The power to create XPath queries was placed right at the centre of PIP so that if you come up against unfamiliar data PIP empowers you to write a query which you can then reuse. However, being analysts ourselves, we have already encountered a number of situations where we have used PIP and as such, PIP comes preloaded with a substantial library of XPath queries for many common files.

PIP doesn’t just make raw data easier to read, though; it also saves analysts a considerable amount of time. It can be used to parse individual files, but where it really comes into its own is in processing many files at once.

For example: PIP processed 263 Facebook application files from an iPhone image in four seconds, returning 1,800 records (including profile views, chat history, photo views with comments, and URLs).

They say a (moving) picture is worth a thousand words – so take a look at our video and see what PIP can do for you.

Alex Caithness

PIP Developer