Epilog customers: a software tease

Here at CCL-Forensics, we like to tease our software customers from time to time with the promise of future goodies.

The R&D team has been beavering away on a number of projects recently, including making improvements and adjustments to our existing software.

Our epilog users will doubtless be excited to learn that version 1.1 is nearly ready for release. It’s being beta-tested as you read this, so it should soon be winging its way to existing users as a free upgrade, and will be available for new users to purchase.

So what’s new?

Well, first off: epilog 1.1 includes a database rebuilder. For analysts with tools and scripts designed only to operate on live data, this will be a sanity saver. It’s an integrated solution for rebuilding recovered records into a copy of the live database, enabling deleted data to be parsed or processed.

It also allows the user to choose whether to include the current live records, and offers options to disable triggers and remove constraints from the database schema to tailor the rebuild.

We’ve been keeping up with new developments in the world of SQLite. Version 3.7 of the SQLite library introduced a new journal format called the “Write Ahead Log”, or WAL. It differs from the traditional journal mechanism in that new data is written into a separate file when the database engine commits it, rather than the original data being backed up to a rollback journal. The new version of epilog will be able to parse WAL files.

In epilog 1.1 the requirement for an “associated database” when conducting a raw data or disk image search has been removed; instead the user can provide the database page size and text encoding manually (the option to use an associated database is still available for when it’s more convenient). There are also extra options for improving results when reading raw dumps from flash chips.

Epilog 1.1 will now mark in grey any recovered records which are truncated, allowing the user to make more informed decisions about the data. We’ve also improved the signature search algorithm to remove the need for “in the case of multiple concurrent deletion” signatures.

New export modes have been added, allowing users to output to a flat tab-separated values (TSV) file. The “INSERT export” has also been overhauled to make it more convenient to use.

And finally, what was formerly the “Table Analysis” feature has been upgraded to “Database and Table Details” and now reports further information regarding the database structure and parameters.

So, we’ve been pretty busy working on epilog and have taken on board the feedback we’ve received. We’re always happy to receive comments and suggestions, so please feel free to get in touch either by leaving a comment below, or emailing epilog@ccl-forensics.com.

An analyst enthuses about Python. No, not that one. The geeky stuff.

You’ll have to excuse me for a moment while I climb up onto my soapbox because this blog is going to be a preachy one. Today I want to evangelise on a subject very dear to my heart: the scripting language known as Python.

“But I’m not a programmer, Alex, I’m a digital forensic analyst*!”

I know, and I’m not for one moment suggesting that you should be looking at a change of career. But just as EnCase, FTK, TSK, XRY, Cellebrite, Oxygen and their ilk are essential tools of our trade, kept clipped to our utility belts at all times, a scripting language like Python should also feature in the list of tools we are proficient at using.

There are certainly other scripting languages out there, such as Ruby, JavaScript and Perl (a good language, as long as you like code that looks like you’ve held the shift key down and head-butted the keyboard repeatedly), but for me Python has the perfect combination of power, expressiveness and ease of use that makes it so suitable.

“But why should I trouble myself with learning another tool when the off-the-shelf tools do so much?”

The answer is simple: because laziness is a virtue.

Allow me to explain my reasoning: with the best will in the world these tools cannot, and should not, be expected to do everything. When one of these tools has a gap in its capabilities, we are faced with the prospect of completing the task manually. Each such task has a certain level of complexity, time-intensiveness and mundaneness which, according to “Caithness’ Law”, all increase exponentially with proximity to the task’s deadline.

So you grit your teeth, clench your fists and get down to it, derive the solution and pull the requisite all-nighters to get the case out the door. At this point the way I see it is you have three options: you sacrifice a goat to the dark gods of digital forensics in order for this problem to never rear its ugly head again; you resign yourself to a fate of repeating this task until whatever hellish application or system that created this artefact goes out of circulation; or you get lazy and automate the task so that neither you nor any of your colleagues ever have to go through that pain again.

And that’s when it’s so useful to have a scripting language available to you.

I’m not going to attempt to teach you to program in Python in a single blog post as that would be both arrogant and misguided, but I do want to give you an example of a simple Python script I wrote a while back to automate a boring but necessary task that saves me time on a day-to-day basis.

When examining an image of an iOS device, inevitably one of the most interesting areas of the file system is the “mobile/Applications” folder where all the third-party applications store their data. The folder contains a number of folders (one for each app installed) which are named, not with the application’s name, but rather with a UID string.

Applications folder in an iOS device

In order to find out which folder contains which application you have to dive inside each one in turn and look for the “.app” folder which gives you the name of the app.

Inside the applications folder

As you can imagine, even with a modest number of applications this is a needlessly time-consuming exercise and when faced with an iPad belonging to a real app-collector it can put you into a catatonic state. Therefore, to ease the tedium of trawling the application folder I knocked together a little script which would audit the folders automatically.

I can sense that at this stage you’re itching to take a look at some actual, honest-to-goodness Python code, but first let’s consider the algorithm that we want to express. We have a folder full of folders, and inside each of those folders is a folder named “ApplicationName.app” where ApplicationName is the name of (you guessed it) the application. So I would suggest that we want to express an algorithm along the lines of:

  • Accept the path of the “mobile/Applications” directory as input to our script
  • Get a list of the folders held in this directory
  • For each of these folders look inside and find the *.app folder
  • Output the ugly UID folder name alongside the friendly *.app folder name

OK, looks simple enough – let’s see how that looks as a Python script:

The script
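
The original post shows the script as a screenshot; reassembled from the line-by-line walkthrough that follows, it reads like this (the comments here are illustrative stand-ins for the originals):

# Script to audit the application folders on an iOS device, matching
# each UID-named folder to the application it contains

import sys
import os
import os.path

# Get the path of the "mobile/Applications" directory from the command line
root_path = sys.argv[1]

# Look at each of the application folders in turn
for app_folder in os.listdir(root_path):
    # Build the full path to this application's folder
    app_folder_path = os.path.join(root_path, app_folder)
    # Look inside the application's folder for the ".app" folder
    for app_folder_content in os.listdir(app_folder_path):
        if app_folder_content.endswith(".app"):
            print(app_folder_content + ":\t" + app_folder)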

The first thing to note about this script is that there are a lot of lines which begin with a hash symbol (#); these are “comments”. Comments are just notes left in the code by the programmer to help someone reading the script understand it – they are completely ignored when the script is executed. This means that almost half of the listing isn’t Python at all; in fact there are only nine lines of actual code here!

So, we know what algorithm is being expressed here; let’s take a quick look at what the code is doing line by line:

import sys
import os
import os.path

These lines bring extra functionality into our script. Python comes pre-installed with a number of modules which add functionality to your scripts. These modules include regular expressions, hashing, database handling, JSON, decoding of binary data, file archiving and compression and loads more – far too much to list here. If Python loaded all of this cool stuff at the start of every script it would take a long time to get started, so instead we use “import statements” to let Python know which modules we want to use in our script.

So what are we importing? Firstly “sys” contains system-specific functionality, some of it fairly low-level, but we are simply going to use it to get our command line arguments. Next up, “os” contains operating system functionality; in this script we’ll be using it to get a list of a folder’s contents. Finally “os.path” contains functionality for path manipulation; it’s used for joining paths together and checking whether a path leads to a file or a directory.

root_path = sys.argv[1]

“sys.argv” is a list of command line arguments. The number in the square brackets tells us which item in the list we’re interested in. In Python, lists are “zero-indexed” meaning that the first item is numbered “0”; the second is “1” and so on. The first item (index 0) in the “sys.argv” list will be the name of our script, followed by any other arguments we pass to it at the command line. That means that this line gets the first command line argument after the script’s filename and assigns it to a variable “root_path” so that we can use it later in our script.

for app_folder in os.listdir(root_path):

This line starts a loop. There are two types of loop in Python; here we are using the “for” loop, which takes the form:

“for each item in a sequence”

The code inside the loop runs once for each item in the sequence. In our case, the sequence is provided by

os.listdir(root_path)

which gives a list of the contents of the folder we were provided by the command line argument.

One of my favourite things about Python is that good code layout is actually part of the language syntax. If you look at the listing above you can see that the code after the start of our for loop is indented, which means that the indented code takes place inside the loop. If we wanted code to run after the loop has finished we would simply remove the indent at that line.

    app_folder_path = os.path.join(root_path, app_folder)

Later on in the script we’re going to need the full path of the app’s folder so here we use some of the functionality in “os.path” to join the path we were supplied at the command line to the current app’s folder as served up by our for loop. We then store this complete path in a variable named “app_folder_path”:

    for app_folder_content in os.listdir(app_folder_path):

Here’s another for loop. Again we’re using “os.listdir” but this time we’re getting the contents of our current app’s directory, inside which we’re going to look for the “.app” folder.

        if app_folder_content.endswith(".app"):
            print(app_folder_content + ":\t" + app_folder)

Inside this for loop we check each of the files and folders in the application’s directory looking for one which ends with that magic “.app” extension. If we find one, we print the details out to the screen.

And that’s it: just nine lines of straightforward code! Running the script in a command window, we see the following output:

Script output
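
The output takes a form along these lines (the script name, folder names and applications here are hypothetical):

C:\evidence> python audit_apps.py "mobile\Applications"

Facebook.app:	2B1E6EC5-30F5-4D39-9317-6CDA79B1AF3E
Skype.app:	7F2A9C01-55E3-4E11-8D20-0C4B7A26D9E4
eBay.app:	A4D6C2F8-9B10-4E77-B3C5-51E89A07D21B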

This shows us at a glance which application lives in each of the folders – a boring task which never has to be completed by hand again, letting the analyst get on with actually analysing the data.

Obviously there’s scope to automate lots of other tasks, whether it’s parsing raw binary data untouched by other tools, reading information from databases and generating reports, moving files into a folder structure based on their content or any other task which is currently consuming more time than it needs to when performing it by hand. Building up a library of scripts to perform these tasks for you can make you a more efficient, and more importantly, a happier analyst.

If this post has whetted your appetite, you can download the newest version of Python from www.python.org which also gives a number of suggestions for learning resources. You can also download the presentation slides and annotated code examples (which include file reading and writing, parsing cookie files, processing SQLite databases and more) that I presented for F3 last year which relate more directly to digital forensics from here.

I hope this post has encouraged some of you to check Python out, if you have any questions then please leave a comment or you can contact me at acaithness@ccl-forensics.com.

Alex Caithness, Python fan at CCL-Forensics

* Or your preferred synonym.

Absence of evidence is not necessarily evidence of absence…

In what is probably the first published and peer-reviewed research paper of its kind, the title of this blog is, essentially, what I and my colleagues have argued.

Digital Investigation Journal has published our research in its December 2011 edition under the title: “Historic cell site analysis – Overview of principles and survey methodologies”. In the paper, we make a number of scientifically-justified recommendations on how cell site analysis surveys should be carried out.

It is important to note, though, that there will never be a perfect survey technique; survey methods are constantly evolving as technology moves on, and our analysts are continually searching for the best way to gather the appropriate evidence. Not to mention the fact that it very much depends on the case in question and what is required from the survey.

We outlined a number of advantages and disadvantages of several cell site survey techniques and explained the problems that can be encountered when interpreting survey data. These problems can cause practitioners to draw inappropriate conclusions from their results and can end up causing arguments in court.

Location samples

Advantages:

A five-minute static location sample would enable cells with a timing offset of less than five minutes to be selected over others. It is also a quick and efficient method for data collection and analysis.

Disadvantages:

As with spot samples, five-minute location samples showed significant variability in results with small changes of position and between equipment, but to a slightly lesser degree. Even with large amounts of data from a single location, it is still less likely that other legitimately-serving cells will be reselected.

Again, any piece of equipment may not monitor all “valid” cell IDs as serving, and conversely many neighbour cells were detected that never provided service. It would be inappropriate to exclude a cell from serving at a given location using this method because it had not been detected in the survey.

Area surveys

Advantages:

These showed the least variability in results between pieces of equipment.

This method optimises the chance of cell reselection, minimising the effects of restricted BA lists and non-dominance. It is not infallible, but there is a much clearer indication of which cells genuinely provide service at and in the immediate area of a location or property.

Area surveys provide a wider picture of the general network configuration around the area of the location, enabling comment as to possible network changes if further work is required at a later date.

Disadvantages:

It takes longer to generate data and there is more data produced, making it more complicated to analyse – and this adds time (and therefore cost) to the examination.

Large quantities of measurements are not routinely obtained at a single specific location (although this can be done), so cells with a timing offset are less likely to be selected.

More possible cells will be identified (some may be false positives for that specific location, especially if they are “small” cells at the edge of the sampled area). The experiments we conducted suggest that there would be an estimated increase of around 20 per cent because of this. However, they may become relevant to an investigation later because they are detected in the local area even if not at the specific target location.

It is rare that a specific point is highlighted as being where a call was believed to have been made from.

Cell surveys

The advantage of this type of survey for a “normal” voice call or text is that the size of the area served by the cell can be demonstrated, which can be extremely useful in highlighting the limitations of the cell site evidence (if the cell provides service over a large and relevant area) or in emphasising its importance (e.g. if the service area is very small).

With hundreds of data points, absence of a specific cell from the serving and neighbour information can be a good indicator a given cell would not be expected to provide service at the location.

So what’s the best way?

There is no perfect way to conduct cell site analysis surveys, but we need to be sure we can rely on survey data to indicate whether a cell does, or – more importantly – does not, legitimately serve at a location. Different investigators will have their own way of interpreting call record data from phone companies, and we have successfully challenged some of these in court.

For example: if a suspect claims that he was in a particular area at a particular time, and was using his phone at the time, this can be verified by analysing the coverage area of the mast that he claims he was using, or visiting the location to see if it serves there.

However, simply turning up at the scene and taking a “spot sample” to see whether the mast serves that point is not sufficient. We are aware of investigators who do this, but it is quite feasible that a more comprehensive analysis – including, for example, approaching the scene from different directions – could reveal a different picture.

The implications of this could be far reaching in terms of the verdict delivered during a criminal court case – the difference between “guilty” and “not guilty”.

We reviewed a number of survey techniques to determine the most reliable method for collecting RF survey data for historic cell site cases. Results from experiments have demonstrated that area surveys around a location of interest provide the most consistent method for detecting serving cells at a location.

Area surveys were also more reliable for excluding cell IDs from a location, and for assessing possible network changes if further surveys take place later.

We hope that this paper will set the standard for surveys in cell site analysis, and ensure that the criminal justice system is built upon a sound base of scientific evidence. Only by keeping up with changes in technology, and reviewing what other cell site experts are doing, can practitioners ensure that they undertake the most appropriate analyses. There is enormous scope for more research, and we hope to see much more published for peer review.

Matt Tart

Cell Site Analysis Expert

Cracking Android PINs and passwords

In a previous blog post we described a method to retrieve an Android pattern lock from the raw flash of a device. However, since version 2.2 (known as “Froyo”) Android has provided the option of a more traditional numeric PIN or alphanumeric password (both are required to be 4 to 16 digits or characters in length) as an alternative security measure.

The very act of writing the last blog post got us thinking about whether a similar approach could be used to recover PINs and passwords.

Our first port of call was to return to the Android source code to confirm how the data was being stored (see Listing 1). Both the numeric PIN and alphanumeric passwords were found to be processed by the same methods in the same way, both arriving as a text string containing the PIN or password.

As with the pattern lock, the code is sensibly not stored in the plain, instead being hashed before it is stored. The hashed data (both SHA-1 and MD5 hashes this time) is stored as an ASCII string in a file named password.key, which can be found in the same location on the file system as our old friend gesture.key: the /data/system folder.

However, unlike the pattern lock, the data is salted before being stored. This makes a dictionary attack unfeasible – but if we can reliably recover the salt it would still be possible to attempt a brute force attack.

/*
 * Generate a hash for the given password. To avoid brute force attacks, we use a salted hash.
 * Not the most secure, but it is at least a second level of protection. First level is that
 * the file is in a location only readable by the system process.
 * @param password the gesture pattern.
 * @return the hash of the pattern in a byte array.
 */
public byte[] passwordToHash(String password) {
    if (password == null) {
        return null;
    }
    String algo = null;
    byte[] hashed = null;
    try {
        byte[] saltedPassword = (password + getSalt()).getBytes();
        byte[] sha1 = MessageDigest.getInstance(algo = "SHA-1").digest(saltedPassword);
        byte[] md5 = MessageDigest.getInstance(algo = "MD5").digest(saltedPassword);
        hashed = (toHex(sha1) + toHex(md5)).getBytes();
    } catch (NoSuchAlgorithmException e) {
        Log.w(TAG, "Failed to encode string because of missing algorithm: " + algo);
    }
    return hashed;
}
Listing 1

Source: com/android/internal/widget/LockPatternUtils.java.

The salt which is added to the data before hashing is a string of the hexadecimal representation of a random 64-bit integer. Necessarily, this number must then be stored, and the source code showed that the Android.Settings.Secure content provider was being used to store the value under the lockscreen.password_salt key.

On the Android file system the backing store for this content provider is found in an SQLite database settings.db in the /data/data/com.android.providers.settings/databases directory (see fig. 1).

Fig 1: Salt as stored in the settings.db SQLite database

Once we knew how these two essential pieces of data were being stored we were able to consider how they might be recovered from a raw flash dump. In the case of the hashes, our approach was similar to the pattern lock. Knowing that:

  • The dump was broken into chunks of 2048 bytes (2032 for storing the data, the remaining 16 used for YAFFS2 file system tags)
  • The password.key file contains two hashes encoded as an ASCII string: an SHA-1 hash (40 hexadecimal digits long) followed by an MD5 hash (32 hexadecimal digits long), making the file 72 bytes long in total, starting at the top of the chunk
  • The hashes only contain the characters 0-9 and A-F
  • The remaining 1960 bytes in the data portion of the chunk will be zero bytes
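
Putting these rules together gives enough structure to scan a dump for candidate password.key chunks. As a minimal sketch (illustrative, not the released script):

import sys

CHUNK_SIZE = 2048
DATA_SIZE = 2032                    # the final 16 bytes hold YAFFS2 tags
HEX_DIGITS = set(b"0123456789abcdefABCDEF")

with open(sys.argv[1], "rb") as f:
    offset = 0
    while True:
        chunk = f.read(CHUNK_SIZE)
        if len(chunk) < CHUNK_SIZE:
            break
        candidate = chunk[:72]
        padding = chunk[72:DATA_SIZE]
        # 72 hex characters followed by 1960 zero-bytes
        if all(b in HEX_DIGITS for b in candidate) and padding == b"\x00" * 1960:
            print("Possible password.key at offset", offset)
            print("  SHA-1:", candidate[:40].decode("ascii"))
            print("  MD5:  ", candidate[40:].decode("ascii"))
        offset += CHUNK_SIZE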

Recovering the salt required a little extra thought. The salt is stored in an SQLite database which, because of the structure of SQLite data, meant that it would be all-but-impossible to predict where in the chunk the data might be stored. Worse still, we couldn’t even be sure of the length of the data as it was stored as a string. However, having a deeper understanding of the SQLite file format allowed us to derive a lightweight and reliable way to recover the salt.

Fig 2: Raw data making up the Salt field in the settings.db database

Figure 2 shows the raw data for the salt record in the settings.db. An SQLite record is made up of two parts, a record header and the record data. The record header is made up of a sequence of values; the first value gives the length of the record header in bytes (including this value), the following values (serial type codes in SQLite parlance) define the data types which follow in the record data.

In our case the serial type codes represent two data types: a null and two strings. The null is unimaginatively represented by the zero-byte highlighted by the red box (if you take a look at Figure 1 you may notice that the first field is displayed as the numeric value 34; this column is defined in the schema as INTEGER PRIMARY KEY, which means that the value is not actually stored in the record itself, hence it is replaced by null. The reasons for this are out of the scope of this blog post, but if you’re particularly interested Alex is more than happy to explain, at length, another time!).

The other two values (highlighted by the yellow and green boxes respectively) define strings; in serial type codes a string is represented by an odd value of thirteen or larger, and the length of the string can be found by applying the following formula, where x is the value of the serial type code:

(x – 13)/2

We can test this by considering the second field: we know that this field will always contain the string “lockscreen.password_salt”, which is 24 characters (and bytes) long. The serial type code associated with this data is the value highlighted with the yellow box: 0x3D, which is 61 in decimal. Applying the formula: 61 – 13 gives us 48, and dividing by 2 gives us 24, the length of our string.

The field containing the salt is also a string, but its length can vary depending on the value held. We know from the source code that a 64-bit integer is being stored, which gives a range of -9223372036854775808 to 9223372036854775807; allowing for the negative sign, this means that the value, stored as a string, takes between 1 and 20 characters. Reversing the formula, the third field’s serial type code must be odd and fall between 15 and 53 (0x0F and 0x35).

Using this information we can create a set of rules which should allow us to search for this record in the raw dump so that we can extract the salt:

  • Starting with the record header:
    • First field is always null (and therefore a zero byte)
    • The next field is always a string of length 24 (serial type code 0x3D)
    • The third field is always a string with length 1-20 (odd serial type codes 0x0F-0x35)
  • Moving on to the record data:
    • The first null field doesn’t feature – it’s implicit in the record header
    • The first field to actually appear will always be the string “lockscreen.password_salt”
    • The next field will be the string representation of a positive or negative integer

This allows us to define a regular expression that should reliably capture this record:

\x00\x3d[\x0f\x11\x13\x15\x17\x19\x1b\x1d\x1f\x21\x23\x25\x27\x29\x2b\x2d\x2f\x31\x33\x35]lockscreen.password_salt-?\d{1,19}

Understanding the record structure also means that once we have captured the record we can ensure that we extract the whole salt value; we can simply read the appropriate serial type code and apply the formula to get the length of the salt’s string.
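
As a sketch of how this comes together, the following applies the regular expression above (with capture groups added and the literal dot escaped) to a raw dump, then uses the serial type code to slice out exactly the right number of characters:

import re
import sys

# Odd serial type codes 0x0f-0x35: strings of length 1 to 20
SALT_RECORD = re.compile(
    rb"\x00\x3d([\x0f\x11\x13\x15\x17\x19\x1b\x1d\x1f\x21\x23\x25"
    rb"\x27\x29\x2b\x2d\x2f\x31\x33\x35])lockscreen\.password_salt(-?\d{1,19})"
)

with open(sys.argv[1], "rb") as f:
    dump = f.read()

for match in SALT_RECORD.finditer(dump):
    # Apply (x - 13) / 2 to the captured serial type code to get the
    # exact length of the salt string, then trim the captured digits
    salt_length = (match.group(1)[0] - 13) // 2
    salt = match.group(2)[:salt_length]
    print("Salt found at offset %d: %s" % (match.start(), salt.decode("ascii")))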

Satisfied that we could reliably extract the data we needed to recover the PINs or passcodes we crafted a couple of Python scripts – one to find and extract the data in the flash dump, and the other to brute force the hashes recovered (using the salt). On our test handset (an Xperia X10i running Android 2.3) we set a number of PINs and passcodes and found that even on a fairly modest workstation (Python’s hashing modules are gratifyingly efficient) PINs of up to 10 digits could be recovered within a few hours.

The passwords obviously have a much larger key-space so took longer, but the attack still seems feasible for shorter passwords, and the script can easily be modified to try only Latin characters and digits rather than the full range of special characters, or to work from a password dictionary, either of which could expedite the process.
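
The core of the brute-force stage is only a few lines. The sketch below is illustrative rather than the released BruteForceAndroidPin script, and it assumes (as the source code above suggests) that the salt is appended as Java’s Long.toHexString would render it – unsigned hexadecimal with no leading zeros:

import hashlib
import itertools
import string
import sys

target_sha1 = sys.argv[1].lower()   # 40 hex digits from password.key
salt_int = int(sys.argv[2])         # decimal value from settings.db
salt = format(salt_int & 0xFFFFFFFFFFFFFFFF, "x")   # Long.toHexString equivalent

for length in range(4, 17):         # PINs are 4 to 16 digits long
    for digits in itertools.product(string.digits, repeat=length):
        pin = "".join(digits)
        if hashlib.sha1((pin + salt).encode("ascii")).hexdigest() == target_sha1:
            print("PIN found:", pin)
            sys.exit(0)
print("No match found")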

Fig 3: Running the BruteForceAndroidPin script standalone

Once again we are happy to be releasing the scripts so that other practitioners can make use of this method. The scripts can be downloaded here: http://ccl-forensics.com/view_category/8_other-software-and-scripts.html.

Alex Caithness and Arun Prasannan of the R&D team

Unlocking Android Pattern Locks

What do you do if you have to examine an Android device which has a pattern lock enabled, but USB debugging is not initialised?

If a physical level acquisition can be performed, our technique can be used to retrieve the lock pattern from the device.

The lock pattern is entered by the user joining the points on a 3×3 grid in their chosen order. The pattern must involve a minimum of 4 points and each point can only be used once (used points can be crossed subsequently in the pattern but they will not be registered).

Each point is indexed internally by Android from 0 to 8. So the pattern in Figure 1 would be understood as “0 3 6 7 8 5 2”.

Figure 1


Storing the pattern “in the plain” would constitute a significant security flaw. Android instead stores an SHA-1 hash of the pattern interpreted as a string of bytes. In order to represent the pattern in our example, the byte string: 0x00030607080502 would be hashed to produce: 618b589aa98dfee743e7120913a0665a4a5e8317.
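
This is easy to check for yourself in a couple of lines of Python:

import hashlib

# The pattern "0 3 6 7 8 5 2" as a string of bytes
pattern = bytes([0, 3, 6, 7, 8, 5, 2])
print(hashlib.sha1(pattern).hexdigest())
# prints: 618b589aa98dfee743e7120913a0665a4a5e8317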

This behaviour can be confirmed by taking a look at the Android source code shown below:

/*
 * Generate an SHA-1 hash for the pattern. Not the most secure, but it is
 * at least a second level of protection. First level is that the file
 * is in a location only readable by the system process.
 * @param pattern the gesture pattern.
 * @return the hash of the pattern in a byte array.
 */
private static byte[] patternToHash(List<LockPatternView.Cell> pattern) {
    if (pattern == null) {
        return null;
    }

    final int patternSize = pattern.size();
    byte[] res = new byte[patternSize];
    for (int i = 0; i < patternSize; i++) {
        LockPatternView.Cell cell = pattern.get(i);
        res[i] = (byte) (cell.getRow() * 3 + cell.getColumn());
    }
    try {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] hash = md.digest(res);
        return hash;
    } catch (NoSuchAlgorithmException nsa) {
        return res;
    }
}

Source: com/android/internal/widget/LockPatternUtils.java.

The pattern lock hash is stored as a byte string in a file named gesture.key which can be found in the /data/system folder on the device’s file system.

Although storing the pattern as a hash is a sensible approach as it is a “one-way” operation from the original data to the hash, Android does not apply a salt to the original data; this, combined with the fact that there is a finite number of possible patterns, makes it fairly trivial to create a dictionary of all valid hashes. If you can gain access to the gesture.key file you can easily recover the original pattern using a dictionary attack – it would take seconds.

However, sensibly, the gesture.key file is stored in an area of the device’s file system which is not accessible in normal operation, and root access to the device is required in order to extract the file. Of course, if you have root access you also have access to every other file on the device, so recovering the pattern becomes far less useful. (http://www.oxygen-forensic.com/download/pressrelease/PR_FS2011_ENG_2011-10-31.pdf)

There are situations, though, where root access cannot be gained (for example, where USB debugging cannot be activated on the device) which spurred us on to investigate whether the pattern lock could be easily recovered from a raw read of the device’s storage, the likes of which could be made by using JTAG or chip-off techniques.

One possible solution would be to rebuild the entire file system from a raw dump, and from this identify the relevant file; however, time constraints ruled this out. Instead we decided to examine whether it was possible to identify the contents of the gesture.key file among the rest of the data in the dump. We set a known pattern lock on a test handset (an HTC Wildfire in this case) and acquired the contents of its flash memory by using JTAG.

Figure 3: JTAG test pads on an HTC Wildfire

Loading the dump into a hex viewer, we noted that it was organised into chunks of 2048 bytes: the first 2032 bytes are allocated to file data, while the remaining 16 bytes appear to be used to store metadata (tags) relating to the YAFFS2 file system. We then searched for the SHA-1 hash corresponding to our known lock pattern. This data was found to be stored as expected, occupying the first 20 bytes of the block; interestingly, we also found that the remaining 2012 bytes were zero-bytes – an observation which was repeated in our other tests.

These observations left us with a set of rules which we believed would allow us to identify a gesture.key file in the raw dump without having to rebuild the entire file system:

  • The dump is organised into blocks of 2048 bytes each
  • The gesture.key file should contain a 20-byte long SHA-1 hash at the start of a block
  • The 20-byte sequence at the start of the block must appear in our dictionary of hashes
  • The following 2012 bytes should all be zero-bytes

Bearing these rules in mind, we wrote a simple Python script which reads a dump 2048 bytes at a time, disregarding blocks which do not fit our required pattern of 20 arbitrary bytes followed by 2012 zero-bytes. If a block does match these criteria, the script then attempts to look up the prospective hash (the first 20 bytes) in our pre-compiled dictionary which was enough to remove any false positives and recover the correct pattern.
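
In outline, the search looks something like this (a sketch rather than the released script; the dictionary’s table and column names here are invented for illustration):

import sqlite3
import sys

CHUNK_SIZE = 2048
ZERO_PAD = b"\x00" * 2012

# Pre-compiled dictionary of hashes for all possible patterns
db = sqlite3.connect("AndriodLockScreenRainbow.sqlite")

with open(sys.argv[1], "rb") as f:
    offset = 0
    while True:
        chunk = f.read(CHUNK_SIZE)
        if len(chunk) < CHUNK_SIZE:
            break
        # Candidate gesture.key: 20 arbitrary bytes then 2012 zero-bytes
        if chunk[20:2032] == ZERO_PAD:
            hash_hex = chunk[:20].hex()
            row = db.execute("SELECT pattern FROM hashes WHERE hash = ?",
                             (hash_hex,)).fetchone()
            if row is not None:
                print(row[0], hash_hex)
        offset += CHUNK_SIZE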

One potential shortcoming of this technique is that, in its current form, it will also identify any previously set patterns which are still present in flash memory. Due to the nature of flash media and the YAFFS file system, which is still found to be used by the majority of Android devices, the gesture.key file does not get overwritten in place; rather, the new version is written to a new block with a higher sequence number.

Despite this potential issue, we have used this technique a number of times since with great success and are happy to be making public the scripts for generating the dictionary and for searching the data for research purposes.

GenerateAndroidGestureRainbowTable.py creates an SQLite database file (AndriodLockScreenRainbow.sqlite) which contains a list of hashes for all possible patterns.
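
For reference, a dictionary generator along these lines can be written in a handful of lines. The sketch below is not the released script: it uses the same invented schema as the search sketch above, and it enumerates every ordering of 4 to 9 distinct points – a superset of the patterns the lock screen will actually accept (it ignores the rule about crossing unused points), but the extra entries are harmless, as they can never match a hash the device would really store:

import hashlib
import itertools
import sqlite3

db = sqlite3.connect("AndriodLockScreenRainbow.sqlite")
db.execute("CREATE TABLE hashes (hash TEXT PRIMARY KEY, pattern TEXT)")

for length in range(4, 10):
    for pattern in itertools.permutations(range(9), length):
        # Hash the pattern exactly as Android does: as a string of bytes
        hash_hex = hashlib.sha1(bytes(pattern)).hexdigest()
        db.execute("INSERT INTO hashes VALUES (?, ?)",
                   (hash_hex, str(list(pattern))))
db.commit()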

Android_GestureFinder.py takes one argument: the flash dump file, and expects AndriodLockScreenRainbow.sqlite to be present in the same folder.

C:\wildfire> Android_GestureFinder.py WILDFIRE_JTAG.bin

[0, 3, 6, 7, 8, 5, 2] 618b589aa98dfee743e7120913a0665a4a5e8317

The scripts can be downloaded from our website: www.ccl-forensics.com/Software/other-software-a-scripts.html. We hope that researchers and analysts will find the technique useful, and we welcome any comments or suggestions.

Alex Caithness and Arun Prasannan of the R&D team

Forensic software tools – get ‘em while they’re hot, they’re lovely!

The R&D team at CCL-Forensics are a busy bunch. Over the past couple of years, they’ve developed a number of forensic software tools to examine the evidence that standard tools can’t reach.

Here’s a quick overview of what’s on offer. Follow the links to find out more, or give us a shout by phone (01789 261200) or email (info@ccl-forensics.com) – we’re always happy to talk geek with like-minded practitioners.

epilog allows investigators to recover deleted data from SQLite databases (a widely-used format in many devices, including mobile phones, computers and SatNavs). Many off-the-shelf tools will only allow you to view live records.

PIP is our XML and plist parsing tool. It allows investigators to present often-complex data from XML files quickly, efficiently, and in a user-friendly format. Apple’s property list files – both XML and binary formats – present no obstacle to PIP at all.

dunk! is a splendidly-named tool for digging around in cookies. Unlike standard tools, it analyses known cookie types to uncover potential new evidence and help give context to other browser artefacts. This includes showing the path the user took to arrive at a particular webpage by parsing Google Analytics cookies, revealing a wealth of information previously unavailable to practitioners.

rubus is FREE! We like to give a little love back to the community, so with this in mind, we made our BlackBerry backup deconstruction tool available. Not having found a tool that would do the job, we made our own – enabling analysts to deconstruct BlackBerry backup data stored in .ipd files.

The tools all went through beta-testing first, and were pronounced ready to unleash upon the world. Since then, they’ve been subject to an introductory pricing period, and have been bought and used successfully around the world.

Now that we’re confident in the tools we’ve developed, we’re also confident in their value to our customers. So with that in mind, if you haven’t bought the tools already, you may want to think about doing so! The introductory pricing period finishes at the end of March – and although they’ll still be extremely good value for money, they will be a little more expensive.

We’ve had useful feedback from our customers in the past, which has helped us to further develop our tools, and we always welcome comments and suggestions on our software. Feel free to comment below, or get in touch with us in more traditional ways!