New Epilog Signature files released

Epilog Signature files allow users to add specific support for new databases they encounter and although they are designed so that Epilog’s users can create their own signatures when the need arises, CCL-Forensics are committed to updating and releasing a sets of signatures, pre-written and ready to use.

In this new release we have had a real focus on smartphones adding support for:
• iOS6
• Android 4.0 (Ice Cream Sandwich)
• Android 4.1 (Jelly Bean)
• Android 3rd Party Applications
• iOS 3rd Party Applications
• Skype

We always welcome suggestions for signatures that you’d like to see added to the signature collection so please get in touch on epilog@ccl-forensics.com

For more information on epilog please visit our website – www.cclgroupltd.com/Buy-Software/

Advertisements

The Forensic Implications of SQLite’s Write Ahead Log

By Alex Caithness, CCL-Forensics

SQLite is a popular free file-based database format which is used extensively both on desktop and mobile operating systems (it is one of the standard storage formats available on both Android and iOS). This article sets out to examine the forensic implications, both pitfalls and opportunities, of a relatively new feature of the database engine: Write Ahead Log.

Before we begin, it is worth taking a moment to describe the SQLite file format. Briefly, records in the database are stored in file which in SQLite parlance is called the ‘Database Image’. The database image is broken up into “pages” of a fixed size (the size is specified in the file header). Each page may have one of a number of roles, such as informing the structure of the database, and crucially holding the record data itself. The pages are numbered internally by SQLite starting from 1.

Historically SQLite used a mechanism called “Rollback Journals” for dealing with errors occurring during use of the database. Whenever any data on a page of the database was to be altered, the entire page was backed up in a separate journal file. At the conclusion of a successful transaction the journal file would be removed; conversely if the transaction was interrupted for any reason (crash, power cut, etc.) the journal remained. This means that if SQLite accesses a database and finds that a journal is still present something must have gone wrong and the engine will restore the database to its previous state using the copies of pages in the journal, avoiding corrupted data.

From version 3.7.0 of the SQLite engine an alternative journal mechanism was introduced called “Write Ahead Log” (ubiquitously shortened to “WAL”). WAL effectively turned the journal mechanism on its head: rather than backing up the original pages then making changes directly to the database file, the database file itself is untouched and the new or altered pages are written to a separate file (the Write Ahead Log). These altered or new pages will remain in the WAL file, the database engine reading data from the WAL in place of the historic version in the main database. This continues until a “Checkpoint” event takes place, finally copying the pages in the WAL file into the main database file. A Checkpoint may take place automatically when the WAL file reaches a certain size (by default this is 1000 pages) or performed manually by issuing an SQL command (“PRAGMA wal_checkpoint;”) or programmatically if an application has access to the SQLite engine’s internal API.

Initial state: No pages in the WAL

Initial state: No pages in the WAL

Page Altered - new version written to WAL

Page 3 is altered. The new version of the page is written to the WAL and the database engine uses this new version rather than the old version in the database file itself.

Checkpoint

A checkpoint operation takes place and the new version of the page is written into the database file.

It is possible to detect whether a database is in WAL mode in a number of ways: firstly this information is found in the database file’s header; examining the file in a hex editor, the bytes at file offset 18 and 19 will both be 0x01 if the database is using the legacy rollback journal or 0x02 if the database is in WAL mode. Secondly you can issue the SQL command “PRAGMA journal_mode;” which will return the value “wal” if the database is in WAL mode (anything else indicates rollback journal). However, probably the most obvious indication of a database in WAL mode is the presence of two files named as “<databasefilename>-wal” and “<databasefilename>-shm” in the same logical directory as the database (eg. if the database was called “sms.db” the two additional files would be “sms.db-wal” and “sms.db-shm”).

The “-wal” file is the actual Write Ahead Log which contains the new and updated database pages, its structure is actually fairly simplistic. The “-wal” file is made up of a 32 byte file header followed by zero or more “WAL frames”. The file header contains the following data:

Offset Size Description
0 4 bytes File signature (0x377F0682 or 0x377F0683)
4 4 bytes File format version (currently 0x002DE218 which interpreted as a big endian integer is 3007000)
8 4 bytes Associated database’s page size (32-bit big endian integer)
12 4 bytes Checkpoint sequence number (32-bit big endian integer which is incremented with every checkpoint, starting at 0)
16 4 bytes Salt-1 Random number, incremented with every checkpoint *
20 4 bytes Salt-2 Random number, regenerated with every checkpoint
24 4 bytes Checksum part 1 (for the first 24 bytes of the file)
28 4 bytes Checksum part 2 (for the first 24 bytes of the file)

* In testing it was found that although the official (and at the time of writing, up to date) command line version of SQLite v3.7.11 behaved correctly, when using SQLite Expert v3.2.2.2.2102 this value appeared to be regenerated after each checkpoint (which is assumed by the author to be incorrect behaviour)

The WAL Frames that follow the header consist of a 24 byte header followed by the number of bytes specified in the file header’s “page size” field which is the new or altered database page. The Frame Header takes the following form:

Offset Size Description
0 4 bytes Database page number (32-bit big endian integer)
4 4 bytes For a record that marks the end of a transaction (a commit record) this will be a 32-bit big endian integer giving the size of the database file in pages, otherwise 0.
8 4 bytes Salt-1, as found in the WAL header at the time that this Frame was written
12 4 bytes Salt-2, as found in the WAL header at the time that this Frame was written
16 4 bytes Checksum part 1 – cumulative checksum up through and including this page
20 4 bytes Checksum part 2 – cumulative checksum up through and including this page

There are a number of potential uses and abuses for the WAL file in the context of digital forensics, but first, the behaviour of SQLite while in WAL mode should examined. A number of operations were performed on a SQLite database in WAL mode. After each operation the database file along with its “-shm” and “-wal” files were copied, audited and hashed so that their states could be examined.

Step 1: Create empty database with a single table:

8a9938bc7252c3ab9cc3da64a0e0e06a *database.db
b5ad3398bf9e32f1fa3cca9036290774 *database.db-shm
da1a0a1519d973f4ab7935cec399ba58 *database.db-wal

1,024       database.db
32,768      database.db-shm
2,128       database.db-wal

WAL Checkpoint Number:  0
WAL Salt-1: 3046154441
WAL Salt-2: 220701676

Viewing the database file using a hex editor we find a single page containing the file header and nothing else. As noted as well as creating a database file, a table was also created, however this data was written to the WAL in the form of a new version of this page. The WAL contains two frames, this new version of the first page in addition to a second frame holding an empty table page. When accessing this database through the SQLite engine this information is read from the “-wal” file transparently and we see the empty table, even though the data doesn’t appear in the database file itself.

Step 2: Force a checkpoint using PRAGMA command:

dd376606c00867dc34532a44aeb0edb6 *database.db
1878dbcefc552cb1230fce65df13b8c7 *database.db-shm
da1a0a1519d973f4ab7935cec399ba58 *database.db-wal

2,048       database.db
32,768      database.db-shm
2,128       database.db-wal

WAL Checkpoint Number:  0
WAL Salt-1: 3046154441
WAL Salt-2: 220701676

Using the pragma command mentioned above, the database was “checkpointed”. Accessing the database through SQLite we see no difference to the data but examining the files involved, we can clearly see that the database file has changed (it has different hash) furthermore it has grown. Looking inside the database file we can see the two pages from the “-wal” file have now been written into the database file itself and SQLite will be reading this data from here rather than the “-wal” file.

The WAL Checkpoint number and salts were not changed at this point, as we will see they are altered the next time that the WAL is written to.

Another interesting observation is that the “-wal” file was left completely unchanged during the checkpoint process – a fact that will become extremely important in the next step.

Step 3: Insert a single row:

dd376606c00867dc34532a44aeb0edb6 *database.db
6dc09958989a6c0094a99a66531f126f *database.db-shm
e9fc939269dbdbfbc157d8c12be720ed *database.db-wal

2,048       database.db
32,768      database.db-shm
2,128       database.db-wal

WAL Checkpoint Number:  1
WAL Salt-1: 3046154442
WAL Salt-2: 534753839

A single row was inserted into the database using a SQL INSERT statement. Once again we arrive at a situation where the database file itself has been left untouched, evidenced by the fact that the database file’s hash hasn’t altered since the last step.

The “-wal” file hasn’t changed size (so still contains two WAL frames) but clearly the contents of the file have changed. Indeed, examining the file in a hex editor we find that the first frame in the file contains a table page containing the newly inserted record as we would expect. What is interesting is that the second frame in the file is the same second frame found in the file in the previous two steps. After a checkpoint the “-wal” file is not deleted or truncated, it is simply reused, frames being overwritten from the top of the file.

Examining the Frame’s headers we see the following:

Frame Page Number Commit Size Salt-1 Salt-2
1 2 2 3046154442 534753839
2 2 2 3046154441 220701676

Both frames relate to the same page in the database but their salt values differ. As previously noted these two salt values are copied from the WAL file header as they are at the time of writing. Salt-2 is regenerated upon each checkpoint, but key here is Salt-1 which is initialised when the WAL is first created and then incremented upon each checkpoint. Using this value we can show that the page held in second frame of the WAL is a previous version of page held in the first frame: we can begin to demonstrate a timeline of changes to the database.

Step 4: Force a checkpoint using PRAGMA command:

704c633fdceceb34f215cd7fe17f0e84 *database.db
a98ab9ed82393b728a91aacc90b1d788 *database.db-shm
e9fc939269dbdbfbc157d8c12be720ed *database.db-wal

2,048       database.db
32,768      database.db-shm
2,128       database.db-wal

WAL Checkpoint Number:  1
WAL Salt-1: 3046154442
WAL Salt-2: 534753839

Once again a checkpoint was forced using the PRAGMA command.  As before the updated pages in the WAL were written into the database file and this operation had no effect on the contents of the “-wal” itself. Viewing the database using the SQLite engine shows the same data as in the previous step.

Step 5: Insert a second row, Update contents of the first row:

704c633fdceceb34f215cd7fe17f0e84 *database.db
d17cf8f25deaa8dbf4811b4d21216506 *database.db-shm
ed5f0336c23aef476c656dd263849dd0 *database.db-wal

2,048       database.db
32,768      database.db-shm
2,128       database.db-wal

WAL Checkpoint Number:  2
WAL Salt-1: 3046154443
WAL Salt-2: 3543470737

A second row was added to the database using a SQL INSERT statement and the previously added row was altered using an UPDATE statement.

Once again, and as is now fully expected, the database file is unchanged, the new data has been written to the WAL. The WAL contains two frames: The first holds a table page containing the original record along with our newly added second record; the second frame holds a table page containing the updated version of our original record along with the new, second record. Examining the frame headers we see the following:

Frame Page Number Commit Size Salt-1 Salt-2
1 2 2 3046154443 3543470737
2 2 2 3046154443 3543470737

In this case both frames contain data belonging to the same page in the database and the same checkpoint (Salt-1 is the same for both frames); in this case the order of events is simply detected by the order in which the frames appear in the file – they are written to the file from the top, down.

Step 6: Insert a third row:

704c633fdceceb34f215cd7fe17f0e84 *database.db
5ac6d9e56e6bbb15981645cc6b4b4d6b *database.db-shm
672a97935722024aff4f1e2cf43d83ad *database.db-wal

2,048       database.db
32,768      database.db-shm
3,176       database.db-wal

WAL Checkpoint Number:  2
WAL Salt-1: 3046154443
WAL Salt-2: 3543470737

Next, a third row was added to the database using an INSERT statement. Viewing the database logicaly with the SQLite engine we see all three records. While database file remains unchanged, the “-wal” file now contains 3 frames: the first two are as in the previous step with the third and final new frame holding a table page with all three records. The frame headers contain the following information:

Frame Page Number Commit Size Salt-1 Salt-2
1 2 2 3046154443 3543470737
2 2 2 3046154443 3543470737
3 2 2 3046154443 3543470737

We now have three versions of the same page, as before the sequence of events is denoted by the order they occur in the file.

Step 7: Force a checkpoint using PRAGMA command:

04a16e75245601651853fd0457a4975c *database.db
05be4054f8e33505cc2cd7d98c9e7b31 *database.db-shm
672a97935722024aff4f1e2cf43d83ad *database.db-wal

2,048       database.db
32,768      database.db-shm
3,176       database.db-wal

WAL Checkpoint Number:  2
WAL Salt-1: 3046154443
WAL Salt-2: 3543470737

As we have observed before the checkpoint results in to the up-to-date records being written into the database, the “-wal” file is unaffected.

Step 8: Delete A Row:

04a16e75245601651853fd0457a4975c *database.db
dca5c61a689fe73b3c395fd857a9795a *database.db-shm
3b518081a5ab4a7be6449e86bb9c2589 *database.db-wal

2,048       database.db
32,768      database.db-shm
3,176       database.db-wal

WAL Checkpoint Number:  3
WAL Salt-1: 3046154444
WAL Salt-2: 2798791151

Finally in this test, the second record in the table (the record added in Step 5) was deleted using an SQL DELETE statement. Accessing the database using the SQLite engine shows that the record is no longer live in the database.

As per expectations the database file is unaffected by this operation, the changes instead being written to the WAL. The “-wal” file contains three frames:  the first frame holds a table page with the second record deleted (the data can still be seen, and could be recovered using a tool such as Epilog, however the metadata on the page shows that the record is not live). The remaining two pages are identical to the final two frames in the previous step. Examining the frame headers we see the following:

Frame Page Number Commit Size Salt-1 Salt-2
1 2 2 3046154444 2798791151
2 2 2 3046154443 3543470737
3 2 2 3046154443 3543470737

Here we once again see three frames all containing data from the same database page, this time the most recent version of the page is found in frame 1 as it has the highest Salt-1 value; the other two frames have a lower Salt-1 value and are therefore older revisions; as they both share the same Salt-1 value we apply the “position in file” rule, the later in the file the frame occurs, the newer it is. So in order of newest to oldest the frames are ordered: 1, 3, 2.

Summarising the findings in this experiment:

  • Altered or new pages are written to the WAL a frame at a time, rather than the database file
  • The most up-to-date pages in the WAL are written to the database file on a Checkpoint event – this operation leaves the “-wal” file untouched
  • After a Checkpoint, the “-wal” file is reused rather than deleted or truncated,  with new frames
  • Multiple frames for the same database page can exist in the WAL, their relative ages can be derived by first examining the frame header’s Salt-1 value with newer frames having higher values. Where multiple frames have the same Salt-1, their age is determined by their order in the WAL, with newer frames occurring later

Pitfalls and Opportunities

The most obvious opportunity afforded by the Write Ahead Log is the potential for time-lining of activity in database. To prove the concept, a small Python script was written which would automate the analysis of the frames in a WAL file and provide a chronology of the data; a sample output is shown below:

Header Info:
    Page Size: 1024
    Checkpoint Sequence: 3
    Salt-1: 3046154444
    Salt-2: 2798791151

Reading frames...

Frame 1 (offset 32)
    Page Number: 2
    Commit Size: 2
    Salt-1: 3046154444
    Salt-2: 2798791151

Frame 2 (offset 1080)
    Page Number: 2
    Commit Size: 2
    Salt-1: 3046154443
    Salt-2: 3543470737

Frame 3 (offset 2128)
    Page Number: 2
    Commit Size: 2
    Salt-1: 3046154443
    Salt-2: 3543470737

Unique Salt-1 values:
    3046154443
    3046154444

Chronology of frames (oldest first):

Page Number: 2
    Frame 2
    Frame 3
    Frame 1

With further work it should be possible to display a sequence of insertions, updates and deletions of records within a database – a feature which is a top priority for the next update of Epilog. Even without the ability to timeline, it is clear that deleted records can be stored and recovered from the WAL (functionality already present in Epilog).

One behaviour which hasn’t been described in full so far is that a database file in WAL mode isolated from its associated “-wal” file is, in almost all circumstances, a valid database in its own right. For example, consider the test database above as it is at the end of Step 8. If the database file was moved to another directory, as far as the SQLite database engine is concerned this is a complete database file. If this isolated database file was queried, the data returned will be that which was present at the last checkpoint (in our test case, this would be the 3 live records present at the checkpoint performed in step 7).

This raises an important consideration when working with a SQLite database contained in a disk image or other container (eg. a TAR archive): if the database file is extracted from the image or container without its associated WAL files, the data can be out-of-date or incomplete. The other side of the coin is that the “full up-to-date” version of the data (viewed with the WAL present) may lack records present in the isolated database file because of deletions pending a checkpoint. There is, then, an argument for examining databases both ways: complete with WAL files and isolated as it may be possible to obtain deleted records “for free”.

Summing Up

The Write Ahead Log introduced in SQLite 3.7 may afford digital forensics practitioners new opportunities to extract extra data and behaviour information from SQLite databases; however the mechanism should be understood to get the most of the new opportunities and avoid confusion when working with the databases.

If you have any comments or questions, please leave a comment below or get in touch directly at research@ccl-forensics.com.

References:

SQLite File Format

Write Ahead Log

Alex Caithness, CCL-Forensics

Epilog version 1.1 to the launch pad

The long-awaited upgrade to epilog has arrived.

It is available as a free upgrade for existing epilog users and can be purchased by new users from our website.

Read on to find out what’s new – and take a look at our explanatory video on our YouTube channel…

Well, first off: epilog 1.1 includes a database rebuilder. For analysts with tools and scripts designed only to operate on live data, this will be a sanity saver. It’s an integrated solution for rebuilding recovered records into a copy of the live database, enabling deleted data to be parsed or processed.

It also allows the user to choose whether to include the current live records, options to disable triggers and remove constraints from the database schema to tailor the rebuilding.

We’ve been keeping up with new developments in the world of SQLite. Version 3.7 of the SQLite library introduced a new journal format called “Write Ahead Log” or WAL. The new version of epilog will permit WAL file parsing. It differs from the traditional journal mechanism in that it writes new data into a separate file when specifically asked to by the database engine, rather than backing up data to a rollback journal.

In epilog 1.1 the requirement for an “associated database” when conducting a raw data or disk image search has been removed, and instead the user can provide the database page seize and text encoding manually (the option to use an associated database is still available for when it’s more convenient). There are also extra options for improving results when reading from raw dumps from flash chips.

Epilog 1.1 will now mark in grey records that have been recovered but which are truncated; this allows the user to make more informed decisions about the data. We’ve also improved the signature search algorithm to remove the need for “in the case of multiple concurrent deletion” signatures.

New export modes have been added, allowing users to output to a flat tab separated values (tsv) file. The “INSERT export” has been overhauled to make it more convenient to use.

And finally, what was formerly the “Table Analysis” feature has been upgraded to “Database and Table Details” and now reports further information regarding the database structure and parameters.

The epilog team is always happy to receive comments and suggestions, so please feel free to get in touch either by leaving a comment below, or emailing epilog@ccl-forensics.com.

Epilog customers: a software tease

Here at CCL-Forensics, we like to tease our software customers from time to time with the promise of future goodies.

The R&D team has been beavering away on a number of projects recently, including making improvements and adjustments to our existing software.

Our epilog users will doubtless be excited to learn that version 1.1 is nearly ready for release. It’s being beta-tested as you read this, so it should soon be winging its way to existing users as a free upgrade, and will be available for new users to purchase.

So what’s new?

Well, first off: epilog 1.1 includes a database rebuilder. For analysts with tools and scripts designed only to operate on live data, this will be a sanity saver. It’s an integrated solution for rebuilding recovered records into a copy of the live database, enabling deleted data to be parsed or processed.

It also allows the user to choose whether to include the current live records, options to disable triggers and remove constraints from the database schema to tailor the rebuilding.

We’ve been keeping up with new developments in the world of SQLite. Version 3.7 of the SQLite library introduced a new journal format called “Write Ahead Log” or WAL. The new version of epilog will permit WAL file parsing. It differs from the traditional journal mechanism in that it writes new data into a separate file when specifically asked to by the database engine, rather than backing up data to a rollback journal.

In epilog 1.1 the requirement for an “associated database” when conducting a raw data or disk image search has been removed, and instead the user can provide the database page seize and text encoding manually (the option to use an associated database is still available for when it’s more convenient). There are also extra options for improving results when reading from raw dumps from flash chips.

Epilog 1.1 will now mark in grey records that have been recovered but which are truncated; this allows the user to make more informed decisions about the data. We’ve also improved the signature search algorithm to remove the need for “in the case of multiple concurrent deletion” signatures.

New export modes have been added, allowing users to output to a flat tab separated values (tsv) file. The “INSERT export” has been overhauled to make it more convenient to use.

And finally, what was formerly the “Table Analysis” feature has been upgraded to “Database and Table Details” and now reports further information regarding the database structure and parameters.

So, we’ve been pretty busy working on epilog and have taken on board the feedback we’ve received. We’re always happy to receive comments and suggestions, so please feel free to get in touch either by leaving a comment below, or emailing epilog@ccl-forensics.com.

An analyst enthuses about Python. No, not that one. The geeky stuff.

You’ll have to excuse me for a moment while I climb up onto my soapbox because this blog is going to be a preachy one. Today I want to evangelise on a subject very dear to my heart: the scripting language known as Python.

“But I’m not a programmer Alex, I’m a digital forensic analyst*!”

I know, and I’m not for one moment suggesting that you should be looking at a change of career, but just as EnCase, FTK, TSK, XRY, Cellebrite, Oxygen and their ilk are essential tools of our trade, which we keep clipped to our utility belt at all times, a scripting language like Python should also feature in the list of tools we are proficient at using.

And there surely are other scripting languages out there such as Ruby, JavaScript and Perl (which is a good language as long as you like to have code that looks like you’ve held the shift key down and head-butted the keyboard repeatedly), but for me Python has the perfect combination of power, expressiveness and ease of use that makes it so suitable.

“But why should I trouble myself with learning another tool when the off-the-shelf tools do so much?”

The answer is simple: because laziness is a virtue.

Allow me to explain my reasoning: with the best will in the world these tools cannot, and should not, be expected to do everything. When one of these tools has a gap in their capabilities we are faced with the prospect of completing the task manually. These tasks will all have a certain level of complexity, time-intensiveness and mundaneness, which, according to “Caithness’ Law” all increase exponentially with proximity to the task’s deadline.

So you grit your teeth, clench your fists and get down to it, derive the solution and pull the requisite all-nighters to get the case out the door. At this point the way I see it is you have three options: you sacrifice a goat to the dark gods of digital forensics in order for this problem to never rear its ugly head again; you resign yourself to a fate of repeating this task until whatever hellish application or system that created this artefact goes out of circulation; or you get lazy and automate the task so that neither you nor any of your colleagues ever have to go through that pain again.

And that’s when it’s so useful to have a scripting language available to you.

I’m not going to attempt to teach you to program in Python in a single blog post as that would be both arrogant and misguided, but I do want to give you an example of a simple Python script I wrote a while back to automate a boring but necessary task that saves me time on a day-to-day basis.

When examining an image of an iOS device, inevitably one of the most interesting areas of the file system is the “mobile/Applications” folder where all the third-party applications store their data. The folder contains a number of folders (one for each app installed) which are named, not with the application’s name, but rather with a UID string.

Applications folder in an iOS device

In order to find out which folder contains which application you have to dive inside each one in turn and look for the “.app” folder which gives you the name of the app.

Inside the applications folder

As you can imagine, even with a modest number of applications this is a needlessly time-consuming exercise and when faced with an iPad belonging to a real app-collector it can put you into a catatonic state. Therefore, to ease the tedium of trawling the application folder I knocked together a little script which would audit the folders automatically.

I can sense that at this stage you’re itching to take a look at some actual, honest-to-goodness Python code, but first let’s consider the algorithm that we want to express. We have a folder full of folders, and inside each of those folders is a folder named “ApplicationName.app” where ApplicationName is the name of (you guessed it) the application. So I would suggest that we want to express an algorithm along the lines of:

  • Accept the path of the  “mobile/Application” directory as input to our script
  • Get a list of the folders held in this directory
  • For each of these folders look inside and find the *.app folder
  • Output the ugly UID folder name alongside the friendly *.app folder name

OK, looks simple enough – let’s see how that looks as a Python script:

The script

The first thing to note about this script is that there are a lot of lines which begin with a hash symbol (#); these are “comments”. Comments are just notes left in the code by the programmer to help someone reading the script understand the code – they are completely ignored when the script is executed. This means that almost half of the code isn’t python at all; in fact there are only nine lines of actual code here!

So, we know what algorithm is being expressed here; let’s take a quick look at what the code is doing line by line:

import sys
import os
import os.path

These lines are bringing extra functionality into our script. Python comes pre-installed with a number of modules which add functionality to your scripts. These modules include regular expressions, hashing, database handling, JSON, decoding of binary data, file archiving and compression and loads more – far too much to list here. If Python was to get prepared to use all of this cool stuff at the start of every script it would take a long time to get started, so instead we use “import statements” to let Python know which modules we want to use in our script.

So what are we importing? Firstly “sys” contains system-specific functionality, some of it fairly low-level, but we are simply going to use it to get our command line arguments. Next up, “os” contains operating system functionality; in this script we’ll be using it to get a list of a folder’s contents. Finally “os.path” contains functionality for path manipulation; it’s used for joining paths together and checking whether a path leads to a file or a directory.

root_path = sys.argv[1]

“sys.argv” is a list of command line arguments. The number in the square brackets tells us which item in the list we’re interested in. In Python, lists are “zero-indexed” meaning that the first item is numbered “0”; the second is “1” and so on. The first item (index 0) in the “sys.argv” list will be the name of our script, followed by any other arguments we pass to it at the command line. That means that this line gets the first command line argument after the script’s filename and assigns it to a variable “root_path” so that we can use it later in our script.

for app_folder in os.listdir(root_path):

This line is starting a loop. There are two types of loops in Python; here we are using the “for” loop which take the form:

“for each item in a sequence”

The code inside the loop takes place once for each item in the sequence. In our case, the sequence is provided by

os.listdir(root_path)

which gives a list of the contents of the folder we were provided by the command line argument.

One of my favourite things about Python is that good code layout is actually part of the language syntax. If you look at the listing above you can see how the code after our for loop is started is indented, which means that the indented code is taking place inside the loop. If we wanted code to run after the loop has finished we would simply remove the indent at that line.

    app_folder_path = os.path.join(root_path, app_folder)

Later on in the script we’re going to need the full path of the app’s folder so here we use some of the functionality in “os.path” to join the path we were supplied at the command line to the current app’s folder as served up by our for loop. We then store this complete path in a variable named “app_folder_path”:

    for app_folder_content in os.listdir(app_folder_path):

Here’s another for loop. Again we’re using “os.listdir” but this time we’re getting the contents of our current app’s directory, inside which we’re going to look for the “.app” folder.

        if app_folder_content.endswith(".app"):
            print(app_folder_content + ":\t" + app_folder)

Inside this for loop we check each of the files and folders in the application’s directory looking for one which ends with that magic “.app” extension. If we find one, we print the details out to the screen.

And that’s it, just nine lines of straight-forward code! So now we can run the script in a command window. Running the script we see the following output:

Script output

This shows us at-a-glance which application is found in each of the folders, a boring task which never has to be completed by hand again and just lets the analyst get on with actually analysing the data.

Obviously there’s scope to automate lots of other tasks, whether it’s parsing raw binary data untouched by other tools, reading information from databases and generating reports, moving files into a folder structure based on their content or any other task which is currently consuming more time than it needs to when performing it by hand. Building up a library of scripts to perform these tasks for you can make you a more efficient, and more importantly, a happier analyst.

If this post has whetted your appetite, you can download the newest version of Python from www.python.org which also gives a number of suggestions for learning resources. You can also download the presentation slides and annotated code examples (which include file reading and writing, parsing cookie files, processing SQLite databases and more) that I presented for F3 last year which relate more directly to digital forensics from here.

I hope this post has encouraged some of you to check Python out, if you have any questions then please leave a comment or you can contact me at acaithness@ccl-forensics.com.

Alex Caithness, Python fan at CCL-Forensics

* Or your preferred synonym.

SQLite analysis for forensic practitioners

epilog‘s developers have put together a one-day training course to help you to get the best possible results from digital investigations involving SQLite databases.

The course covers the basics of epilog and demonstrates how to deal with SQLite logically, as well as covering how to optimise results and advanced use of the tool. It will help you to get more from your investigations.

For example, the iPhone web cache is stored in an SQLite database. In a recent case, epilog recovered and presented nearly 5,000 entries from the web cache, where only 400 live (visible) entries were shown – including both textual and binary data. The tool streamlined the process by identifying the tables from which the data originated, and then allowed the investigator to use the “export to insert statements” functionality to make these records live again. This enabled the deleted cached records to be parsed and processed.

Our training course will teach you how to do this, and much more.

It takes place on February 7, 2012, at our offices in Stratford-upon-Avon. It’s a one-day course, costing just £250+VAT per person – a bargain in anyone’s book. Call us now on +44 (0)1789 261200 or email info@ccl-forensics.com for more information or to book a place.

Alex Caithness

epilog developer

Updated signature files for epilog

CCL-Forensics’ developers are constantly adding new files to increase the capability of epilog, and the latest signature files are now available for download, free of charge.

These new signature files now contain support for Apple iOS5.

We also welcome suggestions for additions to future signature file releases. Please email us at epilog@ccl-forensics.com.

What is epilog?

For those who don’t know, epilog is a software tool which allows investigators to recover deleted data from the widely-used database format, SQLite. Take a look at the first of our epilog videos:

It was developed by CCL-Forensics and – put simply – it gives investigators access to more data which could prove crucial in an investigation.

Many devices (whether mobile phones, computers, satnavs or other devices) store data in the SQLite database format.

Data stored in this type of database can provide a huge evidential opportunity for investigators. Many “off-the-shelf” tools can be used to view the live records in the database, but epilog  extracts deleted and de-referenced data from the database files or across a disc image or hex dump.

epilog’s three recovery algorithms can be used on any SQLite database, regardless of the type of data stored. However, epilog signatures can be used to tailor its behaviour to a particular database.

Included with the initial release of epilog were signatures including:

  • Android (SMS, call logs, calendars, address book and others)
  • iPhone (SMS, emails, calendar, and others)
  • Smartphone third party applications (including Yahoo Messenger, eBuddy chat and others)
  • Safari (internet history and cache and others)
  • Mozilla (cookies, internet history, form data and others)
  • Chrome (internet history)