You’ll have to excuse me for a moment while I climb up onto my soapbox because this blog is going to be a preachy one. Today I want to evangelise on a subject very dear to my heart: the scripting language known as Python.
“But I’m not a programmer Alex, I’m a digital forensic analyst*!”
I know, and I’m not for one moment suggesting that you should be looking at a change of career, but just as EnCase, FTK, TSK, XRY, Cellebrite, Oxygen and their ilk are essential tools of our trade, which we keep clipped to our utility belt at all times, a scripting language like Python should also feature in the list of tools we are proficient at using.
“But why should I trouble myself with learning another tool when the off-the-shelf tools do so much?”
The answer is simple: because laziness is a virtue.
Allow me to explain my reasoning: with the best will in the world these tools cannot, and should not, be expected to do everything. When one of these tools has a gap in their capabilities we are faced with the prospect of completing the task manually. These tasks will all have a certain level of complexity, time-intensiveness and mundaneness, which, according to “Caithness’ Law” all increase exponentially with proximity to the task’s deadline.
So you grit your teeth, clench your fists and get down to it, derive the solution and pull the requisite all-nighters to get the case out the door. At this point the way I see it is you have three options: you sacrifice a goat to the dark gods of digital forensics in order for this problem to never rear its ugly head again; you resign yourself to a fate of repeating this task until whatever hellish application or system that created this artefact goes out of circulation; or you get lazy and automate the task so that neither you nor any of your colleagues ever have to go through that pain again.
And that’s when it’s so useful to have a scripting language available to you.
I’m not going to attempt to teach you to program in Python in a single blog post as that would be both arrogant and misguided, but I do want to give you an example of a simple Python script I wrote a while back to automate a boring but necessary task that saves me time on a day-to-day basis.
When examining an image of an iOS device, inevitably one of the most interesting areas of the file system is the “mobile/Applications” folder where all the third-party applications store their data. The folder contains a number of folders (one for each app installed) which are named, not with the application’s name, but rather with a UID string.
Applications folder in an iOS device
In order to find out which folder contains which application you have to dive inside each one in turn and look for the “.app” folder which gives you the name of the app.
Inside the applications folder
As you can imagine, even with a modest number of applications this is a needlessly time-consuming exercise and when faced with an iPad belonging to a real app-collector it can put you into a catatonic state. Therefore, to ease the tedium of trawling the application folder I knocked together a little script which would audit the folders automatically.
I can sense that at this stage you’re itching to take a look at some actual, honest-to-goodness Python code, but first let’s consider the algorithm that we want to express. We have a folder full of folders, and inside each of those folders is a folder named “ApplicationName.app” where ApplicationName is the name of (you guessed it) the application. So I would suggest that we want to express an algorithm along the lines of:
- Accept the path of the “mobile/Application” directory as input to our script
- Get a list of the folders held in this directory
- For each of these folders look inside and find the *.app folder
- Output the ugly UID folder name alongside the friendly *.app folder name
OK, looks simple enough – let’s see how that looks as a Python script:
The first thing to note about this script is that there are a lot of lines which begin with a hash symbol (#); these are “comments”. Comments are just notes left in the code by the programmer to help someone reading the script understand the code – they are completely ignored when the script is executed. This means that almost half of the code isn’t python at all; in fact there are only nine lines of actual code here!
So, we know what algorithm is being expressed here; let’s take a quick look at what the code is doing line by line:
These lines are bringing extra functionality into our script. Python comes pre-installed with a number of modules which add functionality to your scripts. These modules include regular expressions, hashing, database handling, JSON, decoding of binary data, file archiving and compression and loads more – far too much to list here. If Python was to get prepared to use all of this cool stuff at the start of every script it would take a long time to get started, so instead we use “import statements” to let Python know which modules we want to use in our script.
So what are we importing? Firstly “sys” contains system-specific functionality, some of it fairly low-level, but we are simply going to use it to get our command line arguments. Next up, “os” contains operating system functionality; in this script we’ll be using it to get a list of a folder’s contents. Finally “os.path” contains functionality for path manipulation; it’s used for joining paths together and checking whether a path leads to a file or a directory.
root_path = sys.argv
“sys.argv” is a list of command line arguments. The number in the square brackets tells us which item in the list we’re interested in. In Python, lists are “zero-indexed” meaning that the first item is numbered “0”; the second is “1” and so on. The first item (index 0) in the “sys.argv” list will be the name of our script, followed by any other arguments we pass to it at the command line. That means that this line gets the first command line argument after the script’s filename and assigns it to a variable “root_path” so that we can use it later in our script.
for app_folder in os.listdir(root_path):
This line is starting a loop. There are two types of loops in Python; here we are using the “for” loop which take the form:
“for each item in a sequence”
The code inside the loop takes place once for each item in the sequence. In our case, the sequence is provided by
which gives a list of the contents of the folder we were provided by the command line argument.
One of my favourite things about Python is that good code layout is actually part of the language syntax. If you look at the listing above you can see how the code after our for loop is started is indented, which means that the indented code is taking place inside the loop. If we wanted code to run after the loop has finished we would simply remove the indent at that line.
app_folder_path = os.path.join(root_path, app_folder)
Later on in the script we’re going to need the full path of the app’s folder so here we use some of the functionality in “os.path” to join the path we were supplied at the command line to the current app’s folder as served up by our for loop. We then store this complete path in a variable named “app_folder_path”:
for app_folder_content in os.listdir(app_folder_path):
Here’s another for loop. Again we’re using “os.listdir” but this time we’re getting the contents of our current app’s directory, inside which we’re going to look for the “.app” folder.
print(app_folder_content + ":\t" + app_folder)
Inside this for loop we check each of the files and folders in the application’s directory looking for one which ends with that magic “.app” extension. If we find one, we print the details out to the screen.
And that’s it, just nine lines of straight-forward code! So now we can run the script in a command window. Running the script we see the following output:
This shows us at-a-glance which application is found in each of the folders, a boring task which never has to be completed by hand again and just lets the analyst get on with actually analysing the data.
Obviously there’s scope to automate lots of other tasks, whether it’s parsing raw binary data untouched by other tools, reading information from databases and generating reports, moving files into a folder structure based on their content or any other task which is currently consuming more time than it needs to when performing it by hand. Building up a library of scripts to perform these tasks for you can make you a more efficient, and more importantly, a happier analyst.
If this post has whetted your appetite, you can download the newest version of Python from www.python.org which also gives a number of suggestions for learning resources. You can also download the presentation slides and annotated code examples (which include file reading and writing, parsing cookie files, processing SQLite databases and more) that I presented for F3 last year which relate more directly to digital forensics from here.
I hope this post has encouraged some of you to check Python out, if you have any questions then please leave a comment or you can contact me at firstname.lastname@example.org.
Alex Caithness, Python fan at CCL-Forensics
* Or your preferred synonym.