Cell site blog: ‘consistent’ data, or data ‘not inconsistent’

By Dr Iain Brodie, Senior Cell Site Expert

As cell site experts we are often asked to consider whether cell site data is ‘consistent’ with a specific scenario, in the knowledge that our words can have a significant impact on how a jury thinks.

For example, a typical question put to us might be:  is the data for a particular mobile phone ‘consistent’ with it having been at the scene of a particular incident which occurred, say, in the centre of Birmingham at 12:00 on a particular day?

If the cell site data for the phone shows that it connected via a cell site in the centre of Birmingham which serves the scene at 12:00, then in my opinion it is clear that the data is consistent with the phone having been at the scene. This does not mean that I think the phone necessarily WAS at the scene, as the cell ID used will cover an extended area, including, of course, locations other than the scene. However, given the unpredictable ways in which phones are used, what data there is supports the contention that the phone was at the scene.

The situation is equally clear cut if, at 12:00, the phone connected via a cell in central London. It is physically impossible for the phone to have connected to a cell in London whilst located in Birmingham, so (if the records from the network are correct) such data would be in conflict with or inconsistent with the phone having been in central Birmingham at 12:00.

If, at 12:00 and 12:01 say, the phone connected via cells in central Coventry, the scenario is slightly different. In some circumstances it is not physically impossible for the phone to have connected to a cell in Coventry whilst located in Birmingham. But in all normal circumstances, given the huge number of other, more likely cells in Birmingham for the phone to have used, I would still say that this data was in conflict with the phone having been in central Birmingham at 12:00. This opinion could be reinforced by further work if required, but in general such work would not be necessary.

But imagine the data was less clear cut. For example, suppose the phone's call data records show it connecting to a cell site in Coventry at 11:00, a cell site in Solihull at 11:30, a cell site in eastern Birmingham at 11:45 and a cell site in Wolverhampton at 12:30.

In my opinion this data is again ‘consistent’ with the phone having been at the central Birmingham scene at 12:00, as the logical route between those cells would have taken the phone close to the scene. Indeed, there are few plausible routes that could generate such data without the phone passing close to the scene at 12:00, although again, I do not believe the data means that the phone definitely was at the scene (and nowhere else) at 12:00.

If, however, there were no call data for the cell in Wolverhampton, all we would have is call data consistent with movement of a phone towards the centre of Birmingham, and even less evidence that the phone was actually in the centre of Birmingham. Such a scenario presents quite a grey area for opinion evidence. Some experts may say the data is still consistent with the phone being in the centre of Birmingham at 12:00, whilst others may argue that there is, in fact, NO data consistent with the phone being in the centre of Birmingham at 12:00.

I would say that the data is consistent with the phone having travelled towards the centre of Birmingham in the times leading up to 12:00, although there is no data showing it had been used in central Birmingham.

A final scenario would be where the phone connected to a cell site in Coventry at 11:00 and again to the same cell site in Coventry at 12:45. In this scenario it is quite POSSIBLE that the phone had time to travel to the centre of Birmingham and back, but there is no data that would lead me to expect that this had been the case. Here I would say that the data is ‘not inconsistent’ with the phone having been in the centre of Birmingham at 12:00, but that there is no data indicating it had done so.

This may seem like semantics. However, in a case where I gave evidence for the defence earlier this year (in Birmingham Crown Court, as it happens), the prosecution expert asserted that there was cell site evidence ‘consistent’ with the defendant’s phone having been travelling away from the location of a crime at a particular time, when the cell site used for all of the relevant calls provided service at his home address. The prosecution expert’s use of the word ‘consistent’ here was challenged, and the challenge was accepted by the court.

The judge, Justice John Royce, said in summing up:

‘…although the data is not in conflict with such a theory [that the defendant was at the relevant scene]. The data [for the time in question] is not consistent with being at the site. It could possibly be that the phone was en route however from the site to the defendant’s home…’

‘…the prosecution has been driven to trying to construct theories because of the absence of solid evidence. They have tried to make bricks with but a few straws, and have done so with admirable skill and ingenuity. But is this sufficient evidence to be left to the jury? Could a jury, on this evidence, properly directed, safely convict? The conclusion to which I am driven is that they could not. Accordingly, I shall direct the jury to return not guilty verdicts.’

Had the prosecution expert’s semantics not been challenged, the outcome might have been different, possibly resulting in a miscarriage of justice.


July 2012 cell site blog: The top five (potential!) pitfalls in cell site analysis.

By Nicholas Patrick-Gleed, Cell Site Analyst

This month’s cell site blog takes on a slightly different style.  The team here at CCL-Forensics has been discussing the most common potential pitfalls encountered in the world of cell site evidence, and thought it would be a useful exercise to commit some of them to the blogosphere.  So, rather than focusing on a particular topic, we’ll look at the top five (as we see them) issues which need to be at the forefront when planning and, more importantly, carrying out a cell site investigation.

We’ve touched on some of these in previous blogs, but they form a concise summary of some of the ‘issues’ we have seen experts (almost) experience.

This Month’s Topic: Five things to be wary of in cell site analysis

1. Exhibits without interpretation

When working for the defence, we regularly see prosecution evidence which can best be described as “exhibits without interpretation”.  A good example of this is a series of maps plotted by an intelligence analyst, who has carried out a series of instructions based on some call data records, but presented them without any explanation of what they mean.  This not only causes confusion and delay within the criminal justice system (the defence will, no doubt, ask for the explanation at some point – so it may as well be provided at the outset) but also means that an opportunity could be missed as part of the investigation stage.  Simply ‘blindly’ plotting information on a map is hardly investigative – but we have seen it more than once.  What is the point of an exhibit without context?

From the prosecution’s perspective this is an obvious potential pitfall – as it means that the evidence does not include something which could enhance the prosecution’s case.

There have also been occasions where the defence leaves it until the 11th hour before ‘complaining’ that the person who has produced the exhibit is not an expert – and the judge could rule that the prosecution needs to carry out more expert analysis.

It’s simply not worth chancing these situations.  Moral of the story: produce exhibits which mean something; it makes for a smoother investigation.

2. Who’s who on the call data records?

Cell site is full of idiosyncrasies. It’s what keeps us experts on our toes. But small variations between networks and circumstances can lead to major confusion. The best example of this is when you are analysing a call data record and the person is in contact with someone on the same network. There are occasions when both parties’ cell IDs appear on the same CDR, which can immediately confuse things. Furthermore (let’s use the ‘3’ network as an example here), if an incoming call to the subject phone is unsuccessful, the cell ID for the person making the call still appears on the CDR. This is particularly problematic, as the CDR doesn’t differentiate (in separate columns) between the A and B phones, so this needs to be taken into account. It’s pretty easy to spot if there’s a day’s worth of cell IDs in London and one in Edinburgh, but when both parties are geographically close, vigilance is the watchword.

This is especially the case if the person plotting the calls is not trained in these nuances – as they may easily go unnoticed.

Moral of the story: be thorough.

3. Timely surveys

Networks change and evolve.  Nothing new there, but the sooner the survey is carried out after the incident in question, the better.  It means the results will be more accurate and better reflect what happened.

We previously touched on our use of historic data, which may help to counteract this problem – and this is a benefit of the robust methodology which CCL uses.  But, timeliness is still a big potential pitfall for a number of reasons.

One of the biggest is the evolution of “Everything Everywhere” – or the merger of T-Mobile and Orange, as most people still know it. This means that “Everything Everywhere” now has many more channels available than any of its competitors, and consolidating cells seems like a sensible thing to do. If there are two cells covering the same approximate area, it seems only prudent to use just one of them and either deactivate the other or reallocate it to, say, the new 4G networks, which have been in the news recently. This clearly impacts on the survey, especially if the cell in question is no longer transmitting.

Moral of the story: Consider the impact of the T-Mobile and Orange merger before surveying.  What are you expecting to see – and what are you expecting NOT to see?

4. Getting the whole picture – not just a small slice

Cell site is all about focusing on a phone’s movements around the time of a crime, right?  Wrong.  Yes, this is often the best place to start, but it can also be vitally important to look at the patterns of usage within the data as a whole, rather than just isolating and concentrating on a small piece of evidence.

There may be no evidence of a phone being in an area of interest at a particular time, but the best advice here is to stop, look around and think.

There may be behaviour patterns, where the time in question shows some deviation from the norm. There may be evidence elsewhere of the use of ‘clean’ and ‘dirty’ phones. There may be evidence of someone ‘casing the joint’ before the crime, which goes against the usual pattern of usage.

One just doesn’t see these when points are blindly plotted on a map.  The solution is to have as much data available as possible at the outset of a cell site assignment (or as much as can be reasonably requested under RIPA).

At the end of the day, it depends on what question you are trying to answer, but the moral of this story is: Don’t just rely on data from the time of the incident.  More complex investigations need more data.

5. Surveying techniques

Quite honestly, this is something of a bugbear of ours, and a topic which we have covered numerous times.  With that in mind, I won’t go into any major detail, but just summarise something which we think all cell site experts should adopt.  (And we’ve had this published in a peer-reviewed journal, so it’s more than just a passing fad!)

Movement is key to getting an accurate overall picture of how a phone interacts with cells. The concept of ‘dragging’ a cell can be key to determining whether a cell provides coverage at a location. Driving to a location from a number of directions can result in a different cell providing coverage, depending on which direction you arrive from. This is because the phone has a tendency to “hold onto” a cell rather than chopping and changing (to reduce the risk of a dropped call). Spot sampling (i.e. turning up at a location, surveying without moving, and then leaving) is hardly comprehensive. This is about so much more than simply dotting the i’s and crossing the t’s.
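To make the ‘dragging’ effect concrete, here is a toy hysteresis model: the phone stays on its current cell until the other cell is stronger by some margin. Everything in it is an assumption for illustration only: the two hypothetical cells A and B, the signal levels, and the single fixed 6 dB margin. Real phones follow network-configured reselection and handover parameters, not one margin.

```python
# Toy model of why a moving phone can "drag" a cell (hysteresis).
# All positions, signal levels and the 6 dB margin are invented for
# illustration; real phones follow network-configured reselection and
# handover parameters rather than a single fixed margin.

# Hypothetical received signal strength (dBm) from two cells at positions
# 0..10 along a straight road: cell A fades while cell B strengthens.
SIGNAL = {
    "A": [-60 - 4 * x for x in range(11)],
    "B": [-100 + 4 * x for x in range(11)],
}
HYSTERESIS_DB = 6  # assumed margin by which the other cell must be stronger


def serving_cells(route, start_cell):
    """Return {position: serving cell} for a phone travelling along `route`.

    The phone keeps its current cell and only switches when the other
    cell is stronger by more than HYSTERESIS_DB.
    """
    current = start_cell
    serving = {}
    for pos in route:
        other = "B" if current == "A" else "A"
        if SIGNAL[other][pos] > SIGNAL[current][pos] + HYSTERESIS_DB:
            current = other
        serving[pos] = current
    return serving


eastbound = serving_cells(range(0, 11), "A")       # arriving from the cell A side
westbound = serving_cells(range(10, -1, -1), "B")  # arriving from the cell B side

# At position 5 both cells are equally strong (-80 dBm), yet the serving
# cell depends on the direction of travel.
print(eastbound[5], westbound[5])  # A B
```

A spot sample taken while stationary at position 5 would record only one of the two cells. Only by approaching from both directions does the survey reveal that either cell can serve there.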

While we’re on the subject, it’s worth touching on tracking frequencies. Network operators typically use two or three 3G frequencies at their cell sites. When moving geographically, a phone may switch to a new cell which uses a different frequency from the original one. This creates a potential pitfall when surveying, as the expert needs to be mindful of how many frequencies are available and ensure that the most appropriate survey is carried out. The moral of this part of the story: remember there is more than one available frequency, and be as thorough as the investigation requires.

I hope you’ve enjoyed our whistle-stop tour through the potential pitfalls of cell site analysis and, as ever, we’re keen to hear your thoughts on the matter. If you would like to discuss any aspect of cell site analysis, please don’t hesitate to drop us a line at cellsite@ccl-forensics.com

Next month

Next month, Dr Iain Brodie analyses comments made by a judge during a recent case, and highlights what the criminal justice system REALLY wants from cell site experts.

Cell site blog – Never mind the quality, feel the width

Thoughts and observations on how ‘more’ could mean ‘less’ in the presentation of cell site analysis.

By Matthew Tart, Cell Site Analyst

This month – we look at quality over quantity in cell site analysis – with particular emphasis on a recent example where a pile (literally) of maps could easily have left jurors’ heads spinning.  And cost the prosecution a considerable sum.

This month’s topic: Getting the balance right in cell site analysis 

This blog starts with a case we were involved in recently, involving a high-profile crime with a number of defendants. On this occasion we were working for the defence, but the story acts as a useful pointer for the prosecution by illustrating what experts used by the Crown should, and should not, be doing. We’ll focus on a method used by a large number of cell site analysts (but not ourselves) which is not necessarily robust and may not stand up to close scrutiny.

Q: What were the details of the case?

A: The prosecution were investigating the probability of a suspect being at a crime scene – a pub in an inner city location.  At the time of the crime, one of the suspects (we were working for that suspect’s defence solicitor in this case) made a phone call.  The call data records showed that this phone call was made on what we’ll call ‘cell A’, which was on a mast near the crime scene – but also near to his home address which was about 500m away.

The suspect’s alibi was that he was at home at the time of the crime and the phone call.

The prosecution’s outsourced expert carried out ‘spot samples’ (i.e. turned up at a location with a piece of equipment) at both the crime scene and the alibi location.  Their report showed a different cell serving at each location.  Cell A was shown as best serving at the crime scene – but not at the alibi location.

Q: So what did we do differently?

A: We carried out a much more extensive survey: a drive survey at the home address and the surrounding area. This was carried out with regard to the cell of interest (cell A), and we used multiple pieces of equipment and repeatedly moved in and out of the area. We found that cell A provided coverage north, south, east and west of both locations (crime scene and alibi), and based upon this, we could not distinguish between the mobile phone being at either location. The evidence was simply not strong enough to suggest one or the other.

Q:  So, were both sides saying something different?

A: Yes and no. Before the court date, the prosecution’s outsourced expert asked for a copy of our defence report, which we provided. We then discussed its contents with the expert over the phone; he claimed that he wouldn’t expect cell A to provide coverage at the home address. After looking at our evidence, however, he admitted that our assertion that the cell served at both addresses was actually the most valid interpretation of the evidence. This is a worrying U-turn, to say the least, particularly as his own evidence did not document that cell A also serves at that crucial home address.

Q:  The other side claimed that a different cell provided service at the home address.  Did your survey find that cell as well?

A: Yes, but we found four cells which served at the home address.  Cell A, the one the other side claimed – AND two others.

Q: How was this data presented by the prosecution’s expert?

A: In a rather cumbersome, and lengthy fashion, to say the least.  There were a number of suspects, and their report showed the same maps over and over – and over – again.  It showed the locations of interest, calls for varying time periods, and whether the cells used actually covered the locations.  This came to more than 100 (one hundred) maps.  All printed on A3 paper and bound into a daunting, unwieldy piece of physical evidence, which the jury would have to absorb.

I would defy even the most attentive juror to make easy sense of this massive tome. Quite apart from the intimidating size of the document, the pages were all practically the same, or near-identical copies of one another. You simply wouldn’t be able to take it all in, especially as one wouldn’t expect jurors to be familiar with this type of evidence, which makes it all the more crucial to present it in a friendly form.

Q: What would we have done differently?

A: Firstly, not produced a huge weighty un-jury-friendly document. The best way of presenting this evidence (for which we would have had MUCH more survey data, having done more than carry out simple spot samples) would have been a series of two or three detailed maps which can be presented interactively at court with the relevant points being highlighted by the expert in the course of presenting the evidence.  These maps would have covered specifically the period of interest – and would have a secondary, financial, benefit.

By not producing hundreds of maps, we would have saved a considerable amount of time – and therefore cost.  We would estimate that producing this unmanageable number of maps and documents would have potentially cost tens of thousands of pounds. Our approach would almost certainly have been cheaper AND more robust.

Q:  So the lesson here is…

A: …to think about what you need to achieve, and the best way of achieving it. Don’t be held to ransom by an outsourced expert’s ‘way of doing something’. Hopefully this example has shown two things. Firstly, that carrying out spot samples (as we’ve mentioned in previous blogs) may not be the most appropriate way of surveying. And secondly, that the end product, i.e. what the jury see and have to understand, can be something a little more sophisticated than a batch of similar-looking, repetitive – and quite frankly, uninspiring – maps and tables. Technology has moved on. So has cell site analysis. And so has the presentation of evidence in court.

In terms of maps it is quality not quantity that delivers the most impactive conclusions in relation to the possible locations of a mobile phone.

For more information about this – or any aspect of cell site analysis, please contact Matthew Tart (or any of our other cell site analysts) on 01789 261200 or by emailing cellsite@ccl-forensics.com

XML and plist parser – updated version available

We’ve updated PIP – our XML and plist parser.

PIP has already proved incredibly popular, and is used by a number of investigation agencies across the world.

The new version is available for download here (licence key purchase required), and is a free upgrade for those who have already bought PIP.

We’ve listened to feedback received from the family of PIP users, and have introduced the following improvements:

Improved Interface – to improve work-flow, we have updated the application’s layout

New Tree View – see, at-a-glance, the structure of your data

Automatic building of XPaths – The Tree View can now be used to show PIP the data you are interested in – and PIP generates the XPath automatically.  This feature even works with Apple’s Property List ‘dictionary’ structures.

Import/Export Batch Jobs – Set up a batch job of XPaths for a particular folder structure (iOS Library or Application folders, for example) and then export the batch so that you, or anyone else in your lab, can re-use it when you next come across the same data

Command line version – version 1.1 of PIP comes with the “pipcmd” command-line utility, allowing you to integrate PIP into tool chains and other automated tasks

To find out more, or to purchase PIP, please visit our PIP download page.

 

Mystery box reveals digital secrets

By Arun Prasannan, member of CCL-Forensics’ R&D team

Every now and again, an unusual device arrives for analysis at CCL-Forensics, which proves interesting – but above all, significant to an investigation.

Earlier this month, a UK law enforcement agency submitted what can only be described as a ‘black box’.  It was plastic, no bigger than a packet of cigarettes, and from the outside, it had only a slot for a SIM card and a socket for power.

Working closely with the investigating agency, a member of CCL-Forensics’ R&D team carried out an in-depth analysis of what was inside the device, and what data it was capable of storing.

It was initially suspected that it was some kind of tracking device, and when disassembled, it was found to contain a battery, and two separate circuit boards, to one of which was attached a mercury switch which detected movement.  One board contained all the circuitry one would normally expect on a mobile phone, and had everything it needed to connect to a GSM network.  When examined VERY closely, it was labelled (in very small print) with an IMEI number.  From this, we could identify the board, and then research all the available documents about that piece of hardware.

Interestingly, it was a widely used GSM module found in many devices, such as GPS trackers, fax machines and even some phones.

The SIM card was analysed separately, and it was strongly suspected that there was additional data on the board itself.

Our analysts procured a test module, and carried out a comprehensive technical analysis to validate what data it could store.  It was found to have the capacity to store call data (made, received, missed), SMS and contacts – as well as some call timers.  It was also determined that SMS messages could be extracted without changing their status. 

Following this comprehensive research, it was found that the suspect device DID contain a number of phone numbers and call times, which were presented back to the investigator in the case. This potentially vital evidence would have been missed without this very low-level investigation of the device and the data it contained.

It also highlights the talents of CCL-Forensics’ R&D department, and the value investigators can derive from not simply opting for a ‘plug and play’ forensic examination.

For more information, please contact us at research@ccl-forensics.com

Cell site analysis and impactive court presentation

The monthly cell site blog is back, and this month we’ll be looking at what makes for a high-impact piece of cell site evidence in court, as well as how going that extra mile at the outset of a cell site investigation can, in the long run, save time and money and bring your case to a speedier, more positive conclusion.

Impactive court presentation

By Dr. Iain Brodie, Cell Site Expert

Let’s consider a real case which CCL-Forensics investigated on behalf of a UK police force.  We’ll change some of the location and crime details for the sake of confidentiality, and to help with legalities.  The story goes like this:  there was an aggravated burglary at a house in a semi-rural location, and following enquiries, a man was arrested.  It was crucial for the prosecution to demonstrate the man was at the scene and not merely in the vicinity.

The prosecution claimed that the man in custody had made a number of phone calls to an accomplice waiting outside the property while the crime was in progress. They obtained the call data records (CDRs) from the phone company to which the phone (attributed to the individual) was connected at the time.

CCL-Forensics cell site experts looked at the calls at the pertinent time, and could see that there were indeed incoming and outgoing calls – as well as a number of texts.  These events on the CDR used three different cell IDs (mobile phone mast sectors), but all took place over the period of a number of minutes.

In order to determine whether the suspect was likely to have been at the scene, surveys were carried out of the entire coverage areas of these three cells. CCL-Forensics performed a number of drive surveys, looking at the areas where each of the cells in question would serve when a mobile phone call was initiated. Once these drive surveys had been carried out for each of the three masts, the results were uploaded onto our mapping system and the so-called ‘derived service areas’ were plotted.

The result was instantly compelling. Like a neat Venn diagram, the areas overlapped, with the overlap covering a comparatively small area. The crime scene lay well within this area. It was, to a certain extent, a ‘textbook’ piece of evidence. The fact that a number of cells were used at the time could easily be down to the suspect moving around the house and receiving a different dominant signal at different elevations within the property.

The question you may well ask is why not just carry out a ‘spot sample’ at the crime location? Surely this would have yielded the same result. The reason lay in the case conference CCL-Forensics held with the investigating officer, where it was felt that a more robust survey was required to pre-empt any possible challenge from the defence. This turned out to be a very wise move, as in the weeks after the survey was carried out, the defence put forward an alibi location which was only a comparatively short distance from the crime scene.

When this point was plotted on the same map (without the need to go out and re-survey), it showed that one of the cells did indeed serve (for initiating a call) at this location, but not all three. The alibi location was therefore rejected and, based on the compelling evidence from cell site analysis, the suspect was found guilty.
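The reasoning behind the overlapping areas is essentially set logic. Any location consistent with every event must lie inside the derived service area of every cell that was used. The sketch below models each area as a set of grid squares. The cell names, grid squares, crime scene and alibi coordinates are all hypothetical. Real derived service areas come from drive-survey measurements and are not neat rectangles, and a cell serving at a point does not, on its own, place a phone there.

```python
# Sketch: intersecting derived service areas, as in the Venn-style map.
# Each area is modelled as a set of (column, row) grid squares where the
# cell was seen to serve when initiating a call. All values are hypothetical.
service_areas = {
    "cell_1": {(x, y) for x in range(0, 6) for y in range(0, 6)},
    "cell_2": {(x, y) for x in range(3, 9) for y in range(2, 8)},
    "cell_3": {(x, y) for x in range(2, 7) for y in range(4, 10)},
}

# A location consistent with ALL of the events must be served by every
# cell used: the intersection of the derived service areas.
overlap = set.intersection(*service_areas.values())


def cells_serving(square):
    """Return the sorted names of cells whose surveyed area includes `square`."""
    return sorted(name for name, area in service_areas.items() if square in area)


crime_scene = (4, 5)  # hypothetical grid square
alibi = (1, 3)        # hypothetical grid square

print(crime_scene in overlap, cells_serving(alibi))  # True ['cell_1']
```

Because the survey covered the whole of each cell's area, a newly proposed location only needs a lookup against data already collected. This is why the alibi could be tested without returning to the field.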

The map shows the coverage areas, along with the overlap, which ultimately proved to be the pivotal piece of evidence in court.  When presented to the jury in this way, the impact is immeasurable. 


The remit here was to find an effective balance between doing the bare minimum, and doing too much – incurring unnecessary costs.  Had a simple ‘spot sample’ been carried out in the first instance, it would have been necessary to return to the scene to carry out similar exercises at the alibi location – incurring delay and cost.  As it transpired, this was not necessary, as the measurements had already been taken.  In addition to this, the way the evidence was presented, showing the relevance of the small area where the cells’ service overlapped, proved to be an invaluable method of demonstrating the point to the jury.  Cell site evidence, when not presented in an impactive way, can be confusing in court – and at worst, can overwhelm those sitting on the jury.  This was an elegant, easily understandable piece of evidence – and it worked.

This enhanced service was agreed between the cell site expert and the customer force at the initial case conference, which shows the value of providing expert advice from the start of the analysis.

The power of the evidence more than justified ‘going that extra mile’ – and it ultimately saved the expense of carrying out at least one additional survey.  I hope this goes to show that a tailored investigation, based on the intelligence of the case and the requirements of the investigating officer, can be a much more powerful approach than a ‘one size fits all’ turn-up-and-survey approach.

If you would like more information about cell site analysis and its use in cases of this type, please contact me or any of my colleagues by emailing info@ccl-forensics.com.  As ever, please keep the feedback to these articles coming in.  We do enjoy reading your comments and opinions.

Keep posted: next month we will look at another aspect of cell site analysis that can make or break a prosecution.

Geek post: NSKeyedArchiver files – what are they, and how can I use them?

If you have spent any time investigating iOS or OSX devices you will probably have come across files (usually property lists) making reference to NSKeyedArchiver. Certainly, in my experience working with iOS, these files can often contain really interesting data (chat data for example) but the data can appear at first glance unstructured and difficult to work with.

In this blog post I aim to explain where these files come from, how they are structured and how you can get the most out of them.

Remember, remember…

NSKeyedArchiver is a class in the Mac programming API which is designed to serialise data. Data serialisation is the process through which data belonging to a program currently held in memory is written to a file on disk, so that it may be reloaded into memory at some point in the future.

This process can be achieved in a number of ways depending on the requirements of the programmer; however, NSKeyedArchiver provides a convenient way for entire “objects” in memory to be serialised, so it is a widely-used method.
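The basic idea of serialisation can be seen with Python's standard `plistlib` module. To be clear, this is not NSKeyedArchiver. `plistlib` only handles plain dictionaries, lists, strings, numbers and similar types, not arbitrary objects. It does, however, write the same binary property list container that NSKeyedArchiver writes its output into. The `person` dictionary is a hypothetical example.

```python
import plistlib

# In-memory data belonging to a (hypothetical) program.
person = {"Name": "Alex", "Age": 30}

# Serialise: write the in-memory structure out as binary plist bytes.
data = plistlib.dumps(person, fmt=plistlib.FMT_BINARY)
print(data[:8])  # binary property lists begin with the magic b'bplist00'

# Deserialise: load the bytes back into an equivalent in-memory structure.
restored = plistlib.loads(data)
print(restored == person)  # True
```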

Let’s take a moment to consider what is meant by an “object” in terms of programming (don’t worry; I’m not going to get too programmery). Many modern programming languages allow for (or are entirely based upon) the Object Oriented Programming paradigm. Put very generally this means that they are based around data structures (objects) which contain both data fields and built-in functionality (usually known as “methods”).

So let’s imagine that we were writing the code to define a “Person” object: we might define data fields such as: “Name”; “Age”; “Height”; and “Weight” – but we might also want to give it functionality. For example: “Speak” and “Wave”.

Obviously, a “Car” object would be different from a “Person” – it would have data fields like: “Make”; “Model”; and “Fuel-Type”. It might also have a data field for “Driver” which would hold a reference to a “Person” object.

A “Road” object might have data fields for: “Name”; “Length”; and “Speed-Limit”, along with a data field containing a collection of “Car” objects (each having a reference to a “Person” object in its “Driver” data field).
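For readers who prefer code, the Person, Car and Road example can be written as Python classes. The field names follow the text. The method bodies and example values are invented purely to show that objects bundle data with behaviour. The key point for serialisation is that `Car.driver` holds a reference to an existing `Person` object rather than a copy, and any archiver must preserve such references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    age: int
    height: float
    weight: float

    def speak(self, words: str) -> str:
        # A "method": functionality built into the object (illustrative only).
        return f"{self.name} says: {words}"

    def wave(self) -> str:
        return f"{self.name} waves"


@dataclass
class Car:
    make: str
    model: str
    fuel_type: str
    driver: Person  # a reference to another object, not a copy of it


@dataclass
class Road:
    name: str
    length: float
    speed_limit: int
    cars: List[Car] = field(default_factory=list)  # a collection of Car objects


alex = Person("Alex", 30, 1.8, 75.0)
road = Road("High Street", 1.2, 30, [Car("Ford", "Focus", "Petrol", alex)])

print(road.cars[0].driver.speak("Hello"))  # Alex says: Hello
print(road.cars[0].driver is alex)         # True: the same object, by reference
```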

Similar (and often far more complicated) data structures might be represented in a chat application: a “Message-List” object containing a collection of “Message” objects containing fields for “Sent-Time”, “Message-Text” and “Sender”, which itself contains a reference to a “Contact” object which contains fields for “Nickname”, “Email-Address” and so on.

It’s these kinds of data structures that NSKeyedArchiver converts from objects in memory and stores as a file which can subsequently be loaded back into the memory to rebuild the structure.

So what must NSKeyedArchiver store in order to achieve this? Well, there are two requirements: it has to store details of the type of object it’s serialising; and it has to store the data held in the objects (and the correct structure of the data).

NSKeyedArchiver property lists

NSKeyedArchiver serialises the object data into a binary property list file, the basic layout of which is always the same:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>$archiver</key>
        <string>NSKeyedArchiver</string>
        <key>$objects</key>
        <array>
            <null/>
            <string>Alex</string>
            <dict>
                <key>Name</key>
                <dict>
                    <key>CF$UID</key>
                    <integer>1</integer>
                </dict>
            </dict>
        </array>
        <key>$top</key>
        <dict>
            <key>root</key>
            <dict>
                <key>CF$UID</key>
                <integer>2</integer>
            </dict>
        </dict>
        <key>$version</key>
        <integer>100000</integer>
    </dict>
</plist>

Listing 1: Overview of an NSKeyedArchiver XML property list file.

In Listing 1 we have an example of an NSKeyedArchiver property list (converted to XML for ease of reading). At the top level, every NSKeyedArchiver file is made up of a dictionary with four keys.

Two of the keys simply provide metadata about the file. The “$archiver” key should always be followed by a string giving the name of the archiver used to create the file, which should obviously always be “NSKeyedArchiver”; the “$version” key should be followed by an integer giving the version of the archiver (100000 appears to be the only valid value).
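As a quick illustration, these checks can be made with Python's standard `plistlib` module, which reads both binary and XML property lists. This is a minimal sketch rather than a robust parser: the function name `read_keyed_archive` is our own, and treating any “$version” other than 100000 as an error is an assumption based on the observation above.

```python
import plistlib
from typing import Any, Dict

REQUIRED_KEYS = ("$archiver", "$objects", "$top", "$version")


def read_keyed_archive(data: bytes) -> Dict[str, Any]:
    """Load a binary or XML property list and check that its top level has
    the layout of an NSKeyedArchiver file."""
    plist = plistlib.loads(data)  # the format is detected automatically
    if not isinstance(plist, dict):
        raise ValueError("top level is not a dictionary")
    missing = [key for key in REQUIRED_KEYS if key not in plist]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    if plist["$archiver"] != "NSKeyedArchiver":
        raise ValueError(f"unexpected archiver: {plist['$archiver']!r}")
    if plist["$version"] != 100000:
        # Assumption: 100000 is the only version value we expect to see
        raise ValueError(f"unexpected version: {plist['$version']!r}")
    return plist
```

In practice you would pass it the raw bytes of a file, for example the contents of a “CHATS2.plist” opened in binary mode.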

The other two keys (“$objects” and “$top”) contain the data that has been serialised and its structure.

The “$objects” key is followed by an array containing all the pieces of data involved in the serialisation, stored flat with little or no structure. Each item in the array can be referred to by its position (index), counting from zero. Within this array you may find data with the structure shown in Listing 2:

<dict>
        <key>CF$UID</key>
        <integer>0</integer>
</dict>

Listing 2: an example of the CF$UID data type.

The CF$UID data type in Listing 2 is a dictionary with a single key (“CF$UID”) followed by an integer (this is how it appears when the property list is represented in XML; in the raw binary format “UID” is a separate data type which doesn’t require the dictionary structure).

These data types represent a reference to another entity in the “$objects” array: the integer gives the index of the referenced entity within that array. Consider the snippet shown in Listing 3:

    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <dict>
                <key>CF$UID</key>
                <integer>1</integer>
            </dict>
        </dict>
    </array>

Listing 3: an example of a “$objects” array.

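Listing 3 shows an “$objects” array containing three pieces of data. Indexed from 0, they are:

  0. A null (in archives written by NSKeyedArchiver itself this slot holds the string “$null”, the placeholder used for empty references)
  1. A string containing the value “Alex”
  2. A dictionary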

The dictionary at index 2 contains a single key: “Name”. The following value is a “CF$UID” data type referencing index 1 in the “$objects” array so we could consider the data to be equivalent to Listing 4:

    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </array>

Listing 4: the “$objects” array from Listing 3 “unpacked”.
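Following a reference like this is a single array lookup. The Python sketch below assumes the standard `plistlib` module (Python 3.8 or later), which returns `plistlib.UID` objects when reading binary property lists; because the XML representation shows a UID as a one-key dictionary, the helper accepts both forms. The helper names `uid_index` and `follow` are our own.

```python
import plistlib
from typing import Any, List, Optional


def uid_index(value: Any) -> Optional[int]:
    """Return the "$objects" index if value is a UID reference, else None.

    plistlib gives plistlib.UID objects for binary property lists; in the XML
    representation a UID appears as a one-key dictionary {"CF$UID": n}."""
    if isinstance(value, plistlib.UID):
        return value.data
    if isinstance(value, dict) and len(value) == 1 and "CF$UID" in value:
        return value["CF$UID"]
    return None


def follow(value: Any, objects: List[Any]) -> Any:
    """Resolve a single level of reference; non-references pass through."""
    index = uid_index(value)
    return value if index is None else objects[index]
```

Applied to the “Name” value of the dictionary at index 2 in Listing 3, `follow` returns the string “Alex”, which is exactly the substitution made in Listing 4.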

This example is very simplistic; in reality the structure revealed by unpacking the object array can be extremely deeply nested, with objects containing references to objects containing references to objects… and so on.

The observant among you may be thinking “this seems like a very inefficient way to represent the data”, and for this example you’d certainly be right! However, in most real-life cases the complex data held in these files contains many repeating values which, when arranged this way, only have to be stored once in the “$objects” array but can be referenced from many places.
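To make the saving concrete, here is a toy Python “archiver” that flattens nested dictionaries into an “$objects”-style list, storing each distinct string or number once and replacing every use with a `plistlib.UID` reference. It is a simplified sketch of the idea, not Apple's implementation (real archives also record class information), and the function name `flatten` is our own.

```python
import plistlib
from typing import Any, Dict, List, Tuple


def flatten(root: Any) -> Tuple[List[Any], plistlib.UID]:
    """Toy archiver: build a flat "$objects"-style list in which each distinct
    string or number is stored once, and return it with a UID for the root."""
    objects: List[Any] = ["$null"]  # index 0: placeholder for empty values
    seen: Dict[Tuple[type, Any], int] = {}

    def add(value: Any) -> plistlib.UID:
        if value is None:
            return plistlib.UID(0)  # empty values point at "$null"
        if isinstance(value, (str, int, float)):
            key = (type(value), value)
            if key not in seen:  # first occurrence: store it
                seen[key] = len(objects)
                objects.append(value)
            return plistlib.UID(seen[key])  # repeats reuse the same index
        if isinstance(value, dict):
            index = len(objects)
            objects.append(None)  # reserve this dictionary's slot first
            objects[index] = {name: add(item) for name, item in value.items()}
            return plistlib.UID(index)
        raise TypeError(f"unsupported type: {type(value).__name__}")

    return objects, add(root)
```

Flattening two messages from the same sender stores the sender's name once; both message dictionaries hold a UID pointing at the same index.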

The “$top” key is our entry point to the data, so it is the data held at this key that represents the total structure of the object that has been serialised. This key will be followed by a single dictionary which again contains a single key, “root”. The “root” key will be followed by a single CF$UID data type which will be a reference to the top-level object in the “$objects” array.

Returning to the example in Listing 1, the “root” is referencing the object at index 2 in the “$objects” array. So expanding this, our complete data structure is shown in Listing 5:

    <key>$top</key>
    <dict>
        <key>root</key>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </dict>

Listing 5: Expanded “$top” object, showing complete data structure.
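The whole unpacking process, from the “$top” entry point down through every reference, can be sketched recursively in Python. This is a minimal illustration under our own naming (`uid_index`, `unpack`, `unpack_top`), not the ccl_bplist implementation mentioned at the end of this post. It copies shared objects rather than preserving the sharing, and because real archives can contain objects that refer back to their parents, it reports a cycle instead of recursing forever.

```python
import plistlib
from typing import Any, Dict, FrozenSet, List, Optional


def uid_index(value: Any) -> Optional[int]:
    """Index into "$objects" if value is a UID reference (binary or XML form)."""
    if isinstance(value, plistlib.UID):
        return value.data
    if isinstance(value, dict) and len(value) == 1 and "CF$UID" in value:
        return value["CF$UID"]
    return None


def unpack(value: Any, objects: List[Any], path: FrozenSet[int] = frozenset()) -> Any:
    """Recursively replace UID references with the objects they point at.

    `path` holds the indices being expanded on the current branch, so a
    reference back to one of them is reported rather than followed."""
    index = uid_index(value)
    if index is not None:
        if index in path:
            return f"<cycle to $objects[{index}]>"
        return unpack(objects[index], objects, path | {index})
    if isinstance(value, dict):
        return {key: unpack(item, objects, path) for key, item in value.items()}
    if isinstance(value, list):
        return [unpack(item, objects, path) for item in value]
    return value


def unpack_top(archive: Dict[str, Any]) -> Dict[str, Any]:
    """Expand the "$top" dictionary (normally just "root") of a loaded archive."""
    objects = archive["$objects"]
    return {key: unpack(value, objects) for key, value in archive["$top"].items()}
```

Run on the data from Listing 1, `unpack_top` returns `{"root": {"Name": "Alex"}}`, matching Listing 5.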

A sense of identity

So far we have only seen examples of basic data stored in this structure, where the type of data is implicit. In most files you are likely to encounter, however, you will also see additional data relating to the type of the objects being stored.

Listing 6 shows an unpacked “$top” object from a “CHATS2.plist” file produced by the iOS application “PingChat”:

    <key>$top</key>
    <dict>
    <key>root</key>
    <dict>
        <key>$class</key>
        <dict>
            <key>$classes</key>
            <array>
                <string>NSMutableDictionary</string>
                <string>NSDictionary</string>
                <string>NSObject</string>
            </array>
            <key>$classname</key>
            <string>NSMutableDictionary</string>
        </dict>
        <key>NS.keys</key>
        <array>
            <string>pingchat</string>
        </array>
        <key>NS.objects</key>
        <array>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSMutableArray</string>
                        <string>NSArray</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSMutableArray</string>
                </dict>
                <key>NS.objects</key>
                <array>
                    <dict>
                        <key>$class</key>
                        <dict>
                            <key>$classes</key>
                            <array>
                                <string>BubbleItem</string>
                                <string>NSObject</string>
                            </array>
                            <key>$classname</key>
                            <string>BubbleItem</string>
                        </dict>
                        <key>state</key>
                        <integer>1</integer>
                        <key>image</key>
                        <string>$null</string>
                        <key>msg</key>
                        <string>Yo</string>
                        <key>author</key>
                        <string>testingtesting</string>
                        <key>time</key>
                        <dict>
                            <key>$class</key>
                            <dict>
                                <key>$classes</key>
                                <array>
                                    <string>NSDate</string>
                                    <string>NSObject</string>
                                </array>
                                <key>$classname</key>
                                <string>NSDate</string>
                            </dict>
                            <key>NS.time</key>
                            <real>307828812.649871</real>
                        </dict>
                    </dict>
                </array>
            </dict>
        </array>
    </dict>
    </dict>

Listing 6: Expanded “$top” object taken from a PingChat “CHATS2.plist” file.

In Listing 6 we can begin to see how complex the serialised data can become (and this is a relatively simple example). However, if you keep your cool and remember that the data is still well-structured, it is possible to parse it into something more meaningful.

One data structure we encounter for the first time here is “$class”. “$class” isn’t part of the data itself, but rather information about which type of object has been serialised. This information is obviously important when the program that serialised the data comes to deserialise it, but we can also use it to give us clues about the meaning of the data; consider the snippet in Listing 7:

<dict>
    <key>$class</key>
    <dict>
        <key>$classes</key>
        <array>
            <string>NSMutableArray</string>
            <string>NSArray</string>
            <string>NSObject</string>
        </array>
        <key>$classname</key>
        <string>NSMutableArray</string>
    </dict>
    <key>NS.objects</key>
    <array>
        <dict>
            <key>$class</key>
            <dict>
                <key>$classes</key>
                <array>
                    <string>BubbleItem</string>
                    <string>NSObject</string>
                </array>
                <key>$classname</key>
                <string>BubbleItem</string>
            </dict>
            <key>state</key>
            <integer>1</integer>
            <key>image</key>
            <string>$null</string>
            <key>msg</key>
            <string>Yo</string>
            <key>author</key>
            <string>testingtesting</string>
            <key>time</key>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSDate</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSDate</string>
                </dict>
                <key>NS.time</key>
                <real>307828812.649871</real>
            </dict>
        </dict>
    </array>
</dict>

Listing 7: Snippet of a single object in a PingChat “CHATS2.plist” file.

Let’s take a look at the objects involved here and what the “$class” sections can tell us about the data held. The “$class” structure takes the form of a dictionary containing two keys. The “$classname” section is fairly straightforward; it simply gives us the name of the type of object we’re dealing with.

So in the case of the first “$class” structure encountered, we find that the object is of type “NSMutableArray”, which a quick Google search tells us is a “Modifiable array of objects” – so the data held in this object is going to take the form of an array or list.

The other key in the “$class” structure is “$classes”; this is more subtle and requires some explanation of one of the key concepts in most object-oriented programming languages: inheritance.

Think back to the explanation of the “Person” object. A “Person” object had the fields: “Name”; “Age”; “Height”; and “Weight” and the functionality to “Speak” and “Wave”. Now imagine that we wanted to create a new type of object: “DigitalForensicAnalyst”. We would want this new type of object to have some specialised functionality: “Image”, “Analyse” and so on.

However, a “DigitalForensicAnalyst” is a “Person” too – they have a name, they can speak and wave. Now, programmers are stereotyped as being lazy (because they are) so it is unlikely that after spending all that time writing and debugging the code to represent a “Person” that they are going to duplicate all that hard work when it comes to creating a “DigitalForensicAnalyst”.

Instead they would have the “DigitalForensicAnalyst” object inherit the functionality from “Person”; this means that only the new functionality of “Image” and “Analyse” needs to be created from scratch, while all of the other functionality comes free thanks to inheritance.
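Here is the same idea as a Python sketch (class and method names are illustrative only): `DigitalForensicAnalyst` declares only its new methods and inherits everything else from `Person`. Python's `__mro__` attribute, which lists a class followed by its ancestors, resembles the “$classes” arrays shown in Listing 7.

```python
class Person:
    def __init__(self, name: str, age: int, height: float, weight: float) -> None:
        self.name = name
        self.age = age
        self.height = height
        self.weight = weight

    def speak(self) -> str:
        return f"Hello, my name is {self.name}"

    def wave(self) -> str:
        return f"{self.name} waves"


class DigitalForensicAnalyst(Person):
    """Inherits every field and method of Person; only the new
    functionality has to be written."""

    def image(self, device: str) -> str:
        return f"{self.name} images {device}"

    def analyse(self, evidence: str) -> str:
        return f"{self.name} analyses {evidence}"


analyst = DigitalForensicAnalyst("Alex", 30, 1.8, 75.0)
print(analyst.speak())  # Hello, my name is Alex  (inherited from Person)
print([cls.__name__ for cls in DigitalForensicAnalyst.__mro__])
# ['DigitalForensicAnalyst', 'Person', 'object']
```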

Coming back to the “$classes” key, this will be followed by an array containing the name of the object’s own type followed by the names of all of the types it inherits from. So in the case of our first “$class” structure we can see that the “NSMutableArray” inherits functionality from both “NSArray” and “NSObject”, which may give us further hints about what data might be held.
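When processing many objects it helps to make this check programmatically. The Python sketch below works on an object that has already been unpacked (so “$class” is an inline dictionary, as in Listing 7); the function names `class_info` and `is_kind_of` are our own. Because “$classes” lists the object's own class and all of its ancestors, `is_kind_of` matches an “NSMutableArray” when asked about “NSArray”.

```python
from typing import Any, Dict, List, Optional, Tuple


def class_info(obj: Any) -> Tuple[Optional[str], List[str]]:
    """Return ($classname, $classes) for an unpacked object, or (None, [])
    if it carries no class information (e.g. a plain string)."""
    if not isinstance(obj, dict) or not isinstance(obj.get("$class"), dict):
        return None, []
    cls: Dict[str, Any] = obj["$class"]
    return cls.get("$classname"), list(cls.get("$classes", []))


def is_kind_of(obj: Any, class_name: str) -> bool:
    """True if obj's class is class_name or inherits from it."""
    return class_name in class_info(obj)[1]
```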

So, this “NSMutableArray” is going to contain a collection of other objects, and looking at the rest of this object’s structure we find a key “NS.objects” which is followed by an array containing just that collection. This array has only one item, another object containing a “$class” definition, so let’s take a look. This time our “$classname” is particularly useful: “BubbleItem”, making reference to the “speech bubbles” displayed on screen; and indeed we find message details (author, message text, timestamp) in the object’s data fields.
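Putting this together, the Python sketch below turns the unpacked PingChat structure from Listing 6 into simple message records. It assumes the structure has already been unpacked, that the field names (“author”, “msg”, “time” and so on) are those seen in this particular sample, and that the root dictionary's “NS.keys” and “NS.objects” arrays pair up by position, which is how NSDictionary objects are archived. NSDate values store seconds since 1 January 2001 00:00:00 UTC, so the sample timestamp 307828812.649871 converts to 3 October 2010, 20:00:12 UTC. All function names are our own.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

# NSDate counts seconds from this reference date
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def classname(obj: Any) -> Optional[str]:
    if isinstance(obj, dict) and isinstance(obj.get("$class"), dict):
        return obj["$class"].get("$classname")
    return None


def nsdate_to_datetime(nsdate: Dict[str, Any]) -> datetime:
    return APPLE_EPOCH + timedelta(seconds=nsdate["NS.time"])


def nsdictionary_to_dict(obj: Dict[str, Any]) -> Dict[str, Any]:
    # "NS.keys" and "NS.objects" are parallel arrays
    return dict(zip(obj["NS.keys"], obj["NS.objects"]))


def clean(value: Any) -> Any:
    # "$null" is the archiver's placeholder for an empty value
    return None if value == "$null" else value


def bubble_records(root: Dict[str, Any]) -> List[Dict[str, Any]]:
    records = []
    for list_key, conversation in nsdictionary_to_dict(root).items():
        for item in conversation.get("NS.objects", []):
            if classname(item) != "BubbleItem":
                continue  # skip anything that isn't a message bubble
            records.append({
                "list_key": list_key,
                "author": clean(item.get("author")),
                "message": clean(item.get("msg")),
                "sent": nsdate_to_datetime(item["time"]),
                "state": item.get("state"),
                "image": clean(item.get("image")),
            })
    return records
```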

There’s got to be an easier way…

NSKeyedArchiver files can contain really important evidence, but even a data-sadist like me is going to lose their mind converting all of these files into a usable format by hand; so how can we speed things up?

Well, our PIP tool has an “Unpack NSKeyedArchiver” feature which reveals the object structure so that you can use XPaths to parse the file (either by using one of the many already in the included XPath library or by writing your own).

Also, if you are a Python fan (if not, why not?) I have updated our ccl_bplist python module, which you can get here, with a “deserialise_NsKeyedArchiver” function to unpack the “$top” object.

I hope you found this blog post useful. As always, if you have any questions or suggestions, leave a comment below or contact the R&D team at research@ccl-forensics.com.

Alex Caithness, Lazy Data-Sadist, CCL-Forensics

Special thanks to Arun Prasannan for assisting the BlogKeeper by rendering the code readable.