July 2012 cell site blog: The top five (potential!) pitfalls in cell site analysis.

By Nicholas Patrick-Gleed, Cell Site Analyst

This month’s cell site blog takes on a slightly different style.  The team here at CCL-Forensics has been discussing the most common potential pitfalls encountered in the world of cell site evidence, and thought it would be a useful exercise to commit some of them to the blogosphere.  So, rather than focusing on a particular topic, we’ll look at the top five (as we see them) issues which need to be at the forefront when planning and, more importantly, carrying out a cell site investigation.

We’ve touched on some of these in previous blogs, but they form a concise summary of some of the ‘issues’ we have seen experts (almost) experience.

This Month’s Topic: Five things to be wary of in cell site analysis

1. Exhibits without interpretation

When working for the defence, we regularly see prosecution evidence which can best be described as “exhibits without interpretation”.  A good example of this is a series of maps plotted by an intelligence analyst, who has carried out a series of instructions based on some call data records, but presented them without any explanation of what they mean.  This not only causes confusion and delay within the criminal justice system (the defence will, no doubt, ask for the explanation at some point – so it may as well be provided at the outset) but also means that an opportunity could be missed as part of the investigation stage.  Simply ‘blindly’ plotting information on a map is hardly investigative – but we have seen it more than once.  What is the point of an exhibit without context?

From the prosecution’s perspective this is an obvious potential pitfall – as it means that the evidence does not include something which could enhance the prosecution’s case.

There have also been occasions where the defence leaves it until the 11th hour before ‘complaining’ that the person who has produced the exhibit is not an expert – and the judge could rule that the prosecution needs to carry out more expert analysis.

It’s simply not worth chancing these situations.  Moral of the story: produce exhibits which mean something; it makes for a smoother investigation.

2. Who’s who on the call data records?

Cell site is full of idiosyncrasies.  It’s what keeps us experts on our toes.  But there are small variations between networks and circumstances can lead to major confusion.  The best example of this is when you are analysing a call data record, and the person is in contact with someone on the same network.  There are occasions when both parties cell IDs appear on the same CDR – which can immediately confuse things.  Furthermore, and lets use the ‘3’ network as an example here, if an incoming call to the subject phone is unsuccessful, then the cell ID for the person making the call still appears on the CDR.  This is particularly a problem, as the CRD doesn’t differentiate between the A and B phone (in columns) and so this needs to be taken into account.  It’s pretty easy to spot if there’s a day’s worth of cell IDs in London, and one in Edinburgh – but when both parties are geographically close, then vigilance is the watchword.

This is especially the case if the person plotting the calls is not trained in these nuances – as they may easily go unnoticed.

Moral of the story: be thorough.

3. Timely surveys

Networks change and evolve.  Nothing new there, but the sooner the survey is carried out after the incident in question, the better.  It means the results will be more accurate and better reflect what happened.

We previously touched on our use of historic data, which may help to counteract this problem – and this is a benefit of the robust methodology which CCL uses.  But, timeliness is still a big potential pitfall for a number of reasons.

One of the biggest is the evolution of “Everything Everywhere” – or the merger of
T-Mobile and Orange as most people still know it.  This means that “Everything Everywhere” now has many more channels available than each of their competitors – and consolidating cells seems like a sensible thing to do.  If there are two cells covering the same approximate area, it seems only prudent to use just one of them and either deactivate the other, or reallocate it to, say, the new 4G networks, which have been in the news recently.  This clearly impacts on the survey, especially if the cell in question is no longer transmitting.

Moral of the story: Consider the impact of the T-Mobile and Orange merger before surveying.  What are you expecting to see – and what are you expecting NOT to see?

4. Getting the whole picture – not just a small slice

Cell site is all about focusing on a phone’s movements around the time of a crime, right?  Wrong.  Yes, this is often the best place to start, but it can also be vitally important to look at the patterns of usage within the data as a whole, rather than just isolating and concentrating on a small piece of evidence.

There may be no evidence of a phone being in an area of interest at a particular time, but the best advice here it to stop, look around and think.

There may be behaviour patterns, where the time in question shows some deviation from the norm. There may be evidence elsewhere of the use of ‘clean’ and ‘dirty’ phones.  There may be evidence someone ‘casing the joint’ before the crime, which goes against the usual pattern of usage.

One just doesn’t see these when points are blindly plotted on a map.  The solution is to have as much data available as possible at the outset of a cell site assignment (or as much as can be reasonably requested under RIPA).

At the end of the day, it depends on what question you are trying to answer, but the moral of this story is: Don’t just rely on data from the time of the incident.  More complex investigations need more data.

5. Surveying techniques

Quite honestly, this is something of a bugbear of ours, and a topic which we have covered numerous times.  With that in mind, I won’t go into any major detail, but just summarise something which we think all cell site experts should adopt.  (And we’ve had this published in a peer-reviewed journal, so it’s more than just a passing fad!)

Movement is key to getting an accurate overall picture of how a phone interacts with cells.  The concept of ‘dragging’ a cell can be key to determining if a cell provides coverage at a location.  Driving to a location from a number of directions can result in a different cell providing coverage, depending on which direction you arrive from.  This is because the phone has a tendency to “hold onto” a cell, rather than chopping and changing – (to reduce the risk of a dropped call).  Spot samples (i.e. turning up at a location, surveying without moving, and then leaving, is hardly comprehensive).  This is about so much more than simply dotting the i’s and crossing the t’s.

While we’re on the subject, it’s worth touching on tracking frequencies.  Network Operators, typically use two or three 3G frequencies at their cell sites.  When moving geographically, a phone may use a new cell which uses a different frequency than the original one.  This created a potential pitfall when surveying, as the expert needs to be mindful of how many frequencies are available, and ensure the most appropriate survey is therefore carried out.  The moral of this part of the story: remember there is more than one available frequency – and be as thorough as the investigation requires.

I hope you’ve enjoyed our whistle-stop tour through the potential pitfalls of cell site analysis – and as, ever, we’re always keen to hear your thoughts on the matter.  If you would like to discuss any aspect of cell site analysis, please don’t hesitate to drop us a line at cellsite@ccl-forensics.com

Next month

Next month, Dr Iain Brodie analyses comments made by a judge during a recent case, and highlights what the criminal justice system REALLY wants from cell site experts.

Geek post: NSKeyedArchiver files – what are they, and how can I use them?

If you have spent any time investigating iOS or OSX devices you will probably have come across files (usually property lists) making reference to NSKeyedArchiver. Certainly, in my experience working with iOS, these files can often contain really interesting data (chat data for example) but the data can appear at first glance unstructured and difficult to work with.

In this blog post I aim to explain where these files come from, how they are structured and how you can get the most out of them.

Remember, remember…

NSKeyedArchiver is a class in the Mac programming API which is designed to serialise data. Data serialisation is the process through which data belonging to a program currently held in memory is written to a file on disk, so that it may be reloaded into memory at some point in the future.

This process can be achieved in a number of ways depending on the requirements of the programmer; however, NSKeyedArchiver provides a convenient way for entire “objects” in memory to be serialised, so it is a widely-used method.

Let’s take a moment to consider what is meant by an “object” in terms of programming (don’t worry; I’m not going to get too programmery). Many modern programming languages allow for (or are entirely based upon) the Object Oriented Programming paradigm. Put very generally this means that they are based around data structures (objects) which contain both data fields and built-in functionality (usually known as “methods”).

So let’s imagine that we were writing the code to define a “Person” object: we might define data fields such as: “Name”; “Age”; “Height”; and “Weight” – but we might also want to give it functionality. For example: “Speak” and “Wave”.

Obviously, a “Car” object would be different from a “Person” – it would have data fields like: “Make”; “Model”; and “Fuel-Type”. It might also have a data field for “Driver” which would hold a reference to a “Person” object.

A “Road” object might have a data fields for: “Name”; “Length”; and “Speed-Limit” along with a data field containing a collection of “Car” objects (each having a reference to a “Person” object in their “Driver” data field).

Similar (and often far more complicated) data structures might be represented in a chat application: a “Message-List” object containing a collection of “Message” objects containing fields for “Sent-Time”, “Message-Text” and “Sender”, which itself contains a reference to a “Contact” object which contains fields for “Nickname”, “Email-Address” and so on.

It’s these kinds of data structures that NSKeyedArchiver converts from objects in memory and stores as a file which can subsequently be loaded back into the memory to rebuild the structure.

So what must NSKeyedArchiver store in order to achieve this? Well, there are two requirements: it has to store details of the type of object it’s serialising; and it has to store the data held in the objects (and the correct structure of the data).

NSKeyedArchiver property lists

NSKeyedArchiver serialises the object data into a binary property list file, the basic layout of which is always the same:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>$archiver</key>
        <string>NSKeyedArchiver</string>
        <key>$objects</key>
        <array>
            <null/>
            <string>Alex</string>
            <dict>
                <key>Name</key>
                <dict>
                    <key>CF$UID</key>
                    <integer>1</integer>
                </dict>
            </dict>
        </array>
        <key>$top</key>
        <dict>
        <key>root</key>
        <dict>
            <key>CF$UID</key>
            <integer>2</integer>
        </dict>
        </dict>
        <key>$version</key>
        <integer>100000</integer>
    </dict>
</plist>

Listing 1: Overview of an NSKeyedArchiver XML property list file.

In Listing 1 we have an example of an NSKeyedArchiver property list (converted to XML for ease of reading).  At the top level, every NSKeyedArchiver file is made up of a dictionary with four keys.

Two of the keys simply provide metadata about the file: the “$archiver” key should always be followed by a string giving the name of the archiver used to create this file, which should obviously always be “NSKeyedArchiver” and the “$version” key should be followed by an integer giving the version of the archiver (100000 appears to be the only valid value).

The other two keys (“$objects” and “$top”) contain the data that has been serialised and its structure.

The “$objects” key is followed by an array containing all the pieces of data involved in the serialisation, but stored flat with little or no structure. Each of these pieces of data can be understood as being enumerated starting from zero. Within this array of objects may be data which contains the structure shown in Listing 2:

<dict>
        <key>CF$UID</key>
        <integer>0</integer>
</dict>

Listing 2: an example of the CF$UID data type.

The CF$UID data type in Listing 2 is a dictionary with a single key (“CF$UID”) which is followed by an integer number (this layout is what you will see when the property list is represented in XML; in the raw binary format the “UID” data type is a separate entity which doesn’t require the dictionary structure).

These data types represent a reference to another entity in the “$objects” array. The number of the CF$UID gives the position of the array. Consider the snippet shown in Listing 3:

    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <dict>
                <key>CF$UID</key>
                <integer>1</integer>
            </dict>
        </dict>
    </array>

Listing 3: an example of a “$objects” array.

Listing 3 shows an “$objects” array containing three pieces of data. Indexing them starting from 0 we have:

  1. A null
  2. A string containing the value “Alex”
  3. A dictionary

The dictionary at index 2 contains a single key: “Name”. The following value is a “CF$UID” data type referencing index 1 in the “$objects” array so we could consider the data to be equivalent to Listing 4:

    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </array>

Listing 4: the “$objects” array from Listing 3 “unpacked”.

This example is very simplistic; in reality the structure revealed by unpacking the object array can be extremely deeply nested with objects containing references to objects containing references to objects…and so on.

The observant among you may be thinking “this seems like a very inefficient way to represent the data”, and for this example you’d certainly be right! However, in most real-life cases the complex data held in these files contains many repeating values which, when arranged this way, only have to be stored once but can be referenced in the “$objects” array multiple times.

The “$top” key is our entry point to the data, so it is the data held at this key that represents the total structure of the object that has been serialised. This key will be followed by a single dictionary which again contains a single key “root”. The “root” key will be followed by a single CF$UID data type which will be a reference the top level object in the “$objects” array.

Returning to the example in Listing 1 the “root” is referencing the object at index 2 in the objects array. So expanding this, our complete data structure is shown in Listing 5:

    <key>$top</key>
    <dict>
        <key>root</key>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </dict>

Listing 5: Expanded “$top” object, showing complete data structure.

A sense of identity

So far we have only seen examples of basic data stored in this structure where the type of data is implicit but in most files you are likely to encounter you will see additional data relating to the type of the objects being stored.

Listing 6 shows an unpacked “$top” object from a “CHATS2.plist” file produced by the iOS application “PingChat”:

    <key>$top</key>
    <dict>
    <key>root</key>
    <dict>
        <key>$class</key>
        <dict>
            <key>$classes</key>
            <array>
                <string>NSMutableDictionary</string>
                <string>NSDictionary</string>
                <string>NSObject</string>
            </array>
            <key>$classname</key>
            <string>NSMutableDictionary</string>
        </dict>
        <key>NS.keys</key>
        <array>
            <string>pingchat</string>
        </array>
        <key>NS.objects</key>
        <array>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSMutableArray</string>
                        <string>NSArray</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSMutableArray</string>
                </dict>
                <key>NS.objects</key>
                <array>
                    <dict>
                        <key>$class</key>
                        <dict>
                            <key>$classes</key>
                            <array>
                                <string>BubbleItem</string>
                                <string>NSObject</string>
                            </array>
                            <key>$classname</key>
                            <string>BubbleItem</string>
                        </dict>
                        <key>state</key>
                        <integer>1</integer>
                        <key>image</key>
                        <string>$null</string>
                        <key>msg</key>
                        <string>Yo</string>
                        <key>author</key>
                        <string>testingtesting</string>
                        <key>time</key>
                        <dict>
                            <key>$class</key>
                            <dict>
                                <key>$classes</key>
                                <array>
                                    <string>NSDate</string>
                                    <string>NSObject</string>
                                </array>
                                <key>$classname</key>
                                <string>NSDate</string>
                            </dict>
                            <key>NS.time</key>
                            <real>307828812.649871</real>
                        </dict>
                    </dict>
                </array>
            </dict>
        </array>
    </dict>
    </dict>

Listing 6: Expanded “$top” object taken from a PingChat “CHATS2.plist” file.

In Listing 6 we can begin to see how complex the serialised data can become (and this is a simpler example). However, if you keep your cool and realise that the data is still well-structured it is possible to parse the data into something more meaningful.

One new data structure we encounter for the first time here is the “$class” structure. “$class” isn’t part of the data itself, but rather information about which type of object has been serialised. This information is obviously important when the program that serialised the data comes to deserialise it, but we can also use it to give us clues about the meaning of the data; consider the snippet in Listing 7:

<dict>
    <key>$class</key>
    <dict>
        <key>$classes</key>
        <array>
            <string>NSMutableArray</string>
            <string>NSArray</string>
            <string>NSObject</string>
        </array>
        <key>$classname</key>
        <string>NSMutableArray</string>
    </dict>
    <key>NS.objects</key>
    <array>
        <dict>
            <key>$class</key>
            <dict>
                <key>$classes</key>
                <array>
                    <string>BubbleItem</string>
                    <string>NSObject</string>
                </array>
                <key>$classname</key>
                <string>BubbleItem</string>
            </dict>
            <key>state</key>
            <integer>1</integer>
            <key>image</key>
            <string>$null</string>
            <key>msg</key>
            <string>Yo</string>
            <key>author</key>
            <string>testingtesting</string>
            <key>time</key>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSDate</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSDate</string>
                </dict>
                <key>NS.time</key>
                <real>307828812.649871</real>
            </dict>
        </dict>
    </array>
</dict>

Listing 7: Snippet of a single object in a PingChat “CHATS2.plist” file.

Let’s take a look at the objects involved here and what the “$class” sections can tell us about the data held. The “$class” structure takes the form of a dictionary containing two keys. The “$classname” section is fairly straightforward; it simply gives us the name of the type of object we’re dealing with.

So in the case of the first “$class” structure encountered, we find that the object is of type “NSMutableArray”, which a quick Google search tells us is a “Modifiable array of objects” – so the data held in this object is going to take the form of an array or list.

The other key in the “$class” structure is “$classes”; this is a little more subtle and requires a little more explanation of one of the key concepts in most object-oriented programming languages: inheritance.

Think back to the explanation of the “Person” object. A “Person” object had the fields: “Name”; “Age”; “Height”; and “Weight” and the functionality to “Speak” and “Wave”. Now imagine that we wanted to create a new type of object: “DigitalForensicAnalyst”. We would want this new type of object to have some specialised functionality: “Image”; “Analyse”, and so on.

However, a “DigitalForensicAnalyst” is a “Person” too – they have a name, they can speak and wave. Now, programmers are stereotyped as being lazy (because they are) so it is unlikely that after spending all that time writing and debugging the code to represent a “Person” that they are going to duplicate all that hard work when it comes to creating a “DigitalForensicAnalyst”.

Instead they would have the “DigitalForensicAnalyst” object inherit the functionality from “Person”; this means that only the new functionality of “Image” and “Analyse” need be created from scratch, all of the other functionality comes free thanks to this inheritance.

Coming back to the “$classes” key, this will be followed by an array containing the names of all of the types that this object inherits from. So in the case of our first “$class” structure we can see that the “NSMutableArray” inherits functionality from both “NSArray” and “NSObject” which may give us further  hints about what data might be held.

So, this “NSMutableArray” is going to contain a collection of other objects, and looking at the rest of this object’s structure we find a key “NS.Objects” which is followed by an array containing just that collection. This array has only one item, another object containing a “$class” definition, so let’s take a look. This time our “$classname” is particularly useful: “BubbleItem” – making reference to the “speech bubbles” displayed on screen; and indeed we find message details (author, message text, timestamp) in the object’s data fields.

There’s got to be an easier way…

NSKeyedArchiver files can contain really key evidence but even a data-sadist like me is going to lose their mind converting all of these files into a usable format by hand; so how can we speed things up?

Well our PIP tool has a “Unpack NSKeyedArchiver” feature which reveals the object structure so that you can use XPaths to parse the file (either by using one of many already in the included XPath library or by writing your own).

Also, if you are a Python fan (if not, why not?) I have updated our ccl_bplist python module, which you can get here, with a “deserialise_NsKeyedArchiver” function to unpack the “$top” object.

I hope you found this blog post useful. As always, if you have any questions or suggestions, leave a comment below or contact the R&D team at research@ccl-forensics.com.

Alex Caithness, Lazy Data-Sadist, CCL-Forensics

Special thanks to Arun Prasannan for assisting the BlogKeeper by rendering the code readable.

The idiosyncrasies of mobile phone network providers

All mobile phone network providers have to, by law, hold call data records for at least one year. Problems arise, however, because they all present the records in very different formats and some of the data can seem irrelevant and confusing, giving rise to the risk of information being misinterpreted.

Call data records?

CDRs are similar to itemised phone bills – but they hold much more information. As well as the dates, times and numbers called, CDRs also include IMEI numbers, cell site data and locations.

What do you mean by idiosyncrasies?

Here’s an example: some networks will record the cell ID (the sector of the phone mast) for phone calls which were made, but didn’t connect – others won’t. And some will give the other party’s cell ID if they’re on the same network. This provides a number of opportunities, but these idiosyncrasies also present significant risks if the person analysing them isn’t aware of all these nuances.

As mentioned in a previous blog, the key is to get the call data records into a workable format. This is a massive data manipulation exercise, compounded by the fact that the networks can record locations differently – they may use the postcode, BNG (British National Grid) or latitude and longitude co-ordinates – plus the other idiosyncrasies mentioned above.

Another major issue is that CDRs from some networks’ CDRs can apparently attribute the cell site data from the person at the other end of the call to the subject phone. An unwary analyst may misinterpret this data as coming from the person receiving the call by inadvertently associating the outgoing data with incoming calls and vice versa. This is something I have even seen in court.

Shown below is an example of part of a call data record for phone 07875 477828 for part of 03/12/2010.

Note that the call at 20:54 appears to show cell site information for the phone number 07772 000987 even though this number was not the target of the call data records. An amateur assessment of the call data records – simply looking at all the cell IDs – might conclude that the target phone 07875 477828 had been in the service area for cell
03010 52339 at this time. Such a conclusion would be totally false.

This cell was many kilometres from a particular location of interest in a criminal investigation in one case. Had this issue not been picked up in court, it could easily have led to a miscarriage of justice.

Call data record

Example call data record

What other oddities are thrown up by CDRs?

There are a couple of fairly common issues that can cause problems for even the most experienced analyst.

Text messages may appear on the CDR that the user isn’t aware of. They do exist – but are likely to have been network messages which the user doesn’t actually see. They are “codes” sent by the network under a number of different circumstances to update software, and user details – amongst others. These texts can also be indicative of the user taking the SIM card and/or battery out of the device.

They are not a particularly common occurrence, but can provide additional useful evidence especially if a suspect has deliberately not used their phone in order to stay off the network. For example: a suspect may put a new SIM card into a device at their home, unaware that the phone is then communicating with the network, and therefore leaving location data on the CDR.

As mobile phones become more complex, and smartphones begin to dominate the market, this also provides a useful opportunity for cell site analysts. Not only are calls, texts and network messages registered on the CDR, so is data packet transmission. Every time the phone connects to the internet, whether browsing the web, social networking or being used by apps, this is also recorded. However, unlike a phone call, which records the start and end cell, this data will show the cell ID for where a user started browsing, but not necessarily where they finished.

It gives rise to another question: how long is a browsing session? Some networks record browsing sessions in blocks of time – for example, the cell ID of a session begun at the start of an hour will be recorded – and that will also include a second session begun at the end of that hour. Both locations may not necessarily be separately recorded. This issue again is shown on the call data above at 22:00 and 23:00.

How can the problems caused by these idiosyncrasies be avoided?

There’s no substitute for experience. Cell site analysts with substantial knowledge of these idiosyncrasies can easily circumvent any problems – by spotting them up front.

In addition, converting and sorting the vast amount of data into a properly formatted call sequence table (CST) makes the data uniform and easy to work with.

The CST means that investigators can filter and delete unwanted data easily and without the risk that something will be missed.

CCL-Forensics has developed a tool which takes raw data from the network providers, and converts it into a consistent, workable form – removing the need for extensive manual manipulation. For more information please feel free to get in touch on 01789 261200.

Most importantly, analysts really need to know the individual networks well in order to understand their various oddities and work around them.