(If you’re dying to get stuck in and are only after the links to the Python scripts, they can be found at the bottom of the post!)
After every update to iOS I like to take a file system dump of one of our test iDevices and have a poke around to see what’s changed and what’s new. Recently, on one of my excursions around the iOS file system, I came across something that looked promising that I hadn’t dug into before: a bunch of files with the “.asl” extension which were located on the data partition in “log/DiagnosticMessages”. There were lots of them too –each with a file name referring to a particular date – they went back months!
“Loads of lovely log files!” I thought to myself as I excitedly dropped one of the files into my current text editor of choice (Notepad++ if you’re interested) only to be disappointed by what was clearly a binary file format.
So I headed over to Google and entered some hopeful sounding search queries and came across a very useful blog post (http://crucialsecurityblog.harris.com/2011/06/22/the-apple-system-log-%E2%80%93-part-1/) which described the role of ASL files on OSX and listed some ways for accessing the logs from within OSX, but I was interested in gaining a better understanding of the file format (besides, the nearest Mac to me was on a different floor!).
A little more digging revealed that the code that governed the ASL logging, and the files it generated were part of the Open Source section of OSX, as a result I was able to view the code that was actually responsible for creating the file – my luck was looking up!
The two files I was particularly interested in were “asl.h” (most recent version at time of posting: http://opensource.apple.com/source/Libc/Libc-763.13/include/asl.h) and “asl_file.h” (most recent version at time of posting: http://opensource.apple.com/source/Libc/Libc-763.13/gen/asl_file.h). C header files are great; basically, their purpose is to define the data structures that are subsequently used in the functional code, so when it comes to understanding file formats, quite often they’ll tell you all you need to know without having to try and follow the flow of the actual program. Better yet, these files were pretty well commented. I know that not everyone reading this is going to want to read through a bunch of C code, so I’ll summarise the file format below (all numeric data is big endian):
First, the file Header:
|0||12||String||“ASL DB” followed by 6 bytes of 0x00|
|12||4||32bit Integer||File version (current version is: 2)|
|16||8||64bit Integer||File offset for the first record in the file|
|24||8||64bit Integer||Unix seconds timestamp, appears to be a file creation time|
|32||4||32bit Integer||String cache size (not 100% sure what this refers to, may be maximum size for string entries in the records)|
|36||8||64bit Integer||File offset for the last record in the file|
|44||36||Padding||Should all be 0x00 bytes|
So nothing too ominous there, although all of those pad-bytes at the end of the header suggest redundancy in the file spec in case apple ever fancy changing something. Indeed the fact that the header tells us that we’re on version 2 of the file format suggests that this has already happened.
The records in the file are arranged in a “doubly linked list”, that is, that every record in the file contains a reference (ie. the file offset of) the next and previous records. From a high level, the records themselves are made up of a fixed length data section, followed by a variable length section which allows the storage of additional data in a key-value type structure, finally followed by the offset of the previous record. The table below explains the structure in detail.
NB: The string storage mechanism the records use is a little bit…interesting – I’ll explain in detail later in this post, but for now if you see a reference to an “ASL String”, I mean one of these “interesting” strings!
|2||4||32bit Integer||Length of this record (excluding this and the previous field)|
|6||8||64bit Integer||File offset for next record|
|14||8||64bit Integer||Numeric ID for this record|
|22||8||64bit Integer||Record timestamp (as a Unix seconds timestamp)|
|30||4||32bit Integer||Additional nanoseconds for timestamp|
|34||2||16bit Integer||Level (see below)|
|38||4||32bit Integer||Process ID that sent the log message|
|42||4||32bit Integer||UID that sent the log message|
|46||4||32bit Integer||GID that sent the log message|
|50||4||32bit Integer||User read access|
|54||4||32bit Integer||Group read access|
|58||4||32bit Integer||Reference PID (for processes under the control of launchd)|
|62||4||32bit Integer||Key-Value count: The total number of keys and values in the key-value storage of the record|
|66||8||ASL String||Host that the sender belongs to (usually the name of the device)|
|74||8||ASL String||Name of the sender (process) which send the log message|
|82||8||ASL String||The sender’s facility|
|90||8||ASL String||Log Message|
|98||8||ASL String||The name of the reference process (for processes under control of launchd)|
|106||8||ASL String||The session of the sender (set by launchd)|
|114||8 * Key-Value count||ASL String[Key-Value count]||The key-value storage: A key followed by a value, followed by a key followed by a value… and so on. All keys and values are strings|
The level field mentioned above will have a numerical value which refers to the levels shown below:
As mentioned, the “ASL String” data type is a little odd. The ASL fields above take up 8 bytes, if the most significant bit in the 8 bytes is set (ie is 1), the rest of the most significant byte gives the length of the string, which occupies the remaining 7 bytes (unused bytes are set to 0x00). Conversely, if the top bit in the ASL String data type is not set (ie. Is 0) the entire 8 bytes should be interpreted as a 64bit Integer which gives the file offset where the string can be found. The string will be stored thusly:
|0||2||Padding||Padding bytes 0x00 0x01|
|2||4||32bit Integer||String length|
|6||String length||UTF8 String (nul-terminated)||The string data|
In order to get a better grip of what can be held in these files I decided to create a Python module to read these files and used it to dump out the contents of the ASL files I found on the iPhone.
The first thing that struck me after running the script was the volume of messages: 16161 log messages spanning 10 months – and this was on a test handset which had lay idle for weeks at a time. The second thing was the prevalence of messages sent by the “powerd” daemon, over 87% of the messages had been sent by this process. The vast majority of these messages related to the device waking and sleeping – not through user interaction, but while the device was idle. Most of these “Wake” events occurred 2-5 minute apart, presumably to allow brief data connectivity to receive updates and push messages from apps.
The key thing that interested me about these messages was that they also noted the current battery-charge percentage in their text: this is the sort of data that just begs to be graphed, so I knocked up a little script which utilised the parsing module I had just written to extract just this data and present it in a graph-friendly manner.
After graphing it (you want to use a scatter graph in Excel for this, not line as I discovered after some shouting at my screen) you are left with a graph which gives you some insight into the device’s use.
The graph above shows around 3 weeks of battery usage data from the test handset. As noted previously, this test device would lay idle for days at a time (as suggested by the gentle downward gradients) but there were periods when the handset was in use, as shown by the steeper downward gradients on the 26th and 27th of April, which mostly took place within office hours. You can also clear see the points where the device was plugged in to be charged, suggested by the very steep upward gradients. As noted the power messages occur around ever 2-5 minutes, so the resolution is actually fairly good. The exception to this is while the device is plugged in as it no longer needs to sleep to preserve battery charge; typically I only saw an event when charging began and another when the device was unplugged and the battery began to discharge again.
There are a few other messages in the iOS ASL log that look interesting, but at this time I don’t have enough nice control data to make much of them. One thing that did hearten me somewhat was the fact that on the few extractions I’ve had the opportunity to take a look at from later revisions of iOS 5, there did seem to be some extra processes that were logging messages, so it’s my hope that we’ll see more and more useful data make its way into the ASL logs on iOS.
In addition to looking at iOS ASL files, I thought I’d take a look at some from an OSX installation. Pulling the logs from the “var/log/asl” on Lion (10.7.3) and running the parsing script across the whole directory brought back a far more varied selection of messages.
The number of records returned was actually far less than on iOS, partially due to the iOS “powerd” being so chatty, but more crucially because OSX tidies up its logs on a weekly basis. That’s not to say that you will only recover a week’s worth of logs though – on this test machine I recovered logs spanning 7 months. Rather, OSX has short-term log files (those with file names which begin with a timestamp) which have a shelf-life of a week and long term log files (those with file names which begin with “bb” followed by a timestamp). The “bb” in the long term log’s file name presumably stands for “best before” and the date, which is always in the future, is the date that the file should be cleared out. The short term log files tend to hold more “intimate” entries, often debug messages sent from 3rd party applications; the long term logs err more on the side of system messages. One particularly useful set of messages in the long term log are records pertaining to booting, shutting down, logins and logouts (hibernating, waking and failed logins are recorded too, but they end up in the short-term logs).
(As an aside: one of my favourite things that I discovered when looking through these logs was the action of waking a laptop running OSX by wiggling a finger on the trackpad is recorded in the logs as a “HID Tickle”. Lovely.)
Like I did with the iOS power profiling, I put together a script which extracted these login and power records and timelines them.
A couple of things worth noting beyond the basic boot/shutdown/login records: firstly when the device wakes it records why it happened – this can be quite specific: a USB device, the lid of a laptop being opened, the power button being pressed, etc. Secondly, you can see terminal windows (tty) being opened and closed as opening a terminal window involves logging in to a terminal session (OSX does this transparently, but it’s still logged).
We’ve released the scripts mentioned in this post to the community and they can be downloaded from https://code.google.com/p/ccl-asl/. The “ccl_asl” script is both a command line utility for dumping the contents of ASL files as well as a fully featured class module which you can use to write scripts along the lines of the battery profiler and login timeline scripts.
ASL files are, on the one hand, fairly dry system logs, but on the other, with a little work you can harvest some really insightful behavioural intelligence. As always if you have any questions, comments or suggestions you can contact us on firstname.lastname@example.org or leave a comment below.