Year of Python (YOP) – Week Seventeen


Hello Reader!

So this week’s piece of code is a continuation of the work I started on the index.dat file in YOP Weeks 4 and 5.  My hope is to eventually have one script that will parse out the entire file, but for now I’m doing it piecemeal.  In the previous posts we tackled the header portion of the index.dat file; this time around we’re going to look at the HASH table.

The HASH table’s job in the index.dat file is similar to that of the allocation files you’ll see on file systems ($Bitmap for NTFS, the FAT in a FAT file system).  It records where the valid record entries are located within the index.dat file.  There can be more than one HASH table within an index.dat file, and each is normally 4096 bytes in size.  As always, Joachim Metz has a detailed writeup on the index.dat file specification here.  Note: if that link doesn’t work, try this one and click on the Documentation link.

So far, all this script does is parse out the HASH table header and then parse out all the records in the HASH table itself.  Each record is eight bytes long, split into two sections of four bytes each: the first four bytes are the Data piece, and the second four bytes are a Record Pointer.
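To make the eight-byte layout concrete, here is a minimal sketch that splits one record into its two four-byte halves with a single struct.unpack call. The helper name split_hash_record and the sample values are hypothetical, chosen only for illustration; the little-endian byte order matches the "<I" format the script uses.

```python
import struct

def split_hash_record(record):
    """Split one 8-byte HASH table record into (data, record_pointer).

    Both halves are read as little-endian unsigned 32-bit integers,
    the same "<I" format used in the script.
    """
    if len(record) != 8:
        raise ValueError("a HASH table record must be exactly 8 bytes")
    data, record_pointer = struct.unpack("<II", record)
    return data, record_pointer

# A made-up record: data 0x12345678, pointer 0x6000 (illustrative values only)
sample = struct.pack("<II", 0x12345678, 0x6000)
print(split_hash_record(sample))  # (305419896, 24576)
```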

The Data portion can vary depending on what type of record it points to.  Put simply, it tells you whether the entry is unused, marked for deletion, or pointing to a valid/active record.  The Record Pointer portion holds the offset within the file where that record is located.
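Since the Record Pointer is a file offset, one way to sanity-check it is to jump to that offset and look at the first four bytes there. This sketch assumes, as Joachim Metz’s format writeup describes, that records begin with a four-character signature such as "URL ", "LEAK" or "REDR"; the function name record_signature and the fake buffer are hypothetical.

```python
def record_signature(file_data, record_pointer):
    """Return the 4-byte signature at record_pointer, or None if it runs past the data."""
    sig = file_data[record_pointer:record_pointer + 4]
    if len(sig) < 4:
        return None
    # Non-ASCII bytes are replaced rather than raising, so bad pointers still print
    return sig.decode("ascii", errors="replace")

# A fake "file": 16 filler bytes, then a record that starts with b"URL "
fake_file = b"\x00" * 16 + b"URL " + b"\x00" * 12
print(repr(record_signature(fake_file, 16)))   # 'URL '
print(repr(record_signature(fake_file, 100)))  # None
```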

So let’s take a look at the code.  The first thing I’m doing is defining two functions.  The first function is for the HASH table “header”:

import struct

def hash_header(parse_header):
    # Signature: should be the ASCII string "HASH"
    ie_hash_header = parse_header[0:4].decode("ascii")
    # Stored as a count of 128-byte blocks; convert it to a length in bytes
    ie_hash_length = struct.unpack("<I", parse_header[4:8])[0] * 128
    # File offset of the next HASH table
    ie_hash_next_table = struct.unpack("<I", parse_header[8:12])[0]
    # Sequence number of this HASH table, starting at zero
    ie_hash_table_no = struct.unpack("<I", parse_header[12:16])[0]
    print("{}\nHash Table Length: {}\nNext Hash Table Offset: {}\nHash Table No: {}\n".format(
        ie_hash_header, ie_hash_length, ie_hash_next_table, ie_hash_table_no))
    return ie_hash_header, ie_hash_length, ie_hash_next_table, ie_hash_table_no

This part parses out the “HASH” signature, the length of the HASH table (the value stored at this offset is a count of 128-byte blocks, so we multiply it by 128 to get the length in bytes), the file offset of the NEXT HASH table, and finally the HASH table number, which starts at zero.
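The four header fields can also be read in one step with a precompiled struct.Struct. This is a minimal sketch using the same little-endian layout as hash_header (a 4-byte signature followed by three unsigned 32-bit integers, 16 bytes in total); the name parse_hash_header and the sample header values are hypothetical.

```python
import struct

# signature, block count, next table offset, table number
HASH_HEADER = struct.Struct("<4sIII")

def parse_hash_header(data):
    """Decode the 16-byte HASH table header into a dict."""
    signature, blocks, next_table, table_no = HASH_HEADER.unpack(data[:HASH_HEADER.size])
    return {
        "signature": signature.decode("ascii"),
        "length": blocks * 128,  # the stored value counts 128-byte blocks
        "next_table": next_table,
        "table_no": table_no,
    }

# A made-up header: 32 blocks (4096 bytes), next offset 0, table number 0
header = struct.pack("<4sIII", b"HASH", 32, 0, 0)
print(parse_hash_header(header))
```

Precompiling the format is a small convenience: the layout is written once, and HASH_HEADER.size (16) tells you where the first record begins.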

The second function will parse out the eight bytes of each hash table record:

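def hash_table_records(parse_records):
    # First four bytes: the Data value (unused / deleted / valid record)
    ie_hash_data = struct.unpack("<I", parse_records[0:4])[0]
    # Second four bytes: file offset of the record this entry points to
    ie_hash_record_pointer = struct.unpack("<I", parse_records[4:8])[0]
    print("Hash Data: {}\t\tHash Record Pointer: {}".format(hex(ie_hash_data), ie_hash_record_pointer))
    return ie_hash_data, ie_hash_record_pointer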

This function will be the main workhorse of the script, since it gets called once for every eight-byte record in the table.
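If you are on Python 3.4 or later, struct.iter_unpack can do the eight-bytes-at-a-time stepping for you. A minimal sketch, not the script’s approach: iter_hash_records is a hypothetical name, and because iter_unpack requires the buffer length to be an exact multiple of the record size, any trailing partial record is dropped first.

```python
import struct

def iter_hash_records(table_body):
    """Yield (data, record_pointer) for every complete 8-byte record in table_body."""
    # iter_unpack raises struct.error unless the length is a multiple of 8
    usable = len(table_body) - (len(table_body) % 8)
    for data, record_pointer in struct.iter_unpack("<II", table_body[:usable]):
        yield data, record_pointer

# Two made-up records
body = struct.pack("<IIII", 1, 0x6000, 3, 0x6080)
for data, pointer in iter_hash_records(body):
    print("Hash Data: {}\t\tHash Record Pointer: {}".format(hex(data), pointer))
```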

The last main part of the code we will talk about is the section that opens the index.dat file and then starts to parse the HASH Table portion:

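# index_dat holds the path to the index.dat file (set earlier in the script)
with open(index_dat, "rb") as ie_file:
    ie_hash_parser = ie_file.read()
    # The first HASH table is assumed to start at offset 20480 (0x5000)
    ie_hash_table_offset = 20480
    ie_hash_head = ie_hash_parser[ie_hash_table_offset:ie_hash_table_offset + 16]
    ie_hash_header = hash_header(ie_hash_head)
    # The table length includes the 16-byte header; compute the end once, up front
    ie_hash_table_end = min(ie_hash_table_offset + ie_hash_header[1], len(ie_hash_parser))
    # Records start right after the 16-byte header
    ie_hash_record_start = ie_hash_table_offset + 16
    ie_hash_record_end = ie_hash_record_start + 8
    while ie_hash_record_end <= ie_hash_table_end:
        ie_hash_record = ie_hash_parser[ie_hash_record_start:ie_hash_record_end]
        ie_hash_record_table = hash_table_records(ie_hash_record)
        # Slide the eight-byte window forward to the next record
        ie_hash_record_start = ie_hash_record_end
        ie_hash_record_end += 8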

Again, at this point the code only decodes the first HASH table.  Future versions will parse through all the HASH tables within the index.dat file, as well as further decode the Data portion of each HASH table record.  Right now we’re just displaying the “raw” output.
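Because each HASH table header stores the offset of the next table, the tables form a chain that can be followed from the first one. Here is a minimal sketch of that walk. It assumes a next-table offset of 0 marks the last table (worth checking against Metz’s writeup), and the name walk_hash_tables and the fake buffer layout are hypothetical. It also stops on a bad signature, an out-of-range offset, or an offset it has already visited, so a corrupt file cannot send it into an endless loop.

```python
import struct

def walk_hash_tables(file_data, first_offset):
    """Yield (offset, table_no, length) for each HASH table in the chain."""
    seen = set()
    offset = first_offset
    # Offset 0 is treated as "no next table" (assumption)
    while offset and offset not in seen and offset + 16 <= len(file_data):
        seen.add(offset)
        signature, blocks, next_table, table_no = struct.unpack_from("<4sIII", file_data, offset)
        if signature != b"HASH":
            break
        yield offset, table_no, blocks * 128
        offset = next_table

# Two fake one-block (128-byte) tables at 0x80 and 0x100; the second ends the chain
data = bytearray(0x200)
struct.pack_into("<4sIII", data, 0x80, b"HASH", 1, 0x100, 0)
struct.pack_into("<4sIII", data, 0x100, b"HASH", 1, 0, 1)
print(list(walk_hash_tables(bytes(data), 0x80)))  # [(128, 0, 128), (256, 1, 128)]
```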

Now the key here is the ie_hash_record_start and ie_hash_record_end values.  Since each record is eight bytes long, we have to cycle through the table eight bytes at a time.  Once we have parsed out eight bytes, the ie_hash_record_end value becomes the new ie_hash_record_start value, and a new ie_hash_record_end value is “created” by adding 8 to the old one.  We keep this loop going until we reach the end of the HASH table itself, which is given by the following formula:

First Record Start Byte + (Length of HASH Table – 16)

Why do we subtract 16?  Because the length stored in the header covers the whole table, including the HASH table header, which is 16 bytes long (four 4-byte fields), and the first record starts right after that header.  Note that this end value has to be worked out once, before the loop starts; if it is recomputed from the moving ie_hash_record_start on every pass, the loop condition never becomes false.
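As a worked check of the loop bounds, here is the arithmetic for a typical 4096-byte table at offset 20480. It takes the header to be 16 bytes, since hash_header reads four 4-byte fields and the script starts the first record at 20496; record_offsets and the constant names are hypothetical.

```python
HEADER_SIZE = 16  # four 4-byte fields: signature, length, next offset, table number
RECORD_SIZE = 8   # four bytes of Data plus a four-byte Record Pointer

def record_offsets(table_offset, table_length):
    """Return the file offset of every record in one HASH table."""
    first_record = table_offset + HEADER_SIZE
    # Same as table_offset + table_length
    table_end = first_record + (table_length - HEADER_SIZE)
    # The last record must finish at or before table_end
    return list(range(first_record, table_end - RECORD_SIZE + 1, RECORD_SIZE))

offsets = record_offsets(20480, 4096)
print(len(offsets), offsets[0], offsets[-1])  # 510 20496 24568
```

So a standard 4096-byte table holds (4096 − 16) / 8 = 510 records, the last of which ends exactly at byte 24576.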

Until next time!

