Dive project



What is Dive?

The Dive project is a media content finder that lets you search for data on various storage devices without requiring access to the devices that contain the data.

This project has officially been discontinued. Details can be found here.


Where the project started and where we are now — the story so far.

In the late 1990s I had many compact discs containing tools, projects and backup data. Finding data often took some time: I had to insert the disc that I thought contained the files I needed, and when I picked the wrong one, I had to remove it from the drive and try another.

Due to this, I wrote CDCF (Compact Disc Content Finder), a very basic but quite fast interactive command line tool for MS-DOS. The tool consisted of two simple components, a content file builder which stored the content information of a disc into a file and a content finder which allowed searching the created content files for a given search term.

A few days later, a buddy saw me using this tool and was very interested in it, because he also had quite a few data discs. At that time CDCF was still at an alpha stage, so I had to revise it a little to make it more user-friendly. I gave him the revised copy of the tool, and after a few weeks quite a lot of my buddies were using it.

In 2004 some of them asked me if I could rewrite CDCF as a Windows application with a graphical user interface. I did not have much spare time back then, so I created CDCFWin, a temporary solution that simply had the requested interface but no new features at all.

Three years later CDCFWin was replaced by DataInventory, a completely new project that did not contain any source code from its predecessors. It also had a new easy-to-use interface as well as more features like excluding data when creating content files or using wildcards when searching for some content.

The DataInventory project worked fine, but I did not like that it was slower than both of its predecessors and that it ran only on Windows operating systems. So I decided to redevelop the tool from scratch once again (also without any code from its predecessors), with the primary goal of making it faster and platform independent.

The new project also needed a name. A buddy suggested calling it MILF (Media I'd Like to Find) just for fun, but because of the ambiguity of that abbreviation, I did not want to use that name. I then decided to call it Dive (Data Inventory with Various Enhancements), because it provides the features of its predecessor as well as some new ones.


Further information about the project.

The Dive project is written in Python and is platform independent.

Even though some of its predecessors came with a graphical user interface, Dive does not have one yet.

As already mentioned, Dive is basically a media content finder that lets you search for data on storage media that are currently not inserted or connected.

The project consists of two components, a content file builder and a content finder. Both are portable, which means that you can copy the created content files onto e. g. a memory stick along with the Dive project files and use them from various systems. Details about the components can be found below.

Content files

Before Dive can be used to find any data, it needs to read out the content information from the media whose contents should be searchable and write that information into content files, which (for now) simply are unencrypted plain text files.

These content files can be loosely compared to image files, with the difference that image files contain the actual data instead of just content information such as file and directory names.

They can be created from all kinds of storage media (such as hard disks, flash drives, memory cards, network drives, floppy disks, data CDs, data DVDs, etc.) as long as the medium that contains the data is supported and readable by the system.

As soon as the content file has been created, the medium is no longer required to locate data on it, but it is still required to access the data, of course. However, if you afterwards change any data on that medium (e. g. by adding or deleting files), you have to create a new content file, because it will not be updated automatically.
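
As a rough illustration of what such a plain-text content file could look like, the following sketch walks a directory tree and writes one path per line. It is not taken from the Dive source code: the function name build_content_file, the one-path-per-line layout and the UTF-8 encoding are assumptions made for this example, and it targets Python 3.

```python
import os


def build_content_file(source_dir, content_file):
    """Write one line per directory and file found below source_dir.

    Hypothetical helper for illustration only, not Dive's actual code.
    """
    entries = []
    for root, dirs, files in os.walk(source_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(dirs + files):
            entries.append(os.path.join(root, name))

    # Only names are stored, never file data, so the result stays small.
    with open(content_file, "w", encoding="utf-8", errors="replace") as handle:
        for entry in entries:
            handle.write(entry + "\n")
    return entries
```

Called as build_content_file("/media/cdrom", "disc-07.txt") (both paths are placeholders), this would record the names found on the disc, which can then be put away until the data itself is needed.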

Deep Dive feature

This feature additionally stores the content information of certain archive file types, provided that the archives are not password protected.

Supported archive file types are:

  • ACE (requires the UnACE tool, see the requirements section below for further information)
  • RAR (requires the UnRAR tool, see the requirements section below for further information)
  • TAR (supported natively, also if Bzip2 or Gzip compressed)
  • ZIP (supported natively).
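
For the two natively supported formats, reading the member names is possible with Python's standard library alone. The sketch below shows one way this might be done with the tarfile and zipfile modules. The function name list_archive_members is an assumption for this example, and the code is not Dive's actual implementation.

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return the member names of a TAR or ZIP archive, or None otherwise.

    Hypothetical helper for illustration only.
    """
    if tarfile.is_tarfile(path):
        # Mode "r:*" lets tarfile detect Gzip or Bzip2 compression itself.
        with tarfile.open(path, "r:*") as archive:
            return archive.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    return None
```

Only the archive's table of contents is read here; no member is extracted to disk.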

Future plans

The following project features were planned, but never implemented.

  • Additional database support to write the content information into a database instead of files.
  • Additional content information in general (e. g. file size as well as create, access and modify date).
  • Additional content information depending on the file type (e. g. ID3 tags from MP3 files).
  • Feature to automatically update existing content files if the content on a medium has changed.
  • Optional encryption of content files.
  • Platform independent graphical user interface (based on Qt, using PySide).
  • Support for Bzip2 and Gzip compressed files (compressed TAR archives already are supported).


What the project consists of and what each component is for.

Dive Content File Builder

The content file builder stores the content information of a directory or medium into a content file.

Content information can be excluded from the file by using an exclude pattern which supports wildcards as well as regular expression syntax.
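
The sketch below shows how such an exclude pattern might be evaluated, based on the behaviour described in the help output further down (asterisk wildcards, semicolon-separated patterns, case-insensitive unless requested otherwise, optional regular expression syntax). The function names and the choice to match wildcard patterns against the whole path are assumptions made for this example, not details confirmed by the Dive source code.

```python
import re


def compile_exclude(pattern, use_regex=False, case_sensitive=False):
    """Turn an exclude pattern into a list of compiled regular expressions."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if use_regex:
        # In regex mode the pattern is used as given, without splitting.
        return [re.compile(pattern, flags)]
    compiled = []
    for part in pattern.split(";"):
        if part:
            # Escape everything, then let each asterisk match any text.
            body = ".*".join(re.escape(piece) for piece in part.split("*"))
            compiled.append(re.compile("^" + body + "$", flags))
    return compiled


def is_excluded(path, compiled):
    """Return True if any of the compiled patterns matches the path."""
    return any(regex.search(path) for regex in compiled)
```

With these helpers, compile_exclude("*.tmp;*/cache/*") would exclude temporary files and everything below a directory named cache.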

Dive Content Finder

The content finder searches the content files for a given search term.

It also allows using a search pattern which supports wildcards as well as regular expression syntax.
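
A search over several content files could be sketched as follows. The example assumes one path per line in each content file, a case-insensitive substring match for plain search terms and Python 3. The function name find_in_content_files is made up for this illustration, and the real finder may behave differently.

```python
import re


def find_in_content_files(content_files, term, use_regex=False):
    """Yield (content_file, entry) pairs for every entry matching term."""
    if use_regex:
        matcher = re.compile(term, re.IGNORECASE)
    else:
        # Treat "*" as a wildcard and everything else as literal text.
        literal_parts = (re.escape(piece) for piece in term.split("*"))
        matcher = re.compile(".*".join(literal_parts), re.IGNORECASE)

    for content_file in content_files:
        with open(content_file, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                entry = line.rstrip("\n")
                if matcher.search(entry):
                    yield content_file, entry
```

Because each hit is reported together with the content file it came from, the result tells you which medium to fetch.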


Stuff that is required to get Dive running.


Software requirements in general:

  • Python framework (either of them, further information can be found here)
    • Python 2.x (version 2.7 or higher is recommended, may also work with earlier versions)
    • Python 3.x (version 3.2 or higher is recommended, may also work with earlier versions)

Deep Dive feature

Software requirements for the Deep Dive feature:

  • Python framework (either of them, further information can be found here)
    • Python 2.x (version 2.7 or higher, does not work with earlier versions)
    • Python 3.x (version 3.2 or higher, may also work with earlier versions)

Furthermore, depending on the archive file types from which you want to read out any content information, some additional archive extraction tools are required. These are available for various platforms and can be downloaded for free.

The following tools are supported:

  • UnACE (version 2.50 or higher is recommended, may also work with earlier versions)
  • UnRAR (version 4.2.3 or higher is recommended, may also work with earlier versions)


How to use Dive.

The following usage example shows how to execute the Python script on the shell of a Unix-like system.

If you do not know how to run Python scripts on your operating system, you may click here.

The project also comes with some help files which contain fundamental documentation as well as usage examples for each component of the project.

Usually, each script requires command-line arguments to operate. So, to get an overview of all arguments available, simply run the script with the --help argument. For example:

$ ./dive-builder.py --help
usage: dive-builder.py -d DIR_DESTINATION -f CONTENT_FILE -s DIR_SOURCE [-c]
                       [-e PATTERN_EXCLUDE] [-h] [-i]
                       [--include-ace BIN_UNACE] [--include-rar BIN_UNRAR]
                       [--include-tar] [--include-zip] [--regex]
                       [-r REPLACE_STRING] [--version]

Create a content file from a directory or media.

required arguments:
  -d DIR_DESTINATION, --destination-directory DIR_DESTINATION
                        destination directory (where to create the content
                        file in)
  -f CONTENT_FILE, --content-file CONTENT_FILE
                        name of the content file to create
  -s DIR_SOURCE, --source-directory DIR_SOURCE
                        source directory (from which to gather the contents)

optional arguments:
  -c, --case-sensitive  do not ignore the case of the given exclude pattern
  -e PATTERN_EXCLUDE    pattern to exclude certain files or directories from
                        the content file (case-insensitive, multiple patterns
                        separated via semicolon)
  -h, --help            print this help message and exit
  -i, --ignore-read-errors
                        ignore read errors while gathering content
  --include-ace BIN_UNACE
                        include the content from ACE archive files (requires
                        'unace' binary)
  --include-rar BIN_UNRAR
                        include the content from RAR archive files (requires
                        'unrar' binary)
  --include-tar         include the content from TAR archive files (also
                        supports Bzip2 and Gzip compressed TAR archives)
  --include-zip         include the content from ZIP archive files
  --regex               use regex syntax for the search term instead of just
                        asterisk wildcards and semicolon separators (for
                        details see the section "Regular expression
                        operations" inside the Python documentation)
  -r REPLACE_STRING, --replace-source-directory REPLACE_STRING
                        replace the source directory path with a user-defined
                        string inside the content file
  --version             print the version number and exit

Further information and usage examples can be found inside the documentation
file for this script.
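
The -r option listed above replaces the source directory path inside the content file with a user-defined string, for example a disc label instead of a mount point. A minimal sketch of that idea might look like this. The function name and the exact matching rule are assumptions made for this example and not taken from the script itself.

```python
import os


def replace_source_directory(path, source_dir, replace_string):
    """Swap the leading source directory of path for replace_string."""
    prefix = source_dir.rstrip(os.sep)
    # Only replace whole leading path components, never partial names.
    if path == prefix or path.startswith(prefix + os.sep):
        return replace_string + path[len(prefix):]
    return path
```

An entry gathered from a mount point such as /media/cdrom would then show up under a label such as Disc-07 (both values are placeholders), which stays meaningful after the disc has been removed.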

Current state

What is the state of development?

The Dive project has definitely been discontinued.

The main reason for this decision is that compact discs containing data and backups have largely been replaced by flash media and external hard disk drives with plenty of space, so Dive has lost most of its importance.

Furthermore, the project is too immature by today's standards. For example, as in all its predecessors, the information is stored in plain, unencrypted text files instead of a database.

However, it may still be useful for people with modest requirements who use e. g. multiple flash media or hard disk drives.

The latest release is still available and can be downloaded below. Feel free to fork.


Download the project.

Below you can download the latest version of the project for different Python frameworks. Further information about these frameworks can be found here.



File size: < 30 KB
License: MIT License

Python 2.x: tgz | zip
Python 3.x: tgz | zip