The Dive project is a media content finder that lets you search for data on various storage devices without requiring the devices that contain the data to be connected.
This project has officially been discontinued. Details can be found here.
In the late 1990s I had many compact discs containing tools,
projects and backup data. Finding data often took some time,
because I had to insert the disc I thought contained the files
I needed. Sometimes I inserted the wrong disc, so I had to remove
it from the drive again and look for the data on another one.
Due to this, I wrote CDCF (Compact Disc Content Finder), a very basic but quite fast interactive command line tool for MS-DOS. The tool consisted of two simple components, a content file builder which stored the content information of a disc into a file and a content finder which allowed searching the created content files for a given search term.
A few days later, a buddy saw me using this tool and was very interested in it, because he also had quite a few data discs. At that time CDCF was still at an alpha stage, so I had to revise it a little to make it more user-friendly. I gave him the revised copy of the tool, and within a few weeks many of my buddies were using it.
In 2004 some of them asked me if I could rewrite CDCF
as a Windows application with a graphical user
interface. I did not have much spare time back then, so I created
CDCFWin, a temporary solution that simply had the
requested interface but no new features at all.
Three years later CDCFWin was replaced by DataInventory, a completely new project that did not contain any source code from its predecessors. It also had a new easy-to-use interface as well as more features like excluding data when creating content files or using wildcards when searching for some content.
The DataInventory project worked fine, but I did not like
that it was slower than both of its predecessors and that
it only ran on Windows operating systems. So I decided to
completely redevelop the tool once again (also without any code
from its predecessors), with the primary goal of making it faster
as well as platform independent.
The new project also needed a name. A buddy suggested calling it MILF (Media I'd Like to Find) just for fun, but due to the ambiguity of that abbreviation, I did not want to use that name. Then I decided to call it Dive (Data Inventory with Various Enhancements), because it provides the features of its predecessor as well as some new ones.
The Dive project is written in Python and is platform independent.
Even though some of its predecessors came with a graphical user interface, it does not have one yet.
As already mentioned, Dive is basically a media content finder that lets you search for data on storage media that are currently not inserted or connected.
The project consists of two components, a content file builder and a content finder. These are portable, which means that you can copy the created content files (e. g. onto a memory stick) along with the Dive project files to use them from various systems. Details about the components can be found below.
Before Dive can be used to find any data, it needs to read out the content information from the media whose contents should be searchable and write that information into content files, which (for now) simply are unencrypted plain text files.
These content files can be loosely compared to image
files, with the difference that image files contain the actual
data instead of just content information such as file and
directory names.
They can be created from all kinds of storage media (such as hard disks, flash drives, memory cards, network drives, floppy disks, data CDs, data DVDs, etc.) as long as the media that contains the data is supported and readable by the system.
As soon as the content file has been created, the medium is no longer required to locate data on it, but it is still required to access the data, of course. However, if you afterwards change any data on that medium (e. g. by adding or deleting files), you have to create a new content file, because it will not be updated automatically.
The Deep Dive feature additionally stores the content information of certain archive file types, but only if these are not password protected.
Supported archive file types are ACE, RAR, TAR (including Bzip2 and Gzip compressed TAR archives) and ZIP.
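For ZIP and TAR archives, member names can be read with Python's standard library alone. The following is only a minimal sketch of that idea, not Dive's actual code; the function name `list_archive_members` is hypothetical, and ACE and RAR archives (which need external tools) are deliberately left out.

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return the member names of a ZIP or TAR archive, or [] otherwise."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        # Mode "r" lets tarfile detect Gzip or Bzip2 compression itself.
        with tarfile.open(path, "r") as archive:
            return archive.getnames()
    # Anything else (including ACE and RAR) is not handled in this sketch.
    return []
```

A builder could call such a function for every file it encounters and append the returned names to the content file next to the archive's own path.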
The following project features were planned, but never implemented.
Additional database support to write the content information into a database instead of files.
Additional content information in general (e. g. file size as well as create, access and modify date).
Additional content information depending on the file type (e. g. tags from MP3 files).
Feature to automatically update existing content files if the content on a medium has changed.
Optional encryption of content files.
The content file builder stores the content information of a directory or medium into a content file.
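To illustrate the idea, here is a minimal sketch of such a builder. It is not the project's real implementation: the function name `build_content_file`, the one-path-per-line file layout and the handling of exclude patterns are all assumptions made for this example.

```python
import fnmatch
import os


def build_content_file(source_dir, content_file, exclude=""):
    """Write the path of every entry below source_dir, one per line."""
    # Assumed pattern syntax: asterisk wildcards separated by semicolons.
    patterns = [p.lower() for p in exclude.split(";") if p]
    count = 0
    with open(content_file, "w") as out:
        for root, dirs, files in os.walk(source_dir):
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                # Case-insensitive comparison against each exclude pattern.
                if any(fnmatch.fnmatchcase(path.lower(), p) for p in patterns):
                    continue
                out.write(path + "\n")
                count += 1
    return count
```

Note that this sketch still descends into a directory whose own path matches an exclude pattern; only entries that match a pattern themselves are left out. It also inherits the extra `?` and `[...]` wildcards from Python's `fnmatch` module.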
The content finder searches the content files for a given search term.
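A finder of this kind can be sketched in a few lines. The example below is hypothetical and not taken from the project: it assumes content files with one path per line, treats the search term as a case-insensitive substring and supports asterisk wildcards via Python's `fnmatch` module. The name `find_in_content_files` is made up for illustration.

```python
import fnmatch


def find_in_content_files(content_files, term):
    """Yield (content_file, entry) for every stored path matching term."""
    # Surrounding asterisks turn the term into a substring match.
    pattern = "*" + term.lower() + "*"
    for content_file in content_files:
        with open(content_file) as handle:
            for line in handle:
                entry = line.rstrip("\n")
                if fnmatch.fnmatchcase(entry.lower(), pattern):
                    yield content_file, entry
```

Because each hit is reported together with the name of its content file, the result tells you which medium to insert or connect to get at the data.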
Software requirements in general:
Python framework (either of them, further information can be found here)
Python 2.x (version 2.7 or higher is recommended, may also work with earlier versions)
Python 3.x (version 3.2 or higher is recommended, may also work with earlier versions)
Software requirements for the Deep Dive feature:
Python framework (either of them, further information can be found here)
Python 2.x (version 2.7 or higher, does not work with earlier versions)
Python 3.x (version 3.2 or higher, may also work with earlier versions)
Furthermore, depending on the archive file types from which you want to read out any content information, some additional archive extraction tools are required. These are available for various platforms and can be downloaded for free.
The following tools are supported:
UnACE (version 2.50 or higher is recommended, may also work with earlier versions)
UnRAR (version 4.2.3 or higher is recommended, may also work with earlier versions)
The following usage example shows how to execute the Python script on the shell of a Unix-like system.
If you do not know how to run Python scripts on your operating system, you may click here.
The project also comes with some help files which contain fundamental documentation as well as usage examples for each component of the project.
Usually, each script requires command-line arguments to operate. So, to get an overview of all arguments available, simply run the script with the --help argument. For example:
$ ./dive-builder.py --help
usage: dive-builder.py -d DIR_DESTINATION -f CONTENT_FILE -s DIR_SOURCE [-c]
                       [-e PATTERN_EXCLUDE] [-h] [-i]
                       [--include-ace BIN_UNACE] [--include-rar BIN_UNRAR]
                       [--include-tar] [--include-zip] [--regex]
                       [-r REPLACE_STRING] [--version]

Create a content file from a directory or media.

  -d DIR_DESTINATION, --destination-directory DIR_DESTINATION
                        destination directory (where to create the content
  -f CONTENT_FILE, --content-file CONTENT_FILE
                        name of the content file to create
  -s DIR_SOURCE, --source-directory DIR_SOURCE
                        source directory (from which to gather the contents)
  -c, --case-sensitive  do not ignore the case of the given exclude pattern
  -e PATTERN_EXCLUDE, --exclude PATTERN_EXCLUDE
                        pattern to exclude certain files or directories from
                        the content file (case-insensitive, multiple patterns
                        separated via semicolon)
  -h, --help            print this help message and exit
  -i
                        ignore read errors while gathering content
  --include-ace BIN_UNACE
                        include the content from ACE archive files (requires
  --include-rar BIN_UNRAR
                        include the content from RAR archive files (requires
  --include-tar         include the content from TAR archive files (also
                        supports Bzip2 and Gzip compressed TAR archives)
  --include-zip         include the content from ZIP archive files
  --regex               use regex syntax for the search term instead of just
                        asterisk wildcards and semicolon separators (for
                        details see the section "Regular expression
                        operations" inside the Python documentation)
  -r REPLACE_STRING, --replace-source-directory REPLACE_STRING
                        replace the source directory path with a user-defined
                        string inside the content file
  --version             print the version number and exit

Further information and usage examples can be found inside the documentation
file for this script.
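The help output above resembles what Python's `argparse` module prints. Assuming that module is used (the source does not say so), a subset of the options could be declared roughly as follows. The function name `make_parser` and the `dest` names are hypothetical, and only some of the options are covered.

```python
import argparse


def make_parser():
    """Build a parser for a subset of the options shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="dive-builder.py",
        description="Create a content file from a directory or media.")
    # The three path options are mandatory, hence required=True.
    parser.add_argument("-d", "--destination-directory",
                        dest="dir_destination", metavar="DIR_DESTINATION",
                        required=True, help="destination directory")
    parser.add_argument("-f", "--content-file",
                        dest="content_file", metavar="CONTENT_FILE",
                        required=True,
                        help="name of the content file to create")
    parser.add_argument("-s", "--source-directory",
                        dest="dir_source", metavar="DIR_SOURCE",
                        required=True, help="source directory")
    parser.add_argument("-c", "--case-sensitive", action="store_true",
                        help="do not ignore the case of the exclude pattern")
    parser.add_argument("-e", "--exclude",
                        dest="pattern_exclude", metavar="PATTERN_EXCLUDE",
                        default="",
                        help="semicolon-separated exclude patterns")
    parser.add_argument("--include-zip", action="store_true",
                        help="include the content from ZIP archive files")
    return parser
```

Calling `make_parser().parse_args(["-d", ".", "-f", "disc1", "-s", "/media/cdrom"])` would then return a namespace with the three paths set and all optional switches at their defaults.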
The Dive project has definitely been discontinued.
The main reason for this decision was the fact that today, compact discs containing data and backups have been largely replaced by flash media and external hard disk drives with plenty of space, so Dive has lost most of its importance.
Furthermore, the project is too immature by today's standards. For example, as in all its predecessors, the information is stored in plain, unencrypted text files instead of a database.
However, it may still be useful for people with modest requirements who use, for example, multiple flash media or hard disk drives.
The latest release is still available and can be downloaded below. Feel free to fork.
Below you can download the latest version of the project for different Python frameworks. Further information about these frameworks can be found here.
|File size:||< 30 KB|
|Framework:||Python 2.x||Python 3.x|
|Download:||tgz | zip||tgz | zip|