History Data
Version 2.4
Copyright © 2000-2002
This white paper describes a methodology for organizing, annotating, and protecting your digital photos and audio/video files to maximize their value for reliving memories and reconstructing past events.
Do you enjoy reliving memories and sharing them with others? Have you ever taken pictures, shot video, or made audio recordings? Do you own a computer? If so, History Data is for you.
History Data is a methodology for managing digital recordings of real-world events. It’s a way to organize, annotate, and protect digital photos and audio/video files to maximize their value for reliving memories, telling stories, and reconstructing past events. The term “History Data” can also refer to sets of files managed according to the History Data approach.
We all enjoy stories, especially when those stories involve the events and people in our lives. Over the past century, technological advances such as still photography and video have given us tools to make recordings so we can vividly relive our stories through the sights and sounds of our past. And in recent years the latest advances have given us new, digital versions of traditional recording technologies.
Today you can buy still cameras, audio recorders, and video camcorders that let you make new recordings in digital form. Also available are a variety of scanners and audio/video capture devices that let you convert traditional film and tape recordings into digital content. The digital approach offers many advantages over traditional analog technologies including flexible use, ease of sharing, and compact storage (see Why Digital? on Page 24 for more on this topic).
Do you own a digital camera or camcorder? Do you scan pictures or digitize analog audio/video tapes? Do you purchase Photo CDs, or receive pictures from friends via email? Chances are you’re already building a collection of digital content that tells a story: a history of you and the people in your life.
The good news is that digital technology keeps getting better and cheaper, so over time your collection will grow into a rich history with more and more content of better and better quality. The bad news is that as your collection of content grows to hundreds and eventually thousands of files you may find yourself drowning in a flood of disorganized data.
Your content files are the digital equivalent of traditional photo prints, audio cassettes, and video tapes. But what should you do with them? How should you store them? How should you organize them? How can you annotate them to record dates, places, and names? How can you protect them to make sure they don’t become damaged or unusable over time? How will you find one file among thousands?
The History Data approach helps you get the most out of your picture, sound, and video files while staying afloat on the rising tide of digital content. Instead of letting files accumulate haphazardly in the digital equivalent of a shoebox, you can use History Data to:
· Organize content chronologically: Easily locate all files from any given time period. View all types of content from all sources in chronological order, like a documentary of your life.
· Annotate content with descriptive information: Describe people, places, things, equipment, settings, or any other aspect of your content with extensible attributes. Locate and filter files using various combinations of attributes.
· Protect content from damage and incompatibility: Store, access, and share all content and descriptive information without any special software or proprietary file formats. Back up and convert files to protect them from damage and obsolescence.
The History Data approach is so simple you can implement it entirely by hand. This white paper includes all the information you need to get started and turn your digital content into History Data using standard tools included in all modern operating systems. Get the most out of your content now and in the future, share it with any audience on any platform, and stay organized as your collection grows over time into a rich digital history of your life.
From time to time, you may want to visit the History Data web site, where you’ll find updated versions of this document as well as other useful information. If you use Microsoft Windows, you’ll also be able to download various History Data software tools. The tools aren’t necessary for implementing the History Data methodology, but they’re free and they can help automate key aspects of managing your content.
The History Data methodology (from here on referred to simply as “this plan”) includes recommendations for file naming conventions, folder structure, and maintenance of data over time. This plan is designed to help you achieve the following:
This section gives a quick overview of the History Data methodology. Little explanation is given here, since each concept introduced is covered in detail in the rest of this document.
Digital picture and audio/video content is stored in files. Although your collection may include many different file formats, every file has a filename that can be used to describe its contents. Filenames are portable and universally accessible. For these reasons, the metadata used to describe History Data content files is stored right in their filenames. You can use either History Data 1.x filenames (shorter and easier to read) or History Data 2.x filenames (for detailed, extensible annotation):
History Data is different from other kinds of data on your computer, so it’s stored in its own separate folder tree. In addition to grouping all your History Data content into one place, the folder tree helps you organize your content chronologically and lets you quickly locate files from any given time period. The basic idea is to create a folder for each year, and then to create subfolders as needed for months, out-takes, and so on. A sample History Data folder tree might look like this:
C:\History\1999              Files recorded in 1999
C:\History\2000              Files recorded in 2000
C:\History\2000\Out          Out-takes from 2000
C:\History\2001\01~04        January through April 2001
C:\History\2001\01~04\Alt    Alternate versions from January through April 2001
C:\History\2001\01~04\Out    Out-takes from January through April 2001
C:\History\2001\05~07        May through July 2001
C:\History\2001\08~12        August through December 2001
Digital pictures don’t fade, turn yellow, crack, or accumulate spots and fingerprints the way film and prints do. And unlike analog tapes, digital audio/video recordings don’t suffer from wear, drop-outs, wrinkling, and jamming. But digital data faces its own problems, such as incompatibility, obsolescence, accidental corruption, malicious destruction, and failure of storage media. To protect your digital content for the long haul, the History Data methodology recommends that you:
This plan offers you a choice between two ways of naming your files: the History Data 1.x format or the History Data 2.x format. History Data 1.x filenames are shorter and easier to read, whereas History Data 2.x filenames allow for more detailed annotation of your files. Whichever format you choose, you can still follow all the other portions of the History Data approach, because both 1.x and 2.x filenames sort chronologically. However, the two types of filenames will not sort properly if mixed together in the same folders, so you should choose just one format and use it for all your History Data files.
Choose History Data 1.x filenames if:
Choose History Data 2.x filenames if:
History Data 1.x filenames are made up of 2 to 4 attributes: Captured Date, Sequence Number (optional), Comment (optional), and Extension.
2002-01-27 @18-37 #001 Birthday party.jpg
2002-01-27 @18-37 #002 Blowing candles.mpg
2002-01-27 @18-45-56 And many more.wav
2002-02-15 @10 #001.tif
2002-02-15 @10 #002.tif
2002-03-31.jpg
The Captured Date attribute is required and should be entered according to the Date/Time format. Basically, you use the form Year-Month-Day @Hour-Minute-Second, with hyphens separating the components and a space followed by @ (“at sign”) separating the date from the time. Because this date comes at the beginning of your filenames, they will arrange themselves in chronological order when you sort them alphabetically. Captured Date should tell you when the real-world events captured in the file actually happened, which is not necessarily the same as the date your file was digitized or last modified. For example, a 35mm picture shot in 1987 and scanned in 2001 should have a Captured Date in 1987, and a digital camera picture taken in 1999 and edited in 2002 should have a Captured Date in 1999.
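Alphabetical sorting gives chronological order because the components run from largest (Year) to smallest (Second) and each has a fixed number of digits. The following minimal Python sketch builds a Captured Date this way. The function name `captured_date` is hypothetical. It assumes that components are only ever left off from the right (Second first, then Minute, and so on), and it does not handle date ranges.

```python
def captured_date(year, month=None, day=None, hour=None, minute=None, second=None):
    """Format a Captured Date such as '2002-01-27 @18-37'.

    Assumes components are omitted only from the right: you may leave off
    Second, then Minute, Hour, Day, and Month, but not skip one in the middle.
    """
    fields = [month, day, hour, minute, second]
    given = 0
    while given < len(fields) and fields[given] is not None:
        given += 1
    if any(f is not None for f in fields[given:]):
        raise ValueError("components may only be left off from the right")
    # Year is four digits; every other component is two digits with a leading zero.
    date = "-".join([f"{year:04d}"] + [f"{f:02d}" for f in fields[:min(given, 2)]])
    time = "-".join(f"{f:02d}" for f in fields[2:given])
    return f"{date} @{time}" if time else date


print(captured_date(2002, 1, 27, 18, 37))  # 2002-01-27 @18-37
print(captured_date(1987))                 # 1987

# Zero-padded, largest-first dates sort chronologically as plain text.
names = ["2001-05-25 @09-00.jpg", "1999-12-31.jpg", "2001-05-24 @23-59-59.jpg"]
print(sorted(names))
# ['1999-12-31.jpg', '2001-05-24 @23-59-59.jpg', '2001-05-25 @09-00.jpg']
```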
What if you don’t know when the file contents were captured exactly down to the second? You can leave off Second, Minute, Hour, or even Day and Month. You can also use a range such as “between January 1987 and May 1991”. For full details see Date/Time Format on Page 21.
What if you have no idea when the file contents were captured? It’s always possible to come up with some information about when file contents were recorded. In some cases, you’ll have to do some detective work and make an educated guess. For ideas, see Tips for Dating Files on Page 28.
The Sequence Number attribute is normally optional, but you should include Sequence Numbers any time you have two or more files that share the same Captured Date. For example, if you have 3 pictures and you know they were taken on May 25th 2001 but you don’t know what times they were taken, give them a Captured Date of 2001-05-25 and Sequence Numbers of #001, #002, and #003. The Sequence Number attribute ensures that all your files sort chronologically even when some of your files have Captured Dates that aren’t unique.
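Once your files are in the right order, the numbering rule in this paragraph can be applied mechanically. A file whose Captured Date is unique gets no Sequence Number, and each group of files that share a date is numbered 001, 002, and so on. The following minimal Python sketch shows this. The function name and the default three-digit width are assumptions for illustration.

```python
from collections import Counter


def assign_sequence_numbers(dates, width=3):
    """Pair each Captured Date (given in viewing order) with a Sequence Number.

    Dates that occur only once get None; repeated dates are numbered from 1
    in the order they appear, zero-padded to `width` digits.
    """
    totals = Counter(dates)
    used = Counter()
    result = []
    for date in dates:
        if totals[date] > 1:
            used[date] += 1
            result.append((date, f"{used[date]:0{width}d}"))
        else:
            result.append((date, None))
    return result


pairs = assign_sequence_numbers(["2001-05-25", "2001-05-25", "2001-05-25", "2001-05-26"])
print(pairs)
# [('2001-05-25', '001'), ('2001-05-25', '002'), ('2001-05-25', '003'), ('2001-05-26', None)]
```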
When included in a History Data 1.x filename, Sequence Number should be preceded by # (“pound sign”) and should be entered according to the Integer format, which basically means it should have a fixed number of digits with leading zeros as needed. For full details see Integer Format on Page 21.
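The fixed number of digits matters because filenames are compared one character at a time. The short Python illustration below uses file names invented for the example. Without leading zeros, #10 sorts before #9. Padded numbers sort in numeric order.

```python
unpadded = ["2001-05-25 #9.jpg", "2001-05-25 #10.jpg"]
padded = ["2001-05-25 #009.jpg", "2001-05-25 #010.jpg"]

# '1' comes before '9', so the unpadded #10 jumps ahead of #9.
print(sorted(unpadded))  # ['2001-05-25 #10.jpg', '2001-05-25 #9.jpg']
# With a fixed width of three digits, text order matches numeric order.
print(sorted(padded))    # ['2001-05-25 #009.jpg', '2001-05-25 #010.jpg']
```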
How can you tell which file should be first, which one should be second, and so on? Even if you don’t know the date and time of each file precisely, you can usually figure out what order your files should appear in. If your files were digitized from analog media, look at the frame numbers on your film negatives, the lab codes on the back of your photo prints, or the track sequence of your video or audio tapes. If your files come from a digital camera or audio recorder, the original filenames will usually include serial numbers like DSC00319.JPG, DSC00320.JPG, DSC00321.JPG, etc. For ideas on guessing the sequence of files from their contents see Tips for Dating Files on Page 28. If all else fails, put the files in the order you prefer to see them: imagine you’re putting photos in an album or combining audio/video tracks into an edited sequence.
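For digital camera files, the serial numbers in the original filenames can supply the order directly. The following minimal Python sketch assumes two things. First, the serial number is the run of digits at the end of each original filename stem, as in DSC00319.JPG. Second, the camera's counter did not roll over during the set. The function name `camera_order` and the new filenames are hypothetical.

```python
import re


def camera_order(names):
    """Sort camera filenames by the serial number just before the extension."""
    def serial(name):
        match = re.search(r"(\d+)\.[^.]+$", name)
        if match is None:
            raise ValueError(f"no serial number found in {name!r}")
        return int(match.group(1))
    return sorted(names, key=serial)


ordered = camera_order(["DSC00321.JPG", "DSC00319.JPG", "DSC00320.JPG"])
# All three were shot on the same day at unknown times, so number them in order.
new_names = {old: f"2001-05-25 #{i:03d}.jpg" for i, old in enumerate(ordered, start=1)}
for old, new in new_names.items():
    print(old, "->", new)
# DSC00319.JPG -> 2001-05-25 #001.jpg
# DSC00320.JPG -> 2001-05-25 #002.jpg
# DSC00321.JPG -> 2001-05-25 #003.jpg
```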
The Comment attribute is optional. One reason to include a Comment is to describe file contents so that subject matter can be determined without having to open the file. This is especially useful for video and audio files, which are hard to preview. Another reason is to annotate subject matter that can’t easily be identified from the contents, such as the place where the file was captured or people and things in the scene that aren’t easily recognizable.
When included in a History Data 1.x filename, Comment must begin with a letter and is limited to 150 characters in length. Comment should be entered according to the Text format, which basically means you can use any free-form text you like including punctuation except for certain reserved characters. For full details see Text Format on Page 23.
The required Extension attribute can be considered one of the attributes of a History Data 1.x filename, but it’s really just an ordinary Windows filename extension. The Extension attribute should be entered according to the Extension format, which basically means it should be 3 characters long, it should be preceded by a period, and it should identify the format of the file (JPG, TIF, MPG, WAV, etc.). For full details see Extension Format on Page 23.
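Taken together, these rules are regular enough to check with a pattern. The Python sketch below is a simplification under stated assumptions and is not part of the History Data specification. It accepts:

· a Captured Date of Year, optional Month and Day, and an optional @Hour-Minute-Second after a full date
· # followed by digits
· a Comment that starts with a letter and is at most 150 characters
· a three-character extension

It does not cover date ranges or check the reserved characters of the Text format. The name `parse_hd1` is hypothetical.

```python
import re

HD1 = re.compile(r"""
    ^(?P<date>\d{4}                   # Year
        (?:-\d{2}                     # -Month
            (?:-\d{2}                 # -Day
                (?:\ @\d{2}           # @Hour (only after a full date)
                    (?:-\d{2}(?:-\d{2})?)?   # -Minute, -Second
                )?
            )?
        )?
    )
    (?:\ \#(?P<seq>\d+))?             # optional Sequence Number
    (?:\ (?P<comment>[A-Za-z].*?))?   # optional Comment starting with a letter
    \.(?P<ext>[A-Za-z0-9]{3})$        # three-character Extension
""", re.VERBOSE)


def parse_hd1(filename):
    """Split a History Data 1.x filename into its attributes, or return None."""
    match = HD1.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["comment"] is not None and len(parts["comment"]) > 150:
        return None
    return parts


print(parse_hd1("2002-01-27 @18-37 #001 Birthday party.jpg"))
# {'date': '2002-01-27 @18-37', 'seq': '001', 'comment': 'Birthday party', 'ext': 'jpg'}
print(parse_hd1("2002-03-31.jpg"))
# {'date': '2002-03-31', 'seq': None, 'comment': None, 'ext': 'jpg'}
print(parse_hd1("DSC00319.JPG"))  # None
```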
History Data 2.x filenames are made up of one or more attributes followed by a standard Windows filename extension. Each attribute captures a different piece of information about your file, such as date of recording, name of person who appears in the picture, type of camera used, etc. Attributes are enclosed by {curly brackets} and consist of an Attribute Label followed by = (“equal sign”) and an Attribute Value.
History Data 2.x filenames can do everything 1.x filenames can do. For example, the filename in the diagram above contains all the same information as the filename in the diagram at the beginning of the History Data 1.x section. But History Data 2.x filenames can also do a lot more.
History Data 2.x filenames can include as few or as many attributes as you need, limited only by total filename length. And each filename can include a different set of attributes depending on the kinds of information you want to associate with that file. See Attribute Types on Page 19 for a list of suggested attribute types. Following is a set of filenames that use different mixes of attributes:
{Date=2002-01-23}.jpg
{Date=2002-01-24}{Subject=John Doe}.jpg
{Date=2002-01-25}{Place=New York}.jpg
{Date=2002-01-26}{Comment=Birthday party}.jpg
{Date=2002-01-27}{Seq#=001}{Subject=John Doe}.jpg
{Date=2002-01-27}{Seq#=002}{Place=New York}.jpg
{Date=2002-01-27}{Seq#=003}{Comment=Birthday party}.jpg
{Date=2002-01-28}{Subject=John Doe}{Place=New York}{Comment=Birthday party}.jpg
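Because every attribute is labeled, files can be located and filtered by any combination of attributes. The Python sketch below reads a filename back into label/value pairs and filters the examples above. It assumes flat filenames like these, with no nested sub-attributes. The function name `attributes` is hypothetical.

```python
import re

# One {Label=Value} attribute; labels and values may not contain braces.
ATTRIBUTE = re.compile(r"\{([^{}=]+)=([^{}]*)\}")


def attributes(filename):
    """Return the {Label=Value} pairs of a flat 2.x filename as a dict."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    return dict(ATTRIBUTE.findall(stem))


files = [
    "{Date=2002-01-23}.jpg",
    "{Date=2002-01-24}{Subject=John Doe}.jpg",
    "{Date=2002-01-25}{Place=New York}.jpg",
    "{Date=2002-01-26}{Comment=Birthday party}.jpg",
    "{Date=2002-01-27}{Seq#=001}{Subject=John Doe}.jpg",
    "{Date=2002-01-27}{Seq#=002}{Place=New York}.jpg",
    "{Date=2002-01-27}{Seq#=003}{Comment=Birthday party}.jpg",
    "{Date=2002-01-28}{Subject=John Doe}{Place=New York}{Comment=Birthday party}.jpg",
]

# Find every picture taken in New York that shows John Doe.
matches = [f for f in files
           if attributes(f).get("Place") == "New York"
           and attributes(f).get("Subject") == "John Doe"]
print(matches)  # only the 2002-01-28 file has both attributes
```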
Depending on your needs, Attribute Labels can be spelled out in full for ease of comprehension or abbreviated to save space. The only caveat is that all long and short labels should be unique. For example, you wouldn’t want to abbreviate both “Caption” and “Category” as “CA” or you wouldn’t be able to tell them apart. The list of suggested Attribute Types on Page 19 includes short and medium abbreviations for each. Following are 3 equivalent filenames that use different Label abbreviations:
{Captured Date=2002-01-27 @18-37}{Sequence Number=001}{Comment=Birthday party}.jpg
{Date=2002-01-27 @18-37}{Seq#=001}{Comm=Birthday party}.jpg
{DT=2002-01-27 @18-37}{SN=001}{CO=Birthday party}.jpg
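Any software that reads these filenames needs a table that maps every label form to one canonical name. The uniqueness rule in this paragraph is what makes such a table possible. The following minimal Python sketch assumes only the three label sets used in these examples. The names `build_aliases` and `normalize` are hypothetical.

```python
import re

LABEL_FORMS = [  # (full, medium, short), as used in the three filenames above
    ("Captured Date", "Date", "DT"),
    ("Sequence Number", "Seq#", "SN"),
    ("Comment", "Comm", "CO"),
]
ATTRIBUTE = re.compile(r"\{([^{}=]+)=([^{}]*)\}")


def build_aliases(label_forms):
    """Map every label form to its full name, refusing ambiguous abbreviations."""
    aliases = {}
    for forms in label_forms:
        full = forms[0]
        for form in forms:
            if aliases.get(form, full) != full:
                raise ValueError(f"{form!r} would stand for both {aliases[form]!r} and {full!r}")
            aliases[form] = full
    return aliases


def normalize(filename, aliases):
    """Return the flat attributes of a filename keyed by full label names."""
    stem = filename.rsplit(".", 1)[0]
    return {aliases[label]: value for label, value in ATTRIBUTE.findall(stem)}


aliases = build_aliases(LABEL_FORMS)
names = [
    "{Captured Date=2002-01-27 @18-37}{Sequence Number=001}{Comment=Birthday party}.jpg",
    "{Date=2002-01-27 @18-37}{Seq#=001}{Comm=Birthday party}.jpg",
    "{DT=2002-01-27 @18-37}{SN=001}{CO=Birthday party}.jpg",
]
print(all(normalize(n, aliases) == normalize(names[0], aliases) for n in names))  # True
```

Trying to register both “Caption” and “Category” with the abbreviation “CA” makes `build_aliases` raise an error instead of silently picking one.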
You can include multiple Values within a single Attribute by separating them with ; (semicolon) characters. The list of suggested Attribute Types on Page 19 shows which ones logically accommodate multiple values. Following is a WAV sound file which includes a multi-value Subject attribute:
You can include nested sub-attributes that provide information about individual values in a top-level attribute. For example, if you use a Subject attribute with several values naming people in a picture, you can nest Position sub-attributes inside it that specify where each person appears in the frame. Sub-attributes should be enclosed in {curly brackets} just like top-level attributes, and they should immediately follow the value they apply to. Following is a TIF picture file with a Subject attribute that includes nested Position and Activity sub-attributes:
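With semicolon-separated values and nested sub-attributes, a filename is no longer flat, so a simple pattern match cannot read it back. The recursive Python sketch below is one way to parse it. It assumes that labels never contain = or braces, and that values never contain braces or semicolons. The names `parse_attributes` and `parse_filename` are hypothetical. Each attribute is returned as a (label, values) pair, and each value as a (text, sub_attributes) pair.

```python
def parse_attributes(text, pos=0):
    """Parse consecutive {Label=Value;Value...} attributes starting at text[pos].

    Returns (attributes, next_position). A value may be followed by nested
    sub-attributes, which are parsed recursively.
    """
    attributes = []
    while pos < len(text) and text[pos] == "{":
        equals = text.find("=", pos)
        if equals == -1:
            raise ValueError(f"missing '=' in attribute at position {pos}")
        label = text[pos + 1:equals]
        pos = equals + 1
        values = []
        while True:
            start = pos
            while pos < len(text) and text[pos] not in "{};":
                pos += 1
            value_text = text[start:pos]
            subs, pos = parse_attributes(text, pos)  # sub-attributes follow their value
            values.append((value_text, subs))
            if pos >= len(text):
                raise ValueError(f"unterminated attribute {label!r}")
            if text[pos] == ";":
                pos += 1          # another value for the same attribute
            elif text[pos] == "}":
                pos += 1          # end of this attribute
                break
            else:
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        attributes.append((label, values))
    return attributes, pos


def parse_filename(filename):
    """Parse a whole 2.x filename into (attributes, extension)."""
    stem, extension = filename.rsplit(".", 1)
    attributes, end = parse_attributes(stem)
    if end != len(stem):
        raise ValueError(f"unexpected text at position {end}")
    return attributes, extension


attrs, ext = parse_filename("{DT=1977~1982}{SN=001}{SU=John Doe{PS=Left};Jane Doe{PS=Right}}.jpg")
for label, values in attrs:
    print(label, values)
# DT [('1977~1982', [])]
# SN [('001', [])]
# SU [('John Doe', [('PS', [('Left', [])])]), ('Jane Doe', [('PS', [('Right', [])])])]
```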
Each type of attribute should contain Values that conform to a specific Value Format. For example, the Captured Date attribute takes a Date/Time value, whereas the Subject attribute accommodates Text values. The list of suggested Attribute Types on Page 19 shows which Value Format each one takes. For a detailed description of each Value Format, see Value Formats on Page 21.
The attributes that appear at the beginning of a filename are important because they determine sort order. There are 4 special attributes in History Data 2.x that help you control sort order: Captured Date, Sequence Number, Component Number, and Version Number. You’ll usually only need to use 1 or 2 of them, but there may be situations in which it will take all 4 to ensure your files sort chronologically. Collectively, these 4 attributes can be referred to as a filename’s Chronology.
The first and most important Chronology attribute is Captured Date. It works exactly the same way in History Data 2.x as it does in 1.x (see Captured Date on Page 5 for details). Every filename should begin with a Captured Date, and its value should be as precise and specific as possible so that it has a good chance of being unique. Files with unique Captured Dates don’t need any other Chronology attributes.
The second of the Chronology attributes is Sequence Number. Again, it works exactly the same way in History Data 2.x as it does in 1.x (see Sequence Number on Page 6 for details). Whenever a set of files share the same Captured Date and aren’t just different components or versions of the same content, all the files should include a Sequence Number that is unique within the set. When included, Sequence Number should immediately follow Captured Date. Examples:
{DT=2002-01-24}.tif            File recorded on 1/24
{DT=2002-01-25}.mpg            File recorded on 1/25
{DT=2002-01-26}{SN=001}.jpg    First file recorded on 1/26
{DT=2002-01-26}{SN=002}.tif    Second file recorded on 1/26
{DT=2002-01-27}.wav            File recorded on 1/27
{DT=2002-01-28}{SN=001}.tif    First file recorded on 1/28 at unknown time
{DT=2002-01-28}{SN=002}.jpg    Second file recorded on 1/28 at unknown time
{DT=2002-01-28 @10-29}.tif     File recorded on 1/28 at 10:29am
Component Number is new in History Data 2.x. You probably won’t need to use Component Number very often, but it will be useful whenever you have several files which contain components of the same originally captured content. For example, some digital cameras can record a brief audio clip whenever you take a picture, so you could end up with separate JPG and WAV files for every shot you take. These separate files are really just 2 components of the same originally captured content: together, they describe the same set of real-world events. For another example, consider the special cameras used for passport pictures and in many instant photo vending machines which take 4 pictures at the same time from slightly different angles. If you scan these 4 pictures and produce 4 separate TIF files, they don’t have any meaningful sequence: they’re really just 4 components of the same originally captured content.
Whenever a set of files are components of the same originally captured content, all the files should share the same Captured Date (and Sequence Number if present), and each file should have a Component Number which is unique within the set. It doesn’t matter which one comes first, second, and so on since they don’t have any meaningful chronological sequence, so number them in the order you prefer. When included, Component Number should immediately follow Captured Date (and Sequence Number if present). Examples:
{DT=2002-01-23}.jpg                    Shot on 1/23
{DT=2002-01-24}{CN=001}.jpg            First component of shot on 1/24: picture
{DT=2002-01-24}{CN=002}.wav            Second component of shot on 1/24: audio clip
{DT=2002-01-25}.jpg                    Shot on 1/25
{DT=2002-01-26}{SN=001}.jpg            First shot on 1/26
{DT=2002-01-26}{SN=002}{CN=001}.jpg    First component of second shot: picture
{DT=2002-01-26}{SN=002}{CN=002}.wav    Second component of second shot: audio clip
{DT=2002-01-26}{SN=003}.jpg            Third shot on 1/26
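Some cameras save a picture and its audio clip under the same base name, for example DSC00319.JPG and DSC00319.WAV. For those cameras, Component Numbers can be assigned automatically. Not every camera names its files this way, so treat that as an assumption. The following minimal Python sketch puts the picture first and lowercases the extension, as in the examples above. The order is an arbitrary choice, since components have no meaningful sequence. The name `component_names` and its `captured_dates` argument are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def component_names(camera_files, captured_dates):
    """Propose 2.x names, numbering files that share a base name as components.

    captured_dates maps each base name (e.g. 'DSC00319') to its Captured Date.
    """
    groups = defaultdict(list)
    for name in camera_files:
        groups[PurePath(name).stem].append(name)

    proposed = {}
    for stem, names in groups.items():
        # Pictures first, then anything else; the order itself carries no meaning.
        names.sort(key=lambda n: (PurePath(n).suffix.lower() != ".jpg", n))
        for number, name in enumerate(names, start=1):
            chronology = f"{{DT={captured_dates[stem]}}}"
            if len(names) > 1:  # a lone file needs no Component Number
                chronology += f"{{CN={number:03d}}}"
            proposed[name] = chronology + PurePath(name).suffix.lower()
    return proposed


result = component_names(
    ["DSC00319.JPG", "DSC00319.WAV", "DSC00320.JPG"],
    {"DSC00319": "2002-01-24", "DSC00320": "2002-01-25"},
)
for old, new in result.items():
    print(old, "->", new)
# DSC00319.JPG -> {DT=2002-01-24}{CN=001}.jpg
# DSC00319.WAV -> {DT=2002-01-24}{CN=002}.wav
# DSC00320.JPG -> {DT=2002-01-25}.jpg
```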
The last Chronology attribute, Version Number, is also new in History Data 2.x. Version Number should be used whenever you have a set of files that are really just different versions of the same originally captured content. For example, you might create several files by scanning a picture at several different resolutions, but all these files are really just different versions of the same original picture. Other examples include files that have been cropped, edited, rotated, filtered, retouched, adjusted, compressed, or changed in any other way while still representing the same set of real-world events.
Whenever a set of files are different versions of the same originally captured content, all the files should share the same Captured Date, Sequence Number (if present), and Component Number (if present), and each file should have a Version Number that is unique within the set. It doesn’t matter which one comes first, second, and so on since they don’t have any meaningful chronological sequence, so number them in the order you prefer. When included, Version Number should immediately follow Captured Date, Sequence Number (if present), and Component Number (if present). Examples:
{DT=2002-01-23}.tif                            Shot on 1/23
{DT=2002-01-24}{VN=001}.tif                    First version of shot on 1/24
{DT=2002-01-24}{VN=002}.tif                    Second version of shot on 1/24
{DT=2002-01-25}.tif                            Shot on 1/25
{DT=2002-01-26}{SN=001}.tif                    First shot on 1/26
{DT=2002-01-26}{SN=002}{VN=001}.tif            First version of second shot on 1/26
{DT=2002-01-26}{SN=002}{VN=002}.tif            Second version of second shot on 1/26
{DT=2002-01-26}{SN=003}.tif                    Third shot on 1/26
{DT=2002-01-26}{SN=004}{CN=001}.tif            First component of fourth shot
{DT=2002-01-26}{SN=004}{CN=002}{VN=001}.tif    First version of second component
{DT=2002-01-26}{SN=004}{CN=002}{VN=002}.tif    Second version of second component
{DT=2002-01-26}{SN=004}{CN=003}.tif            Third component of fourth shot
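The ordering rules for the four Chronology attributes fit in one small function that always emits them in the same order: Captured Date, Sequence Number, Component Number, Version Number. The Python sketch below assumes the short labels DT, SN, CN, and VN and three-digit numbers, as in the table above. The name `chronology` is hypothetical. These attributes come first and are zero-padded, so a plain alphabetical sort of that table's filenames reproduces the order shown.

```python
def chronology(date, seq=None, comp=None, ver=None, width=3):
    """Build the Chronology prefix, always in the order DT, SN, CN, VN."""
    parts = [f"{{DT={date}}}"]
    for label, number in (("SN", seq), ("CN", comp), ("VN", ver)):
        if number is not None:
            parts.append(f"{{{label}={number:0{width}d}}}")
    return "".join(parts)


print(chronology("2002-01-26", seq=4, comp=2, ver=1) + ".tif")
# {DT=2002-01-26}{SN=004}{CN=002}{VN=001}.tif

rows = [
    chronology("2002-01-23"),
    chronology("2002-01-24", ver=1),
    chronology("2002-01-24", ver=2),
    chronology("2002-01-25"),
    chronology("2002-01-26", seq=1),
    chronology("2002-01-26", seq=2, ver=1),
    chronology("2002-01-26", seq=2, ver=2),
    chronology("2002-01-26", seq=3),
    chronology("2002-01-26", seq=4, comp=1),
    chronology("2002-01-26", seq=4, comp=2, ver=1),
    chronology("2002-01-26", seq=4, comp=2, ver=2),
    chronology("2002-01-26", seq=4, comp=3),
]
files = [row + ".tif" for row in rows]
print(sorted(files) == files)  # True: text order matches the table's order
```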
As seen in the preceding sections, History Data 2.x filenames have a lot of features and can take several different forms. The various rules and guidelines for 2.x filenames are summarized here for reference.
· History Data 2.x filenames are made up of one or more Attributes followed by an Extension
· Attributes are enclosed in {curly brackets}
· Attributes consist of a Label followed by = (“equal” sign) and one or more Values
· Labels may be spelled out in full or abbreviated
· Each Attribute has a specific Value Format (Text, Integer, Date/Time, etc.)
· Values should conform to the Value Format for the Attribute they appear in
· Multiple Values within an Attribute are separated by ; (semicolon) characters
· Attributes may appear in different contexts within a filename:
o Top-level attributes provide information about the file itself
o Nested sub-attributes provide information about Values in higher-level attributes
· Nested sub-attributes should immediately follow the Value they are associated with
· Each type of attribute should only be included once at the top level
· Filenames may not exceed 255 characters and may not contain certain reserved characters
· All filenames should begin with Captured Date
· The combination of Captured Date and Sequence Number should be unique
o Files with unique Captured Dates don’t need Sequence Numbers
o When a set of files share the same Captured Date and are not components or versions of the same content, all of them should include a Sequence Number that is unique within the set
· When included, Sequence Number should immediately follow Captured Date
· When a set of files are different components of the same originally captured content:
o All of the files should have the same Captured Date (and Sequence Number if present)
o Each file should include a Component Number that is unique within the set
· When included, Component Number should immediately follow Captured Date (and Sequence Number if present)
· When a set of files are different versions of the same originally captured content:
o All the files should have the same Captured Date, Sequence Number (if present), and Component Number (if present)
o Each file should include a Version Number that is unique within the set
· When included, Version Number should immediately follow Captured Date, Sequence Number (if present), and Component Number (if present)
· All other attribute types are optional
· All other attribute types may appear in any order, so long as they follow the Chronology attributes (Captured Date, Sequence Number, Component Number, and Version Number)
· All filenames should end with Extension
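Several of these rules can be checked automatically. The Python sketch below checks a subset of them:

· total length
· a 3-character extension
· balanced braces
· a leading Captured Date
· no repeated top-level attribute
· Chronology attributes at the front, in the required order

It assumes the short labels DT, SN, CN, and VN. The document's own reserved-character list is not reproduced here, so the characters Windows forbids in filenames stand in for it. The names `check_filename` and `top_level_labels` are hypothetical.

```python
import re

CHRONOLOGY = ["DT", "SN", "CN", "VN"]   # assumption: short labels only
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')   # stand-in for the reserved characters


def top_level_labels(stem):
    """Return the labels of top-level attributes, tracking brace depth."""
    labels, depth, start = [], 0, 0
    for i, ch in enumerate(stem):
        if ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
            if depth == 0:
                labels.append(stem[start:i].split("=", 1)[0])
        elif depth == 0:
            raise ValueError(f"text outside any attribute at position {i}")
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return labels


def check_filename(name):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(name) > 255:
        problems.append("longer than 255 characters")
    if WINDOWS_FORBIDDEN & set(name):
        problems.append("contains a reserved character")
    if not re.search(r"\.[A-Za-z0-9]{3}$", name):
        return problems + ["does not end with a 3-character extension"]
    try:
        labels = top_level_labels(name[:-4])
    except ValueError as err:
        return problems + [str(err)]
    if not labels or labels[0] != "DT":
        problems.append("does not begin with Captured Date")
    if len(labels) != len(set(labels)):
        problems.append("repeats a top-level attribute")
    chron = [label for label in labels if label in CHRONOLOGY]
    if labels[:len(chron)] != chron or chron != sorted(chron, key=CHRONOLOGY.index):
        problems.append("Chronology attributes are out of place or out of order")
    return problems


print(check_filename("{DT=2000-09}{SN=001}.wav"))  # []
print(check_filename("{DT=2000-10}{TL=Desert Campfire}{SN=007}.jpg"))
# ['Chronology attributes are out of place or out of order']
```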
Following is a minimal History Data 2.x filename for a picture taken at 9:45:29pm on August 3rd, 2000:
{DT=2000-08-03 @21-45-29}.jpg
Following are two files of different types that were both captured in September 2000:
{DT=2000-09}{SN=001}.wav
{DT=2000-09}{SN=002}.mpg
Following are two versions of the same file, one of which has higher resolution and better quality:
{DT=2000-10}{SN=007}{VN=001}{TL=Desert Campfire}{VD=800dpi}{QU=080}.jpg
{DT=2000-10}{SN=007}{VN=002}{TL=Desert Campfire}{VD=200dpi}{QU=050}.jpg
Following are two pictures taken between 1977 and 1982 with different members of the Doe family in different positions:
{DT=1977~1982}{SN=001}{SU=John Doe{PS=Left};Jane Doe{PS=Right}}.jpg
{DT=1977~1982}{SN=002}{SU=Junior Doe{PS=Front};Jane Doe{PS=Back}}.jpg
Following is a picture entitled “Snow River”, taken sometime in 1971 by Ansel Adams with his Hasselblad camera, showing falling snow and running water in Yosemite National Park. It was scanned by John Doe at 1600 dots per inch using an Epson scanner, and it is rated 75/100 for objective quality and 90/100 for subjective interest:
{DT=1971}{SN=001}{TL=Snow River}{PL=Yosemite National Park, California}{AC=Falling snow;Running water}{VD=1600dpi}{ED=John Doe}{QU=075}{IN=090}{CR=Ansel Adams}{DV=Ansel Adams’ Hasselblad;Epson Scanner}.jpg
Where should you store History Data files? How will you find files from a particular time period? How can you create virtual albums with ad-hoc collections of files? This section will answer all these questions.
History Data files are different from other kinds of data on your computer (see Captured vs. Composed Data on Page 31). Because they’re different, and because most other files don’t sort chronologically by filename, History Data files should be stored in their own separate folder tree. The name and location of the top-level History Data folder aren’t critical, but for convenience and simplicity this plan recommends that you create a folder called “History” in the root of your dataset. For example, if you store your data in “C:\My Documents”, create a folder called:
C:\My Documents\History
Theoretically, you can mix any number of History Data files in the same folder. However, the ease of locating individual files and the performance of thumbnail browser utilities may worsen as the number of files in a folder grows to many hundreds or thousands. Therefore this plan recommends that you break History Data up into subfolders within the top-level “History” folder. Since one of the main goals of this plan is to organize History Data chronologically, first-level subfolders within “History” should group files by time period. The simplest way to do this is to set up a subfolder for each year for which you have content. For example, if you have content for the years 1999 through 2001, create Time Subfolders called:
C:\My Documents\History\1999
C:\My Documents\History\2000
C:\My Documents\History\2001
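If you are starting from a single folder full of already-renamed files, you can create and fill the yearly Time Subfolders in one pass. The following minimal Python sketch assumes each filename begins with a four-digit year. In 1.x names the year comes first. In 2.x names it follows a {DT= label. Files that don't fit that pattern are left where they are. The name `file_into_year_folders` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def file_into_year_folders(history: Path) -> None:
    """Move files directly inside `history` into History/<Year> subfolders."""
    for path in list(history.iterdir()):
        if not path.is_file():
            continue  # skip existing subfolders
        name = path.name
        if name.startswith("{DT="):
            name = name[len("{DT="):]  # 2.x: the year follows the DT label
        year = name[:4]
        if not (len(year) == 4 and year.isdigit()):
            continue  # not recognizably History Data; leave it alone
        target = history / year
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))


with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "History"
    history.mkdir()
    for name in ["1999-07-04.jpg", "{DT=2001-02-03}.wav", "notes.txt"]:
        (history / name).touch()
    file_into_year_folders(history)
    layout = sorted(p.relative_to(history).as_posix()
                    for p in history.rglob("*") if p.is_file())
print(layout)  # ['1999/1999-07-04.jpg', '2001/{DT=2001-02-03}.wav', 'notes.txt']
```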
Depending on your preference and on the performance of your computer systems, each subfolder should contain anywhere from several dozen to several hundred files. If there are certain years for which you have very few or no files, you can create subfolders that group these years together in the form “Year1~Year2”. For example, if you have many files for 1994 and 1998 but very few for the years in between, create Time Subfolders called:
C:\My Documents\History\1994         (1994)
C:\My Documents\History\1995~1997    (1995 through 1997)
C:\My Documents\History\1998         (1998)
If there are too many files for a given year, you can break that year up into multiple subfolders in the form “Month1~Month2”. For example, if you have thousands of files for the year 1993, you could break that year up into separate folders called:
C:\My Documents\History\1993\01~08    (January through August 1993)
C:\My Documents\History\1993\09~11    (September through November 1993)
C:\My Documents\History\1993\12       (December 1993)
When you’re done setting up your Time Subfolders, they should span the entire time period for which you have content without any gaps or overlaps, and each folder should contain an appropriate number of files. It’s worth mentioning that the Time Subfolder structure can easily be changed at any time by simply adding, renaming, or deleting subfolders and moving files between them. This plan is designed so that you’ll be able to do this without renaming any files, so don’t worry if you’re not sure exactly how to set up your subfolders: if you change your mind or need to add or remove large numbers of files down the road, you can always change the structure without much hassle.
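Treating each Time Subfolder name as a span of months answers two questions. It shows whether the folders have any gaps or overlaps, and it shows which folder a given file belongs in. The Python sketch below makes three assumptions. Folder names take the forms used above (Year, Year1~Year2, and Year\Month1~Month2). They are written relative to the History folder. Captured Dates are single dates rather than ranges. The names `parse_span`, `check_coverage`, and `folder_for` are hypothetical.

```python
def parse_span(folder):
    """'1995~1997' -> ((1995, 1), (1997, 12)); '1993/01~08' -> ((1993, 1), (1993, 8))."""
    parts = folder.replace("\\", "/").split("/")  # accept Windows separators too
    years = parts[0].split("~")
    first_year, last_year = int(years[0]), int(years[-1])
    if len(parts) == 1:
        return (first_year, 1), (last_year, 12)
    months = parts[1].split("~")
    return (first_year, int(months[0])), (last_year, int(months[-1]))


def check_coverage(folders):
    """Report gaps and overlaps between consecutive Time Subfolders."""
    problems = []
    spans = sorted((parse_span(f), f) for f in folders)
    for ((_, end), name), ((start, _), next_name) in zip(spans, spans[1:]):
        expected = (end[0] + 1, 1) if end[1] == 12 else (end[0], end[1] + 1)
        if start < expected:
            problems.append(f"{name} overlaps {next_name}")
        elif start > expected:
            problems.append(f"gap between {name} and {next_name}")
    return problems


def folder_for(captured_date, folders):
    """Pick the one Time Subfolder whose span contains a 'YYYY' or 'YYYY-MM...' date."""
    year = int(captured_date[:4])
    if captured_date[4:5] == "-":
        first = last = (year, int(captured_date[5:7]))
    else:
        first, last = (year, 1), (year, 12)  # month unknown: needs a whole-year folder
    matches = [f for f in folders
               if parse_span(f)[0] <= first and last <= parse_span(f)[1]]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} folders match {captured_date!r}")
    return matches[0]


folders = ["1993/01~08", "1993/09~11", "1993/12", "1994", "1995~1997", "1998"]
print(check_coverage(folders))                   # []
print(folder_for("1993-10-04 @12-30", folders))  # 1993/09~11
print(folder_for("1996", folders))               # 1995~1997
print(check_coverage(["1994", "1996"]))          # ['gap between 1994 and 1996']
```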
Normally, all History Data captured within a time period should reside in the Time Subfolders for that period (e.g., files captured in 1999 should reside in the “1999” subfolder). However, within each Time Subfolder, you can create Category Subfolders to hold files that are of lesser interest, poorer quality, or which are merely alternate versions. For example, within the first-level “1999” Time Subfolder, you could create Category Subfolders called:
C:\My Documents\History\1999\Out     (Out-takes)
C:\My Documents\History\1999\Alt     (Alternate versions)
C:\My Documents\History\1999\Todo    (Unfinished files that still need work)
Category Subfolders such as these can reduce the number of less interesting or highly similar files in the Time Subfolders, making it easier and more enjoyable to view slideshows and locate your most desirable files. However, this plan recommends that you create such Category Subfolders sparingly since they weaken the chronological organization of the History Data folder structure. Also, when using Category Subfolders, be sure to move (rather than copy) files into or out of them since History Data filenames should be unique within your dataset and since copying would use up unnecessary storage space.
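Moving a file into or out of a Category Subfolder is a plain move within the same Time Subfolder, so the filename and its uniqueness are preserved. The following minimal Python sketch shows this. The function names are hypothetical, and the sketch refuses to overwrite an existing file rather than guess which copy to keep.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_category(file: Path, category: str) -> Path:
    """Move a file from its Time Subfolder into a Category Subfolder such as 'Out'."""
    target_dir = file.parent / category
    target_dir.mkdir(exist_ok=True)
    target = target_dir / file.name
    if target.exists():
        raise FileExistsError(target)
    shutil.move(str(file), str(target))  # move, never copy
    return target


def move_out_of_category(file: Path) -> Path:
    """Move a file from a Category Subfolder back up to its Time Subfolder."""
    target = file.parent.parent / file.name
    if target.exists():
        raise FileExistsError(target)
    shutil.move(str(file), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    year = Path(tmp) / "History" / "1999"
    year.mkdir(parents=True)
    photo = year / "{DT=1999-06-12}{SN=004}.jpg"
    photo.touch()
    moved = move_to_category(photo, "Out")
    in_out = moved.relative_to(year).as_posix()
    restored = move_out_of_category(moved)
    back_home = restored == photo and restored.exists() and not moved.exists()
print(in_out, back_home)  # Out/{DT=1999-06-12}{SN=004}.jpg True
```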
At some point you may want to create folders to hold ad-hoc collections of History Data files grouped by topic or file type, such as “Birthdays”, “Videos”, or “Pictures of Me”. Or, you may want to store several History Data files alongside a textual document that describes their contents. Although many of the individual files in these virtual albums may be History Data files, the collections themselves aren’t really History Data since they’re not chronological. This plan recommends that you create these topic-based folders elsewhere in your dataset to avoid disrupting the chronological organization of the History Data folder structure. For example, if your dataset is stored in “C:\My Documents”, you could create an “Albums” folder outside the “History” folder with subfolders called:
C:\My Documents\Albums\Birthdays
C:\My Documents\Albums\Videos
C:\My Documents\Albums\Pictures of Me
History Data files are most useful for reconstructing past events when they are stored in the chronological History Data folder structure, so they shouldn’t be moved out of “History” into album folders. They also shouldn’t be copied into album folders since this would take up unnecessary storage space and violate the guideline that History Data filenames should be unique within your dataset. Therefore this plan recommends that you create ad-hoc collections by placing shortcuts in the album folders that point to History Data files stored in the History Data folder structure. This is just as easy to do as copying or moving the files, and most thumbnail browser and slide show utilities work just as well with folders full of shortcuts as they do with the original files. In addition, this approach allows you to create multiple overlapping albums without wasting storage space, and it makes it safe to delete or reorganize albums at any time without risking the loss of any History Data content.
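Windows shortcut (.lnk) files are normally created through Explorer. Creating them from a script requires Windows-specific components outside Python's standard library. As a swapped-in alternative, the Python sketch below adds files to an album as symbolic links, which likewise point at the original without copying it. This is an assumption, not the shortcut method this plan describes. On Windows, creating symbolic links may require administrator rights or Developer Mode, and some tools may treat links differently from shortcuts. The name `add_to_album` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def add_to_album(album: Path, history_file: Path) -> Path:
    """Link a History Data file into an album folder without copying or moving it."""
    album.mkdir(parents=True, exist_ok=True)
    link = album / history_file.name
    if not link.is_symlink():
        os.symlink(history_file.resolve(), link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "History" / "2001" / "{DT=2001-03-02}{TL=Cake}.jpg"
    original.parent.mkdir(parents=True)
    original.touch()
    link = add_to_album(root / "Albums" / "Birthdays", original)
    points_home = link.resolve() == original.resolve()
    link.unlink()                           # deleting the album entry...
    original_survives = original.exists()   # ...leaves the History Data file alone
print(points_home, original_survives)  # True True
```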
One of the main goals of this plan is to help you protect History Data files so their contents aren’t lost over time. In many ways, this will be easier than protecting traditional analog content. If you’ve ever worked with old pictures, you’ve probably noticed they tend to fade, yellow, crack, and accumulate smudges, spots, and fingerprints over time. As cassettes and videotapes age they suffer drop-outs, pick up noise, wear out, become sticky, and can wrinkle and jam. Digital content is automatically protected from many of these problems because it’s independent of its media: you can make perfect copies and move it around so it doesn’t degrade with the aging of any one particular device.
However, digital information faces its own problems. This section provides recommendations for long-term protection of History Data from incompatibility, hardware and software obsolescence, and accidental corruption or degradation of data.
Over the years, many file formats have been devised to store picture, audio, and video data. Some have enjoyed widespread adoption and have become mainstream standards, while others have fallen into disuse or have become niche standards used only for special applications. This plan recommends that you use only mainstream standard formats that are currently in widespread use.
Using standard formats offers many advantages, including maximizing the odds that people you send files to will be able to enjoy them and maximizing the number of software applications you can use to work with your files. The table below lists the most common standard formats in use today. For tips on using these formats, see Tips for Scanning Pictures on Page 37 and Tips for Digitizing Audio on Page 38.
Content Type | Name | Extension | Compression
Picture | TIFF | TIF | No*
Picture | JPEG | JPG | Yes, Lossy
Audio | Wave | WAV | No
Audio | MPEG Audio Layer 3 | MP3 | Yes, Lossy
Audio/Video | MPEG-1, MPEG-2 | MPG | Yes, Lossy
Audio/Video | QuickTime | MOV | Yes, Lossy*
Audio/Video | Audio/Video Interleaved | AVI | No*
* Supports both compressed and uncompressed content, but is typically used as shown above
Another good idea is to use as few formats as possible. One reason for this is that you’ll get to know the strengths and weaknesses of all your formats more quickly so you can make the best use of them. Another reason is that you’ll be able to use your favorite software tools and apply your best techniques to all files of the same type. Why store one half of your pictures in a format that supports advanced features and works with your favorite image editing software, and then store the other half in a format that doesn’t? And finally, the fewer formats you use, the fewer formats you’ll have to keep track of as you watch for obsolescence over time (this will be discussed further in Convert Periodically below).
One important decision to make when creating History Data files is whether to use compression, a feature supported by many file formats. Compressed files take up less space for the same or similar content, so in theory compression sounds like a good thing. However, compressed files are more susceptible to data loss due to corruption. And lossy compression, the technique used most often to shrink multimedia files, results in degradation of quality every time you save a file.
The probability of a file getting corrupted at any given point in time is small, and the extent of quality degradation each time you save with lossy compression may be modest. But the risk of data loss and the amount of cumulative quality degradation tends to increase over time. Therefore, to protect your History Data for the long term, this plan recommends using uncompressed file formats whenever possible. For more on this topic see Why Uncompressed? on Page 28.
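The fragility of compressed data is easy to demonstrate with a deliberately simplified experiment: damage one byte of a stored file and count how much of the content is lost. The sketch below uses Python’s standard `zlib` module, a lossless general-purpose compressor, as a stand-in for multimedia compression, and a synthetic byte sequence as a hypothetical stand-in for picture or audio data. Real formats such as JPEG or MP3 may include resynchronization points that limit the damage, so treat the result as an illustration of the tendency rather than a measurement.

```python
import zlib

def flip_byte(blob: bytes, index: int) -> bytes:
    """Return a copy of blob with all 8 bits of one byte inverted."""
    damaged = bytearray(blob)
    damaged[index] ^= 0xFF
    return bytes(damaged)

def bytes_lost(original: bytes, compress: bool) -> int:
    """Damage one byte of the stored form and count how many content bytes differ."""
    stored = zlib.compress(original) if compress else original
    damaged = flip_byte(stored, len(stored) // 2)
    if compress:
        try:
            recovered = zlib.decompress(damaged)
        except zlib.error:
            return len(original)  # the stream can no longer be decoded at all
    else:
        recovered = damaged
    mismatched = sum(a != b for a, b in zip(original, recovered))
    return mismatched + abs(len(original) - len(recovered))

# Hypothetical stand-in for 100,000 bytes of multimedia content
sample = bytes((i * 37) % 251 for i in range(100_000))
print("uncompressed:", bytes_lost(sample, compress=False), "byte(s) damaged")
print("compressed:  ", bytes_lost(sample, compress=True), "byte(s) damaged")
```

With the uncompressed copy, exactly one byte of content is damaged; with the compressed copy, a single damaged byte typically leaves the whole stream undecodable.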
Removable media such as floppy diskettes, Zip disks, recordable CDs or DVDs, and tape drives can be very useful for backing up or sharing files, but this plan recommends using hard disks for primary storage of all your History Data content. Hard disks offer many advantages over removable media, including greater capacity and faster access.
However, given that uncompressed multimedia files can be very large, you may wonder whether you’ll have enough storage capacity to hold all your files. The good news is that hard disk storage is plentiful and cheap. As a benchmark, consider that as of this writing you can buy a high-performance 100GB drive for about $200, or $2 per gigabyte. The following table shows how this translates into storage of uncompressed picture, audio, and video data:
Content | Quality | Unit | Size | Units/100GB | Cost/Unit
Picture | 5 megapixel | 1 picture | 15MB | 6,666 pictures | $0.03
Picture | 20 megapixel | 1 picture | 60MB | 1,666 pictures | $0.12
Audio | CD quality | 1 minute | 10MB | 10,000 minutes | $0.02
Video | DV quality | 1 minute | 200MB | 500 minutes | $0.40
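The cost figures in the table follow from simple division. This sketch reproduces them, assuming decimal units (1GB = 1,000MB, as drive makers quote capacity) and 3 bytes per pixel for uncompressed 24-bit color pictures; the audio and video sizes are taken from the table as given rather than derived.

```python
DRIVE_MB = 100 * 1000      # 100GB drive in decimal megabytes (assumption)
DRIVE_PRICE_USD = 200      # benchmark price quoted in the text

def picture_mb(megapixels: float) -> float:
    """Size of an uncompressed 24-bit color picture: 3 bytes per pixel."""
    return megapixels * 3

def units_and_cost(unit_mb: float) -> tuple[int, float]:
    """Whole units that fit on the drive, and the drive cost per unit."""
    units = int(DRIVE_MB // unit_mb)
    return units, DRIVE_PRICE_USD / units

rows = [
    ("Picture, 5 megapixel", picture_mb(5)),
    ("Picture, 20 megapixel", picture_mb(20)),
    ("Audio, CD quality, 1 minute", 10),   # size as given in the table
    ("Video, DV quality, 1 minute", 200),  # size as given in the table
]
for label, size_mb in rows:
    units, cost = units_and_cost(size_mb)
    print(f"{label}: {size_mb}MB, {units:,} per drive, ${cost:.2f} each")
```

For reference, CD-quality audio (44,100 samples per second, 2 bytes per sample, 2 channels) works out to 10,584,000 bytes per minute, which the table rounds to 10MB.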
And there’s even better news: as storage technology is rapidly evolving, we can look forward to bigger, faster, and cheaper hard disks in the near future.
Information technology tends to evolve rapidly over time, and one consequence of this is that data you store today could become unusable in as little as 5-10 years due to hardware or software obsolescence. For example, as recently as 10 years ago 5¼ inch floppy diskettes were in widespread use, but you’d be hard pressed to find a computer that can read one today. Or consider the fact that many of the best-selling DOS word processing, spreadsheet, presentation, and database applications from a decade ago used file formats that can’t be read by today’s software.
To protect against obsolescence, this plan recommends that you periodically review the file formats and storage devices used to store your History Data and convert anything that is starting to fall out of mainstream use. Recordable CDs and Zip disks are popular today, but history suggests that this could change in as little as 5 years. Proven file formats like TIFF, JPEG, and MPEG may be the best choice for storing content today, but the industry is already hard at work developing replacements, and eventually software publishers tend to stop supporting older formats.
Plan on checking your History Data collection once every 5 years or so for impending obsolescence. Ask yourself these questions: “Am I using file formats or storage media that are becoming less popular?” “If I bought a new mainstream computer today and installed only the latest mainstream software, would I have trouble accessing any of my content?” If so, it may be time to convert. Be sure the conversion process doesn’t degrade the quality of your content, since you may have to convert multiple times over the years. The fewer file formats you use in your collection, the fewer formats you’ll have to watch for obsolescence, and the fewer different kinds of conversion you’ll have to tackle when the time comes.
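Part of a periodic review can be automated by counting the file formats in your collection. This is a minimal sketch, assuming “mainstream” means the formats listed in this plan’s format table and that a file’s format can be judged by its extension; both the `MAINSTREAM` set and the `format_inventory` function are hypothetical and should be revised at each review.

```python
from collections import Counter
from pathlib import Path

# Hypothetical list of "mainstream" formats, taken from this plan's format
# table; revise it at each periodic review.
MAINSTREAM = {"TIF", "JPG", "WAV", "MP3", "MPG", "MOV", "AVI"}

def format_inventory(history_root: Path) -> tuple[Counter, Counter]:
    """Count files by extension, split into mainstream and at-risk formats."""
    mainstream, at_risk = Counter(), Counter()
    for path in history_root.rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lstrip(".").upper() or "(none)"
        if ext in MAINSTREAM:
            mainstream[ext] += 1
        else:
            at_risk[ext] += 1
    return mainstream, at_risk

# Example:
# ok, risky = format_inventory(Path(r"C:\My Documents\History"))
# for ext, count in risky.most_common():
#     print(f"{count} file(s) in {ext} format may need converting")
```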
This section outlines high-level recommendations for long-term protection of any kind of data from loss or failure of storage media, accidental or malicious destruction, and unauthorized access. Topics in this section are not covered in depth since they are not specific to History Data and are addressed in many other sources.
Performing regular backups is the single most important thing you should do to protect any data you care about, especially if you want to keep it for the long term. Unfortunately, backing up is like flossing: we know it’s important, but we don’t always do it as often as we should. Backing up requires some time and discipline, and these days hard disk technology is so reliable it’s easy to take for granted. But it’s a fact that although the risk is small, any hard disk you use could fail at any moment without warning. If it does, you probably won’t get your data back.
The good news is that current storage devices can copy large amounts of data very quickly, and current software can perform regularly scheduled backups automatically. All you have to do is decide how often you want to back up and where you want to store the copied data.
The answer to how often you should back up is a question: how much data are you willing to lose? If you back up once a month, then the most data you could lose is one month’s worth. If you don’t add to your History Data collection very often, and if most of what you add comes from analog media that you could easily digitize again, then backing up once every few months might be enough. On the other hand, if you often take pictures with a digital camera, you may be adding many irreplaceable files to your collection every week, and you might want to consider backing up several times per month. You’ll have to decide what works best for you, but if you’re not sure then once a month is a good place to start.
One of the main reasons for backing up is to protect against storage device failure. To accomplish this, data should be copied to a different physical storage device or to different media. It doesn’t matter whether you back up to a hard disk, floppy diskette, Zip disk, recordable CD, recordable DVD, or tape as long as the copy ends up on a destination device that won’t be affected if the source device fails.
Conventional wisdom holds that the best destination device for backup is a tape drive. Tape drives are usually rather slow, but the cartridges have historically been very inexpensive. The advantage is that with cheap media you can afford to make several copies of your data so you can keep several versions and store some copies offsite to protect against theft, fire, flood, or other damage to your building. However, one disadvantage with tape drives is that you can’t just read from them or write to them the way you do with normal disks, so backing up to tape can be more complicated than using other types of devices. Another disadvantage is that in recent years inexpensive tape drives have not kept up with the rapid increase in capacity of hard disks.
Floppy diskettes are too small for today’s multimedia files, but Zip disks and recordable CDs are popular choices for backup. They offer many of the advantages of tape, but are usually faster and can be written to and read from like normal drives so that backup is simpler. Tape cartridges often have larger capacities than Zip or CD media, but recordable DVD technology is becoming mainstream with prices and capacities comparable to the most popular tape formats. However, neither Zip, CD, nor DVD has enough capacity to back up today’s largest hard drives.
Using hard disks to back up data is an unconventional option, but one you should consider. Hard disks used to be too expensive relative to the other options, but that is no longer the case. If you need to back up 100GB of data, buying a second hard disk is cheaper than buying twenty 5GB recordable DVDs and much, much less expensive than buying a tape drive with comparable capacity. Hard disks are also much faster than the other options, especially for incremental backup where only the files changed since the last backup are copied. And a small premium will get you an external hard drive that can be easily disconnected for offsite storage.
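Incremental backup, copying only files that are new or have changed since the last backup, can be sketched as follows. This is a minimal sketch, assuming a file has changed when its size or modification time differs from the copy on the backup drive (a common heuristic, not the only one). The function name is hypothetical, it does not remove backup copies of files you have deleted, and it is not a substitute for dedicated backup software.

```python
import shutil
from pathlib import Path

def incremental_backup(source: Path, destination: Path) -> list[Path]:
    """Copy files that are new or changed; return their paths relative to source."""
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Compare times to the whole second; some drives store times coarsely
            if s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime):
                continue  # unchanged since the last backup
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied

# Example (hypothetical drive letter for an external backup disk):
# incremental_backup(Path(r"C:\My Documents\History"), Path(r"E:\Backup\History"))
```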
Today’s operating systems do a good job of managing data stored on hard disks, but it’s still a good idea to do some periodic maintenance. Consider using your operating system’s defragmenter and error-checking utilities once or twice per year to keep your volumes clean and tuned for best performance.
Also, avoid using compressed volumes. Like data in compressed files, compressed volumes are fragile and are more susceptible to data corruption than normal volumes. Plus compressed volumes always use lossless compression, which usually achieves little or no reduction in size for multimedia files.
Anyone using a computer today faces some risk of losing data due to a malicious computer virus. If a virus wipes out all your data, you will probably notice immediately and be able to restore most of it from a backup. But some viruses insidiously damage data a few files at a time, so you might not notice right away and could still lose data by copying newer, damaged files over older backups.
The best way to protect against computer viruses is through a combination of regular backups and antivirus software. As of this writing there are several good antivirus products on the market and it shouldn’t be hard to find one that meets your needs. Other standard preventive measures include checking any files you receive from unknown sources before opening or running them.
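One way to notice files that a virus or a failing disk has quietly altered is to record a checksum for every file and compare before each backup. This is a sketch of that idea, not part of this plan’s stated recommendations; it assumes that files you have not deliberately edited should keep the same SHA-256 digest over time, and the function names are hypothetical. Store the saved manifest (for example as JSON) outside the History folder so it is not checksummed itself.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1MB chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def unexpected_changes(root: Path, manifest: dict[str, str]) -> list[str]:
    """Files listed in a saved manifest whose contents changed or that vanished."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

If `unexpected_changes` reports files you did not edit, investigate before the next backup copies them over older, intact versions.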
Depending on your situation, security may or may not be a big issue. In some cases, your data may be at risk of inadvertent or malicious destruction, or you may want to prevent unauthorized access to private data. Think about your computing environment and decide whether your data needs protection from other people.
Is your computer in a physical location prone to theft or unauthorized access? Does your computer have a type of Internet connection that makes it especially visible to or vulnerable to hackers? Do you have toddlers capable of pressing the Delete key but too young to know what it does? Do you have content you want to keep private from others who frequent your household or workplace? If so, some security measures may be in order.
The table below contains a list of suggested attribute types you can use in History Data 2.x filenames. The first 4 and last 2 attributes in the list have special significance, and this plan recommends that you use them as shown here. But the rest is up to you: feel free to create your own attribute types or make any other changes that meet your needs.
The first 3 columns show full and abbreviated names you can use for Attribute Labels. The Format column shows which Value Format each attribute takes and whether it accommodates single or multiple values (for a detailed description of formats see Value Formats on Page 19). The Notes column describes the attribute and indicates whether it can be used as a top-level attribute or a nested sub-attribute. The list shows attribute types in their recommended order, meaning that attributes from further down in the list would appear further to the right in filenames.
Long Name | Medium Name | Short Name | Format | Notes
Captured Date | Date | DT | Date/Time, Single | (Top-level or sub-attribute of Caption) Date and time content was originally captured.
Sequence Number | Seq# | SN | Integer, Single | Chronological position of this file within the set of all files that have the same Captured Date value.
Component Number | Com# | CN | Integer, Single | Subjective position of this file within the set of all components of the same originally captured content.
Version Number | Ver# | VN | Integer, Single | Subjective position of this file within the set of all versions of the same originally captured content.
Title | Title | TL | Text, Single | Subjective title (not necessarily descriptive) given by the Creator of the content.
Subject | Subj | SU | Text, Multi | Living or non-living things represented in the content. Examples: Albert Einstein, Woman, Crowd, Bambi, Dog, Mt. Everest, Golden Gate Bridge, Tree, Sunset. List multiple subjects in left-to-right order for photo files, or in order of appearance for audio/video files.
Position | Posn | PS | Text, Multi | (Sub-attribute of Subject) Relative position of Subject within picture, sound, or video. Examples: Front, 3rd from left, Top right, Halfway through, 5 seconds before end.
Place | Place | PL | Text, Multi | Places where content was captured. Examples: France, California, Manhattan, Central Park, 1600 Pennsylvania Ave, White House, Yosemite, Mt. Everest. Enter multiple values in the order of a traditional mailing address.
Activity | Actv | AC | Text, Multi | (Top-level or sub-attribute of Subject) Activities engaged in by a Subject or occurring while content was captured. Examples: Smiling, Singing, Hiking, Falling, Raining.
Version Description | VDsc | VD | Text, Multi | Description of how this version differs from other versions. Examples: subset of original, orientation, resolution, codec, transforms, filters, edits.
Version Editor | VEdr | VE | Text, Multi | Names of the people who created this version.
Version Date | VDat | VT | Date/Time, Single | Date and time this version was created.
Quality | Qual | QU | Percent, Single | Numerical rating of the content’s objective quality (focus, color balance, color saturation, contrast, distortion, artifacts, noise, tonal balance, stability, etc.) in the range 0 through 100.
Interest | Intr | IN | Percent, Single | Numerical rating of the content’s subjective quality (interest, aesthetics, rarity, sentimental value, etc.) in the range 0 through 100.
Creator | Crtr | CR | Text, Multi | (Top-level or sub-attribute of Caption) Names of the people who operated the devices that originally captured the content.
Device | Devc | DV | Text, Multi | Names or descriptions of the devices used to capture the content. Examples: Nikon F1, Mary’s camcorder, Fisheye lens, UV filter, unidirectional microphone.
Lens Orientation | Lens | LE | Text, Single | (Sub-attribute of Device) Orientation of lens on swivel-lens camera at time content was captured. Examples: Forward, Reverse.
Flash | Flash | FL | Text, Single | (Sub-attribute of Device) Flash configuration used to capture still image content. Examples: On, Off, High, Low, Internal, External, Indirect.
Shutter Speed | Speed | SP | Text, Single | (Sub-attribute of Device) Shutter speed used to capture still image content. Examples: 30s, 500ms, 250ns.
Compression | Comp | CP | Text, Single | (Sub-attribute of Device) Compression mode used to capture content. Examples: Fine, Standard, Snapshot.
Source | Srce | SO | Text, Multi | Source medium from which content was digitized. Examples: APS Negative, 35mm Positive, APS 4x7 Print, 8x10 Print, Digital (e.g., digital camera).
Owner | Ownr | OW | Text, Multi | Names of people who owned the medium on which content was captured.
Batch | Batch | BA | Text, Multi | ID of batch within which content was captured (film roll ID, cassette ID, DCF directory number, DCF free characters, etc.).
Volume | Volm | VO | Integer, Single | ID of volume within batch in which content was captured (film frame number, cassette track number, DCF file number from digital camera, etc.).
Category | Catg | CG | Text, Single | Category that describes content.
Keywords | Keyw | KW | Text, Multi | Keywords that describe content.
Caption | Capt | CP | Text, Multi | Blocks of text associated with content (such as descriptive text written on front or back of picture).
Comment | Comm | CM | Text, Multi | (Top-level or sub-attribute of Title, Subject, Place, Activity, Version Editor, Quality, Interest, or Creator) Free-form comment regarding any other aspect of the file or its contents.
Extension | Extn | EX | Extension, Single | Filename extension corresponding to the file’s format. Examples: TIF, JPG, MPG, WAV, MP3.
History Data filenames are made up of attributes, which are basically just fields that describe your files. Attributes may contain different kinds of information, such as words, numbers, or dates and times. Each of the attributes in a History Data 1.x or History Data 2.x filename uses one of the following formats.
The Date/Time format accommodates values that are a combination of date and time information. Date/Time values should be in the form:
YYYY-MM-DD @HH-NN-SS
Where the components have the following significance:
YYYY | MM | DD | HH | NN | SS
4-digit Year | 2-digit Month | 2-digit Day | 2-digit Hour (24-hour clock) | 2-digit Minute | 2-digit Second
For example, August 3rd 2001 at 3:26:45pm would be broken down into components:
Year 2001 | August | 3rd | 3pm | 26 minutes | 45 seconds
2001 | 08 | 03 | 15 | 26 | 45
And entered as the following Date/Time value:
2001-08-03 @15-26-45
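In Python, the Date/Time form maps directly onto `strftime` format codes (Python uses `%M` for minutes where this plan writes NN). This is a minimal sketch handling full-precision values only; the function names are hypothetical.

```python
from datetime import datetime

# YYYY-MM-DD @HH-NN-SS expressed as strftime/strptime codes (24-hour clock)
HISTORY_DATETIME = "%Y-%m-%d @%H-%M-%S"

def to_history_datetime(moment: datetime) -> str:
    """Format a full-precision Date/Time value."""
    return moment.strftime(HISTORY_DATETIME)

def from_history_datetime(value: str) -> datetime:
    """Parse a full-precision Date/Time value (no truncation or ranges)."""
    return datetime.strptime(value, HISTORY_DATETIME)

print(to_history_datetime(datetime(2001, 8, 3, 15, 26, 45)))  # 2001-08-03 @15-26-45
```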
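Why can’t we just enter this in the more familiar 8/3/01 3:26:45pm format? There are several reasons. First, we want Date/Time values to appear in chronological order when sorted alphabetically, so we put the components in order from the year down to the second, give each component a fixed number of digits padded with leading zeros (including a 4-digit year), and use a 24-hour clock instead of am/pm.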
Second, Windows doesn’t allow / (“forward slash”) or : (“colon”) characters in filenames, so we replace these date and time delimiters with - (“hyphen”) characters. And finally, the @ (“at sign”) between the date and time portions makes the whole Date/Time value easier to read by breaking up a long string of digits and hyphens. The at sign also fits because we’re used to saying “August 3rd at 3pm”.
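The sorting argument is easy to check. The sketch below formats the same three moments both ways and sorts the resulting strings alphabetically; the moments are arbitrary examples chosen so that the familiar format sorts out of order, and the function names are hypothetical.

```python
from datetime import datetime

def familiar(m: datetime) -> str:
    """8/3/01 3:26:45pm style: unpadded month/day, 2-digit year, 12-hour clock."""
    hour12 = m.hour % 12 or 12
    suffix = "am" if m.hour < 12 else "pm"
    return f"{m.month}/{m.day}/{m.year % 100:02d} {hour12}:{m.minute:02d}:{m.second:02d}{suffix}"

def history(m: datetime) -> str:
    """YYYY-MM-DD @HH-NN-SS style."""
    return m.strftime("%Y-%m-%d @%H-%M-%S")

moments = [
    datetime(2001, 10, 5, 8, 0, 0),
    datetime(2000, 12, 25, 9, 0, 0),
    datetime(2001, 9, 5, 15, 0, 0),
]
chronological = sorted(moments)

print(sorted(familiar(m) for m in moments))  # alphabetical, but not chronological
print(sorted(history(m) for m in moments))   # alphabetical and chronological
```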
It’s good to be as precise and specific as possible when entering Date/Time values, even going as far as capturing the time down to the second whenever possible. This is especially true for the Captured Date attribute. For example, you could find yourself with many video clips recorded during the same hour or with many pictures shot during the same minute. In such cases, it’s a lot easier to let your files sort themselves chronologically by Hour-Minute-Second rather than just entering a date and then having to figure out the order by hand and manually assign Sequence Numbers to the whole batch. Fortunately, many modern cameras and recording devices capture the date and time for you automatically. However, there will be many times when Date/Time values are not known precisely, so the Date/Time format incorporates two mechanisms for accommodating uncertainty.
One way to handle uncertainty in Date/Time values is through right-hand truncation. As you move to the right through the components of a Date/Time value, the components become more specific. For example, Month is more specific than Year, Day is more specific than Month, and Second is most specific of all. Right-hand truncation means dropping components from the right end of the value: the less precisely you know the Date/Time, the more components you leave off. However, you should always include at least a Year component. Examples:
2000-08-03 @17-47-23 | August 3rd, 2000 at 5:47:23pm
2000-08-03 @17-47 | Sometime between 5:47:00pm and 5:47:59pm on August 3rd, 2000
2000-08-03 @17 | Sometime between 5:00:00pm and 5:59:59pm on August 3rd, 2000
2000-08-03 | Sometime on the day of August 3rd, 2000
2000-08 | Sometime during the month of August 2000
2000 | Sometime during the year 2000
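Right-hand truncation can be interpreted mechanically: a truncated value stands for the span from its earliest to its latest possible moment. This is a minimal sketch, assuming values follow the YYYY-MM-DD @HH-NN-SS layout exactly; the regular expression and the `bounds` function are hypothetical.

```python
import calendar
import re
from datetime import datetime

# YYYY[-MM[-DD[ @HH[-NN[-SS]]]]]: components may only be dropped from the right
TRUNCATED = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?: @(\d{2})(?:-(\d{2})(?:-(\d{2}))?)?)?)?)?$"
)

def _pick(value, default):
    return default if value is None else value

def bounds(value: str) -> tuple[datetime, datetime]:
    """Earliest and latest moments a possibly truncated Date/Time value can mean."""
    match = TRUNCATED.match(value)
    if match is None:
        raise ValueError(f"not a Date/Time value: {value!r}")
    year, month, day, hour, minute, second = (
        None if g is None else int(g) for g in match.groups()
    )
    last_month = _pick(month, 12)
    last_day = _pick(day, calendar.monthrange(year, last_month)[1])
    earliest = datetime(year, _pick(month, 1), _pick(day, 1),
                        _pick(hour, 0), _pick(minute, 0), _pick(second, 0))
    latest = datetime(year, last_month, last_day,
                      _pick(hour, 23), _pick(minute, 59), _pick(second, 59))
    return earliest, latest

for example in ["2000-08-03 @17", "2000-08", "2000"]:
    print(example, "->", bounds(example))
```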
Another way to handle lack of precision in Date/Time values is to use uncertainty ranges. Even if you don’t know a Date/Time value exactly, you can always say with certainty that it falls within a period between Date/Time X and Date/Time Y. For example, even though you may not remember exactly when you took a certain picture, you might be able to deduce from who appears in it, what they’re wearing, and how old they look that it was taken between 1977 and 1982. Such a pair of “earliest” and “latest” Date/Time values is an uncertainty range. Uncertainty ranges should be entered in the form:
YYYY1-MM1-DD1 @HH1-NN1-SS1 ~ YYYY2-MM2-DD2 @HH2-NN2-SS2
Where the first Date/Time is the earliest possible value, the second Date/Time is the latest possible value, and the ~ (“tilde”) separates the two. You can use right-hand truncation if the ends of the range fall on the boundary of a minute, hour, day, month, or year. And you can leave out the date portion of the second Date/Time if both ends fall on the same day. Examples:
2000-06-01 @18-20 ~ 2000-07-31 @21-40 | Between 6:20pm on June 1st 2000 and 9:40pm on July 31st 2000
2000-08-01 @19-35 ~ @19-40 | Between 7:35pm and 7:40pm on August 1st 2000
2000-08-01 @18 ~ @21 | Between 6:00pm and 9:59pm on August 1st 2000
2000-04-28 ~ 2000-05-02 | Between April 28th and May 2nd 2000
1999-12 ~ 2000-04 | Between December 1999 and April 2000
1998 ~ 2001 | Between 1998 and 2001
Files with uncertainty ranges in their Captured Date attribute should be filed according to their earliest Date/Time value. In other words, a file with Captured Date = 1998 ~ 2001 should be grouped with other files captured in 1998.
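Splitting an uncertainty range into its two ends, including the shorthand where the second end gives only a time, can be sketched as follows. This sketch assumes the ends are separated by a ~ surrounded by spaces, as in the examples; each end may itself be truncated and is interpreted by the right-hand truncation rules. The function names are hypothetical; `filing_year` applies the rule that a file is grouped by its earliest Date/Time value.

```python
def split_range(value: str) -> tuple[str, str]:
    """Return the (earliest, latest) ends of a Date/Time value or uncertainty range."""
    if "~" not in value:
        return value.strip(), value.strip()  # a single value is its own range
    first, second = (part.strip() for part in value.split("~", 1))
    if second.startswith("@"):
        # Shorthand: a time-only second end shares the first end's date.
        date_part = first.split(" @", 1)[0]
        second = f"{date_part} {second}"
    return first, second

def filing_year(value: str) -> str:
    """Files are grouped by their earliest Date/Time, so use the first end's year."""
    return split_range(value)[0][:4]

for example in ["2000-08-01 @19-35 ~ @19-40", "1999-12 ~ 2000-04", "1998 ~ 2001"]:
    print(example, "->", split_range(example), "filed under", filing_year(example))
```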
Time zones can make things a bit messy because your collection may include files captured on the same day in different parts of the world. For example, a picture taken at 2pm in London was actually taken earlier than a picture taken at 11am in New York on the same day. Theoretically, you could store all Date/Time values in Greenwich Mean Time (GMT). In that case, the London picture would have a Captured Date of 2pm GMT and the New York picture would have a Captured Date of 4pm GMT, since there’s usually a five-hour time difference between the two cities. But then you would have to do GMT conversions every time you enter or interpret a Date/Time value, which could be quite cumbersome. Was New York on Daylight Saving Time that day? Was London? Despite the complexities of time zones, this plan recommends that you use the prevailing date and time at the location where the contents of each file were originally captured.
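The London and New York example can be checked with Python’s `datetime` module, for illustration only, since this plan recommends recording local time instead. The sketch assumes a winter date, so both cities are on standard time with the usual five-hour difference; fixed UTC offsets are used rather than a time zone database, which sidesteps the daylight saving questions instead of answering them.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets for a winter date (assumption: no daylight saving)
LONDON = timezone(timedelta(hours=0), "GMT")
NEW_YORK = timezone(timedelta(hours=-5), "EST")

london_shot = datetime(2001, 1, 15, 14, 0, tzinfo=LONDON)     # 2pm local
new_york_shot = datetime(2001, 1, 15, 11, 0, tzinfo=NEW_YORK)  # 11am local

print(london_shot.astimezone(timezone.utc).strftime("%H:%M"))    # 14:00
print(new_york_shot.astimezone(timezone.utc).strftime("%H:%M"))  # 16:00
print(london_shot < new_york_shot)  # True: the London picture came first
```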
The Integer format accommodates whole number values with a fixed number of digits (to enable proper text-mode alphabetical sorting). Three digits is usually enough, in which case values should be in the range 0 through 999. Values should be padded with leading zeros as needed so that they are always the proper number of digits in length. In the unlikely event that 3 digits are insufficient, you can use additional digits. Examples:
001
002
050
100
The Percent format is just like the Integer format except that its values should be in the range 0 through 100 and it should always be 3 digits long.
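Zero-padding is a one-line format specification in most languages. In this minimal sketch (function names hypothetical), out-of-range values raise an error rather than silently producing a longer string, because mixing widths within a set of files would break alphabetical sorting (“1000” sorts before “999”); if 3 digits turn out to be insufficient, choose a wider width for the whole set.

```python
def format_integer(value: int, digits: int = 3) -> str:
    """Zero-pad a whole number to a fixed width so text sorting matches numeric order."""
    if not 0 <= value < 10 ** digits:
        raise ValueError(f"{value} does not fit in {digits} digits")
    return f"{value:0{digits}d}"

def format_percent(value: int) -> str:
    """Percent values are always 3 digits, 000 through 100."""
    if not 0 <= value <= 100:
        raise ValueError(f"{value} is outside 0 through 100")
    return format_integer(value, 3)

print([format_integer(n) for n in (1, 2, 50, 100)])
```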
The Text format accommodates free-form text of variable length. Text should be in sentence case and should be in the language used predominantly to name files in your dataset. Text may include punctuation if necessary for longer or multiple-sentence values, but remember that filenames may not exceed 255 characters and that long filenames are unwieldy. Certain characters may not be included depending on whether you are using History Data 1.x or History Data 2.x filenames as shown below.
Character | Description | Where Illegal | Why
@ | “At” sign | History Data 1.x | Time delimiter
~ | Tilde | History Data 1.x | Range delimiter
# | “Pound” sign | History Data 1.x | Sequence Number delimiter
. | Period | History Data 1.x | Extension delimiter
{ | Open curly bracket | History Data 2.x | Attribute Start delimiter
} | Close curly bracket | History Data 2.x | Attribute End delimiter
= | Equal sign | History Data 2.x | Attribute Label delimiter
; | Semicolon | History Data 2.x | Attribute Value delimiter
\ | Backslash | Both | Disallowed by Windows
/ | Forward slash | Both | Disallowed by Windows
: | Colon | Both | Disallowed by Windows
* | Asterisk | Both | Disallowed by Windows
? | Question mark | Both | Disallowed by Windows
" | Double quote | Both | Disallowed by Windows
< | Less than | Both | Disallowed by Windows
> | Greater than | Both | Disallowed by Windows
| | “Pipe” symbol | Both | Disallowed by Windows
The Extension format accommodates valid Windows filename extensions, such as “TIF”, “WAV”, or “MPG”. Extension is similar to the Text format except that values should almost always be 3 characters in length and may never include . (period) characters.
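A filename tool can check Text and Extension values against the table of illegal characters before a file is renamed. This minimal sketch covers only the characters listed in that table (Windows also rejects a few other names and characters not covered here); the constant and function names are hypothetical.

```python
# Characters from the Text format table, as Python sets
WINDOWS_ILLEGAL = set('\\/:*?"<>|')
HISTORY_DATA_ILLEGAL = {
    1: set("@~#."),  # History Data 1.x delimiters
    2: set("{}=;"),  # History Data 2.x delimiters
}
MAX_FILENAME_LENGTH = 255

def illegal_characters(text: str, version: int) -> set[str]:
    """Characters in a Text value that may not appear in a 1.x or 2.x filename."""
    forbidden = WINDOWS_ILLEGAL | HISTORY_DATA_ILLEGAL[version]
    return {c for c in text if c in forbidden}

def is_valid_extension(ext: str) -> bool:
    """Extension values are non-empty and never contain a period or a Windows-illegal character."""
    return bool(ext) and "." not in ext and not (set(ext) & WINDOWS_ILLEGAL)

def fits_length(filename: str) -> bool:
    """Filenames may not exceed 255 characters."""
    return len(filename) <= MAX_FILENAME_LENGTH
```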
You might ask why we would want to capture and store our picture, sound, and video recordings in digital form. After all, what’s wrong with traditional film, prints, slides, videotape, and audio cassettes? These technologies have been around for decades, they’re proven and refined, and everybody knows and loves them. These days you can buy excellent analog cameras, camcorders, and tape recorders for very little money, and you don’t need a computer to use them or to enjoy the content they capture.
You could of course continue to use analog technology. But the digital approach has many advantages.
Analog content is tightly linked to its physical media. All physical objects deteriorate with manipulation and the passage of time, so every time you handle or play analog media the content gets slightly damaged. And the decay continues even if you leave analog media in a box or on a shelf due to temperature fluctuations, changes in humidity, radiation, organic decay, and chemical processes.
By contrast, digital content is independent of the media it’s recorded on. Although the physical media still deteriorates slowly with use and age, digital content quality isn’t affected until the deterioration gets bad enough to alter some of the ones and zeros, which usually takes a long time. Until then, you get the same original quality every time you play the content.
Because analog content is tightly linked to its physical media, it’s impossible to make exact copies. Every copy you make contains less information and more noise than the original, and the degradation gets cumulatively worse as you make copies of copies. So it’s relatively hard to preserve quality when sharing analog content among several parties, and you can’t stop progressive degradation of quality over time even if you periodically copy the content to new media.
By contrast, you can make perfect copies of digital content because it’s independent of the media it’s recorded on. You can copy the content as many times as you like and even copies of the copies are exactly as good as the original. So it’s easy to share and distribute high quality content, and before the physical media wears out you have plenty of time to transfer the content without any loss of quality.
With analog media, making copies is inconvenient. For pictures you can order double prints, but then you can only share each photo with one other person, and you’ll probably be paying for second copies of pictures you’re not interested in distributing. Or you can have reprints made of selected shots in selected quantities, but this involves additional delay and trips back to the shop. Copying analog audio or video can be done at home, but it takes time, you have to have the right equipment, and the quality is often noticeably worse than in the originals. And regardless of content type, analog content can’t be shared without carrying around or shipping physical media.
Sharing digital content is much easier, since anyone with a computer can make perfect copies quickly and easily. You can still distribute content using removable physical media like Zip disks or recordable CDs, or you can simply use the Internet to share content via a Web site or through email.
By modern standards, analog media is not very compact. As an example, suppose you use a 35mm camera and shoot an average of 1 roll of film per month. Over a 20 year period, you’ll shoot almost 250 rolls of film and take about 6,000 pictures. If you take 6,000 4x6 prints and put them in a stack it will be over 5½ feet tall and weigh almost 70 pounds. That’s enough to completely fill 6 large shoeboxes, or 30 large photo albums taking up 5 feet of shelf space*. And don’t forget the 250 rolls’ worth of negatives, which take up a couple more shoeboxes or binders.
 
* [A stack of 90 4x6 prints is about 1 inch tall and weighs about 1 pound. An average large shoebox is about 12 inches long, and a typical large photo album holds about 200 prints and is about 2 inches thick.]
 
Digital storage is much more compact. If you take 6,000 pictures with today’s best 5-megapixel consumer digital cameras you’ll produce around 12GB of high-quality compressed JPG files, which fits comfortably on the hard disk in an average notebook computer. Or, you could use a high end scanner and digitize 6,000 frames of 35mm film to produce 360GB of film-replacement quality 20-megapixel uncompressed TIF files. You could store that much data on a high-end desktop computer today, and within a year or two you’ll most likely be able to store it on an average desktop computer using a hard disk that’s about the same size as a traditional VCR cassette.
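The shoebox arithmetic above follows directly from the footnote’s rules of thumb. This sketch reproduces it, assuming 6,000 pictures as in the text and 60MB per 20-megapixel uncompressed TIF file, the figure used in this plan’s storage cost table; the constant names are illustrative only.

```python
import math

# Rules of thumb from the text and its footnote
ROLLS = 1 * 12 * 20            # 1 roll per month for 20 years
PICTURES = 6_000               # "about 6,000 pictures"
PRINTS_PER_INCH = 90           # a stack of 90 4x6 prints is about 1 inch tall
PRINTS_PER_POUND = 90          # ... and weighs about 1 pound
SHOEBOX_INCHES = 12
ALBUM_PRINTS, ALBUM_INCHES = 200, 2
TIF_MB = 60                    # 20-megapixel uncompressed TIF

stack_inches = PICTURES / PRINTS_PER_INCH
stack_feet = stack_inches / 12
weight_lb = PICTURES / PRINTS_PER_POUND
shoeboxes = math.ceil(stack_inches / SHOEBOX_INCHES)
albums = math.ceil(PICTURES / ALBUM_PRINTS)
shelf_feet = albums * ALBUM_INCHES / 12
tif_gb = PICTURES * TIF_MB / 1000   # decimal gigabytes

print(f"{ROLLS} rolls, stack {stack_feet:.1f} ft, {weight_lb:.0f} lb, "
      f"{shoeboxes} shoeboxes, {albums} albums ({shelf_feet:.0f} ft of shelf), "
      f"{tif_gb:.0f}GB as uncompressed TIF")
```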
Analog media isn’t very flexible. There’s typically only one way to use each type of media, and it’s usually not very easy to modify content. For example, you can hold a photo print in your hands, or put it in an album, or put it on a wall or in a frame, but there isn’t really much you can do besides look at it. You can’t put prints in a slide show, and you can’t put slides into a photo album. Any changes you want to make such as enlarging, reducing, or retouching involve the delay, expense, and inconvenience of trips to the shop. And you can’t combine audio and video clips into edited sequences unless you spend a lot of time and own special equipment such as mixing consoles and multiple tape decks.
Digital content is much more flexible. You can instantly enlarge or reduce photos, and you can look at any picture by itself or in a slide show. Photo editing software lets you retouch damage right on your computer, plus you can make cosmetic changes, apply special effects, add or remove subject matter, and assemble collages. Video editing software lets you combine clips into movies, and you can add captions, narration, and sophisticated effects without any special equipment. Want to enjoy content with friends without huddling around your computer? Hook it up to your home audio/video system to watch slide shows and movies on your big-screen TV and hear audio through your stereo. Or use an inexpensive desktop printer that lets you make your own high-quality prints in sizes up to 8x10. Plus, you can use database techniques to quickly locate digital content and cross-reference it in many topical categories.
On top of all the general benefits of the digital approach, digital still cameras in particular offer many advantages over traditional film cameras. One advantage is instant feedback while recording content: most digital cameras have a screen that shows you in real time what you’ll get if you press the shutter button. As soon as you take a shot, you can review it to make sure it came out the way you want. If it didn’t, you can delete it and re-shoot on the spot. You can take shots from different angles or with different settings, then keep the best and delete the rest. It’s often said that instant feedback enhances creativity. You also have more privacy, since there’s no film to send to a processing lab for development.
Most digital cameras let you capture brief audio/video clips along with your still images, and some models let you narrate a 4- to 5-second comment after taking each shot. Most models also store the date, time, and various camera settings as metadata that doesn’t affect image content. And although digital cameras are more expensive up front, they eliminate the ongoing costs of buying and processing film.
You might ask why we would want to organize History Data chronologically. After all, couldn’t we just group our files into folders called “Mary’s Wedding”, “David’s Graduation”, “30th Birthday Party”, “Pictures of Susie”, “Little League Videos”, etc.? And if we did use topical category folders like these, couldn’t we skip all the work and hassle of dating each and every file?
There are of course other ways to organize your files besides chronologically. And if you did store your files by topical category rather than by time period, it might initially save you some work and even make it easier for you to find certain files. But this plan recommends the chronological approach because it has many advantages. The following sections present arguments in favor of using chronological information as your primary categorization system, but remember that you can always create secondary topical categories by creating ad-hoc albums (see Ad-hoc Albums on Page 13 for details).
Years, months, and days mean the same thing to everyone, as do hours, minutes, and seconds. Time categories like these are universally understandable, whereas many other topical categories are only meaningful if you’re familiar with the context. For example, setting up a category like “The Brat” for your younger sibling might be entertaining, but unless the people you share your content with know you well it may be unhelpful. Other people, including your own descendants, may be unable to locate or identify content using categories that require inside knowledge of your view of the world.
Time categories have the advantage of being objective, whereas some topical categories have subjective meanings that can change over time. If you set up a category called “My Friends” and fill it with photos of a person you later grow to dislike, you may eventually have to go back and re-categorize those pictures. You’ll never incur this kind of redundant maintenance effort if you use objective categories like years, months, and days.
A good practice when setting up a filing system is to avoid unclear or overlapping categories. Years, months and days have the advantage of being exclusive, whereas topical categories frequently allow for overlap and ambiguity. For example, if you set up category folders called “John” and “Susan”, where do you file a picture in which they both appear? And should you file a picture of an 18-month-old under “Babies” or “Kids”? By contrast, if you know you took a picture in 1997, you’ll never waste time trying to figure out which year folder to store it in.
Another good practice when setting up a filing system is to avoid choosing categories that leave gaps in between. However, topical categories don’t always fit together seamlessly. For example, you might divide a collection of vehicle pictures into categories called “Cars” and “Airplanes”. As your collection grows larger and more varied, you might need to supplement these with categories called “Trucks”, “Boats”, and “Motorcycles”, which doesn’t seem problematic. But later still, you might end up with pictures of odd vehicles such as dune buggies or hovercraft that don’t seem to fit anywhere. With time categories based on years and months, on the other hand, you’ll never have this problem, because every member of your collection is guaranteed to fit somewhere.
Over time your collection of content may grow to include many thousands of files, and you may want to sub-divide categories to keep the number of files per folder manageable. However, splitting topical categories can be complicated. For example, if you start with a category called “People”, you could split it into “Young” and “Old”, or you could split it into “Family”, “Friends”, and “Strangers”. But what if you end up needing a dozen more sub-categories? You might start out along one theme and put in the work necessary to sub-categorize all your files, and yet later have to start all over using a different theme. With time categories, on the other hand, sub-dividing is easy since years can be split into quarters, months, or even days without any special analysis or planning. And if your files sort chronologically you can quickly select and move blocks of files without having to examine them one by one.
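Because a recording date determines its time folder mechanically, sub-dividing a time category amounts to computing a new path for each file from its date alone. The sketch below assumes a hypothetical folder layout (year, then year-month, then full date); it illustrates the idea and is not the History Data folder standard.

```python
from datetime import date


def folder_for(day: date, granularity: str = "month") -> str:
    """Return a hypothetical time-folder path for a recording date."""
    if granularity == "year":
        return f"{day:%Y}"
    if granularity == "quarter":
        return f"{day:%Y}/Q{(day.month - 1) // 3 + 1}"
    if granularity == "month":
        return f"{day:%Y}/{day:%Y-%m}"
    if granularity == "day":
        return f"{day:%Y}/{day:%Y-%m}/{day:%Y-%m-%d}"
    raise ValueError(f"unknown granularity: {granularity}")


print(folder_for(date(1997, 8, 14), "quarter"))  # 1997/Q3
print(folder_for(date(1997, 8, 14), "day"))      # 1997/1997-08/1997-08-14
```

Switching from yearly to monthly folders never requires looking at the pictures themselves; the same date always yields the same, non-overlapping destination.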
Time categories are very extensible: no matter how big your collection grows, you’ll never run out of years and months into which you can group your files. Adding new time categories is simple, and it requires no analysis or planning. It’s easy to create new time categories such as “2002” that don’t overlap or create inconsistencies with your existing categories such as “1999”, “2000”, and “2001”, so you’ll never have to go back and reorganize files that you’ve already categorized. As discussed above, topical categories are often not very extensible since many topical themes can only be broken down into a finite number of sub-categories. For example, if you use gender as a theme, only 2 topical sub-categories are available. If you need to add more categories later on, you could switch themes to something like profession, but then you’d have to re-categorize each and every file that used to be in “Male” or “Female” into “Doctor”, “Engineer”, “Professor”, and so on.
Because years, months, and days form a continuous hierarchy, it’s easy to use chronological information to store or search for content that spans multiple categories. For example, if you don’t know the date a picture was taken you can categorize it with a date range such as “between January 15th and March 2nd” or “between 1998 and 2001”. On the other hand, if you set up topical category folders like “New York” and “Chicago” there’s no practical way to categorize a file as being “either New York or Chicago”. Similarly, with time categories you can perform a variety of searches simply by retrieving all content that falls within narrower or wider ranges of dates. It’s a lot easier to request a search for “all files between 1970 and 1990” than for “all files in New York or Chicago or Los Angeles or San Francisco or Boston or Atlanta or Houston or Washington D.C. or Denver or Seattle or Baltimore”.
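A date range makes both kinds of search described above easy to express. In this sketch, each file carries an earliest and latest possible recording date (equal when the date is known exactly). The file names and dates are hypothetical. A loose search returns files that might fall in the requested period; a strict search returns only files known to fall entirely inside it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatedFile:
    name: str
    earliest: date  # earliest possible recording date
    latest: date    # latest possible recording date (same as earliest if exact)


def search(files, start: date, end: date, strict: bool = False) -> list:
    """Return names of files recorded within [start, end].

    strict=False: include files whose date range overlaps the search range
    (they might fall inside it). strict=True: only files whose whole date
    range lies inside the search range.
    """
    if strict:
        return [f.name for f in files if start <= f.earliest and f.latest <= end]
    return [f.name for f in files if f.earliest <= end and f.latest >= start]


files = [
    DatedFile("beach.tif", date(1997, 8, 14), date(1997, 8, 14)),
    DatedFile("ski-trip.tif", date(1998, 1, 15), date(1998, 3, 2)),
    DatedFile("old-house.tif", date(1998, 1, 1), date(2001, 12, 31)),
    DatedFile("graduation.tif", date(1985, 6, 1), date(1985, 6, 30)),
]

print(search(files, date(1998, 2, 1), date(1998, 2, 28)))
# ['ski-trip.tif', 'old-house.tif']
print(search(files, date(1970, 1, 1), date(1990, 12, 31)))
# ['graduation.tif']
```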
These days many digital cameras and recording devices automatically record the date and time content was captured as metadata within content files. This means you could use software to automatically generate History Data filenames for these files, and even automate the process of moving them into the appropriate time folder. You would then end up with content that files itself without any manual intervention. On the other hand, it’s unlikely you’ll be able to buy a system capable of automatically annotating and filing content by arbitrary topical categories anytime soon.
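As a sketch of that automation, the function below turns a capture timestamp into a filename and a year folder. It assumes you have already read the timestamp string from the file’s EXIF DateTimeOriginal field with a tool of your choice (EXIF stores it as “YYYY:MM:DD HH:MM:SS”). The naming pattern is hypothetical and is not the History Data chronology format described on Page 8.

```python
from datetime import datetime
from pathlib import PurePosixPath

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # layout of the EXIF DateTimeOriginal field


def chronological_path(exif_timestamp: str, original_name: str) -> str:
    """Build a hypothetical 'year/timestamp name' path from a capture time."""
    taken = datetime.strptime(exif_timestamp, EXIF_FORMAT)
    original = PurePosixPath(original_name)
    filename = f"{taken:%Y-%m-%d %H.%M.%S} {original.stem}{original.suffix.lower()}"
    return str(PurePosixPath(f"{taken:%Y}") / filename)


print(chronological_path("2001:07:04 13:05:22", "DSC00042.JPG"))
# 2001/2001-07-04 13.05.22 DSC00042.jpg
```

An all-zero timestamp, which some devices write when the clock was never set, makes `strptime` raise `ValueError`, so such files fall out of the automated path and can be dated by hand.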
When you organize content chronologically, files that describe the same or related sets of real-world events tend to group together. For example, if several people take pictures during a group outing using different cameras, you might digitize the individual rolls of film at different times and might not immediately remember that they include related content. But if you sort all the resulting files chronologically, the pictures from each portion of the trip automatically group together, organizing themselves without manual effort. This is especially useful if you later decide you’d like to see all the photos from all the rolls of film taken during that trip, since you can filter by time and don’t have to worry about which arbitrary categories individual files may have ended up in.
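When filenames begin with a zero-padded, 24-hour timestamp, plain text sorting is the same as chronological sorting, so files from different rolls interleave on their own. The filenames below are hypothetical and follow an assumed “YYYY-MM-DD HH.MM” prefix.

```python
# Hypothetical filenames from two rolls shot on the same outing, plus one
# unrelated roll, each beginning with a sortable "YYYY-MM-DD HH.MM" prefix.
scanned = [
    "1999-07-04 10.05 roll-A-01.tif",
    "1999-07-04 13.40 roll-A-02.tif",
    "1998-12-25 08.30 roll-C-07.tif",
    "1999-07-04 09.55 roll-B-01.tif",
    "1999-07-04 12.15 roll-B-02.tif",
]

timeline = sorted(scanned)  # text order == time order for this layout
outing = [name for name in timeline if name.startswith("1999-07-04")]
print(outing)
```

The outing’s four pictures come out in the order they were taken (B-01, A-01, B-02, A-02), regardless of which roll or scanning session produced them.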
Perhaps the biggest advantage of organizing your content chronologically is that you can view it sequentially in the form of a story. Looking at individual pictures ordered randomly within arbitrary topical categories like “Hiking Trips” can be great for bringing back memories, but the experience is like a series of disjointed flashbacks. Viewing content in chronological sequence, on the other hand, tells a complete story with a beginning, a chain of events, progress over time, and an end. You can watch people grow up and grow old as they move from experience to experience over hours, days, months, and years. In short, organizing your content chronologically turns it into a documentary about you and the people in your life.
Using long filenames to store metadata may sound strange – or even like a bad idea. It certainly goes against the grain of conventional thinking, in which identifiers (such as filenames) are usually supposed to be short and are not supposed to contain any descriptive information or taxonomic structure. However, filename metadata is currently the best way to meet the goals of the History Data approach.
The challenge of associating standard, extensible metadata with file contents has been the subject of much work and discussion in recent years. Several approaches are possible, but only filename metadata meets the key History Data goals (see Page 3) of being open, consistent, and portable:
Let’s take a closer look at the benefits of filename metadata. In addition to meeting the key History Data goals, filename metadata offers several advantages over other approaches:
Unfortunately, the filename metadata approach also has certain disadvantages:
On balance, however, filename metadata works quite well: despite its disadvantages, it is the best approach available today for annotating History Data files.
By now you may be wondering: will there ever be a way to annotate files that meets all the History Data goals and avoids the disadvantages of long filenames? The answer is that there will eventually be a way to achieve this – but it’s not available yet.
The ideal approach for History Data file annotation would involve the following:
The basic idea would be to store easily-extractable XML-like metadata internally within all content files. You’d still want the filenames to begin with the chronology attributes (see Page 8) to allow chronological sorting, but all other attributes could be moved out of the filenames and into the files themselves. Your files would end up with filenames short enough to fit comfortably in any list display, they could be recorded onto CDs without modification, and they would accommodate unlimited extensible metadata.
You could implement such an approach today with many multimedia file formats, including TIF, JPG, WAV, and AVI. Unfortunately, some very popular file formats in use today don’t support internal metadata, most notably MPG. The good news is that the technology industry is already hard at work developing new file format standards that support internal metadata and accommodate any type of multimedia content.
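As one illustration of internal metadata in a format that already supports it, WAV files can carry a RIFF “LIST” chunk of type “INFO” holding text tags such as ICRD (creation date) and ICMT (comment). The sketch below appends such a chunk to an existing WAV file in memory and patches the RIFF size field; the tag values are hypothetical, and this is not a History Data tool.

```python
import io
import struct
import wave


def info_chunk(tags: dict) -> bytes:
    """Build a RIFF 'LIST' chunk of type 'INFO' holding ASCII text tags."""
    body = b"INFO"
    for tag_id, text in tags.items():
        data = text.encode("ascii") + b"\x00"  # null-terminated string
        body += tag_id.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # RIFF chunks are word-aligned
    return b"LIST" + struct.pack("<I", len(body)) + body


def add_info_metadata(wav_bytes: bytes, tags: dict) -> bytes:
    """Append an INFO chunk to a WAV file and update the RIFF size field."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if len(wav_bytes) % 2:
        wav_bytes += b"\x00"  # keep the new chunk word-aligned
    out = bytearray(wav_bytes + info_chunk(tags))
    struct.pack_into("<I", out, 4, len(out) - 8)  # RIFF size excludes first 8 bytes
    return bytes(out)


# Build one second of silent 16-bit mono audio in memory to demonstrate
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

tagged = add_info_metadata(
    buf.getvalue(),
    {"ICRD": "1997-08-14", "ICMT": "Grandpa telling the fishing story"},
)

with wave.open(io.BytesIO(tagged), "rb") as r:
    frames = r.getnframes()  # the audio itself is still readable
print(frames)  # 8000
```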
With luck, new file format standards will emerge and be adopted over the next few years that will allow the ideal approach to file annotation as described above. When this happens, a new version of the History Data approach (maybe version 3.x?) will switch from filename metadata to internal metadata, and Tempest Solutions may provide tools that automate the conversion of your History Data 1.x and 2.x files.
Compressed files take up less space for the same or similar content, so in theory compression sounds like a good thing. However, compressed files by definition use fewer bits to hold the same data, so each bit carries more of the content and damage to any single bit affects more of it. That makes compressed files more fragile.
For example, suppose you have two versions of a picture: an uncompressed TIF version and a compressed JPG version. The JPG file will of course be smaller. But now suppose that a few bits from the middle of each file get corrupted. This could happen if the disk they’re stored on develops a minor surface defect or if they’re sent over a network and a transmission error occurs. The uncompressed TIF file is resilient: the corruption may produce a few garbled pixels in the middle of the image, but this is easy to fix and the rest of the picture won’t be affected. In most JPG files, unfortunately, the damage doesn’t stay local. Because the decoder reads the compressed data as one continuous stream, an error in the middle typically throws off everything that follows, so the entire lower half of the image will be damaged.
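The effect can be shown with a toy example. The sketch below uses simple run-length encoding (each run of identical bytes stored as a count and a value), which is far simpler than JPG’s actual scheme, but it shows the same principle: flipping one bit in uncompressed data damages one pixel, while flipping one bit in compressed data damages everything that bit represents.

```python
def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoding: (count, value) byte pairs, runs up to 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


def flip_bit(data: bytes, index: int, bit: int = 0) -> bytes:
    """Simulate a single-bit storage or transmission error."""
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)


def damaged_positions(original: bytes, recovered: bytes) -> int:
    """Count differing positions, treating missing or extra bytes as damage."""
    diffs = sum(a != b for a, b in zip(original, recovered))
    return diffs + abs(len(original) - len(recovered))


# A hypothetical row of 400 "pixels" in four flat bands
image_row = b"A" * 100 + b"B" * 100 + b"C" * 100 + b"D" * 100
encoded = rle_encode(image_row)

raw_damage = damaged_positions(image_row, flip_bit(image_row, 150))
rle_damage = damaged_positions(image_row, rle_decode(flip_bit(encoded, 3)))
print(len(image_row), len(encoded), raw_damage, rle_damage)  # 400 8 1 100
```

The compressed row is 50 times smaller, and a single bad bit ruins 100 pixels instead of 1.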
Granted, the probability of a given file getting corrupted at any point in time is small. But over the years you may end up with thousands of files, they may reside for long periods of time on many different storage devices, and you may transmit them over networks many times. The probability of corruption grows with the size of your collection and with the passage of time. Therefore, to protect your History Data content for the long term, this plan recommends that you use uncompressed file formats whenever possible.
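The growth in risk can be made concrete. If each file independently has a small chance p of being corrupted in a given year, the chance that at least one of n files is corrupted over t years is 1 − (1 − p)^(n·t). The sketch below assumes that simple independent model; the value of p is made up for illustration and is not a measured failure rate.

```python
def chance_of_any_corruption(p: float, files: int, years: int) -> float:
    """Probability that at least one file is corrupted at least once.

    Assumes every file faces the same independent chance p of corruption
    per year; p here is an illustrative number, not a measured rate.
    """
    return 1 - (1 - p) ** (files * years)


print(chance_of_any_corruption(1e-4, 1, 1))      # one file, one year: about 0.0001
print(chance_of_any_corruption(1e-4, 100, 1))    # 100 files, one year: about 0.01
print(chance_of_any_corruption(1e-4, 5000, 20))  # 5,000 files, 20 years: close to 1
```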
If you do use compression, another consideration is whether the compression is lossless or lossy. When you open a file that uses lossless compression, the contents come out exactly the same bit-for-bit as they were before the file was compressed. The familiar ZIP file format, for example, works very well for word processing documents, spreadsheets, and databases, using lossless compression to produce files that are typically 10%-50% of their original size.
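Python’s standard `zlib` module implements DEFLATE, the compression method most ZIP files use, so it can show the lossless round trip directly. The sample text is hypothetical and highly repetitive, so it compresses far better than a typical document would.

```python
import zlib

# A sample "document"; real documents are usually less repetitive,
# so they compress less dramatically than this.
document = ("Minutes of the family reunion planning meeting. " * 50).encode("utf-8")

packed = zlib.compress(document, 9)  # maximum compression level
restored = zlib.decompress(packed)

print(restored == document)  # True: bit-for-bit identical
print(f"{len(packed) / len(document):.0%} of original size")
```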
Unfortunately, lossless compression doesn’t work very well on multimedia files containing picture, sound, or video data. Because these files tend to be large in the first place, special techniques have been developed that compress multimedia data by keeping details humans are attuned to and discarding information we tend not to notice. These techniques, also known as perceptual coding, are called lossy because the data does not come out bit-for-bit the same as the original: some information gets lost in the translation.
Lossy compression can achieve impressive results. For example, compressed JPG pictures are often virtually indistinguishable from the originals, yet they may take up only 2%-10% as much storage. The MP3 format can produce good quality compressed audio files that are only 10%-20% the size of the originals. And the MPG format achieves similar results for video.
However, there are drawbacks to using lossy compression. One issue is that compressed files are more fragile than uncompressed files, as discussed in the preceding section. But the big drawback is that saving multimedia content to a file using lossy compression almost always results in some measurable loss of quality. Other than in a few special cases, every time you open a file, make changes, and then save the file using a lossy format, the quality of your contents gets degraded.
The amount of quality degradation each time you save a file in a lossy format may be modest. And you may argue that you’re rarely going to need to modify your files. But it’s hard to predict how many changes may be made in the future. Years from now, you or your descendants may want to crop out certain portions of pictures based on interests that are hard to foresee. You or someone else may want to change contrast, color balance, saturation, size, volume, speed, or some other aspect of your content. Or, it may be necessary to convert files stored in old formats that are becoming obsolete. Because the degradation of quality adds up every time content gets saved to a lossy format, this plan recommends that you avoid using lossy compression whenever possible.
If you have History Data files that are already in a lossy format, such as JPG files from a digital camera, it’s OK to keep them that way. Content quality may have been degraded once, but it won’t get any worse as long as you don’t modify the files. However, any time you edit the contents or convert to a new format, avoid using lossy compression so no further degradation occurs. Any new History Data files you create should ideally be saved in an uncompressed format to avoid degradation entirely.
We live in a world full of information, and as time goes on more and more of it is managed by computers. Computers store information in files, and if you look at the files on a computer you’ll see that they can be grouped into three categories:
The set of all the Data files on your computer can be called your dataset. Files in your dataset can be further divided into two sub-categories:
Composed Data is generally:
Captured Data is usually:
In practical terms, Captured Data is “hard data” that conveys an accurate representation of real-world events, while Composed Data is “soft data” that conveys thoughts, opinions, recollections, and commentary which may or may not be factual. Captured Data always results from a recording made on a particular date at a particular time, whereas Composed Data such as documents may result from many editing sessions by different people on different days. Because of these differences, this plan recommends that you store Captured Data in a chronological folder structure separately from Composed Data stored in more conventional topical category folders.
Dating your files is perhaps the single most important part of the History Data methodology. It’s not always easy, but it’s essential for chronological organization of your content.
Nowadays most digital cameras, digital audio recorders, and digital video cameras automatically capture date and time so you can almost always tell when digital recordings were made. However, most analog pictures and audio/video recordings don’t include date and time information, so files digitized from these sources usually have to be dated manually.
In History Data terms, dating files means determining their chronology (see Page 8). There are 3 key steps involved:
As mentioned above, content files from digital recording devices often contain internal metadata that indicates the date and time of capture down to the second. There are several ways to date these files:
If none of the above suggestions work for your digital content files, the recording device’s internal clock may not have been set during capture, or the files may simply lack internal metadata for some other reason. In such cases, follow the suggestions in the sections below.
This section suggests an approach for dating sets of content files. The term “file” as used here refers to any monolithic chunk of content including photos, audio clips, video clips, digital computer files, and analog source media such as film or tape.
The following procedure works best when you’re dating sets of files and already have a large collection of related, previously dated files. If you’re just dating a single file and don’t have any related files in your collection you can skip this process and jump to Dating Individual Files below.
One final suggestion: get to know your content! The more familiar you are with the people, places, things, events, and activities in your content, the easier it will be to compare different files, identify similarities, determine chronological sequence, and annotate your files with accurate dates.
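Once a set of files is in its known recording order and a few of them are dated, every undated file in between can be given a date range from its nearest dated neighbors. The sketch below assumes the order is already correct (for example, consecutive frames on one roll); if the order is wrong, the ranges will be wrong too. The frame names and dates are hypothetical.

```python
from datetime import date


def bracket_dates(sequence: list) -> dict:
    """Give each (name, date-or-None) entry an (earliest, latest) range.

    `sequence` must already be in recording order. Dated files keep their
    date; undated files get the range between their nearest dated
    neighbors. None means no bound was found on that side.
    """
    ranges = {}
    for i, (name, known) in enumerate(sequence):
        if known is not None:
            ranges[name] = (known, known)
            continue
        earlier = next((d for _, d in reversed(sequence[:i]) if d is not None), None)
        later = next((d for _, d in sequence[i + 1:] if d is not None), None)
        ranges[name] = (earlier, later)
    return ranges


roll = [
    ("frame-01", None),
    ("frame-02", date(1986, 7, 4)),
    ("frame-03", None),
    ("frame-04", None),
    ("frame-05", date(1986, 7, 20)),
]
for name, (lo, hi) in bracket_dates(roll).items():
    print(name, lo, hi)
```

Here frames 3 and 4 are narrowed to July 4 through July 20, 1986, while frame 1 gets only a latest date.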
There are many different clues you can look for to determine when individual photo or audio/video files were recorded.
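Most of the clues below narrow a recording date to a range rather than pinning it down exactly. Independent clues can be combined by taking the latest of the earliest dates and the earliest of the latest dates. If the result is empty, at least one clue is wrong. The clues in this sketch are hypothetical.

```python
from datetime import date


def combine_clues(clues):
    """Intersect (earliest, latest) ranges from independent dating clues."""
    earliest = max(lo for lo, _ in clues)
    latest = min(hi for _, hi in clues)
    if earliest > latest:
        raise ValueError("clues contradict each other; re-check them")
    return earliest, latest


# Hypothetical clues for one undated print
clues = [
    (date(1983, 1, 1), date(1995, 12, 31)),  # film stock on the market (media clue)
    (date(1988, 1, 1), date(2001, 12, 31)),  # model year of a car in the background
    (date(1986, 1, 1), date(1992, 12, 31)),  # apparent age of a child in the picture
]
print(combine_clues(clues))  # (datetime.date(1988, 1, 1), datetime.date(1992, 12, 31))
```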
If you’re lucky, you may be able to find annotation on your source media that tells you when the content was recorded:
If you can’t find any external annotation, look for internal annotation recorded as part of the content:
Sometimes you can guess earliest/latest years for your content by examining the media it’s recorded on:
You can almost always determine earliest/latest years by identifying:
You can often determine seasons or months by identifying:
You can sometimes determine months or even days by identifying:
You can occasionally determine time of day by identifying:
It’s also possible to date files to the day by identifying major events with known dates such as:
A great way to date files is to talk to people who were present when the content was recorded, or who are at least familiar with the subject matter. Witnesses are often very helpful, and talking to them can be a lot of fun. Try asking them:
It’s often possible to date files by looking at historical records and various documents that tell you when major events took place:
You may have files that are associated with recognizable events such as travel, concerts, performances, sporting events, competitions, conferences, reunions, and so on. Souvenirs acquired during these types of events are often dated:
There are several clues you can look for to determine the chronological sequence of a set of files with similar or related content.
If you’re lucky, you may be able to find annotation on your source media that tells you the order in which the files were recorded:
Look for a progression over the course of:
This section provides high-level tips for scanning pictures from film or prints. The suggestions presented here are fairly general and are independent of scanner model, software, and source media used.
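Before scanning, it can help to estimate how large each scan will be. Pixel dimensions are the physical size in inches times the scanning resolution in dots per inch, and an uncompressed 24-bit scan takes 3 bytes per pixel. The resolutions used here (300 dpi for prints, 4000 dpi for film) are example settings, not recommendations from this plan.

```python
MM_PER_INCH = 25.4


def scan_size(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3):
    """Return (width_px, height_px, megapixels, uncompressed_bytes) for a scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return width_px, height_px, pixels / 1e6, pixels * bytes_per_pixel


# A 4x6 print at 300 dpi (example setting)
print(scan_size(4, 6, 300))  # (1200, 1800, 2.16, 6480000)
# A 36x24mm frame of 35mm film at 4000 dpi (example setting)
print(scan_size(36 / MM_PER_INCH, 24 / MM_PER_INCH, 4000))
```

The film frame works out to 5669 x 3780 pixels, about 21.4 megapixels and roughly 64MB uncompressed, which is in line with the 20-megapixel film-replacement figure quoted earlier.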
These general suggestions apply when scanning any kind of picture:
These suggestions are relevant when scanning film negatives and slides:
These suggestions are relevant when scanning photo prints:
There’s a lot of useful information about scanning pictures on the Web. Try www.scantips.com for an on-line guidebook presented in chapters, or try browsing through the posts in the comp.periphs.scanners discussion group.
This section provides high-level tips for digitizing audio from analog tapes. Although other audio recording technologies have been available over the last 50 years, magnetic tape most likely accounts for the vast majority of home audio recordings you may want to convert into History Data. The suggestions presented here are fairly general and are independent of audio board, software, and tape media used.
You can digitize audio from just about any tape media format, including cassette, microcassette, open reel (a.k.a. reel-to-reel), and even early wire recorders. You can also digitize audio from videotape formats such as VHS, Betamax, VHS-C, and 8mm, although in such cases you may want to consider using a video capture board to digitize video information along with the audio. Be aware that some tape formats record audio digitally to begin with, including DAT, Digital8, DV, Mini DV, and the new Micro MV, as well as the optional digital audio track on some Hi8 equipment. Capturing the audio from these sources through an analog connection and re-digitizing it is unnecessary and results in loss of quality.
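Uncompressed audio size is easy to predict: sample rate times bytes per sample times channels times duration. The sketch assumes CD-quality settings (44.1kHz, 16-bit, stereo) as an example; your capture settings may differ, and file headers add a few extra bytes.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44_100, bits: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio data (as stored in a WAV file), header excluded."""
    return seconds * sample_rate * (bits // 8) * channels


print(pcm_bytes(1))        # 176400 bytes per second
print(pcm_bytes(90 * 60))  # 952560000 bytes for a 90-minute cassette
```

At these settings, a 90-minute cassette produces roughly 950MB of uncompressed audio.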
There are many audio editing software applications available on the Web. Most are small programs that can be conveniently downloaded, some for a modest price and others for free. To find them try:
In fact, there are so many audio editing programs available you may have trouble choosing. If you’re not sure where to start take a look at GoldWave, a full-featured and very capable audio editor available from www.goldwave.com.
5/1/00
5/2/00
5/30/00
6/2/00
6/9/00
6/28/00
7/27/00
7/29/00
7/30/00
8/1/00
8/2/00
8/3/00
8/4/00
8/7/00
8/10/00
8/16/00
9/6/00
9/7/00
9/11/00
1/26/02
1/27/02
1/28/02
1/29/02
1/30/02
1/31/02
2/1/02
2/2/02
2/5/02
2/6/02
2/7/02
2/11/02
2/14/02
2/15/02
2/16/02
2/27/02
2/28/02
3/18/02
3/19/02
3/20/02
3/21/02
3/22/02
3/23/02
6/17/02
6/18/02
6/24/02
6/25/02
6/26/02
6/27/02