Digital Archive Project

Here is a list of my goals for this project. The ball is already rolling for hardware thanks to the IT guy, my manager and a local vendor. Now it is time to talk about how the files will be organized and how we will search amongst them. Our tape collection is currently being weeded by the facility’s assistant manager in accordance with the department’s collection development policy.

A “big bucket,” if you will, needs to do more than act like another external hard drive. It is more than just a catch-all for digitized files. It will function like a digital shelving rack. Currently we seek out tapes from the library using:

first, the date created,

second, the title and

finally, the tape number if further information is needed.

A label looks like this:

12/12/12 Power of the  Word of the Almighty etc.


The tape number is the unique identifier assigned in a very old (2000) database created in Microsoft Access.

Series programs carry fields such as Creator, Title and Description, and the entries under them contain episode titles, duration, date and format. There are currently about 9000 tapes described by this system.

We need to be able to describe both One Off Shows (shows that are not part of a recurring series) and Series shows that have multiple episodes. As it stands, there is no keyword searching for individual episodes.
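Here is a rough sketch of what such a description could look like, written in Python. It is only an illustration, not the Access schema: the class names, the `is_series` flag, the `keywords` field and the `search_episodes` helper are all my own assumptions. It borrows the program-level fields (Creator, Title, Description) and episode-level fields (title, duration, date, format, tape number) listed above, and adds the per-episode keyword search we are missing.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Episode:
    """One tape's worth of content (field names are hypothetical)."""
    tape_number: int
    title: str
    recorded: date
    duration_minutes: int
    tape_format: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Program:
    """A one-off show or a series; a one-off simply has a single episode."""
    creator: str
    title: str
    description: str
    is_series: bool
    episodes: List[Episode] = field(default_factory=list)


def search_episodes(programs: List[Program], term: str) -> List[Tuple[str, Episode]]:
    """Case-insensitive search over episode titles and keywords."""
    needle = term.lower()
    hits = []
    for program in programs:
        for episode in program.episodes:
            searchable = [episode.title] + episode.keywords
            if any(needle in text.lower() for text in searchable):
                hits.append((program.title, episode))
    return hits
```

Treating a one-off show as a program with exactly one episode keeps a single search path for both kinds of show.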

Chatting with the I.T. guy about storage

Behold…the answer to all your problems. Hewlett Packard x1600

You find out the most interesting things…like how monstrous video files are (even when you have compressed the bujeezus out of them). At our current MPEG-2 settings one half-hour show takes up nearly a gigabyte of space (900 MB). The entire library’s ILS, mail system, staff storage, backups, etc. fit nicely on a 6 TB server. We have that already in our playback system. We need storage…and lots of it. The other day my boss and I got a look at the new virtual server system that has just been implemented. What we saw was a rack with big ol’ storage devices, 48 TB to be exact. That is what we want. The HP x1600, ladies and gents. You buy this and start packing it with 6Gig drives.
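To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 900 MB per half-hour show and the 6 TB and 48 TB capacities come from above; everything else is an assumption: decimal units (1 TB = 1,000,000 MB), no allowance for RAID or file system overhead, and the function names are made up for illustration.

```python
MB_PER_HALF_HOUR_SHOW = 900   # from our current MPEG-2 settings
MB_PER_TB = 1_000_000         # assumption: decimal units, raw capacity


def half_hour_shows_that_fit(capacity_tb, mb_per_show=MB_PER_HALF_HOUR_SHOW):
    """Whole half-hour shows that fit in the given raw capacity."""
    return int(capacity_tb * MB_PER_TB // mb_per_show)


def hours_that_fit(capacity_tb, mb_per_show=MB_PER_HALF_HOUR_SHOW):
    """Hours of programming that fit, at two half-hour shows per hour."""
    return half_hour_shows_that_fit(capacity_tb, mb_per_show) / 2


def terabytes_needed(show_count, mb_per_show=MB_PER_HALF_HOUR_SHOW):
    """Raw terabytes needed for a number of half-hour shows."""
    return show_count * mb_per_show / MB_PER_TB


if __name__ == "__main__":
    for capacity in (6, 48):
        print(capacity, "TB holds about", half_hour_shows_that_fit(capacity), "half-hour shows")
```

Usable space on a real array will be lower than the raw figure, so treat these as ceilings rather than promises.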

Hardware is easy. We need to establish our archiving practices before this behemoth ships. Naming Convention, Content Management System, Metadata, Controlled Vocabulary, Collection Development Policy, Accessibility. The list is a long one.

Did I ever PowerPoint on the importance of Metadata or what?

Me, Caroline Rubens, Colin Rhinesmith, Erik Mollberg

Yup, I went out on a limb and told a whole room full of gently dozing Alliance for Community Media attendees how I felt about Metadata and not one of them stood up, pointed a finger and shrieked “Imposter! I disagree! Dublin Core is CRAP! Don’t listen to her!” This is a small victory because I said some pretty racy stuff, like: “Don’t bother looking for digital video preservation standards because they haven’t got any.” The Library of Congress’ Fancy Pants Facility in Culpeper is migrating its SD video to JPEG2000. Which is nice, I guess. It might be a reality in 5 years when small-time joints like PEG access centers can afford the cost of saving a file that has a very large bit rate.

“As to the question that opened this post – the “right” or “best” digital video file format for preservation – I teach a class for the Graduate School of Library and Information Science at the University of Illinois about AV preservation and I get this question every semester. I answer the question by saying that there is no “one size fits all” format for digital video preservation.

Rather, the preservation professional must ask a series of questions about the workflow, size and means of their particular institution. There is also the issue of sustainability when choosing digital formats.” by this guy.

One of those questions about sustainability is “How do I know if any of this is worth migrating?” Here is an interesting Blue Ribbon Commission Report sponsored by the U.S. National Science Foundation (NSF Award No. OCI 0737721), the Andrew W. Mellon Foundation, the U.S. Library of Congress, the U.K. Joint Information Systems Committee, the Electronic Records Archives Program of the National Archives and Records Administration, and the Council on Library and Information Resources. This report talks about just such an issue on the large scale. So much of this digital content is so new that we don’t know how people are going to use it, if they use it at all.

Digitization Activities

While prepping for the presentation, I came across an excellent resource for anyone attempting a digitization project. It’s put out by the Federal Government. It looks at projects in an organizational, format-agnostic light. This makes it scalable to any kind of material and any size collection. I suggest you read it. It breaks projects down into four main phases: planning, pre-digitization, conversion and post-digitization. Good times. I’m working on the agenda. There will be two reports, one about our disc usage and one about the facility humidity and temperature status.

Issue for the Workgroup

While pulling tapes today I did a quick survey and found that the oldest digital tape in the collection dates back to 9/97. On the same shelf were 16 3/4″ tapes and about 5 SVHS tapes. It is kind of the unofficial policy to play DV tapes before their analog compadres because DV tapes are:

1. easier to play, because there are more decks, and

2. viewed as more robust

I think we should include in the digitization policy that DV tapes that get played on the channel should be digitized using the Nexus at the highest setting, tracked and stored on the odrive or some other large storage option. A field should be added to the database that includes the filename (which means we need to come up with a file naming convention). A feasibility study should be undertaken to evaluate how many hours of ingested files fit in a gigabyte of storage and then extrapolate the amount of disc space needed.
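As a starting point for that file naming convention, here is one possible sketch in Python. Nothing here is decided policy: the pattern (date, then title, then tape number, mirroring how we already pull tapes off the shelf), the `t` prefix, the five-digit padding, the 40-character title limit, the `.mpg` extension and the function names are all assumptions, and the tape number in the usage note is made up. It also assumes tape numbers are plain integers.

```python
import re
from datetime import date


def slugify(title, max_length=40):
    """Lowercase the title and collapse anything but a-z and 0-9 into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def build_filename(air_date, title, tape_number, extension="mpg"):
    """Hypothetical pattern: YYYY-MM-DD_title-slug_tNNNNN.ext"""
    return f"{air_date.isoformat()}_{slugify(title)}_t{tape_number:05d}.{extension}"
```

For example, `build_filename(date(2012, 12, 12), "Power of God's Word", 1234)` gives `2012-12-12_power-of-god-s-word_t01234.mpg`. Writing the date as year-month-day means an alphabetical sort of the folder is also a chronological one, and the tape number on the end ties each file back to its database entry.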

Collection Development vs. Acquisition Policy

Access Fort Wayne’s tape collection grows every day. It extends back into the distant past. We have a front end of stuff that is pretty much unclassified. There are vague genre assignments, but “Event” really doesn’t cut it for description. The back end is a little easier to parse in that time can give you a lens to look at events to decide if they work in context. Post-9/11 public events involving the Indiana Supreme Court seem to fit into the “cultural/regional heritage” criteria. The DAP can guide us on what goes into the permanent collection while the CD can help us with the more recent items. The issue is that the collection is serving 2 functions at the moment: it is a “Working” library for the facility, where programs go in and are played for their timeliness, and an “Archival” library that is useful as a recorded reflection of the community.

Attacking the stacking

I’ve spent several hours down in the basement, looking through the stacks. The main collection starts in 1981 and runs up to the present. I’m up to 1993 or so and I’m starting to see series and sub series. I see Library Productions, Local Politics, History, Religious, Arts and Random Weirdness. The sub series include runs of programming that were also called series, so, Religious: Power of God’s Word, 1998-2005, or Arts: Artlink Presents, 1986-2001. This is just the very first, very glancing appraisal. Luckily there is a good database of tape entries; the next step would be to evaluate the accuracy of the information in it. The tape library was moved twice in the last 6 years and has had several identification overhauls, including a pretty serious one in which the entire tape numbering system was changed. I started an Excel spreadsheet to collect the entries I’m interested in digitizing.

Summer Reading Program PSA Archive

On a recent trip outside of my cubicle, I was stopped by a YA librarian. She was holding a couple of VHS tapes. She wanted to know if I could transfer them to DVD as a collection. As I looked at the 3 tapes, I thought to myself that they would make a good starter collection. Access Fort Wayne has pretty vast holdings of library-themed videos. It has lived in a symbiotic relationship with the Allen County Public Library since 1981. The Summer Reading Programs have rolled around annually and sometimes the stars aligned and there was a homegrown Public Service Announcement made. Sometimes the library purchased the ALA video; sometimes it was made in-house.

Now comes the fun part: appraisal, arrangement, description, preservation, reference services, outreach, legal concerns, and ethics. A nice description comes from the Society of American Archivists.

Today I go down the rabbit hole and into the sub-basement to have my first peek at the holdings.