Setting Data Free

I see two trends in progress. In one, we're continuing to move toward application-independent data storage. In the other, we're witnessing a proliferation of devices that each store the same data in their own unique, incompatible ways. I believe it's time to watch developments carefully, and to be ready to move our advocacy efforts to a new arena.

A brief tour of my history with computers

In 1983, Texas Instruments acknowledged that their home computer business was beyond hope. The TI 99/4A, originally sold for $1,150, was discontinued, and stores (needing to dump their stockpiles during the Christmas shopping season) put the machines on sale for $50-$100 each. Lines formed in the parking lots, in scenes presaging the crowds of hopefuls waiting all night for PlayStations (or for products of more questionable quality, such as Windows 95 and Star Wars: Episode I). For some time before this, I had been left in the electronics section of the department store whenever the family went shopping (I had Pong at home, but it was still unimaginable magic to hit keys on a keyboard and see letters appear on the screen), so there was no question what to get me for Christmas. My 99/4A was connected to a TV on Christmas morning, and everyone knew where to find me for the next few years.

Eventually, we even got a cable that connected the computer to a cassette recorder, and it was possible to write programs to a cassette so you didn't have to type them in again every time you turned the machine on! If the volume was set just right and the wind was blowing south-southwest, the computer just might understand the contents of the tape and load your program every fifth time.

This was the system I had through the first couple of years of high school. I spent a geek-appropriate amount of time playing with "TI Extended BASIC" -- a lot of graphics and sound programming, a text adventure in which you explored the sunken city of Atlantis, a program that did virtual dice rolls to create random Dungeons & Dragons characters, one that plotted the Cartesian graphs we were studying in math, etc.

At the start of my junior year, I transferred to a school that had a lab in the basement filled with Apple ][s. For the first time, I started to use a computer medium (the 5.25" floppy disk) to store information that mattered to me -- my school papers, letters, etc. When I went to college, I took my newly purchased Apple ][e clone, and that was the machine that handled all my writing chores, balanced my checkbook, etc. for several years.

My geek power points rating was extremely low during this period; I just used the computer as a tool, and was no more involved with it than was the average secretary. It wasn't until I started living with someone who had a PC that I began to be interested in computers for their own sake again. We decided that the 286 with GeoWorks was not going to be sufficient for surfing the Web, so I bought some books and learned to build a 486. After two years of struggling with Windows 95, I put my first Linux CD in the drive, and everything was better.

Mostly.

What it takes to retrieve old data

I now own information scattered across a variety of media -- audio cassettes, 5.25" floppies, 3.5" floppies, hard drives, and zip disks -- all at different levels of accessibility.

Information from the TI era
Assuming I could find the cassettes, I would have to dig out the 99/4A and hope I could get the cassette interface to work. Then what? I'm sure there's a FAQ somewhere that explains how to set up a serial connection to a PC, so I might get the code transferred and run it in an emulator, but it would be a hassle.
Information from the Apple era
A bit easier here; I know the Apple has a serial port (I'm not so sure about the TI), emulation software is handy, and I could probably at least yank the text out of my AppleWorks files.
Information from the GeoWorks era
No problem there; it ran on top of DOS, and saved to a DOS filesystem.
Information from the Windows era
No problem at all; when I switched to Linux, I just saved my word processor documents as text and copied them to my Linux partition before reformatting the Windows one as ext2.

The good news is that how hard it is to retrieve the data varies inversely with how important it is to me. I don't care much about what's on my TI cassettes (though I would be curious to see how I wrote the engine for my adventure game). The Apple disks are filled with school papers, short stories, and really bad poetry that I have in hard copy and should burn someday anyway.

Why I'm boring you with this

The good news is deceptive; I believe things have been getting progressively better, but we've turned a corner, and now they're getting worse, from both software and hardware perspectives.

As that good news suggests, it's become easier for personal/home computers to share information. My first computer, that TI 99/4A, had no way to share data with a Commodore 64, a Timex Sinclair, an Atari ST, a Tandy TRS-80, or any of the other computers around at the time. Say what you will about the near monopoly of IBM PC-compatible hardware, but it gave us a de facto standard that made it easier for our machines to share information. Today, the worst problem I can imagine is that someone would hand me a Mac floppy, and Internet use is now widespread enough that I could get away with asking her to email the files to me instead.

The new problems are:

  1. Hardware compatibility doesn't solve problems of software incompatibility.
  2. Our hardware is becoming incompatible once again, in ways that could make the era of countless mutually incompatible home computers seem like the good old days.

The software problem

Let's look at how I handle these important pieces of information:

  • email messages
  • addresses and phone numbers
  • notes
  • my schedule

I only started keeping this information on a computer when I got online, so I only have to trace back what I've done to the time of using a terminal program under GeoWorks:

email
My first use of email was accomplished by dialing into my university's Unix systems and using PINE. From there, I moved to Pegasus Mail under Windows, to VM in Emacs under both Windows and Linux, and finally to mutt.
addresses and phone numbers
For a long time, I still kept a physical address book. Once I started using Emacs for mail and news reading, I discovered the Insidious Big Brother Database, which has all kinds of nifty features, such as the ability to pop up a window with the information about a person when you open a message from her.
notes
I've used notes-mode in Emacs to keep track of all the things I never get around to doing. Notes mode does automatic linking and indexing of note topics, so you can skip from one note about a topic forward or back to your other notes about it, get an index of all the notes sorted by topic, etc. Unfortunately (or maybe fortunately...), the indexing script has recently started puréeing my notes; I've opened a number of them and found that all but the first few words are missing. I've switched to using note. I'm not as happy with it, but I was even less happy with watching my notes disappear.
my schedule
I somehow gained possession of a Lotus calendar application once (a free gift for buying x amount of hardware from parts-r-us, I think), and used it to keep track of things for a year or so. Once I started living in Emacs, I used its calendar functions. Now I use Yahoo!'s calendar, so I can access it from anywhere, have it email reminders to me, etc.

The heart of the software problem is this question: How hard was it to move data from each application to the next?

PINE stored messages in mbox format. Pegasus used binary folder files. IIRC, I didn't have many saved messages at the time, and I just forwarded them all to myself. Going from Pegasus back to mbox for VM and mutt required something mildly unpleasant like getting Pegasus to write all the messages to separate files and then coercing them into one. I don't remember exactly what I had to do, but it wasn't too bad.
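Pegasus's binary folder format isn't described here, so the following Python sketch covers only the second half of that job: coercing a directory of exported messages, one per file, into a single mbox with the standard library's mailbox module. It assumes each exported file holds one complete message including its headers; the function name files_to_mbox is hypothetical.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def files_to_mbox(src_dir: str, mbox_path: str) -> int:
    """Append every file in src_dir to the mbox at mbox_path.

    Assumes one complete RFC 822 message (headers + body) per file.
    Returns the number of messages written.
    """
    box = mailbox.mbox(mbox_path)  # created if it doesn't exist yet
    box.lock()
    try:
        count = 0
        # Sorted so the mbox order matches the exported file names.
        for path in sorted(Path(src_dir).iterdir()):
            if not path.is_file():
                continue
            msg = message_from_bytes(path.read_bytes())
            # mboxMessage generates the "From " separator line for us.
            box.add(mailbox.mboxMessage(msg))
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```

Because the result is plain mbox, both VM and mutt can open it directly, which is what made the last hop painless.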

My first collection of email addresses was kept in my PINE address book. I downloaded an application from somewhere that converted PINE address books to Pegasus ones. When I moved to BBDB, I believe I entered everything again by hand while I was adding street addresses and phone numbers.

Transferring notes took some time, but went smoothly. notes-mode keeps a separate note file for each day, stored like ~/NOTES/199909/9909. note keeps everything in ~/.notedb (by default; it can also use MySQL, etc.). Luckily, it can read notes from STDIN, so, after sending Perl in to change the syntax of the topics in my notes to match that used by note, I could just use find to locate and cat each file and feed it to note. (Oh, how I love Unix.) A little fine-tuning, and I was done.
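The find-and-feed step can be sketched in Python. The real topic syntaxes of notes-mode and note aren't given above, so the "* Topic" and "Topic:" patterns below are placeholders for illustration only, and the command line used to invoke note is left to the caller rather than guessed. All function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, Iterator, Sequence

# Placeholder syntaxes: old topics are ASSUMED to be "* Topic" lines,
# and the new tool is ASSUMED to want "Topic: ..." lines.
OLD_TOPIC = re.compile(r"^\* (?P<topic>.+)$", re.MULTILINE)


def rewrite_topics(text: str) -> str:
    """Rewrite every old-style topic line into the assumed new style."""
    return OLD_TOPIC.sub(lambda m: f"Topic: {m.group('topic')}", text)


def converted_notes(root: str,
                    rewrite: Callable[[str], str] = rewrite_topics) -> Iterator[str]:
    """Like `find root -type f | sort`, yielding each file's rewritten text."""
    files = sorted(p for p in Path(root).expanduser().rglob("*") if p.is_file())
    for path in files:
        yield rewrite(path.read_text())


def feed(texts: Iterator[str], cmd: Sequence[str]) -> None:
    """Pipe each note into cmd's stdin, one run per note (like `cat f | cmd`)."""
    for text in texts:
        subprocess.run(list(cmd), input=text, text=True, check=True)
```

Keeping the rewrite as a separate function mirrors the original two-stage approach: fix the syntax first, then let the importer do the storing.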

My schedule was recreated from scratch each time. Adding all the birthdays again was no fun, but I survived.

These experiences indicate that the software problem is not very great, at least for Unix users. People who have to deal with word processor files on Windows are in bad shape, but the rest of us can usually just look at a pair of file formats and fire up Emacs, vi, or Perl to make the necessary changes.

In spite of that, when I switch applications, experience has taught me to think carefully about the long view. My calendar, for instance, is not exactly locked into Yahoo!, but transferring it somewhere would not be as trivial as it should be. Yahoo! gives me two options for creating backup files of my calendar. One is the Palm format, Date Book Archive, which stores the info in a binary file. If I looked, I would probably find tools for handling these files, but it still doesn't feel as secure to me as having the data in a text file. The other format is "Outlook format" (sic), or Comma Separated Values, which is quite ridiculous. (For example, instead of saying, "This birthday occurs on this date every year", it creates 37 copies of the birthday record, one for each of the next 37 years. How is the application that imports that supposed to know what was intended?) The best I can hope for is that DBA turns out to be a reasonable representation of my data, or that another export format becomes available.
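To make the problem concrete, here is a Python sketch of what an importer is reduced to doing with such a file: guessing. The "Subject" and "Start Date" (MM/DD/YYYY) column names are assumptions, not Yahoo!'s documented layout, and the heuristic (same subject on the same day in consecutive years means a yearly event) is one plausible guess among many.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def guess_yearly_events(csv_text: str) -> list:
    """Collapse rows repeating a subject on the same month/day in
    consecutive years into a single yearly rule.

    Column names are hypothetical. This is only a heuristic: a real
    series of one-off events would be collapsed too, which is exactly
    the ambiguity the export creates.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["Start Date"], "%m/%d/%Y").date()
        groups[(row["Subject"], day.month, day.day)].append(day.year)

    events = []
    for (subject, month, dom), years in groups.items():
        years.sort()
        consecutive = years == list(range(years[0], years[0] + len(years)))
        if len(years) > 1 and consecutive:
            events.append({"subject": subject, "month": month, "day": dom,
                           "first_year": years[0], "rule": "yearly"})
        else:
            for year in years:
                events.append({"subject": subject, "month": month, "day": dom,
                               "first_year": year, "rule": "once"})
    return events
```

Thirty-seven birthday rows come back as one yearly rule, but only because the importer assumed that is what they meant; the file itself never says so.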

Take BBDB as another example. Now that I'm no longer using Emacs for everything (I've switched to dedicated programs that can call an external editor, and I have them call XEmacs with gnuclient), BBDB's VM and Gnus features don't matter to me, and there's no reason I couldn't move to another address book, perhaps one that has good mutt support. I have to think about this two steps ahead; not only is there the problem of converting my ~/.bbdb to the format used by whichever application I pick, there's the problem of considering what I might have to do to move from that application to the next one I decide to use.

There's the crux of the software problem -- all the way down the line, my data doesn't change, but the ways in which my data is stored do. I still want to track the same information, so why shouldn't it be stored the same way whether I'm using an AppleWorks mail merge function, a Windows GUI address book, an Emacs Lisp program interacting with mail and news readers, or a Web interface? If we could hop back in time and declare, "This is how address information will be stored. This is how a schedule will be stored. This is how notes and their cross-referencing information will be stored.", I could have used the same files for the last 15 years. My Apple, Windows, Linux, and Web applications would have all read and written the same files. There would have been no need to convert from one format to another. I could have switched back and forth between applications at will without a worry.
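As an illustration of what such a declaration might amount to in practice, here is a Python sketch of one shared address format, written as XML, and the two functions every application would need to read and write it. The addressbook, contact, name, email, and phone elements are invented for this example; they are not taken from any agreed DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    emails: list = field(default_factory=list)
    phones: dict = field(default_factory=dict)  # label -> number


def to_xml(contacts) -> str:
    """Serialize contacts using a made-up <addressbook> vocabulary."""
    root = ET.Element("addressbook")
    for c in contacts:
        person = ET.SubElement(root, "contact")
        ET.SubElement(person, "name").text = c.name
        for addr in c.emails:
            ET.SubElement(person, "email").text = addr
        for label, number in c.phones.items():
            ET.SubElement(person, "phone", type=label).text = number
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> list:
    """Parse the same vocabulary back into Contact objects."""
    contacts = []
    for person in ET.fromstring(text).findall("contact"):
        contacts.append(Contact(
            name=person.findtext("name", default=""),
            emails=[e.text or "" for e in person.findall("email")],
            phones={p.get("type", ""): p.text or ""
                    for p in person.findall("phone")},
        ))
    return contacts
```

The point is the round trip: any program that implements these two functions against the same vocabulary can hand the file to any other, with no conversion step in between.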

At this point, the XML alarms may be ringing in your head, and you may be eager to point out that, although it's coming to the game late, we're about to move into that happy situation. Well, maybe, assuming everyone can agree on DTDs and actually uses them properly instead of adding proprietary extensions at every turn. (Look at what happened to HTML; "This page best viewed with browser x" could evolve to "This calendar best viewed with calendar application y".) I certainly hope it works out. The problem lies in the unspoken assumption that you can upgrade your software to take advantage of the new, and hopefully final, format, and this takes us to my real worry:

The hardware problem

A few months ago, I joined the cellphone age. One of the initiation rituals consisted of an entire evening spent punching numbers from my address book into the phone. This is where the unspoken assumption fails. I can't upgrade my phone to new software capable of using the format used by the address book on my computer. Even if I could, I don't have a way of making my computer talk to the phone to pass my collection of numbers to it.

When a number changes, I have to change it in BBDB and on the phone. I have over 80 records in the phone, many with multiple numbers attached to them. If I drop it in the Baltimore harbor tomorrow, those numbers -- and the time spent entering them -- are gone. When I buy a replacement phone, I'll have to enter it all again. There are no backups, because there's no way to create a backup.

This is only going to expand as we move from using desktop computers to using more and more dedicated information appliances. It's a bad situation turning worse at the moment; we're going back to the days of incompatibility, but now with a wide variety of devices instead of just with computers. My TI couldn't talk to your Tandy; neither can my cellphone talk to yours.

This is what I meant by my ominous "Mostly." earlier. I can mostly be happy with the present situation. When I need to convert data on my computer, I have the tools at hand to do it. The problem is that that doesn't help me when I buy a phone and I don't have a shell on it. Even though all the data is sitting on my box waiting to be transferred, I'm stuck using the phone's only interface, the numeric keypad, punching "7" four times to get an "s".
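For a sense of the cost, here is a Python sketch that counts multi-tap key presses. It uses the common letter layout of phone keypads (2 for ABC through 9 for WXYZ); treating a single press of 0 as a space is an assumption, since handsets differ, and digits and punctuation are not handled.

```python
# Common keypad letter layout; "0" as space is an ASSUMPTION (phones vary).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz", "0": " "}

# Precompute letter -> (key, number of presses needed on that key).
PRESSES = {ch: (key, i + 1)
           for key, letters in KEYPAD.items()
           for i, ch in enumerate(letters)}


def multitap(text: str) -> list:
    """Return the (key, presses) pairs needed to type text by multi-tap."""
    return [PRESSES[ch] for ch in text.lower()]


def total_presses(text: str) -> int:
    """Total key presses, ignoring the pause needed between two
    consecutive letters that share a key."""
    return sum(n for _, n in multitap(text))
```

By this count, the name "Jeff Covey" alone takes 24 presses, plus a pause between the two f's, before a single digit of the number goes in.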

Again, XML is put forward as the way out of this mess, and it holds great promise. When I mark a note as urgent on my laptop, XML should make it possible that my desktop machine and my Pilot will note the change and do whatever I've told them to do about it -- mark it off with a different color, beep me to remind me about it, or whatever. When I change Joe's phone number on my cellphone, it should be changed on the speed dial of all the phones in my house, in my address book, and in Joe-related events on my calendar.
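Whatever the format, "noting the change" also requires a synchronization rule underneath. A minimal Python sketch, assuming each device keeps a timestamp for every field it stores, is a last-writer-wins merge; the field keys and the timestamp scheme here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    value: str
    modified: int  # hypothetical timestamp of the last edit to this field


def merge(*replicas: dict) -> dict:
    """Last-writer-wins merge of {field: Stamped} maps from several devices.

    For each field, the copy with the newest timestamp wins; on a tie,
    the earlier replica's value is kept. Real synchronization also has
    to handle deletions and genuine conflicts, which this ignores.
    """
    merged = {}
    for replica in replicas:
        for key, stamped in replica.items():
            if key not in merged or stamped.modified > merged[key].modified:
                merged[key] = stamped
    return merged
```

With Joe's new number stamped later on the laptop than the old one on the phone, merging the two copies leaves every device with the new number.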

It sounds wonderful, but I'm going to permit myself a dose of skepticism because the implementation requires the cooperation of a large number of people who prefer competition to cooperation. First, they have to reconfigure their devices to take advantage of XML. Then they have to agree on DTDs for the data they're using. Then they have to stick to the agreed-upon format and find other ways to distinguish their products now that they can no longer lock their customer base into their proprietary way of storing data.

I'm not saying it's impossible; the Internet has proven it possible. Proprietary protocols have been forgotten in the face of TCP/IP, HTTP, SMTP, etc. because software makers have to conform or die in all these areas. What I'm saying is that we need to be aware of the issue and keep manufacturers honest. Standards come into being in two ways -- people decide on a standard and implement it, or, more commonly, something becomes so widely used that it becomes the standard, even if it's unbearably awful. XML authors are trying to do it right the first time, but they're going to be outmaneuvered if manufacturers are allowed to implement the standards only in the ways and to the extents that suit them. It will eventually sort itself out -- a toaster that doesn't work with the other appliances on your home network is just not going to sell -- but there will be an initial competitive period that could be dangerously similar to the early days of personal computers, when nothing worked with anything else.

You can help shorten this period by being aware of the standards as they are created and checking that the products you buy are in compliance. If your new cellphone is supposed to use the new name & number storage format but you find that you can't share numbers from your address book with your friends' phones, take it back to the store, and let the manufacturer know that you exchanged their product for someone else's.

How long will it take for the old formats to go away? Will all our devices really be speaking the same language, and how soon? I don't know, but I do know that it will happen faster if we demand it. It's worth the effort, because it extends the ideals of the Internet into all the electronic accessories of our lives. When we get there, there won't be TI information, Apple information, Windows information, Unix information, or Web information. There won't be information known only to your phone, your car, your Pilot, or your workstation. There will just be information, freely shared everywhere.


Jeff Covey received his degree in classical guitar performance but spent so much time with his computer that he fell in with a bad crowd and ended up working for Andover.net OSDN. He currently works on freshmeat and runs a computer lab for the kids in his neighborhood in his spare time.
http://pobox.com/~jeff.covey
jeff.covey@freshmeat.net


T-Shirts and Fame!

We're eager to find people interested in writing editorials on software-related topics. We're flexible on length, style, and topic, so long as you know what you're talking about and back up your opinions with facts. Anyone who writes an editorial gets a freshmeat t-shirt from ThinkGeek in addition to 15 minutes of fame. If you think you'd like to try your hand at it, let jeff.covey@freshmeat.net know what you'd like to write about.

Recent comments

10 Mar 2003 02:25 Falcon611

Nice editorial :)
Hah- And I thought I was the only one who faced the same problem!

I get concerned about things in the long run. After many years of going from different OS's, programs, formats etc, I pay close attention to 'exportability'. I don't use an addressbook built-in to a mail client, because it means I'll have to start the program up to lookup an address, it typically has light, if not nil support for exporting to other formats, and you usually can't open the data files, say if the poo hit the aircon and you were dragging config files out in a recovery procedure. This was the reason I used basic programs or scripts instead of heavy or advanced gui-based apps. Of course, it all depends on the mail client, and various other factors, but most would agree they're only there to save entering the same email address continuously.
Eventually (after searching through old, new, console, and X programs), I got myself a PalmPilot. Initially I didn't think I had much use for them, but after a year and a half of using one, I can honestly say they're great PIMs :)
Refusing to change the 'topic' (there are enough 'I love my PDA' pages out there), I keep all my 'personal info' on my PalmPilot. I use it to store addresses, birthdays (which is simply a program that reads the 'Birthday' fields of the address book, and inserts events into the-), datebook/schedule, general notes (eg email drafts), 'small' databases such as reg codes or moderator logs, passwords (heavily encrypted of course), and lists of todo's (household tasks, stuff I forget to do, etc), to name a few.
It syncs to my desktop easily, so I can enter data both at the computer and out somewhere (very few ppl spend their entire day in front of their own computer)- Underestimatively (is that a word?) useful for the occasions where you want to store or retrieve someone's info at, say, a coffee shop or similar.
The desktop client I use has few dependencies, and exports to commonly used formats, so when I eventually move on, or my PalmPilot dies, I won't spend days re-entering everything.
I have a mobile phone, and although having a PalmPilot adds to the `daily loadout weight`, it saves entering large amounts of numbers into an inefficient 15 key device, aimed at showing cute little animations over speed and efficiency. Besides that, I can keep just 1 big address list, instead of looking after a mobile for phone no's, a book for birthdays, MUA addr book for emails, etc.

21 Jul 2002 15:02 ldapguru

Re: Setting data free with LDAP, ask Internet task committee!

Hello all -

all these concerns were addressed by LDAP protocol
check openldap.org, MS AC, Oracle Internet Directory, iplanet=netscape LDAP, Novell NDS etc

stay cool
Alan

23 Nov 2000 07:36 ratface

More standards for data exchange
What articles like this (and comments from people such as my boss who is a "typical" computer user) make me realise is that there is definitely a need for open standards that are built upon "lowest common denominator" technologies such as ASCII text. By this I mean markup languages such as HTML or even better, standards built upon XML (as many have already noted in these comments).

One such effort that is underway is the SyncML (http://www.syncml.org/) project which is a cross-company project aimed at defining and implementing a data exchange language based upon XML which would be used for synchronising information such as contact and calendar info across as many devices / environments as possible.

Whether this particular project will come to be of much use or will die an ignominious death is unclear, but my hope is that with time more and more data will be stored in a format that is simpler to manipulate.

My view is that as programming becomes more wide-spread and especially Open Source programming projects, the mystique behind data formats will slowly be broken down. It seems to me that proprietary data formats are a legacy of the closed-software style of development. We can all hope that more sensible choices for data are made in the future and that this approach is adopted more widely with time by even the closed-source software companies.

21 Nov 2000 06:55 ulriceriksson

Tarballs
How is a tarball not a single file? That is the way Siag Office
has been storing structured documents (documents containing
other documents) for years. It works great, the contents can be
examined with standard tools and so on.
