In the (good ol') days when FTP and archie were king, it was fairly simple for developers to spread their offerings far and wide. I had scripts set up to drop the right files in the right locations, and it didn't much matter if there were two or twenty archives.
Enter the Web, and the focus shifts from pushing software out to archives to pulling people into Web sites. I think that's a good thing, because it puts more information into the users' hands, but in the process, developers have lost the ability to easily push anything out. Instead, we have to manually go to a number of tracking sites (the more the better, usually), set up accounts, and edit essentially the same information on all those sites.
My long-winded question is essentially this: Is there any interest in automating this process? I currently use a property list to build the dynamic pages of my site, and it contains all or nearly all of the information that is gathered at the various software tracking sites. It is easily made available in plist or XML format and is simple to convert to other formats, if necessary. If a general software description file format can be agreed on, simply making that file available would give sites all the information they need to update their database entries. No fuss, no muss. Minimizing the administrative effort will really lower the barrier to entry for all sites.
BetterConsole is my newest piece of software, recently released, which has brought this issue to the surface. You can find my SPIF files for it in two formats:
The plist format might not be particularly easy to parse on non-NeXT/Apple systems, so I would be willing to write a converter that puts out a format that is easier to parse.
Keep in mind that this is a work in progress and represents only a first-pass effort at a format that contains sufficient information to satisfy most tracking software. Most tracking sites seem to consider a basic piece of software to have eight attributes: author, description, version, system, license, purchase, category, and package.
The author attribute identifies who owns the software. It does this via contact information by assigning values to the sub-attributes name (e.g., Joe Programmer) and location (e.g., http://www.someisp.com/~joe). Other sub-attributes such as email could also be added, though most tracking sites currently only seem to ask for the name and URL of the author.
The description attribute identifies the software in increasing levels of detail. Currently that is done with 3 sub-attributes: name, short, and long.
The version attribute identifies the version of the packaged software. This is done with 2 sub-attributes: revision and status.
The system attribute identifies what operating systems the software is for. This is done with 2 sub-attributes: name and version. It may be a good idea to make this an array instead, as some software may run on many different systems (the workaround would be to make a SPIF file for each system). It may also be a good idea to add further system requirements (RAM, HD, etc.), but that does not seem to be a major consideration from a tracking point of view.
The license attribute identifies the license the software is distributed under. This is done with 2 sub-attributes: name and location.
The purchase attribute gives information on purchasing the software. This is done with 2 sub-attributes: price and location.
The category attribute identifies how the software should be organized. In looking at the various tracking sites, there was really no consistency in the arrangement or naming of software categories. Additionally, most sites had additional fields for keyword descriptions of software (for search purposes). I'm hoping these features can be subsumed by this one category attribute. It should be considered a prioritized list of organization and search keywords. The software that scans the file should be able to look at this list and determine where it fits with all the other software that is being tracked. If it fails, I suppose it would be up to the tracker to either adjust the scanning software to be more robust or inform the developer of the error.
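As a rough illustration of how a tracking site's scanner might treat such a prioritized list, here is a minimal sketch. The category table, the keywords in it, and the function name pick_category are all hypothetical and invented for this example; no tracking site is known to work this way.

```python
# Hypothetical table: a site's own category names and the keywords it recognizes.
SITE_CATEGORIES = {
    "Development/Tools": {"development", "debugging", "console"},
    "System/Utilities": {"utility", "system", "logging"},
}


def pick_category(keywords, site_categories=SITE_CATEGORIES):
    """Return (category, leftover_keywords) for a prioritized keyword list.

    The first keyword that any site category recognizes decides the category;
    the remaining keywords are kept as search terms. Returns (None, keywords)
    when nothing matches, so the tracker can hand the entry to a human.
    """
    normalized = [k.strip().lower() for k in keywords]
    for keyword in normalized:
        for category, known in site_categories.items():
            if keyword in known:
                leftover = [k for k in normalized if k != keyword]
                return category, leftover
    return None, normalized
```

Returning None instead of guessing matches the failure case described above: the tracker either extends its keyword table or gets back to the developer.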
Leaving the most complicated for last, the package attribute identifies all the files that are associated with this software. Example sub-attributes are info, binary, and source. Each of these identifies a document that is related to this particular piece of software. That is done by further breaking that file information down to location, size, and checksum. I also included contact information, just in case the contact for, say, the binary might be different from the contact for the source, but I'm not sure that's really necessary.
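To make the shape of the eight attributes concrete, here is a minimal sketch of one record as nested data, converted to XML. Only the attribute and sub-attribute names come from the descriptions above (plus the author example given there); the nesting, the root element name "spif", the "item" elements, and every other value are placeholders and assumptions, not the layout of the actual SPIF files.

```python
import xml.etree.ElementTree as ET

# Placeholder values throughout; the layout is an assumed one for illustration.
spif = {
    "author": {"name": "Joe Programmer", "location": "http://www.someisp.com/~joe"},
    "description": {
        "name": "ExampleApp",
        "short": "A placeholder summary.",
        "long": "ExampleApp is a placeholder entry.",
    },
    "version": {"revision": "1.0", "status": "release"},
    "system": {"name": "ExampleOS", "version": "1.0"},
    "license": {"name": "ExampleLicense", "location": "http://example.com/license"},
    "purchase": {"price": "0", "location": "http://example.com/buy"},
    "category": ["utility", "console"],  # prioritized keyword list
    "package": {
        "binary": {
            "location": "http://example.com/app.tar.gz",
            "size": "12345",
            "checksum": "placeholder",
        },
    },
}


def to_element(tag, value):
    """Recursively convert dicts, lists, and scalar values to XML elements."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_element(key, child))
    elif isinstance(value, list):
        for item in value:
            element.append(to_element("item", item))
    else:
        element.text = str(value)
    return element


root = to_element("spif", spif)
xml_text = ET.tostring(root, encoding="unicode")
```

Keeping the record as plain nested data first makes it easy to emit other formats later, which is the kind of converter mentioned earlier.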
[You can watch the original version of this document at http://www.subsume.com/spif/ for updates about the file format proposal. -- Ed.]
As Doc points out, there are two ways of getting your information through -- you can push it, or you can let people pull it. His proposal for letting sites pull the information from you means you would need to keep only one piece of information up to date at all the sites on which you want your project listed -- the URL of your project info file. If you want your project listed on freshmeat, gnu.org, and kde.org, you could just give them the URL, and, at a regular interval, each site would have a bot check whether the file has changed. If it has, the bot would compare the file with the information in the site's database and submit any differences as change requests to be reviewed by the site's staff. If the regular interval is too long for you, you could click a button on the site to ask it to check your info file immediately. You could even have an "info file URL" field in the info file, so you wouldn't have to go to the sites to keep the URL up to date. When you changed servers, you would put your files up at the new location and change the field in the file at the old location. The URL to the new info file would be submitted as a change request like any other. After allowing enough time for everyone to catch up, you could just remove the old files.
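The compare-and-submit step of such a bot could look roughly like the sketch below. It assumes the info file has already been fetched and parsed into nested dictionaries; the function names and the "info_url" field name are hypothetical, and fetching, scheduling, and the review queue are left out.

```python
import hashlib


def fingerprint(raw_bytes):
    """Cheap change check: hash the fetched file so unchanged files are skipped."""
    return hashlib.sha256(raw_bytes).hexdigest()


def change_requests(stored, fetched, prefix=""):
    """Compare the site's stored record with the freshly fetched info file.

    Both arguments are nested dicts. Returns a list of
    (field_path, old_value, new_value) tuples for staff to review;
    nothing is applied automatically.
    """
    requests = []
    for key in sorted(set(stored) | set(fetched)):
        path = f"{prefix}{key}"
        old, new = stored.get(key), fetched.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Descend so that only the changed sub-attribute is reported.
            requests.extend(change_requests(old, new, prefix=path + "."))
        elif old != new:
            requests.append((path, old, new))
    return requests
```

A moved info file then shows up as one more tuple, for example a change to a field such as "info_url", and goes through review like any other difference.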
One big thing missing from his first draft is an "announcement" field. When you release a new version, you want it to show up on freshmeat's front page. Using Doc's scheme, you should simply be able to change the necessary fields in your info file, including one that lists what's new in this version, ready to appear in an announcement on freshmeat's front page, in the newsletter, on the newsgroups, etc.
This brings us to a problem I see in doing this. Ask any of the freshmeat staff, and they'll tell you that the number of items that are submitted and approved without any changes is quite small. Sometimes, there are errors in spelling and grammar. Sometimes, it's not clear what the contributor is trying to say, and we have to work it out with him or her. Sometimes, there are just changes that have to be made to make the submission match our editorial policies. For example, we don't allow the name of the project to appear in the short description; we insist that it appear in the long description (preferably in the first few words); we don't allow HTML in descriptions; and so on.
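The three example policies just mentioned are mechanical enough that a site could check them automatically before a human ever looks at the entry. The following is only a sketch of that idea, not how freshmeat actually processes submissions; the function name and messages are invented, and the HTML test is a deliberately crude pattern match.

```python
import re

# Crude detector for anything that looks like an HTML tag.
HTML_TAG = re.compile(r"<[A-Za-z/][^>]*>")


def policy_problems(name, short_desc, long_desc):
    """Check a description against the three example rules quoted above."""
    problems = []
    if name.lower() in short_desc.lower():
        problems.append("short description must not contain the project name")
    if name.lower() not in long_desc.lower():
        problems.append("long description must contain the project name")
    if HTML_TAG.search(short_desc) or HTML_TAG.search(long_desc):
        problems.append("descriptions must not contain HTML")
    return problems
```

Spelling, grammar, and unclear wording are the cases a check like this cannot catch, which is why the staff review remains.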
Now, let's say you make changes to your info file, and we pick them up as change requests. What do we do?
Even when we get past that point and your info file is acceptable to us, you'll check your mail in an hour and see messages from two other sites saying that they need you to change x and y. When you change x, site number 3 will be unhappy, and when you change y, site number 1 will be unhappy. At this point, you'll wish you were just going to Web pages and filling out forms again.
The issues of what options to include in the file format can be overcome. Everyone who wants to take part in the system can get together and flame each other until they work it out. Dealing with policy issues and the editorial needs of all the sites is not as easy.
You might end up having tag attributes to accommodate different sites:
<announcement site="freshmeat.net">
(text acceptable to freshmeat.)
</announcement>
<announcement site="linuxdoc.org">
(text acceptable to the LDP.)
</announcement>
and so on. Whatever the solution, the problem would have to be dealt with. One size is not going to fit all.
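On the receiving side, each site would then pick out the text addressed to it. A minimal sketch, assuming the announcements sit inside some enclosing element (called "project" here purely for illustration) and that an announcement without a site attribute serves as a generic fallback; neither assumption is part of any agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical info file fragment; the wrapper element and the untagged
# fallback announcement are assumptions made for this example.
INFO = """
<project>
  <announcement site="freshmeat.net">Text acceptable to freshmeat.</announcement>
  <announcement site="linuxdoc.org">Text acceptable to the LDP.</announcement>
  <announcement>Generic text for everyone else.</announcement>
</project>
"""


def announcement_for(xml_text, site):
    """Return the announcement tagged for `site`, else the untagged one, else None."""
    root = ET.fromstring(xml_text)
    fallback = None
    for element in root.iter("announcement"):
        text = (element.text or "").strip()
        if element.get("site") == site:
            return text
        if element.get("site") is None and fallback is None:
            fallback = text
    return fallback
```

This keeps the file readable by sites that have no special text, but it does nothing to reduce the number of variants the developer has to write.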
Another idea that comes up from time to time is that of letting people submit information by email. Again, you would have a standard format for the information, only now it would be sent to the sites, where a script would parse it and submit the parts of it as change requests as needed.
The advantage to this is that you no longer have multiple sites trying to get you to change your info file to match their needs. They each receive your request and can contact you with any problems they have. You could have your XML info file in your build directory and have a rule in your makefile with a list of the addresses to which it should be sent and a command that will send it. Then:
I like that just for the coolness factor. :)
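The command behind such a makefile rule could be a small script along these lines. It only builds the messages; the addresses, subject line, and function name are made up for the example, and actually sending them (for instance with smtplib's SMTP.send_message) is left out because it depends on your mail setup.

```python
from email.message import EmailMessage


def build_submissions(info_xml, sender, recipients, project):
    """Build one message per tracking site, each carrying the info file as its body."""
    messages = []
    for recipient in recipients:
        message = EmailMessage()
        message["From"] = sender
        message["To"] = recipient
        message["Subject"] = f"Project info update: {project}"
        message.set_content(info_xml)  # plain-text body containing the XML
        messages.append(message)
    return messages
```

One message per site, rather than one message with many recipients, lets each site reply with its own problems without involving the others.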
I have two questions for everyone:
Doc O'Leary (email@example.com) is a COG in the machine of Subsume Technologies, Inc. (http://www.subsume.com/). He is lazy, and has thus been an advocate of free software since 1996 and of object-oriented development for nearly a decade.