Software Information Metacatalog



Swim is a Software Information Metacatalog that gathers detailed information about the software components and packages installed on each grid resource. It currently gathers information for Executable and Linkable Format (ELF) executables and shared libraries, Java classes, shell scripts, and Perl and Python modules.

Swim is built on top of Pour, a general-purpose framework for reconciling information from periodic, on-demand, and user-specified sources. Swim itself consists of a set of Perl modules that extract software information from a system, an XML schema that defines the format of data added by users, and a Pour XML configuration file that describes how these elements are used to generate periodic, on-demand, and user-specified information. Pour provides the user interface and the basic back-end storage and retrieval of information. It validates user-specified information against the Swim XML schema and calls the appropriate Swim modules when information is required on demand. Periodic information is generated by cron jobs that run Swim scripts on each grid resource and invoke the appropriate Pour method to add the results to a Pour repository on a designated host.
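The component types listed above can usually be told apart from a file's first few bytes and its name. The following is a minimal sketch of that idea in Python. It is not Swim's own code (Swim's extractors are Perl modules), and the function name `classify`, the type labels, and the list of shell interpreters are all assumptions made for illustration.

```python
import os

ELF_MAGIC = b"\x7fELF"                # first four bytes of every ELF file
JAVA_CLASS_MAGIC = b"\xca\xfe\xba\xbe"  # first four bytes of a Java class file
# Assumed list of shell interpreters; extend as needed.
SHELLS = {b"sh", b"bash", b"ksh", b"csh", b"tcsh", b"zsh", b"dash"}


def classify(path):
    """Return a coarse component type for `path`, or None if unrecognized.

    The labels returned here are hypothetical and are not Swim's own names.
    """
    name = os.path.basename(path)
    with open(path, "rb") as f:
        head = f.read(256)

    if head.startswith(ELF_MAGIC):
        return "elf"
    # The same magic number is used by other formats, so also check the name.
    if head.startswith(JAVA_CLASS_MAGIC) and name.endswith(".class"):
        return "java-class"
    if name.endswith(".pm"):
        return "perl-module"
    if name.endswith(".py"):
        return "python-module"
    if head.startswith(b"#!"):
        # Parse the interpreter line, e.g. "#!/bin/sh" or "#!/usr/bin/env bash".
        words = head[2:].split(b"\n", 1)[0].split()
        if words:
            interp = os.path.basename(words[0])
            if interp == b"env" and len(words) > 1:
                interp = os.path.basename(words[1])
            if interp in SHELLS:
                return "shell-script"
    return None
```

Calling `classify("/bin/ls")` on a typical Linux system would be expected to return `"elf"`, although the result depends on how that system ships the file.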

Periodic software information is derived mainly from the package managers used on each system. Swim collects information from the native package managers on FreeBSD, Solaris, and IRIX, as well as from the RPM, Perl, and Python package managers on multiple platforms. Package managers are a good primary source because, in most cases, they are the tools administrators used to install the software in the first place. Not all software is available or installed in package form, however, so Swim also crawls the set of relevant paths from the Filesystem Hierarchy Standard, which describes the directory layout followed, at least in part, by most Unix-like systems. Together, these two techniques locate the vast majority of software installed on a system.
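The combination of a package-manager query and a filesystem crawl can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Swim's implementation: the directory list `FHS_PATHS` is an assumed subset of FHS locations (the text does not say which paths Swim treats as relevant), the function names are hypothetical, and skipping package-owned files during the crawl is only one plausible way to merge the two sources. The RPM query uses the standard `rpm -qa --list` options and returns nothing on systems without `rpm`.

```python
import os
import shutil
import subprocess

# Assumed subset of FHS directories that commonly hold software.
FHS_PATHS = [
    "/bin", "/sbin", "/lib",
    "/usr/bin", "/usr/sbin", "/usr/lib",
    "/usr/local/bin", "/usr/local/lib", "/opt",
]


def rpm_owned_files():
    """Return the set of paths owned by installed RPM packages.

    Returns an empty set when the rpm command is not available.
    """
    if shutil.which("rpm") is None:
        return set()
    result = subprocess.run(
        ["rpm", "-qa", "--list"],
        capture_output=True, text=True, check=False,
    )
    # Keep only absolute paths; rpm also prints notes such as
    # "(contains no files)" for empty packages.
    return {line for line in result.stdout.splitlines() if line.startswith("/")}


def crawl(roots=FHS_PATHS, known=frozenset()):
    """Yield regular files under `roots` that are not listed in `known`.

    Directories that do not exist are skipped silently by os.walk.
    """
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if path not in known and os.path.isfile(path):
                    yield path


if __name__ == "__main__":
    packaged = rpm_owned_files()
    unpackaged = list(crawl(known=packaged))
    print(len(packaged), "packaged files;", len(unpackaged), "found only by crawling")
```

Separating the two steps keeps the crawl independent of any particular package manager: a FreeBSD or Solaris query could supply the `known` set in the same way.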

Information that is too expensive to compute for all software components, or that is specific to individual users, can be computed on demand from the query arguments. Swim currently supports three types of on-demand information. First, it derives software dependencies for ELF executables and libraries, Java classes, and Perl and Python modules. Second, it computes the same information gathered by the periodic routines, but for specific files on specific hosts. Finally, it locates software on a system given only its name and type.
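Two of these on-demand queries, dependency derivation and lookup by name and type, can be illustrated for the Python-module case with the standard library alone. The sketch below is an assumption-laden illustration rather than Swim's method: the function names and the `kind` labels are hypothetical, dependencies are found by static inspection of import statements only, and the lookup covers just executables on the search path and Python modules.

```python
import ast
import importlib.util
import shutil


def python_dependencies(source):
    """Return the sorted top-level module names imported by Python source text.

    Only static, absolute imports are detected; relative imports and
    imports performed at run time are ignored.
    """
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            deps.add(node.module.split(".")[0])
    return sorted(deps)


def locate(name, kind):
    """Return a location for `name` of the given `kind`, or None if not found.

    `kind` is a hypothetical label: "executable" or "python-module".
    `name` should be a top-level module name for the "python-module" case.
    """
    if kind == "executable":
        return shutil.which(name)
    if kind == "python-module":
        spec = importlib.util.find_spec(name)
        # origin may be a file path, a marker such as "built-in", or None
        # for namespace packages.
        return spec.origin if spec is not None else None
    raise ValueError("unsupported kind: " + kind)
```

For example, `python_dependencies("import os.path\nfrom json import loads\n")` returns `["json", "os"]`. Handling ELF, Java, or Perl components would need format-specific tools and is outside this sketch.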