Converting Legacy Data to RDF

Several kinds of tools are available; which one fits depends mainly on the format of the original source.

From Relational Databases

Triplify is a tool for generating RDF from relational databases. To use it, you define a set of SQL SELECT queries on the database, together with information about how their results should be converted into RDF.
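The idea of driving the conversion from SQL SELECT queries can be sketched in a few lines of Python. This is not Triplify's own configuration format, only an illustration of the principle: the first column of each query result is assumed to identify the resource, and every other column alias becomes a property. The base URI, the "vocab/" path, the table and the function names are all assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from any real dataset


def literal(value):
    """Serialize a value as an N-Triples plain literal with basic escaping."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def select_to_ntriples(conn, query, class_name):
    """Run a SELECT and turn each row into triples.

    Assumption: the first column holds the row identifier, and each
    remaining column alias is a simple name usable in a property URI.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        subject = f"<{BASE}{class_name}/{row[0]}>"
        for name, value in zip(columns[1:], row[1:]):
            if value is None:  # NULL columns produce no triple
                continue
            lines.append(f"{subject} <{BASE}vocab/{name}> {literal(value)} .")
    return lines


# Illustrative in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT, credits INTEGER)")
conn.execute("INSERT INTO course VALUES (1, 'Linked Data', 10)")

triples = select_to_ntriples(conn, "SELECT id, title, credits FROM course", "course")
print("\n".join(triples))
```

Renaming a column in the query (for example "SELECT id, title AS label ...") changes the generated property, which is how the query itself carries the conversion information in this sketch.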

D2RQ lets you create a mapping that relates the structure of a database to RDF triples, and uses this mapping to translate SPARQL queries into SQL queries at run time. The D2RQ tool can generate a default ‘naive’ mapping from the database schema, which can then be customized. D2RQ can also use the same mapping to create an RDF dump of the database content.
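To make the notion of a schema-derived default mapping more concrete, here is a minimal Python sketch. It does not use D2RQ or its mapping language, and it does not attempt SPARQL-to-SQL rewriting; it only shows how a naive mapping could be read off a schema, customized, and then used to dump the content as N-Triples. The dictionary layout, the URIs and the function names are assumptions for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def default_mapping(conn):
    """Derive a naive mapping: one class per table, one property per column."""
    mapping = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        key = next((col[1] for col in info if col[5]), None)
        if key is None:  # this sketch skips tables without a primary key
            continue
        mapping[table] = {
            "class": f"{BASE}vocab/{table}",
            "key": key,
            "properties": {
                col[1]: f"{BASE}vocab/{table}_{col[1]}" for col in info if col[1] != key
            },
        }
    return mapping


def dump(conn, mapping):
    """Produce an N-Triples dump of the database using the mapping."""
    lines = []
    for table, spec in mapping.items():
        columns = [spec["key"], *spec["properties"]]
        column_list = ", ".join(f'"{c}"' for c in columns)
        for row in conn.execute(f'SELECT {column_list} FROM "{table}"'):
            subject = f"<{BASE}{table}/{row[0]}>"
            class_uri = spec["class"]
            lines.append(f"{subject} <{RDF_TYPE}> <{class_uri}> .")
            for column, value in zip(columns[1:], row[1:]):
                if value is not None:
                    prop = spec["properties"][column]
                    # Simplification: values are not escaped in this sketch
                    lines.append(f'{subject} <{prop}> "{value}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")

mapping = default_mapping(conn)
# Customization step: replace a generated property with a well-known one
mapping["person"]["properties"]["name"] = "http://xmlns.com/foaf/0.1/name"
lines = dump(conn, mapping)
print("\n".join(lines))
```

Keeping the mapping as plain data, separate from the code that applies it, is what allows the same mapping to serve both on-the-fly query translation and a full dump in a tool of this kind.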

Note also that the W3C has set up an RDB2RDF working group, in charge in particular of defining a common language, a set of requirements and test cases for transforming relational databases (RDB) into RDF.

From XML and RSS

XML and RDF share a common base, at least in terms of syntax: RDF/XML uses an XML syntax, XML can be made somewhat RDF-friendly, and RSS 1.0 is, in principle, already RDF. There have therefore been quite a few syntactic conversions of XML to RDF/XML, in particular using XSLT.
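A purely syntactic conversion of this kind can be sketched as follows. The sketch uses Python's standard xml.etree.ElementTree module rather than XSLT, and it emits N-Triples rather than RDF/XML, so it illustrates the principle and not the XSLT approach itself. The input layout (records carrying an "id" attribute, with child elements as fields), the base URI and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical base URI

# Illustrative input: each child of the root is one record
XML = """<courses>
  <course id="c1"><title>Linked Data</title><level>MSc</level></course>
</courses>"""


def xml_to_ntriples(xml_text):
    """Map each record element to a resource and each child element to a property."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root:
        record_id = record.get("id")
        if record_id is None:  # records without an id are skipped in this sketch
            continue
        subject = f"<{BASE}{record.tag}/{record_id}>"
        for field in record:
            text = (field.text or "").strip()
            if text:
                escaped = text.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{subject} <{BASE}vocab/{field.tag}> "{escaped}" .')
    return lines


xml_triples = xml_to_ntriples(XML)
print("\n".join(xml_triples))
```

Because the conversion follows the element structure mechanically, the quality of the resulting RDF depends entirely on how regular the source XML is.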

The GRDDL language, recommended by the W3C, is intended to provide a standard and systematic way to achieve such XSLT-based transformations, by making it possible to declare that XML documents include data compatible with RDF. It has been used extensively, for example for the conversion of microformats.

From Tables and Spreadsheets

Google Refine is a tool meant as an easy way to clean, transform and explore tabular data. It can import from many different sources, including MS Excel, Google Spreadsheet and CSV, and includes a number of useful features for working on the data. While it was not originally developed to support RDF export, it is extensible: the RDF Extension was created to allow export into RDF (with a graphical definition of the mappings between the table and RDF), and it also includes useful tools to connect the content of the table to external linked datasets.

Other tools, such as Any23 or QUIDICRC, provide simple, direct transformation of CSV files into RDF.
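A direct CSV-to-RDF transformation of this kind can be sketched with Python's standard csv module. This is not how Any23 or QUIDICRC are implemented; it is a minimal illustration in which one column is assumed to identify each row, and every other header is assumed to be a simple name that can be used as a property. The base URI, the sample data and the function name are hypothetical.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical base URI


def csv_to_ntriples(text, id_column):
    """Turn each CSV row into one resource, one triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:  # skip the key and empty cells
                continue
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}vocab/{column}> "{escaped}" .')
    return lines


# Illustrative data: the second row has no year
SAMPLE = "id,title,year\nb1,Semantic Web Primer,2004\nb2,Linked Data,\n"

csv_triples = csv_to_ntriples(SAMPLE, "id")
print("\n".join(csv_triples))
```

All values are emitted as plain literals here; choosing datatypes, or linking cells to external resources, is the kind of refinement that the graphical mapping tools mentioned above are designed to support.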

More Specific Sources

SIMILE RDFIzer is a set of specialized converters for a large variety of input formats. Of particular relevance to the education domain are marcmods2rdf, which converts library catalog records to RDF; oai2rdf, which can extract RDF from open archive repositories (OAI-PMH); and ocw2rdf, which can extract RDF from MIT OpenCourseWare metadata.

Other examples include Bibtex2RDF, for converting bibliographical references in the BibTeX format, and the Youtube2RDF tool, which converts YouTube playlists into RDF using media vocabularies.

One of the most common sources of information about available resources in a university is the library catalogue. Marimba offers a complete solution for the extraction and curation of MARC records as linked data, based on custom mappings.

Community Management and Sharing

VIVO is a tool for representing information about research and researchers -- their scholarly works, research interests, and organizational relationships. VIVO provides an expressive ontology, tools for managing the ontology, and a platform for using the ontology to create and manage linked open data for scholarship and discovery. VIVO is now being established as an open-source project with community participation from around the world. By the end of 2012, over 20 countries and 50 organizations will provide information in VIVO format on more than one million researchers and research staff, including publications, research resources, events, funding, courses taught, and other scholarly activity. See the book "VIVO: A Semantic Approach to Scholarly Networking and Discovery" for more information.

One of the most common tools used to manage repositories of research outputs in universities is ePrints, an open-source, customisable platform that allows researchers to deposit their publications and to expose them in open access. ePrints automatically provides a link for each item to be exported into RDF, using the BIBO ontology (see the Vocabularies section). Similarly, the online bibliography management and sharing platform Bibsonomy can create bibliographic lists in various formats, including RDF.

GNOSS is a software platform that integrates knowledge management, informal learning and collaborative work in a Linked Data environment. A GNOSS space incorporates semantic facet-based searches and semantic context creation based on structured data and on links to internal and external Linked Open Data sources. The platform has in particular been used in Didactalia.net, a global K-12 community and a storage place where teachers, students and parents can create, share and find open educational resources, as well as in higher education, for example at the University of Deusto, to manage research communities and groups.