XLWrap – Spreadsheet-to-RDF Wrapper

News

Contents

  1. Introduction
  2. Example
  3. XLWrap Mappings
  4. Mapping Design Patterns (another page)
  5. Usage – XLWrap-Server
  6. Download
  7. Getting Started
  8. Related Projects
  9. Publications
  10. Support
  11. Credits

Introduction

XLWrap is a spreadsheet-to-RDF wrapper which is capable of transforming spreadsheets to arbitrary RDF graphs based on a mapping specification. It supports Microsoft Excel and OpenDocument spreadsheets such as comma- (and tab-) separated value (CSV) files and it can load local files or download remote files via HTTP.

Application Areas:

Example

Consider, we would like to make data from this spreadsheet available via a SPARQL endpoint. The spreadsheet contains revenues from a company represented in a cross table by country, year, and products. We want to provide the generated RDF data according to the SCOVO vocabulary.

These are the only three steps we need to do:

That's it!
You can use Snorql (known from D2R-Server) to browse the wrapped dataset or use the SPARQL endpoint at http://localhost:8900/sparql.

Here is the complete target graph for the given example (alternatively in RDF/XML syntax).

Additional examples are provided as part of the Mapping Design Patterns collection.

XLWrap Mappings

A mapping is provided as part of an RDF file in TriG syntax. TriG is very similar to N3 and allows the representation of multiple (named) RDF graphs in a single file. An XLWrap mapping file must contain exactly one graph that contains exactly one RDF instance of xl:Mapping (typically as part of the default graph). Other graphs in the same file will be used to denote template graphs.

Example:

{ # default graph in TriG file
	[] a xl:Mapping ;
	xl:template [
		xl:fileName "file:example.xls" ;
		xl:sheetName "foo" ;
		xl:templateGraph :Foo ;
		xl:transform [
			...
		]
	] .
}

:Foo {
	rdfs:label "A2 & ' ' & B2"^^xl:Expr ;
		...
}

XLWrap mappings are based on template graphs, which may contain XLWrap expressions including cell references similar to a typical spreadsheet application such as Microsoft Excel or OpenOffice Calc. Template graphs are repeatedly applied on a work sheet (or other work sheets) in order to produce the target graph. Depending on the representation of the information stored in the spreadsheet, which may be flat tables or cross tables over multiple sheets and files, each template graph is moved across sheets and subsequently applied for different combinations of cells. How template graphs are moved is specified by transform operations.

The XLWrap mapping vocabulary can be found here: http://purl.org/NET/xlwrap#.

Mapping Specification

An xl:Mapping consists of one or more map templates, which are denoted by the property xl:template. Example:

{ # default graph (mapping specification)
	[] a xl:Mapping ;
	xl:template [ ... ] ;
	xl:template [ ... ] ;	
	...
}

Each map template (value of xl:template) consists of the following components:

Transform Operations

The following transform operations (sub-classes of xl:Transformation) are currently available:

For each of these operations, an optional range restriction (property xl:restriction) can be specified, which restricts the operation to a multi-range, i.e. only range references within the restriction are transformed (default is any range: *.*).

Furthermore, an optional break condition can be specified as an XLWrap Expression with the xl:breakCondition property (default condition expression: false, i.e. "never break until end of file or specified times repeated"). The condition is evaluated after a template graph is transformed and if it evaluates to true, the transform operation is skipped and XLWrap continues with the next stage of the following transform operation.

For shift operations it is possible to specify the steps (in terms of columns/rows/sheets to shift) with the property xl:steps (default is 1) and the number of times the operation should be repeated with the property xl:repeat (default is the maximum integer value of the runtime system).

In case of the repeat operations it is required to specify a list of sheet names with the property xl:sheetNames or file names, respectively, with xl:fileNames. They are specified as a comma-separated list in a plain literal.

XLWrap Expressions

XLWrap expressions are used:

The basic elements of XLWrap expressions are:

The complete grammar for XLWrap expressions is available as part of the distribution.

Range References

Expressions may contain range references in order to obtain values from spreadsheets. The common syntax for range references is:

( ( [filename] "#$" )? [sheet1] "." )? [column1] [row1] ( ":" ( [sheet2] "." )? [column2] [row2] )?

File and sheet names have to be quoted if they contain spaces or special characters. Please note the optional parts inside of the brackets. There are actually different kinds of ranges references:

Furthermore, it is possible to specify sheets by numbers instead of names with a #-prefix: e.g. #1.A3 refers to A3 on the first sheet of the current workbook (the current workbook is usually the base workbook specified for the map template, however, as a consequence of transform operations this can also be another workbook file).

Depending on the situation, specific kinds of ranges may be valid or not. For instance, specifying a box range as a single expression is not valid: A3:A9 is invalid. However, it is possible to specify a box range as an argument of the function SUM(): SUM(A3:A9; A10; 34; 10).

URIs and Anonymous Identity

In order to construct URI nodes, a blank node is used with the special property xl:uri, specifying the URI of the node. For example:

[ xl:uri "'http://example.org/' & URLENCODE(A2 & B2)"^^xl:Expr ] a foaf:Person .

will create a statement like <http://example.org/Tim+Smith> a foaf:Person at runtime.

Similarly, it is possible to link anonymous resources (blank nodes):

[] a ex:Revenue ;
scv:dimension [ xl:id "A4"^^xl:Expr ; a ex:Product ; rdfs:label "A4"^^xl:Expr ] .

The XLWrap processor will ensure, that all resources with equal xl:id values will get equal blank node identifiers in the target graph. This feature is very important for more sophisticated mappings. It is also used for the SCOVO example above.

Functions

The function library of XLWrap will be continuously extended. We will provide detailed documentation of functions in future. For the meanwhile, please have a look at the source code of the core functions.

It is also possible to add custom functions. Each function is implemented by a Java class. The name of the class must adhere to the following pattern: E_Func[NAME] where NAME is the name of the function as it is used as part of expressions (e.g. NOW() is implemented by the class E_FuncNOW). The location on the classpath is not important, you can place it anywhere.

Usage – XLWrap-Server

XLWrap can be used either in-process via the Jena API or as a SPARQL endpoint via the bundled XLWrap-Server.

Using XLWrap via the API

The relevant classes to look for are:

Example:

XLWrapMapping map = MappingParser.parse("mappings/iswc09-example-scovo.trig");
XLWrapMaterializer mat = new XLWrapMaterializer();
Model m = mat.generateModel(map);
m.write(new FileOutputStream("docs/website/example/revenues-scovo.n3"), "N3");

The materializer returns an ordinary Jena model. If you want to provide your own Jena model (possibly an inference model), you can explicitly provide a target model:

Model m = mat.generateModel(map, targetModel);

The materializer instance can be reused multiple times for different processes. It will keep any workbooks in memory for faster processing of multiple mappings with common spreadsheet files (especially when they have been downloaded from the Web).

Using XLWrap-Server

Just start the server from the XLWrap directory: $ bin/server and place mapping files into the folder mappings. The command takes the following arguments:

 -?,--help                        help
 -B,--dataset-base <uri-prefix>   Dataset base URI prefix for Pubby (see
                                  http://www4.wiwiss.fu-berlin.de/pubby/)
 -c,--config <port>               Joseki config file
 -D,--desc <text>                 Dataset description
 -d,--data <port>                 RDF data (.n3, .ttl, .rdf, .xml)
 -H,--homepage <url>              Homepage
 -h,--host <hostname>             hostname (if it cannot be correctly
                                  determined by the JVM)
 -p,--port <port>                 port (default: 8900)
 -T,--title <text>                Dataset title

If no argument is specified, the shell script automatically adds "-c joseki-xlwrap.ttl" to the call. The options -c and -p are probably the most important ones. Some meta data displayed at the web frontend and used by Pubby can be specified also.

Now open your browser and go to http://localhost:8900 (please adjust the port if needed).

Useful Functions to provide Linked Data

XLWrap provides useful functions for mappings: HOSTNAME() and PORT() that can be used to obtain the current hostname and port the server is running on. Another important function is MAKEURI(), which can be used to create local URIs in order to make the resources accessible via the integrated Linked Data Browser Pubby. Call MAKEURI() with either one or two String arguments. In the first case, the argument is URL-encoded, in the second case, the first argument is appended as-is and the second String argument is URL-encoded.

Additionally, we will provide functions in future to help users creating external links to other Linked Data sources such as DBpedia. We will integrate several record linkage algorithms and geo-mapping functions to create links to DBpedia, geonames.org, etc.

Sources and Download

Sources: https://github.com/theandyl/xlwrap

Downloads: http://sourceforge.net/projects/xlwrap/files/xlwrap/

Getting Started Guide

  1. Please have a look at the example and download XLWrap.
  2. For details on starting and using XLWrap-Server proceed with the usage section.
  3. In order to start creating your own mappings read the mappings section.

Related Projects

XLWrap has been developed because there previously was no spreadsheet-to-RDF wrapper that supported cross tables. The following related projects are all based on a row-by-row wrapping process and plain text CSV files only:

Publications

Andreas Langegger, Wolfram Wöß (2009): XLWrap – Querying and Integrating Arbitrary Spreadsheets with SPARQL. In Proceedings of the 8th International Semantic Web Conference (ISWC2009), Washington D.C. LNCS 5823, Springer, 2009.

Andreas Langegger, Wolfram Wöß (2009): Querying and Semantically Integrating Spreadsheet Collections with XLWrap-Server – Use Cases and Mapping Design Patterns. Poster & Demo Session at the 8th International Semantic Web Conference (ISWC2009), Washington D.C., 2009.

Support

Please use the XLWrap Users mailing list to get support.

If you need professional support in your project, please contact me directly.

Credits

Contact: aka AndyL

Many thanks to Richard Cyganiak for contributing bug fixes and very good ideas for improvements and new features.

XLWrap has been developed as another wrapper for the Semantic Web Integrator and Query Engine (SemWIQ).

Thanks to SourceForge.net for providing the infrastructure.

SourceForge.net Logo