Desktop Data Indexing

On semantic and fulltext indexing of data and metadata on the desktop.

We want to help standardize clever indexing on the desktop, so we are discussing it with various projects and communities:

  • GNOME Beagle++
  • Nepomuk
  • KDE Strigi
Topics to discuss:

  • Storage of metadata: Nepomuk needs all data to be stored in the RDF repository, while Strigi creates a Lucene index. I think we need to integrate both.
  • Metadata extraction: which ontologies to use, i.e. which field names/URIs. Leo, what are you using in the Beagle++ or Aperture crawlers?
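One way to picture integrating both stores is a minimal C++ sketch in which each desktop resource is keyed by a single URI: structured metadata goes into a list of triples (standing in for the Nepomuk RDF repository) and text goes into a word-to-URI inverted index (standing in for Strigi's Lucene index). `HybridStore`, `addField`, `addText`, `search` and `values` are hypothetical names, not part of Nepomuk or Strigi, and property names such as `nio:emailSubject` are placeholders for whatever ontology is chosen.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical triple, standing in for a statement in the RDF repository.
struct Triple {
    std::string subject, predicate, object;
};

// Hypothetical store keeping triples and a fulltext index side by side,
// with the resource URI as the shared key.
class HybridStore {
public:
    // Store a metadata field as a triple; the property is assumed to be an
    // ontology URI or prefixed name (e.g. "nio:emailSubject").
    void addField(const std::string& uri, const std::string& property,
                  const std::string& value) {
        triples_.push_back({uri, property, value});
    }

    // Split text on whitespace and record which resource contains each word.
    void addText(const std::string& uri, const std::string& text) {
        std::istringstream words(text);
        std::string word;
        while (words >> word) index_[word].insert(uri);
    }

    // Fulltext lookup: URIs whose text contains the given word.
    std::set<std::string> search(const std::string& word) const {
        auto it = index_.find(word);
        return it == index_.end() ? std::set<std::string>{} : it->second;
    }

    // Structured lookup: all values of one property for one resource.
    std::vector<std::string> values(const std::string& uri,
                                    const std::string& property) const {
        std::vector<std::string> out;
        for (const Triple& t : triples_)
            if (t.subject == uri && t.predicate == property) out.push_back(t.object);
        return out;
    }

private:
    std::vector<Triple> triples_;
    std::map<std::string, std::set<std::string>> index_;
};
```

Because both halves share the URI, a fulltext hit from `search` can be joined directly against the triples; that shared key is what matters, regardless of which store ends up as the primary one.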

People:

  • Leo Sauermann, Skype name: leobard
  • Sebastian Trüg, Skype name: truegerich
  • Volker Krause, Skype name: vkrause
  • Jos van den Oever, Skype name: strigikde
  • Gunnar Aastrand Grimnes, Skype name: gromgull
Input:

Strigi

http://www.vandenoever.info/software/strigi/

Strigi uses CLucene, the C++ port of Lucene, for fulltext indexing, or alternatively other frameworks such as Tracker (hosted on freedesktop.org).

Indexing code: http://websvn.kde.org/trunk/playground/base/strigi/src/streamindexer

char analyze(Indexable& idx, jstreams::StreamBase *input);

http://websvn.kde.org/trunk/playground/base/strigi/src/streamindexer/indexable.h?rev=609648&view=markup
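The signature above suggests a stream-based design: the analyzer pulls data from an input stream and pushes fields into an `Indexable`. The following is a rough C++ sketch of that shape only. `Indexable` here is a hypothetical stand-in for Strigi's class, `std::istream` replaces `jstreams::StreamBase`, the field names `content`, `line.count` and `word.count` are invented, and treating `0` as success is an assumption rather than Strigi's documented convention.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for Strigi's Indexable: it only collects
// field/value pairs for one indexed file.
class Indexable {
public:
    void addField(const std::string& name, const std::string& value) {
        fields_.insert({name, value});
    }
    const std::multimap<std::string, std::string>& fields() const { return fields_; }

private:
    std::multimap<std::string, std::string> fields_;
};

// Same shape as analyze(Indexable&, jstreams::StreamBase*), with std::istream
// as the stream type. Assumed convention: 0 = success, non-zero = error.
char analyze(Indexable& idx, std::istream* input) {
    if (input == nullptr) return 1;
    std::string line;
    long lines = 0, words = 0;
    while (std::getline(*input, line)) {
        ++lines;
        std::istringstream ws(line);
        std::string w;
        while (ws >> w) {
            ++words;
            idx.addField("content", w);  // one fulltext token per word
        }
    }
    // Derived metadata, added once the whole stream has been read.
    idx.addField("line.count", std::to_string(lines));
    idx.addField("word.count", std::to_string(words));
    return 0;
}
```

Reading the stream once and emitting both fulltext tokens and derived metadata in the same pass is the point of the stream-based interface: the file does not have to be loaded or parsed twice.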

Data integration hub

Would the data integration hub delegate subqueries to the individual data stores?
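If the hub does delegate, the simplest version forwards each subquery to every store and merges what comes back. The sketch below assumes a hypothetical `DataStore` interface that returns matching URIs; `FixedStore` is a toy store used only for illustration, and neither class corresponds to an existing Nepomuk or Strigi component.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface each data store (RDF repository, fulltext index, ...)
// would implement so that the hub can forward subqueries to it.
class DataStore {
public:
    virtual ~DataStore() = default;
    // URIs matching the subquery; empty if this store has no answer.
    virtual std::set<std::string> query(const std::string& subquery) const = 0;
};

// Toy store answering from a fixed subquery -> URIs table.
class FixedStore : public DataStore {
public:
    explicit FixedStore(std::map<std::string, std::set<std::string>> answers)
        : answers_(std::move(answers)) {}
    std::set<std::string> query(const std::string& subquery) const override {
        auto it = answers_.find(subquery);
        return it == answers_.end() ? std::set<std::string>{} : it->second;
    }

private:
    std::map<std::string, std::set<std::string>> answers_;
};

// Hypothetical hub: sends the subquery to every registered store and merges
// the answers by set union, so a URI reported by several stores appears once.
class DataIntegrationHub {
public:
    void addStore(std::shared_ptr<DataStore> store) { stores_.push_back(std::move(store)); }

    std::set<std::string> query(const std::string& subquery) const {
        std::set<std::string> merged;
        for (const auto& store : stores_) {
            std::set<std::string> part = store->query(subquery);
            merged.insert(part.begin(), part.end());
        }
        return merged;
    }

private:
    std::vector<std::shared_ptr<DataStore>> stores_;
};
```

Set union is the simplest merge. A hub that splits one conjunctive query across stores (for example, type constraints to the RDF store and text constraints to the fulltext index) would have to join the partial results on ?x instead.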

Ideas

Idea: use Strigi as a backend to Redland, i.e. implement an RDF store on top of Strigi.

Jos: Sebastian, that may cause problems, but there are SQL backends for Redland.
Jos: Question: how would we query this RDF-enabled Strigi store?
Leo: This is tricky; it's easier to use Redland as a backend to Strigi, or Tracker if they support RDF.

Use SPARQL as the query language, with the extensions from the "LuceneSail" (a hack).

These queries are implemented in the LuceneSail; DFKI and Aduna are currently writing a paper about this:

select ?x ?name where {?x rdf:type nio:email. ?x strigi:fulltextSearch ?ft. ?ft onproperty nio:emailBody. ?ft searchText "Hello World".}

select ?x ?name where {?x rdf:type nio:email. ?x nio:emailBody ?body. FILTER regex(?body, "Hello World")}

Here ?x is the URI of the e-mail.
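The two queries above can return the same ?x but do very different work: the FILTER regex version has to test every nio:emailBody value, while the fulltextSearch version can answer from an index built in advance. A small C++ sketch over in-memory data, assuming a map from e-mail URI to body text, illustrates the difference. The index lookup treats the search text as a conjunction of words rather than an exact phrase, a simplification of what Lucene does; `regexScan` and `BodyIndex` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Toy data: e-mail URI -> value of its nio:emailBody property.
using Bodies = std::map<std::string, std::string>;

// Like FILTER regex(?body, pattern): every body is tested, one by one.
std::set<std::string> regexScan(const Bodies& bodies, const std::string& pattern) {
    const std::regex re(pattern);
    std::set<std::string> hits;
    for (const auto& entry : bodies)
        if (std::regex_search(entry.second, re)) hits.insert(entry.first);
    return hits;
}

// Like the fulltextSearch pattern: a word -> URIs index is built once, and a
// query intersects the posting sets of its words instead of scanning bodies.
class BodyIndex {
public:
    explicit BodyIndex(const Bodies& bodies) {
        for (const auto& entry : bodies) {
            std::istringstream words(entry.second);
            std::string w;
            while (words >> w) postings_[w].insert(entry.first);
        }
    }

    std::set<std::string> search(const std::string& text) const {
        std::istringstream words(text);
        std::string w;
        std::set<std::string> result;
        bool first = true;
        while (words >> w) {
            auto it = postings_.find(w);
            if (it == postings_.end()) return {};  // a missing word means no hit
            if (first) { result = it->second; first = false; continue; }
            std::set<std::string> kept;
            for (const auto& uri : result)
                if (it->second.count(uri)) kept.insert(uri);
            result.swap(kept);
        }
        return result;
    }

private:
    std::map<std::string, std::set<std::string>> postings_;
};
```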

select ?x ?subject where {?x rdf:type nio:email. ?x nio:emailSubject ?subject. ?x nio:emailBody ?body. FILTER regex(?body, "Hello World")}


select ?x ?name ?rank where {?x rdf:type nio:email. ?x strigi:fulltextSearch ?ft. ?ft onproperty nio:emailBody. ?ft searchText "Hello World". ?ft rank ?rank}
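The last query additionally binds ?rank, the relevance score the fulltext engine assigns to each hit. As a hedged sketch, the C++ below uses a deliberately naive score (the number of query-word occurrences in the body) and sorts hits by it; Lucene's actual scoring is considerably more elaborate, and `rankedSearch` is a hypothetical function, not part of the LuceneSail.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One result row: the e-mail URI (?x) and its score (?rank).
using Ranked = std::vector<std::pair<std::string, int>>;

// Naive ranking: score = occurrences of any query word in the body.
// Hits with score 0 are dropped; the rest are sorted by descending score,
// with ties broken by URI so the order is deterministic.
Ranked rankedSearch(const std::map<std::string, std::string>& bodies,
                    const std::string& text) {
    std::vector<std::string> terms;
    std::istringstream q(text);
    std::string t;
    while (q >> t) terms.push_back(t);

    Ranked results;
    for (const auto& entry : bodies) {
        std::istringstream words(entry.second);
        std::string w;
        int score = 0;
        while (words >> w)
            if (std::find(terms.begin(), terms.end(), w) != terms.end()) ++score;
        if (score > 0) results.emplace_back(entry.first, score);
    }
    std::sort(results.begin(), results.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    return results;
}
```

Returning the score alongside the URI, rather than only an ordered list, mirrors the query's design of exposing ?rank as an ordinary variable that can be selected, sorted on, or ignored.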

http://www.arcknowledge.com/gmane.comp.freedesktop.tracker/2006-10/msg00290.html

Version 2.1 last modified by Lucien Pereira on 24/07/2008 at 18:01


Creator: Leo Sauermann on 2007/01/24 11:59
Copyright 2004-2007 (c) XPertNet and Contributing Authors