Search-based User Interaction on the semantic web, a survey of existing systems.

Michiel Hildebrand, $Date: 2007/06/15 15:37:50 $

The survey is now maintained at the SWUI wiki!

Introduction

The goal of this survey is to provide an overview of the different notions of text-based search on the semantic web, also known as semantic search.

W3C already maintains a list of Semantic Web Tools and various others exist, such as the Developers Guide to Semantic Web Toolkits and the Comprehensive Listing of Semantic Web and Related Tools by Michael K. Bergman. These lists support the semantic web community (mainly developers) by providing an overview of available tools. This survey contributes to this by giving more insight into the different perspectives on accessing semantic web content through text-based search.

We have started compiling a list of systems that provide access to semantic web data through a graphical user interface. We are interested in systems that provide some form of free text search, in addition to those that include different forms of browsing. For each system we state the intended purpose, intended users, the scope, the triple store and optionally the technique or software that is used for literal indexing.

Secondly, we analyze the systems in three different stages of the search process: input, processing and search results, as well as the role of user feedback within these. For each of these we consider the functionality that the system provides and how this is made available through the graphical user interface.

Contributions

This list is complete neither with respect to the systems covered nor with respect to the information provided about them. We aim to make (and keep) it complete and welcome any contribution that helps achieve this. In particular, we would appreciate suggestions on missing systems and on information about the systems within the specified fields. If you have any comments, suggestions or additions, please send an email to michiel.hildebrand@cwi.nl.

System Overview

Notes

The suggested values (second row in the table) do not cover all possible values; they merely serve as examples.

Purpose Users Scope Store Literal Index
Examples
Purpose:
  • Search engine
  • Information management
  • (Faceted) Browser
  • Portal
  • Wiki
Users:
  • End users (by domain expertise)
  • Developers
  • Software agents
Scope: Does the system provide access to all documents on the web, to all resources on the semantic web, or to a limited set of resources (in a particular domain)?
Store:
  • Sesame
  • Jena
  • SWI Prolog
  • Other
Literal Index:
  • Lucene
  • Application specific
  • Other
AquaBrowser
(demo, analysis)
Search engine, faceted browser End-users novice-medium Resources Database ? ?
AquaLog
(demo, paper, analysis)
Question Answering End Users (medium-expert) Single Ontology - GATE
Autofocus
(demo, Paper, analysis)
Search engine, Browser End users (medium-expert) RDFized text documents Sesame Lucene
BrowseRDF
(demo, paper, analysis)
Faceted Browser Developers, End Users RDF Collection Any store accessible through ActiveRDF Ferret (Ruby port of Lucene)
DBin
(demo, paper, analysis)
Information management Developers, End users (medium-expert) Domain specific RDF and OWL ? ?
DBpedia Search
(demo, analysis)
Search Engine End Users (novice-medium) Wikipedia in RDF Virtuoso Virtuoso
e-Culture
(demo, paper, analysis)
Portal, Search engine End users (novice-expert) Multiple data collections and thesauri in RDFS. Thesauri mappings in OWL. SWI Prolog Semantic Web Library SWI Prolog Semantic Web Library (porter stemming)
Falcon-S
(demo, analysis)
Search Engine, Query builder End users (novice-expert) RDF Collection and OWL ontology Sesame and OWLIM Lucene
Flink
(demo, paper, analysis)
Browser End users (medium-expert) RDFized documents. Sesame ?
FreeBase
(demo, analysis)
Data management, Wiki, Search engine End-users novice-expert Resources Database ? ?
Ginseng Question answering End users Resource + ontology ? ?
H-DOSE
(paper, analysis)
Search Engine Developers, End Users Web Documents and Domain Ontology Sesame SQL DB
Haystack
(demo, paper, analysis)
Information management Developers and end users (medium-expert) Semantic web Application specific (C++) Lucene
Hybrid Search
(analysis, paper)
Search engine End users Single collection in RDF Application specific Lucene
InWiss
(paper, analysis)
Search Engine End Users Data collections in RDF Sesame Application specific index
KIM
(demo, paper, analysis)
Information management, Search engine, Faceted browser Developers and end users (medium-expert) Text documents and extracted metadata and KIMO ontology Sesame Lucene
Longwell
(demo, analysis)
Faceted browser End users (medium-expert) Single data collection in RDF Sesame Lucene
mspace
(demo, paper, analysis)
Browser End users (medium-expert) Single data collection in RDF ? ?
Museum Finland
(demo, paper, analysis)
Portal, Faceted Browser, Search Engine End users (novice-expert) Multiple data collections in RDF and 1 upper ontology in OWL. Ontogator Ontogator
OntoKhoj
(paper, analysis)
Search engine Developers Semantic Web ? ?
OntoSearch
(paper, analysis)
Search engine Developers Semantic Web
OntoWiki
(demo, paper, analysis)
Wiki End users (medium-expert) Single data collection in RDF MySQL MySQL
OpenAcademia
(demo, analysis)
Search engine, Browser End users (novice-expert) multiple collections in RDF. Instance mappings in OWL. Sesame ?
OWLIR
(paper, analysis)
Search engine End users Text Document + Extracted RDF triples + Additional scraped triples DAMLJessKB SIRE
QuizRDF
(paper, analysis)
Search Engine End Users (novice-expert) Text documents. RDFS Ontologies. Application specific Application specific
SemSearch
(demo, analysis)
Search Engine End Users (medium-expert) Single data collection in RDF Sesame Lucene
Semantic MediaWiki
(demo, paper, analysis)
Wiki End Users (medium-expert) MySQL ?
SHOE search tool
(demo, paper, analysis)
Search Engine Developers Text document with SHOE annotations Application specific ?
Slashfacet
(demo, paper, analysis)
Faceted Browser Developers, End users (medium-expert) Multiple collections and thesauri in RDF SWI Prolog Semantic Web Library SWI Prolog Semantic Web Library (porter stemming)
Squiggle
(demo, paper, analysis)
Search Engine Developers, End Users (novice-medium) RDFized image metadata and RDF thesaurus Sesame Lucene, record data and literal values in RDF
Squirrel
(paper, analysis)
Search Engine End users (novice-expert) Text documents and extracted metadata PROTON Ontology KAON2 and Sesame+OWLIM Lucene
Swoogle
(demo, paper, analysis)
Ontology Search Engine Developers and Software agents Semantic web ? ?
SWSE
(demo, analysis)
Search engine End users (novice-expert) Semantic web and RSS feeds YARS Lucene
Tap: Semantic Search
(demo, paper, analysis)
Context based search Engine End users Text document and RDF ontologies Tap Framework Tap Framework
Topia
(demo, paper, analysis)
Search engine End users (novice-expert) Single cultural heritage collection Sesame ?

System analysis

Notes

Many applications allow the user to view a specific resource and its direct metadata values. We call this a local view on a resource. Typically the values in the local view are hyperlinks that allow the user to browse the semantic graph. This is a useful technique for exploring the RDF graph (systems such as Tabulator and Disco use this approach for browsing the entire semantic web). Unless additional features are presented in the local view, we do not mention this functionality.

The suggested values (second and third row in the table) do not cover all possible values; they merely serve as examples.

Search Input Processing User Feedback Search Results
Examples, Functionality
Search Input, search term:
  • Single/multiple keyword(s)
  • URI(s)
Search Input, structure of search term:
  • boolean operators
  • special purpose operators
  • regular expressions
Search Input, value selection:
  • Facets
  • Forms
Processing, syntactic matching:
  • Single keyword: prefix, substring
  • Interpretation of multiple keywords
Processing, retrieval:
  • direct matching document/URI
  • query extension
  • graph search
User Feedback, search input:
  • disambiguation: clarify intended meaning
User Feedback, search results:
  • refinement: constrain/extend the current result set
  • exploration: find new (related) resources
Search Results, result item:
  • resource (document, image, concept)
  • set of triples
Search Results, organization:
  • ranking
  • clustering
Examples, Interface
Search Input, keyword entry:
  • text entry box
Search Input, value selection:
  • facet (value list)
Search Input, input options:
  • target type selection
  • syntactic match type (prefix, substring)
Processing:
  • processing time
  • number of results
  • loading message
User Feedback:
  • Select from discrete list of values
  • New or extended search in text-entry box
Search Results:
  • Resource (document, image etc.)
  • selected metadata
  • Fresnel
  • template
AquaBrowser Functionality
  • Text search: multiple keywords
  • Syntactic matching: minimal edit distance
  • Semantic processing: related terms
  • Query refinement
  • Exploration through related terms (associations, translations, spelling)
  • Change sorting (predefined properties)
  • List of items
  • Ranking syntactic
Interface
  • Text entry Box
  • Number of results
  • Query refinement: value selection from facets
  • Related terms: Concept graph
  • Sorting: value list
  • Label + selected metadata
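The "minimal edit distance" matching listed for AquaBrowser can be illustrated with a short sketch. The code below is a generic Levenshtein-distance matcher and not AquaBrowser's actual implementation; the function names `edit_distance` and `closest_terms`, the `max_distance` cut-off and the lower-casing step are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def closest_terms(query, vocabulary, max_distance=2):
    """Return vocabulary terms within max_distance of the query, nearest first."""
    scored = [(edit_distance(query.lower(), term.lower()), term)
              for term in vocabulary]
    return [term for distance, term in sorted(scored) if distance <= max_distance]
```

With a small vocabulary of artist names, a misspelled query such as "rembrant" would still retrieve "Rembrandt", because the two strings differ by a single insertion.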
AquaLog Functionality
  • Natural language expression
  • ?
  • Disambiguate input terms
  • Set of items
Interface
  • Text entry box
  • Interpretation of NL expression. Description of the syntactic form of the expression. Ontology terms corresponding to the terms in the input. Result of syntactic match
  • Disambiguation: Drop down list with ontology resources
  • Label of item
Autofocus Functionality
  • Text search with multiple terms
  • Value selection from predefined facets
  • syntactic matching: subword match on extracted terms
  • Retrieval (keyword search): resources with matching literal value
  • Retrieval (value selection): resources with selected value as metadata
  • Refinement: add new search term or facet value. In contrast to faceted browsing, all values remain available for refinement (the active values are highlighted). This allows the user to make multiple different intersections.
  • Set of items grouped by relation to constraints
Interface
  • Text search: text entry box and keyword suggestion list
  • Search options: checkboxes for specific metadata fields to search in
  • Value selection: selectable facets with grouped value list
  • Number of results per selected value or search term
  • Similar to initial search
  • Table with item metadata. Cluster map visualization
BrowseRDF Functionality
  • First select a facet and then select a value from it. Can be done recursively to make joins.
  • Resources with selected value as metadata
  • Add new facet value
  • Set of items
Interface
  • Value selection: results are facet values
  • Selected facet values
  • Similar to initial search
  • Label
DBin Functionality
  • Value selection from ontology navigator. Selection of "precooked" queries with value selection for variables inside the query
  • Matching results as defined by view
  • Update Query
  • Set of items or triples (view dependent)
Interface
  • Ontology navigator: Tree browser. Queries: value selection lists
  • View dependent
  • Similar to initial search
  • Several visualization widgets
DBpedia Search Functionality
  • Search term: compound term
  • Syntactic matching: direct match on RDF literal values and description
  • Retrieval: metadata through 1-step relation
  • Refinement: Select Class
  • Set of items. Ranked by PageRank variant
Interface
  • Text Entry Box
  • Relation to matching value
  • Number of results
  • Processing time
  • TagCloud
  • Label + thumbnail + description + class
e-Culture basic Functionality
  • Text search with single search term
  • syntactic matching: minimal letter distance on stemmed index
  • Retrieval: backwards graph search with weighted relations. Weights are manually assigned by relation type
  • Disambiguation: result clusters grouped by result path
  • Set of items grouped by search path
  • Clusters: search path
  • Ranking: clusters are ranked by path length. Items within a cluster are ranked by score (= syntactic match * total path weight).
Interface
  • Text entry box
  • number of total results and number of results per cluster
  • Hyperlinks for cluster headers to zoom in on this cluster
  • Thumbnails with selected metadata
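The e-Culture ranking described above (clusters ordered by path length, items within a cluster ordered by syntactic match times total path weight) can be sketched as follows. This is an illustration only: the relation names and weights in `RELATION_WEIGHTS` are hypothetical, the total path weight is assumed to be the product of the relation weights along the path, and shorter paths are assumed to rank first. None of these details are confirmed by the survey.

```python
from collections import defaultdict
from math import prod

# Hypothetical, manually assigned weight per relation type.
RELATION_WEIGHTS = {"rdfs:label": 1.0, "dc:creator": 0.8, "skos:broader": 0.5}


def rank_results(hits):
    """Group hits by search path and rank clusters and items.

    hits: list of (item, syntactic_match, path), where path is a tuple of
    relation names leading from the matched literal to the item.
    """
    clusters = defaultdict(list)
    for item, match, path in hits:
        # Assumption: total path weight is the product of relation weights.
        path_weight = prod(RELATION_WEIGHTS[relation] for relation in path)
        clusters[path].append((match * path_weight, item))

    # Assumption: clusters with shorter paths come first (ties by path name).
    ordered = sorted(clusters.items(), key=lambda kv: (len(kv[0]), kv[0]))
    return [
        (path, [item for _, item in sorted(scored, key=lambda s: -s[0])])
        for path, scored in ordered
    ]
```

Each returned pair holds a search path and the items found through it, so the interface can show one cluster header per path with the best-scoring items first.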
Falcon-S Functionality
  • Query construction by value selection
  • Images related to values that match the query
  • Select specific value matching query
  • List of items
Interface
  • drop down lists
  • -
  • List of matching values
  • Image+source url
Flink (only person network) Functionality
  • Value selection from person list
  • 1. Resource matching selected value (many owl:sameAs relations). 2. All resources related by foaf:knows. 3. Social network analysis to determine statistical values
  • -
  • Set of triples
Interface
  • Value list (alphabetically grouped)
  • -
  • -
  • Graph of social net
FreeBase Functionality
  • Text-search: multiple keywords
  • Filtering
  • Browse: by domain, type, topic
  • Syntactic matching: exact match
  • Refinement by filters. Filter values are not updated according to current selection
  • Set of items
  • Ranking: syntactic
Interface
  • Search: Text entry field
  • Filtering: multiple property fields
  • Browse: value lists
  • # of results
  • Multiple property fields
  • Label + Type + summary
Ginseng Functionality
Interface
H-DOSE Functionality
  • Multiple keywords
  • "Semantic" Vector Space Model
  • -
  • List of items, Ranked with tf.idf at concept level
  • Spectra of results
Interface
  • Webservice (no GUI)
  • -
  • -
  • URI of result
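Ranking "with tf.idf at concept level", as listed for H-DOSE, means that the weighting is computed over ontology concepts attached to documents rather than over words. The sketch below shows one textbook tf.idf variant under that reading; it is not H-DOSE's actual formula, and the function name `tfidf_rank` and the input layout are assumptions.

```python
import math
from collections import Counter


def tfidf_rank(query_concepts, docs):
    """Rank documents by summed tf.idf of the query concepts.

    docs: {doc_id: list of concept identifiers annotated on that document}.
    Returns doc ids with a positive score, best first.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each concept occur?
    df = Counter(c for concepts in docs.values() for c in set(concepts))

    scores = {}
    for doc_id, concepts in docs.items():
        tf = Counter(concepts)
        score = sum(
            (tf[c] / len(concepts)) * math.log(n_docs / df[c])
            for c in query_concepts
            if tf[c]  # skip concepts that do not occur in this document
        )
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)
```

A concept that is attached to every document gets an idf of zero and therefore does not contribute to the ranking, which is the usual tf.idf behaviour.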
Haystack Functionality
Interface
Hybrid Search Functionality
  • Text search with multiple search terms
  • syntactic matching: ?
  • Retrieval: spreading activation algorithm. Weights are determined by a similarity and specificity measure, plus weights manually assigned by relation type.
  • Refinement: Related keywords
  • Set of items clustered by type.
  • Ranking based on activation.
Interface
  • Text entry box
-
  • list of keywords
  • Item presented by title and visually grouped by type
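The spreading activation retrieval listed for Hybrid Search can be illustrated with a generic sketch. The code below is not the system's actual algorithm: the `decay`, `threshold` and `max_steps` parameters are assumptions, and the edge weights are simply taken as given instead of being derived from similarity and specificity measures.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Propagate activation from seed nodes through a weighted graph.

    graph: {node: [(neighbor, weight), ...]}
    seeds: {node: initial activation}, e.g. the nodes matching the keywords.
    Returns (node, activation) pairs, highest activation first.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, []):
                pulse = value * weight * decay
                if pulse < threshold:
                    continue  # too weak to propagate further
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + pulse
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: -kv[1])
```

Ranking the result set by the returned activation values corresponds to the "ranking based on activation" entry above.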
InWiss Functionality
Interface
KIM Functionality
  • Text search with multiple terms and Lucene operators
  • Pattern search consisting of a structured query and a search term
  • Value selection from facets
  • Facet value autocompletion search
  • syntactic match: string match on extracted entities and metadata
  • Retrieval (keyword search): resources with matching literal value
  • Retrieval (pattern search): resources with matching literal value and exact query match
  • Retrieval (value selection): resources with selected value as metadata
  • Refinement (keyword search): add search term for other metadata field
  • Refinement (value selection): Add new facet value or related concept
  • Set of items
Interface
  • Text search: Form with text entry Boxes for title,keyword,author and content
  • Pattern search: a complex form representing the structure of the query
  • Value selection: facets with value list and text entry box for autocompletion. The facets that are shown in the interface can be manually selected.
  • Number of matching documents.
  • Selected terms
  • Similar to initial search, but with available facet values updated to current selection.
  • Item presented by title and date
Longwell Functionality
  • Value selection from facets
  • Facet value autocompletion search
  • Text search with single search term
  • syntactic matching (autocompletion): prefix
  • syntactic matching (keyword search): subword
  • Retrieval (keyword search): resources with matching literal value
  • Retrieval (value selection): resources with selected value as metadata for specific facet
  • Facet values are updated to current selection
  • Add new facet value or search term
  • Set of items
Interface
  • Text search: text entry box
  • Value selection: facets with value list and text entry box for autocompletion. All facets are shown, but the value list can be hidden.
  • Loading message at every click
  • Number of total results
  • Number of results for each facet value
  • Selected facet values
  • Similar to initial search
  • Fresnel
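The faceted behaviour listed for Longwell (retrieving resources that carry the selected value, then updating the facet values and their result counts to the current selection) can be sketched in a few lines. This is a generic illustration rather than Longwell's implementation; the item layout (a dict mapping each facet name to a list of values) and the names `select` and `facet_counts` are assumptions.

```python
from collections import Counter


def select(items, constraints):
    """Keep items whose metadata contains every selected (facet, value) pair."""
    return [
        item for item in items
        if all(value in item.get(facet, ()) for facet, value in constraints)
    ]


def facet_counts(items, facets):
    """For each facet, count how many of the given items carry each value."""
    return {
        facet: Counter(value for item in items for value in item.get(facet, ()))
        for facet in facets
    }
```

Calling `facet_counts` on the output of `select` yields only the values that still occur in the current result set, together with the number of results each would give, which is what the interface displays next to every facet value.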
mspace Functionality
  • Value selection from facets
  • Facet value autocompletion search
  • syntactic match (autocompletion): prefix
  • Retrieval: results related to selected value by predefined graph paths.
  • Refinement: select new value from facet
  • Change order of facets to construct different view
  • Selected item
Interface
  • Value selection: facets with value list and text entry box for autocompletion. Visible facets can be manually selected.
  • Selected facet values are highlighted
  • Facets are draggable to change order
  • All related values of the result item are shown.
Museum Finland Functionality
  • Value selection from facets
  • Text search with single search term
  • syntactic matching: subword
  • Retrieval (keyword search): resources with matching literal value
  • Retrieval (value selection): resources with selected value as metadata or with a narrower concept as metadata for specific facet
  • Disambiguation: keyword search matches by use (facet in which they occur as a value)
  • Refinement: add value from new facet or select more specific value from active facet
  • Exploration (if a single result is selected): related results have similar values for predefined properties or paths
  • Set of items
Interface
  • Text search: text entry box
  • Value selection: facets with value list
  • Number of results for each result cluster
  • Number of results for each facet value
  • Selected facet values
  • Similar to initial search
  • Thumbnail with selected metadata
OntoKhoj Functionality
  • Single keyword
  • Syntactic matching: -
  • Semantic processing: First try direct values, else synonym, else hypernym
  • Disambiguation through Wordnet senses
  • List of items
  • Ranking: Naive Bayes, TFIDF, Probabilistic Indexing and K-Nearest neighbor
Interface
  • -
  • -
  • -
  • -
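The fallback order listed for OntoKhoj (direct values first, then synonyms, then hypernyms) can be sketched as a chain of lookups. The code below is an illustration under stated assumptions: plain dictionaries stand in for the ontology index and for the WordNet synonym and hypernym relations, and the function name `lookup` is hypothetical.

```python
def lookup(term, index, synonyms, hypernyms):
    """Return (strategy, matches) for the first strategy that yields matches.

    index:     {term: [ontology URIs containing that term]}
    synonyms:  {term: [synonymous terms]}   (stand-in for WordNet)
    hypernyms: {term: [more general terms]} (stand-in for WordNet)
    """
    strategies = (
        ("direct", [term]),
        ("synonym", synonyms.get(term, [])),
        ("hypernym", hypernyms.get(term, [])),
    )
    for name, candidates in strategies:
        matches = [uri for c in candidates for uri in index.get(c, [])]
        if matches:
            return name, matches
    return "none", []
```

Returning the name of the strategy alongside the matches lets a caller report whether a result came from the original term or from a broadened one.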
OntoSearch Functionality
  • Keyword
  • Google search on content of RDFS files
  • -
  • List of items
Interface
  • Text entry box
  • -
  • -
  • URI
OntoWiki Functionality
  • Text search: multiple keywords
  • Value selection: from class hierarchy
  • Value selection from filters (facets)
  • Syntactic matching: subword
  • Semantic processing: direct values
  • Add value from filter (facet)
  • List of items
  • Text search results are ranked on # of matches
Interface
  • Text search: text entry field
  • Value selection: Nested list
  • Value selection: faceted filters with drop down list
  • Number of results
  • Selected class
  • Similar to initial search
  • Label + Type
  • Interactively add column to result table with other properties
Open Academia Functionality
  • Text search with single search term for metadata field
  • Value selection from metadata fields
  • syntactic matching: subword
  • Retrieval (keyword search): Resources with matching literal for specified metadata field
  • Retrieval (value selection): resources with selected value as metadata for specified field
  • Values in metadata fields are updated to selection
  • Add new search term or metadata field
Set of items.
Interface
  • Text search: search form
  • Value selection: drop down lists for fixed set of fields
  • Number of total results
  • Processing time
  • same as initial search
Different visualization tools: tag cloud, topic graph, social net, timeline, clustermap and relation graph
OWLIR Functionality
Interface
QuizRDF Functionality
  • Text search with multiple search terms
  • Search options: Class of the provided input. Options for syntactic match, case, exact match, only in title.
  • syntactic matching: defined by match options
  • Retrieval: documents with syntactic match on index table. Index table of a document contains the literal values from all direct annotations.
  • Disambiguation: select class and values for metadata fields used for instances of this class
  • Set of items
  • Ranking by a variant of tf.idf
Interface
  • Text search: text entry field
  • Options: Drop down menu with classes
  • Options: checkboxes for search options
  • Number of results
  • Possible classes of the input are updated to result set
  • Drop down for classes
  • Search form for properties with a literal value range
  • Title of document + all metadata
SemSearch Functionality
  • Text search with multiple search terms
  • Structure: Boolean operators AND/OR. Search engine specific operator ":" to indicate the result target type
  • syntactic matching: subword
  • Interpretation: Based on the sets of resources matching the input a formal query is constructed
  • Retrieval: Resources matching the constructed query. RDFS reasoning over class and property hierarchy.
  • Disambiguation/Refinement: Deselect class/property/instance of matching search terms
  • Set of items
  • Ranking on syntactic match
Interface
  • Text search: Text entry box
  • Number of total results
  • Processing time
  • Form with the matches for each keyword. Checkboxes to toggle them
  • Title + the entities that matched the query + the relation from the keyword matches to the result
Semantic MediaWiki Functionality
  • Text search: multiple keywords
  • Value selection: class
  • Value selection: values from instances of selected class
  • Syntactic matching: full text of articles
  • Syntactic matching: subword of literal values
  • Select namespace of target
  • Set of items
  • Clustered by type
Interface
  • Text search: text entry box
  • Value selection: classes from navigation menu (list)
  • Value selection: instances from alphabetically sorted value list
  • Number of results
  • List of checkboxes
  • Label + bytes of result + property and Matching value + summary
SHOE search tool Functionality
  • Text search: multiple keywords per field
  • Value selection: from class hierarchy
  • Syntactic matching: exact match
  • Semantic processing: direct values
  • -
  • List of items
Interface
  • Text search: search form
  • Value selection: nested list
  • Number of results
  • -
  • Table with selected properties
Slashfacet Functionality
  • Value selection from facets
  • Facet value autocompletion search
  • Global facet autocompletion search
  • syntactic matching (autocompletion): prefix
  • Retrieval (value selection): resources with selected value as metadata for specific facet or with a narrower concept
  • Facet values are updated to current selection
  • Complex query paths can be constructed through interactive interface
  • Disambiguation (global facet search): by use of value (facet in which the value occurs)
  • Disambiguation (in facet search): by location in the value hierarchy
  • Refinement: Add a new facet value
  • Set of items
  • Clustered by manually selected property
Interface
  • Value selection: facets with value list and text entry box for autocompletion. All facets are shown, but the value list can be hidden.
  • Global facet search: text entry box with autocompletion dropdown list
  • Loading message at every click
  • Number of results per cluster
  • Number of results for each facet value
  • Selected facet values are highlighted
  • Disambiguation (global search): grouped by class
  • Disambiguation (in facet search): value in hierarchy shown as unfolded tree
  • Refinement: similar to initial search
  • Thumbnail with selected metadata
Squiggle Functionality
  • Text search with multiple search terms
  • syntactic matching: Lucene search engine.
  • Retrieval: Resources with matching literal value. After disambiguation by selecting a concept, resources are matched to all literal values known for the selected concept.
  • Multiple terms in a query are interpreted disjunctively. Conjunctive queries on concepts can be made by selecting multiple values from the suggestions.
  • Disambiguation: by matching URI and by rdf:type
  • Exploration: related concepts grouped by rdf:type
  • Set of items
Interface
  • Text search: text entry box
  • Total number of results
  • Processing time
  • Hits per matching literal
  • Disambiguation: List of concepts with checkbox
  • Exploration: List of concepts
  • Thumbnail or title + selected metadata
Squirrel (in progress) Functionality
  • Text search: Multiple keywords
  • Select instances of matching Class
  • Refine result set by topic
  • List of result types including most relevant instances
  • Ranking: statistical measure of syntactic match. Optionally rank according to user profile.
Interface
  • Text entry box
  • Number of results per type
  • List of values
  • Label and result count
Swoogle Functionality
  • Text search: Search term or URI
  • Structure: boolean operators AND,OR. Specific constructs to indicate domain for syntactic match: in URI, namespace, local name, literal values
  • syntactic match: subword
  • Retrieval (ontology): contains resource with matching literal value
  • Retrieval (term): resource with matching literal value
-
  • Set of items
  • Ranking: OntoRank [explain] and TermRank [explain]
Interface
  • Text search: Text entry box
  • Options: result type (document, ontology, term)
  • Number of total results
  • Processing time
- rdfs:label for terms and URI for documents + selected metadata
SWSE Functionality
  • Text search: Multiple search terms
  • syntactic match: subword
  • Retrieval: resources with direct matching literal value
  • Search results refinement: select target type
  • Set of items
  • Ranking: ReConRank (PageRank on full RDF graph + context graph)
Interface
  • Text search: Text entry box
  • Number of results
  • Processing time
  • Search results refinement: drop down list with Classes
  • URL + rdf:type + number of triples in which URI occurs
Tap: Semantic Search Functionality
  • Text search with max two search terms
  • syntactic matching: subword
  • Retrieval (semantic search): Full graph search. Restricted to manually assigned properties for each class
  • Retrieval (web crawling): Literal values found in the graph search are used to crawl preselected websites for additional information
  • Refinement: select a specific view by topic
  • Exploration: the semantic search results augment the results from a traditional search engine
  • Set of items
  • Clustering: by type
Interface
  • Text search: Text entry box of host search engine
- (see results)
  • Results are presented alongside traditional search results
  • Template for each result class
Topia Functionality
  • Single keyword
  • Syntactic match: subword
  • Semantic processing: Direct values
  • Refine by metadata value from selected property
  • List of items
Interface
  • Text entry box
  • Advanced options to control clustering algorithm, including weights for the properties.
  • Number of results per cluster
  • Score for each possible cluster
  • Ranked list of properties (clusters)
  • Once a property is selected, the values are presented in a tree
  • Result list with Labels for each result
  • Focus on particular result shows detailed view including image