Combining the Web of documents with the Web of data to generate enhanced snippets

Authors

Pierre-Edouard Portier, Mazen Alsarem, Sylvie Calabretto

Abstract

We enhance an existing search engine's snippet (i.e., the excerpt from a web page, determined at query time, that efficiently expresses how the page may be relevant to the query) with linked data (LD) in order to highlight nontrivial relationships between the user's information need and LD resources related to the result page. Given a query, we first retrieve the top-ranked web pages from the search engine results page (SERP). For each result, we build an RDF graph by combining DBpedia Spotlight [2] and an RDF endpoint connected to the DBpedia dataset. To each resource of the graph we associate the text of its DBpedia abstract. Given the initial result and this textually enhanced graph, we introduce an iterative co-clustering approach in order to add edges between related resources. Then, we apply a first PARAFAC tensor decomposition [1] to the graph in order to select the most promising nodes for a one-hop extension from a DBpedia SPARQL endpoint. Finally, we compute a second tensor decomposition to find hubs and authorities for the most relevant types of predicates. From this graph analysis, we build the enhanced snippet.

[1] Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51(3), 455–500 (2009)

[2] Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems. pp. 1–8. I-Semantics ’11, ACM (2011)
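
To make the last step of the abstract more concrete, the following is a minimal sketch of how a PARAFAC-style decomposition can yield hub and authority scores together with predicate weights. It is not the authors' implementation: the toy triples, the function names (build_tensor, rank_one_parafac), the restriction to a single component, and the all-ones initialization are assumptions made for illustration only. The graph is encoded as a binary tensor indexed by (subject, object, predicate), and a rank-one PARAFAC model is fitted by alternating least squares, which for one component reduces to the higher-order power method.

```python
import numpy as np

# Toy triples (subject, predicate, object); illustrative data, not from the paper.
TRIPLES = [
    ("Paris", "country", "France"),
    ("Lyon", "country", "France"),
    ("France", "partOf", "Europe"),
]


def build_tensor(triples):
    """Encode triples as a binary tensor X[subject, object, predicate]."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    predicates = sorted({p for _, p, _ in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    p_idx = {p: i for i, p in enumerate(predicates)}
    x = np.zeros((len(entities), len(entities), len(predicates)))
    for s, p, o in triples:
        x[e_idx[s], e_idx[o], p_idx[p]] = 1.0
    return x, entities, predicates


def rank_one_parafac(x, n_iter=50):
    """Fit a single PARAFAC component by alternating least squares.

    Each update contracts the tensor with the two other score vectors and
    renormalizes. Returns unit-norm hub (subject), authority (object) and
    predicate scores, plus the weight of the component.
    """
    # Assumption: all-ones start, as in the classical HITS iteration.
    hubs = np.ones(x.shape[0])
    auths = np.ones(x.shape[1])
    preds = np.ones(x.shape[2])
    for _ in range(n_iter):
        hubs = np.einsum("sop,o,p->s", x, auths, preds)
        hubs /= np.linalg.norm(hubs)
        auths = np.einsum("sop,s,p->o", x, hubs, preds)
        auths /= np.linalg.norm(auths)
        preds = np.einsum("sop,s,o->p", x, hubs, auths)
        preds /= np.linalg.norm(preds)
    weight = float(np.einsum("sop,s,o,p->", x, hubs, auths, preds))
    return hubs, auths, preds, weight


x, entities, predicates = build_tensor(TRIPLES)
hubs, auths, preds, weight = rank_one_parafac(x)

print("top authority:", entities[int(np.argmax(auths))])
print("top predicate:", predicates[int(np.argmax(preds))])
print("hub scores:", dict(zip(entities, np.round(hubs, 3))))
```

On this toy graph the dominant component groups Paris and Lyon as hubs that point to the authority France through the predicate country. A decomposition with several components would produce one such hub/authority grouping per component, each associated with its own mix of predicates; how the components are chosen and ranked in the actual system is not specified in the abstract.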

Mashup online

http://demo.ensen-insa.org