ppi 201502ZU4659

continuidad de la revista impresa

ISSN 0254-0770 / e-ISSN 2477-9377/ Depósito legal pp 197802ZU38

UNIVERSIDAD DEL ZULIA

Una Revista Internacional Arbitrada

que está indizada en las publicaciones

de referencia y comentarios:

• REDALYC

• REDIB

• SCIELO

• DRJI

• INDEX COPERNICUS INTERNATIONAL

• LATINDEX

• DOAJ

• REVENCYT

• CHEMICAL ABSTRACT

• MIAR

• AEROSPACE DATABASE

• CIVIL ENGINEERING ABTRACTS

• METADEX

• COMMUNICATION ABSTRACTS

• ZENTRALBLATT MATH, ZBMATH

DE LA FACULTAD DE INGENIERÍA

VOLUMEN 45 MAYO - AGOSTO 2022 NÚMERO 2

REVISTREVISTA

A TÉCNICA

• ACTUALIDAD IBEROAMERICANA

• PERIODICA

• BIBLAT

Rev. Téc. Ing. Univ. Zulia. Vol. 45, Nº 2, Mayo - Agosto, 2022, 133-144

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Skyline Queries in SPARQL: An Overview

Marlene Goncalves Da Silva1, Ana Isabel Aguilera Faraco*2

1Departamento de Computación y Tecnología de la Información, Universidad Simón Bolívar,

Caracas, Venezuela, Apartado 89000, Caracas, Venezuela.

2Escuela de Ingeniería Informática, Facultad de Ingeniería, Universidad de Valparaíso,

Valparaíso, C.P. 2340000, Chile

*Autor de correspondencia: ana.aguilera@uv.cl

https://doi.org/10.22209/rt.v45n2a06

Recepción: 02 de Febrero 2022 | Aceptación: 29 de abril de 2022 | Publicación: 01 de mayo de 2022

Abstract

The growth of RDF (Resource Description Framework) datasets and the expansion of their use in conjunction

with the definition of SPARQL, a declarative query language, have made RDF data management an active area

of research and development. In this regard, mechanisms have been proposed to help users find their desired

answers in less time, including ranking methods and preference-based queries. Skyline queries constitute one of

the most practical and predominant types of preference-based queries. The aim of this work was to provide a

guide to specifying SPARQL skyline queries using syntax proposed in state-of-the-art works, and SPARQL

versions 1.0 and 1.1. The results show the possibility of rewriting skyline queries in SPARQL to express

preferences. We plan to develop a tool to translate SPARQL skyline queries applying the different grammars

proposed, into SPARQL 1.0 and 1.1 with the aim of providing an automatic mechanism of translation.

Keywords: databases; data formats; data processing; orogramming languages; skyline query; SPARQL.

Consultas Skyline en SPARQL: Una Visión General

Resumen

El crecimiento de los conjuntos de datos RDF (Resource Description Framework) y la expansión de su uso junto

con la definición de SPARQL, un lenguaje de consulta declarativo, han convertido la gestión de datos RDF en

un área activa de investigación y desarrollo. En este sentido, se han propuesto mecanismos para ayudar a los

usuarios a encontrar las respuestas deseadas en menos tiempo, incluidos métodos de clasificación y consultas

basadas en preferencias. Las consultas Skyline constituyen uno de los tipos más prácticos y predominantes de

consultas basadas en preferencias. El objetivo de este trabajo consistió en proporcionar una guía para especificar

consultas de Skyline SPARQL, utilizando la sintaxis propuesta en trabajos de última generación y SPARQL en

las versiones 1.0 y 1.1. Los resultados muestran la posibilidad de reescribir consultas de Skyline en SPARQL

para expresar preferencias. Se plantea desarrollar una herramienta para traducir las consultas de horizonte

SPARQL, aplicando las diferentes gramáticas propuestas, en SPARQL 1.0 y 1.1, con el objetivo de proporcionar

un mecanismo automático de traducción.

Palabras clave: bases de datos; formatos de datos; lenguajes de programación; procesamiento de datos, skyline

query; SPARQL.

Goncalves Da Silva y Aguilera Faraco. 134

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Consultas Skyline no SPARQL: uma visão geral

Resumo

O crescimento de conjuntos de dados Resource Description Framework (RDF) e a expansão de seu uso

juntamente com a definição de SPARQL, uma linguagem de consulta declarativa, tornaram o gerenciamento de

dados RDF uma área ativa de pesquisa e desenvolvimento. Nesse sentido, têm sido propostos mecanismos para

ajudar os usuários a encontrar as respostas desejadas em menos tempo, incluindo métodos de classificação e

consultas baseadas em preferências. As consultas de horizonte são um dos tipos mais práticos e predominantes

de consultas baseadas em preferências. O objetivo deste trabalho foi fornecer um guia para especificar consultas

Skyline SPARQL, utilizando a sintaxe proposta em trabalhos de última geração e SPARQL nas versões 1.0 e

1.1. Os resultados mostram a possibilidade de reescrever consultas Skyline no SPARQL para expressar

preferências. Propõe-se desenvolver uma ferramenta de tradução de consultas de horizonte SPARQL, aplicando

as diferentes gramáticas propostas, em SPARQL 1.0 e 1.1, com o objetivo de fornecer um mecanismo de

tradução automática.

Palavras-chave: Bases de dados; formatos de dados; linguagens de programação; processamento de dados,

consulta de horizonte; SPARQL.

Introduction

Semantic data on the Web has increased over the past years. This data is mainly based on the Resource

Description Framework (RDF) and the current state of technologies and techniques in cloud computing (Elzein

et al., 2018); RDF is a data model for representing information about World Wide Web resources. Being a

mature widely tested and robust technology for modeling data, RDF provides a foundation for publishing and

linking data (Ontotext, 2020). RDF data representation allows information to be identified, disambiguated and

interconnected by software agents and different systems. The growth of RDF datasets and the expansion of their

use in conjunction with the definition of a declarative query language called SPARQL, defined by W3C (World

Wide Web Consortium), have made RDF data management an active area of research and development, and a

number of data management systems have been developed for this purpose (Zou and Özsu, 2017). Actually,

RDF datasets exceed billions of triples and continue to grow in terms of both number of repositories and their

sizes (Elzein et al., 2018). SPARQL, a recursive acronym for SPARQL Protocol and RDF Query Language, is a

query language for RDF, also called a semantic query language, used to retrieve data and give precise results

(Kostylev et al., 2015). SPARQL was announced as a new standard by RDF Data Access Working Group in

2008 (Prud‟hommeaux and Seaborne, 2008). Due to the growing amount of linked data, the importance of

semantic search engines for retrieving information has increased. The traditional search model of finding links

on the Web is unsatisfactory for the increasingly complex tasks that seek to leverage the diverse, increasingly

structured and semantically annotated data sets found on the Web (Sessoms and Anyanwu, 2014). The semantic

web search engines that have provided a query language, SPARQL, for processing and running queries on their

indexed data, require some mechanisms for ranking SPARQL query results besides the ranking methods applied

to keyword queries, in order to help users find their desired answers in less time (Feyznia et al., 2014). Other

mechanisms consider preference-based queries, which show encouraging results for personalizing and filtering

the massive amount of information residing in today‟s databases and information systems (Abidi et al., 2018).

Among the types of preference-based queries that have been most extensively studied are skyline queries

(Borzsonyi et al., 2001), which constitute one of the most practical and predominant types of preference-based

queries (Gulzar et al., 2019). They return the most interesting objects according to the user‟s criteria based on

the Pareto dominance operator (Abidi et al., 2017). Skyline queries are typically used in multi-criteria decision-

making applications to find answers that are of interest to a user (Keles and Hose, 2019). Other applications

include, but are not limited to, decision support systems, recommendation systems, and databases. Even though

skyline queries have been extensively studied on relational data in the database community, little attention has

yet been paid to research on how the skyline principle can help identify sets of interesting entities in knowledge

graphs and, in particular, in RDF queries (Keles and Hose, 2019). The aim of this work is to provide a guide to

specifying SPARQL skyline queries both in the syntax proposed by different authors, and in SPARQL 1.0 and

1.1.

Skyline Queries in SPARQL: An Overview 135

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

The structure of this paper is as follow: section II introduces the background knowledge necessary to

understand the topics related to this work (skyline queries, RDF and SPARQL); section III presents related

works; section IV describes our approach; and finally section V concludes the paper and gives insights for future

work.

Background

In this section, we define the skyline operator, Resource Description Framework (RDF) and the

SPARQL query language before explaining the syntax for specifying skyline queries.

Skyline

The skyline operator filters a set of interesting tuples from a relational database. A tuple is interesting if it is not

dominated by any other tuple. A tuple dominates another tuple if it is as good or better in all attributes and better

in at least one attribute. Börzsönyi et al. (2001) incorporated the SKYLINE OF clause in a SQL command as

follows:

SELECT <attributes>

FROM <relations>

WHERE <conditions>

GROUP BY <attributes>

HAVING <conditions>

SKYLINE OF d1 [MIN|MAX|DIFF],..., dn [MIN|MAX|DIFF];

where d1,…, dn denote skyline dimensions or attributes.

In addition, MIN, MAX, and DIFF indicate if the dimension value is minimized, maximized, or

different respectively. Börzsönyi et al. (2001) formalized the dominance relationship and the skyline set in

Definitions 1-2.

Definition 1 (dominance): Let SKYLINE OF d1 MIN, ..., dl MIN, dl+1 MAX, ..., dm MAX, dm+1 DIFF, ..., dn

DIFF a clause of a skyline query.

A tuple t = (t1, . . . , tl, tl+1, . . . , tm, tm+1, . . . , tn) dominates a tuple u = (u1, . . . , ul, ul+1, . . . , um, um+1, . . . , un)

if and only if:

● ti ≤ ui for all i = 1, . . . , l

● ti ≥ ui for all i = (l + 1) , . . . , m

● ti = ui for all i = (m + 1) , . . . , n

If ti = ui for all i = 1, . . ., n, then t and u are incomparable and both are skyline if no DISTINCT is specified.

Definition 2 (skyline): Let T be a set of tuples t1, . . . , tp. The skyline S is the set of tuples from M, such that

there is no tuple ti that dominates any tuple in S.

Resource description framework

Resource Description Framework (RDF) is a standard model for data interchange on the Web (RDF,

2021). RDF is a graph data model that formally describes the semantics, or meaning, of information. It consists

of a labeled, directed graph of relations between resources and literal values. It is composed by triples based on

an Entity-Attribute-Value (EAV) model, in which the subject is the entity, the predicate is the attribute, and the

object is the value. Each triple has a unique identifier known as the Internationalized Resource Identifier, or IRI.

IRIs look like web page addresses. The parts of a triple, the subject, predicate, and object, represent links in a

graph. Figure 1 shows an example of an RDF graph for Twitter data. For this case, the resource

https://twitter.com/Commercial_Crew/status/1326274591564718080 is a tweet with the post “Such a privilege to

work with people I like & respect so much. I feel blessed” created on 2020-11-11 by Elon Musk. Mr Ellon Musk

is a user who joined the social network on June, 2020 and owner of the https://twitter.com/elonmusk account.

This account has 39.8 million of followers and 96 followings.

Goncalves Da Silva y Aguilera Faraco. 136

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Figure 1. A Twitter resource description framework graph.

SPARQL

SPARQL (SPARQL Protocol and RDF Query Language), is a query language for RDF (Křemen,

2018). SPARQL is a semantic query language for databases able to retrieve and manipulate data stored in RDF.

SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF

or viewed as RDF via middleware.

The structure of a SPARQL query comprises (Feigenbaum, 2009):

● Prefix declarations, for abbreviating IRIs.

● Dataset definition, stating what RDF graph(s) are being queried.

● A result clause, identifying what information to return from the query.

● The query pattern, specifying what to query for in the underlying dataset.

● Query modifiers, slicing, ordering, and otherwise rearranging query results.

A general structure for a SPARQL query is as follow (Feigenbaum, 2009):

# prefix declarations

PREFIX foo: <http://example.com/resources/>

...

# dataset definition

FROM ...

# result clause

SELECT ...

# query pattern

WHERE {

...

}

# query modifiers

ORDER BY …

Twitter:account

Such a privilige to work with people l like & respect so much. I feel blessed.

Twitter:tweet

http://Twitter.co

m/elonmusk

Elon

Musk

39.8M

@elonmu

184 mil

6.1 mil

11/11/2020

June,200

dc:text

dc:favourites_count

http://Twitter.com/

Commercial_Crew/status/

1326274591564718080

rdf:type

dc:retweet

dc:date

Twitter:creator

Twitter:use

dc:followings

count

dc:followers_count

dc:nick

dc:join

Skyline Queries in SPARQL: An Overview 137

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Currently, SPARQL is the standard query language for RDF data. The W3C specification of the first

version of SPARQL was SPARQL 1.0 (Prud‟hommeaux and Seaborne, 2008), which was published in January

2008. This version defines the fundamental elements of the language, mainly the notion of graph patterns. In

March 2013, SPARQL 1.1 (Harris and Seaborne, 2013) was released and its specification defines operators that

allow more complex queries such as aggregation, sub-queries and path queries.

SPARQL 1.1 extends SPARQL 1.0 with several advanced features, among the most important we can

mention: explicit operators to express the negation of graph patterns, operators to express path queries, aggregate

operators, sub-queries and federated queries. Particularly, sub-queries allow expressing queries not supported by

SPARQL 1.0. For example, a sub-query allows using the results obtained from the inner query, in particular

when aggregate operators are included. The SPARQL 1.0 specification mentions (Prud‟hommeaux and

Seaborne, 2008), Section 11.4.1) that the negation of graph patterns can be simulated through the combination of

an optional pattern and a filter condition of type !bound().

Related Work

Although Bentley et al. (1978) proposed the first skyline algorithm, referred to as the maximum vector problem,

Börzsönyi et al. (2001) defined the skyline operator in the context of databases. In this work, the authors

introduced a skyline algorithm based on the divide & conquer principle and the Block Nested Loop (BNL)

algorithm where each one of the tuples is compared with non-dominated tuples in a window. Subsequently, SFS

(Sort Filter Skyline) (Chomicki et al., 2003), LESS (Linear Elimination Sort for Skyline) (Godfrey et al., 2005),

and SaLSa (Sort and Limit Skyline algorithm) (Chomicki et al., 2003) were proposed to improve BNL by means

of a monotone preference function that reduces the number of dominance checks. Also, skyline computation

algorithms based on index structures were defined where properties of index structures to compute the skyline

set were exploited in several works (Tan et al., 2001; Kossmann et al., 2002; Papadias et al., 2005; Lee et al.,

2010; Selke and Balke, 2011; Bader, 2012; Endres and Glaser, 2019).

Since continuous growth of the Web, other distributed algorithms have been presented to efficiently

compute the skyline over Web data sources (Balke and Guntzer, 2004; Balke et al., 2004; Alvarado et al., 2013).

These algorithms are twofold, i.e., they build the skyline in two phases: first a superset is constructed, and then,

dominated points are eliminated in a second phase. Each algorithm exploits a specific stopping condition to

terminate the first phase, so as to avoid a full scan of Web data sources. Similarly, Chen et al. (2011) proposed

an algorithm to compute the skyline on RDF documents that have been represented as VTPs (Vertical Table

Partitioning).

More recently, there are some works related to extensions of SPARQL but with qualitative preferences

(Siberski et al., 2006; Troumpoukis et al., 2017; Patel-Schneider et al., 2018) which are more general than the

skyline, being the skyline a particular case of them. Siberski et al. (2006) included preference-based querying

capabilities to SPARQL incorporating the PREFERRING clause into the SPARQL syntax. SPREFQL

(Troumpoukis et al., 2017) is another extension of SPARQL for qualitative preferences. Unlike Siberski et al.

(2006), they support conditional preferences (if-then-else). At the implementation level, they presented a query

rewriting technique that maps from a SPREFQL query to an equivalent SPARQL query by means of the NOT

EXISTS operator. Unfortunately, their solution based on query rewriting does not work correctly due to the fact

that it is based on the SPARQL EXISTS, which has many known problems (Patel-Schneider and Martin, 2016).

Thus, Patel-Schneider et al. (2018) identified and fixed the problem in the previous proposals for acyclic and

transitive preference relations. Finally, Keles and Hose (2019) presented a set of client-based algorithms to

evaluate skyline queries over knowledge graphs using standard query interfaces for RDF, but they did not

consider extending SPARQL. In this work, we focus on the proposals of Siberski et al. (2006), Troumpoukis et

al. (2017) and Patel-Schneider et al. (2018) to specify SPARQL skyline queries in both their syntax, and

SPARQL 1.0 and 1.1.

Approaches

Some works that extend SPARQL with qualitative preferences are Siberski et al. (2006), Gueroussova

et al. (2013), Gueroussova et al. (2013b), Troumpoukis et al. (2017), Patel-Schneider et al. (2018). These works

are based on the winnow operator (Chomicki, 2002), which is a more general operator than skyline. In this

section, we will illustrate how these approaches can be used to express SPARQL skyline queries by using an

example based on Twitter data. Suppose a database containing data from Twitter spambots (MIB, 2016) and a

table named users storing the number of followers (followers_count) and the number of tweets each user has

Goncalves Da Silva y Aguilera Faraco. 138

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

liked in the account‟s lifetime (favourites_count), among other data. Also consider that someone wants to

identify the most followed users who have the highest number of tweets he has liked. A subset of our knowledge

base in Turtle syntax is the following:

@prefix : <http://www.example.org/>.

:LOUISHAIRY a :user; :followers_count 20004;

:favourites_count 15958 .

:CBS6Albany a :user; :followers_count 27856;

:favourites_count 291 .

:BryanBroaddus a :user; :followers_count 52287;

:favourites_count 19 .

:KingKhanBeats a :user; :followers_count 1824;

:favourites_count 36945 .

:lilyfan_ a :user; :followers_count 482;

:favourites_count 9909 .

:Adam_Loko116 a :user; :followers_count 943;

:favourites_count 9355 .

:bakkedahla :user; :followers_count 4558;

:favourites_count 1552 .

:myltuazona :user; :followers_count 498;

:favourites_count 13415 .

According to the interested person, both followers_count and favourites_count are equally important

and relevant; hence, a predefined score function cannot be assigned to be used in a query. A user can be chosen

if and only if there is no other user with a higher number of followers and a higher favourites_count. To select a

user, we must identify the set of all the users that are non-dominated by any other user in terms of two criteria:

maximizing followers_count and maximizing favourites_count; this is our skyline. Following these criteria, the

computed skyline is composed by the users :LOUISHAIRY, :CBS6Albany, :BryanBroaddus, and

:KingKhanBeats are the non-dominated ones, i.e., there is no other user with values better than them in these two

attributes. Additionally, a user  dominates a user , if  has better or equal values and at least one better in

followers_count and favourites_count than , e.g., the user :KingKhanBeats dominates the user :lilyfan_.

Next, we will describe how to specify the SPARQL query for the most followed users who have the

highest number of tweets he has liked, following the syntax for each proposal (Siberski et al., 2006;

Troumpoukis et al., 2017; Patel-Schneider et al., 2018) and then we will detail how to express in an equivalent

SPARQL query. Siberski et al. (2006) were the first to propose the addition of qualitative preference-based

querying capabilities to SPARQL by means of the PREFERRING clause, which contains criteria separated by

the AND construct. The CASCADE keyword can be used to prioritize a preference criterion over another one.

The authors did not deal with query processing/optimization issues although they extended the ARQ query

engine (The Apache Software Foundation, 2019) with BNL as a proof of concept. This implementation is not

available.

The basic SPARQL query structure provides solution modifiers such as group by, order by, limit,

offset, etc. Based on these solution modifiers, Siberski et al. (2006) extends them with a preferring clause. As

our focus is on skyline queries, a preferring clause can be expressed in BNF according to Siberski et al. (2006)

as follows in Algorithm 1.

Algorithm 1. Grammar for SPARQL skyline queries according to Siberski et al. (2006).

‹PreferringClause› ::= ‟PREFERRING‟ ‹MultiDimPref›

‹MultiDimPref› ::= ‹AtomicPref› („AND‟ ‹AtomicPref ›)*

‹AtomicPreference› ::= ‹HighestPref› | ‹LowestPref›

‹HighestPref› ::= „HIGHEST‟ ‹Expression›

‹LowestPref› ::= „LOWEST‟ ‹Expression›

Our example skyline SPARQL query can be expressed in terms of Siberski et al. (2006)‟s syntax as

shown Figure 2.

Skyline Queries in SPARQL: An Overview 139

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Figure 2. SPARQL skyline queries. The skyline of the most followed users who have the highest number of

tweets he has liked according to Siberski et al. (2006)‟s syntax.

Based on Siberski et al. (2006)‟s work, the authors Gueroussova et al. (2013) and Gueroussova et al.

(2013b) proposed an extension of the SPARQL query language called PrefSPARQL, which includes the

expression of conditional preferences and additional atomic preference constructs such as „AROUND‟, „MORE

THAN‟, „LESS THAN‟, and „BETWEEN‟. Since preferences semantically filter the solution set, they add

preferences at the level of filters instead of solution modifiers. A preferring clause for skyline queries can be

expressed in BNF as in Algorithm 2.

Figure 3. SPARQL skyline queries. The skyline of the most followed users who have the highest number if

tweets he has liked, following the PrefSPARQL grammar.

Algorithm 2. PrefSPARQL grammar.

‹Filter› ::= „FILTER‟ ‹Constraint› |

„PREFERRING‟ „(‟ ‹MultiDimPref› „)‟

‹MultiDimPref› ::= ‹AtomicPref› („AND‟ ‹AtomicPref ›)*

‹AtomicPref› ::= ‹HighestPref› | ‹LowestPref ›

‹HighestPref› ::= „HIGHEST‟ ‹Expression›

‹LowestPref› ::= „LOWEST‟ ‹Expression›

Following the PrefSPARQL grammar, our example skyline SPARQL query is specified in Figure 3.

They also show how queries can be rewritten in SPARQL 1.1 and SPARQL 1.0 in order to perform skyline

queries using existing SPARQL query engines. P PREFERRING Pref can be expressed in SPARQL 1.1 as P

FILTER NOT EXISTS {P‟ FILTER (tr(P, P‟, Pref))} where P is a SPARQL pattern, Pref represents preference

criteria, P‟ is the same graph pattern than P but with all variables renamed as fresh variables, and tr is a

translation function that translates the dominance check condition according to Definition 1. Similar to nested

SQL query proposed by Börzsönyi et al. (2001), the condition within FILTER identifies the dominated ones and

FILTER NOT EXIST discards them from the answer. Figure 4 illustrate our example skyline SPARQL query

translated to SPARQL 1.1. In this example, P is “?u a :user ;:followers_count

?followers_count;:favourites_count ?favourites_count” (lines 2-4); P‟ is ?u_ a :user;:followers_count

?followers_count_;:favourites_count ?favourites_count_” (lines 6-8); and tr(P, P‟, Pref) is “?followers_count_

>= ?followers_count && ?favourites_count_ >= ?favourites_count && (?followers_count_ > ?followers_count

||?favourites_count_ > ?favourites_count)” (lines 9-12).

Goncalves Da Silva y Aguilera Faraco. 140

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Figure 4. SPARQL 1.1. skyline queries. The skyline of the most followed users who have the highest number of

tweets he has liked.

For P‟, the character "_" was added to each variable name of P. Lines 9-12 specify the dominance

check. Lines 5-12 filters a set of dominated instances. An instance dominates another instance if it is as good or

better in all attributes and better in at least one attribute (lines 9-12). In addition, to translate skyline queries in

SPARQL 1.0, we can replace NOT EXISTS by a combination of OPTIONAL and FILTER(!bound). P

PREFERRING Pref can be expressed in SPARQL 1.0 as: P OPTIONAL {P‟ FILTER (tr(P, P‟, Pref)) [ ] ?check

[ ]} FILTER (!bound(?check)) where {[ ] ?check [ ]} is an auxiliary triple pattern that represents any predicate in

P‟ and ?check is a fresh variable that is used to bind and thus, to verify for instance, the non-existence of

instances better than it. As with SPARQL 1.1, P and P‟ represent SPARQL patterns, Pref the preference criteria,

and tr is the translation function. Figure 5 illustrates our example skyline SPARQL query translated to SPARQL

1.0. FILTER within the OPTIONAL clause allows performing pairwise dominance checks for each pair of

instances (lines 11-15) and the FILTER in line 16 verifies the instance is not dominated. If ?u_ is bound, this

means that it is dominated because lines 11-15 found a better instance than ?u). Similar to nested SQL queries

proposed by Börzsönyi et al. (2001), the condition within FILTER identifies the dominated ones and FILTER

NOT EXIST discards them from the answer.

In this example, P is “?u a :user ;:followers_count ?followers_count;:favourites_count ?favourites_count” (lines

2-4); P‟ is ?u_ a :user;:followers_count ?followers_count_;:favourites_count ?favourites_count_” (lines 8-10);

and tr(P, P‟, Pref) is “?followers_count_ >= ?followers_count && ?favourites_count_ >= ?favourites_count &&

(?followers_count_ > ?followers_count ||?favourites_count_ > ?favourites_count)” (lines 11-15).

Figure 5. SPARQL 1.0 skyline queries. The skyline of the most followed users who have the highest number of

tweets he has liked.

Skyline Queries in SPARQL: An Overview 141

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Subsequently, Troumpoukis et al. (2017) proposed SPREFQL as another extension of SPARQL for

qualitative preferences. Their work comes nearer to Chomicki (2002)‟s framework than Siberski et al. (2006)

because it allows the expression of extrinsic preferences whose formulas may refer both to built-in predicates

(e.g., equality, inequality, and arithmetic comparison operations) on the basis of tuples and to other constructors

such as database relations. Although any query in Siberski et al. (2006) and Gueroussova et al. (2013) can be

expressed in SPREFQL, reverse translation is not always possible. The authors introduced in Troumpoukis et al.

(2017) a couple of cases where a query expressed in SPREFQL cannot be specified in Siberski et al. (2006) and

Gueroussova et al. (2013). At the implementation level, they presented a query rewriter that maps from a

SPREFQL query to an equivalent SPARQL query by means of the NOT EXISTS operator. Also, they

experimentally study the performance of NL (Nested Loops), BNL and query rewriting; NL is a naive algorithm

that compares each input tuple against all input tuples and whose computational complexity is quadratic. NL has

the worst performance while BNL outperforms query rewriting in 6 out of 7 queries. They implemented an open-

source prototype of SPREFQL (Bitbucket, 2021) which is available.

The PREFER clause is after the group-by/having clauses and before the limit/offset clauses. A PREFER

clause for skyline queries can be expressed in EBNF as in Algorithm 3. All non-terminals that are not defined in

this table are defined by standard SPARQL syntax.

Algorithm 3. Prefer grammar.

‹SolutionModifier› ::= [‹GroupClause›] [‹HavingClause›] [‹PreferClause›] [‹OrderClause›]

[‹LimitOfsetClauses›]

‹PreferClause› ::= „PREFER‟ ‹VarList› „TO‟ ‹VarList› „IF‟

‹ParetoPref›

‹VarList› ::= ‹Var› | „(‟ ‹Var› + „)‟

‹ParetoPref› ::= ‹SimplePref› [ „AND‟ ‹ParetoPref› ]

‹SimplePref› ::= ‹Constraint›

Expressing a skyline query in SPREFQL is quite similar to specifying it with the condition of

Gueroussova et al. (2013) and Gueroussova et al. (2013b) proposal to rewrite a preference-based query in

SPARQL. The condition for pair-wise dominance checks within the FILTER NOT EXISTS or OPTIONAL

FILTER(!BOUND) is the same as that expressed in the condition of the IF.

Variable names are assigned to two binding sets that can be distinguished from each other through the

PREFER clause. The first binding set refers to the preferred ones while the second is the dominated ones. Then,

the "IF" clause expresses the conditions that make the first binding set dominate the second one. Each variable

name in the PREFER clause maps to variables in order of appearance. For example, there are four bindings in

each result, (?u ?followers_count ?favourites_count), in the query of Figure 6. Variables in (?u1

?followers_count1?favourites_count1) are assigned to the first binding set while (?u2 ?followers_count2

?favourites_count2) includes variables for the second binding set. All these variables are used in the IF clause to

check the dominance of the first binding set over the second.

Figure 6. SPARQL skyline queries. The skyline of the most followed users who have the highest number of

tweets he has liked, following SPREFQL grammar.

Similar to Gueroussova et al. (2013), Troumpoukis et al. (2017) proposed the translation from a

SPREFQL query to SPARQL 1.1. A query SELECT L WHERE {P} PREFER L1 TO L2 IF C can be expressed

SPARQL 1.1. as SELECT L WHERE {P FILTER NOT EXISTS {P{L/L2} FILTER C{L2/L} where P{L/L1} is

equal to P but replacing all variable names of P that appear in L with its corresponding variable in L1, and

Goncalves Da Silva y Aguilera Faraco. 142

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

C{L2/L} is equal to C but replacing all variable names of L2 with its corresponding variable in L. For our

motivational example, the query is translated as in Figure 4.

Conclusion

In this article, we have described the syntax for specifying SPARQL skyline queries following the grammar

proposed by authors of state-of-art works. Each author proposes a different grammar and implements his own

tool to evaluate this type of query. Despite the fact that some proposals have been made in recent years, there is

no standard language for expressing skyline queries in SPARQL. Therefore, if a user wants to evaluate a

SPARQL skyline query, he must select the grammar and the tool to execute it. An alternative is to rewrite the

query in SPARQL in version 1.0 or 1.1 and have it executed by any SPARQL engine, giving the user a range of

options among the tools, from which to choose. This article summarises a guide to specifying SPARQL skyline

queries to express preferences with different alternatives at the user‟s convenience. Finally, we plan to develop a

tool to translate SPARQL skyline queries using the different grammars proposed, into SPARQL 1.0 and 1.1 with

the aim of providing an automatic mechanism of translation.

References

Abidi, A., Elmi, S., Tobji, M. A. B., HadjAli, A., Yaghlane, B. B. (2018). Skyline queries over possibilistic

RDF data. International Journal of Approximate Reasoning, 93, 277-289.

Abidi, A., Tobji, M. A. B., Hadjali, A., Yaghlane, B. B. (2017). Skyline modeling and computing over trust

RDF data. Proceedings of the 19th international conference on enterprise information systems (ICEIS 2017).

Setúbal: Science and Technology Publications, 634-643.

Alvarado, A., Baldizán, O., Vidal, M., Goncalves, M. (2013). FOPA: a final object pruning algorithm to

efficiently produce skyline points. Database and Expert Systems Applications. DEXA 2013. Berlin: Springer,

334-348.

Bader, M. (2012). Space-filling curves: an introduction with applications in scientific computing. Suisse:

Springer Publishing Company, Inc.

Balke W., Guntzer, U. (2004). Multi-objective query processing for database systems. Proceedings of the 30th

international conference on very large data bases. New York: ACM Digital Library, 936-947.

Balke, W. T., Guntzer, U., Zheng, J. X. (2004). Efficient distributed skylining for Web information systems.

Advances in Database Technology - EDBT 2004. Berlin: Springer, 256-273.

Bartolini, I., Ciaccia, P., Patella, M. (2008). Efficient sort-based skyline evaluation. ACM Transactions on

Database Systems, 33, 1-49.

Bentley, J., Kung, H., Schkolnick, M., Thompson, C. (1978). On the average number of maxima in a set of

vectors and applications. Journal of the ACM, 25, 536-543.

Bitbucket. (2021). SPREFQL dataengineering/sprefql – Bitbucket [online] available in:

https://bitbucket.org/dataengineering/sprefql/src/master/ [accessed: 1 March 2021].

Borzsonyi, S., Kossmann, D., Stocker, K. (2001). The skyline operator. Proceedings of the 17th international

conference on data engineering. Heidelberg: IEEE Computer Society, 421-430.

Chen, L., Gao, S., Anyanwu, K. (2011). Efficiently evaluating skyline queries on RDF databases. The Semanic

Web: Research and Applications. ESWC 2011. Berlin: Springer, 123-138.

Chomicki, J. (2002). Querying with intrinsic preferences. In: Advances in Database Technology — EDBT 2002.

Eds. Jensen, C. S., Šaltenis, S., Jeffery, K. G., Pokorny, J., Bertino, E., Böhn, K., Jarke, M. Berlin: Springer, 34-

51.

Chomicki, J., Godfrey, P., Gryz, J., Liang, D. (2003). Skyline with presorting. Proceedings of the 19th

international conference on data engineering (ICDE 2003). Bangalore: IEEE Computer Society, 717-719.

Elzein, N. M., Majid, M. A., Hashem, I. A. T., Yaqoob, I., Alaba, F. A., Imran, M. (2018). Managing big RDF

data in clouds: challenges, opportunities, and solutions. Sustainable Cities and Society, 39, 375-386.

Endres, M., Glaser, E. (2019). Indexing for skyline computation. In: Flexible Query Answering Systems. Eds.

Cuzzocrea, A., Greco, S., Larsen, H. L., Saccà, D., Andreasen, T., Christiansen, H. Suisse: Springer

International Publishing, 31-42.

Skyline Queries in SPARQL: An Overview 143

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Feigenbaum, L. (2009). SPARQL by example [online] available in: https://www.w3.org/2009/Talks/0615-qbe/

[accessed: 12 October 2020].

Feyznia, A., Kahani, M., Zarrinkalam, F. (2014). COLINA: a method for ranking SPARQL query results

through content and link analysis. Proceedings of the 13th international semantic Web conference (ISWC 2014).

New York: ACM Digital Library, 273-276.

Godfrey, P., Shipley, R., Gryz, J. (2005). Maximal vector computation in large data sets. Proceedings of the

31st international conference on very large data bases. New York: ACM Digital Library, 229-240.

Gueroussova, M. Polleres, A., McIlraith, S. (2013). SPARQL with qualitative and quantitative preferences.

Proceedings of the 2nd international conference on ordering and reasoning. New York: ACM Digital Library, 2-

Gueroussova, M., Polleres, A., McIlraith, S. (2013b). SPARQL with qualitative and quantitative preferences

(extended report). Tech. Rep. CSRG-619. Toronto: University of Toronto.

Gulzar, Y., Alwan, A. A., Abdullah, R. M., Xin, Q., Swidan, M. B. (2019). SCSA: evaluating skyline queries in

incomplete data. Applied Intelligence, 49, 1636-1657.

Harris, S., Seaborne, A. (2013). SPARQL 1.1 query language. W3C Recommendation [online] available in:

http://www.w3.org/TR/2013/REC-sparql11-query20130321/ [accessed: 1 March 2021].

Keles, I., Hose, K. (2019). Skyline queries over knowledge graphs. Proceedings of the 18th international

semantic Web conference. Berlin: Springer, 293-310.

Keles, I., Hose, K. (2019). Skyline queries over knowledge graphs. In: The Semantic Web – ISWC 2019. Eds.

Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., Gandon, F.

Berlin: Springer International Publishing, 293-310.

Kossmann, D., Ramsak, F., Rost, S. (2002). Shooting stars in the sky: an online algorithm for skyline queries.

Proceedings of the 28th international conference on very large data bases. New York: ACM Digital Library,

275-286.

Kostylev, E. V., Reutter, J. L., Ugarte, M. (2015). Expressiveness of construct queries in SPARQL. 18th

international conference on database theory (ICDT‟15). Eds. Arenas, M., Ugarte, M. Brussels: Dagstuhl

Publishing, 1-25.

Křemen, P. (2018). SPARQL query language for RDF [online] available in:

https://cw.fel.cvut.cz/b181/_media/courses/osw/lecture-03sparql-s.pdf [accessed: 1 March 2021].

Lee, K., Lee, W. C., Zheng, B., Li, H., Tian, Y. (2010). Z-sky: an efficient skyline query processing framework

based on z-order. The VLDB Journal, 19, 333-362.

MIB. (2016). My information bubble project [online] available in: http://mib.projects.iit.cnr.it/ [accessed: 1

March 2021].

Ontotext. (2020). What is RDF and why to use it? Ontotext Fundamentals Series [online] available in:

https://www.ontotext.com/knowledgehub/fundamentals/what-is-rdf/ [accessed: 8 December 2020].

Papadias, D., Tao, Y., Fu, G., Seeger, B. (2005). Progressive skyline computation in database systems. ACM

Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS, 30, 41-82.

Patel-Schneider, P. F., Martin, D. (2016). EXISTStential aspects of SPARQL. Proceedings of 15th international

semantic Web conference. Kobe: Computer Science Bibliography, 1-4.

Patel-Schneider, P. F., Polleres, A., Martin, D. (2018). Comparative preferences in SPARQL. In: Knowledge

Engineering and Knowledge Management. Eds. Zucker, C. F., Ghidini, C., Napoli, A., Toussaint, Y. Berlin:

Springer International Publishing, 289-305.

Prud‟hommeaux, E., Seaborne, A. (2008). SPARQL query language for RDF. W3C Recommendation [online]

available in: https://www.w3.org/TR/rdf-sparql-query/ [accessed: 2 December 2020].

RDF. (2021). RDF - semantic Web standards [online] available in: https://www.w3.org/ [accessed: 1 March

2021].

Selke, J., Balke, W. T. (2011). Skymap: a trie-based index structure for high-performance skyline query

processing. Database and Expert Systems Applications. DEXA 2011. Berlin: Springer, 350-365.

Sessoms, M., Anyanwu, K. (2014). Enabling a package query paradigm on the semantic Web: model and

algorithms, Transactions on Large-Scale Data -and Knowledge- Centered Systems XIII. Berlin: Springer, 1-32.

Goncalves Da Silva y Aguilera Faraco. 144

Rev. Téc. Ing. Univ. Zulia. Vol. 45, No. 2, Mayo - Agosto, 2022.

Siberski, W., Pan, J. Z., Thaden, U. (2006). Querying the semantic web with preferences. In: The Semantic Web

- ISWC 2006. Eds. Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.

M. Berlin: Springer, 612-624.

Tan, K., Eng, P., Ooi, B. (2001). Efficient progressive skyline computation. Proceedings of the 27th international

conference on very large data bases. San Francisco: Morgan Kaufmann Publishers Inc., 301-310.

The Apache Software Foundation (2019). Arq – A SPARQL processor for jena [online] available in:

http://jena.apache.org/documentation/ query/index.html [accessed: 8 November 2019].

Troumpoukis, A., Konstantopoulos, S., Charalambidis, A. (2017). An extension of SPARQL for expressing

qualitative preferences. In: The Semantic Web – ISWC 2017. Eds. d‟Amato, C., Fernandez, M., Tamma, V.,

Lecue, F., Cudré-Mauroux, P., Sequeda, J., Lange, C., Heflin, J. Berlin: Springer International Publishing, 711-

727.

Zou, L., Özsu, M. T. (2017). Graph-based RDF data management. Data Science Engineering, 2, 56-70.

REVISTA TECNICA

DE LA FACULTAD DE INGENIERIA

UNIVERSIDAD DEL ZULIA

www.luz.edu.ve

www.serbi.luz.edu.ve

www.produccioncienticaluz.org

Esta revista fue editada en formato digital y publicada

en abril y mayo 2021, por el Fondo Editorial Serbiluz,

Universidad del Zulia. Maracaibo-Venezuela

Vol. 45. N°2, Mayo - Agosto, 2022_________________________