thesis

Infrastructure and algorithms for information retrieval based on social network analysis mining

Defense date:

Jan. 1, 2013

Disciplines:

Abstract EN:

Nowadays, the Web has evolved from a static Web where users could only consume information to a Web where users can also produce information. This evolution is commonly known as the Social Web or Web 2.0. Social platforms and networks are certainly the most widely adopted technologies of this new era. These platforms are commonly used to interact with peers, exchange messages, share resources, etc. These collaborative tasks, which make users more active in generating content, are therefore among the most important factors behind the ever-growing quantity of available data. From a research perspective, this raises important and interesting challenges for many fields. In such a context, a crucial problem is to enable users to find information relevant to their interests and needs. This task is commonly referred to as Information Retrieval (IR). IR is performed every day over the Web, typically through a search engine. However, classic IR models do not consider the social dimension of the Web. They model web pages as a mixture of static, homogeneous terms generated by the same creators. Ranking algorithms are then often based on: (i) the text similarity between a query and a document and (ii) the existing hypertext links that connect these web pages, e.g. PageRank. Therefore, classic IR models, and even the IR paradigm itself, should be adapted to the socialization of the Web in order to fully leverage the social context that surrounds web pages and users. This thesis presents several approaches that go in this direction. In particular, three methods are introduced: (i) a Personalized Social Query Expansion (PSQE) framework, which achieves social and personalized expansions of a query with respect to each user, i.e. for the same query, different users will obtain different expanded queries; (ii) a Personalized Social Document Representation (PSDR) framework, which uses social information to enhance documents and provide each user with a personalized social representation of them; and (iii) a Social Personalized Ranking function called SoPRa, which takes into account social features related to users and documents. All these approaches are scalable to large datasets, flexible and adaptable to the highly dynamic nature of social data, and efficient, as shown by intensive evaluation and comparison with the closest related works. From a practical point of view, this thesis led to the development of an experimental social Web search engine called LAICOS, which includes all the algorithms developed throughout this thesis.

Abstract FR (translated to English):

Web 2.0 has given users new freedoms in their relationship with the Web by allowing them to interact with other users who share the same interests. Social platforms and networks are certainly the most widely adopted technologies in this new context. These platforms allow users to interact, exchange messages, share resources, etc. These collaborative tasks, which allow users to be more active in generating content, are thus one of the most important factors in the constant growth of data. In such a context, one of the most crucial problems is to enable users to find information relevant to their needs. This task is commonly called Information Retrieval (IR). However, classic IR models do not take this social dimension of the Web into account. Consequently, these classic IR models, and even the IR paradigm itself, must be adapted to this socialization of the Web in order to take full advantage of the social context that surrounds web pages and users. This thesis illustrates several approaches that go in this direction. In particular, three methods are introduced for: (i) query expansion, (ii) document modeling, and (iii) result ranking. All the approaches presented rely on social annotations, extracted from folksonomies, as their source of social information.