thesis

Data clustering for climate informatics: contributions to improving methods for the automatic identification of weather regimes in a tropical island climate.

Defense date:

Sept. 25, 2020

Institution:

Antilles

Disciplines:

Abstract EN:

This manuscript reports on new work in climate informatics that has led to a set of contributions to methods for the automatic identification of weather regimes in the Caribbean region. Starting from computational methods widely covered in the literature consulted, we first focused on unsupervised learning, and more specifically on K-Means clustering (KMS) and Hierarchical Agglomerative Clustering (HAC). These methods were applied directly, first to the currents that transport sargassum seaweed rafts, and then to the partitioning of geopotential data. They made it possible to identify groups of days (clusters) with similar characteristics. The barycentres (centroids) of the resulting groups were analysed by climate experts. However, this approach does not systematically produce consistent results, since the barycentres do not always represent the physical reality of the structures selected.
We then concentrated on finding and identifying weather regimes characteristic of the Caribbean zone. These regimes are generally described as recurrent large-scale spatio-temporal configurations that influence local weather. Research in this area for the Caribbean region is still in its infancy, and several points in the published work seem problematic. Three of them drew our attention. First, because cluster quality is not quantified, a large amount of physical justification is needed to validate the relevance of the proposed regimes; this also complicates comparison between existing studies. Second, some of the arguments presented themselves show that the proposals made are not fully satisfactory. Third, according to the experts, the temporal coherence of the clusters in some studies does not seem to match the seasonality of the region.
To overcome these difficulties, we first propose using the Silhouette index. We used it both to evaluate the relevance of the selected clusters and to compare the different methods. On verification, the analysis produced by the index agrees with that of the climate experts. In some cases, however, the index also indicates that the clusters can be improved. A closer look at the partitioning algorithms, and in particular at the notion of distance they use, suggests that these difficulties stem mainly from the complexity of the data and from the similarity measures used to compare them. After a critique of the properties of the L2 distance, which is used by default, we propose a new dissimilarity measure named Expert Deviation (ED). It is based on a spatial decomposition, a quantization into histograms, and a zone-by-zone treatment with the Kullback-Leibler (KL) divergence. We show that ED leads to much better results, both in numerical evaluations of cluster quality with the Silhouette index and in interpretations by domain experts.
This new measure is adaptive in its design and use. We present its principle and then apply it in atmospheric physics, using data such as satellite-measured precipitation. Rainfall in the Lesser Antilles is known to be highly variable in space and time, and it directly influences the climate at these latitudes. Using ED, we identified more coherent and physically interpretable recurrent patterns for this parameter and for wind. These results have increased climate experts' knowledge of the atmospheric structures related to inter-season weather regimes and their dynamics.
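As an illustration of how the Silhouette index can score a partition of days, here is a minimal sketch in plain Python. It is not the code used in the thesis: the function names `l2` and `silhouette_index`, the toy data, and the convention of scoring singleton clusters as 0 are assumptions made for this example. The `dist` argument is left pluggable so that a dissimilarity other than L2 could be substituted.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points, labels, dist=l2):
    """Mean silhouette value over all points (hypothetical helper).

    For each point, a = mean distance to the other members of its own
    cluster and b = smallest mean distance to the members of any other
    cluster; the silhouette value is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Assumed convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: four "days" described by two features each.
days = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
compact_partition = [0, 0, 1, 1]  # groups nearby days together
mixed_partition = [0, 1, 0, 1]    # mixes distant days in each group
print(silhouette_index(days, compact_partition))
print(silhouette_index(days, mixed_partition))
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest that the partition can be improved, which is the kind of signal described above.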
Taken together, this work and the ED measure open up many avenues for research on recurrent spatio-temporal configurations, and more broadly in any application domain that uses images.
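The abstract names three ingredients of ED: a spatial decomposition, a quantization into histograms, and a zone-by-zone Kullback-Leibler divergence. The sketch below shows one way these ingredients could be combined. It is an illustrative assumption, not the thesis's definition of ED: the rectangular zones, the fixed bin edges, the additive smoothing `eps`, the symmetrised KL, and the plain sum over zones are all choices made for this example, as are the names `zone_histogram`, `kl`, and `ed_like`.

```python
import math


def zone_histogram(field, rows, cols, edges, eps=1e-6):
    """Smoothed, normalised histogram of the values of `field` in one zone."""
    counts = [0] * (len(edges) - 1)
    for r in rows:
        for c in cols:
            v = field[r][c]
            for k in range(len(counts)):
                is_last = k == len(counts) - 1
                # Bins are half-open, except the last one, which is closed.
                if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                    counts[k] += 1
                    break
    total = sum(counts)
    # Assumption: additive smoothing keeps every bin non-zero so KL is finite.
    return [(n + eps) / (total + eps * len(counts)) for n in counts]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def ed_like(field_a, field_b, zone_shape, edges):
    """Zone-wise, histogram-based dissimilarity between two 2-D fields."""
    n_rows, n_cols = len(field_a), len(field_a[0])
    zone_rows, zone_cols = zone_shape
    total = 0.0
    for r0 in range(0, n_rows, zone_rows):
        for c0 in range(0, n_cols, zone_cols):
            rows = range(r0, min(r0 + zone_rows, n_rows))
            cols = range(c0, min(c0 + zone_cols, n_cols))
            p = zone_histogram(field_a, rows, cols, edges)
            q = zone_histogram(field_b, rows, cols, edges)
            # Assumption: symmetrised KL, summed over zones.
            total += kl(p, q) + kl(q, p)
    return total


# Toy 2 x 4 "rainfall" fields split into two 2 x 2 zones.
edges = [0, 1, 10]  # two bins: dry [0, 1) and wet [1, 10]
a = [[5, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 5, 0, 0], [0, 0, 0, 0]]  # wet cell shifted inside the same zone
c = [[0, 0, 0, 5], [0, 0, 0, 0]]  # wet cell moved to the other zone
print(ed_like(a, b, (2, 2), edges))
print(ed_like(a, c, (2, 2), edges))
```

In this toy case, a small shift of the wet cell inside one zone leaves the zone histograms unchanged, whereas moving it to another zone changes them. A pixel-wise L2 distance would treat both displacements identically, which hints at why a zone-and-histogram measure may group physically similar days more robustly.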

Abstract FR (translated):

This manuscript reports on new work in climate informatics that has led to a set of contributions to methods for the automatic identification of weather regimes in the Caribbean region. Starting from computational methods widely covered in the literature consulted, we first focused on unsupervised learning, and more specifically on the K-Means (KMS) and Classification Ascendante Hiérarchique (CAH, hierarchical agglomerative clustering) methods. These methods were applied directly, first to the currents that transport sargassum seaweed rafts, and then to the partitioning of geopotential data. They made it possible to identify groups of days (clusters) with similar characteristics. The barycentres (centroids) of the resulting groups were analysed by climate experts. However, this approach does not systematically produce consistent results, since the barycentres do not always represent the physical reality of the structures selected.
This new measure is adaptive in its design and use. We present its principle and then apply it in atmospheric physics, using data such as satellite-measured precipitation. In the Lesser Antilles, rainfall is known for its strong variability in space and time, and it directly influences the climate at these latitudes. Using ED, we were thus able to identify more coherent and physically interpretable recurrent configurations for this parameter and for wind. These results have increased climate experts' knowledge of the atmospheric structures related to inter-season weather regimes and their dynamics.
Taken together, this work and the "ED measure" open up many avenues for research on recurrent spatio-temporal configurations, and more broadly in any application domain that uses images.