Fuzzy Models for Big Data Mining

eCampus University, Italy

Data mining extracts useful knowledge from data. In recent decades, data mining has been extensively investigated, and a huge number of techniques have been proposed for generating, for instance, descriptive models in clustering and frequent pattern analysis, and predictive models in classification and regression tasks. We are currently experiencing the Big Data era, and classical data mining algorithms appear inadequate for managing Big Data. Indeed, Big Data are characterized by the four V's, namely volume, variety, velocity and veracity: large volumes of data, which are often produced at very high speed and need to be processed in near real time (velocity), are generated by different sources and may have different formats (variety) and levels of trustworthiness (veracity). These data represent a very important source of added value in several contexts, such as marketing strategies, industrial applications, and the Internet of Things.

Due to the high relevance of Big Data, in recent years several researchers have introduced data mining approaches purposely designed and implemented for Big Data. Most of these approaches employ specific distributed frameworks, such as Apache Hadoop and Apache Spark, which were proposed to deal with the storage and processing of Big Data.

Fuzzy models (FMs) are particularly suitable for handling the variety and veracity of Big Data. This is mainly due to their ability to cope with vague, imprecise and uncertain concepts. Moreover, the use of overlapping fuzzy labels ensures good coverage of the problem space. Finally, the interpretability of an FM, namely the capability of explaining the model itself and how it works, is an important feature that may also be required when dealing with Big Data.
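The coverage property of overlapping fuzzy labels can be illustrated with a minimal sketch. It assumes a uniform partition of one attribute into triangular fuzzy sets; the function names, the three labels and the range [0, 10] are illustrative choices, not taken from the lecture. With this kind of partition, every value in the range belongs to at least one label, and its membership degrees sum to 1.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def uniform_partition(lo, hi, n_labels):
    """Build n_labels evenly spaced, overlapping triangular sets on [lo, hi].

    Each set is an (a, b, c) triple; neighbouring sets cross at degree 0.5.
    """
    step = (hi - lo) / (n_labels - 1)
    peaks = [lo + i * step for i in range(n_labels)]
    return [(p - step, p, p + step) for p in peaks]


def memberships(x, partition):
    """Membership degree of x in every fuzzy set of the partition."""
    return [triangular(x, a, b, c) for a, b, c in partition]


# Hypothetical linguistic labels for one numeric attribute.
LABELS = ["low", "medium", "high"]
partition = uniform_partition(0.0, 10.0, len(LABELS))

for x in (0.0, 2.5, 5.0, 9.0):
    degrees = memberships(x, partition)
    print(x, dict(zip(LABELS, degrees)), "sum =", sum(degrees))
```

A value such as 2.5 is partly "low" and partly "medium" rather than being forced into a single crisp interval, which is one way the overlap helps with imprecise data.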

In this lecture, we will discuss some recent algorithms that employ Fuzzy Models (FMs) for handling Big Data. We will mainly focus on predictive models, namely algorithms for approaching classification tasks.

Syllabus:

  • Introduction to Big Data and Big Data mining
  • A snapshot of paradigms and frameworks for distributed data storage and processing
  • The first approach for generating fuzzy rules for Big Data classification: the distributed Chi et al. algorithm
  • Fuzzy Associative Classifiers for Big Data
  • Distributed Fuzzy Decision Trees
  • Distributed Multi-objective Evolutionary Fuzzy Systems
  • Final Remarks