By Ioannis T. Christou

Head, Big Data Mining

Athens Information Technology

ichr@ait.edu.gr

Taking as a starting point the successful outcome of our joint proposal FIREMAN, submitted to the “Big Data for Industry” CHIST-ERA call, I would like to say a few words about some nearly forgotten Machine Learning and AI methods that seem to work particularly well in certain contexts of the Industry 4.0 initiative.

This year’s (2018) ACM Turing Award was justly given to key figures in the development of the theory and practice of Deep Learning (DL). A number of mature computational frameworks for DL already exist, including TensorFlow (Google) and derivatives such as TensorFrames, PlaidML (Intel), Theano (no longer active), Keras (Google), Caffe 1 and 2, and so on. Many of them allow computations to be executed in parallel on clusters of multi-core or many-core machines, and can also run computations on the computing grids found in high-end graphics cards (GPUs). Given such levels of sophistication, and the funding being poured into this particular area of research & development, one could be tempted to think that DL might become the panacea, the cure-all for all of the Machine Learning problems we face today. Unfortunately, the history of Computer Science (and of most technology) is full of unfulfilled expectations, so I do not hold such high expectations for this or any other approach, and I always remember that there is “No Free Lunch”. So if, theoretically speaking, no classification/prediction algorithm can be expected to outperform any other (including random guessing) when performance is averaged across all possible datasets, what can be done in a particular context? As it turns out, plenty. Remember that DL has thrived in image-processing applications such as face recognition. This is a direct result of new optimization algorithms specifically tailored to training ANNs with many layers, together with advanced parallel/distributed computing algorithms and frameworks that make the huge effort required feasible given enough cores and/or GPUs in a cluster. It is also a result of the huge datasets that have been collected to serve the training needs of such networks (YOLO, https://pjreddie.com/darknet/yolo/, is a testament to this power).

In a similar manner, the old area of rule extraction from data seems well suited to certain problems arising in an Industry 4.0 setting. Within this area, Predictive Maintenance and the related problem of Rare Event Detection fit the approach particularly nicely. Let us focus on an almost standard problem of Predictive Maintenance (PdM for short): predicting the Remaining Useful Life (RUL) of a machine, tool or component, given a labeled set of sensor readings and other possible data, such as the characteristics of the raw materials processed by the machine. Preliminary results of experiments with a diverse set of algorithms, implementing widely varying approaches to classification/regression, indicate that rule-based approaches are among the best to try for RUL estimation. Data for these experiments were recently made available to us by Jaguar Land Rover (UK) and Philips (The Netherlands) within the context of the EU-funded project PROPHESY (https://prophesy.eu) in which AIT participates (a project that owes a lot to the efforts of J. Soldatos).
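To make the problem setup concrete, here is a minimal sketch of how a run-to-failure sensor log is turned into a labeled RUL dataset. The column names and readings are hypothetical illustrations, not taken from the PROPHESY data.

```python
# A minimal sketch, assuming a per-machine run-to-failure log ordered by
# a `cycle` counter, with the last row recorded at the failure point.
import pandas as pd

def label_rul(log: pd.DataFrame) -> pd.DataFrame:
    """Add an RUL column to one machine's run-to-failure log."""
    log = log.copy()
    log["RUL"] = log["cycle"].max() - log["cycle"]  # cycles left to failure
    return log

# Example with hypothetical temperature/vibration readings.
log = pd.DataFrame({
    "cycle":       [1, 2, 3],
    "temperature": [71.2, 74.8, 88.1],
    "vibration":   [0.02, 0.05, 0.41],
})
print(label_rul(log))  # the RUL column reads 2, 1, 0
```

Any supervised regressor, rule-based or not, can then be trained on the sensor columns to predict the RUL column.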

Here are some of the reasons for this fit. Rule-extraction systems work by first determining rules that explain the minority classes (starting from the rarest class and moving towards more frequent ones), leaving the majority class as the default prediction if all rules fail. This simple strategy (found in systems ranging from Quinlan’s FOIL to REP, IREP, and later RIPPER and SLIPPER) deals with large imbalances in class distributions in a natural way. And class skewness is one of the most frequent issues in PdM applications: most of the time a system behaves normally, until an initially small, ignored fault has grown to the point where it quickly renders the system essentially unusable, i.e., its RUL drops to zero.
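To illustrate the strategy, here is a minimal separate-and-conquer sketch in the spirit of those systems. The `learn_one_rule` helper is a hypothetical stand-in for the greedy rule-growing step that each system implements differently; everything else is a simplified illustration, not any one system's actual algorithm.

```python
# A minimal separate-and-conquer sketch: learn rules for the rarest
# classes first; the most common class becomes the default prediction.
from collections import Counter

def sequential_covering(examples, labels, learn_one_rule):
    """Learn an ordered rule list, rarest class first.

    `learn_one_rule(pos, neg)` must return a predicate over examples
    covering mostly `pos`, or None if no acceptable rule exists.
    """
    data = list(zip(examples, labels))
    # Classes ordered from rarest to most common.
    classes = [c for c, _ in Counter(labels).most_common()][::-1]
    default = classes[-1]  # the majority class gets no rules of its own
    rules = []
    for cls in classes[:-1]:
        while any(y == cls for _, y in data):
            pos = [x for x, y in data if y == cls]
            neg = [x for x, y in data if y != cls]
            rule = learn_one_rule(pos, neg)
            if rule is None or not any(rule(x) for x in pos):
                break  # no acceptable or productive rule left
            rules.append((rule, cls))
            # "Separate": remove every example the new rule covers.
            data = [(x, y) for x, y in data if not rule(x)]
    return rules, default
```

To classify a new example, the rules are tried in order and the first one that fires determines the class; if none fires, the default (majority) class is returned. This is why a heavily skewed "normal operation" class costs nothing to represent: the learned rules spend all their capacity on the rare fault classes.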

Similarly, Quantitative Association Rule Mining algorithms for multi-dimensional datasets are a natural fit for PdM problems. Because the minimum required support of the rules to be found is a user-defined quantity, such algorithms can search in parallel the vast space of sensor-reading configurations to find conditions, represented as hyper-cubes in high-dimensional spaces, that imply with high confidence the value of a machine’s RUL, even when the number of data points inside the hyper-cubes found is so small compared to the total number of points in the dataset that they are like a needle in the “data-stack”. The QARMA system (https://arxiv.org/abs/1804.06764), which has been in research & development for the past five years, produces RUL estimates on a real dataset from one of the PROPHESY partners that are significantly better than the estimates produced by standard ANNs, Support Vector Regression machines, or any other standard regression method. In essence, QARMA is a parallel/distributed Breadth-First Search algorithm, with appropriate short-cuts to speed up execution as much as possible, that is capable of finding all interesting quantitative rules in any multi-dimensional dataset, guaranteeing maximal coverage of the dataset under the user-provided thresholds.

QARMA has already been successfully tested in the domains of movie recommendation systems and Responsible Gaming, where it produces rules specifying pre-conditions whose appearance implies, with high confidence, later symptoms of pathological gambling behavior in online gamblers. In the context of the FIREMAN project (to start in May 2019), we will apply online (or incremental) pruning techniques to the QARMA algorithm to further improve both the quality of the results and the running times of the system when applied to PdM and Rare Event Detection problems.
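To fix ideas, here is a hedged sketch of what such a quantitative rule looks like and how its support and confidence are scored. The column indices, interval bounds and numbers are illustrative assumptions; this is not QARMA’s actual rule format or search procedure.

```python
# A sketch of a quantitative rule "X in hyper-cube => y in interval"
# and of how its support and confidence are computed over a dataset.
import numpy as np

def rule_stats(X, y, antecedent, consequent):
    """Return (support, confidence) of the rule over the dataset (X, y).

    `antecedent` maps a column index to (lo, hi) bounds defining the
    hyper-cube; `consequent` is a (lo, hi) interval on the target.
    """
    covered = np.ones(len(X), dtype=bool)
    for col, (lo, hi) in antecedent.items():
        covered &= (X[:, col] >= lo) & (X[:, col] <= hi)
    lo, hi = consequent
    correct = covered & (y >= lo) & (y <= hi)
    support = correct.sum() / len(X)              # rule frequency in data
    confidence = correct.sum() / max(covered.sum(), 1)
    return support, confidence

# Hypothetical rule: high temperature AND high vibration => RUL <= 10 h.
X = np.array([[85.0, 0.40], [86.5, 0.45], [70.0, 0.02], [72.0, 0.03]])
y = np.array([6.0, 8.0, 150.0, 140.0])
antecedent = {0: (80.0, 95.0), 1: (0.30, 0.60)}
print(rule_stats(X, y, antecedent, (0.0, 10.0)))  # -> (0.5, 1.0)
```

A miner such as QARMA then searches the space of such hyper-cubes for all rules whose support and confidence clear the user-provided thresholds; the sketch above merely scores one candidate rule.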