DOI number:10.11999/JEIT240648
Affiliation of Author(s):Hefei University of Technology
Journal:Journal of Electronics & Information Technology
Place of Publication:Beijing, China
Key Words:Rotating machinery
Fault diagnosis
Multimodal fusion
Deep clustering
Abstract:
Objective Rotating machinery is essential across various industrial sectors, including energy, aerospace, and
manufacturing. However, these machines operate under complex and variable conditions, making timely and
accurate fault detection a significant challenge. Traditional diagnostic methods, which use a single sensor and
modality, often miss critical features, particularly subtle fault signatures. This can result in reduced reliability,
increased downtime, and higher maintenance costs. To address these issues, this study proposes a novel modal
fusion deep clustering approach for multi-sensor fault diagnosis in rotating machinery. The main objectives are
to: (1) improve feature extraction through time-frequency transformations that reveal important
temporal-spectral patterns, (2) implement an attention-based modality fusion strategy that integrates complementary
information from various sensors, and (3) use a deep clustering framework to identify fault types without
needing labeled training data.
Methods The proposed approach utilizes a multi-stage pipeline for thorough feature extraction and analysis.
First, raw multi-sensor signals, such as vibration data collected under different load and speed conditions, are
preprocessed and transformed with the Short-Time Fourier Transform (STFT). This converts time-domain
signals into time-frequency representations, highlighting distinct frequency components related to various fault
conditions. Next, Gated Recurrent Units (GRUs) model temporal dependencies and capture long-range
correlations, while Convolutional AutoEncoders (CAEs) learn hierarchical spatial features from the transformed
data. By combining GRUs and CAEs, the framework encodes both temporal and structural patterns, creating
richer and more robust representations than traditional methods that rely solely on either technique or
handcrafted features. A key innovation is the modality fusion attention mechanism. In multi-sensor
environments, individual sensors typically capture complementary aspects of system behavior. Simply
concatenating their outputs can lead to suboptimal results due to noise and irrelevant information. The
proposed attention-based fusion calculates modality-specific affinity matrices to assess the relationship and
importance of each sensor modality. With learnable attention weights, the framework prioritizes the most
informative modalities while diminishing the impact of less relevant ones. This ensures the fused representation
captures complementary information, resulting in improved discriminative power. Finally, an unsupervised
clustering module is integrated into the deep learning pipeline. Rather than depending on labeled data, the
model assigns samples to clusters by refining cluster assignments iteratively using a Kullback-Leibler (KL)
divergence-based objective. Initially, a soft cluster distribution is created from the learned features. A target
distribution is then computed to sharpen and define cluster boundaries. By continuously minimizing the KL
divergence between these distributions, the model self-optimizes over time, producing well-separated clusters
corresponding to distinct fault types without supervision.
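The KL-divergence clustering step described above can be illustrated with a minimal NumPy sketch. The abstract does not say how the soft cluster distribution or the target distribution is computed, so this sketch assumes the widely used Deep Embedded Clustering style: a Student's t kernel for the soft assignment and a squared, frequency-normalized target. The function names, the toy two-cluster data, and the fixed cluster centers are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_assign(z, centers, alpha=1.0):
    """Soft cluster distribution q (n, k) from embeddings z (n, d).

    Assumption: Student's t kernel, as in DEC-style clustering.
    """
    # Squared Euclidean distance between every sample and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p: square q, divide by soft cluster size, renormalize."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Toy embeddings: two well-separated groups standing in for learned features.
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centers

q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop, `loss` would be minimized with respect to both the encoder parameters and the cluster centers, and `p` would be recomputed periodically so that the assignments sharpen over time.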
Results and Discussions The proposed approach’s effectiveness is illustrated using multi-sensor bearing and
gearbox datasets. Compared to conventional unsupervised methods—like traditional clustering algorithms or
single-domain feature extraction techniques—this framework significantly enhances clustering accuracy and
fault recognition rates. Experimental results show recognition accuracies of approximately 99.16% on gearbox
data and 98.63% on bearing data, representing a notable advancement over existing state-of-the-art techniques.
These impressive results stem from the synergistic effects of advanced feature extraction, modality fusion, and
iterative clustering refinement. By extracting time-frequency features through STFT, the method captures a
richer representation than relying solely on raw time-domain signals. The use of GRUs incorporates temporal
information, enabling the capture of dynamic signal changes that may indicate evolving fault patterns.
Additionally, CAEs reveal meaningful spatial structures from time-frequency data, resulting in low-dimensional
yet highly informative embeddings. The modality fusion attention mechanism further enhances these benefits
by emphasizing relevant modalities, such as vibration data from various sensor placements or distinct physical
principles, thus leveraging their complementary strengths. Through the iterative minimization of KL
divergence, the clustering process becomes more discriminative. Initially broad and overlapping cluster
boundaries are progressively refined, allowing the model to converge toward stable and well-defined fault
groupings. This unsupervised approach is particularly valuable in practical scenarios, where obtaining labeled
data is costly and time-consuming. The model’s ability to learn directly from unlabeled signals enables
continuous monitoring and adaptation, facilitating timely interventions and reducing the risk of unexpected
machine failures. The discussion emphasizes the adaptability of the proposed method. Industrial systems
continuously evolve, and fault patterns can change over time due to aging, maintenance, or shifts in operational
conditions. The unsupervised method can be periodically retrained or updated with new unlabeled data. This
allows it to monitor changes in machinery health and quickly detect new fault conditions without the need for
manual annotation. Additionally, the attention-based modality fusion is flexible enough to support the inclusion
of new sensor types or measurement channels, potentially enhancing diagnostic performance as richer data
sources become available.
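The idea of weighting sensor modalities with learnable attention can be sketched as follows. This is only an illustration under simplifying assumptions: the abstract does not specify how the modality-specific affinity matrices feed into the weights, so that part is omitted here and each modality simply receives one scalar score that is normalized with a softmax. The names `fuse_modalities` and `scores`, and the constant toy features, are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def fuse_modalities(features, scores):
    """Weighted sum of per-modality embeddings.

    features: list of m arrays, each (n, d), one per sensor modality.
    scores:   m scalar logits (learnable in a real model; fixed here).
    Returns the fused (n, d) representation and the attention weights.
    """
    weights = softmax(np.asarray(scores, dtype=float))
    stacked = np.stack(features, axis=0)            # (m, n, d)
    fused = np.tensordot(weights, stacked, axes=1)  # contract over m -> (n, d)
    return fused, weights


# Three toy modalities with constant features so the weighting is easy to read.
features = [np.full((4, 3), v) for v in (1.0, 2.0, 3.0)]
scores = [2.0, 0.0, -1.0]  # hypothetical: the first sensor is most informative

fused, weights = fuse_modalities(features, scores)
```

Because the fusion is a weighted sum over a list, supporting an additional sensor in this sketch amounts to appending one more feature array and one more score.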
Conclusions This study presents a modal fusion deep clustering framework designed for the multi-sensor fault
diagnosis of rotating machinery. By combining time-frequency transformations with GRU- and CAE-based deep
feature encoders, attention-driven modality fusion, and KL divergence-based unsupervised clustering, this
approach outperforms traditional methods in accuracy, robustness, and scalability. Key contributions include a
comprehensive multi-domain feature extraction pipeline, an adaptive modality fusion strategy for heterogeneous
sensor data integration, and a refined deep clustering mechanism that achieves high diagnostic accuracy
without relying on labeled training samples. Looking ahead, there are several promising directions. Adding more
modalities—like acoustic emissions, temperature signals, or electrical measurements—could lead to richer
feature sets. Exploring semi-supervised or few-shot extensions may further enhance performance by utilizing
minimal labeled guidance when available. Implementing the proposed model in an industrial setting, potentially
for real-time use, would also validate its practical benefits for maintenance decision-making, helping to reduce
operational costs and extend equipment life.
Co-author:许仁礼,方刚
First Author:伍章俊
Indexed by:Journal paper
Correspondence Author:邵海东
Discipline:Engineering
Document Type:J
Volume:47
Issue:1
Page Number:244-259
Translation or Not:no
Date of Publication:2025-01-01
Included Journals:EI
Links to published journals:https://jeit.ac.cn/cn/article/doi/10.11999/JEIT240648