To detect abnormal data in a data stream promptly and reduce potential threats to the network, an outlier mining algorithm for high-dimensional categorical-attribute data streams based on spectral clustering is proposed. The ordered, high-speed, and high-dimensional characteristics of data streams are analyzed, and the main sources of outliers are explored. An attribute weight quantization method based on information entropy is used to merge strongly correlated data streams and then reduce their dimensionality, which reduces interference. In the spectral clustering step, the key scale parameter is set, the distance between each sample and the target is calculated from the affinity matrix, the clustering task is transformed into an undirected graph partitioning problem, the feature matrix is obtained, and the significant outlier features are extracted. In the distance-based mining step, data blocks are added to the data stream, the probability distributions of two adjacent data blocks are compared, a sliding window is set, and the distance between each data item and the sliding window is computed and compared with a preset threshold. The data identified as outliers are added to the outlier set, which completes the mining.
Simulation results show that the execution time of the algorithm stays within 42 s for data streams of different sizes and within 40 s for data streams of different dimensionalities. The algorithm therefore scales well with both the size and the dimensionality of the data stream, and the outliers it mines are consistent with the actual outliers.
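
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: categories are integer-coded, attribute weights are taken as normalized Shannon entropies, the affinity is a Gaussian kernel over a weighted mismatch distance with scale parameter `sigma`, and the "distance to the sliding window" is taken as the Euclidean distance to the mean of the previous `window` points. All function names and parameter values are hypothetical.

```python
import numpy as np

def entropy_weights(X):
    """Per-attribute Shannon entropy, normalized to sum to 1 (assumed weighting)."""
    n, d = X.shape
    h = np.empty(d)
    for j in range(d):
        _, counts = np.unique(X[:, j], return_counts=True)
        p = counts / n
        h[j] = -(p * np.log2(p)).sum()
    total = h.sum()
    # Fall back to uniform weights when every attribute is constant.
    return h / total if total > 0 else np.full(d, 1.0 / d)

def weighted_hamming(X, w):
    """Pairwise weighted mismatch distance between integer-coded categorical rows."""
    mismatch = X[:, None, :] != X[None, :, :]  # shape (n, n, d)
    return (mismatch * w).sum(axis=2)

def spectral_embedding(D, sigma=0.5, k=2):
    """Gaussian affinity, symmetric normalized Laplacian, k smallest eigenvectors."""
    A = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                   # no self-loops in the graph
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    return vecs[:, :k]

def sliding_window_outliers(points, window=5, threshold=1.0):
    """Flag index i when its distance to the mean of the previous
    `window` points is strictly greater than `threshold`."""
    outliers = []
    for i in range(window, len(points)):
        center = points[i - window:i].mean(axis=0)
        if np.linalg.norm(points[i] - center) > threshold:
            outliers.append(i)
    return outliers

if __name__ == "__main__":
    # Toy stream: 4 records, 2 integer-coded categorical attributes.
    X = np.array([[0, 0], [0, 1], [0, 0], [0, 1]])
    w = entropy_weights(X)
    emb = spectral_embedding(weighted_hamming(X, w), sigma=0.5, k=2)
    print(w, emb.shape)
    print(sliding_window_outliers(emb, window=2, threshold=1.0))
```

In this sketch a constant attribute receives zero weight, so it drops out of the distance computation, which is one simple way to realize entropy-driven dimensionality reduction. The threshold and window length would need to be tuned to the data stream at hand.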