Health Monitoring and Prognostics for Computer Servers
shared by Elizabeth Foughty, updated on Sep 10, 2010
Abstract
Prognostics solutions for mission-critical systems require a comprehensive methodology for proactively detecting and isolating failures, recommending and guiding condition-based maintenance actions, and estimating in real time the remaining useful life of critical components and their associated subsystems.
A major challenge has been to extend the benefits of prognostics to computer servers and other electronic components. The key enabler for prognostics is continuous monitoring of time series signals that reflect the health of operating components and subsystems. These signals are processed in real time with pattern recognition techniques, both for proactive anomaly detection and for remaining useful life estimation. Examples will be presented of pattern recognition techniques applied to the early detection of several mechanisms known to cause failures in electronic systems, including environmental issues, software aging, degraded or failed sensors, degradation of hardware components, and degradation of mechanical, electronic, and optical interconnects. Prognostic pattern classification is helping to substantially increase component reliability margins and meet system availability goals, while reducing costly "no trouble found" events, which have become a significant warranty-cost issue.
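The abstract does not say which pattern recognition algorithm performs the real-time anomaly detection. As a minimal sketch, assuming a monitored signal has already been reduced to a residual (observed minus expected value) that is roughly Gaussian with a known standard deviation when the component is healthy, Wald's sequential probability ratio test (SPRT) can flag a drift in that residual's mean. SPRT is a standard sequential detection technique chosen here purely for illustration; it is not presented as the authors' method, and the function name `sprt_mean_shift` and its parameters are hypothetical.

```python
import math
from typing import Iterable, List


def sprt_mean_shift(residuals: Iterable[float], sigma: float, shift: float,
                    alpha: float = 0.01, beta: float = 0.01) -> List[int]:
    """Hypothetical helper: Wald's SPRT for a positive mean shift in a residual.

    H0 (healthy):  residual ~ N(0, sigma^2)
    H1 (degraded): residual ~ N(shift, sigma^2)
    alpha: target false-alarm probability; beta: target missed-alarm probability.
    Returns the sample indices at which the test decides in favor of H1.
    """
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1 (raise alarm)
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0 (looks healthy)
    llr = 0.0                             # running log-likelihood ratio
    alarms: List[int] = []
    for i, x in enumerate(residuals):
        # log N(x; shift, sigma^2) - log N(x; 0, sigma^2) = shift/sigma^2 * (x - shift/2)
        llr += (shift / sigma ** 2) * (x - shift / 2.0)
        if llr >= upper:
            alarms.append(i)
            llr = 0.0  # restart the test after an alarm decision
        elif llr <= lower:
            llr = 0.0  # restart the test after a healthy decision
    return alarms


if __name__ == "__main__":
    # Synthetic residual: healthy for 50 samples, then a mean shift of +1.0.
    signal = [0.0] * 50 + [1.0] * 20
    print(sprt_mean_shift(signal, sigma=0.5, shift=1.0))
```

Restarting the statistic after every decision lets the test run indefinitely on streaming telemetry, and `alpha` and `beta` set the trade-off between false alarms and missed alarms. In the synthetic run above, no alarm is raised during the healthy segment; alarms appear only a few samples after the shift.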
Bios
Aleksey Urmanov is a research scientist at Sun Microsystems. He earned his doctoral degree in Nuclear Engineering at the University of Tennessee in 2002. Dr. Urmanov's research centers on pattern recognition, statistical learning theory, and ill-posed problems in engineering. His most recent work at Sun focuses on developing health monitoring and prognostics methods for EP-enabled computer servers. He is a founder and an editor of the Journal of Pattern Recognition Research.
Anton Bougaev holds M.S. and Ph.D. degrees in Nuclear Engineering from Purdue University. Before joining Sun Microsystems Inc. in 2007, he was a lecturer in the Nuclear Engineering Department and a member of the Applied Intelligent Systems Laboratory (AISL) at Purdue University, West Lafayette, USA. Dr. Bougaev is a founder and the Editor-in-Chief of the Journal of Pattern Recognition Research. His current work focuses on reliability physics, with emphasis on complex-system analysis and physics-of-failure studies based on data-driven pattern recognition techniques.