LinuxCon Japan is the premier Linux conference in Asia, bringing together a unique blend of core developers, administrators, users, community managers and industry experts.
CloudOpen Japan is a conference celebrating and exploring the open source projects, technologies and companies that make up the cloud. It’s built on a belief that open works: for users, for industry and for technology.
In a virtual environment, many guests run on a single hypervisor, so the reliability of the KVM hypervisor is critical. One of the key features is "hardware error handling." To minimize the scope of impact when a hardware error such as a Machine Check is detected, the failed hardware must be isolated and only the affected guest shut down. Hardware error handling in Linux has three key features: pre-failure detection, failure isolation, and continuity after isolation. These features are largely implemented in the upstream kernel, but some important issues remain unresolved.
This presentation will show the current implementation of the three key features, detail the unresolved issues, and explain the current activities to resolve them. The target audience is kernel developers who are interested in the reliability of virtual environments.