kdump is a Linux kernel feature that captures a crash dump when the kernel crashes. When triggered, kdump exports a memory image (also known as vmcore) that can be analyzed to debug and determine the cause of the crash. The dumped image of main memory, exported as an Executable and Linkable Format (ELF) object, can be accessed directly through /proc/vmcore while the crash is being handled, or it can be saved automatically to a locally accessible file system, to a raw device, or to a remote system accessible over the network.
Install ‘kexec-tools’ using yum command
On RHEL 7 and CentOS 7, kexec-tools is installed by default. We can confirm this with rpm -qa | grep kexec. If there is no output, install kexec-tools with the following command:
# yum install kexec-tools
Update the GRUB2 file to Reserve Memory for Kdump kernel.
Edit the GRUB2 file (/etc/default/grub), add the parameter ‘crashkernel=<Reserved_size_of_RAM>‘ in the line beginning with ‘GRUB_CMDLINE_LINUX‘
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=128M"
Regenerate the GRUB2 configuration:
# grub2-mkconfig -o /boot/grub2/grub.cfg
On systems with UEFI firmware (on CentOS the directory is /boot/efi/EFI/centos rather than redhat):
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
The crashkernel=128M parameter reserves 128 MB of RAM for the kdump kernel at boot, so it takes effect only after a reboot.
Once the changes are made, reboot the server:
# shutdown -r now
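After the reboot it is worth confirming that the reservation actually took effect. The following is a minimal sketch, not part of the original procedure: it assumes the booted kernel command line is readable at /proc/cmdline, and the function name crashkernel_value is only illustrative.

```shell
#!/bin/sh
# Print the value of the crashkernel= parameter found in a kernel command
# line file. Defaults to /proc/cmdline; returns non-zero if it is absent.
crashkernel_value() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one parameter per line, then keep
    # only the text after "crashkernel=".
    value=$(tr ' ' '\n' < "$cmdline_file" | sed -n 's/^crashkernel=//p' | head -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Example usage on a live system:
#   crashkernel_value                   # prints the configured value, e.g. 128M
#   cat /sys/kernel/kexec_crash_size    # reserved bytes; 0 means nothing reserved
```

If the function prints nothing, the parameter did not reach the running kernel, which usually means the regenerated grub.cfg is not the one the firmware boots from.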
Update the dump location & default action in the file (/etc/kdump.conf)
To store the crash dump (vmcore) on a local file system, edit /etc/kdump.conf and specify the location for your setup. A separate local file system (for example /var/crash) is recommended, and it should have free space at least equal to the size of the system's RAM. kdump can compress the dump data through the core_collector directive: in core_collector makedumpfile -c, the -c option enables zlib compression, while -l (used below) selects LZO compression.
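The free-space recommendation can be checked with a few lines of shell. This is a rough sketch under stated assumptions: the function name dump_space_ok and its optional arguments are hypothetical, RAM size is read from MemTotal in /proc/meminfo, and free space comes from df in POSIX output mode.

```shell
#!/bin/sh
# Succeed if the file system holding the dump directory has at least as
# much free space as the machine has RAM. Both arguments are optional.
dump_space_ok() {
    dump_dir="${1:-/var/crash}"
    meminfo="${2:-/proc/meminfo}"
    # MemTotal is reported in kB.
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # df -Pk prints one line per file system in 1024-byte blocks;
    # column 4 is the available space.
    avail_kb=$(df -Pk "$dump_dir" | awk 'NR==2 {print $4}')
    echo "RAM: ${mem_kb} kB, free in ${dump_dir}: ${avail_kb} kB"
    [ "$avail_kb" -ge "$mem_kb" ]
}

# Example usage:
#   dump_space_ok /var/crash || echo "not enough room for a full dump"
```

With compression and page filtering enabled the real dump is usually much smaller than RAM, so this check is deliberately conservative.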
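If kdump fails to store the dump file in the specified location, it performs the fallback action set by the default directive (for example, default reboot).
Update the following directives in the kdump.conf file; path and core_collector are shown here, and default should be set to the fallback action you want.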
# vi /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
There are a few more options, which we can find in the kdump.conf file.
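To double-check which values are active after editing, the directives can be read back from the file. This is a small sketch that assumes the simple "name value" line format shown above; the function name kdump_directive is hypothetical.

```shell
#!/bin/sh
# Print the value of an active (uncommented) directive from a
# kdump.conf-style file. Commented lines start with '#', so their first
# field never equals the bare directive name and they are skipped.
kdump_directive() {
    name="$1"
    conf="${2:-/etc/kdump.conf}"
    # Blank the first field (the directive name), trim the leading
    # separator, and print the rest of the line.
    awk -v n="$name" '$1 == n { $1 = ""; sub(/^[ \t]+/, ""); print }' "$conf"
}

# Example usage:
#   kdump_directive path
#   kdump_directive core_collector
```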
Start and enable the kdump service
# systemctl start kdump.service
# systemctl enable kdump.service
Test kdump by manually crashing the system
Before crashing the system, make sure the kdump service is running by using the following command:
# systemctl is-active kdump.service
Or
# systemctl status kdump.service
To crash the system, run the following command:
# echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
This will create a crash dump file (vmcore) under the /var/crash file system.
# ls -lR /var/crash
/var/crash:
total 0
drwxr-xr-x. 2 root root 58 Jul 4 09:22 127.0.0.1-2016-07-04-09:22:17
/var/crash/127.0.0.1-2016-07-04-09:22:17:
total 254782
-rw-------. 1 root root 148524896 Jul 4 09:22 vmcore
-rw-r--r--. 1 root root 85248 Jul 4 09:22 vmcore-dmesg.txt
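Because every crash creates a new timestamped directory, it can be handy to locate the most recent dump automatically. This is a minimal sketch assuming the directory layout shown in the listing above (one subdirectory per crash, each holding a vmcore file); the function name latest_vmcore is hypothetical.

```shell
#!/bin/sh
# Print the path of the most recently modified vmcore below a dump
# directory (default /var/crash). Prints nothing if no dump exists.
latest_vmcore() {
    dump_dir="${1:-/var/crash}"
    # ls -t sorts by modification time, newest first.
    ls -1t "$dump_dir"/*/vmcore 2>/dev/null | head -n 1
}

# Example usage:
#   latest_vmcore
#   latest_vmcore /var/crash
```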
Analyze and Debug crash dumps.
To analyze and debug the dump we can use the crash command. crash is a utility for debugging and analyzing a crash dump (vmcore) file.
To use crash, make sure two packages are installed: crash and kernel-debuginfo.
# yum install crash -y
To install the kernel-debuginfo package, we need to enable the debug repo.
Edit the repo file /etc/yum.repos.d/CentOS-Debuginfo.repo and change enabled=0 to enabled=1.
# yum install kernel-debuginfo -y
Once kernel-debuginfo is installed, run the crash command below against the dump directory created earlier. It gives us a crash prompt where we can run commands to inspect process information and the list of open files at the time of the crash.
# crash /var/crash/127.0.0.1-2016-07-04-09\:22\:17/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
crash>
crash> ps
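"ps" displays the processes that were running when the system crashed.
To view the files that were open when the system crashed, type the files command at the crash prompt. It lists the open files of the current context, which by default is the task that was running at the time of the crash.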
crash> files
crash> sys
"sys" lists system information as of the time of the crash.
crash> help
"help" lists the commands we can use for troubleshooting.
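The same commands can also be run without an interactive session by putting them in a file and passing it to crash with its -i option (-s suppresses the startup banner). This is only a sketch: the helper names write_crash_commands and crash_batch_command are hypothetical, the second helper prints the command line rather than running it, and the vmlinux path assumes kernel-debuginfo for the given kernel release is installed.

```shell
#!/bin/sh
# Write a crash command file that runs the commands covered above and
# then leaves the session.
write_crash_commands() {
    out="$1"
    printf '%s\n' sys ps files quit > "$out"
}

# Print (without executing) the crash invocation for a given vmcore.
# The kernel release defaults to the running kernel.
crash_batch_command() {
    vmcore="$1"
    cmdfile="$2"
    release="${3:-$(uname -r)}"
    printf 'crash -s -i %s %s /usr/lib/debug/lib/modules/%s/vmlinux\n' \
        "$cmdfile" "$vmcore" "$release"
}

# Example usage:
#   write_crash_commands /tmp/crash.cmds
#   crash_batch_command /var/crash/127.0.0.1-2016-07-04-09:22:17/vmcore /tmp/crash.cmds
```

Printing the command first makes it easy to review the paths before running it, which matters when the dump came from a kernel version other than the one currently booted.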