Grub2 Issues

Recently I upgraded my Fedora Linux workstation at home. After the upgrade I ran rpmconf, a command that was new to me, which cleans up the .rpmsave (et al.) configuration files. Somewhere in this process, my Grub installation was corrupted. How? I still don’t understand. In my attempt to recover Grub, I found that documentation across the internet is mainly for Grub Legacy (version 0.9x, sometimes referred to as version 1), with recipes that don’t exactly work with Fedora’s Grub version 2 implementation.

Upgrading Fedora –

I upgraded from Fedora 33 to 34. In the process, Grub was corrupted. Nothing like being presented with the grub> prompt with no menu entries and having to manually feed Grub to get the machine booted. This is my journey for recovering Grub2 on F34.

As background, I have used Fedora for more than two decades. I’ve used other distros, but always come back to Fedora; likely due to my familiarity and the fact that in my work, we use RHEL.

Historically on open source systems (i.e. flavors of UNIX and Linux), upgrading the OS always had issues that would ultimately lead to a fresh installation. It took more time to ferret out the issues and resolve them than to wipe the slate clean and start fresh, configuring accordingly.

I'm a proponent of administering a network of open systems by managing the OS configuration globally and dealing with exceptions on a case-by-case basis, as they relate to the services hosted on a particular host.

Either way, it requires definition and consistent deployment to make it work across the enterprise. It pays off not only in deployment but in rapid recovery, because the question is not if, but when, recovery is needed.

There is an old adage that you don’t quit eating because your mom once burnt a batch of biscuits. The same is true of administering operating systems. While upgrading OS versions never worked well in the past, a few years ago I thought it would be good to test out upgrading again. In this case, I would go through the process to upgrade from F29 to F30, with my usual reinstall-and-configure routine as the backup plan.

I was quite surprised that the upgrade installed without issue. Since then, I have simply upgraded to the next OS version. There have been a few minor glitches along the way, such as when Fedora made Wayland the default windowing system. It ran slowly and some components of the user interface didn’t work, so I had to configure the system to use X11. No major issue has arisen that would cause me not to upgrade.

While upgrading recently from F33 to F34, the OS upgraded without a hitch, but a utility I ran afterwards produced an unusual glitch that blindsided me. By default, when an rpm package is updated and one of its configuration files has been changed locally, rpm either keeps the existing file and writes the package’s new version next to it as .rpmnew, or installs the new version and saves the old file as .rpmsave, so that the “production version” config file is not silently overlaid and lost. This is a mixed blessing. It preserves the old, but gives no guidance on whether that pre-existing config file is compatible with the updated package. In the past, you would need to (1) know what the config file is and (2) compare the two files to figure out which changes, if any, the update requires. In this upgrade, I noticed a command, rpmconf, whose purpose is to search for the .rpmsave et al. conf files and let the administrator either discard the new file or overlay the old config file with the new one.
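A minimal sketch of the manual check described above, assuming a Bash shell and the standard find and sort utilities. The function name list_rpm_leftovers is hypothetical and is not part of rpm or rpmconf; it only lists the leftover files and the live config file each one belongs to, and changes nothing.

```shell
#!/usr/bin/env bash
# list_rpm_leftovers [DIR]
# Hypothetical helper: print every .rpmnew/.rpmsave/.rpmorig file under DIR
# (default /etc) together with the live config file it sits next to.
list_rpm_leftovers() {
    local dir="${1:-/etc}"
    local leftover live
    find "$dir" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null |
    sort |
    while IFS= read -r leftover; do
        # Strip the final extension (.rpmnew, .rpmsave or .rpmorig)
        # to get the path of the live config file.
        live="${leftover%.*}"
        printf '%s -> %s\n' "$leftover" "$live"
    done
}
```

Each pair it prints can then be compared by hand with diff -u before deciding which version to keep.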

In running rpmconf -a, some conf files related to Grub 2 appeared in the dialog. I thought I had bypassed them (I didn’t understand why they came up in the conversation at all, as they are “auto generated”), but I found afterwards that Grub had been corrupted: there were no menu items at boot time. Since there was no “recovery” option available, I had to boot from a LIVE CD/DVD in order to assess the damage.

On the positive side, this exercise was beneficial as it was a good time to dig into Grub 2 and its internals.

As a side note: Grub 1 was straightforward in that you edited the menu directly in the grub.conf (or menu.lst) file on your /boot partition. With Grub 2, grub.cfg is the generated result of a process, and a hierarchical structure of individual conf files supports the grub menu items.

My journey for getting Grub 2 recovered:

Downloaded and burnt Fedora 34 LIVE CD/DVD.

This was pretty straightforward.

Booted on the Fedora 34 LIVE CD/DVD.

This was straightforward as well.

Assess damage on Grub.

I wanted to see what state my /boot partition was in. Since my OS root was installed on an LVM logical volume, I performed:

vgscan

lvs

lvdisplay


to detect my LVM volumes. I wanted to mount up the root and boot filesystems so I could probe and possibly chroot and repair.

To stage mounting of the local disk:

mkdir /mnt/sysimage

mount /dev/fedora/root /mnt/sysimage


The root filesystem is now mounted. Next, locate the /boot partition and mount it under the root filesystem just mounted.

fdisk -l

I looked at the partition table type of the boot disk to ensure that it wasn’t a GPT disk (which it was not; it had an MSDOS partition table, the long-time default way back when!).
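As a small sketch of that check, the filter below pulls the partition table type out of fdisk -l output. It assumes the util-linux fdisk format, where each disk section contains a "Disklabel type:" line (dos for an MSDOS table, gpt for GPT). The function name disklabel_types is hypothetical.

```shell
#!/usr/bin/env bash
# disklabel_types
# Hypothetical helper: read `fdisk -l` output on stdin and print
# "DEVICE TYPE" for every disk that reports a partition table.
disklabel_types() {
    awk '
        # Remember the device named on each "Disk /dev/...:" header line.
        /^Disk \/dev\// { dev = $2; sub(/:$/, "", dev) }
        # Print it when the matching "Disklabel type:" line shows up.
        /^Disklabel type:/ { print dev, $3 }
    '
}
```

It would be used as fdisk -l | disklabel_types, run as root. A dos result is consistent with Grub naming the partitions (hd0,msdos1) and (hd0,msdos2) at its prompt.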

mount /dev/sda1 /mnt/sysimage/boot

All the Grub files seemed to be intact, though grub.cfg still had the F29 menu entries, whose kernels had all long since been removed, and Grub was rightfully complaining that those menu items were no longer valid.

Set up chroot environment and chroot in order to re-install Grub and recreate the config file. Before chrooting, I needed to mount up and bind the dev, proc and sys directories to the root filesystem that I was about to chroot to.

mount -o bind /dev /mnt/sysimage/dev

mount -t proc proc /mnt/sysimage/proc

mount -o bind /sys /mnt/sysimage/sys

chroot /mnt/sysimage

lsblk

This is only a check that /dev/sda1 shows up in the lsblk output after chrooting.
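The bind-mount and chroot steps above can be wrapped in one function, sketched here for Bash. The function name prepare_chroot and the DRY_RUN variable are assumptions of this sketch, not existing tools; with DRY_RUN=1 the commands are only printed, which makes it easy to spot a missing space in a path before anything is mounted.

```shell
#!/usr/bin/env bash
# prepare_chroot [TARGET]
# Hypothetical helper: make /dev, /proc and /sys available inside TARGET
# (default /mnt/sysimage) and then chroot into it.
# Set DRY_RUN=1 to print the commands instead of running them.
prepare_chroot() {
    local target="${1:-/mnt/sysimage}"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Same three mounts as in the walkthrough; stop at the first failure.
    $run mount -o bind /dev "$target/dev"   || return 1
    $run mount -t proc proc "$target/proc"  || return 1
    $run mount -o bind /sys "$target/sys"   || return 1
    $run chroot "$target"
}
```

Running DRY_RUN=1 prepare_chroot /mnt/sysimage first shows the four commands; running it without DRY_RUN, as root, performs them.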

Re-initialize Grub on /dev/sda

grub2-install /dev/sda

ls -l /boot/grub2

Saved the existing F29 versioned grub.cfg and created a new grub.cfg. (In retrospect, I probably should have done this before the grub2-install.)

cp -p /boot/grub2/grub.cfg /boot/grub2/grub.cfg.f29

grub2-mkconfig -o /boot/grub2/grub.cfg

I reviewed /boot/grub2/grub.cfg. I didn’t see the menu entries embedded in grub.cfg, but noted this comment:

# The blscfg command parses the BootLoaderSpec files stored in /boot/loader/entries and

# populates the boot menu.

The menu entries did exist in /boot/loader/entries, so I assumed that they would now be found at boot time.
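A quick way to inspect those entries is sketched below, assuming the BootLoaderSpec layout of one .conf file per kernel with "title" and "linux" keys. The function name list_bls_entries is hypothetical.

```shell
#!/usr/bin/env bash
# list_bls_entries [DIR]
# Hypothetical helper: print "TITLE | KERNEL" for each BootLoaderSpec
# entry file in DIR (default /boot/loader/entries).
list_bls_entries() {
    local dir="${1:-/boot/loader/entries}"
    local entry title kernel
    for entry in "$dir"/*.conf; do
        # If the glob matched nothing it stays literal; skip it.
        [ -e "$entry" ] || continue
        # Take the value after the "title" and "linux" keys.
        title=$(sed -n 's/^title[[:space:]]\{1,\}//p' "$entry" | head -n 1)
        kernel=$(sed -n 's/^linux[[:space:]]\{1,\}//p' "$entry" | head -n 1)
        printf '%s | %s\n' "$title" "$kernel"
    done
}
```

Each kernel path it prints can be checked against the files actually present in /boot. An entry that points at a kernel that has been removed would produce the same kind of invalid menu item as the stale F29 entries.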

Reboot without LIVE CD/DVD.

Rebooted without the LIVE CD/DVD. I fully expected to see the Grub menu and have the recovery complete.

On boot, I got the grub> prompt. Ugh!

Try manually booting from the grub> prompt:

Since Grub was not fully repaired, I wanted to see if I could get Grub stage 2 running by entering the commands from the Grub prompt.

set pager=1

ls

Responded with:

(hd0) (hd0,msdos2) (hd0,msdos1) (lvm/fedora-swap) (lvm/fedora-home) (lvm/fedora-root)

I wanted to see which partition the /boot files were on.

ls (hd0,msdos1)/

Notice that ending the ls argument with a slash ("/") gives a listing of the files on the filesystem.
Without the slash, ls only responds with metadata about the filesystem itself.

In order to read the LVM filesystem partitions, Grub needs the LVM module loaded:

insmod lvm

Manually entered the menu entry:

set root=(lvm/fedora-root)

linux16 (hd0,msdos1)/vmlinuz-5.11.18-300.fc34.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root

initrd16 (hd0,msdos1)/initramfs-5.11.18-300.fc34.x86_64.img

boot

I had a successful boot, though the menu entries were still not found. I was surprised, since I had run grub2-install and grub2-mkconfig in the chrooted environment against that very partition. Unusual.
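Since these lines have to be typed by hand at the Grub prompt, it can help to print them out ahead of time for the kernel you intend to boot. The sketch below only generates text; the function name grub_manual_entry and its argument order are hypothetical, and the defaults are taken from the layout on this machine ((hd0,msdos1) for /boot, volume group fedora, logical volume root).

```shell
#!/usr/bin/env bash
# grub_manual_entry KERNEL_VERSION [BOOT_DEVICE] [VG] [LV]
# Hypothetical helper: print the commands to type at the grub> prompt
# for one kernel. Nothing is executed or written to disk.
grub_manual_entry() {
    local kver="$1"
    local bootdev="${2:-(hd0,msdos1)}"
    local vg="${3:-fedora}"
    local lv="${4:-root}"
    cat <<EOF
insmod lvm
set root=(lvm/${vg}-${lv})
linux16 ${bootdev}/vmlinuz-${kver} root=/dev/mapper/${vg}-${lv} ro rd.lvm.lv=${vg}/${lv}
initrd16 ${bootdev}/initramfs-${kver}.img
boot
EOF
}
```

Calling grub_manual_entry 5.11.18-300.fc34.x86_64 prints the same five lines entered above. A different kernel version, boot device, or volume names can be passed as arguments.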

I performed a dnf update, which had a new kernel available. I figured that whatever I had missed, the fix might be included in the kernel update’s post-install script.

On reboot, Grub could not find the menu entries again.

Reboot and manually feed Grub again.

I rebooted and entered the Grub menu commands to get back into the locally installed kernel again. In the booted, multi-user environment, I then performed:

grub2-install /dev/sda

On reboot, this fixed the installation and I was able to boot without having to manually enter the Grub menu commands yet again. Double ugh!

Why did this work after booting into the local kernel and not in the chroot environment? It doesn’t make sense to me. Note to self: don’t bother with rpmconf to delete or overlay a conf file. Just use it in a dry-run mode to see what falls out, then manually evaluate and change where needed.