Exadata cellboot USB issues

I had the dubious pleasure today of receiving the following alert from the Exadata storage server software on an X3:


CELL-02666: Cell configuration check for devices encountered the following issues:

USB Errors: [ERROR] Cell USB could not be fixed.

I thought I’d share some of the understanding I’ve gained about this whole feature and the fix for this particular problem too.

So firstly, a quick explanation of what this is. It’s a USB pen drive which is permanently attached to the back of all Exadata storage servers. It is generally kept unmounted, and it contains a version of the OS ready to boot from in the event that your storage cell becomes unbootable. Generally it just sits there silently, never really needing any attention and (hopefully) never required for use either. However, in this case it gave me a few little problems.

A search around the storage server logs led me to notice that there was a file system named /mnt mounted but pointing to a device which just didn’t exist.

[root@cell01 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md5 10321144 5426908 4369952 56% /
tmpfs 32981668 0 32981668 0% /dev/shm
/dev/md7 3096272 711264 2227728 25% /opt/oracle
/dev/md4 116576 30795 79762 28% /boot
/dev/md11 5160448 163548 4734764 4% /var/log/oracle
/dev/sdad1 3893880 1210232 2683648 32% /mnt
[root@cell01 ~]# cd
[root@cell01 ~]# cd /mnt
[root@cell01 mnt]# touch test
touch: cannot touch `test': Input/output error

This was the case on all cells. I eventually went on to umount this file system as it clearly shouldn’t be there and didn’t exist on other racks.
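If you want to check for this kind of stale mount without eyeballing df on every cell, something like the following might do. It is only a rough sketch: find_stale_mounts is my own hypothetical helper, not part of the Exadata tooling, and it simply assumes that a mount whose /dev path no longer exists is suspect.

```shell
# List mount points whose backing device node no longer exists.
# find_stale_mounts is a hypothetical helper, not an Oracle-supplied tool.
# The mounts file defaults to /proc/mounts; it is a parameter only so the
# function can be tried against a sample file.
find_stale_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    local dev mnt rest
    while read -r dev mnt rest; do
        # Only consider entries backed by a /dev path (skips tmpfs, proc, ...).
        case "$dev" in
            /dev/*) ;;
            *) continue ;;
        esac
        # -e is false when the device node has disappeared.
        if [ ! -e "$dev" ]; then
            echo "$mnt $dev"
        fi
    done < "$mounts_file"
}
```

Calling find_stale_mounts with no argument reads /proc/mounts and prints one "mount point, device" pair per suspect entry. Some kernels list the root file system as /dev/root, which may not exist as a node, so treat the output as a hint rather than proof.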

I then started to look at what was actually on our CELLBOOT USB device. To do this I wanted to mount the file system on the device. By running “imageinfo” you should be able to see what device to use.

[root@cell01 ~]# imageinfo

Kernel version: 2.6.32-400.11.1.el5uek #1 SMP Thu Nov 22 03:29:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_130109
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_130109-1

Active image version: 11.2.3.2.1.130109
Active image activated: 2013-03-14 04:45:00 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Product version not found in /mnt/usb.image.info/image.id
Unable to get image version of cellboot usb from /mnt/usb.image.info/image.id
Cell boot usb version: undefined

Inactive image version: undefined
Rollback to the inactive partitions: Impossible

So you can see from the “Cell boot usb partition” line above that the USB device is /dev/sdm1. Interestingly what imageinfo is doing in the background is mounting that device in /mnt/usb.image.info and looking for certain files. It then immediately umounts the device. It happens so fast that I can’t actually see the mounted file system but /var/log/messages confirms this is happening.
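If you need these values in a script rather than by eye, the imageinfo output is easy enough to pick apart. Here is a minimal sketch, assuming the "Key: value" layout shown above holds on your image version. The imageinfo_field function is a hypothetical helper of my own, not something shipped on the cell.

```shell
# Print the value of a "Key: value" line from imageinfo output on stdin.
# imageinfo_field is a hypothetical helper; it assumes the key contains
# no regular-expression metacharacters.
imageinfo_field() {
    local key="$1"
    # Keep only lines starting with the key and a colon, strip that prefix,
    # and return the first match.
    sed -n "s/^${key}:[[:space:]]*//p" | head -n 1
}
```

For example, imageinfo | imageinfo_field "Cell boot usb partition" should give back the device name, and the same call with "Cell boot usb version" gives the version string (or "undefined" on a broken drive like mine).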

Also it is quite interesting that the scripts have saved us from some manual work around identifying which device the USB drive is connected as. This seems to get calculated by a function called find_usb_devices in /opt/oracle.cellos/image_functions but in short it is basically the first USB drive it comes across with a label of CELLBOOT. Another way of seeing this is to take a look in /dev/disk/by-label and look for CELLBOOT.
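The by-label approach can be scripted in a couple of lines. This is a sketch of that alternative only. It is not what find_usb_devices does internally, and device_for_label is a hypothetical name of my own.

```shell
# Resolve a file system label to its device node via the by-label symlinks.
# device_for_label is a hypothetical helper. The directory defaults to
# /dev/disk/by-label and is a parameter only so it can be tested elsewhere.
device_for_label() {
    local label="$1"
    local label_dir="${2:-/dev/disk/by-label}"
    local link="$label_dir/$label"
    # Each entry is a symlink such as CELLBOOT -> ../../sdm1.
    if [ -L "$link" ]; then
        readlink -f "$link"
    else
        echo "no device with label $label" >&2
        return 1
    fi
}
```

Running device_for_label CELLBOOT on the cell in this post should come back with /dev/sdm1, matching what imageinfo reported.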

So imageinfo figures out where to look for the USB drive, and then it peeks in the drive to see what version it is at. It is this step which is failing at the moment, so I ran that step manually.

[root@cell01 mnt]# mount /dev/sdm1 /mnt/usb
[root@cell01 mnt]# cd /mnt/usb
[root@cell01 usb]# ls
lost+found
[root@cell01 usb]#

Ah that isn’t good. The drive is totally empty (fresh file system by the looks of it). I checked this was the case on all other cells.
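To repeat that check across several cells it helps to turn "is there anything on the drive?" into something a script can answer. The sketch below assumes the drive is already mounted and that the presence of image.id (the file imageinfo complained about) is a fair sign of a populated drive. cellboot_status is a hypothetical helper of my own.

```shell
# Classify the contents of a mounted CELLBOOT file system.
# cellboot_status is a hypothetical helper, not an Oracle-supplied tool.
# Prints "populated" if image.id exists, "empty" if nothing but lost+found
# is present, and "incomplete" for anything in between.
cellboot_status() {
    local mount_point="$1"
    if [ -f "$mount_point/image.id" ]; then
        echo "populated"
    elif [ -z "$(ls -A "$mount_point" | grep -v '^lost+found$')" ]; then
        echo "empty"
    else
        echo "incomplete"
    fi
}
```

With the drive mounted as above, cellboot_status /mnt/usb would have printed "empty" on every one of my cells.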

So next step is to rebuild the CELLBOOT device. These steps were provided and tested by Oracle Support prior to me running them on the customer Exadata.

[root@cell01 ~]# cd /opt/oracle.SupportTools/
[root@cell01 oracle.SupportTools]# ./make_cellboot_usb -verbose
Potential USB device: sda
It is not usb-storage driver
Potential USB device: sdaa
It is not usb-storage driver
Potential USB device: sdab
It is not usb-storage driver
Potential USB device: sdac
It is not usb-storage driver
Potential USB device: sdb
It is not usb-storage driver
Potential USB device: sdc
It is not usb-storage driver
Potential USB device: sdd
It is not usb-storage driver
Potential USB device: sde
It is not usb-storage driver
Potential USB device: sdf
It is not usb-storage driver
Potential USB device: sdg
It is not usb-storage driver
Potential USB device: sdh
It is not usb-storage driver
Potential USB device: sdi
It is not usb-storage driver
Potential USB device: sdj
It is not usb-storage driver
Potential USB device: sdk
It is not usb-storage driver
Potential USB device: sdl
It is not usb-storage driver
Potential USB device: sdm
Device sdm has size 7831552
First partition label: CELLBOOT
Added to the list of USB drivers

Potential USB device: sdn
It is not usb-storage driver
Potential USB device: sdo
It is not usb-storage driver
Potential USB device: sdp
It is not usb-storage driver
Potential USB device: sdq
It is not usb-storage driver
Potential USB device: sdr
It is not usb-storage driver
Potential USB device: sds
It is not usb-storage driver
Potential USB device: sdt
It is not usb-storage driver
Potential USB device: sdu
It is not usb-storage driver
Potential USB device: sdv
It is not usb-storage driver
Potential USB device: sdw
It is not usb-storage driver
Potential USB device: sdx
It is not usb-storage driver
Potential USB device: sdy
It is not usb-storage driver
Potential USB device: sdz
It is not usb-storage driver
Candidate for the Oracle Exadata Cell start up boot device : /dev/sdm
Partition on candidate device : /dev/sdm1
The current product version : 11.2.3.2.1.130109
Label of the current Oracle Exadata Cell start up boot device : CELLBOOT
Product version not found in /mnt/usb.make.cellboot/image.id
Unable to get image version of cellboot usb from /mnt/usb.make.cellboot/image.id

The current CELLBOOT USB product version :
It is a dry run. No action performed

So here the script has attempted to identify the correct USB drive to use. It has found /dev/sdm and then confirmed it has the correct drive label. It then checks the version of the software on this drive, finds nothing, and stops without making any changes (the last line of the output reports a dry run). This is expected, so we proceed again with the -force option.

[root@cell01 oracle.SupportTools]# ./make_cellboot_usb -verbose -force

The script goes on to create a new partition table on the device, and then copy across the necessary software and grub configuration. It then umounts the device.

Our imageinfo commands now work.

[root@cell01 oracle.SupportTools]# imageinfo

Kernel version: 2.6.32-400.11.1.el5uek #1 SMP Thu Nov 22 03:29:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_130109
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_130109-1

Active image version: 11.2.3.2.1.130109
Active image activated: 2013-03-14 04:45:00 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.2.1.130109

So we now have a working USB drive, standing by in case we ever need it. All of this was undertaken with no service disruption to the cells, so it doesn’t require any downtime at all.
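As a final sanity check, you can confirm that the USB version reported by imageinfo matches the active image version. This is a minimal sketch that assumes the two field names shown in the output above. usb_matches_active is a hypothetical helper of my own, and whether the two versions should always match is my assumption rather than something Oracle Support told me.

```shell
# Succeed only if the "Cell boot usb version" equals the
# "Active image version" in imageinfo output read from stdin.
# usb_matches_active is a hypothetical helper, not part of the cell software.
usb_matches_active() {
    local info active usb
    info="$(cat)"
    active="$(printf '%s\n' "$info" | sed -n 's/^Active image version:[[:space:]]*//p' | head -n 1)"
    usb="$(printf '%s\n' "$info" | sed -n 's/^Cell boot usb version:[[:space:]]*//p' | head -n 1)"
    # An empty active version means the output could not be parsed.
    [ -n "$active" ] && [ "$active" = "$usb" ]
}
```

Used as imageinfo | usb_matches_active && echo OK, it stays quiet on a cell in the broken state shown earlier and prints OK after the rebuild.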
