March 24, 2010 by acheng

Solaris 10 boot archive issue

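Today one of our Solaris servers suddenly dropped into maintenance mode: users could not connect over SSH and all services were stopped.
Fortunately the machine had ILOM configured, so I rebooted it from the ILOM interface. During the reboot the console showed the following errors: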

========================================================
WARNING: add_spec: No major number for
NOTICE: nxge0: xcvr addr:0x1d - link is down
NOTICE: nxge1: xcvr addr:0x1c - link is down
Hostname: solaris10_sparc
VxVM sysboot INFO V-5-2-3409 starting in boot mode...
NOTICE: VxVM vxdmp V-5-0-34 added disk array 04717, datype = TagmaStore-USP

NOTICE: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk

NOTICE: VxVM vxdmp V-5-3-1700 dmpnode 308/0x0 has migrated from enclosure FAKE_ENCLR_SNO to enclosure DISKS

VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
NOTICE: nxge0: xcvr addr:0x1d - link is up 1000 Mbps full duplex
NOTICE: nxge1: xcvr addr:0x1c - link is up 1000 Mbps full duplex

WARNING: The following files in / differ from the boot archive:

changed /kernel/drv/qlc.conf

The recommended action is to reboot to the failsafe archive to correct
the above inconsistency. To accomplish this, on a GRUB-based platform,
reboot and select the "Solaris failsafe" option from the boot menu.
On an OBP-based platform, reboot then type "boot -F failsafe". Then
follow the prompts to update the boot archive. Alternately, to continue
booting at your own risk, you may clear the service by running:
"svcadm clear system/boot-archive"

Mar 23 19:43:16 svc.startd[7]: svc:/system/boot-archive:default: Method "/lib/svc/method/boot-archive" failed with exit status 95.
Mar 23 19:43:16 svc.startd[7]: system/boot-archive:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Root password for system maintenance (control-d to bypass):
========================================================
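
If the console output has been saved to a file (for example by logging the ILOM console session), the files that differ from the boot archive can be pulled out for a quick review. This is only a small sketch: the function name `changed_files` is made up for this post, and it assumes a POSIX-style awk and the "changed /path" line format shown in the log above.

```shell
# Print the paths that the boot-archive check reported as changed.
# Reads console text on stdin and writes one path per line.
# changed_files is a hypothetical helper name, not a Solaris command.
changed_files() {
    # Lines of interest look like: "changed /kernel/drv/qlc.conf"
    awk '$1 == "changed" && $2 ~ /^\// { print $2 }'
}
```

Usage would be something like `changed_files < console.log`, where `console.log` is whatever file you captured the console into. For the log above it should list only /kernel/drv/qlc.conf.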

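Luckily, maintenance mode was still reachable.
At the prompt above, enter the root password to get a maintenance shell, then bring the system down to the ok prompt: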

# init 0

The system drops back to the ok prompt. Following the hint in the error message, boot into failsafe mode:

{ok} boot -F failsafe

The system then walks you through a few prompts and starts a shell. From there, run the following commands to update the boot archive:

First, check the file system with fsck (replace c1t0d0s0 with your own root slice):
# fsck /dev/dsk/c1t0d0s0

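fsck may well find problems, and this is where you need to be careful. You should be comfortable with fsck and understand what its repairs can do; otherwise you risk leaving the system unbootable and losing data.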

If fsck reports no errors, or you have repaired the ones it found, mount the slice on /a:
# mount /dev/dsk/c1t0d0s0 /a

If your root file system is mirrored, you have to break the mirror first. For the procedure, see:
http://docs.sun.com/app/docs/doc/817-1985/gglaj?a=view
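
If you are not sure whether root sits under a volume manager, one rough way to check is to look at the root entry in vfstab. The sketch below is my own illustration, not part of the linked procedure: `root_device_kind` is a hypothetical helper, it assumes a POSIX-style awk, and it relies on the usual device path conventions (/dev/md/ for Solaris Volume Manager metadevices, /dev/vx/ for VxVM volumes). It only identifies the device type; it does not touch the mirror.

```shell
# Classify the root device listed in a vfstab file.
# $1: path to the vfstab file (defaults to /etc/vfstab).
# Prints "<kind> <device>", where kind is svm, vxvm or plain.
# Returns non-zero if no entry with mount point "/" is found.
# root_device_kind is a hypothetical helper name.
root_device_kind() {
    awk '
        # Skip comment lines; field 3 is the mount point.
        $1 !~ /^#/ && $3 == "/" {
            if ($1 ~ /^\/dev\/md\//)      kind = "svm"
            else if ($1 ~ /^\/dev\/vx\//) kind = "vxvm"
            else                           kind = "plain"
            print kind, $1
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "${1:-/etc/vfstab}"
}
```

You could run it in the maintenance shell before `init 0`, while /etc/vfstab is still readable. A result of "plain" means root is on an ordinary slice, which is the case the commands in this post assume.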

Update the boot archive:
# bootadm update-archive -R /a

Unmount /a:
# umount /a

Reboot:
# shutdown -i6 -g0 -y

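If nothing else is wrong, the system should now come back up normally.
If it still does not, there is a deeper system or hardware problem that needs further investigation.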
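
For reference, the failsafe steps above can be collected into one small script. Treat this as a sketch rather than a tested recovery tool: `ROOT_SLICE`, `DRY_RUN`, `run` and `rebuild_boot_archive` are names I introduced for illustration, and by default it only prints the commands instead of running them. It also stops at the first command that returns a non-zero status, so that fsck problems get looked at by a human.

```shell
#!/bin/sh
# Sketch of the failsafe recovery steps from this post.
# ROOT_SLICE and DRY_RUN are illustrative names, not Solaris settings.
ROOT_SLICE=${ROOT_SLICE:-c1t0d0s0}   # replace with your own root slice
DRY_RUN=${DRY_RUN:-1}                # 1 = only print, 0 = really execute

# Print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_boot_archive() {
    # Check the root file system; stop on any non-zero status.
    run fsck "/dev/dsk/$ROOT_SLICE"     || return 1
    # Mount root on /a, the mount point used in the steps above.
    run mount "/dev/dsk/$ROOT_SLICE" /a || return 1
    # Rebuild the boot archive of the root mounted at /a.
    run bootadm update-archive -R /a    || { run umount /a; return 1; }
    run umount /a                       || return 1
    # Reboot immediately.
    run shutdown -i6 -g0 -y
}
```

Calling `rebuild_boot_archive` with the defaults prints the five commands prefixed with "+". Setting `DRY_RUN=0` makes it execute them, which you should only do from the failsafe shell and only after confirming the slice name.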

acheng

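Professional Linux/Unix/Windows system administrator and open source enthusiast, with a strong interest in operating system internals, the TCP/IP stack and information system security. Outside of computing, enjoys calligraphy, classical poetry, digital photography and backpacking.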
