Solaris 10 boot archive issue

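Today one of our Solaris servers suddenly dropped into maintenance mode: users could no longer connect over SSH and all services had stopped.
Fortunately the machine had ILOM configured, so I rebooted it from the ILOM interface. During the reboot the console showed the following errors: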

========================================================
WARNING: add_spec: No major number for
NOTICE: nxge0: xcvr addr:0x1d - link is down
NOTICE: nxge1: xcvr addr:0x1c - link is down
Hostname: solaris10_sparc
VxVM sysboot INFO V-5-2-3409 starting in boot mode...
NOTICE: VxVM vxdmp V-5-0-34 added disk array 04717, datype = TagmaStore-USP

NOTICE: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk

NOTICE: VxVM vxdmp V-5-3-1700 dmpnode 308/0x0 has migrated from enclosure FAKE_ENCLR_SNO to enclosure DISKS

VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
NOTICE: nxge0: xcvr addr:0x1d - link is up 1000 Mbps full duplex
NOTICE: nxge1: xcvr addr:0x1c - link is up 1000 Mbps full duplex

WARNING: The following files in / differ from the boot archive:

changed /kernel/drv/qlc.conf

The recommended action is to reboot to the failsafe archive to correct
the above inconsistency. To accomplish this, on a GRUB-based platform,
reboot and select the "Solaris failsafe" option from the boot menu.
On an OBP-based platform, reboot then type "boot -F failsafe". Then
follow the prompts to update the boot archive. Alternately, to continue
booting at your own risk, you may clear the service by running:
"svcadm clear system/boot-archive"

Mar 23 19:43:16 svc.startd[7]: svc:/system/boot-archive:default: Method "/lib/svc/method/boot-archive" failed with exit status 95.
Mar 23 19:43:16 svc.startd[7]: system/boot-archive:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Root password for system maintenance (control-d to bypass):
========================================================
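
The line that matters in this output is "changed /kernel/drv/qlc.conf": the file on disk no longer matches the copy stored in the boot archive, so the boot-archive service refuses to start. If you have saved the console output to a file, a small filter can pull out the affected paths. This is only a sketch; the function name changed_boot_files is made up for illustration, and it assumes the message format shown above (a line starting with "changed" followed by an absolute path).

```shell
#!/bin/sh
# Sketch: list the files that the boot-archive warning reports as changed.
# Reads console output on stdin and prints one path per line.
# Assumes lines of the form "changed /some/path", as in the log above.
changed_boot_files() {
    awk '$1 == "changed" && $2 ~ /^\// { print $2 }'
}
```

For example, `changed_boot_files < console.log` (console.log being whatever file you captured the output in) would print /kernel/drv/qlc.conf for the log above.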

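Luckily, maintenance mode was still reachable.
At the prompt above, enter the root password to get a maintenance-mode shell, then bring the system down to the ok prompt: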

# init 0

The system drops back to the ok prompt. Following the advice in the error message, boot into failsafe mode:

{ok} boot -F failsafe

The system then walks you through some prompts and starts a shell. From that shell, run the following commands to update the boot archive:

First, check the root file system with fsck (replace c1t0d0s0 with your own root slice):
# fsck /dev/dsk/c1t0d0s0

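fsck may well find problems, and this is where you need to be careful. You should be familiar with fsck and understand the consequences of the repairs it offers; otherwise you can leave the system completely unusable and lose data.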

If fsck reports no errors, or once you have repaired the ones it found, mount the file system on /a:
# mount /dev/dsk/c1t0d0s0 /a

If your root file system is mirrored, you must break the mirror first. For the detailed procedure, see:
http://docs.sun.com/app/docs/doc/817-1985/gglaj?a=view

Update the boot archive:
# bootadm update-archive -R /a

Unmount /a:
# umount /a

Reboot:
# shutdown -i6 -g0 -y
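
The repair steps above (fsck, mount, update the archive, unmount) can be collected into one small function, which makes it harder to skip a step or continue after a failure. This is a rough sketch rather than a tested procedure: the names run and repair_boot_archive and the DRY_RUN variable are invented here, and by default the function only prints the commands. It does not handle a mirrored root, and the reboot is deliberately left as a manual step.

```shell
#!/bin/sh
# Sketch: print (or run) the failsafe repair sequence for one root slice.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"          # dry run: show the command only
    else
        "$@"                 # real run: execute it
    fi
}

repair_boot_archive() {
    slice=$1                 # e.g. /dev/dsk/c1t0d0s0 (use your own root slice)
    altroot=${2:-/a}         # mount point used in this article
    if [ -z "$slice" ]; then
        echo "usage: repair_boot_archive <root-slice> [altroot]" >&2
        return 2
    fi
    # Each step runs only if the previous one succeeded.
    run fsck "$slice" &&
    run mount "$slice" "$altroot" &&
    run bootadm update-archive -R "$altroot" &&
    run umount "$altroot"
}
```

Calling `repair_boot_archive /dev/dsk/c1t0d0s0` prints the four commands in order so you can review them; setting DRY_RUN=0 first makes the function actually run them. Because the steps are chained with &&, a failing fsck stops the sequence before anything is mounted.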

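If nothing else is seriously wrong, the system should now boot normally.
If it still does not, the system or the hardware has a deeper problem that needs further investigation.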

