Vsanobserver Sh Vsantraced Is Not Started Can T Start Vsanobserver


Services Failing on ESXi 5.5 Host - Can't Restart Services

Thread starter KapsZ28
Start date Aug 12, 2014

Joined: May 29, 2009

Messages: 2,114

One of our servers in a cluster experienced an issue this morning. It can't be managed through vCenter or directly to the server with vSphere Client. SSH is working. I ran "./sbin/services.sh restart" to restart all the services, but there are many failures and I can't get it to the point where it can be managed through vSphere Client or vCenter. Obviously HA is not working since it is no longer in the cluster. All I care about is getting it to the point where I can manage it through vCenter and perform vMotions. Otherwise I am stuck shutting down all the VMs in order to move them.

When I run "./sbin/services.sh restart", below is what I am getting.

Running vmware-fdm stop
Stopping vmware-fdm:success
Running xorg stop
Running wsman stop
Stopping openwsmand
Openwsmand is not running.
Running sfcbd stop
This operation is not supported.
Please use /etc/init.d/sfcbd-watchdog stop
Running snmpd stop
root: snmpd is not running.
Running sfcbd-watchdog stop
sh: bad number
sh: you need to specify whom to kill
pkill: failed to kill /sbin/sfcbd (8739627): No such process
pkill: Failure
pkill: failed to kill /sbin/sfcbd (8739627): No such process
pkill: Failure
pkill: failed to kill /sbin/sfcbd (8739627): No such process
pkill: Failure
Connect to localhost failed: Connection failure
Connect to localhost failed: Connection failure
Running vpxa stop
vpxa is not running
Connect to localhost failed: Connection failure
Running vobd stop
vobd is not running
Running lacp stop
LACP daemon is not running
Running memscrubd stop
memscrubd is not running
Running nscd stop
nscd is not running
Running smartd stop
smartd is not running
Running dcbd stop
dcbd is not running
Running cdp stop
cdp is not running
Running slpd stop
Stopping slpd
Running rhttpproxy stop
rhttpproxy is not running.
Running vsantraced stop
watchdog-vsantraced: PID file /var/run/vmware/watchdog-vsantraced.PID does not exist
watchdog-vsantraced: Unable to terminate watchdog: No running watchdog process for vsantraced
vsantraced is not running
Failed to clear vsantraced memory reservation
Running swapobjd stop
swapobjd is not running
Running vmfstraced stop
watchdog-vmfstracegd: PID file /var/run/vmware/watchdog-vmfstracegd.PID does not exist
watchdog-vmfstracegd: Unable to terminate watchdog: No running watchdog process for vmfstracegd
vmfstracegd is not running
Failed to clear vmfstracegd memory reservation
Running sensord stop
sensord is not running
Running lbtd stop
net-lbt is not running
Running hostd stop
hostd is not running.
Running storageRM stop
storageRM is not running
Running sdrsInjector stop
sdrsInjector is not running
Running DCUI stop
Disabling DCUI logins
VobUserLib_Init failed with -1
Running SSH stop
SSH login disabled
VobUserLib_Init failed with -1
Connect to localhost failed: Connection failure
Errors:
Invalid operation requested: This ruleset is required and connot be disabled
Running ntpd stop
Stopping ntpd
watchdog-ntpd: Terminating watchdog process with PID 13010437
Connect to localhost failed: Connection failure
Running iomemory-vsl stop
Running iomemory-vsl restart
Running ntpd restart
Connect to localhost failed: Connection failure
Starting ntpd
Running SSH restart
Connect to localhost failed: Connection failure
SSH login enabled
VobUserLib_Init failed with -1
Running DCUI restart
Enabling DCUI login: runlevel =
VobUserLib_Init failed with -1
Running sdrsInjector restart
sdrsInjector started
Running storageRM restart
storageRM started
Running hostd restart
Ramdisk 'hostd' with estimated size of 1053MB already exists
hostd started.
Running lbtd restart
net-lbt started
Running sensord restart
sensord started
Running vmfstraced restart
VMFS Global Tracing is not enabled.
Running swapobjd restart
swapobjd started
Running vsantraced restart
Storing traces to /scratch/vsantraces
mkdir: can't create directory '/scratch/vsantraces': Connection timed out
Failed to mkdir: /scratch/vsantraces
Running rhttpproxy restart
rhttpproxy started.
Running slpd restart
Starting slpd
Running cdp restart
cdp started
Running dcbd restart
dcbd started
Running smartd restart
smartd started
Running nscd restart
nscd started
Running memscrubd restart
The checkPages boot option is FALSE, hence memscrubd could not be started.
Running lacp restart
LACP daemon started
Running vobd restart
vobd started
Running vpxa restart
Connect to localhost failed: Connection failure
Running sfcbd-watchdog restart
Connect to localhost failed: Connection failure
Connect to localhost failed: Connection failure
sfcbd is running.
Running snmpd restart
root: snmpd opening firewall port(s) for notifications.
Running sfcbd restart
This operation is not supported.
Please use /etc/init.d/sfcbd-watchdog start
Running wsman restart
Starting openwsmand
Running xorg restart
Running vmware-fdm restart
Starting vmware-fdm:success
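
For anyone facing a wall of services.sh output like this, a small script can pull out the lines worth reading first. This is only a rough Python sketch, not a VMware tool: the function name summarize and the list of failure phrases are assumptions picked from the messages in this output, and some of the lines it flags may turn out to be harmless.

```python
import re

# Phrases treated as signs of trouble. This list is an assumption based on
# the messages seen in this thread, not an official catalogue.
FAILURE_HINTS = ("failed", "failure", "can't", "timed out", "unable")

HEADER = re.compile(r"Running (\S+) (stop|start|restart)$")


def summarize(output):
    """Group suspicious lines under the 'Running <service> <action>' step they follow."""
    problems = {}
    current = None
    # Split on newlines, or on runs of 3+ spaces in case the paste was flattened.
    for part in re.split(r"\n|[ \t]{3,}", output):
        line = part.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = f"{header.group(1)} {header.group(2)}"
            continue
        if current and any(hint in line.lower() for hint in FAILURE_HINTS):
            problems.setdefault(current, []).append(line)
    return problems


sample = """Running vpxa restart
Connect to localhost failed: Connection failure
Running vsantraced restart
Storing traces to /scratch/vsantraces
mkdir: can't create directory '/scratch/vsantraces': Connection timed out
Failed to mkdir: /scratch/vsantraces
Running hostd restart
hostd started."""

for step, lines in summarize(sample).items():
    print(step)
    for line in lines:
        print("   ", line)
```

On the excerpt in the sample, this reports vpxa restart and vsantraced restart and skips hostd, which is where the vsantraced failure to create /scratch/vsantraces stands out.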

Any recommendations?

Joined: Oct 11, 2001

Messages: 31,956

looks normalish to me.

Hostd and vpxa both started. Go see what's in /var/log/vmkernel.log, /var/log/hostd.log, and /var/log/vpxa.log

See if they're throwing errors (especially vmkernel).

Joined: May 29, 2009

Messages: 2,114

looks normalish to me.
Hostd and vpxa both started. Go see what's in /var/log/vmkernel.log, /var/log/hostd.log, and /var/log/vpxa.log

See if they're throwing errors (especially vmkernel).

Ha, I used the word "stableish" this morning to describe the server.

The services restart took much longer than normal, kept getting stuck on some processes, and some show as failed. Then again, I never fully looked at the list before, since it normally goes through pretty quickly. It was also getting stuck at "Running usbarbitrator restart", so I had to run "chkconfig usbarbitrator off" just to get it to complete. I will check the logs shortly.

Joined: Oct 11, 2001

Messages: 31,956

Yeah, it does that when something has gone sideways. The problem is, it also does that when things are normal sometimes too - the limitations of busybox mean the script is pretty bloody simple (eg: Is it up? Let's check 5 times with 30 second delays and log failures when it fails, 'cauz that's smart).

Joined: May 29, 2009

Messages: 2,114

This server is set up to send syslog to our Windows vCenter server. What is the easiest way to review the logs? I can't seem to view them in /var/log because the files don't actually exist there.

If I go to C:\ProgramData\VMware\VMware Syslog Collector\Data\10.20.10.50 on the Windows server and open the syslog file, there is tons of information, but I don't know what exactly I should be looking for.

Joined: Oct 11, 2001

Messages: 31,956

dig through it for anything in vmkernel - look for "error" or "warning"
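
A rough way to do that digging on the collector side is sketched below in Python (chosen because the log lives on the Windows server). It assumes each entry looks like the ones quoted in this thread, that is `<PRI>timestamp host tag: message`, and that the vmkernel entries use tags starting with "vmk" (vmkernel, vmkwarning). The function name interesting is made up for illustration.

```python
import re

# Line layout assumed from the entries quoted in this thread:
# <PRI>timestamp host tag: message
LINE = re.compile(r"^<\d+>(?P<ts>\S+) (?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$")
KEYWORDS = re.compile(r"error|warning", re.IGNORECASE)


def interesting(lines):
    """Yield (timestamp, tag, message) for vmkernel entries mentioning error or warning."""
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # not in the assumed layout, skip it
        if match.group("tag").lower().startswith("vmk") and KEYWORDS.search(match.group("msg")):
            yield match.group("ts"), match.group("tag"), match.group("msg")


sample = [
    "<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu26:13028837)Vol3: 2174: Could not open device 'naa.600605b0070ec9c01b0613ea1266322e:5' for probing: I/O error",
    "<166>2014-08-12T15:56:20.801Z vps24.corp.domain.net Rhttpproxy: [FFE1ED70 warning 'Proxy Req 00897'] Connection to localhost : 8307 failed with error N7Vmacore15SystemExceptionE(Connection refused).",
    '<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu39:32860)ScsiDeviceIO: 2337: Cmd(0x41300b6bba40) 0x1a, CmdSN 0x26977 from world 0 to dev "naa.600605b0070ec9c01b0613ea1266322e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

hits = list(interesting(sample))
for ts, tag, msg in hits:
    print(ts, tag, msg)
```

To run it against the real file, pass an open file handle instead of the sample list, for example `interesting(open(path, errors="replace"))` with path pointing at the syslog file in the collector's data folder. Note that the keyword filter is deliberately narrow: the third sample line says "failed" but neither "error" nor "warning", so it is not reported.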

Joined: May 29, 2009

Messages: 2,114

Here is some stuff that pertains to storage, but obviously storage is working otherwise the VMs wouldn't still be running.

<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu39:32860)ScsiDeviceIO: 2337: Cmd(0x41300b6bba40) 0x1a, CmdSN 0x26977 from world 0 to dev "naa.600605b0070ec9c01b0613ea1266322e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
<180>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkwarning: cpu26:13028837)WARNING: ScsiDeviceIO: 7793: READ CAPACITY on device "naa.600605b0070ec9c01b0613ea1266322e" from Plugin "NMP" failed. I/O error
<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu26:13028837)WARNING: ScsiDeviceIO: 7793: READ CAPACITY on device "naa.600605b0070ec9c01b0613ea1266322e" from Plugin "NMP" failed. I/O error
<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu26:13028837)Vol3: 2174: Could not open device 'naa.600605b0070ec9c01b0613ea1266322e:5' for probing: I/O error
<182>2014-08-12T15:54:49.908Z vps24.corp.domain.net vmkernel: cpu35:9109179)ScsiDeviceIO: 2324: Cmd(0x41300b6bba40) 0x9e, CmdSN 0x26978 from world 0 to dev "naa.600605b0070ec9c01b0613ea1266322e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
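
The H:/D:/P: fields in those ScsiDeviceIO lines are the host, device, and plugin status codes for the failed command, and the value after Cmd(...) is the SCSI opcode. The Python sketch below decodes them using small lookup tables. The tables are deliberately partial and written from memory of the standard SCSI status values and VMware's published host status list, so treat the names as assumptions to verify against VMware's documentation; the function name decode is made up for illustration.

```python
import re

# Partial lookup tables (assumptions to verify against VMware's KB and the
# SCSI specs). Codes not listed are reported as unknown rather than guessed.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
               0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
OPCODES = {0x1a: "MODE SENSE(6)", 0x25: "READ CAPACITY(10)",
           0x9e: "SERVICE ACTION IN(16)"}

PATTERN = re.compile(
    r"Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),.*?H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def name(table, code):
    """Look up a code, falling back to an explicit 'unknown' label."""
    return table.get(code, f"unknown ({code:#x})")


def decode(line):
    """Return the decoded opcode and status fields of a ScsiDeviceIO line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    opcode, host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "opcode": name(OPCODES, opcode),
        "host": name(HOST_STATUS, host),
        "device": name(DEVICE_STATUS, device),
        "plugin": f"{plugin:#x}",
    }


line = ('vmkernel: cpu39:32860)ScsiDeviceIO: 2337: Cmd(0x41300b6bba40) 0x1a, CmdSN 0x26977 '
        'from world 0 to dev "naa.600605b0070ec9c01b0613ea1266322e" failed '
        'H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.')
print(decode(line))
```

For the first entry above (opcode 0x1a, H:0x5 D:0x0) this reports a MODE SENSE(6) that ended with host status ABORT while the device status was GOOD, which points at the path or controller side rather than at the disk returning an error of its own.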

Joined: Oct 11, 2001

Messages: 31,956

You have a stuck reservation or something similar.

from the locked up host:

vmkfstools --lock lunreset /vmfs/devices/disks/naa.600605b0070ec9c01b0613ea1266322e

However, that's partition 5 on that NAA - local disk? Or are you all using extents?

Joined: May 29, 2009

Messages: 2,114

<166>2014-08-12T15:56:20.801Z vps24.corp.domain.net Rhttpproxy: [FFE1ED70 warning 'Proxy Req 00897'] Connection to localhost : 8307 failed with error N7Vmacore15SystemExceptionE(Connection refused).
<166>2014-08-12T15:56:31.007Z vps24.corp.domain.net Vpxa: [FFD7A1A0 verbose 'commonvpxXml'] [VpxXml] Error fetching /sdk/vimServiceVersions.xml: 503 (Service Unavailable)
<166>2014-08-12T15:56:31.015Z vps24.corp.domain.net Vpxa: [FFD7A1A0 verbose 'commonvpxXml'] [VpxXml] Error fetching /definitions/import/@namespace from /sdk/vimService?wsdl: 503 (Service Unavailable)

Joined: May 29, 2009

Messages: 2,114

You have a stuck reservation or something similar.
from the locked up host:

vmkfstools --lock lunreset /vmfs/devices/disks/naa.600605b0070ec9c01b0613ea1266322e

However, that's partition 5 on that NAA - local disk? Or are you all using extents?

Local disks in RAID 1. So it is safe to run that command while VMs are running?

Joined: May 29, 2009

Messages: 2,114

Eh, I ran it anyway but no luck.

# vmkfstools --lock lunreset /vmfs/devices/disks/naa.600605b0070ec9c01b0613ea1266322e
Command lunreset failed
Error: Unable to access device, please check your connection to the device.

Possibly a RAID card issue?

Joined: Oct 11, 2001

Messages: 31,956

Might be - it may have gone out to lunch in a creative way that is keeping it from responding, but isn't "dead". Hostd is probably trying to inventory the devices in the system, and can't, and that's locking the buffers that it also uses to report to VPX. Vmkernel can't clear it since it's responding, just not responding in a "sane" way.

When you do esxcfg-mpath -b | grep -i naa.600605b0070ec9c01b0613ea1266322e -A4
what do you get back?

Joined: May 29, 2009

Messages: 2,114

~ # esxcfg-mpath -b | grep -i naa.600605b0070ec9c01b0613ea1266322e -A4
naa.600605b0070ec9c01b0613ea1266322e : Local LSI Disk (naa.600605b0070ec9c01b0613ea1266322e)
vmhba0:C2:T0:L0 LUN:0 state:active Local HBA vmhba0 channel 2 target 0

Joined: Oct 11, 2001

Messages: 31,956

wellp, it's there, just won't take a reset. Reboot is sadly your only notable option at this point. HA is still running, so you could just kill it and let it bring them back up, but I can't guarantee it came back sane from the services.sh restart (it should have - only seen once it didn't).

Joined: May 29, 2009

Messages: 2,114

Yeah, I kind of figured. Just need to schedule it and let people know their VMs will be down temporarily. Hopefully the server will stay running until it can be scheduled.

Joined: Oct 11, 2001

Messages: 31,956

should be fine.

I've seen them in that state for a year before.

Joined: May 29, 2009

Messages: 2,114

Cool. Then I will hold off until Friday for maintenance.

Joined: May 29, 2009

Messages: 2,114

It looks like this issue is most likely the RAID controller.

A while back on this same server we had an issue with the vSphere Flash Read Cache. The latency on the SSD literally went to 40,000 or higher. Since we had other cache issues with Veeam, and this was a new technology for VMware, I just removed the Flash Cache from all servers. That SSD is connected to the same RAID controller that has two hard drives in RAID 1 with ESXi installed.

Even after this server was rebooted, I still had some issues and decided to delete the RAID volume, re-create it, and reinstall ESXi. Again, started to see some issues.

Had someone insert a USB thumb drive into the back of the server, installed the same exact version of ESXi 5.5 on it, and haven't seen any problems since. What is annoying is that nothing in the diagnostics, alerts, etc. shows that there is a problem with the RAID controller.


Source: https://hardforum.com/threads/services-failing-on-esxi-5-5-host-cant-restart-services.1829769/
