Power Failure / RAID Failure / HUD Failure / Now What?
TB 2.4.0 / Asterisk 1.4.16 / CentOS 5, No YUMs.
I have a system that seems to be in a bad power area. A new UPS and new batteries in the other UPS units helped some, but the power was down just too long and the system failed hard. What tool are people using to watch APC UPS units and shut down when they go low?
Now I have what appears to be a failed RAID set:
http://www.trixbox.org/forums/trixbox-forums/open-discussion/sata...
The IRCD, and therefore HUDLite, did not come back up; I had to restart them by hand. Has anyone written a cron job to watch these two tasks and restart them if one or the other fails?
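One way such a cron watchdog could look, as a minimal sketch only. The helper name restart_if_down, the process names, the init script paths, and the install location are all assumptions; check what the IRC daemon and the HUDLite server are actually called on your box (for example with ps) before using anything like this.

```shell
#!/bin/sh
# Hypothetical watchdog helper, meant to be called from cron.
# Usage: restart_if_down PROCESS_NAME RESTART_COMMAND [ARGS...]
restart_if_down() {
    name=$1
    shift
    # pgrep -x matches the exact process name; success means it is running
    if pgrep -x "$name" >/dev/null 2>&1; then
        return 0
    fi
    # Leave a trace in syslog, then run the restart command that was passed in
    logger -t watchdog "$name not running, restarting" 2>/dev/null || true
    "$@"
}

# Example calls (process names and init scripts are assumptions):
#   restart_if_down ircd /etc/init.d/ircd start
#   restart_if_down hudlite-server /etc/init.d/hudlite-server start
#
# Example crontab entry, checking every five minutes (path is an assumption):
#   */5 * * * * /usr/local/sbin/pbx-watchdog.sh
```

Since HUDLite depends on the IRCD, it probably makes sense to check the IRCD first so that it is back up before HUDLite is restarted.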
/var/log/messages is filling up with this error, and I have no idea what CentOS wants done. Has anyone seen it before and know what the corrective action should be? Is it related to the RAID failure?
TIA....
----------------------------------------------------------------------------------------------------
Jan 30 18:20:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 18:20:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 18:50:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 18:50:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 19:20:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 19:20:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 19:50:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 19:50:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
I could really use some input here ...
Hi, I can't really help you with your immediate problem, but no, it appears the problem is not related to the RAID itself but rather to the SMART monitoring: if you google for "Device: /dev/sda, not capable of SMART self-check" you will come across several people who report a lot of SMART messages like yours. Their cure has been to disable smartd altogether.
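Before disabling smartd outright, it may be worth measuring how noisy it really is and checking whether the drive answers smartctl at all. A rough sketch, assuming the log lines look like the excerpt above; smart_complaints is a made-up helper name, and the suggestion that "-d ata" might help with SATA disks on older smartmontools is an assumption about this system, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <count>" for every device that smartd
# reports as "not capable of SMART self-check" in the given log file.
smart_complaints() {
    grep 'smartd\[[0-9]*]: Device: .*not capable of SMART self-check' "$1" |
        sed 's/.*Device: \([^,]*\),.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use (run as root):
#   smart_complaints /var/log/messages
#
# Check whether the drive responds to SMART queries at all:
#   smartctl -i /dev/sda
#   smartctl -d ata -i /dev/sda    # assumption: may be needed for SATA disks
#
# If nothing helps, stop smartd and keep it from starting at boot (CentOS 5):
#   service smartd stop
#   chkconfig smartd off
```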
Besides IRCD and HUDLite, is there anything else going wrong? Can you access all of your disks? Perhaps posting a dmesg would be appropriate.
Regarding your other question, look up www.networkupstools.org or, since you are using APC, www.apcupsd.org.
Is it possible to setup Trixbox so it will shutdown itself when UPS power is lost?
Well,
I would think it should be, but I am going to research the two links referenced above and hopefully have something working soon. The RAID and smartd errors are more troublesome right now.
Other than my nervousness about the RAID, given that power at this facility/office is so unstable, the system has actually been really stable.
------------------------
Is it possible to setup Trixbox so it will shutdown itself when UPS power is lost?
I use apcupsd with APC UPSes on mine; it works fine.
Dave,
Which version of TB / CentOS are you running? Did you build apcupsd from scratch or use a prebuilt package?
Looks like this will be my best solution to the power detection issue; now to fight the RAID problem.
---------------------------
Hoping someone with more CentOS experience than I have can explain these messages from dmesg regarding md2 & sda3 on the first boot after the power failure....
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
hda: ATAPI 48X CD-ROM drive, 96kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
FDC 0 is a National Semiconductor PC87306
lp: driver loaded but no devices found
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sda3 ...
md: adding sda3 ...
md: md2 already running, cannot run sda3
md: export_rdev(sda3)
md: ... autorun DONE.
tia -------------------
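One possible reading of "md2 already running, cannot run sda3" is that md2 was assembled earlier in the boot without sda3, so the array may be running degraded. That is an assumption, not a diagnosis; /proc/mdstat and mdadm show the real state. A small sketch that lists arrays with a missing member; degraded_arrays is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member status
# (for example [_U] or [U_]) shows a missing device.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }      # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }     # "_" marks a missing member
    ' "${1:-/proc/mdstat}"
}

# Typical use (run as root):
#   cat /proc/mdstat
#   degraded_arrays
#   mdadm --detail /dev/md2
#
# If md2 really is missing sda3 and the disk itself checks out, the member
# can usually be put back with the command below. Verify first; this
# triggers a resync of the array.
#   mdadm /dev/md2 --add /dev/sda3
```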
To install apcupsd from source:
-------------------------------------------------------------------------------
yum install gcc-c++
cd /usr/src
wget http://downloads.sourceforge.net/apcupsd/apcupsd-3.14.3.tar.gz
tar xvzf apcupsd-3.14.3.tar.gz
cd apcupsd-3.14.3
CFLAGS="-g -O2" LDFLAGS="-g" ./configure --enable-usb --with-upstype=usb --with-upscable=usb --prefix=/usr --sbindir=/sbin --with-cgi-bin=/var/www/cgi-bin --enable-cgi --with-css-dir=/var/www/docs/css --with-log-dir=/etc/apcupsd --enable-pthreads --enable-powerflute
make
make install
-----------------------------------------------------------------------------
To check the location of apcupsd:
whereis apcupsd
If compiled correctly, one of the locations will be /sbin/apcupsd
Then edit UPS device type and connection type:
nano /etc/apcupsd/apcupsd.conf
I use an older model APC Smart UPS 620 with a serial interface, so I use
UPSTYPE apcsmart
UPSCABLE smart
To start apcupsd monitoring, type:
/etc/init.d/apcupsd start
To check status (including battery runtime etc):
/etc/init.d/apcupsd status
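For scripting, the same information is available from the apcaccess tool that ships with apcupsd. The sketch below pulls a single field out of its "NAME : value" output; ups_field is a made-up helper name. When apcupsd actually shuts the box down is governed by the BATTERYLEVEL, MINUTES and TIMEOUT directives in apcupsd.conf, so this is only for monitoring.

```shell
#!/bin/sh
# Hypothetical helper: read `apcaccess status` output on stdin and print the
# value of one field, e.g. STATUS, BCHARGE or TIMELEFT.
ups_field() {
    awk -v key="$1" '{
        i = index($0, ":")                 # split at the first colon only
        if (i == 0) next
        k = substr($0, 1, i - 1)
        v = substr($0, i + 1)
        gsub(/^ +| +$/, "", k)             # trim padding around name and value
        gsub(/^ +| +$/, "", v)
        if (k == key) { print v; exit }
    }'
}

# Typical use:
#   apcaccess status | ups_field STATUS
#   apcaccess status | ups_field TIMELEFT
```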
A Dell SC-440 can run on a cheap SmartUPS 620 (you can get one for $70) for ~50-60 mins.
Thank you,
Vadim
My APC ups at home is connected to one primary server.
When a power failure is detected, the server runs a script that sends shutdown commands to every computer in the house. Including trixbox.
Then, after all the workstations have shutdown, the domain controllers themselves shutdown.
Finally, the UPS powers off.
I do not run any software on the trixbox or client computers for this. I just send each one a "poweroff" command over ssh.
Not only that, but when power is restored, the primary domain controller powers on by itself.
Once it comes up, stabilizes, and DNS and the internet connection are up, it sends wake-on-lan commands to all the other servers (like trixbox) to start booting up.
Basildane,
That is pretty slick what you have set up there. Would you mind sharing more info on all that? What the script looks like, what software you're using to shut down the remote PCs?
The shutdown is easy.
The UPS runs this script to shutdown the client machines.
@START "" "shutdown.exe" /m \\Worf /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\Wangchung /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\itchy /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\calculon /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\hope /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\lorax /s /f /d p:6:12 /c "Power Failure"
@START "" "plink.exe" -ssh -pw ***** root@scratchy /sbin/shutdown -hP now Power Failure
All the windows servers use "shutdown.exe", but the trixbox uses "plink.exe". Then the UPS shuts itself off...
For startup, the main server is set to auto-power-on when the UPS starts back up. (After the batteries charge at least 20%).
Once the server is up (windows server 2003), it has a scheduled job that runs like this:
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Itchy
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Scratchy
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Wangchung
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Lorax
The wakeonlan program I wrote myself. It is freely available for anyone to use. You can download it here.
http://www.aquilatech.com/wakeonlan
You can use any program, but I think mine is the best. It does much more than simple wake ups.
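For anyone curious what a wake-up tool sends: a Wake-on-LAN "magic packet" is six 0xFF bytes followed by the target's MAC address repeated sixteen times. The sketch below only builds that payload as a hex string; magic_packet_hex is a made-up helper name, and actually sending the packet is left to whatever tool you prefer.

```shell
#!/bin/sh
# Hypothetical helper: print the Wake-on-LAN magic packet for a MAC address
# as a lowercase hex string (102 bytes, so 204 hex digits).
magic_packet_hex() {
    # Strip ":" and "-" separators and normalise to lowercase
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    # A valid MAC is exactly 12 hex digits
    case $mac in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#mac}" -eq 12 ] || return 1
    packet=ffffffffffff                # six 0xFF bytes
    i=0
    while [ "$i" -lt 16 ]; do          # followed by the MAC, 16 times
        packet=$packet$mac
        i=$((i + 1))
    done
    printf '%s\n' "$packet"
}

# Typical use (the MAC address here is only an example):
#   magic_packet_hex 00:11:22:AA:BB:CC
```

On a Linux box a ready-made sender such as ether-wake, if it is installed, does the same job from the command line.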
This stuff is not optional. If you aren't doing this, it's just a matter of time before you lose everything.
Slightly off topic, but WangChung is a backup server.
Its job is to run incremental backups of the entire domain every day, and full backups once a month.
The backups are grandfathered, so there are always 3 full backups on the SAN at any given time.
Wow, thanks for all the info and the link. I will have to check into plink.exe.
Basildane,
What UPS software are you running on the Windows domain controller, and what event or software feature is used to execute the shutdown script?
Apcupsd also supports the PowerChute Network Shutdown protocol. It can be used to shut down network servers running apcupsd (including Windows computers).
apple01 any reason you posted to a 2 month old thread?
Why? Is this not considered good practice?
I wanted to find out how the previous poster is running the shutdown scripts.
I am running APC powerchute on the domain controller.
That software runs a script on shutdown. The script (which I posted above) calls shutdown.exe to shutdown all the workstations under it. (plink for linux machines).
I don't run anything on the client machines or servers.
They would love to sell you "network shutdown" clients that run on every machine, or an expensive serial port interface to connect servers together. That is inefficient and unnecessary.
Yes, your setup is cost effective, but I don't want to load APC software on a mission-critical server or put the root account password into a script.
Good point Apple. This is for my home. I wouldn't use this architecture at work.
If you didn't want to use the root password, you could set up a pre-shared SSH key instead.
You can also remote exec without invoking a shell.
Engineer Tim in "Securing trixbox CE" and other security experts recommend disabling ssh root access.
I've found a good tutorial on how to enable passwordless shutdown for a remote non-privileged user:
http://www.voipphreak.ca/2007/10/22/shutdown-linux-from-windows-r...
However, some additional steps are required:
1) using visudo, comment out the line "Defaults requiretty"; otherwise sudo will return the error “sudo: sorry, you must have a tty to run sudo”
2) set the following permissions as per http://wiki.centos.org/HowTos/Network/SecuringSSH:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
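Put together, a key-based shutdown from a Linux box could look roughly like the sketch below. The account name upsuser, the key path, and the helper name remote_shutdown are all hypothetical; the sudoers lines in the comments are one way to allow only the shutdown command, using a per-user requiretty exemption instead of commenting the setting out globally.

```shell
#!/bin/sh
# Assumed key location; override by setting KEY before calling the function.
KEY=${KEY:-$HOME/.ssh/ups_shutdown_key}

# On each Linux target, /etc/sudoers (edited with visudo) would need
# something like this for the hypothetical account "upsuser":
#   Defaults:upsuser !requiretty
#   upsuser ALL=(root) NOPASSWD: /sbin/shutdown

# Hypothetical helper: shut down one remote host over ssh.
# Set DRY_RUN=1 to print the command instead of running it.
remote_shutdown() {
    host=$1
    # BatchMode makes ssh fail rather than prompt for a password
    ${DRY_RUN:+echo} ssh -i "$KEY" -o BatchMode=yes -o ConnectTimeout=10 \
        "upsuser@$host" sudo /sbin/shutdown -h now
}

# Typical use (host name taken from the script earlier in this thread):
#   DRY_RUN=1 remote_shutdown scratchy
#   remote_shutdown scratchy
```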


