Power Failure / RAID Failure / HUD Failure / Now What --

phonebuff
Posts: 445
Member Since:
2007-02-15

TB 2.4.0 / Asterisk 1.4.16 / CentOS 5, No YUMs.

Have a system that seems to be in a bad power area. A new UPS and new batteries in the other UPS units helped some, but power was down just too long and the system failed hard. What tool are people using to watch APC UPS units and shut down when they go low?

Now I have what appears to be a failed RAID set.
http://www.trixbox.org/forums/trixbox-forums/open-discussion/sata...

The IRCD, and therefore HUDLite, did not come back up; I had to restart them by hand. Has anyone written a cron job to watch these two tasks and restart them if one or the other fails?
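
One way to do this from cron, as a rough sketch and not a tested trixbox recipe. The pid file paths and init script names in the example calls are assumptions and need to be checked against the actual install:

```shell
#!/bin/sh
# is_alive PIDFILE: succeed if the pid file is readable and the process it
# names is running. kill -0 only tests for existence; run this as root so
# processes owned by other users can be checked too.
is_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# ensure_running PIDFILE COMMAND...: run COMMAND only when the process is down.
ensure_running() {
    pidfile=$1
    shift
    is_alive "$pidfile" || "$@"
}

# Example calls (both paths and both script names are assumptions):
#   ensure_running /var/run/ircd.pid     /etc/init.d/ircd restart
#   ensure_running /var/run/hudlite.pid  /etc/init.d/hudlite-server restart
```

Saved as, say, /usr/local/sbin/hud-watchdog.sh (a hypothetical path) with the example calls uncommented and corrected, a root crontab entry such as `*/5 * * * * /usr/local/sbin/hud-watchdog.sh` would run the check every five minutes.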

/var/log/messages is filling up with this error, and I have no idea what CentOS wants done. Has anyone seen it before and know what the corrective action should be? Is it related to the RAID failure?

TIA....
----------------------------------------------------------------------------------------------------
Jan 30 18:20:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 18:20:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 18:50:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 18:50:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 19:20:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 19:20:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data
Jan 30 19:50:47 trixbox1 smartd[2730]: Device: /dev/sda, not capable of SMART self-check
Jan 30 19:50:47 trixbox1 smartd[2730]: Device: /dev/sda, failed to read SMART Attribute Data



phonebuff
Posts: 445
Member Since:
2007-02-15
Bump --

I could really use some input here ...



biquad
Posts: 37
Member Since:
2006-11-11

Hi, can't really help you with your immediate problem, but no, it appears the problem is not related to the RAID itself but rather to the SMART monitoring: if you google for "Device: /dev/sda, not capable of SMART self-check" you will come across several people who report a lot of SMART messages like yours. Their cure has been to disable smartd altogether.
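
On CentOS 5 that would presumably look something like the sketch below. It only prints the commands unless RUN=1 is set; the RUN switch and the function name are made up for illustration:

```shell
#!/bin/sh
# Before switching monitoring off, "smartctl -i /dev/sda" shows whether the
# drive reports SMART support at all.

# disable_smartd: stop smartd now and keep it from starting at boot.
# Prints the commands by default; set RUN=1 (as root) to execute them.
disable_smartd() {
    for cmd in "service smartd stop" "chkconfig smartd off"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

disable_smartd
```

Keep in mind that this only silences the log messages; it does not say anything about the health of the disk.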

Besides IRCD and HUDLite, is there anything else going wrong? Can you access all of your disks? Perhaps posting a dmesg would be appropriate.

Regarding your other question, look up www.networkupstools.org or, since you are using APC, www.apcupsd.org.



apple01
Posts: 178
Member Since:
2007-05-17
UPS - power loss shutdown

Is it possible to set up Trixbox so it will shut itself down when UPS power is lost?



phonebuff
Posts: 445
Member Since:
2007-02-15
It should be --

Well,

I would think it should be, but I am going to research the two links referenced above and hopefully have something working soon. The RAID and smartd errors are more troublesome right now.

Other than my nervousness about the RAID, because power at this facility/office is so unstable, the system has actually been really stable.

------------------------



dave99
Posts: 30
Member Since:
2006-08-22
Quote:
Is it possible to set up Trixbox so it will shut itself down when UPS power is lost?

I use apcupsd with APC UPSes on mine; works fine.



phonebuff
Posts: 445
Member Since:
2007-02-15
Did you build from code.

Dave,

Which version of TB / CentOS are you running? Did you build apcupsd from source or use a prebuilt package?

Looks like this will be my best solution for the power detection issue. Now to fight the RAID problem.

---------------------------



phonebuff
Posts: 445
Member Since:
2007-02-15
More HD Issues ??

Hoping someone with more CentOS experience than I have can explain these messages from dmesg regarding md2 and sda3, taken from the first boot after the power failure:

sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
hda: ATAPI 48X CD-ROM drive, 96kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
FDC 0 is a National Semiconductor PC87306
lp: driver loaded but no devices found
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sda3 ...
md: adding sda3 ...
md: md2 already running, cannot run sda3
md: export_rdev(sda3)
md: ... autorun DONE.

tia -------------------
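
For anyone reading along, the usual first checks here are `cat /proc/mdstat` and `mdadm --detail /dev/md2`. A small sketch that lists arrays running with a missing member, assuming the usual /proc/mdstat layout in which each array's status line ends in a member map such as [UU] or [_U] (the function name is made up):

```shell
#!/bin/sh
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name
# of every md array whose member map shows a missing disk (an underscore).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/      { name = $1 }
        /\[[U_]*_[U_]*\]/  { if (name != "") print name }
    '
}

# Typical use on the box itself:
#   degraded_arrays < /proc/mdstat
```

If md2 does turn out to be running without sda3, `mdadm /dev/md2 --add /dev/sda3` is the usual way to put the partition back, but it is worth confirming the disk itself is healthy first.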



apple01
Posts: 178
Member Since:
2007-05-17
apcupsd install on Trixbox 2.4.2

To install apcupsd from source:
-------------------------------------------------------------------------------
yum install gcc-c++
cd /usr/src
wget http://downloads.sourceforge.net/apcupsd/apcupsd-3.14.3.tar.gz
tar xvzf apcupsd-3.14.3.tar.gz
cd apcupsd-3.14.3
CFLAGS="-g -O2" LDFLAGS="-g" ./configure --enable-usb --with-upstype=usb --with-upscable=usb --prefix=/usr --sbindir=/sbin --with-cgi-bin=/var/www/cgi-bin --enable-cgi --with-css-dir=/var/www/docs/css --with-log-dir=/etc/apcupsd --enable-pthreads --enable-powerflute
make
make install
-----------------------------------------------------------------------------
To check the location of apcupsd:
whereis apcupsd

If compiled correctly one of the locations will be /sbin/apcupsd

Then edit the UPS type and cable type:
nano /etc/apcupsd/apcupsd.conf

I use an older-model APC Smart-UPS 620 with a serial interface, so I use
UPSTYPE apcsmart
UPSCABLE smart

To start apcupsd monitoring, type:

/etc/init.d/apcupsd start

To check status (including battery runtime etc):
/etc/init.d/apcupsd status
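
For scripting, `apcaccess status` (installed alongside apcupsd) prints NAME : value lines. A small sketch that pulls a single field out of that output; the function name is made up here:

```shell
#!/bin/sh
# ups_field NAME: read "apcaccess status" output on stdin and print the value
# of one field (for example STATUS, BCHARGE or TIMELEFT) without padding.
ups_field() {
    awk -F: -v key="$1" '
        { k = $1; gsub(/[ \t]+$/, "", k) }
        k == key {
            # Take everything after the first colon so values that contain
            # colons themselves (such as dates) stay intact.
            v = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            print v
            exit
        }
    '
}

# Typical use:
#   apcaccess status | ups_field TIMELEFT
```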

A Dell SC-440 can run on a cheap Smart-UPS 620 (you can get one for $70) for ~50-60 minutes.

Thank you,
Vadim



jfinstrom
Posts: 1959
Member Since:
2007-03-07

Added apple01's post to the wiki: http://trixbox.org/wiki/ups-control-apcupsd

--



Basildane
Posts: 210
Member Since:
2007-06-30

My APC UPS at home is connected to one primary server.
When a power failure is detected, the server runs a script that sends shutdown commands to every computer in the house, including trixbox.
Then, after all the workstations have shut down, the domain controllers themselves shut down.
Finally, the UPS powers off.

I do not run any software on the trixbox or client computers for this. I just send the ssh command "poweroff".

Not only that, but when power is restored, the primary domain controller powers on by itself.
Once it comes up and stabilizes, and DNS and the internet connection are up, it sends wake-on-LAN commands to all the other servers (like trixbox) to start booting up.



domiflichi
Posts: 160
Member Since:
2007-04-06

Basildane,

That is pretty slick what you have set up there. Would you mind sharing more info on all that? What the script looks like, what software you're using to shut down the remote PCs?



Basildane
Posts: 210
Member Since:
2007-06-30

The shutdown is easy.
The UPS software runs this script to shut down the client machines.

@START "" "shutdown.exe" /m \\Worf      /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\Wangchung /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\itchy     /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\calculon  /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\hope      /s /f /d p:6:12 /c "Power Failure"
@START "" "shutdown.exe" /m \\lorax     /s /f /d p:6:12 /c "Power Failure"
@START "" "plink.exe" -ssh -pw ***** root@scratchy /sbin/shutdown -hP now Power Failure

All the Windows servers use "shutdown.exe", but the trixbox uses "plink.exe". Then the UPS shuts itself off...

For startup, the main server is set to auto-power-on when the UPS starts back up (after the batteries have charged to at least 20%).
Once the server is up (Windows Server 2003), it has a scheduled job that runs like this:

"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Itchy
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Scratchy
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Wangchung
"C:\Program Files\Aquila Technology\WakeOnLan\WakeOnLanC.exe" -w -m Lorax

The wakeonlan program I wrote myself. It is freely available for anyone to use. You can download it here.
http://www.aquilatech.com/wakeonlan

You can use any program, but I think mine is the best. It does much more than simple wake ups.

This stuff is not optional. If you aren't doing this, it's just a matter of time before you lose everything.

Slightly off topic, but WangChung is a backup server.
Its job is to run incremental backups of the entire domain every day, and full backups once a month.
The backups are grandfathered, so there are always 3 full backups on the SAN at any given time.



domiflichi
Posts: 160
Member Since:
2007-04-06

Wow, thanks for all the info and the link. I will have to check into plink.exe.



apple01
Posts: 178
Member Since:
2007-05-17

Basildane,

What UPS software are you running on the Windows domain controller, and what event or software feature is used to execute the shutdown script?

Apcupsd also supports the PowerChute Network Shutdown protocol. It can be used to shut down network servers running apcupsd (including Windows computers).



jfinstrom
Posts: 1959
Member Since:
2007-03-07

apple01, any reason you posted to a 2-month-old thread?

--



apple01
Posts: 178
Member Since:
2007-05-17

Why? Is this considered bad practice?
I wanted to find out how the previous poster runs the shutdown scripts.



Basildane
Posts: 210
Member Since:
2007-06-30

I am running APC PowerChute on the domain controller.
That software runs a script on shutdown. The script (which I posted above) calls shutdown.exe to shut down all the workstations under it (plink for Linux machines).

I don't run anything on the client machines or servers.

They would love to sell you "network shutdown" clients that run on every machine, or an expensive serial port interface to connect servers together. That is inefficient and unnecessary.



apple01
Posts: 178
Member Since:
2007-05-17

Yes, your setup is cost-effective, but I don't want to load APC software on a mission-critical server or put the root account password into the script.



Basildane
Posts: 210
Member Since:
2007-06-30

Good point Apple. This is for my home. I wouldn't use this architecture at work.



SkykingOH
Posts: 8082
Member Since:
2007-12-17

If you didn't want to use the root password, you could pre-share an SSH key (public-key authentication) instead.

You can also remote exec without invoking a shell.

--

Scott

aka "Skyking"



apple01
Posts: 178
Member Since:
2007-05-17

Engineer Tim in "Securing trixbox CE" and other security experts recommend disabling ssh root access.

I've found a good tutorial on how to enable passwordless shutdown for a remote non-privileged user:
http://www.voipphreak.ca/2007/10/22/shutdown-linux-from-windows-r...

However, some additional steps are required:

1) Using visudo, comment out the line "Defaults requiretty"; otherwise sudo will return the error “sudo: sorry, you must have a tty to run sudo”

2) set the following permissions as per http://wiki.centos.org/HowTos/Network/SecuringSSH:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
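
To tie the steps together, here is a rough sketch of the kind of setup this ends up with. The user name upsdown and the key text are placeholders, not something taken from the tutorial, and the forced-command restriction in authorized_keys is an optional extra, not one of the required steps:

```shell
#!/bin/sh
# sudoers entry (add with visudo) letting the placeholder user "upsdown" run
# only shutdown, without a password:
#   upsdown ALL=(root) NOPASSWD: /sbin/shutdown

# restricted_key_line PUBKEY: print an authorized_keys line that forces the
# shutdown command and disables pty and forwarding for that key.
restricted_key_line() {
    printf 'command="sudo /sbin/shutdown -h now",no-pty,no-port-forwarding,no-X11-forwarding %s\n' "$1"
}

# Example: append the line for a key generated on the UPS server.
#   restricted_key_line "$(cat id_rsa.pub)" >> ~upsdown/.ssh/authorized_keys
```

With a forced command in place, whatever command the client sends over that key is replaced by the shutdown, so a leaked key cannot be used for a general login.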


