SATARAID - How to check if it's working

tommk
Posts: 30
Member Since:
2007-05-25

Hi,

I have just installed Trixbox 2.2 using the sataraid install option. 2 x 80GB SATA drives.

It seems to have worked. If I unplug one HDD, it continues working off the other drive.

My question is, how do you check the state of both drives? Is there a command in CentOS? How would I know if a drive failed? How do I know if they have synched?

My problem is I am not 100% sure of how the sataraid option works.

Any help much appreciated.

Thanks



kerryg
Posts: 6793
Member Since:
2006-05-31

Within about a week or two we will have a Raid Status Module that will give you all the info you would want.

--

Kerry Garrison
http://www.VoipStore.com - http://3cxbook.com
(888) VOIPSTORE - (888) 864-7786



tommk
Posts: 30
Member Since:
2007-05-25

Well that's just super!

Thanks Kerry.

How will I know when it has been released?



jahyde
Posts: 2002
Member Since:
2006-06-02

You can do that 2 ways right now. Install Webmin from the package manager and go to Hardware>Linux Raid; it will tell you there.

Or you can run
watch cat /proc/mdstat

at the command line. That will even show you live rebuild status; once it's fully mirrored it should say 2/2 on all the partitions. Ctrl+C to exit, of course.
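For anyone who would rather not read the output by eye, here is a minimal sketch of the same check as a shell function. It assumes the [n/m] counter format shown in this thread (configured members, then active members); the function name degraded_arrays is made up for illustration and is not part of Trixbox or mdadm.

```shell
# Print the name of every md array whose [n/m] counter shows fewer
# active members (m) than configured members (n).
# Reads /proc/mdstat-formatted text on standard input.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {      # status counter, e.g. [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}
```

On a live system it would be run as: degraded_arrays < /proc/mdstat. No output means every array reports all of its members.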

--

--my PBX is run on 2 V8's



tommk
Posts: 30
Member Since:
2007-05-25
Thanks but still confused

Jahyde,

Thanks, I tried both those approaches and they both seem to agree:

I Get:

[root@asterisk1 ~]# watch cat /proc/mdstat
Every 2.0s: cat /proc/mdstat Mon Jul 23 10:42:55 2007

Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
1052160 blocks [2/2] [UU]

md2 : active raid1 sdb3[1]
79256576 blocks [2/1] [_U]

md0 : active raid1 sda1[0]
104320 blocks [2/1] [U_]

unused devices: <none>

I left it all weekend and still much the same.

What are md0/1/2, sda1/2 and sdb2/3?

I would have assumed sda1/2 were my SATA drives. Bear in mind I only have 2 drives.

Using Webmin I had a look too.

I'm going to guess that Linux breaks the mirroring up into 3 partitions in my case:
/
/swap
/root

The /swap partition has synched.
I'm assuming the other partitions aren't synched because I pulled a HDD out to test RAID.

It's back in now, so how do I force a synch again?

Thanks.



tommk
Posts: 30
Member Since:
2007-05-25
Raid Status Module

Kerry,

Any update on the RAID Status Module?

Anyone,

Comments on my previous post re. getting RAID to synchronise?

Many thanks.



solutions4
Posts: 32
Member Since:
2006-09-15

I think I have the exact same issue.

The 'sataraid' setup gave me three RAID devices (md0, md1, md2), each mirroring a partition from both disks (sda, sdb), and when one disk was unplugged and then plugged back in later, this is what I see.

From Webmin-> Hardware->Linux Raid-> md0

RAID device options
Device file /dev/md0
RAID level Mirrored (RAID1)
Filesystem status Mounted on /boot
Usable size 104320 blocks (101.88 MB)
Persistent superblock? Yes
Chunk size Default
RAID errors 1 disks have failed
RAID status clean, degraded
Partitions in RAID SCSI device B partition 1

Similar screens as above for md1 and md2 for / and /swap.

Webmin->Hardware->Partition manager

Disk Partitions
Location SCSI device A
Cylinders 19457
Size 149.05 GB
Model ATA ST3160815AS
Controller 0
Target 0

No. Type Extent Start End Use Free
1 Linux RAID 1 13
2 Linux RAID 14 144
3 Linux RAID 145 19457
Add primary partition. | Add extended partition.

Location SCSI device B
Cylinders 19457
Size 149.05 GB
Model ATA ST3160815AS
Controller 1
Target 0

No. Type Extent Start End Use Free
1 Linux RAID 1 13 /dev/md0
2 Linux RAID 14 144 /dev/md1
3 Linux RAID 145 19457 /dev/md2
Add primary partition. | Add extended partition.

Other useful output...

[root@voip4smb ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1]
1052160 blocks [2/1] [_U]

md2 : active raid1 sdb3[1]
155131584 blocks [2/1] [_U]

md0 : active raid1 sdb1[1]
104320 blocks [2/1] [_U]

unused devices: <none>
[root@voip4smb ~]#

So my quandary is: what are the mdadm command(s) to rebuild the array?

mdadm /dev/md0 -a /dev/sda1 --verbose
mdadm /dev/md1 -a /dev/sda2 --verbose
mdadm /dev/md2 -a /dev/sda3 --verbose

Thanks for any help!



euser4life
Posts: 180
Member Since:
2006-07-16
Try this to add drives back

For each partition do the following as user root:

mdadm --manage /dev/md1 --add /dev/sda2

mdadm --manage /dev/md2 --add /dev/sda3

mdadm --manage /dev/md0 --add /dev/sda1

I believe you can go ahead and run each one of those commands one after the other, but you will only be able to rebuild one at a time.

You can of course check the progress by typing:
cat /proc/mdstat
I believe you can also issue
watch cat /proc/mdstat
for a realtime view of the rebuild.

To increase the RAID rebuild speed you can try issuing:

echo 50000 >/proc/sys/dev/raid/speed_limit_min
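As a rough sketch, the three re-add commands above can be generated for whichever disk dropped out. This assumes md0, md1 and md2 pair with partitions 1, 2 and 3 of that disk, as in the commands above; the function name readd_commands is hypothetical, and it only prints the commands so they can be checked before anything is run.

```shell
# Print (not run) the mdadm commands that would re-add the partitions of
# one disk, assuming the layout used in this thread:
# md0 <- partition 1, md1 <- partition 2, md2 <- partition 3.
readd_commands() {
    disk=$1                       # e.g. /dev/sda (the disk that dropped out)
    for n in 0 1 2; do
        echo "mdadm --manage /dev/md$n --add $disk$((n + 1))"
    done
}
```

For example, readd_commands /dev/sda prints the three lines for sda; once they look right they can be pasted into a root shell one at a time.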

Let us know how you make out.

Best regards,

Will



solutions4
Posts: 32
Member Since:
2006-09-15
Stellar advice

Will,

Where do I send the beer?

Worked like a charm. Let me run them one after another.

Thanks again.

D



euser4life
Posts: 180
Member Since:
2006-07-16
Good deal

Good to hear it worked out for you. I'm just glad to be able to help for once.

Cheers,

Will



TheShniz
Posts: 213
Member Since:
2006-06-01

This thread is very helpful, but I'm needing a lil direction w/ some easy Q's...

How are sda1, sda2, and sda3 created... do I manually create them?

All RAID configurations that I'm familiar with are on a disk by disk level, whereas any and all partitions get copied on that particular member... not on a partition level.

Assuming I'm to create them (correct me if not)...

Looking at the choices in the Partition Manager in Webmin, I think I can safely assume to use the same Start/End, but...

Do I create them all as 'Linux Raid', Primary Partitions?
Do I format the partitions with a filesystem?

Just looking for some direction on rebuilding array with clean/new drive,
- J



jahyde
Posts: 2002
Member Since:
2006-06-02

You are pretty much right on-

Here is how your partition structure is laid out:

100mb Boot Partition: /dev/md0 = /dev/sda1 + /dev/sdb1
1GB Swap Partition: /dev/md1 = /dev/sda2 + /dev/sdb2
Root Partition (aka- rest of the drive): /dev/md2 = /dev/sda3 + /dev/sdb3

If you have a blank drive that you need to rebuild from you can easily create the partitions in Webmin:
Hardware>Partitions on Local Disks (as you seem to have found)

And yes - the type needs to be Linux RAID, and they all need to be Primary Partitions (not extended)

Next just go back into Hardware>Linux RAID and add the partitions back into the md groups. Just click on each md device and you will have the option to add partitions into the array. Just make sure to match the appropriate sdxx partition with the right mdx group as shown in the structure above. Start with md0 and work your way up, the first 2 will rebuild within seconds, and root will take anywhere from 20 minutes to 3-4 hours.

You can then watch the rebuild status from the console if you like:
watch cat /proc/mdstat

Since it is software RAID, a little more work is involved, but it's still fairly easy.
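If you only want the percentage rather than the whole screen, a small sketch like the one below can pull it out of the mdstat text. It assumes the kernel prints a progress line containing "recovery = NN.N%" (or "resync = NN.N%") under the array while it rebuilds; the function name rebuild_progress is made up for illustration.

```shell
# Print "<array> <percent>" for every md array that is currently
# rebuilding, based on the "recovery = NN.N%" / "resync = NN.N%" line.
# Reads /proc/mdstat-formatted text on standard input.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)          # find the field ending in %
                if ($i ~ /%$/) print name, $i
        }
    '
}
```

Run it as: rebuild_progress < /proc/mdstat. It prints nothing when no array is rebuilding.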

--

--my PBX is run on 2 V8's



phonebuff
Posts: 445
Member Since:
2007-02-15
TB 2.4 CentOS5.

Saw some weird boot messages last night and now I see
--------------------------------------------------
Every 2.0s: cat /proc/mdstat Thu Jan 24 09:11:49 2008

Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
1052160 blocks [2/2] [UU]

md2 : active raid1 sdb3[1]
242983040 blocks [2/1] [_U]

unused devices: <none>
------------------------------------------------

sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current [descriptor]: sense key: Medium Error
Add. Sense: Unrecovered read error - auto reallocate failed

Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
1d 1a 8d 94
end_request: I/O error, dev sda, sector 488279444
ata1: EH complete
md: disabled device sda3, could not read superblock.
md: sda3 has invalid sb, not importing!
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
md: autorun ...
md: considering sdb3 ...
md: adding sdb3 ...
md: sdb2 has different UUID to sdb3
md: sdb1 has different UUID to sdb3
md: sda2 has different UUID to sdb3
md: sda1 has different UUID to sdb3
md: created md2
md: bind
md: running:
md: md2: raid array is not clean -- starting background reconstruction
raid1: raid set md2 active with 1 out of 2 mirrors
md: considering sdb2 ...
md: adding sdb2 ...
md: sdb1 has different UUID to sdb2
md: adding sda2 ...
md: sda1 has different UUID to sdb2
md: created md1
md: bind
md: bind
md: running:
raid1: raid set md1 active with 2 out of 2 mirrors
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: created md0
md: bind
md: bind
md: running:
raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:06:92:8d:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 3072 in
res 51/40:00:94:8d:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:06:92:8d:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 3072 in
res 51/40:00:94:8d:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:08:00:94:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 in
res 51/40:00:03:94:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:08:00:94:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 in
res 51/40:00:03:94:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
---------------------
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:06:92:8d:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 3072 in
res 51/40:00:96:8d:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 25/00:06:92:8d:1a/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 3072 in
res 51/40:00:96:8d:1a/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/133
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current [descriptor]: sense key: Medium Error
Add. Sense: Unrecovered read error - auto reallocate failed

Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
1d 1a 8d 96
end_request: I/O error, dev sda, sector 488279446
Buffer I/O error on device sda3, logical block 242983043
ata1: EH complete
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 488281250 512-byte hdwr sectors (250000 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: md2: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 51348279
ext3_orphan_cleanup: deleting unreferenced inode 4653065
ext3_orphan_cleanup: deleting unreferenced inode 4653100
ext3_orphan_cleanup: deleting unreferenced inode 4653099
ext3_orphan_cleanup: deleting unreferenced inode 4653098
ext3_orphan_cleanup: deleting unreferenced inode 4653096
ext3_orphan_cleanup: deleting unreferenced inode 4653094
ext3_orphan_cleanup: deleting unreferenced inode 4653093
ext3_orphan_cleanup: deleting unreferenced inode 4653092
ext3_orphan_cleanup: deleting unreferenced inode 4653087
ext3_orphan_cleanup: deleting unreferenced inode 4653085
ext3_orphan_cleanup: deleting unreferenced inode 4653083
ext3_orphan_cleanup: deleting unreferenced inode 4653081
ext3_orphan_cleanup: deleting unreferenced inode 4653078
ext3_orphan_cleanup: deleting unreferenced inode 4653077
ext3_orphan_cleanup: deleting unreferenced inode 4653064
ext3_orphan_cleanup: deleting unreferenced inode 4653063
ext3_orphan_cleanup: deleting unreferenced inode 4653062
ext3_orphan_cleanup: deleting unreferenced inode 4653061
ext3_orphan_cleanup: deleting unreferenced inode 4653060
EXT3-fs: md2: 20 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
audit(1201114669.903:2): selinux=0 auid=4294967295
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0

So do I now need to do this to get md2 back in sync?
mdadm --manage /dev/md2 --add /dev/sda3



phonebuff
Posts: 445
Member Since:
2007-02-15
Recover Raid --- Bump --

Can anyone help me with this issue..

TIA...



phonebuff
Posts: 445
Member Since:
2007-02-15
Can anyone offer help --

Can anyone advise the correct way to get this raid back in sync ....

Is it as easy as
mdadm --manage /dev/md2 --add /dev/sdb3
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md0 --add /dev/sda1

This is what I get when I check it ....

watch cat /proc/mdstat
Every 2.0s: cat /proc/mdstat Wed Feb 20 23:37:33 2008

Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
1052160 blocks [2/2] [UU]

md2 : active raid1 sdb3[1]
242983040 blocks [2/1] [_U]

unused devices: <none>



euser4life
Posts: 180
Member Since:
2006-07-16
What about webmin?

So did issuing:

mdadm --manage /dev/md2 --add /dev/sda3

Not bring md2 back to healthy status?

If not, have you tried using Webmin to add it back... (never used this before)



eeknz
Posts: 173
Member Since:
2006-08-13
Phonebuff beware

If I were you, I would be very alarmed at this part of your log:
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current [descriptor]: sense key: Medium Error
Add. Sense: Unrecovered read error - auto reallocate failed

When a disk encounters a bad block, it tries to remap it to one of the spare good blocks the disk keeps in reserve. Your disk has either run out of spare blocks to remap to or, more likely, the bad block is so badly damaged that it is no longer readable. It does say Unrecovered read error.

If it was my disk, I'd low level format it, put it back and see if it builds without error. If it still has errors, throw it away.

Others may think I'm just being paranoid, if so, yell out.
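To get a quick feel for how often a disk is throwing errors like these, here is a small sketch that counts the I/O error lines per device. It assumes the "end_request: I/O error, dev sda, sector ..." wording seen in the log earlier in this thread (other kernel versions may word it differently); the function name io_error_counts is made up for illustration.

```shell
# Count "end_request: I/O error, dev <disk>, ..." lines per disk.
# Reads kernel log text (for example dmesg output) on standard input
# and prints one "<disk> <count>" line per affected device.
io_error_counts() {
    awk '
        /end_request: I\/O error, dev / {
            dev = $5                   # field 5 looks like "sda,"
            sub(/,$/, "", dev)         # drop the trailing comma
            count[dev]++
        }
        END { for (d in count) print d, count[d] }
    '
}
```

Run it as: dmesg | io_error_counts. A count that keeps growing between checks is a good reason to stop trusting the disk.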



phonebuff
Posts: 445
Member Since:
2007-02-15
Have not tried --

Enduser -- I have not tried it yet; I was looking for some feedback first. No, Webmin is not installed.

Eeknz -- Once I have the mirror resynced, I plan to do just that, but I'm hoping I can wait till TB 2.6 is available and kill two birds at once..

Thanks..


