Understanding the boot process.
Understanding the CentOS/trixbox boot process and runlevels
Understanding the Linux boot process is key to being able to fully troubleshoot and administer a Linux-based server. Without this understanding, the boot process just seems like magic, and figuring out what's really going on can take a very, very long time.
At its simplest, the boot process is pretty basic:
1) the bootloader loads the kernel
2) the kernel mounts the root filesystem and runs init
3) init runs everything else
What's actually happening is a bit more than that, though:
1) the bootloader may present options to the operator, who can then choose different operating systems, kernels and kernel options for the system to boot. We will be assuming a RHEL-based (Red Hat Enterprise Linux) system from here on out; other systems may be slightly different.
2) the selected kernel is passed information about where to find the things it needs to boot up. This information includes:
a) the location of the root filesystem
b) kernel options (which can include the runlevel; if none is given, init uses its default)
c) the location of the initial RAM disk (initrd.)
3) the initial RAM disk (initrd) contains a minimal Linux system with modules that the kernel may need to mount the root filesystem. There will be more on initrd later.
4) once the kernel has loaded everything it needs from the initial RAM disk, it mounts the "real" root file system (/) and runs process 1: init.
5) init determines the default runlevel (again, more on this later) and runs the programs, applications and daemons associated with that runlevel.
6) your system is now "up."
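To make step 2 a little more concrete: on a running system, the options the bootloader handed to the kernel can be read from /proc/cmdline. The sketch below pulls individual options out of a line like that. The helper names kernel_arg and boot_runlevel are hypothetical, and boot_runlevel only looks for a few common runlevel words (single, s, S and the digits 1 to 5), which is a simplification of what init actually accepts.

```shell
#!/bin/sh
# Hypothetical helpers for picking options out of a kernel command line,
# such as the one GRUB passes in and Linux exposes in /proc/cmdline.

# kernel_arg KEY CMDLINE: print the value of KEY=value (everything after the
# first "="); exit status 1 if KEY is not present.
kernel_arg() (
    set -f                      # don't glob-expand the split words
    for word in $2; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1
)

# boot_runlevel CMDLINE: print the runlevel requested on the command line
# ("single" for single/s/S, or a digit 1-5). Prints nothing and exits 1 if
# none was given, in which case init falls back to its default.
# Assumption: only these words are checked, not every form init accepts.
boot_runlevel() (
    set -f
    level=
    for word in $1; do
        case $word in
            single|s|S) level=single ;;
            [1-5])      level=$word ;;
        esac
    done
    [ -n "$level" ] || exit 1
    printf '%s\n' "$level"
)

# Usage on a running system:
#   kernel_arg root "$(cat /proc/cmdline)"
#   boot_runlevel "$(cat /proc/cmdline)"
```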
Let's dig a little deeper now. I apologize if there are concepts you don't yet understand that I gloss over. I have to sacrifice some detail for brevity and readability. If there's some subject you'd like to know more about, just ask and I'll see if I can write something up.
We're going to cover the following subjects:
1) the bootloader
2) initrd
3) runlevels
4) init and RC scripts
5) an example rc script and chkconfig
6) rc.local
The boot loader
The boot loader (technically a "second stage boot loader") has the job of reading the OS from disk and getting it running. RHEL, and hence CentOS, use a loader called GRUB (which stands for "GRand Unified Bootloader") to load the OS. Other bootloaders that you may have heard of (or sworn at) include NTLDR, used by Windows NT, 2000 and XP, and the older Linux Loader, LILO. There are plenty more, but this isn't a history, it's a howto.
Regardless of which second stage bootloader is used, it's usually installed in the master boot record (MBR) of the primary disk drive, where the BIOS finds it and starts it. That's really all you need to know about that, except that a modern BIOS can be told to boot from any drive in the system, and if the drive you pick doesn't have an MBR with a correctly configured boot loader, the boot process will fail. This sometimes catches people off guard when they reuse a hard drive from another system and the BIOS tries to boot from that drive.
GRUB on trixbox is configured to be menu-driven. It's important to understand that GRUB really has a command-line interface and that the menu is built on top of it. In other words, the GRUB menu isn't really GRUB; it's simply an interface to it (much like FreePBX is an interface to Asterisk, but that's pushing the analogy pretty far).
That being said, you can use the GRUB menu to edit a boot string for troubleshooting or system maintenance purposes.
For example, when you boot a GRUB system you might see a menu listing the available boot entries (on some systems you first have to press a key during the countdown to make the menu appear).

One common task you would do from the GRUB screen (besides wait for the default boot) is to boot into single user mode. We will do that now as an example.
The first thing you do is press 'e' to edit the entry (if you had multiple entries you could use the arrow keys to move to the one you want to edit.)
Now you should see the commands that make up the boot entry. On the example system used here, there are three lines:
1) root (hd0,0)
2) kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=LABEL=/ acpi=off noapic nosmp nolapic clock=pit
3) initrd /initrd-2.6.18-53
(Yours will probably be slightly different; this is a VMware install.)
Line one sets GRUB's root device. It's important to note that this is not the root filesystem. (hd0,0) means the first partition on the first drive (GRUB counts from zero). On this system that partition holds the /boot directory, and every path that comes after is relative to it.
Line two tells GRUB which kernel we're booting, where it's located, and what the options are.
The /vmlinuz... part tells us that the kernel is in the root of the filesystem on (hd0,0), which is mounted as /boot on the running system. You can verify this by running "ls /boot" when the system is up. (Incidentally, the "z" in vmlinuz indicates that the kernel is compressed. Nearly all kernels are these days; compression originally helped squeeze the kernel into the small amount of low memory available to PC boot loaders. An uncompressed kernel is conventionally called vmlinux... instead.)
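GRUB legacy counts both drives and partitions from zero, which is a common source of confusion. This small helper (grub_dev_describe is hypothetical, not a real tool) translates the notation into plain English. Keep in mind that the drive number follows the BIOS order, which doesn't always match the Linux device names.

```shell
#!/bin/sh
# grub_dev_describe "(hdX,Y)": hypothetical helper that turns GRUB legacy's
# zero-based (drive,partition) notation into one-based plain English.
grub_dev_describe() {
    spec=${1#"("}              # strip the leading "("
    spec=${spec%")"}           # strip the trailing ")"
    case $spec in
        hd*,*) ;;
        *) echo "expected (hdX,Y), got: $1" >&2; return 1 ;;
    esac
    drive=${spec%%,*}
    drive=${drive#hd}
    part=${spec#*,}
    # both numbers must be non-empty and purely numeric
    for n in "$drive" "$part"; do
        case $n in
            ''|*[!0-9]*) echo "expected (hdX,Y), got: $1" >&2; return 1 ;;
        esac
    done
    echo "drive $((drive + 1)), partition $((part + 1))"
}

# Example: grub_dev_describe "(hd0,0)"
```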
The options are:
a) ro -- mount the root filesystem read-only at first
b) root=LABEL=/ -- the actual root filesystem, identified by its label
c) acpi=off, noapic, etc. -- options needed for this VMware install
The read-only option is a safety measure: the boot scripts check the root filesystem (with fsck) while it's mounted read-only, and if everything is OK, the system remounts it read-write later in the boot process.
The LABEL=/ is simply a filesystem label that was applied when you installed trixbox. This is (usually) handy: if you had specified, say, /dev/hda2 instead of a label and then moved the hard drive to a different position (another IDE channel, for instance), the system would no longer boot. The drawback is that if you have two drives with the same label, you'll run into problems.
You can safely ignore the rest of the options for now; if you'd like to know what they do, feel free to google them.
Finally, the third line (initrd) tells GRUB which initrd to use and where it's located, much like the kernel line, but it takes no options.
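If you want to see which device a label points to, findfs (from e2fsprogs) resolves a LABEL= spec to a device name, and e2label DEVICE prints the label stored on a device. The function below is a hypothetical wrapper that treats the value of the root= option the same way: labels are looked up, and anything else is assumed to already be a device path.

```shell
#!/bin/sh
# resolve_root VALUE: hypothetical helper that turns the value of a root=
# kernel option into a device name. LABEL=... is looked up with findfs;
# anything else is assumed to already be a device path and printed as-is.
resolve_root() {
    case $1 in
        LABEL=*) findfs "$1" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Examples (as root on the trixbox itself):
#   resolve_root "LABEL=/"     # prints the device that carries the "/" label
#   e2label /dev/hda2          # prints the label stored on /dev/hda2
```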
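If we want to boot into single user mode, we move the cursor to the "kernel" line and press "e" again, go to the end of the line, add a space and the word "single", then press Enter. On the example system, the edited line would read:
kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=LABEL=/ acpi=off noapic nosmp nolapic clock=pit single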

Finally, press "b" to boot the selected entry. (After you press Enter, the cursor should be back on the "kernel" line, but it doesn't have to be there for "b" to work. You can confirm that your change stuck by editing the line again and checking that the word "single" is still there.)
NOTE: the "single" option is not saved permanently. Any change you make interactively is temporary and is forgotten on the next boot. If you want to make permanent changes, you'll need to edit the GRUB config file, /boot/grub/grub.conf.
Once the system has booted, you'll be dropped right into a shell with root access. This is exactly why servers should be physically protected: an attacker with physical access can easily gain root privileges by doing what we just did. There are ways around this, such as BIOS passwords, GRUB passwords, encrypted filesystems, etc., but it's best to just lock up the damn server.
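If you'd rather script that edit than do it by hand, here is a minimal sketch, assuming the usual GRUB legacy layout where each boot entry has a line starting with "kernel". The function name grub_add_kernel_opt is hypothetical. It saves the previous version as grub.conf.bak and appends the option to every kernel line that doesn't already have it. Adding "single" this way would make every boot single user, so in practice you'd use it for options you really want to keep.

```shell
#!/bin/sh
# grub_add_kernel_opt FILE OPTION: hypothetical helper that appends OPTION
# to every "kernel" line in a GRUB legacy config that lacks it.
# The previous contents are kept in FILE.bak (overwritten on each run).
grub_add_kernel_opt() {
    file=$1
    opt=$2
    cp -p "$file" "$file.bak" || return 1
    awk -v opt="$opt" '
        $1 == "kernel" {                 # indented kernel lines match too
            seen = 0
            for (i = 2; i <= NF; i++) if ($i == opt) seen = 1
            if (!seen) $0 = $0 " " opt   # keep the original spacing
        }
        { print }
    ' "$file.bak" > "$file"
}

# Example: grub_add_kernel_opt /boot/grub/grub.conf clock=pit
```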
initrd
As mentioned, initrd is the "initial RAM disk." This is where the modules the kernel needs in order to boot are stored. Linux "drivers" can work in two ways: they can be compiled into the kernel, giving a small performance boost, or they can be built as modules that are loaded dynamically when needed, which keeps the kernel slimmer. The two can be mixed and matched: support for standard hard disk controllers can be compiled in, while non-standard hardware is supported by modules loaded at runtime.
Modules present a problem, however, when they're stored on a disk that the kernel can't access until the module is loaded: a catch-22.
Say, for example, you have a 3ware RAID card that you've compiled modules for, and you need to boot off of a drive attached to that card. When your kernel boots, it needs a module stored on that drive in order to access that drive, so it can never get to the module. See the problem?
This is where initrd comes in. Normally the system installer or the kernel compilation and installation process will create your initial RAM disk for you, but there are rare occasions where you may be called on to create one yourself. That's a little outside the scope of this document, but there is plenty of information on the Internet to help you should you need it. The command you use is "mkinitrd", but you have to pass it a lot of options, then edit your grub.conf, and possibly do some other things I'm not thinking of at the moment.
Regardless, just know that that's what initrd is for, so if you ever see the dreaded message "can't find initial ram disk" you at least know it's related to initrd.
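When you do hit an initrd problem, the first thing to check is whether the image GRUB points at actually exists. The sketch below assumes the RHEL/CentOS 5 naming convention, /boot/initrd-KERNELVERSION.img. check_initrd is a hypothetical helper, and the optional second argument exists only so it can be tried against a directory other than /boot.

```shell
#!/bin/sh
# check_initrd [KERNEL_VERSION] [BOOT_DIR]: hypothetical helper that reports
# whether the initrd for a kernel is where the CentOS 5 naming convention
# (BOOT_DIR/initrd-VERSION.img) says it should be.
check_initrd() {
    kver=${1:-$(uname -r)}      # default: the running kernel
    bootdir=${2:-/boot}
    img="$bootdir/initrd-$kver.img"
    if [ -f "$img" ]; then
        echo "found $img"
    else
        echo "missing $img" >&2
        return 1
    fi
}

# To look inside an existing image (assumed here to be a gzip-compressed
# cpio archive, as on CentOS 5):
#   zcat /boot/initrd-$(uname -r).img | cpio -it
```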
runlevels
Runlevels are a pretty simple concept: each one defines a state the system can be running in, mainly in terms of which services are running. The runlevels are as follows:
0) Halt -- all processes are stopped and the system halts.
1) Single user mode (as we booted into in the GRUB section.)
2) unused in default CentOS, but you can use it if you want
3) Multi-user mode -- the default for trixbox
4) another unused mode in CentOS
5) Multi-user mode with the X Window System.
6) Reboot -- all processes are stopped and then the system reboots.
You switch runlevels by calling init with the runlevel:
init 5
will switch to runlevel 5 and
init 3
will switch you back to runlevel 3.
Also, "init 0" and "init 6" are equivalent to "shutdown -h now" and "shutdown -r now", respectively.
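To find out which runlevel you're in right now, the runlevel command prints the previous and current runlevels ("N" means there was no previous one). The hypothetical describe_runlevel helper below simply maps a runlevel to the descriptions in the list above.

```shell
#!/bin/sh
# describe_runlevel LEVEL: hypothetical helper mapping a runlevel to the
# descriptions used in this document (CentOS/trixbox defaults).
describe_runlevel() {
    case $1 in
        0)     echo "halt" ;;
        1|s|S) echo "single user mode" ;;
        2|4)   echo "unused by default on CentOS" ;;
        3)     echo "multi-user mode (trixbox default)" ;;
        5)     echo "multi-user mode with the X Window System" ;;
        6)     echo "reboot" ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# The current runlevel is the second field printed by "runlevel":
#   describe_runlevel "$(runlevel | awk '{print $2}')"
```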
init and RC scripts
Init is process number 1, the great-granddaddy of all running processes on a system. Everything running is a descendant of init, and you can see this for yourself by running the command "pstree".
When init first runs, it reads /etc/inittab to determine what to do next. /etc/inittab tells init the default runlevel (the "initdefault" entry), that system initialization (si) is done by running /etc/rc.d/rc.sysinit, and what to run for each runlevel; it also lists other things for init to start, such as the login prompts on the virtual consoles. Once init knows which runlevel it needs to be in, it will stop anything that needs to be stopped and start anything that needs to be started for that particular runlevel.
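On a CentOS 5 system, the relevant /etc/inittab entries look like "id:3:initdefault:" (the default runlevel), "si::sysinit:/etc/rc.d/rc.sysinit" and "l3:3:wait:/etc/rc.d/rc 3" (run the scripts for runlevel 3). As a sketch, the default runlevel can be pulled out with awk. default_runlevel is a hypothetical helper name, and it skips commented-out lines.

```shell
#!/bin/sh
# default_runlevel [INITTAB]: hypothetical helper that prints the runlevel
# from the initdefault entry (e.g. "id:3:initdefault:" -> 3).
# Exits 1 if no such entry is found.
default_runlevel() {
    awk -F: '
        $1 ~ /^[ \t]*#/      { next }                        # skip comments
        $3 == "initdefault"  { print $2; found = 1; exit }
        END                  { exit !found }
    ' "${1:-/etc/inittab}"
}

# Example: default_runlevel /etc/inittab
```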
It does this by looking in the directory /etc/rcX.d where X is the runlevel number. By default, your trixbox boots into runlevel 3, so init looks in /etc/rc3.d and finds all the symlinks that start with S and a number, for example "S85httpd." If you look in that directory you may see symlinks that start with a K. You've probably guessed by now that S means start and K means kill. The numbers tell init in which order the scripts should run. "S85httpd" will run after "S64mysqld".
When switching from one runlevel to another, init is smart enough to know that if a program should run in both it will stay running, but if it should run in one but not the other it will kill or start the process, depending on the actual need.
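The ordering rule is easy to see with a quick listing. This sketch (rc_order is a hypothetical helper) prints the links in an rcN.d directory the way the rc script walks them: all K links first, then all S links, each group in name order, which works out to numeric order because the numbers are always two digits.

```shell
#!/bin/sh
# rc_order DIR: hypothetical helper that lists the K (kill) and S (start)
# links in an rcN.d directory in the order the rc script visits them.
rc_order() {
    for link in "$1"/K* "$1"/S*; do
        # an unmatched glob stays as a literal pattern; skip it
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s\n' "${link##*/}"
    done
}

# Example: rc_order /etc/rc.d/rc3.d
```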
All of the SXXdaemon (and KXXdaemon) files are actually symlinks to the real scripts, which are kept in /etc/init.d. Most scripts take an argument (a word given on the command line after the script name) that determines what the script should actually do.
Some common arguments are start, stop, restart and status. Some scripts provide other options, depending on their purpose, but all scripts should support at least start and stop (and, in my opinion, restart, since that can simply be stop followed by start within the script).
You invoke these from the command line by doing:
service SERVICENAME ACTION
So, for example, to restart httpd you would do:
service httpd restart
And Apache would stop and start.
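Roughly speaking, service is a thin wrapper: it runs /etc/init.d/SERVICENAME with the action you give it, with most of the environment cleared out so the script behaves the same way it does at boot. The svc function below is a hypothetical, stripped-down stand-in to illustrate the idea, not a replacement for the real command. Its optional third argument just points it at a directory other than /etc/init.d.

```shell
#!/bin/sh
# svc NAME ACTION [INITDIR]: hypothetical, simplified stand-in for the
# "service" command. Runs INITDIR/NAME ACTION with an emptied environment
# (only a basic PATH is passed through).
svc() {
    script="${3:-/etc/init.d}/$1"
    if [ ! -x "$script" ]; then
        echo "$1: unrecognized service" >&2
        return 1
    fi
    env -i PATH=/sbin:/usr/sbin:/bin:/usr/bin "$script" "$2"
}

# Example: svc httpd restart    (same idea as: service httpd restart)
```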
An example rc script and the chkconfig command
Let's say we have the hypothetical new command "ecute" that needs to run at startup. It's a program that we've created to send a command to a piece of hardware that will electrocute users through a wire we've inserted into their chairs. This is so we can react in the proper way if they complain about voice quality.
How do we get this to run at boot?
The first thing we do is create a file called "ecute" in /etc/init.d by doing this:
touch /etc/init.d/ecute
and we also want it to be executable so we do:
chmod 755 /etc/init.d/ecute
and if it's not already owned by root:
chown root:root /etc/init.d/ecute
Now we need to create the actual file.
Use your favorite editor to open /etc/init.d/ecute. The first line we want is a special line (the "shebang") that tells the system which interpreter should run the script, in this case bash:
#!/bin/bash
NOTE: rc scripts don't have to be written in bash or sh, that's just the convention. You could write them in perl, or python, or it could be a C binary, as long as it acts like init expects it to. (A binary is a bad idea, but it can be done.)
We probably want some sort of header with some information in it, so we do:
#
# ecute        Starts the ecute daemon
#
#
# chkconfig: 2345 15 99
# description: the ecute daemon runs the custom hardware that electrocutes \
#              end users who irritate us.
Let's explain that chkconfig line: that lets chkconfig know how to handle your init script so you can use the chkconfig command to have it start and stop in certain runlevels.
The "2345" means we want it to run in runlevels 2, 3, 4 and 5, and 15 and 99 are the start and stop priorities, respectively. This means that when chkconfig adds the script, it will create an S15ecute link in each of the chosen runlevels and a K99ecute link in the remaining ones (0, 1 and 6).
Both the chkconfig and description lines are required, and note the use of a \ at the end of a line to continue the description onto the next line.
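To see what those header numbers turn into, here is a sketch that reads the "chkconfig:" line of a script and prints the links chkconfig --add would be expected to create. chkconfig_links is a hypothetical helper; it assumes a single, well-formed header line and the two-digit, zero-padded link names chkconfig uses.

```shell
#!/bin/sh
# chkconfig_links SCRIPT: hypothetical helper that reads the
# "# chkconfig: LEVELS START STOP" header and prints the expected links:
# S<START><name> in the listed runlevels, K<STOP><name> in the rest of 0-6.
chkconfig_links() {
    name=${1##*/}
    # word splitting of the header values is intentional here
    set -- $(sed -n 's/^#[[:space:]]*chkconfig:[[:space:]]*//p' "$1")
    if [ $# -ne 3 ]; then
        echo "$name: no usable '# chkconfig:' header" >&2
        return 1
    fi
    levels=$1
    start=${2#0}; start=${start:-0}    # drop a leading zero so printf
    stop=${3#0};  stop=${stop:-0}      # doesn't read "08" as octal
    for lvl in 0 1 2 3 4 5 6; do
        case $levels in
            *"$lvl"*) printf 'rc%s.d/S%02d%s\n' "$lvl" "$start" "$name" ;;
            *)        printf 'rc%s.d/K%02d%s\n' "$lvl" "$stop" "$name" ;;
        esac
    done
}

# Example: chkconfig_links /etc/init.d/ecute
```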
Red Hat is kind enough to provide us with some functions we can use to make our lives easier. If you'd like to know more about these functions you'll have to do a bit of research on your own. Suffice it to say we're going to use them:
# Source function library.
. /etc/init.d/functions
We need to set some values: RETVAL is a variable we'll use to store a return code (which tells us whether something worked), and umask sets the file creation mask. A umask of 077 means any files the script creates are readable and writable only by their owner, root:
RETVAL=0
umask 077
And now here are our actual functions:
start() {
    echo -n $"Starting ecute: "
    # daemon (from /etc/init.d/functions) runs the program and prints OK/FAILED
    daemon ecute
    RETVAL=$?
    echo
    # the subsys lock file tells the rc scripts the service is running
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/ecute
    return $RETVAL
}
stop() {
    echo -n $"Shutting down ecute: "
    killproc ecute
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/ecute
    return $RETVAL
}
restart() {
    stop
    start
}
Now our case statement:
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 1
esac
And we exit:
exit $?
This is just one of many ways to write an init script, and there are many other examples on the Internet that you can look at.
Now that the script has been created, you need to make chkconfig register the script and do its magic:
chkconfig --add ecute
And to change the configuration later you can do something like:
chkconfig --level 24 ecute off
which will turn off ecute in runlevels 2 and 4.
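The real way to check the result is "chkconfig --list ecute", which shows on or off for each runlevel. As a sketch of what that information means on disk, the hypothetical service_levels helper below reports a runlevel as on when its rcN.d directory holds an S link for the service. The optional base directory argument is only there so it can be pointed somewhere other than /etc/rc.d.

```shell
#!/bin/sh
# service_levels NAME [BASE]: hypothetical helper that prints, for each
# runlevel 0-6, "on" if BASE/rcN.d contains an S??NAME link, else "off".
service_levels() {
    name=$1
    base=${2:-/etc/rc.d}
    out=$name
    for lvl in 0 1 2 3 4 5 6; do
        state=off
        for link in "$base/rc$lvl.d"/S[0-9][0-9]"$name"; do
            if [ -e "$link" ] || [ -L "$link" ]; then state=on; fi
        done
        out="$out $lvl:$state"
    done
    printf '%s\n' "$out"
}

# Example: service_levels ecute
```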
rc.local
There is one last thing I want to cover, and that's the rc.local file. This is an rc script that runs last in the boot sequence (on CentOS it's started by an S99local link in the runlevel directories). For quick one-offs or testing, you can always put a command at the end of rc.local to start something. It's almost always better to write an init script, but sometimes time constraints make that decision for you. (Even so, it's probably not a bad idea to keep a few handy template rc scripts around.)
Anyway, the default trixbox rc.local script looks like this:
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
touch /var/lock/subsys/local
/etc/trixbox/runonce
/home/rhino/php-bin/generateConfigs.php
/home/rhino/php-bin/usbprobe.php
/home/rhino/bin/rsli_wrapper&
/usr/local/sbin/motd.sh > /etc/motd
/sbin/modprobe r1t1
/sbin/udevstart
/usr/sbin/ztcfg
/usr/sbin/fxotune -s
/usr/sbin/amportal start
As you can see, the trixbox devs have decided that rc.local is a good place for this sort of thing. In my opinion it isn't, but opinions differ, and this is what they've chosen.
Let's say, for example, that you have a custom daemon you want to run as a test. You can just tack it on to the end of rc.local:
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
touch /var/lock/subsys/local
/etc/trixbox/runonce
/home/rhino/php-bin/generateConfigs.php
/home/rhino/php-bin/usbprobe.php
/home/rhino/bin/rsli_wrapper&
/usr/local/sbin/motd.sh > /etc/motd
/sbin/modprobe r1t1
/sbin/udevstart
/usr/sbin/ztcfg
/usr/sbin/fxotune -s
/usr/sbin/amportal start
/root/ecute/ecute &
And you're good to go.
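One catch with tacking lines onto rc.local from a script (an install script that might get run twice, for example) is ending up with the same daemon started twice at boot. Here is a minimal sketch, assuming a hypothetical helper named append_once, that only adds the line if an identical one isn't already there.

```shell
#!/bin/sh
# append_once FILE LINE: hypothetical helper that appends LINE to FILE
# unless an identical line is already present.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null && return 0
    # start on a fresh line if the file doesn't end with a newline
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example: append_once /etc/rc.d/rc.local '/root/ecute/ecute &'
```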
Hopefully you now understand more than you did about how your trixbox boots and how services are stopped and started. If you have questions, feel free to post in the trixbox forums and then PM me a link to the thread. Please don't email me or PM me directly with questions, it's better if everyone can see the discussion. Unless you want to pay me, then email away.
