Mar 27 2010
 

** Guide Updated **

This guide is an updated version of my original guide. Unless the Kerrighed team comes up with a substantially different version, this is the only update to this guide I will make, as the steps are essentially the same for all the svn versions I have tested.

In this version:
- Updated for the latest Kerrighed svn revision, 5586.
- Fixed some steps to make them more readable and error-free.
- Added a simple MPI example to show how your program interacts with the cluster.
- Added a troubleshooting section for situations in which the nodes do not receive the image from the controller.

Thank you all for your previous comments and emails.

Rodrigo Sarpi

------------
| internet |
------------
     |
  router1
     |
     v
+-------------------------------------------------------+
| eth1 --- controller: 192.168.1.106 (given by router1) |
| eth0 --- controller: 10.11.12.1 (manually set)        |
+-------------------------------------------------------+
     |
  router2
     |   |
     |   +--> eth0 -- node1: 10.11.12.101 (static IP address)
     |
     v
  eth0 -- node2: 10.11.12.102 (static IP address)

--------------------------------------------------------
Debian Lenny with default kernel 2.6.26-2-686

Unless stated otherwise, all steps are done as root on the controller.

==
Step 1:

- The DHCP server will provide IP addresses to the nodes.
- tftpd-hpa will deliver the boot loader and kernel image to the nodes.
- portmap converts RPC (Remote Procedure Call) program numbers into port numbers;
NFS relies on it to make RPC calls.
- syslinux provides pxelinux.0, the PXE network boot loader copied in Step 7.
- NFS will be used to export directory structures to the nodes.

When installing these packages, accept the default settings presented for dhcp3 and TFTP.

#apt-get install dhcp3-server tftpd-hpa portmap syslinux nfs-kernel-server nfs-common

These packages are for MPI (see TESTING below). You can install them on the controller to compile your MPI programs, then move the programs to any of the nodes and start them from there; or you can create, compile, and execute your MPI programs on any of the nodes. Either way, the nodes also need these packages (installed in Step 18) to execute your MPI code:

#apt-get install openmpi-bin openmpi-common libopenmpi1 libopenmpi-dev

==
Step 2:

Identify the Ethernet interface the DHCP server will listen on.
For this setup, "eth0" is the network card that
feeds the nodes of the internal network.

#nano /etc/default/dhcp3-server

INTERFACES="eth0"

==

Step 3:

General configuration for the DHCP server.
First make a backup of the original configuration file in case you want to use it as a reference later on:
#cp /etc/dhcp3/dhcpd.conf /etc/dhcp3/dhcpd.conf.bkp

#nano /etc/dhcp3/dhcpd.conf

# General options
option dhcp-max-message-size 2048;
use-host-decl-names on;
deny unknown-clients;
deny bootp;

# DNS settings
option domain-name "nibiru_system"; # any name will do
option domain-name-servers 10.11.12.1; # server’s IP address: dhcp and tftp

# network
subnet 10.11.12.0 netmask 255.255.255.0 {
option routers 10.11.12.1; # server IP as above.
option broadcast-address 10.11.12.255; # broadcast address
}

# ip addresses for nodes

group {
filename "pxelinux.0"; # PXE bootloader in /var/lib/tftpboot
option root-path "10.11.12.1:/nfsroot/kerrighed"; # bootable system

#the other laptop
host node1 {
fixed-address 10.11.12.101; # first node
hardware ethernet 00:0B:DB:1B:E3:89;
}

#desktop
host node2 {
fixed-address 10.11.12.102;
hardware ethernet 00:16:76:C1:F7:D4;

}

server-name "nibiru_headnode"; # Any name will do
next-server 10.11.12.1; # Server IP where the image is. For this network it's the same machine
}

==
Step 4:

Configure the TFTP server.

#nano /etc/default/tftpd-hpa

RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"

==
Step 5:

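Configure inetd for the TFTP server.

#nano /etc/inetd.conf

tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot

Note: run TFTP either as the standalone daemon from Step 4 (RUN_DAEMON="yes") or from inetd, not both. If both are active they compete for UDP port 69, and one of them fails with "cannot bind to local socket: Address already in use" (see Troubleshooting below).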

==
Step 6:

This directory will hold the PXELINUX configuration files the nodes read when they boot. The kernel image itself goes in /var/lib/tftpboot (Step 31).

#mkdir /var/lib/tftpboot/pxelinux.cfg

==
Step 7:

Copy PXE bootloader to the TFTP server.

#cp -p /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/

==
Step 8:

Fallback configuration. If PXELINUX cannot find a configuration file
for a specific node, it will use this one.

#nano /var/lib/tftpboot/pxelinux.cfg/default

LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=dhcp rw session_id=1

==
Step 9:

This step is optional but recommended.
In /var/lib/tftpboot/pxelinux.cfg, create a separate file for *each* node.
The file name is the node's IP address written in uppercase hexadecimal, two digits per octet.
Example: 10 --> 0A; 11 --> 0B; 12 --> 0C; 101 --> 65.
So for 10.11.12.101 the file name is 0A0B0C65.

#nano /var/lib/tftpboot/pxelinux.cfg/0A0B0C65

LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=10.11.12.101 rw session_id=1

==
Step 10:

Future node system. This directory will hold the nodes' root file system, which is exported to them over NFS.

#mkdir /nfsroot/ && mkdir /nfsroot/kerrighed

==
Step 11:

Tell NFS what to export

#nano /etc/exports

/nfsroot/kerrighed 10.11.12.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)

==
Step 12:

Tell NFS to export the above file system

#exportfs -avr

==
Step 13:

Create the bootable system.
Some developers reported that they needed the trailing "/" after "kerrighed",
as in: debootstrap --arch i386 lenny /nfsroot/kerrighed/ http://ftp.us.debian.org/debian

#apt-get install debootstrap

#debootstrap --arch i386 lenny /nfsroot/kerrighed http://ftp.us.debian.org/debian

You should get this output:
I: Retrieving Release
I: Retrieving Packages
I: Validating Packages
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Checking component main on http://ftp.us.debian.org/debian...
I: Retrieving libacl1
I: Validating libacl1
[..]
I: Configuring tasksel-data...
I: Configuring tasksel...
I: Base system installed successfully.

==
Step 14:

Chroot into the node system to configure Kerrighed.

#chroot /nfsroot/kerrighed

==
Step 15:

Set root password for isolated system

#passwd

Enter new UNIX password: (nibirucluster)
Retype new UNIX password: (nibirucluster)
passwd: password updated successfully

==
Step 16:

Mount the proc file system inside the chroot, so that the package installations and build steps that follow can use /proc.


mount -t proc none /proc

==
Step 17:

You might get Perl locale warnings when installing packages into the node system. To suppress them, add this line to root's profile:


nano ~/.profile

export LC_ALL=C

or

just copy and paste this into the console (it applies to the current session only):

export LC_ALL=C

==
Step 18:

Add the packages the node needs to communicate with the controller and to build Kerrighed


nano /etc/apt/sources.list

deb http://ftp.us.debian.org/debian/ lenny main non-free contrib
deb-src http://ftp.us.debian.org/debian/ lenny main non-free contrib

deb http://security.debian.org/ lenny/updates main
deb-src http://security.debian.org/ lenny/updates main


apt-get update
apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential subversion dhcp3-common nfs-common nfsbooted openssh-server

You need these packages on the node to compile and execute your MPI code (see under TESTING below).


apt-get install openmpi-bin openmpi-common libopenmpi1 libopenmpi-dev

libopenmpi-dev may not be required if you only want to execute your code on the node. However, it is needed if you want to compile your program on the node itself.

==
Step 19:

Preparing mount points


mkdir /config

==
Step 20:

Set mount points


nano /etc/fstab

# UNCONFIGURED FSTAB FOR BASE SYSTEM
proc /proc proc defaults 0 0
/dev/nfs / nfs defaults 0 0
configfs /config configfs defaults 0 0

==
Step 21

Set the host names to look up


nano /etc/hosts

127.0.0.1 localhost

10.11.12.1 nibiru_headnode
10.11.12.101 node1
10.11.12.102 node2

==
Step 22:

Create a symlink to automount the bootable filesystem.


ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs

==
Step 23:

Configure network interfaces


nano /etc/network/interfaces

auto lo
iface lo inet loopback
iface eth0 inet manual

==
Step 24:

Create the user account you will use to connect to the nodes.


adduser (clusteruser)

Adding user `clusteruser' ...
Adding new group `clusteruser' (1000) ...
Adding new user `clusteruser' (1000) with group `clusteruser' ...
Creating home directory `/home/clusteruser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: (nodepasswd)
Retype new UNIX password: (nodepasswd)
passwd: password updated successfully
Changing the user information for clusteruser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y

==
Step 25

Check out svn revision 5586, the latest as of this writing.


svn checkout svn://scm.gforge.inria.fr/svn/kerrighed/trunk /usr/src/kerrighed -r 5586

[..]
A /usr/src/kerrighed/NEWS
A /usr/src/kerrighed/linux_version.sh
U /usr/src/kerrighed
Checked out revision 5586.

==
Step 26:

Kerrighed uses Linux 2.6.20 and ignores any other kernel version. Download the vanilla 2.6.20 source into /usr/src, then configure Kerrighed and generate a default kernel configuration.


wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2 && tar jxf /usr/src/linux-2.6.20.tar.bz2 -C /usr/src && cd /usr/src/kerrighed && ./autogen.sh && ./configure && cd kernel && make defconfig

==
Step 27:

Make sure these settings are in place. By default, b), c), and d) are
enabled, but it doesn't hurt to double-check. For a), you have to pick
the drivers for your nodes' network cards and make sure they are built
into the kernel (* not M), so they are available at boot time.

a. Device Drivers -> Network device support --> Ethernet (10 or 100Mbit)

b. File systems -> Network File Systems: enable NFS file system support,
NFS server support, and Root file system on NFS. Make sure that the NFSv3 options
are also enabled and, again, that they are built into the kernel and not loadable
modules (asterisks, not Ms).

c. To enable the scheduler framework, select “Cluster support” --> “Kerrighed
support for global scheduling” --> “Run-time configurable scheduler framework”
(CONFIG_KRG_SCHED_CONFIG). You should also enable the “Compile components needed
to emulate the old hard-coded scheduler” option to mimic the legacy scheduler
(CONFIG_KRG_SCHED_COMPAT). This last option compiles scheduler components
(kernel modules) together with the main Kerrighed module; they can be used to
rebuild the legacy scheduler (see LEGACY_SCHED in Step 29).

d. To let the scheduler framework automatically load components’ modules,
select “Loadable module support” --> “Automatic kernel module loading”
(CONFIG_KMOD). Otherwise, components’ modules must be manually loaded
on each node before the components they provide can be configured.


make menuconfig

==
Step 28:

Kernel compilation with Kerrighed support


cd .. && make kernel && make && make kernel-install && make install && ldconfig

==
Step 29:

Configuring Kerrighed


nano /etc/kerrighed_nodes

session=1 #Value can be 1 -- 254
nbmin=2 #2 nodes starting up with the Kerrighed kernel.
10.11.12.101:1:eth0
10.11.12.102:2:eth0


nano /etc/default/kerrighed

# Start kerrighed cluster
ENABLE=true
#ENABLE=false

# Enable/Disable legacy scheduler behaviour
LEGACY_SCHED=true
#LEGACY_SCHED=false

==
Step 30:

Exit chrooted system


exit

==
Step 31:

Back outside the chroot, copy the Kerrighed kernel into the TFTP root directory.


cp -p /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/

==
Step 32:

Configure the controller's eth0 card and start the services.
eth0 will be used by the DHCP server to feed the nodes.


ifconfig eth0 10.11.12.1
/etc/init.d/tftpd-hpa start
/etc/init.d/dhcp3-server start
/etc/init.d/portmap start
/etc/init.d/nfs-kernel-server start

==
Step 33:

Make sure the nodes are connected to the router.
From the controller, run:


ssh [email protected]

Then from any connected node as "clusteruser":


krgadm nodes

output:
101:online
102:online

Double-check as root from the node:


tail -f /var/log/messages

node1 kernel: Proc initialisation: done
node1 kernel: EPM initialisation: start
node1 kernel: EPM initialisation: done
node1 kernel: Init Kerrighed distributed services: done
node1 kernel: scheduler initialization succeeded!
node1 kernel: Kerrighed... loaded!

These commands are helpful. Run them as the regular node user "clusteruser".


krgcapset -d +CAN_MIGRATE
krgcapset -k $$ -d +CAN_MIGRATE
krgcapset -d +USE_REMOTE_MEMORY
krgcapset -k $$ --inheritable-effective +CAN_MIGRATE

To monitor your cluster:

top

(press 1 to toggle the per-CPU view)

Also:

cat /proc/cpuinfo | grep "model name"
cat /proc/meminfo | grep "MemFree"
cat /proc/stat

==
Step 34:

This step is needed so you do not have to enter a password when launching your MPI programs from a node.
If you do not generate a key, you will have to enter the node[n] password manually in order to migrate the processes.

You can leave the passphrase empty when generating the key. The assumption is that the controller is secure enough from the outside (no packets rerouted from eth1, the other network card).

Alternatively, if you feel paranoid, you may enter a passphrase and then tell ssh-agent to remember it. The passphrase will be remembered for that session only.

After you log on to one of the nodes via ssh:

ssh-keygen -t dsa (don't enter password)
cp /home/clusteruser/.ssh/id_dsa.pub /home/clusteruser/.ssh/authorized_keys

or


ssh-keygen -t dsa (do enter password)
cp /home/clusteruser/.ssh/id_dsa.pub /home/clusteruser/.ssh/authorized_keys
eval `ssh-agent`
ssh-add /home/clusteruser/.ssh/id_dsa (type in password associated with keys)

==
Step 35 TESTING:

A simple "hello world" program that calls the MPI library.

I will create a config file where MPI can look up information for running jobs on the cluster.
I am creating this config file in the home directory of the cluster user "clusteruser", which is the same account we created earlier. Because the nodes mount their root file system from /nfsroot/kerrighed, the file will be visible to them, so you can create it as your own user from the controller. You can also log on to any of the nodes you will be launching your programs from and create the file there using the "clusteruser" account.

Here I opted for the first option,
on the controller as a regular user (your regular system username):


nano /nfsroot/kerrighed/home/clusteruser/mpi_file.conf

#Contents of mpi_file.conf. I'm listing the nodes of the cluster.
node1
node2

--------START CODE---------
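/*
hello world
This "hello world" program does not deviate much from any other hello world program you have seen before. The only difference is that it makes MPI calls.
Save it as hello_world.c.
*/

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* calloc, free */
#include <unistd.h>  /* gethostname */
#include <mpi.h>     /* MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize */

int main(int argc, char *argv[])
{
    char *boxname;
    int rank, processes;

    MPI_Init(&argc, &argv);

    /* rank: this process's number; processes: total number of MPI processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &processes);

    /* Name of the node this process runs on. calloc zero-fills the buffer,
       and passing 99 leaves room for the terminating '\0'. */
    boxname = (char *)calloc(100, sizeof(char));
    if (boxname == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);
    gethostname(boxname, 99);

    printf("\n\nProcess: %i\nMessage: hello cruel world!\nCluster Node: %s \n\n", rank, boxname);

    free(boxname);
    MPI_Finalize();

    return 0;
}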

--------END CODE---------

On the controller compile your program using the MPI library:


mpicc hello_world.c -o hello_world

Put the MPI program in the user's home directory on one of the nodes.
In this example, I put it in /nfsroot/kerrighed/home/clusteruser:


cp hello_world /nfsroot/kerrighed/home/clusteruser/

Open another shell and ssh into any of the nodes. Here I log on to node1:


ssh [email protected]


mpirun -np 2 --hostfile mpi_file.conf hello_world

output:
Process: 1
Message: hello cruel world!
Cluster Node: node2

Process: 0
Message: hello cruel world!
Cluster Node: node1

============
Troubleshooting:
============
"PXE-E32: TFTP open timeout" error. Either your network card is not supported, or something is preventing the TFTP server from distributing the image.

Try booting your node from CD:


cd /tmp
wget http://kernel.org/pub/software/utils/boot/gpxe/gpxe-1.0.0.tar.bz2
bunzip2 gpxe-1.0.0.tar.bz2
tar xvpf gpxe-1.0.0.tar
cd /tmp/gpxe-1.0.0/src
make bin/gpxe.iso

Then burn bin/gpxe.iso to a CD and boot the node from it.

If that still does not work, try the steps below; something might be blocking access to the TFTP server.

On the controller:

in.tftpd -l
tail -1 /var/log/syslog

recvfrom: Socket operation on non-socket
cannot bind to local socket: Address already in use
Solution: use the rcconf package to disable dhcp3-server, portmap, nfs-kernel-server, and tftpd-hpa at boot time, then start each server manually when needed.

If the problem persists, try disabling the firewall
(first back up the existing rules with iptables-save > /root/firewall.rules):

iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

[ to restore after you find out what the problem is use iptables-restore < /root/firewall.rules]

You can also try this:


netstat -anp | grep 69

udp6 0 0 :::69 :::*

Note: the "udp6" in this output looks suspicious.

Connect with any TFTP client from the controller and on a second shell do tail -f /var/log/syslog

tftp 127.0.0.1

tftp> get pxelinux.0
Transfer timed out


tail -f /var/log/syslog

in.tftpd[2881]: received address was not AF_INET, please check your inetd config
inetd[2441]: /usr/sbin/in.tftpd: exit status 0x4c00

Note: check the /etc/inetd.conf file and disable IPv6.

To disable IPv6 add these lines to /etc/modprobe.d/aliases

alias net-pf-10 off
alias ipv6 off

Also comment out these lines in /etc/hosts:

#::1 localhost ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
#ff02::3 ip6-allhosts

Reboot and try again from the head node.


tftp 127.0.0.1

tftp> get pxelinux.0
Received 15987 bytes in 0.0 seconds

All ok now, try booting your nodes.

 Posted by at 6:13 pm
  • noval

    Dear sir:
    for below step:
    svn checkout svn://scm.gforge.inria.fr/svn/kerrighed/trunk /usr/src/kerrighed -r 5586

    wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2 && tar jxf /usr/src/linux-2.6.20.tar.bz2

    Actually, which version of Kerrighed did you use?

    Is it the latest release, 2.4.4, which is based on Linux 2.6.20?

    Can we just download it from http://www.kerrighed.org/wiki/index.php/Download
    and then copy it to /usr/src/kerrighed?

    My network uses a proxy here, so it is difficult to execute
    steps 25 & 26.

    Thanks & best regards

    Noval YR

  • noval

    Dear sir,

    I would like to ask a question. I have finished installing Kerrighed and compiling the kernel on my laptop following the guide, and checked all the files needed by the cluster server system:

    /boot/vmlinuz-2.6.20-krg (Kerrighed kernel)
    /boot/System.map (Kernel symbol table)
    /lib/modules/2.6.20-krg (Kerrighed kernel module)
    /etc/init.d/kerrighed (Kerrighed service script)
    /etc/default/kerrighed (Kerrighed service configuration file)
    /usr/local/share/man/* (Look inside these subdirectories for Kerrighed man pages)
    /usr/local/bin/krgadm (The cluster administration tool)
    /usr/local/bin/krgcapset (Tool for setting capabilities of processes on the cluster)
    /usr/local/bin/krgcr-run (Tool for checkpointing processes)
    /usr/local/bin/migrate (Tool for migrating processes)
    /usr/local/lib/libkerrighed-* (Libraries needed by Kerrighed)
    /usr/local/include/kerrighed (Headers for Kerrighed libraries)

    All of them are present,
    but when I try to boot hostnode1 & hostnode2 (the client nodes), they can't connect, even though each already has its own static IP.

    Here is the error:

    intel 810_AC97 Audio,version 1.01,05:13:06 May 16 2010
    oprofile :using timer interrupt
    TCP cubic registered
    NET: Registered protocol family 1
    NET: Registered protocol family 17
    TIPC:Activated (version 1.7.5 compiled May 16 20101 05:18:38)
    NET: Registered protocol family 30
    TIPC: Started in single node mode
    acpi_processor-0571 [00] processor_get_psd : invalid _PSD data
    acpi_processor-0571 [00] processor_get_psd : invalid _PSD data
    Using IPI Shortcut mode
    Time : tsc clocksource has been installed.
    r8169 : eth0: link up
    Sending DHCP request…………timed out!
    IP-Config:Retriying forever (NFS root)…
    r8169 : eth0: link up
    Sending DHCP request…………timed out!
    IP-Config:Retriying forever (NFS root)…

    Best regards
    thanks
    Noval YR

  • pezscu

    Hi my friend!

    I have an issue compiling the kernel with the exact OS, Kerrighed, and kernel versions that you are using. Maybe you came across this and know the solution:

    When compiling the kernel (step 28 of this guide), I get the following error:
    ld: Relocatable linking with relocations from format elf64-x86-64 (application.o) to format elf32-i386 (kerrighed.o) is not supported

    thanks in advance for your help

  • togueter

    Hi, I have a question. Do you need to install Kerrighed on the “frontend” or only on the nodes? I followed the guide step by step, but I had installed Kerrighed only for the nodes (/ root / Kerrighed /), not on the frontend. So I
    installed it on the frontend and changed menu.lst to boot with the Kerrighed kernel, but it attempts to boot from ntfs or from diskette. When I press enter, it attempts to boot: “Kernel Panic”.

  • rsarpi

    @pezscu: you should try the 32-bit kernel version.

    @togueter: yes, as the guide suggests, you need to install Kerrighed on the frontend node (the master, if you will). The clients are just clients that receive the Kerrighed image via tftp.

  • togueter

    OK, thanks, but when I boot my Ubuntu with the Kerrighed kernel, it doesn't work. The master node (frontend) tries to boot from VNFS or diskette but doesn't boot from the hard disk.