How to Set Up a High Performance Cluster (HPC) Using Debian Lenny and Kerrighed

There are many guides on the net describing Kerrighed and how to set it up using Ubuntu and other distributions.  However, to the best of my knowledge there isn’t a step-by-step guide specifically designed for Kerrighed using Debian Lenny.  So here it is.  You can set up your own Beowulf Cluster using Debian Lenny and Kerrighed in about 50 steps.  Many of the steps described here are taken straight from other guides because they are applicable here (see sources at the bottom).  Many other steps are based on my own trial and error.

Please review, enjoy, and provide feedback if you can.  As I’m well aware of the limitations of this guide, your friendly and polite input is highly appreciated.
Thanks, Rodrigo Sarpi.

My Setup
————

Cluster controller or Head node: Debian Lenny or any other distribution. The single image passed on to the nodes will be Debian Lenny though.
Cluster controller uses eth1 to get its own NATed IP address from the router/DHCP server (which in turn gets it from the cable modem.)
Cluster controller uses eth0 (static address 10.11.12.1) to be connected to internal network via switch or router. eth0 is also the net device
used by the DHCP server that feeds IP addresses to the connected nodes.

—-
node1: eth0 (10.11.12.101) —> connected to internal network via switch or router
—-
node2: eth0 (10.11.12.102) —> connected to internal network via switch or router
—-

————
| internet |
————
|
router1
|
v
eth1
Head node gets 192.168.1.106 from router
Head node is also a DHCP
eth0
|
router2
|
|–> 10.11.12.101 (static ip for node 1: eth0)
|
v
10.11.12.102 (static ip for node2: eth0)

All steps done as root on the headnode
——————————————————–

0. apt-get update

#dhcp server will provide ip addresses to the nodes.
#tftpd-hpa will deliver the image to the nodes
#portmap converts RPC (Remote Procedure Call) program numbers into port numbers. NFS uses that to make RPC calls.
#syslinux is a boot loader for Linux which simplifies first-time installs
#nfs will be used to export dir structs to the nodes

1.

apt-get install dhcp3-server tftpd-hpa portmap syslinux nfs-kernel-server nfs-common

#identify ethernet interfaces which will be used by the dhcp server
2. vi /etc/default/dhcp3-server
INTERFACES="eth0"

#general configuration for the dhcp server
3. vi /etc/dhcp3/dhcpd.conf
# General options
option dhcp-max-message-size 2048;
use-host-decl-names on;
deny unknown-clients;
deny bootp;

# DNS settings
option domain-name "nibiru_system"; # "any name will do"
option domain-name-servers 10.11.12.1; # server’s IP address

# network
subnet 10.11.12.0 netmask 255.255.255.0 {
option routers 10.11.12.1; # server IP as above.
option broadcast-address 10.11.12.255; # broadcast address
}

# ip addresses for nodes
group {
filename "pxelinux.0"; # PXE bootloader.
option root-path "10.11.12.1:/nfsroot/kerrighed"; # bootable system

#the other laptop
host node1 {
fixed-address 10.11.12.101; # first node
hardware ethernet 11:11:DF:3C:E5:99;
}
#desktop
host node2 {
fixed-address 10.11.12.102;
hardware ethernet 11:33:77:C1:F7:D3;

}

server-name "nibiru_headnode"; # "Any name will do"
next-server 10.11.12.1; # Server IP
}
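
With more than a couple of nodes, typing each host block by hand gets tedious. Below is a minimal sketch that prints one host block per node in the same form as the ones in step 3. The function name dhcp_host_entry is only an illustration and is not part of dhcp3-server; the names, addresses, and MAC addresses are the example values from this guide.

```shell
#!/bin/sh
# Print a dhcpd.conf host block for one node.
# Arguments: host name, fixed IP address, MAC address.
# dhcp_host_entry is a hypothetical helper name used only for this sketch.
dhcp_host_entry() {
    printf 'host %s {\n' "$1"
    printf '    fixed-address %s;\n' "$2"
    printf '    hardware ethernet %s;\n' "$3"
    printf '}\n'
}

# The two example nodes used throughout this guide.
dhcp_host_entry node1 10.11.12.101 11:11:DF:3C:E5:99
dhcp_host_entry node2 10.11.12.102 11:33:77:C1:F7:D3
```

The output can be pasted inside the group { } section of /etc/dhcp3/dhcpd.conf.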

#configure trivial ftp server.
4. vi /etc/default/tftpd-hpa
RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"

#configure inetd for tftp server. It should be all set but double-check anyway.
#it should look like below.
5. vi /etc/inetd.conf
tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot

#copy PXE bootloader to the TFTP server
6. cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot

#default configuration for all the nodes.
7. mkdir /var/lib/tftpboot/pxelinux.cfg

#fallback configuration. If the TFTP cannot find a PXE bootload configuration for a specific node it will use this one.
8. vi /var/lib/tftpboot/pxelinux.cfg/default
LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=dhcp rw session_id=1

#This step didn’t make any difference on my cluster, since the per-node files below have the same content as the default file. Meaning, step 8 above alone will do just fine.
# If you have any input, please let me know.

9. In /var/lib/tftpboot/pxelinux.cfg create separate files for *each* node. The filename should be the IP address of the node written as eight uppercase HEX digits, two per octet.
Example: 10 -> 0A; 11 -> 0B; 12 -> 0C; 101 -> 65, so 10.11.12.101 becomes 0A0B0C65. Any scientific calculator can do it.

vi /var/lib/tftpboot/pxelinux.cfg/0A0B0C65
LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=dhcp rw session_id=1

vi /var/lib/tftpboot/pxelinux.cfg/0A0B0C66
LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=dhcp rw session_id=1
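
PXELINUX looks up a node's configuration file by the node's IPv4 address written as eight uppercase hexadecimal digits. Instead of a calculator, a small shell function can do the conversion. This is only a sketch; the name ip_to_hex is made up for this example.

```shell
#!/bin/sh
# Convert a dotted IPv4 address into the 8-digit uppercase hex name
# that PXELINUX looks for in pxelinux.cfg/.
# ip_to_hex is a hypothetical helper name used only for this sketch.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into four octets
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 10.11.12.101
ip_to_hex 10.11.12.102
```

For the two nodes in this guide it prints 0A0B0C65 and 0A0B0C66. Inside /var/lib/tftpboot/pxelinux.cfg, something like cp default "$(ip_to_hex 10.11.12.101)" would then create a per-node file to edit.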

#future isolated system. This dir will have the node’s bootable files, etc.
10. mkdir /nfsroot/kerrighed

#tell NFS what to export
11. vi /etc/exports
/nfsroot/kerrighed 10.11.12.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)

#tell NFS to export the file system listed above
12. exportfs -avr

#creating bootable system
13. apt-get install debootstrap
debootstrap --arch i386 lenny /nfsroot/kerrighed http://ftp.us.debian.org/debian

#will create and isolate our future system
14. chroot /nfsroot/kerrighed

#set root password for isolated system
15. passwd

#use the /proc directory of the node’s image as the bootable system’s /proc directory
16. mount -t proc none /proc

#in case you want to install more packages later
17. vi /etc/apt/sources.list
deb http://security.debian.org/ lenny/updates main
deb-src http://security.debian.org/ lenny/updates main

deb http://volatile.debian.org/debian-volatile lenny/volatile main
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main

#multimedia
deb ftp://ftp.debian-multimedia.org stable main

#you might get Perl related errors. To suppress those errors, type in the console:
17b. vi .profile or just copy and paste into console
export LC_ALL=C

18. apt-get update

#packages needed by the node to communicate with the controller
19. apt-get install dhcp3-common nfs-common nfsbooted openssh-server

#set mount points
20. mkdir /config

21. vi /etc/fstab
#
proc /proc proc defaults 0 0
/dev/nfs / nfs defaults 0 0
configfs /config configfs defaults 0 0

#note: if the above doesn’t do it (you will see a message about it on the nodes’ screens),
# just do this manually on ALL the nodes
mkdir /config
mount -t configfs none /config

#set hosts to lookup
21b. vi /etc/hosts
127.0.0.1 localhost

10.11.12.1 nibiru_headnode
10.11.12.101 node1
10.11.12.102 node2

#create a symlink to automount the bootable filesystem.
#It will use /dev/nfs on the controller when it starts up.
#double-check you are not using any other service that starts with /etc/rcS.d/S34*
#In Debian Lenny with all the defaults (no apt-get upgrade) you should be all right by doing below
22. ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs

#configure net interfaces.
23. vi /etc/network/interfaces
auto lo
iface lo inet loopback
iface eth0 inet manual

# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp

#allow-hotplug eth1
#iface eth1 inet dhcp

#the username you will be using on the node.
24. adduser <username>

#add basic packages
25. apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential subversion

#get that version. I have not tested these steps on any other version.
26. svn checkout svn://scm.gforge.inria.fr/svn/kerrighed/trunk /usr/src/kerrighed -r 5426

#get the kernel. Kerrighed uses Linux 2.6.20. Any other kernel version will fail. Actually, any other kernel will be ignored by Kerrighed. Period. It will insist that you need to download that version.

27. wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2

28. tar jxf /usr/src/linux-2.6.20.tar.bz2

29. cd /usr/src/kerrighed

30. ./autogen.sh

31. ./configure

32. cd kernel

33. make defconfig

34. make menuconfig

Make sure these settings are in place. By default, b), c), and d) are enabled, but it wouldn’t hurt to double-check. For a), you have to pick the drivers for your nodes’ network cards and make sure they are built into the kernel (* not M) so they are available at boot time.

a. Device Drivers -> Network device support

b. File systems -> Network File Systems: enable NFS file system support, NFS server support, and Root file system on NFS. Make sure that the NFSv3 options are also enabled, and again, make sure they are part of the kernel and not loadable modules (asterisks and not Ms). Once this is done, exit by pressing Escape twice.

c. To (re-)enable the scheduler framework, select “Cluster support” –> “Kerrighed support for global scheduling” –> “Run-time configurable scheduler framework” (CONFIG_KRG_SCHED_CONFIG). You should also enable the “Compile components needed to emulate the old hard-coded scheduler” option to mimic the legacy scheduler (CONFIG_KRG_SCHED_COMPAT). This last option will compile scheduler components (kernel modules) together with the main Kerrighed module, that can be used to rebuild the legacy scheduler, as shown below.

d. To let the scheduler framework automatically load components’ modules, select “Loadable module support” –> “Automatic kernel module loading” (CONFIG_KMOD). Otherwise, components’ modules must be manually loaded on each node before components that they provide can be configured.

35. cd ..

36. make kernel

37. make

38. make kernel-install

39. make install

40. ldconfig

##Configuring Kerrighed
41. vi /etc/kerrighed_nodes
session=1 #Value can be 1 – 254
nbmin=2 #2 nodes starting up with the Kerrighed kernel.
10.11.12.101:1:eth0
10.11.12.102:2:eth0
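
For a larger cluster the file above can be generated rather than typed. The sketch below prints the same content for any number of nodes. It assumes the addressing used in this guide (node i has the address 10.11.12.(100+i) and boots from eth0) and session 1; the function name kerrighed_nodes is only an illustration.

```shell
#!/bin/sh
# Print the content of /etc/kerrighed_nodes for N nodes, numbered from 1.
# Assumption: node i has the address 10.11.12.(100+i) and uses eth0,
# as in this guide. kerrighed_nodes is a hypothetical helper name.
kerrighed_nodes() {
    count=$1
    echo "session=1"
    echo "nbmin=$count"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "10.11.12.$((100 + i)):$i:eth0"
        i=$((i + 1))
    done
}

kerrighed_nodes 2
```

Run inside the chroot, the output could be redirected to /etc/kerrighed_nodes once it looks right.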

42. vi /etc/default/kerrighed
# If true, enable Kerrighed module loading
ENABLE=true

#exit chrooted system
43. exit

#out of your chrooted system.
44. cp /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/

45.
#You need to configure your eth0 card, the one that will be used by the dhcp server to feed the nodes.
ifconfig eth0 10.11.12.1

#pretty self-explanatory below
/etc/init.d/tftpd-hpa start
/etc/init.d/dhcp3-server start
/etc/init.d/portmap start
/etc/init.d/nfs-kernel-server start

46.
##Make sure the nodes are connected to the router. From the head node (use the username created in step 24):
ssh <username>@10.11.12.101

#pick a node any node…
krgadm nodes
krgadm cluster start
/usr/local/bin/krg_legacy_scheduler
krgcapset -d +CAN_MIGRATE
krgcapset -k $$ -d +CAN_MIGRATE
krgcapset -d +USE_REMOTE_MEMORY
krgcapset -k $$ --inheritable-effective +CAN_MIGRATE

———–
Side notes
———–
If you feel like a machine is getting overworked/overheated you can transfer processes to a different node.
example:
#command

migrate
102:1
/usr/local/bin/krg_legacy_scheduler

#test with this. Launch it from the node
perl -e 'for (1..10000000){for (1..10000000){}}' &

top
#(toggle 1 to see cpus)

you can also use:
cat /proc/cpuinfo | grep "model name"
cat /proc/meminfo | grep "MemFree"
cat /proc/stat

If you need more applications,
export LC_ALL=C
chroot /nfsroot/kerrighed
apt-get install <package>

exit chrooted system, restart nfs dhcp tftp daemons, and restart cluster nodes.

On the controller or server, you may want to reroute traffic to the nodes. Let’s say you want to browse the www from one of the nodes.

#forward traffic between eth1/eth0
echo 1 > /proc/sys/net/ipv4/ip_forward
#test
cat /proc/sys/net/ipv4/ip_forward

iptables -A PREROUTING -t nat -i eth1 -p tcp --dport 80 -j DNAT --to 10.11.12.101:80
iptables -A FORWARD -p tcp -i eth1 -o eth0 -d 10.11.12.101 --dport 80 -j ACCEPT

Also, on the headnode you can install “rcconf”. Deactivate the dhcp, nfs, portmap, and tftp daemons. You may not need those daemons at boot time. Also, they may hang for a little while when booting up.
In my case, I just start them up when needed.

on one of the nodes, make sure the cluster is running
tail /var/log/messages
Oct 5 11:00:48 node1 kernel: EPM initialisation: done
Oct 5 11:00:48 node1 kernel: Init Kerrighed distributed services: done
Oct 5 11:00:48 node1 kernel: scheduler initialization succeeded!
Oct 5 11:00:48 node1 kernel: Kerrighed… loaded!
Oct 5 11:00:48 node1 kernel: Try to enable bearer on lo:<5>TIPC: Enabled bearer , discovery domain <1.1.0>, priority 10
Oct 5 11:00:48 node1 kernel: ok
Oct 5 11:00:48 node1 kernel: Try to enable bearer on eth0:<5>TIPC: Enabled bearer , discovery domain <1.1.0>, priority 10
Oct 5 11:00:48 node1 kernel: ok
Oct 5 11:00:48 node1 kernel: TIPC: Established link <1.1.102:eth0-1.1.103:eth0> on network plane B
Oct 5 11:00:48 node1 kernel: Kerrighed is running on 2 nodes
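
To check the node count without reading the whole log, the last "Kerrighed is running on N nodes" line can be picked out with sed. This is a minimal sketch that assumes the message format shown in the log above; the function name kerrighed_node_count is made up for this example.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the node count from the last
# "Kerrighed is running on N nodes" message. Prints nothing if no such
# line is found. kerrighed_node_count is a hypothetical helper name.
kerrighed_node_count() {
    sed -n 's/.*Kerrighed is running on \([0-9][0-9]*\) nodes.*/\1/p' | tail -n 1
}

# Example using a line copied from the log above.
echo 'Oct 5 11:00:48 node1 kernel: Kerrighed is running on 2 nodes' | kerrighed_node_count
```

On a node it would be used as kerrighed_node_count < /var/log/messages.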

############################

TODO:
* A script that creates all PXE bootloader files automatically for each node’s ip address I have on my network. Unnecessary for now since I only have two nodes.
* A script to automate this tedious process of typing each step by hand. Perhaps that can evolve into a custom made LiveCD although there are similar projects such as PelicanHPC.
* Explore more parallel application programming and provide examples. For instance, provide examples using LAM. Get started with apt-get install lam4-dev lam-mpidoc

Many thanks to:
——————–
https://wiki.ubuntu.com/EasyUbuntuClustering/UbuntuKerrighedClusterGuide
http://www.kerrighed.org/wiki/index.php/Main_Page
http://www.stevekelly.eu/cluster.shtml
http://joaomatosf.com/blog/index.php?option=com_content&view=article&id=57&catid=43&Itemid=60
http://bioinformatics.rri.sari.ac.uk/drupal/?q=wiki/tutorial_kerrighed
http://ubuntuforums.org/showthread.php?p=6495259


8 thoughts on “How to Set Up a High Performance Cluster (HPC) Using Debian Lenny and Kerrighed”

  1. Great howto, I’m attempting it with AMD64, please be aware that the debootstrap did not work for my install. There was a missing - in the first option and for some reason I needed the trailing slash on /nfsroot/kerrighed, i.e. debootstrap --arch i386 /nfsroot/kerrighed/

    Again thanks!

  2. Hi, my English is poor, so I will write in Spanish:

    First, thank you for the published material. Now some corrections, or at least ones that were necessary in my case:

    1 - in step 9:
    the file name, instead of /var/lib/tftpboot/pxelinux.cfg/ABC65, should be 0A0B0C65. I used the IP 10.0.0.11, so my file was called 0A00000B.
    When creating the file, in the line
    APPEND console=tty1 root=/dev/nfs nfsroot=10.11.12.1:/nfsroot/kerrighed ip=dhcp rw session_id=1
    write ip=10.X.X.X instead of ip=dhcp.

    2 - in step 11:
    when writing the line
    /nfsroot/kerrighed 10.11.12.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash) the IP address should be that of the server and not of the network, so it would be 10.11.12.1.

    3 - in step 20:
    a small detail: when creating the directory with mkdir /configs, the letter “s” in configs does not belong; it is just /config.

    Regards

  3. dear sir,

    I have try your step by step to install kerrighed on Debian Lenny,but got “problem” until step 35 while doing make kernel:
    “make” on see my root console
    note: i do it on

    debian:~# chroot /nfsroot/kerrighed

    debian:/usr/src/linux-2.6.20/linux-2.6.20# make
    CHK include/linux/version.h
    CHK include/linux/utsrelease.h
    CHK include/linux/compile.h
    dnsdomainname: Unknown host
    GEN .version
    CHK include/linux/compile.h
    dnsdomainname: Unknown host
    UPD include/linux/compile.h
    CC init/version.o
    LD init/built-in.o
    LD .tmp_vmlinux1
    kernel/built-in.o: In function ‘getnstimeofday’:
    (.text+0xdef1): undefined reference to ‘__umoddi3’
    kernel/built-in.o: In function ‘do_gettimeofday’:
    (.text+0xdfa6): undefined reference to ‘__udivdi3’
    kernel/built-in.o: In function ‘do_gettimeofday’:
    (.text+0xdfc9): undefined reference to ‘__umoddi3’
    kernel/built-in.o: In function ‘do_timer’:
    (.text+0xe978): undefined reference to ‘__udivdi3’
    kernel/built-in.o: In function ‘do_timer’:
    (.text+0xe99b): undefined reference to ‘__umoddi3’
    make: *** [.tmp_vmlinux1] Error 1

    debian:/usr/src/linux-2.6.20/linux-2.6.20# make kernel install
    CHK include/linux/version.h
    CHK include/linux/utsrelease.h
    sh /usr/src/linux-2.6.20/linux-2.6.20/arch/i386/boot/install.sh 2.6.20 arch/i386/boot/bzImage System.map “/boot”

    *** Missing file: arch/i386/boot/bzImage
    *** You need to run “make” before “make install”.
    make[1]: *** [install] Error 1
    make: *** [install] Error 2

    Thats the problem please help me to solve this what should I do coz I’m new on linux
    Thanks
    Best Regards

    Noval YR

  4. dear Mr Rodrigo Sarpi,

    I have try your step by step to install kerrighed on Debian Lenny,but I still got “problem” even I follow “How to Set Up a High Performance Cluster (HPC) Using Debian Lenny and Kerrighed -UPDATED — Debian Admin

    still got problem while “make” on see my root console
    note: i do it on

    debian:~# chroot /nfsroot/kerrighed

    debian:/usr/src/linux-2.6.20/linux-2.6.20# make
    CHK include/linux/version.h
    CHK include/linux/utsrelease.h
    CHK include/linux/compile.h
    dnsdomainname: Unknown host
    GEN .version
    CHK include/linux/compile.h
    dnsdomainname: Unknown host
    UPD include/linux/compile.h
    CC init/version.o
    LD init/built-in.o
    LD .tmp_vmlinux1
    kernel/built-in.o: In function ‘getnstimeofday’:
    (.text+0xdef1): undefined reference to ‘__umoddi3’
    kernel/built-in.o: In function ‘do_gettimeofday’:
    (.text+0xdfa6): undefined reference to ‘__udivdi3’
    kernel/built-in.o: In function ‘do_gettimeofday’:
    (.text+0xdfc9): undefined reference to ‘__umoddi3’
    kernel/built-in.o: In function ‘do_timer’:
    (.text+0xe978): undefined reference to ‘__udivdi3’
    kernel/built-in.o: In function ‘do_timer’:
    (.text+0xe99b): undefined reference to ‘__umoddi3’
    make: *** [.tmp_vmlinux1] Error 1

    debian:/usr/src/linux-2.6.20/linux-2.6.20# make kernel install
    CHK include/linux/version.h
    CHK include/linux/utsrelease.h
    sh /usr/src/linux-2.6.20/linux-2.6.20/arch/i386/boot/install.sh 2.6.20 arch/i386/boot/bzImage System.map “/boot”

    *** Missing file: arch/i386/boot/bzImage
    *** You need to run “make” before “make install”.
    make[1]: *** [install] Error 1
    make: *** [install] Error 2

    Thats the problem please-please help me to solve this what should I do coz I’m new on linux

    Thanks
    Best Regards

    Noval YR

  5. Noval,
    checks the version of cpp and gcc, I had a similar problem and solved it by installing the correct version of these libraries.

    regards

  6. Leonel,

    gcc & cpp,what version I should install? could you please share it, where I can get it, & how to install.

    Thanks & Best regards
