MPI on ARM!!

In the last few months I’ve heard and read about many people trying to get MPI parallel computing systems running on ARM processors. Of course this is a good idea: it allows us to build clusters that consume less energy.

Well, I decided to give it a try and started by trying to compile OpenMPI for the ARM Cortex-A9. After some tries I gave up and switched to mpich2. Well, it worked. Here’s the proof:

root@pandaboard:~/mpich2-1.4.1/examples# mpirun -hosts 192.168.1.123,192.168.1.117 -l -n 4 ./cpi 100 0
[0] Process 0 of 4 is on pandaboard
[2] Process 2 of 4 is on pandaboard
[3] Process 3 of 4 is on opportunity
[1] Process 1 of 4 is on opportunity
[0] pi is approximately 3.1415926544231252, Error is 0.0000000008333321
[0] wall clock time = 0.005402

That’s an MPI program (just a simple one) running on an ARM Cortex-A9 (PandaBoard). Even more interesting, I compiled “cpi” with mpicc for both the ARM and x86 architectures. This means that this two-machine cluster is made of one ARM node (pandaboard) and one x86 node (opportunity). Here we have a program running in a cluster whose nodes have different architectures!

By the way, mpich2 worked like a charm on the PandaBoard (after some tricks).


11 Responses to MPI on ARM!!

  1. Harald says:

    Hi

    I am also trying to run MPICH2, on a simulation platform with an ARM926 processor. The platform runs Linux, and I can write ordinary programs for it and run them. I compile those programs with the Buildroot toolchain. My first try was to compile the MPICH2 sources with the compiler for the ARM platform (CC=arm-linux-gcc ./configure --prefix=/../mpich2-install --with-pm=hydra --disable-f77 --disable-fc --host=arm-linux --with-device=ch3:sock), but it doesn’t work. The MPICH2 sources and the hello world program compile, but I get a segmentation fault when I start the program on the ARM processor. Could you please give me some short advice on what I did wrong? Thanks.

    • Rafael Aroca says:

      Hello, there are a lot of problems that may cause a segmentation fault. Some common issues are related to mismatched libraries because of wrong versions. Unfortunately it is not easy to solve this problem without more information. I suggest examining the program you compiled with gdb, strace and ldd. gdb will show exactly where the segfault occurs, strace will show each system call made up to the crash, and ldd will list the shared libraries the program loads.

      Did you read the MPICH manual? I remember that you have to set some environment variables, and you have to run the program with “mpirun”.
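
The three tools suggested above could be run in one go with something like the following. This is only a sketch: the function name diagnose and the log file name strace.log are made up for illustration, and it assumes that ldd, strace and gdb are installed on the target (each one is skipped if it is missing).

```shell
#!/bin/sh
# Hypothetical helper: run ldd, strace and gdb on one binary.
diagnose() {
    bin=$1
    if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
        echo "diagnose: $bin is not an executable file" >&2
        return 1
    fi

    # ldd lists the shared libraries resolved at load time;
    # "not found" lines point at missing or mismatched libraries.
    if command -v ldd >/dev/null 2>&1; then
        ldd "$bin" || true    # fails on static binaries, which is fine here
    fi

    # strace records every system call up to the crash; -f follows children.
    if command -v strace >/dev/null 2>&1; then
        strace -f -o strace.log "$bin" || true    # a crash is the expected outcome
    fi

    # gdb in batch mode runs the program and prints a backtrace when it stops.
    if command -v gdb >/dev/null 2>&1; then
        gdb -batch -ex run -ex bt --args "$bin" || true
    fi
    return 0
}
```

For example, diagnose ./hello. If the crash only shows up under the launcher, the same strace options can be applied to the whole mpirun command line instead, since -f follows the processes it starts.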

  2. Harald says:

    Hi

    I am using Buildroot 2010.05 and arm-linux-gcc 4.3.4. I added the mpich2-install/bin path to the PATH environment variable so that the system knows the mpirun and mpiexec commands. During the make process everything is fine, but the make install command shows the following error.

    libtool: install: arm-linux-ranlib /../../mpich2-install/lib/libmpl.a
    ./libtool: line 944: arm-linux-ranlib: command not found
    make[2]: *** [install-libLTLIBRARIES] error 127

    Thanks

  3. Rafael Aroca says:

    Hi Harald,

    In my ARM environment I do have arm-linux-ranlib. It seems that something is missing from your toolchain. Could you try another compiler/toolchain?

    Also, maybe arm-linux-ranlib is simply not in your PATH. Try to find the tool in your filesystem and add its directory to the PATH.
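
A minimal sketch of that search-and-add step is shown here. The function name add_tool_to_path is hypothetical, and the directory to search is an assumption: it depends on where your toolchain was unpacked or built.

```shell
#!/bin/sh
# Hypothetical helper: find a tool below a search root and put the
# directory that contains it at the front of PATH.
add_tool_to_path() {
    tool=$1
    root=${2:-/}    # search root; defaults to the whole filesystem

    # Accept regular files and symlinks, take the first match.
    found=$(find "$root" \( -type f -o -type l \) -name "$tool" 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
        echo "$tool not found under $root" >&2
        return 1
    fi

    PATH=$(dirname "$found"):$PATH
    export PATH
}
```

Calling add_tool_to_path arm-linux-ranlib followed by command -v arm-linux-ranlib should then print the location that make install will pick up. The change only lasts for the current shell session.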

  4. Harald says:

    Hi
    I solved the problem, and now make install finishes without errors. But when I compile a program and run it on the ARM926 simulator, it exits with a segmentation fault. Do you also use the hydra process manager and the sock communication device? I also tried adding -static to the CFLAGS, but mpiexec.hydra is still reported as dynamically linked.

  5. Harald says:

    To test MPICH2 on the ARM processor I use a small hello world that prints the rank and the hostname of the processor. If I start the program with mpirun -n 1 hello I get a segmentation fault, and if I start it with just ./hello I get the output message with the right hostname. So I think the segmentation fault is caused by mpirun, right?

  6. Rafael Aroca says:

    Hi Harald,

    Yes, it seems the problem is related to the MPI implementation, though probably not to mpirun specifically. I am sorry, but I don’t know exactly how to help, as I don’t know the ARM simulator you are using, or your distribution, libraries and versions.

    Would it be possible for you to test with a more classical setup, such as Debian for ARM or Ubuntu for ARM running on the qemu emulator?

    If you still don’t find a way to test your program, I could give you SSH access to a PandaBoard (ARM Cortex-A9) running Ubuntu with both classical MPI and MPICH2 that I have compiled and installed on the board. I will give a workshop with the board next week, so if you’d like to do that, this would happen by the end of next week.

  7. Harald says:

    Hi
    So with MPICH2 version 1.3.2 it works fine when I start the applications with -launcher fork. This is a big step forward, and my next step is to start two controllers and let them communicate with each other. So far I can ping one controller from the other, but I cannot start MPI via ssh. I installed openssh with make menuconfig in Buildroot, but I always get the error “connection refused”. I think it is a problem with the ssh configuration, but I have to look into it. If you have any advice, please write back. Thanks
    Harald

  8. Rafael Aroca says:

    After installing ssh, you must start the ssh server with sshd or /etc/init.d/ssh start. You can check whether the server is running by connecting to port 22 with telnet (for example, telnet localhost 22) or with netstat -lnp on an ARM terminal, and check that TCP port 22 is open. It seems your problem is related to this.
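
The port check can also be scripted. The sketch below assumes the usual Linux netstat column layout (protocol in the first column, local address in the fourth, LISTEN as the state); the function name port_listening is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read netstat-style output on stdin and succeed
# if a TCP socket is listening on the given port.
port_listening() {
    port=$1
    awk -v p="$port" '
        $1 ~ /^tcp/ && /LISTEN/ {
            # Local address is column 4, e.g. 0.0.0.0:22 or :::22;
            # the port is whatever follows the last colon.
            n = split($4, a, ":")
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Typical use would be netstat -lnt | port_listening 22 && echo "sshd is listening". If that prints nothing, the server did not start, and “connection refused” is the expected result.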

  9. Harald says:

    Hi

    After a lot of unsuccessful tests, ssh is still not running. I installed the openssh package with make menuconfig in Buildroot and then built a new root file system. That works, and Linux now knows the ssh command. In /etc/init.d I found S50sshd and ran ./S50sshd restart. The restart reported OK, but it doesn’t work: when I run ssh root@localhost I get a connection refused message. Is any special configuration necessary?

    Best regards
    Harald

  10. Rafael Aroca says:

    Harald, sorry for the delay. It seems you are almost there. You can also run sh -x ./S50sshd restart to see if something goes wrong (all execution steps will be shown).

    I also suggest that you use netstat -lnp | grep ssh to check whether ssh is listening.

    Could you please also check the syslog and ssh log files in /var/log to verify if any error message appears?
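
A quick way to pull the relevant lines out of the logs is sketched here. The function name sshd_log_errors and the keyword list are illustrative choices, and the log file names vary between systems, so pass whichever files actually exist under /var/log.

```shell
#!/bin/sh
# Hypothetical helper: print sshd-related lines that look like errors
# from the log files given as arguments. Unreadable files are skipped.
sshd_log_errors() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # Keep lines that mention sshd and contain an error-like keyword.
        # "|| true": finding no match is not a failure.
        grep -i 'sshd' "$f" | grep -i -E 'error|fatal|fail|refused' || true
    done
}
```

For example, sshd_log_errors /var/log/messages /var/log/auth.log, assuming one of those files exists on the board.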

    regards
