[Beowulf] mpich mpd ring on a network of 2 pcs
Manal Helal manalorama at gmail.com
Sat Dec 30 08:21:18 PST 2006
Hi,

I am trying to set up a small cluster incrementally, to run MPI programs only. I have 4 PCs with Fedora Core Linux: 2 with Core 5, one with Core 6, and a new one on which I will install Core 6.

I installed mpich2 on the Fedora Core 6 machine, and I can run mpd and MPI programs on this machine fine. I can ping and ssh from and to all machines. I then added an smb share for the install bin path, and I can access it from the other machine. I updated the mpd.hosts file (in the user folder on the mpich2 installation machine) with the names of both machines for now. I also copied .mpd.conf and mpd.hosts to the user folder on both machines; I am not sure whether this is right.

On the second machine, I can read and write the mpich2 bin folder, but I can run only the mpd command; when I try mpdtrace, it says no mpd is running. When I run mpd on the installation machine, I can mpdtrace it and get the port number. When I then run

mpd -h hostname -p port &

on the other machine, I receive:

**********************
[1] 7007
[mhelal@manal mhelal]# manal.localhits_45668: conn error in connect_rhs: Connection refused
manal.localhits_45668 (connect_rhs 726): failed to connect to rhs at 127.0.0.1 56317
manal.localhits_45668 (enter_ring 633): rhs connect failed
manal.localhits_45668 (run 245): failed to enter ring
**********************

and on the installation machine I keep getting:

**********************
lost rhs; re-entering ring ..... back in ring
**********************

In another scenario, I tried mpdboot on the installation machine:

**********************
[mhelal@manallpt ~]$ mpdboot -n 2
mhelal@manal's password:
mpdboot_manallpt.localhits (handle_mpd_output 388): from mpd on manal, invalid port info:
/home/mhelal: Permission denied.
/home/mhelal/mpich2-install/bin/mpd.py: Command not found.
**********************

How can I debug this problem? Any help is highly appreciated. I only have the mpich2 README, which says to refer to the installation guide for more information, and I cannot find that guide.
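One thing that can be checked from the first error: the joining mpd is told to connect to 127.0.0.1, which only ever reaches the local machine. A possible cause (an assumption, not confirmed by anything above) is that the host name of the installation machine resolves to a loopback address, for example through its /etc/hosts entry. The following is a minimal sketch of such a check using only the Python standard library; the helper names is_loopback and check_hostname are made up for illustration and are not part of mpich2.

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """Return True if addr is a loopback IP address such as 127.0.0.1."""
    return ipaddress.ip_address(addr).is_loopback


def check_hostname(name: str) -> str:
    """Describe what name resolves to, as seen from this machine."""
    try:
        addr = socket.gethostbyname(name)
    except OSError as exc:  # resolution failures are reported, not raised
        return f"{name}: cannot resolve ({exc})"
    if is_loopback(addr):
        # Hypothesis only: a ring address like this is unreachable from other hosts.
        return f"{name} -> {addr} (loopback; other hosts cannot connect to this)"
    return f"{name} -> {addr}"


if __name__ == "__main__":
    # Run on each machine: shows what this machine's own name resolves to.
    print(check_hostname(socket.gethostname()))
```

If the script reports a loopback address on the installation machine, that would be consistent with the "failed to connect to rhs at 127.0.0.1" message, though it does not by itself explain the mpdboot "Permission denied" output.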
It would be really helpful if anyone could point me to a detailed, step-by-step tutorial on how to create a small, simple network to run MPI jobs, and the things I need to take care of.

Thank you in advance,

Kind Regards,
Manal
