[Beowulf] Re: python2.4 error when loose MPICH2 TI with Grid Engine
Reuti reuti at staff.uni-marburg.de
Sun Mar 2 01:45:06 PST 2008
Hi,

On 22.02.2008 at 09:23, Sangamesh B wrote:

> Dear Reuti & members of beowulf,
>
> I need to execute a parallel job thru grid engine.
>
> MPICH2 is installed with Process Manager:mpd.
>
> Added a parallel environment MPICH2 into SGE:
>
> $ qconf -sp MPICH2
> pe_name MPICH2
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args /share/apps/MPICH2/stopmpi.sh
> allocation_rule $pe_slots
> control_slaves FALSE
> job_is_first_task TRUE
> urgency_slots min
>
> Added this PE to the default queue: all.q.
>
> mpdboot is done. mpd's are running on two nodes.
>
> The script for submitting this job thru sge is:
>
> $ cat subsamplempi.sh
> #!/bin/bash
>
> #$ -S /bin/bash
>
> #$ -cwd
>
> #$ -N Samplejob
>
> #$ -q all.q
>
> #$ -pe MPICH2 4
>
> #$ -e ERR_$JOB_NAME.$JOB_ID
>
> #$ -o OUT_$JOB_NAME.$JOB_ID
>
> date
>
> hostname
>
> /opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile $TMP_DIR/machines ./samplempi
>
> echo "Executed"
>
> exit 0
>
> The job is getting submitted, but not executing. The error and
> output file contain:
>
> cat ERR_Samplejob.192
> /usr/bin/env: python2.4: No such file or directory
>
> $ cat OUT_Samplejob.192
> -catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/192.1/pe_hostfile
> compute-0-0
> compute-0-0
> compute-0-0
> compute-0-0
> Fri Feb 22 12:57:18 IST 2008
> compute-0-0.local
> Executed
>
> So the problem is coming for python2.4.
>
> $ which python2.4
> /opt/rocks/bin/python2.4
>
> I googled this error. Then created a symbolic link:
>
> # ln -sf /opt/rocks/bin/python2.4 /bin/python2.4
>
> After this also same error is coming.
>
> I guess the problem might be different. i.e. gridengine might not
> getting the link to running mpd.
>
> And the procedure followed by me to configure PE might be wrong.
>
> So, I expect from you to clear my doubts and help me to resolve
> this error.
>
> 1. Is the PE configuration of MPICH2 + grid engine right?

If you want to integrate MPICH2 with MPD, it's similar to a PVM setup. The daemons must be started in start_proc_args on every node, with a dedicated port number per job. You don't say what your startmpi.sh is doing.

> 2. Without Tight integration, is there a way to run a MPICh2(mpd)
> based job using gridengine?

Yes.

> 3. In smpd-daemon based and daemonless MPICH2 tight integration,
> which one is better?

Depends: if you have just one mpirun per job which will run for days, I would go for the daemonless startup. But if you issue many mpirun calls in your jobscript, each of which runs for just seconds, I would go for the daemon-based startup, as the mpirun will be distributed to the slaves faster.

> 4. Can we do mvapich2 tight integration with SGE? Any differences
> with process managers wrt MVAPICH2?

Maybe, if the startup is similar to standard MPICH2.

-- Reuti

> Thanks & Best Regards,
> Sangamesh B
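
The advice above, one set of daemons per job, started on every allocated node with a port dedicated to that job, could be sketched roughly as below. This is only an illustration of the bookkeeping a start_proc_args script would need: the helper names job_port and pe_hosts and the 20000-29999 port range are assumptions made for the example, not part of SGE or MPICH2, and the actual command that launches mpd on each host is deliberately left out.

```shell
#!/bin/bash
# Sketch only: helper names and the port range are illustrative assumptions,
# not part of SGE or MPICH2.

# Map an SGE job id to a port in 20000-29999, so that jobs running at the
# same time are unlikely to end up on the same port.
job_port() {
    local job_id=$1
    echo $(( 20000 + job_id % 10000 ))
}

# Print each host named in a pe_hostfile once
# (the first column of every line is the hostname).
pe_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Possible use inside a start_proc_args script. JOB_ID is set by SGE;
# "hostfile" stands for the pe_hostfile path handed to the script.
#   port=$(job_port "$JOB_ID")
#   for host in $(pe_hosts "$hostfile"); do
#       echo "would start one mpd on $host using port $port"
#   done
```

With allocation_rule $pe_slots all slots land on a single host, so pe_hosts would return just one name; the loop only matters once the PE is allowed to span nodes.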
