Closed
Issue created Apr 09, 2021 by Maximilian Reimer (@mreimer, Owner)

Runs on same node seem to block each other

Some runs, if started shortly after one another on the same node, seem to block each other. This seems to happen even before the output-dir is created! The only things done before that are adding args and parsing them, as well as calling wandb.init(entity="explainable-nlp", project="expred"). I suspect it is a problem with wandb and will file an issue.
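One possible workaround, until the cause is confirmed, would be to serialize the startup step across runs on the same node with a file lock. The following is only a sketch under assumptions: `node_startup_lock`, `init_serialized`, and the lock file name `expred_startup.lock` are hypothetical names that are not part of expred, the temp directory is assumed to be local to the node, and it has not been tested against wandb. It relies on `fcntl.flock`, so it is Linux/Unix only.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def node_startup_lock(lock_path, timeout=600.0, poll_interval=1.0):
    """Hold an exclusive file lock while the body of the `with` block runs.

    Raises TimeoutError if the lock cannot be acquired within `timeout`
    seconds, so one stuck run cannot block all later runs forever.
    """
    deadline = time.monotonic() + timeout
    with open(lock_path, "a") as lock_file:
        while True:
            try:
                # Non-blocking attempt; fails if another holder exists.
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(
                        f"could not lock {lock_path} within {timeout}s"
                    )
                time.sleep(poll_interval)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def init_serialized(init_fn, lock_path=None, **lock_kwargs):
    """Call init_fn() while holding the startup lock and return its result."""
    if lock_path is None:
        # Assumption: the temp directory is local to the execute node.
        lock_path = os.path.join(tempfile.gettempdir(), "expred_startup.lock")
    with node_startup_lock(lock_path, **lock_kwargs):
        return init_fn()


# Hypothetical use in train.py (assumes `import wandb`):
# run = init_serialized(
#     lambda: wandb.init(entity="explainable-nlp", project="expred")
# )
```

If the runs still hang with the startup step serialized, that would suggest the blocking is not caused by concurrent initialization alone.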

condor_q --nobatch:

-- Schedd: deken1.local : <192.168.1.80:9618?... @ 04/09/21 10:25:04
 ID       OWNER            SUBMITTED     RUN_TIME ST PRI SIZE  CMD
22315.0   mreimer         4/8  13:32   0+20:46:59 R  0   318.0 run_with_env.sh expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies_0.3_23
22316.0   mreimer         4/8  13:32   0+20:46:59 R  0   318.0 run_with_env.sh expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies_0.3_44
22317.0   mreimer         4/8  13:32   0+20:46:59 R  0   318.0 run_with_env.sh expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies_0.4_24
22318.0   mreimer         4/8  13:32   0+20:46:59 R  0   318.0 run_with_env.sh expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies_0.4_45

When connecting via condor_ssh_to_job and running ps -fu mreimer --sort cmd, you get (all on node01):

UID          PID    PPID  C STIME TTY          TIME CMD
mreimer   982070  982054  0 Apr08 ?        00:00:00 /bin/bash /var/lib/condor/execute/dir_982054/condor_exec.exe expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies
mreimer   982071  982055  0 Apr08 ?        00:00:00 /bin/bash /var/lib/condor/execute/dir_982055/condor_exec.exe expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies
mreimer   982078  982056  0 Apr08 ?        00:00:00 /bin/bash /var/lib/condor/execute/dir_982056/condor_exec.exe expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies
mreimer   982082  982059  0 Apr08 ?        00:00:00 /bin/bash /var/lib/condor/execute/dir_982059/condor_exec.exe expred /home/mreimer/projects/expred scripts/run_movies_from_config.sh movies
mreimer   982152  982128  0 Apr08 ?        00:00:00 git cat-file --batch-check
mreimer   982164  982127  0 Apr08 ?        00:00:00 git cat-file --batch-check
mreimer   982165  982124  0 Apr08 ?        00:00:00 git cat-file --batch-check
mreimer   982171  982126  0 Apr08 ?        00:00:00 git cat-file --batch-check
mreimer  1051119 1050833  0 10:30 pts/0    00:00:00 ps -fu mreimer --sort cmd
mreimer   982075  982072  0 Apr08 ?        00:00:02 python expred/train.py --seed 100 --data_dir /home/mreimer/datasets/eraser/movies_0.3_23 --output_dir outputs/movies_0.3_23/21_08_04_13_38
mreimer   982077  982074  0 Apr08 ?        00:00:02 python expred/train.py --seed 100 --data_dir /home/mreimer/datasets/eraser/movies_0.3_44 --output_dir outputs/movies_0.3_44/21_08_04_13_38
mreimer   982081  982079  0 Apr08 ?        00:00:02 python expred/train.py --seed 100 --data_dir /home/mreimer/datasets/eraser/movies_0.4_24 --output_dir outputs/movies_0.4_24/21_08_04_13_38
mreimer   982085  982083  0 Apr08 ?        00:00:02 python expred/train.py --seed 100 --data_dir /home/mreimer/datasets/eraser/movies_0.4_45 --output_dir outputs/movies_0.4_45/21_08_04_13_38
mreimer   982107  982075  0 Apr08 ?        00:00:00 /home/mreimer/envs/expred/bin/python -c from multiprocessing.semaphore_tracker import main;main(3)
mreimer   982108  982081  0 Apr08 ?        00:00:00 /home/mreimer/envs/expred/bin/python -c from multiprocessing.semaphore_tracker import main;main(3)
mreimer   982109  982077  0 Apr08 ?        00:00:00 /home/mreimer/envs/expred/bin/python -c from multiprocessing.semaphore_tracker import main;main(3)
mreimer   982110  982085  0 Apr08 ?        00:00:00 /home/mreimer/envs/expred/bin/python -c from multiprocessing.semaphore_tracker import main;main(3)
mreimer   982124  982085  0 Apr08 ?        00:00:01 /home/mreimer/envs/expred/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=4, pipe_handle=11) --multiproc
mreimer   982126  982075  0 Apr08 ?        00:00:01 /home/mreimer/envs/expred/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=4, pipe_handle=11) --multiproc
mreimer   982127  982081  0 Apr08 ?        00:00:01 /home/mreimer/envs/expred/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=4, pipe_handle=11) --multiproc
mreimer   982128  982077  0 Apr08 ?        00:00:01 /home/mreimer/envs/expred/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=4, pipe_handle=11) --multiproc
mreimer   982072  982070  0 Apr08 ?        00:00:00 /bin/bash scripts/run_movies_from_config.sh movies_0.3_23
mreimer   982074  982071  0 Apr08 ?        00:00:00 /bin/bash scripts/run_movies_from_config.sh movies_0.3_44
mreimer   982079  982078  0 Apr08 ?        00:00:00 /bin/bash scripts/run_movies_from_config.sh movies_0.4_24
mreimer   982083  982082  0 Apr08 ?        00:00:00 /bin/bash scripts/run_movies_from_config.sh movies_0.4_45
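Because the listing is sorted by command, the parent/child relationships are hard to see. Following the PID and PPID columns, each of the four runs has the same chain: condor_exec.exe, then run_movies_from_config.sh, then train.py, which has a semaphore tracker and a spawn_main child, and the spawn_main child in turn has a git cat-file --batch-check child. A small helper like the one below can regroup such output by run. It is a sketch: the function names are made up for illustration, and the sample rows are taken from the listing above with the CMD column shortened.

```python
from collections import defaultdict


def parse_ps(lines):
    """Parse `ps -f` rows into {pid: (ppid, cmd)}; the header row is skipped."""
    procs = {}
    for line in lines:
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def root_of(pid, procs):
    """Follow PPID links until the parent is no longer part of the listing."""
    while procs[pid][0] in procs:
        pid = procs[pid][0]
    return pid


def group_by_root(procs):
    """Map each top-level PID to the sorted PIDs of its whole process tree."""
    groups = defaultdict(list)
    for pid in sorted(procs):
        groups[root_of(pid, procs)].append(pid)
    return dict(groups)


SAMPLE = """\
UID          PID    PPID  C STIME TTY          TIME CMD
mreimer   982070  982054  0 Apr08 ?        00:00:00 /bin/bash condor_exec.exe
mreimer   982171  982126  0 Apr08 ?        00:00:00 git cat-file --batch-check
mreimer   982075  982072  0 Apr08 ?        00:00:02 python expred/train.py
mreimer   982107  982075  0 Apr08 ?        00:00:00 python -c semaphore_tracker
mreimer   982126  982075  0 Apr08 ?        00:00:01 python -c spawn_main
mreimer   982072  982070  0 Apr08 ?        00:00:00 /bin/bash run_movies_from_config.sh
""".splitlines()

if __name__ == "__main__":
    procs = parse_ps(SAMPLE)
    for root, pids in group_by_root(procs).items():
        print(root, [(pid, procs[pid][1]) for pid in pids])
```

In practice the full output of ps -fu mreimer could be passed to parse_ps line by line instead of SAMPLE.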
Edited Apr 12, 2021 by Maximilian Reimer