Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more methods of WDL task disk specification #5001

Merged
merged 78 commits into from
Oct 10, 2024
Merged
Changes from 1 commit
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
3971be6
better disk logic and add logic to mount specific points
stxue1 Jun 29, 2024
48990d0
cromwell compatibility
stxue1 Jun 29, 2024
e8d223c
Convert from wdl string to normal string
stxue1 Jun 29, 2024
fbe0eef
Merge branch 'master' into issues/4995-disk-spec-wdl
stxue1 Jun 29, 2024
8f7199a
floats
stxue1 Jul 1, 2024
166bf41
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Jul 1, 2024
d7719b9
Satisfy mypy
stxue1 Jul 1, 2024
0c131e0
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 2, 2024
b4714ec
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 10, 2024
4443728
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 11, 2024
8da0d7d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 16, 2024
9379715
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 17, 2024
85a4df9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 18, 2024
6aa73b1
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
cc36f71
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
36fd277
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 19, 2024
c7bec56
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 20, 2024
4b4e7f0
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
6bf2a4e
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
31b7e27
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
98bba55
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 25, 2024
25e0e51
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 26, 2024
42fedf6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 29, 2024
d35d033
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 30, 2024
9d75e4b
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Jul 31, 2024
f752535
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 1, 2024
4ce33d5
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 3, 2024
e9df2f9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 8, 2024
01b8102
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 14, 2024
2955c4d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 15, 2024
a364601
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 19, 2024
02d873f
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 19, 2024
b65b315
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 21, 2024
6f30676
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
58196ce
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
66d3e50
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 22, 2024
ea19cb6
Follow new spec
stxue1 Aug 23, 2024
32c65dd
mypy
stxue1 Aug 23, 2024
63b4410
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Aug 23, 2024
a1a8651
Support cromwell disks attributes for backwards compatibility
stxue1 Aug 23, 2024
29ffd3f
Deal with pipes deprecation
stxue1 Aug 23, 2024
7068810
Update md5sum test to be compatible with newer docker/singularity ver…
stxue1 Aug 23, 2024
c090823
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Aug 27, 2024
14e2ee1
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Aug 29, 2024
901c4c2
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 3, 2024
aa58e2f
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 10, 2024
8b15af6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 10, 2024
e04f5c1
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 12, 2024
ae2f169
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
eb56ef9
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
a21fc3a
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 16, 2024
1a098b4
Address comments
stxue1 Sep 17, 2024
ceccb07
Merge branch 'issues/4995-disk-spec-wdl' of github.com:DataBiosphere/…
stxue1 Sep 17, 2024
839e09b
Update src/toil/wdl/wdltoil.py
stxue1 Sep 17, 2024
89ca0d4
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 24, 2024
bf3ca2a
Fix missing reverse iteration loop and make local-disk disambiguation…
stxue1 Sep 24, 2024
4130ab8
move out disk parse into a function
stxue1 Sep 24, 2024
d194edc
Fix issues with cromwell compatibility
stxue1 Sep 24, 2024
f2080a8
Move local-disk into parse and dont convert_units in parse function
stxue1 Sep 24, 2024
f0dbe3f
Fix edge case where only size is requested
stxue1 Sep 24, 2024
e6a9082
Add tests
stxue1 Sep 24, 2024
6fbef8c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
68fb254
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
8d093dd
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 26, 2024
b184386
Remove redef
stxue1 Sep 27, 2024
67b2554
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Sep 30, 2024
7f4b452
Add docstring and remove dead comment
stxue1 Oct 1, 2024
07d6b31
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Oct 2, 2024
f3a3a2f
Add back dropped mount_spec argument
stxue1 Oct 2, 2024
9ab5c9c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
0f247a6
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
067ee8d
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 3, 2024
a7a1459
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 4, 2024
c7ac131
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 4, 2024
806ff9c
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 7, 2024
acaa894
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 8, 2024
accf831
Merge branch 'master' of github.com:DataBiosphere/toil into issues/49…
stxue1 Oct 9, 2024
a0b65f4
Merge master into issues/4995-disk-spec-wdl
github-actions[bot] Oct 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 60 additions & 40 deletions src/toil/wdl/wdltoil.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@
from toil.jobStores.abstractJobStore import (AbstractJobStore, UnimplementedURLException,
InvalidImportExportUrlException, LocatorException)
from toil.lib.accelerators import get_individual_local_accelerators
from toil.lib.conversions import convert_units, human2bytes
from toil.lib.conversions import convert_units, human2bytes, VALID_PREFIXES
from toil.lib.io import mkdtemp
from toil.lib.memoize import memoize
from toil.lib.misc import get_user_name
Expand All @@ -91,9 +91,9 @@


class InsufficientMountDiskSpace(Exception):
def __init__(self, mount_target: str, desired_bytes: int, available_bytes: int) -> None:
super().__init__("Not enough available disk space for the target mount point %s. Needed %d bytes but there is only %d available."
% (mount_target, desired_bytes, available_bytes))
def __init__(self, mount_targets: List[str], desired_bytes: int, available_bytes: int) -> None:
super().__init__("Not enough available disk space for the target mount points %s. Needed %d bytes but there is only %d available."
% (", ".join(mount_targets), desired_bytes, available_bytes))

@contextmanager
def wdl_error_reporter(task: str, exit: bool = False, log: Callable[[str], None] = logger.critical) -> Generator[None, None, None]:
Expand Down Expand Up @@ -1921,7 +1921,7 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
memory_spec = human2bytes(memory_spec)
runtime_memory = memory_spec

mount_spec: Dict[str, int] = dict()
mount_spec: Dict[Optional[str], int] = dict()
if runtime_bindings.has_binding('disks'):
# Miniwdl doesn't have this, but we need to be able to parse things like:
# local-disk 5 SSD
Expand All @@ -1942,24 +1942,29 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
# their commas when separating the list, like in Cromwell's
# examples, so we strip whitespace.
spec_parts = spec.strip().split(' ')

# First check that this is a format we support. Both the WDL spec and Cromwell allow a max 3-piece specification
# So if there are more than 3 pieces, raise an error
if len(spec_parts) > 3:
raise RuntimeError(f"Could not parse disks = {disks_spec} because {all_specs} contains more than 3 parts")
part_size = None
# default to GiB as per spec
part_suffix: str = "GiB"
part_suffix: str = "GiB" # The WDL spec's default is 1 GiB
# default to the execution directory
specified_mount_point = None
# first get the size, since units should always be some nonnumerical string, get the last numerical value
for i, part in enumerate(spec_parts):
stxue1 marked this conversation as resolved.
Show resolved Hide resolved
if part.replace(".", "", 1).isdigit():
# round down floats
part_size = int(float(part))
continue
if i == 0:
# mount point is always the first
specified_mount_point = part
continue
if part_size is not None:
# suffix will always be after the size, if it exists
part_suffix = part
continue
spec_parts.pop(i)
break
# unit specification is only allowed to be at the end
if spec_parts[-1].lower() in VALID_PREFIXES:
part_suffix = spec_parts[-1]
spec_parts.pop(-1)
# The last remaining element, if it exists, is the mount point
if len(spec_parts) > 0:
specified_mount_point = spec_parts[0]

if part_size is None:
# Disk spec did not include a size
Expand All @@ -1979,15 +1984,23 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:

per_part_size = convert_units(part_size, part_suffix)
total_bytes += per_part_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The total space needed (including any local-disk or mount-point-less size) doesn't ever get copied to runtime_disk and so isn't actually used for scheduling.

if specified_mount_point is not None:
if mount_spec.get(specified_mount_point) is not None:
if mount_spec.get(specified_mount_point) is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check doesn't account for how local-disk and no mount point mean the same thing, so will still let you specify both.

If there was a test for this validation logic, that would catch the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mapped local-disk and no mount point to the same thing in a conditional branch below. I changed the logic a bit so it's more clear and so there's only one conditional branch that does this duplication check.

if specified_mount_point is not None:
# raise an error as all mount points must be unique
raise ValueError(f"Could not parse disks = {disks_spec} because the mount point {specified_mount_point} is specified multiple times")

# TODO: we always ignore the disk type and assume we have the right one.
if specified_mount_point != "local-disk":
# Don't mount local-disk. This isn't in the spec, but is carried over from cromwell
mount_spec[specified_mount_point] = int(per_part_size)
else:
if mount_spec.get(specified_mount_point) is not None:
stxue1 marked this conversation as resolved.
Show resolved Hide resolved
raise ValueError(f"Could not parse disks = {disks_spec} because the mount point is omitted more than once")

# TODO: we always ignore the disk type and assume we have the right one.
if specified_mount_point != "local-disk":
# Don't mount local-disk. This isn't in the spec, but is carried over from cromwell
# When the mount point is omitted, default to the task's execution directory, which None will represent
mount_spec[specified_mount_point] = int(per_part_size)
else:
# local-disk is equivalent to an omitted mount point
mount_spec[None] = int(per_part_size)
runtime_disk = int(total_bytes)

if not runtime_bindings.has_binding("gpu") and self._task.effective_wdl_version in ('1.0', 'draft-2'):
# For old WDL versions, guess whether the task wants GPUs if not specified.
Expand Down Expand Up @@ -2022,7 +2035,9 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
runtime_accelerators = [accelerator_requirement]

# Schedule to get resources. Pass along the bindings from evaluating all the inputs and decls, and the runtime, with files virtualized.
run_job = WDLTaskJob(self._task, virtualize_files(bindings, standard_library), virtualize_files(runtime_bindings, standard_library), self._task_id, self._namespace, self._task_path, mount_spec, cores=runtime_cores or self.cores, memory=runtime_memory or self.memory, disk=runtime_disk or self.disk, accelerators=runtime_accelerators or self.accelerators, wdl_options=self._wdl_options)
run_job = WDLTaskJob(self._task, virtualize_files(bindings, standard_library), virtualize_files(runtime_bindings, standard_library), self._task_id, self._namespace,
self._task_path, mount_spec, cores=runtime_cores or self.cores, memory=runtime_memory or self.memory, disk=runtime_disk or self.disk,
accelerators=runtime_accelerators or self.accelerators, wdl_options=self._wdl_options)
# Run that as a child
self.addChild(run_job)

Expand All @@ -2045,7 +2060,8 @@ class WDLTaskJob(WDLBaseJob):
All bindings are in terms of task-internal names.
"""

def __init__(self, task: WDL.Tree.Task, task_internal_bindings: Promised[WDLBindings], runtime_bindings: Promised[WDLBindings], task_id: List[str], namespace: str, task_path: str, mount_spec: Dict[str, int], **kwargs: Any) -> None:
def __init__(self, task: WDL.Tree.Task, task_internal_bindings: Promised[WDLBindings], runtime_bindings: Promised[WDLBindings], task_id: List[str], namespace: str,
task_path: str, mount_spec: Dict[Optional[str], int], **kwargs: Any) -> None:
"""
Make a new job to run a task.

Expand Down Expand Up @@ -2248,7 +2264,7 @@ def can_mount_proc(self) -> bool:
"""
return "KUBERNETES_SERVICE_HOST" not in os.environ

def ensure_mount_point(self, file_store: AbstractFileStore, mount_spec: Dict[str, int]) -> Dict[str, str]:
def ensure_mount_point(self, file_store: AbstractFileStore, mount_spec: Dict[Optional[str], int]) -> Dict[str, str]:
"""
Ensure the mount point sources are available.

Expand All @@ -2271,20 +2287,22 @@ def ensure_mount_point(self, file_store: AbstractFileStore, mount_spec: Dict[str
# The only defect of this regex is if the target mount point is the same format as the df output
# It is likely reliable enough to trust the user has not created a mount with a df output-like name
regex_df = re.compile(r".+ \d+ +\d+ +(\d+) +\d+% +.+")
total_mount_size = sum(mount_spec.values())
try:
for mount_target, mount_size in mount_spec.items():
# Use arguments from the df POSIX standard
df_line = subprocess.check_output(["df", "-k", "-P", tmpdir], encoding="utf-8").split("\n")[1]
m = re.match(regex_df, df_line)
if m is None:
logger.debug("Output of df may be malformed: %s", df_line)
logger.warning("Unable to check disk requirements as output of 'df' command is malformed. Will assume storage is always available.")
continue
# Use arguments from the df POSIX standard
df_line = subprocess.check_output(["df", "-k", "-P", tmpdir], encoding="utf-8").split("\n")[1]
m = re.match(regex_df, df_line)
if m is None:
logger.debug("Output of df may be malformed: %s", df_line)
logger.warning("Unable to check disk requirements as output of 'df' command is malformed. Will assume storage is always available.")
else:
# Block size will always be 1024
available_space = int(m[1]) * 1024
if available_space < mount_size:
if available_space < total_mount_size:
# We do not have enough space available for this mount point
raise InsufficientMountDiskSpace(mount_target, mount_size, available_space)
# An omitted mount point is the task's execution directory so show that to the user instead
raise InsufficientMountDiskSpace([mount_point if mount_point is not None else "/mnt/miniwdl_task_container/work" for mount_point in mount_spec.keys()],
total_mount_size, available_space)
except subprocess.CalledProcessError as e:
# If df somehow isn't available
logger.debug("Unable to call df. stdout: %s stderr: %s", e.stdout, e.stderr)
Expand All @@ -2293,7 +2311,9 @@ def ensure_mount_point(self, file_store: AbstractFileStore, mount_spec: Dict[str
# Create a new subdirectory for each mount point
source_location = os.path.join(tmpdir, str(uuid.uuid4()))
os.mkdir(source_location)
mount_src_mapping[mount_target] = source_location
if mount_target is not None:
# None represents an omitted mount point, which will default to the task's work directory. MiniWDL's internals will mount the task's work directory by itself
mount_src_mapping[mount_target] = source_location
return mount_src_mapping

@report_wdl_errors("run task command", exit=True)
Expand All @@ -2304,14 +2324,14 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
super().run(file_store)
logger.info("Running task command for %s (%s) called as %s", self._task.name, self._task_id, self._namespace)

# Create mount points and get a mapping of target mount points to locations on disk
mount_mapping = self.ensure_mount_point(file_store, self._mount_spec)

# Set up the WDL standard library
# UUID to use for virtualizing files
# We process nonexistent files in WDLTaskWrapperJob as those must be run locally, so don't try to devirtualize them
standard_library = ToilWDLStdLibBase(file_store, self._task_path, enforce_existence=False)

# Create mount points and get a mapping of target mount points to locations on disk
mount_mapping = self.ensure_mount_point(file_store, self._mount_spec)

# Get the bindings from after the input section
bindings = unwrap(self._task_internal_bindings)
# And the bindings from evaluating the runtime section
Expand Down