From bb0bba74234f7130feedeaef95c7d40ebc19be4f Mon Sep 17 00:00:00 2001
From: maxulysse
Date: Mon, 30 Sep 2024 11:35:07 +0200
Subject: [PATCH 1/9] add comments in test

---
 tests/getAllFilesFromDir/main.nf.test | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/getAllFilesFromDir/main.nf.test b/tests/getAllFilesFromDir/main.nf.test
index b3593f5..902204e 100644
--- a/tests/getAllFilesFromDir/main.nf.test
+++ b/tests/getAllFilesFromDir/main.nf.test
@@ -16,7 +16,9 @@ nextflow_pipeline {
             def stable_name = getAllFilesFromDir(params.outdir, false, timestamp)
             def stable_content = getAllFilesFromDir(params.outdir, false, timestamp + [/stable_name\.txt/] )
             assert snapshot(
+                // Only snapshot name
                 stable_name*.name,
+                // Snapshot content
                 stable_content
             ).match()
         }
From 7538e66671f13ce951dc13f72b5f46c808d8e15c Mon Sep 17 00:00:00 2001
From: maxulysse
Date: Mon, 30 Sep 2024 11:37:38 +0200
Subject: [PATCH 2/9] add docs

---
 README.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/README.md b/README.md
index 49cfe90..f1bfe9c 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,37 @@
 # nft-utils
+
 nf-test utility functions
+
+This repository contains utility functions for nf-test.
+
+These functions are used to help capture level tests using nf-test.
+
+## `removeNextflowVersion()`
+
+Remove the Nextflow version from the yml pipeline created file.
+
+Usage:
+
+```groovy
+assert snapshot(removeNextflowVersion("$outputDir/pipeline_info/nf_core_pipeline_software_mqc_versions.yml")).match()
+```
+
+Only argument is path to the file.
+
+## `getAllFilesFromDir()`
+
+Get all files (can include folders too) from a directory, not matching a regex pattern.
+
+```groovy
+def timestamp = [/.*\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}.*/]
+def stable_name = getAllFilesFromDir(params.outdir, false, timestamp)
+def stable_content = getAllFilesFromDir(params.outdir, false, timestamp + [/stable_name\.txt/] )
+assert snapshot(
+    // Only snapshot name
+    stable_name*.name,
+    // Snapshot content
+    stable_content
+).match()
+```
+
+First argument is the directory path, second is a boolean to include folders, and the third is a list of regex patterns to exclude.
From 7b6a4a8df395fdd3b9f06a123a6d022107842363 Mon Sep 17 00:00:00 2001
From: maxulysse
Date: Mon, 30 Sep 2024 12:06:24 +0200
Subject: [PATCH 3/9] better explanations

---
 README.md | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f1bfe9c..ac4ff89 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,49 @@ These functions are used to help capture level tests using nf-test.
 
 ## `removeNextflowVersion()`
 
-Remove the Nextflow version from the yml pipeline created file.
+nf-core pipelines create a yml file listing all the versions of the software used in the pipeline.
+
+Here is an example of this file coming from the rnaseq pipeline.
+
+```yaml
+BBMAP_BBSPLIT:
+    bbmap: 39.01
+CAT_FASTQ:
+    cat: 8.3
+CUSTOM_CATADDITIONALFASTA:
+    python: 3.9.5
+CUSTOM_GETCHROMSIZES:
+    getchromsizes: 1.2
+FASTQC:
+    fastqc: 0.12.1
+GTF2BED:
+    perl: 5.26.2
+GTF_FILTER:
+    python: 3.9.5
+GUNZIP_ADDITIONAL_FASTA:
+    gunzip: 1.1
+GUNZIP_GTF:
+    gunzip: 1.1
+STAR_GENOMEGENERATE:
+    star: 2.7.10a
+    samtools: 1.18
+    gawk: 5.1.0
+TRIMGALORE:
+    trimgalore: 0.6.7
+    cutadapt: 3.4
+UNTAR_SALMON_INDEX:
+    untar: 1.34
+Workflow:
+    nf-core/rnaseq: v3.16.0dev
+    Nextflow: 24.04.4
+```
+
+This function remove the Nextflow version from this yml file, as it is not relevant for the snapshot.
 
 Usage:
 
 ```groovy
-assert snapshot(removeNextflowVersion("$outputDir/pipeline_info/nf_core_pipeline_software_mqc_versions.yml")).match()
+assert snapshot(removeNextflowVersion("$outputDir/pipeline_info/nf_core_rnaseq_software_mqc_versions.yml")).match()
 ```
 
 Only argument is path to the file.
From 6e03a4f2739062fa1ef57912a139ae31cd2dbbf9 Mon Sep 17 00:00:00 2001
From: maxulysse
Date: Mon, 30 Sep 2024 13:16:08 +0200
Subject: [PATCH 4/9] regex -> glob + update README

---
 README.md                                     | 35 +++++++--
 .../java/nf_core/nf/test/utils/Methods.java   | 74 ++++++++++---------
 tests/getAllFilesFromDir/main.nf              |  2 +-
 tests/getAllFilesFromDir/main.nf.test         |  7 +-
 tests/getAllFilesFromDir/main.nf.test.snap    |  6 +-
 5 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/README.md b/README.md
index ac4ff89..a5fa95f 100644
--- a/README.md
+++ b/README.md
@@ -57,18 +57,41 @@ Only argument is path to the file.
 
 ## `getAllFilesFromDir()`
 
-Get all files (can include folders too) from a directory, not matching a regex pattern.
+Files produced by a pipeline can be compared to a snapshot.
+This function produces a list of all the content of a directory, and can exclude some files based on a glob pattern.
+From there one can get just filenames and snapshot only the names when content is not stable.
+Or snapshot the whole list of files that have stable content.
+
+In this example, these are the files produced by a pipeline:
+
+```bash
+results/
+├── pipeline_info
+│   └── execution_trace_2024-09-30_13-10-16.txt
+└── stable
+    ├── stable_content.txt
+    └── stable_name.txt
+
+2 directories, 3 files
+```
+
+In this example, 1 file is stable with stable content (`stable_content.txt`), and 1 file is stable with a stable name (`stable_name.txt`).
+The last file has no stable content (`execution_trace_2024-09-30_13-10-16.txt`) as its name is based on the date and time of the pipeline execution.
+
+For this example, we want to snapshot the files that have stable content, and the filenames that have stable names.
+
 
 ```groovy
-def timestamp = [/.*\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}.*/]
-def stable_name = getAllFilesFromDir(params.outdir, false, timestamp)
-def stable_content = getAllFilesFromDir(params.outdir, false, timestamp + [/stable_name\.txt/] )
+// Use getAllFilesFromDir() to get a list of all files and folders from the output directory, minus non stable files
+def stable_name = getAllFilesFromDir(params.outdir, true, ['**/execution_trace*.txt'] )
+// Use getAllFilesFromDir() to get a list of all files from the output directory, minus the non stable files
+def stable_content = getAllFilesFromDir(params.outdir, false, ['**/execution_trace*.txt', '**/stable_name.txt'] )
 assert snapshot(
-    // Only snapshot name
+    // Only snapshot name as content is not stable
     stable_name*.name,
     // Snapshot content
     stable_content
 ).match()
 ```
 
-First argument is the directory path, second is a boolean to include folders, and the third is a list of regex patterns to exclude.
+First argument is the pipeline `outdir` directory path, second is a boolean to include folders, and the third is a list of glob patterns to ignore.
diff --git a/src/main/java/nf_core/nf/test/utils/Methods.java b/src/main/java/nf_core/nf/test/utils/Methods.java
index 3a1d189..83568d3 100644
--- a/src/main/java/nf_core/nf/test/utils/Methods.java
+++ b/src/main/java/nf_core/nf/test/utils/Methods.java
@@ -5,11 +5,19 @@
 import java.io.File;
 import java.io.FileReader;
 import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.PathMatcher;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
 import java.util.ArrayList;
-import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
 import java.util.Map;
-import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 public class Methods {
 
@@ -41,44 +49,44 @@ public static Map> removeNextflowVersion(CharSequenc
     }
 
     // Return all files in a directory and its sub-directories
-    // matching or not matching supplied regexes
-    public static List<File> getAllFilesFromDir(String outdir, boolean includeDir, List<String> excludeRegexes) {
+    // matching or not matching supplied glob
+    public static List<File> getAllFilesFromDir(String outdir, boolean includeDir, List<String> ignoreGlobs)
+            throws IOException {
         List<File> output = new ArrayList<>();
-        File directory = new File(outdir);
+        Path directory = Paths.get(outdir);
 
-        getAllFilesRecursively(directory, includeDir, excludeRegexes, output);
-
-        Collections.sort(output);
-        return output;
-    }
-
-    // Recursively list all files in a directory and its sub-directories
-    // matching or not matching supplied regexes
-    private static void getAllFilesRecursively(File directory, boolean includeDir, List<String> excludeRegexes,
-            List<File> output) {
-        File[] files = directory.listFiles();
-        if (files != null) {
-            for (File file : files) {
-                boolean matchesInclusion = includeDir || file.isFile();
-                boolean matchesExclusion = false;
+        List<PathMatcher> excludeMatchers = new ArrayList<>();
+        if (ignoreGlobs != null && !ignoreGlobs.isEmpty()) {
+            for (String glob : ignoreGlobs) {
+                excludeMatchers.add(FileSystems.getDefault().getPathMatcher("glob:" + glob));
+            }
+        }
 
-                if (excludeRegexes != null) {
-                    for (String regex : excludeRegexes) {
-                        if (Pattern.matches(regex, file.getName())) {
-                            matchesExclusion = true;
-                            break;
-                        }
-                    }
+        Files.walkFileTree(directory, new SimpleFileVisitor<Path>() {
+            @Override
+            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
+                if (!isExcluded(file)) {
+                    output.add(file.toFile());
                 }
+                return FileVisitResult.CONTINUE;
+            }
 
-                if (matchesInclusion && !matchesExclusion) {
-                    output.add(file);
+            @Override
+            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
+                // Exclude output which is the root output folder from nf-test
+                if (includeDir && (!isExcluded(dir) && !dir.getFileName().toString().equals("output"))) {
+                    output.add(dir.toFile());
                 }
+                return FileVisitResult.CONTINUE;
+            }
 
-                if (file.isDirectory()) {
-                    getAllFilesRecursively(file, includeDir, excludeRegexes, output);
-                }
+            private boolean isExcluded(Path path) {
+                return excludeMatchers.stream().anyMatch(matcher -> matcher.matches(directory.relativize(path)));
             }
-        }
+        });
+
+        return output.stream()
+                .sorted(Comparator.comparing(File::getPath))
+                .collect(Collectors.toList());
     }
 }
diff --git a/tests/getAllFilesFromDir/main.nf b/tests/getAllFilesFromDir/main.nf
index dd4041c..191d223 100644
--- a/tests/getAllFilesFromDir/main.nf
+++ b/tests/getAllFilesFromDir/main.nf
@@ -19,5 +19,5 @@ workflow {
         """
         I DO NOT HAVE STABLE NAME
         """.stripIndent().trim())
-        .collectFile(storeDir: "${params.outdir}/not_stable", name: "${trace_timestamp}.txt", sort: true, newLine: true)
+        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: "execution_trace_${trace_timestamp}.txt", sort: true, newLine: true)
 }
diff --git a/tests/getAllFilesFromDir/main.nf.test b/tests/getAllFilesFromDir/main.nf.test
index 902204e..801099d 100644
--- a/tests/getAllFilesFromDir/main.nf.test
+++ b/tests/getAllFilesFromDir/main.nf.test
@@ -12,9 +12,10 @@ nextflow_pipeline {
         }
 
         then {
-            def timestamp = [/.*\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}.*/]
-            def stable_name = getAllFilesFromDir(params.outdir, false, timestamp)
-            def stable_content = getAllFilesFromDir(params.outdir, false, timestamp + [/stable_name\.txt/] )
+            // Use getAllFilesFromDir() to get a list of all files and folders from the output directory, minus the timestamped files
+            def stable_name = getAllFilesFromDir(params.outdir, true, ['**/*[0-9]*.txt'] )
+            // Use getAllFilesFromDir() to get a list of all files from the output directory, minus the non-stable files
+            def stable_content = getAllFilesFromDir(params.outdir, false, ['**/*[0-9]*.txt', '**/stable_name.txt'] )
             assert snapshot(
                 // Only snapshot name
                 stable_name*.name,
diff --git a/tests/getAllFilesFromDir/main.nf.test.snap b/tests/getAllFilesFromDir/main.nf.test.snap
index 7541e65..446cf60 100644
--- a/tests/getAllFilesFromDir/main.nf.test.snap
+++ b/tests/getAllFilesFromDir/main.nf.test.snap
@@ -2,6 +2,8 @@
     "getAllFilesFromDir": {
         "content": [
             [
+                "pipeline_info",
+                "stable",
                 "stable_content.txt",
                 "stable_name.txt"
             ],
@@ -13,6 +15,6 @@
             "nf-test": "0.9.0",
             "nextflow": "24.04.4"
         },
-        "timestamp": "2024-09-30T10:55:42.424751"
+        "timestamp": "2024-09-30T13:09:08.845794"
     }
-}
\ No newline at end of file
+}
From fcd888903f9f52e493e4cf965ded2f4e197a7b8d Mon Sep 17 00:00:00 2001
From: Maxime U Garcia
Date: Mon, 30 Sep 2024 13:26:12 +0200
Subject: [PATCH 5/9] Update README.md

Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
---
 README.md | 28 +---------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/README.md b/README.md
index a5fa95f..7146e1b 100644
--- a/README.md
+++ b/README.md
@@ -13,37 +13,11 @@ nf-core pipelines create a yml file listing all the versions of the software use
 Here is an example of this file coming from the rnaseq pipeline.
 
 ```yaml
-BBMAP_BBSPLIT:
-    bbmap: 39.01
-CAT_FASTQ:
-    cat: 8.3
-CUSTOM_CATADDITIONALFASTA:
-    python: 3.9.5
-CUSTOM_GETCHROMSIZES:
-    getchromsizes: 1.2
-FASTQC:
-    fastqc: 0.12.1
-GTF2BED:
-    perl: 5.26.2
-GTF_FILTER:
-    python: 3.9.5
-GUNZIP_ADDITIONAL_FASTA:
-    gunzip: 1.1
-GUNZIP_GTF:
-    gunzip: 1.1
-STAR_GENOMEGENERATE:
-    star: 2.7.10a
-    samtools: 1.18
-    gawk: 5.1.0
-TRIMGALORE:
-    trimgalore: 0.6.7
-    cutadapt: 3.4
-UNTAR_SALMON_INDEX:
+UNTAR:
     untar: 1.34
 Workflow:
     nf-core/rnaseq: v3.16.0dev
     Nextflow: 24.04.4
-```
 
 This function remove the Nextflow version from this yml file, as it is not relevant for the snapshot.
 
From e01e16ccfbd8149028b36ca9dce662f154d039f0 Mon Sep 17 00:00:00 2001
From: Maxime U Garcia
Date: Mon, 30 Sep 2024 13:26:21 +0200
Subject: [PATCH 6/9] Update README.md

Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7146e1b..025ad62 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,13 @@ Workflow:
     nf-core/rnaseq: v3.16.0dev
     Nextflow: 24.04.4
 
-This function remove the Nextflow version from this yml file, as it is not relevant for the snapshot.
+This function remove the Nextflow version from this yml file, as it is not relevant for the snapshot. Therefore for the purpose of the snapshot, it would consider this to be the contents of the YAML file:
+
+```yaml
+UNTAR:
+    untar: 1.34
+Workflow:
+    nf-core/rnaseq: v3.16.0dev
 
 Usage:
 
From c1a8a27b6618904cca2b871d1990958b89ed1883 Mon Sep 17 00:00:00 2001
From: Maxime U Garcia
Date: Mon, 30 Sep 2024 13:26:30 +0200
Subject: [PATCH 7/9] Update README.md

Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 025ad62..45e56bf 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ Usage:
 assert snapshot(removeNextflowVersion("$outputDir/pipeline_info/nf_core_rnaseq_software_mqc_versions.yml")).match()
 ```
 
-Only argument is path to the file.
+The only argument is path to the file which must be a versions file in YAML format as per the nf-core standard.
 
 ## `getAllFilesFromDir()`
 
From 288cd4b4f1a7de2057af761364dfaf571deef560 Mon Sep 17 00:00:00 2001
From: Maxime U Garcia
Date: Mon, 30 Sep 2024 13:27:18 +0200
Subject: [PATCH 8/9] Update README.md

Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 45e56bf..5c91081 100644
--- a/README.md
+++ b/README.md
@@ -62,9 +62,7 @@ For this example, we want to snapshot the files that have stable content, and th
 
 
 ```groovy
-// Use getAllFilesFromDir() to get a list of all files and folders from the output directory, minus non stable files
 def stable_name = getAllFilesFromDir(params.outdir, true, ['**/execution_trace*.txt'] )
-// Use getAllFilesFromDir() to get a list of all files from the output directory, minus the non stable files
 def stable_content = getAllFilesFromDir(params.outdir, false, ['**/execution_trace*.txt', '**/stable_name.txt'] )
 assert snapshot(
     // Only snapshot name as content is not stable
From 81bd2372ea30bf9acee15e0d093c72fff6ae4b48 Mon Sep 17 00:00:00 2001
From: maxulysse
Date: Mon, 30 Sep 2024 13:32:13 +0200
Subject: [PATCH 9/9] fix Markdown

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5c91081..09066be 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ UNTAR:
 Workflow:
     nf-core/rnaseq: v3.16.0dev
     Nextflow: 24.04.4
+```
 
 This function remove the Nextflow version from this yml file, as it is not relevant for the snapshot. Therefore for the purpose of the snapshot, it would consider this to be the contents of the YAML file:
 
@@ -60,7 +61,6 @@ The last file has no stable content (`execution_trace_2024-09-30_13-10-16.txt`)
 
 For this example, we want to snapshot the files that have stable content, and the filenames that have stable names.
 
-
 ```groovy
 def stable_name = getAllFilesFromDir(params.outdir, true, ['**/execution_trace*.txt'] )
 def stable_content = getAllFilesFromDir(params.outdir, false, ['**/execution_trace*.txt', '**/stable_name.txt'] )
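
Note on the glob matching introduced in PATCH 4/9: the new `getAllFilesFromDir()` builds each matcher with `FileSystems.getDefault().getPathMatcher("glob:" + glob)` and tests it against the path relative to `outdir`. The following is a minimal standalone sketch of that matching step only, not the code from `Methods.java`. The class name `GlobExcludeSketch` and the helper `isExcluded(String, List<String>)` are hypothetical names chosen for illustration; the sample paths are taken from the README example.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class GlobExcludeSketch {

    // Hypothetical helper: returns true when the relative path matches
    // at least one of the supplied glob patterns.
    static boolean isExcluded(String relativePath, List<String> ignoreGlobs) {
        Path path = Paths.get(relativePath);
        for (String glob : ignoreGlobs) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> globs = Arrays.asList("**/execution_trace*.txt", "**/stable_name.txt");

        // Timestamped trace file inside a sub-directory: excluded (prints true)
        System.out.println(isExcluded("pipeline_info/execution_trace_2024-09-30_13-10-16.txt", globs));

        // File matching neither pattern: kept (prints false)
        System.out.println(isExcluded("stable/stable_content.txt", globs));

        // A file directly under outdir has no directory component, so the
        // literal "/" in "**/stable_name.txt" has nothing to match (prints false)
        System.out.println(isExcluded("stable_name.txt", globs));
    }
}
```

The last case suggests that a `**/name` pattern only excludes files that sit at least one directory below `outdir`; a file at the top level would presumably need a pattern without the leading `**/`.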