Add docs #9

Merged: 9 commits, Sep 30, 2024
Changes from 4 commits
95 changes: 95 additions & 0 deletions README.md
@@ -1,2 +1,97 @@
# nft-utils

nf-test utility functions

This repository contains utility functions for nf-test.

These functions help write pipeline-level tests with nf-test.

## `removeNextflowVersion()`

nf-core pipelines create a YAML file listing the versions of all the software used in the pipeline.

Here is an example of this file coming from the rnaseq pipeline.

```yaml
BBMAP_BBSPLIT:
  bbmap: 39.01
CAT_FASTQ:
  cat: 8.3
CUSTOM_CATADDITIONALFASTA:
  python: 3.9.5
CUSTOM_GETCHROMSIZES:
  getchromsizes: 1.2
FASTQC:
  fastqc: 0.12.1
GTF2BED:
  perl: 5.26.2
GTF_FILTER:
  python: 3.9.5
GUNZIP_ADDITIONAL_FASTA:
  gunzip: 1.1
GUNZIP_GTF:
  gunzip: 1.1
STAR_GENOMEGENERATE:
  star: 2.7.10a
  samtools: 1.18
  gawk: 5.1.0
TRIMGALORE:
  trimgalore: 0.6.7
  cutadapt: 3.4
UNTAR_SALMON_INDEX:
  untar: 1.34
Workflow:
  nf-core/rnaseq: v3.16.0dev
  Nextflow: 24.04.4
```

This function removes the Nextflow version from this YAML file, as it is not relevant for the snapshot.

Usage:

```groovy
assert snapshot(removeNextflowVersion("$outputDir/pipeline_info/nf_core_rnaseq_software_mqc_versions.yml")).match()
```

The only argument is the path to the file.
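
As a rough illustration of the effect on the parsed file, the sketch below drops the `Nextflow` key from the `Workflow` entry of a nested map. It is not the plugin's implementation: the class `VersionsSketch` and the helper `dropNextflow` are hypothetical names, and the YAML parsing step is replaced by a hand-built map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class VersionsSketch {

    // Hypothetical helper: returns a copy of the versions map
    // in which the Workflow entry no longer lists Nextflow.
    public static Map<String, Map<String, Object>> dropNextflow(Map<String, Map<String, Object>> versions) {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> entry : versions.entrySet()) {
            // Copy each inner map so the input is left untouched
            Map<String, Object> tools = new LinkedHashMap<>(entry.getValue());
            if (entry.getKey().equals("Workflow")) {
                tools.remove("Nextflow");
            }
            result.put(entry.getKey(), tools);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed versions file (assumption: two entries only)
        Map<String, Object> fastqc = new LinkedHashMap<>();
        fastqc.put("fastqc", "0.12.1");
        Map<String, Object> workflow = new LinkedHashMap<>();
        workflow.put("nf-core/rnaseq", "v3.16.0dev");
        workflow.put("Nextflow", "24.04.4");

        Map<String, Map<String, Object>> versions = new LinkedHashMap<>();
        versions.put("FASTQC", fastqc);
        versions.put("Workflow", workflow);

        // prints {FASTQC={fastqc=0.12.1}, Workflow={nf-core/rnaseq=v3.16.0dev}}
        System.out.println(dropNextflow(versions));
    }
}
```

The remaining entries are unchanged, so the snapshot only changes when a tool or pipeline version changes.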

## `getAllFilesFromDir()`

Files produced by a pipeline can be compared to a snapshot.
This function returns a list of all the contents of a directory and can exclude files matching a glob pattern.
When file content is not stable, snapshot only the file names from that list.
When file content is stable, snapshot the whole list of files.

In this example, these are the files produced by a pipeline:

```bash
results/
├── pipeline_info
│   └── execution_trace_2024-09-30_13-10-16.txt
└── stable
    ├── stable_content.txt
    └── stable_name.txt

2 directories, 3 files
```

In this example, one file has stable content (`stable_content.txt`), and one file has only a stable name (`stable_name.txt`).
The last file (`execution_trace_2024-09-30_13-10-16.txt`) does not have a stable name, as its name is based on the date and time of the pipeline execution.

For this example, we want to snapshot the files that have stable content, and the filenames that have stable names.


```groovy
// Use getAllFilesFromDir() to get a list of all files and folders from the output directory, minus the non-stable files
def stable_name = getAllFilesFromDir(params.outdir, true, ['**/execution_trace*.txt'] )
// Use getAllFilesFromDir() to get a list of all files from the output directory, minus the non-stable files
def stable_content = getAllFilesFromDir(params.outdir, false, ['**/execution_trace*.txt', '**/stable_name.txt'] )
assert snapshot(
    // Only snapshot name as content is not stable
    stable_name*.name,
    // Snapshot content
    stable_content
).match()
```

The first argument is the pipeline `outdir` directory path, the second is a boolean to include folders, and the third is a list of glob patterns to ignore.
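
To make the effect of the boolean argument concrete, here is a minimal sketch that lists a small temporary directory with and without folders. It only mimics the behaviour described above and is not the plugin's code: the class `ListingSketch` and the helper `list` are hypothetical, and it uses `Files.walk` for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ListingSketch {

    // Hypothetical helper: sorted paths below root (relative to root),
    // including folders only when includeDir is true.
    public static List<String> list(Path root, boolean includeDir) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> !p.equals(root))
                    .filter(p -> includeDir || Files.isRegularFile(p))
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in for the pipeline output directory
        Path root = Files.createTempDirectory("results");
        Files.createDirectories(root.resolve("stable"));
        Files.createFile(root.resolve("stable/stable_content.txt"));
        Files.createFile(root.resolve("stable/stable_name.txt"));

        // prints [stable, stable/stable_content.txt, stable/stable_name.txt]
        System.out.println(list(root, true));
        // prints [stable/stable_content.txt, stable/stable_name.txt]
        System.out.println(list(root, false));
    }
}
```

Including folders is mostly useful when snapshotting names, since a folder has no content of its own to compare.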
74 changes: 41 additions & 33 deletions src/main/java/nf_core/nf/test/utils/Methods.java
@@ -5,11 +5,19 @@
 import java.io.File;
 import java.io.FileReader;
 import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.PathMatcher;
+import java.nio.file.Paths;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
 import java.util.ArrayList;
-import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
 import java.util.Map;
-import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 public class Methods {
 
@@ -41,44 +49,44 @@ public static Map<String, Map<String, Object>> removeNextflowVersion(CharSequenc
     }
 
     // Return all files in a directory and its sub-directories
-    // matching or not matching supplied regexes
-    public static List<File> getAllFilesFromDir(String outdir, boolean includeDir, List<String> excludeRegexes) {
+    // matching or not matching supplied glob
+    public static List<File> getAllFilesFromDir(String outdir, boolean includeDir, List<String> ignoreGlobs)
+            throws IOException {
         List<File> output = new ArrayList<>();
-        File directory = new File(outdir);
+        Path directory = Paths.get(outdir);
 
-        getAllFilesRecursively(directory, includeDir, excludeRegexes, output);
-
-        Collections.sort(output);
-        return output;
-    }
-
-    // Recursively list all files in a directory and its sub-directories
-    // matching or not matching supplied regexes
-    private static void getAllFilesRecursively(File directory, boolean includeDir, List<String> excludeRegexes,
-            List<File> output) {
-        File[] files = directory.listFiles();
-        if (files != null) {
-            for (File file : files) {
-                boolean matchesInclusion = includeDir || file.isFile();
-                boolean matchesExclusion = false;
+        List<PathMatcher> excludeMatchers = new ArrayList<>();
+        if (ignoreGlobs != null && !ignoreGlobs.isEmpty()) {
+            for (String glob : ignoreGlobs) {
+                excludeMatchers.add(FileSystems.getDefault().getPathMatcher("glob:" + glob));
+            }
+        }
 
-                if (excludeRegexes != null) {
-                    for (String regex : excludeRegexes) {
-                        if (Pattern.matches(regex, file.getName())) {
-                            matchesExclusion = true;
-                            break;
-                        }
-                    }
+        Files.walkFileTree(directory, new SimpleFileVisitor<Path>() {
+            @Override
+            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
+                if (!isExcluded(file)) {
+                    output.add(file.toFile());
                 }
+                return FileVisitResult.CONTINUE;
+            }
 
-                if (matchesInclusion && !matchesExclusion) {
-                    output.add(file);
+            @Override
+            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
+                // Exclude output which is the root output folder from nf-test
+                if (includeDir && (!isExcluded(dir) && !dir.getFileName().toString().equals("output"))) {
+                    output.add(dir.toFile());
                 }
+                return FileVisitResult.CONTINUE;
+            }
 
-                if (file.isDirectory()) {
-                    getAllFilesRecursively(file, includeDir, excludeRegexes, output);
-                }
+            private boolean isExcluded(Path path) {
+                return excludeMatchers.stream().anyMatch(matcher -> matcher.matches(directory.relativize(path)));
             }
-        }
+        });
+
+        return output.stream()
+                .sorted(Comparator.comparing(File::getPath))
+                .collect(Collectors.toList());
     }
 }
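
Because the new code matches each glob against the path relative to `outdir`, it may help to see how the JDK's `glob:` syntax treats directory separators. The sketch below is a standalone illustration, not part of `Methods.java`: the class `GlobSketch` and the helper `ignored` are hypothetical, and only `FileSystems.getDefault().getPathMatcher` and `Path.relativize` are taken from the change above.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobSketch {

    // Hypothetical helper: does `glob` match `path` once it is made relative to `root`?
    public static boolean ignored(String glob, Path root, Path path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(root.relativize(path));
    }

    public static void main(String[] args) {
        Path root = Paths.get("results");
        String glob = "**/execution_trace*.txt";

        // A trace file inside a sub-directory matches: prints true
        System.out.println(ignored(glob, root,
                root.resolve("pipeline_info/execution_trace_2024-09-30_13-10-16.txt")));

        // The same name directly under root has no "/" to match: prints false
        System.out.println(ignored(glob, root,
                root.resolve("execution_trace_2024-09-30_13-10-16.txt")));

        // "*" does not cross directory boundaries: prints false
        System.out.println(ignored("*.txt", root, root.resolve("stable/stable_name.txt")));
    }
}
```

With this matching, a pattern such as `**/execution_trace*.txt` covers files in sub-directories, while a file sitting directly in `outdir` would need a pattern without the leading `**/`.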
2 changes: 1 addition & 1 deletion tests/getAllFilesFromDir/main.nf
@@ -19,5 +19,5 @@ workflow {
         """
         I DO NOT HAVE STABLE NAME
         """.stripIndent().trim())
-        .collectFile(storeDir: "${params.outdir}/not_stable", name: "${trace_timestamp}.txt", sort: true, newLine: true)
+        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: "execution_trace_${trace_timestamp}.txt", sort: true, newLine: true)
 }
9 changes: 6 additions & 3 deletions tests/getAllFilesFromDir/main.nf.test
@@ -12,11 +12,14 @@ nextflow_pipeline {
         }
 
         then {
-            def timestamp = [/.*\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}.*/]
-            def stable_name = getAllFilesFromDir(params.outdir, false, timestamp)
-            def stable_content = getAllFilesFromDir(params.outdir, false, timestamp + [/stable_name\.txt/] )
+            // Use getAllFilesFromDir() to get a list of all files and folders from the output directory, minus the timestamped files
+            def stable_name = getAllFilesFromDir(params.outdir, true, ['**/*[0-9]*.txt'] )
+            // Use getAllFilesFromDir() to get a list of all files from the output directory, minus the non-stable files
+            def stable_content = getAllFilesFromDir(params.outdir, false, ['**/*[0-9]*.txt', '**/stable_name.txt'] )
             assert snapshot(
+                // Only snapshot name
                 stable_name*.name,
+                // Snapshot content
                 stable_content
             ).match()
         }
6 changes: 4 additions & 2 deletions tests/getAllFilesFromDir/main.nf.test.snap
@@ -2,6 +2,8 @@
     "getAllFilesFromDir": {
         "content": [
             [
+                "pipeline_info",
+                "stable",
                 "stable_content.txt",
                 "stable_name.txt"
             ],
@@ -13,6 +15,6 @@
             "nf-test": "0.9.0",
             "nextflow": "24.04.4"
         },
-        "timestamp": "2024-09-30T10:55:42.424751"
+        "timestamp": "2024-09-30T13:09:08.845794"
     }
 }
}