v1.2.0 release (#778)

* Organise v1.1.1-dev
* Fix changelog formatting and update changelog instructions (#772)
* Initial changelog formatting issues
* Update changelog + instructions
* Updated changelog
* Updated Code of conduct (#773)
* Updated Code of conduct
* Updated changelog
* Fixed grammar
* Fix zenodo DOI
* Fixed typo in README
* Shorten forced fit measurement names (#734)
* Shorten names
* Updated changelog
* Update clearpiperun to use raw SQL (#775)
* timing and memory benchmark
* delete raw initial
* adding profiler
* optimisation handling exceptions
* Added logging
* Updated delete_run
* Fix syntax errors
* Disable triggers to see if that fixes speed issues
* Remove memory profiling
* Reenabled logging
* Add end of loop logging, remove tqdm
* Remove all tqdm, improve logging slightly
* Added timing
* Fixed tqdm missing
* Fix logging
* Added units to logging
* specify source id in logging
* Toggle triggers
* clean up clearpiperun
* Other minor updates
* Fix variable name
* Correctly handle images and skyregions that are associated with multiple runs
* PEP8
* Updated changelog
* Remove commented code
* Remove whitespace - don't know why the linter didn't pick this up
* Update vast_pipeline/management/commands/clearpiperun.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/management/commands/clearpiperun.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/management/commands/clearpiperun.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/management/commands/clearpiperun.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Update vast_pipeline/utils/delete_run.py Co-authored-by: Tom Mauch <[email protected]>
* Fix logging count
* Clean up logging statements

---------
Co-authored-by: Shibli Saleheen <[email protected]>
Co-authored-by: Tom Mauch <[email protected]>

* Quick memory optimisations (#776)
* Use itertuples over iterrows since iterrows is an enormous memory hog.
* Drop sources_df columns before renaming id column to avoid a copy of the whole dataframe in memory.
* Decrease default partition size to 15MB
* Don't split (large-in-memory) list of DataFrames into dask bags (No performance hit).
* Don't write forced parquets in parallel (No performance hit for this).
* Don't overwrite input DataFrame when writing parquets.
* Update CHANGELOG.md
* Address review comments.
* Copy YAML objects before revalidation so they can be garbage collected.
* Appease flake8
* 750 configure workers (#777)
* Use itertuples over iterrows since iterrows is an enormous memory hog.
* Drop sources_df columns before renaming id column to avoid a copy of the whole dataframe in memory.
* Decrease default partition size to 15MB
* Don't split (large-in-memory) list of DataFrames into dask bags (No performance hit).
* Don't write forced parquets in parallel (No performance hit for this).
* Initial configuration updates for processing options.
* Don't overwrite input DataFrame when writing parquets.
* Update CHANGELOG.md
* Address review comments.
* Copy YAML objects before revalidation so they can be garbage collected.
* Appease flake8
* Add processing options as optional with defaults.
* filter processing config to parallel association.
* Add a function to determine the number of workers and partitions for Dask.
* Use config values for num_workers and max_partition_size throughout pipeline.
* Correct working in config template.
* Update CHANGELOG.md
* Remove unused imports.
* Bump strictyaml to 1.6.2
* Use YAML 'null' to create Python None for all cores option.
* Make None the default in `calculate_workers_and_partitions` instead of 0
* Updated run config docs
* Allow null for num_workers_io and improve validation of processing parameters.
* Update num_workers_io default in docs.

---------
Co-authored-by: Dougal Dobie <[email protected]>

* Prepare v1.2.0 release

---------
Co-authored-by: Shibli Saleheen <[email protected]>
Co-authored-by: Tom Mauch <[email protected]>
askap-vast · Nov 7, 2024 · a6a8d99
1 parent 8a9c6ff
commit a6a8d99
Showing 25 changed files with 1,217 additions and 801 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,37 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), with an added `List of PRs` section and links to the relevant PRs on the individual updates. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.2.0](https://github.com/askap-vast/vast-pipeline/releases/v1.2.0) (2024-11-07)
+
+#### Added
+
+- Added configuration options to specify number of workers and maximum partition size for parallel operations. [#777](https://github.com/askap-vast/vast-pipeline/pull/777)
+- Added vast_pipeline.utils.delete_run.py to enable deletion of pipeline runs using raw SQL [#775](https://github.com/askap-vast/vast-pipeline/pull/775)
+
+#### Changed
+
+- Small memory optimisations: Use `itertuples` in favour of `iterrows`, Loop over mappings rather than converting them to lists up-front. [#776](https://github.com/askap-vast/vast-pipeline/pull/776)
+- Updated clearpiperun to delete runs using raw SQL rather than via django [#775](https://github.com/askap-vast/vast-pipeline/pull/775)
+- Shortened forced fits measurement names to ensure they fit within the character limits - remove image prefix and limited to 1000 forced fits per source [#734](https://github.com/askap-vast/vast-pipeline/pull/734)
+- Cleaned up Code of Conduct including adding Zenodo DOI [#773](https://github.com/askap-vast/vast-pipeline/pull/773)
+- Updated changelog release instructions to remove each release having an empty "Unreleased" section at the start [#772](https://github.com/askap-vast/vast-pipeline/pull/772)
+
+#### Fixed
+
+- Fixed errant `<strong>` tag inside changelog and added verbatim formatting to other variables throughout [#772](https://github.com/askap-vast/vast-pipeline/pull/772)
+
+#### Removed
+
+
+#### List of PRs
+
+- [#777](https://github.com/askap-vast/vast-pipeline/pull/777): feat: Allow user to specify number of cores and memory size of partitions via configuration.
+- [#776](https://github.com/askap-vast/vast-pipeline/pull/776): fix: Minor memory optimisations
+- [#775](https://github.com/askap-vast/vast-pipeline/pull/775): fix, feat: Enabled deletion of pipeline runs directly using SQL rather than via django
+- [#734](https://github.com/askap-vast/vast-pipeline/pull/734): Shortened forced fits measurement names
+- [#773](https://github.com/askap-vast/vast-pipeline/pull/773): docs: Cleaned up Code of Conduct including adding Zenodo DOI
+- [#772](https://github.com/askap-vast/vast-pipeline/pull/772): fix, docs: Fixed changelog formatting and updated changelog release instructions
+
 ## [1.1.1](https://github.com/askap-vast/vast-pipeline/releases/v1.1.1) (2024-10-15)
 
 #### Added
@@ -17,13 +48,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 #### Fixed
 
-- Removed errant <strong> tag from docs header [#766](https://github.com/askap-vast/vast-pipeline/pull/766)
-
-#### Removed
+- Removed errant `<strong>` tag from docs header [#766](https://github.com/askap-vast/vast-pipeline/pull/766)
 
 #### List of PRs
 
-- [#766](https://github.com/askap-vast/vast-pipeline/pull/766): docs: Removed errant <strong> tag from docs header and refreshed docs text
+- [#766](https://github.com/askap-vast/vast-pipeline/pull/766): docs: Removed errant `<strong>` tag from docs header and refreshed docs text
 - [#761](https://github.com/askap-vast/vast-pipeline/pull/762): docs: Add Zenodo DOI
 
 
@@ -33,7 +62,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 - Added further memory usage and timing debug logging [#725](https://github.com/askap-vast/vast-pipeline/pull/725)
 - Add support for python 3.10 [#740](https://github.com/askap-vast/vast-pipeline/pull/740)
-- Added support calculate_n_partitions for sensible dask dataframe partitioning [#724](https://github.com/askap-vast/vast-pipeline/pull/724)
+- Added support `calculate_n_partitions` for sensible dask dataframe partitioning [#724](https://github.com/askap-vast/vast-pipeline/pull/724)
 - Added support for compressed FITS files [#694](https://github.com/askap-vast/vast-pipeline/pull/694)
 - Added links to Data Central DAS and the Fink Broker to the source page [#697](https://github.com/askap-vast/vast-pipeline/pull/697/)
 - Added `n_new_sources` column to run model to store the number of new sources in a pipeline run [#676](https://github.com/askap-vast/vast-pipeline/pull/676).
@@ -68,7 +97,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 #### Changed
 
-- Updated README.md [#758](https://github.com/askap-vast/vast-pipeline/pull/758)
+- Updated `README.md` [#758](https://github.com/askap-vast/vast-pipeline/pull/758)
 - Force dask<2022.4.2, numpy<1.23, param<2.0 [#728](https://github.com/askap-vast/vast-pipeline/pull/728)
 - Bumped versions for github actions packages [#728](https://github.com/askap-vast/vast-pipeline/pull/728)
 - Changed pipeline.new_sources.parallel_get_rms_measurements to drop all but one RMS measurmeents [#730](https://github.com/askap-vast/vast-pipeline/pull/730)
@@ -153,7 +182,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 #### List of PRs
 
-- [#758](https://github.com/askap-vast/vast-pipeline/pull/758): docs: Updated README.md 
+- [#758](https://github.com/askap-vast/vast-pipeline/pull/758): docs: Updated `README.md` 
 - [#755](https://github.com/askap-vast/vast-pipeline/pull/755): fix: Fixed handling of NaNs and negatives in noise image statistics
 - [#754](https://github.com/askap-vast/vast-pipeline/pull/754): fix: Optimise YAML config parsing
 - [#725](https://github.com/askap-vast/vast-pipeline/pull/725): feat: Added further memory usage and timing debug logging 
@@ -172,7 +201,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - [#685](https://github.com/askap-vast/vast-pipeline/pull/685): docs: Updated `runpipeline` section on CLI docs.
 - [#676](https://github.com/askap-vast/vast-pipeline/pull/676): Removed home counts and new source count.
 - [#665](https://github.com/askap-vast/vast-pipeline/pull/665): Update Gr1N/setup-poetry to v7.
-- [#658](https://github.com/askap-vast/vast-pipeline/pull/658): feat: Add MAX_CUTOUT_IMAGES setting.
+- [#658](https://github.com/askap-vast/vast-pipeline/pull/658): feat: Add `MAX_CUTOUT_IMAGES` setting.
 - [#655](https://github.com/askap-vast/vast-pipeline/pull/655): feat: Add run config option to disable measurement pairs.
 - [#648](https://github.com/askap-vast/vast-pipeline/pull/648): fix: make Image and Measurement creation atomic together.
 - [#653](https://github.com/askap-vast/vast-pipeline/pull/653): fix: Allow forced fitting on images with empty catalogues.
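
The switch to raw SQL in #775, together with the commit-message note about images shared between runs, can be illustrated with a minimal sketch. It uses an in-memory SQLite database and a hypothetical three-table schema (`run`, `image`, `run_image`). The table names and the `delete_run` helper are assumptions made for illustration; they are not the pipeline's actual schema or the contents of `vast_pipeline/utils/delete_run.py`.

```python
import sqlite3


def delete_run(conn: sqlite3.Connection, run_id: int) -> int:
    """Delete one run with raw SQL; return the number of images removed.

    Hypothetical schema: run(id), image(id), run_image(run_id, image_id).
    """
    with conn:  # one transaction: commit on success, roll back on error
        # Remove images linked to this run and to no other run.
        cur = conn.execute(
            "DELETE FROM image "
            "WHERE id IN (SELECT image_id FROM run_image WHERE run_id = ?) "
            "AND id NOT IN (SELECT image_id FROM run_image WHERE run_id != ?)",
            (run_id, run_id),
        )
        # Then drop this run's links and the run row itself.
        conn.execute("DELETE FROM run_image WHERE run_id = ?", (run_id,))
        conn.execute("DELETE FROM run WHERE id = ?", (run_id,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE run (id INTEGER PRIMARY KEY);
    CREATE TABLE image (id INTEGER PRIMARY KEY);
    CREATE TABLE run_image (run_id INTEGER, image_id INTEGER);
    INSERT INTO run VALUES (1), (2);
    INSERT INTO image VALUES (10), (11);
    -- image 10 belongs to both runs, image 11 only to run 1
    INSERT INTO run_image VALUES (1, 10), (1, 11), (2, 10);
    """
)
removed = delete_run(conn, 1)
remaining = [row[0] for row in conn.execute("SELECT id FROM image")]
```

Deleting run 1 in this toy database removes only image 11, because image 10 is still referenced by run 2.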

diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -1,7 +1,8 @@
-<!-- copied from https://vast-survey.org/policies.html -->
+<!-- copied from https://vast-survey.org/Code-of-Conduct/ -->
 # Code Of Conduct
 
-By joining the VAST collaboration you agree to adhere to the Code of Conduct below.
+By joining the VAST collaboration you agree to adhere to the [VAST Code of Conduct](https://vast-survey.org/Code-of-Conduct/) which is reproduced below.
+
 We are committed to making this collaboration productive and enjoyable for everyone, regardless of gender, sexual orientation, disability, physical appearance, body size, race, nationality or religion. We will not tolerate harassment of colleagues and students in any form.
 To achieve this, VAST members must endeavour to work together in a cooperative way on scientific projects that fall within the scope of VAST. In particular all members must:
 
@@ -34,10 +35,9 @@ We abide by the principles of openness, respect, and consideration of others of
 
 ## Acknowledgements
 
-We ask that all VAST publications (papers, ATELs, etc) include the line:
-> _This work was done as part of the ASKAP Variables and Slow Transients (VAST) collaboration (Murphy et al. 2013, PASA, 30, 6)._
+If you use this software in your work please acknowledge it by citing the DOI: [10.5281/zenodo.13927015](https://doi.org/10.5281/zenodo.13927015).
 
-Separately, all refereed publications should carry the standard [CSIRO acknowledgement](https://www.atnf.csiro.au/research/publications/Acknowledgements.html):
+All refereed publications using ASKAP data should carry the standard [CSIRO acknowledgement](https://www.atnf.csiro.au/research/publications/Acknowledgements.html):
 > _The Australian SKA Pathfinder is part of the Australia Telescope National Facility which is managed by CSIRO. Operation of ASKAP is funded by the Australian Government with support from the National Collaborative Research Infrastructure Strategy. ASKAP uses the resources of the Pawsey Supercomputing Centre. Establishment of ASKAP, the Murchison Radio-astronomy Observatory and the Pawsey Supercomputing Centre are initiatives of the Australian Government, with support from the Government of Western Australia and the Science and Industry Endowment Fund. We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site._
 
-This project is supported by the University of Sydney, the Australian Research Council, and CSIRO.
+This project is supported by the University of Sydney, the Australian Research Council, and the CSIRO.
diff --git a/README.md b/README.md
@@ -14,7 +14,7 @@ This repository holds the code of the VAST Pipeline, a radio transient detection
 
 Please read the [Installation Instructions](https://vast-survey.org/vast-pipeline/v1.0.0/gettingstarted/installation/). If you have any questions or feedback, we welcome you to open an [issue](https://github.com/askap-vast/vast-pipeline/issues). If you are interested in contributing to the code, please read and follow the [Contributing and Developing Guidelines](https://vast-survey.org/vast-pipeline/v1.0.0/developing/intro/).
 
-If using this tool in your research, please cite [10.5281/zenodo.13927015](https://doi.org/10.5281/zenodo.13927016).
+If using this tool in your research, please cite [10.5281/zenodo.13927015](https://doi.org/10.5281/zenodo.13927015).
 
 ## Features
 

diff --git a/docs/developing/github.md b/docs/developing/github.md
@@ -71,7 +71,7 @@ In to order to make a release, please follow these steps:
 4. Bump the version number of the Python package using Poetry, i.e. `poetry version X.Y.Z`. This will update the version number in `pyproject.toml`.
 5. Update the version in `package.json` and `vast_pipeline/_version.py` to match the new version number, then run `npm install` to update the `package-lock.json` file.
 6. Update the "announcement bar" in the documentation to refer to the new release. This can be found in `docs/theme/main.html` at line 37.
-7. Update the [`CHANGELOG.md`](https://github.com/askap-vast/vast-pipeline/blob/master/CHANGELOG.md){:target="_blank"} by making a copy of the "Unreleased" heading at the top, and renaming the second one to the new version. Include a link to the release - it won't exist yet, so just follow the format of the others. After this there should be an "Unreleased" heading at the top, immediately followed by another heading with the new version number, which is followed by all the existing changes.
+7. Update the [`CHANGELOG.md`](https://github.com/askap-vast/vast-pipeline/blob/master/CHANGELOG.md){:target="_blank"} by renaming the "Unreleased" heading to the new version. Include a link to the release - it won't exist yet, so just follow the format of the others. Also remove any empty sub-headings under the new release heading.
 8. Commit all the changes made above to the new branch and push it to GitHub.
 9. Open a PR to merge the new branch into `master`. Note that the default target branch is `dev` so you will need to change this to `master` when creating the PR.
 10. Once the PR has been reviewed and approved, merge the branch into `master`. This can only be done by administrators of the repository.
@@ -82,5 +82,5 @@ In to order to make a release, please follow these steps:
 
 12. Push the tag to GitHub, i.e. `git push origin vX.Y.Z`.
 13. Merge the release branch into `dev`, resolving any conflicts.
-14. Append "dev" to the version numbers in `pyproject.toml`, `package.json` and `vast_pipeline/_version.py`, then run `npm install` to update `package-lock.json`, and commit the changes to `dev`.  This can either be done as a new commit, or while resolving merge conflicts in the previous step, if appropriate.
+14. Append "dev" to the version numbers in `pyproject.toml`, `package.json` and `vast_pipeline/_version.py`, then run `npm install` to update `package-lock.json`. Add a new "Unreleased" heading to the `CHANGELOG.md` with all standard subheadings ("Added", "Changed", "Fixed", "Removed" and "List of PRs"). Commit all changes to `dev` either as a new commit, or while resolving merge conflicts in the previous step.
 15. Create a [new release](https://github.com/askap-vast/vast-pipeline/releases/new) on GitHub that points to the tagged commit on master.
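
The release steps above require the version number to agree across `pyproject.toml`, `package.json` and `vast_pipeline/_version.py`. A small check along the following lines could catch a missed file before tagging. This is only a sketch: the helper names are hypothetical, and it assumes that `pyproject.toml` contains a `version = "..."` line and that `_version.py` defines `__version__` as a string literal.

```python
import json
import re
from pathlib import Path


def read_versions(repo: Path) -> dict:
    """Collect the version string from each file named in the release steps."""
    pyproject = (repo / "pyproject.toml").read_text()
    version_py = (repo / "vast_pipeline" / "_version.py").read_text()
    package = json.loads((repo / "package.json").read_text())
    return {
        # First line of the form: version = "..." (assumed layout).
        "pyproject.toml": re.search(
            r'^version\s*=\s*"([^"]+)"', pyproject, re.MULTILINE
        ).group(1),
        "package.json": package["version"],
        # Assumes a line of the form: __version__ = '...'
        "_version.py": re.search(
            r"__version__\s*=\s*['\"]([^'\"]+)['\"]", version_py
        ).group(1),
    }


def versions_match(repo: Path) -> bool:
    """Return True when all three files carry the same version string."""
    return len(set(read_versions(repo).values())) == 1
```

Running `versions_match(Path("."))` from the repository root would return `False` if, for example, only `package.json` had been bumped.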
diff --git a/docs/theme/main.html b/docs/theme/main.html
@@ -35,7 +35,7 @@
   </style>
 
   <a href="{{ base_url }}/changelog/">
-     <strong> New in version 1.1.1:</strong> Refreshed docs, support for compressed fits files, python 3.10 and more 🧑‍💻. Check the <strong>release notes!</strong>
+     <strong> New in version 1.2.0:</strong> Specify number of cores and memory partitions via config, delete runs with SQL rather than django, memory optimisations and more 🧑‍💻. Check the <strong>release notes!</strong>
   </a>
 
 {% endblock %}

diff --git a/docs/using/runconfig.md b/docs/using/runconfig.md
@@ -127,6 +127,21 @@ Below is an example of a default `config.yaml` file. Note that no images or othe
       # aggregate pair metrics that are stored in Source objects.
       source_aggregate_pair_metrics_min_abs_vs: 4.3
 
+    processing:
+      # Options to control use of Dask parallelism
+      # NOTE: These are advanced options and you should only change them if you know what you are doing.
+
+      # The total number of workers available to Dask ('null' means use one less than all cores)
+      num_workers: null
+
+      # The number of workers to use for disk IO operations (e.g. when reading images for forced extraction)
+      num_workers_io: 5
+
+      # The default maximum size (in MB) to allow per partition of Dask DataFrames
+      # Increasing this will create fewer partitions and will potentially increase the memory footprint
+      # of parallel tasks.
+      max_partition_mb: 15
+
     ```
 
 !!! note
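
The `processing` options added to the config template above can be read as follows. The sketch below is an illustration only: `resolve_num_workers` and `n_partitions` are hypothetical helpers, not the pipeline's `calculate_workers_and_partitions`, and the real implementation may combine the two values differently. It assumes that `null` maps to Python `None` and means "one less than all cores", and that `max_partition_mb` is an upper bound on partition size.

```python
import math
import os
from typing import Optional


def resolve_num_workers(num_workers: Optional[int]) -> int:
    """Map the config value to a worker count.

    None (YAML 'null') means one less than all cores, but never below 1.
    """
    if num_workers is None:
        return max((os.cpu_count() or 1) - 1, 1)
    if num_workers < 1:
        raise ValueError("num_workers must be a positive integer or null")
    return num_workers


def n_partitions(data_size_mb: float, max_partition_mb: int = 15) -> int:
    """Smallest number of partitions keeping each at or under the limit."""
    return max(math.ceil(data_size_mb / max_partition_mb), 1)
```

Under these assumptions a 100 MB DataFrame with the default `max_partition_mb: 15` would be split into 7 partitions, while raising the limit to 50 would give 2 larger partitions, which matches the template's note that fewer partitions can mean a larger memory footprint per task.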

diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "vast-pipeline",
-  "version": "1.1.1",
+  "version": "1.2.0",
   "description": "Vast Pipeline code base for processing and analysing telescope images from the Square Kilometre Pathfinder",
   "main": "gulpfile.js",
   "scripts": {