More object store documentation.
jmchilton committed Mar 6, 2023
1 parent b855f39 commit 7688ac2
Showing 2 changed files with 176 additions and 0 deletions.
4 changes: 4 additions & 0 deletions doc/source/admin/production.md
@@ -117,6 +117,10 @@ Using a cluster will also net you a fringe benefit: When running tools locally,

Configuration is not difficult once your cluster is set up. Details can be found on the [cluster](cluster.md) page.

### Distributing Data

Information on distributing Galaxy datasets across multiple disks and leveraging services such as S3 can be found in the [Galaxy Training Materials](https://training.galaxyproject.org/training-material/topics/admin/tutorials/object-store/tutorial.html) and in the [sample object_store_conf.xml file](https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/object_store_conf.xml.sample).
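
As a rough illustration of what such a configuration can look like, the sketch below spreads new datasets across two disk paths using a `distributed` object store. The backend ids and directory paths are placeholders invented for this example, and the weights are only meant to show that one backend can be favored over another; consult the sample file linked above for the authoritative set of options.

```xml
<?xml version="1.0"?>
<object_store type="distributed">
    <backends>
        <!-- Hypothetical first disk: the higher weight means it is chosen
             more often when a new dataset is created. -->
        <backend id="disk1" type="disk" weight="2">
            <files_dir path="/galaxy/data1/objects"/>
            <extra_dir type="temp" path="/galaxy/data1/tmp"/>
            <extra_dir type="job_work" path="/galaxy/data1/jobs"/>
        </backend>
        <!-- Hypothetical second disk: the lower weight means it receives a
             smaller share of new datasets. -->
        <backend id="disk2" type="disk" weight="1">
            <files_dir path="/galaxy/data2/objects"/>
            <extra_dir type="temp" path="/galaxy/data2/tmp"/>
            <extra_dir type="job_work" path="/galaxy/data2/jobs"/>
        </backend>
    </backends>
</object_store>
```

Existing datasets stay on the backend they were originally written to, so changing the weights later only affects where new data lands.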

### Cleaning up datasets

When a dataset is deleted from a history or library, it is simply marked as deleted and not actually removed, since it can later be undeleted. To free disk space, a set of scripts can be run (e.g. from `cron`) to remove the data files as specified by local policy. See the [Purge histories and datasets](https://galaxyproject.org/admin/config/performance/purge-histories-and-datasets/) page for instructions.
172 changes: 172 additions & 0 deletions lib/galaxy/config/sample/object_store_conf.xml.sample
@@ -231,3 +231,175 @@
<extra_dir type="temp" path="database/tmp_gcp"/>
</object_store>
-->


<!--
User-Selectable Scratch Storage
This distributed object store will default to a normal
path on disk using the default quota but sets up a second
path with user-private storage and a larger quota, and warns
the user that the disk is routinely cleaned. Relative speed
and stability differences are communicated to the user
using object store badges - as well as how data is backed up
(in the default case) and not backed up for scratch storage.
The admin is responsible for routinely cleaning that storage
using Galaxy's admin scripts - this object store configuration
just allows the user selection and communicates expectations
to the user. Training related to Galaxy cleanup scripts can be
found at:
https://training.galaxyproject.org/training-material/topics/admin/tutorials/maintenance/slides.html.
In this example, the scratch storage is marked as user-private
using the private="true" attribute on the backend definition.
This means it cannot be used in public datasets, shared between
users, etc. This is more for example purposes - you may very well not
want scratch storage to be defined as private as it prevents a lot
of regular functionality and Galaxy handles regularly cleaned
datasets fairly gracefully when the appropriate admin scripts
are used.
-->
<!--
<object_store type="distributed">
<backends>
<backend id="default" allow_selection="true" type="disk" weight="1" name="Default Galaxy Storage">
<description>This is Galaxy's default object store - this disk is regularly backed up and all users have a default quota of 200 GB.
</description>
<files_dir path="database/objects/default"/>
<badges>
<slower />
<more_stable />
<backed_up>Backed up to Galaxy's institutional long term tape drive nightly. More information about our tape drive can be found on our [Archive Tier Storage](https://www.msi.umn.edu/content/archive-tier-storage) page.</backed_up>
</badges>
</backend>
<backend id="scratch" allow_selection="true" type="disk" weight="0" name="Scratch Storage" private="true">
<quota source="second_tier" />
<description>This object store is connected to institutional scratch storage. This disk is not backed up and is private to your user. Datasets belonging to this storage will be automatically deleted after one month.
</description>
<files_dir path="database/objects/temp"/>
<badges>
<faster />
<less_stable />
<not_backed_up />
<short_term>The data stored here is purged after a month.</short_term>
</badges>
</backend>
</backends>
</object_store>
-->

<!--
User-Selectable Experimental Storage
This distributed object store will default to a normal
path on disk using the default quota but sets up a second
path with more experimental storage (here iRODS) and a higher
quota. The different backup strategies for normal disk and iRODS
as well as their respective stability are communicated to the user
using object store badges.
-->
<!--
<object_store type="distributed">
<backends>
<backend id="default" allow_selection="true" type="disk" weight="1" name="Default Galaxy Storage">
<description>This is Galaxy's default object store - this disk is regularly backed up and all users have a default quota of 200 GB.
</description>
<files_dir path="database/objects/default"/>
<badges>
<more_stable />
<backed_up>Backed up to Galaxy's institutional long term tape drive nightly. More information about our tape drive can be found on our [Archive Tier Storage](https://www.msi.umn.edu/content/archive-tier-storage) page.</backed_up>
</badges>
</backend>
<backend id="experimental" allow_selection="true" type="irods" weight="0" name="Experimental iRODS Storage">
<quota source="irods_quota" />
<description>This object store uses our experimental institutional iRODS service. This storage has larger quotas but is more experimental, and expected job failure rates are higher.
</description>
<auth username="rods" password="rods" />
<resource name="demoResc" />
<zone name="tempZone" />
<connection host="localhost" port="1247" timeout="30" refresh_time="300" connection_pool_monitor_interval="3600"/>
<cache path="database/object_store_cache_irods" size="1000" />
<badges>
<less_stable />
<backed_up>This data is backed up using iRODS's native hierarchical storage management mechanisms. The rules describing how data is stored and backed up in iRODS can be found in our institutional [iRODS documentation](https://irods.org/uploads/2018/Saum-SURFsara-Data_Archiving_in_iRODS-slides.pdf).</backed_up>
</badges>
</backend>
</backends>
</object_store>
-->

<!--
User-Selectable Storage - A Complex Institutional Example
Huge chunks of text were stolen wholesale from MSI's data storage website
(https://www.msi.umn.edu/content/data-storage). Large changes were made to adapt
this for demonstration purposes - none of the text, policies, or guarantees reflect
actual current MSI or UMN policies.
-->
<!--
<object_store type="distributed">
<backends>
<backend id="high_performance" allow_selection="true" type="disk" weight="1" name="High Performance Storage">
<description>All MSI researchers have access to a high-performance, high capacity primary storage platform. This system currently provides 3.5 PB (petabytes) of storage. The integrity of the data is protected by daily snapshots and tape backups. It has sustained read and write speeds of up to 25 GB/sec.
There is default access to this storage by any MSI group with an active account. Very large needs can also be met, but must be approved by the MSI HPC Allocation Committee. More details are available on the [Storage Allocations](https://www.msi.umn.edu/content/storage-allocations) page.
More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
</description>
<files_dir path="/Users/jxc755/workspace/galaxy/database/objects/default"/>
<badges>
<faster />
<more_stable />
<backed_up>Backed up to MSI's long term tape drive nightly. More information about our tape drive can be found on our [Archive Tier Storage](https://www.msi.umn.edu/content/archive-tier-storage) page.</backed_up>
</badges>
</backend>
<backend id="second" allow_selection="true" type="disk" weight="0" name="Second Tier Storage">
<quota source="second_tier" />
<description>MSI first added a Ceph object storage system in November 2014 as a second tier storage option. The system currently has around 10 PB of usable storage installed.
MSI's second tier storage is designed to address the growing need for resources that support data-intensive research. It is tightly integrated with other MSI storage and computing resources in order to support a wide variety of research data life cycles and data analysis workflows. In addition, this object storage platform offers new access modes, such as Amazon’s S3 (Simple Storage Service) interface, so that researchers can better manage their data and more seamlessly share data with other researchers whether or not the other researcher has an MSI account or is at the University of Minnesota.
More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
</description>
<files_dir path="/Users/jxc755/workspace/galaxy/database/objects/temp"/>
<badges>
<faster />
<less_stable />
<not_backed_up />
<less_secure>MSI's enterprise level data security policies and monitoring have not yet been integrated with Ceph storage.</less_secure>
<short_term>The data stored here is purged after a month.</short_term>
</badges>
</backend>
<backend id="experimental" allow_selection="true" type="disk" weight="0" name="Experimental Scratch" private="true">
<quota enabled="false" />
<description>MSI Ceph storage that is purged more aggressively (weekly instead of monthly), so it is only appropriate for short term methods development and such. The rapid deletion of stored data enables us to provide this storage without a quota.
More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
</description>
<files_dir path="/Users/jxc755/workspace/galaxy/database/objects/temp"/>
<badges>
<faster />
<less_stable />
<not_backed_up />
<less_secure>MSI's enterprise level data security policies and monitoring have not yet been integrated with Ceph storage.</less_secure>
<short_term>The data stored here is purged after a week.</short_term>
</badges>
</backend>
<backend id="surfs" allow_selection="true" type="disk" weight="0" name="SURFS" private="true">
<quota source="umn_surfs" />
<description>Much of the data analysis conducted on MSI’s high-performance computing resources uses data gathered from UMN shared research facilities (SRFs). In recognition of the need for short to medium term storage for this data, MSI provides a service, Shared User Research Facilities Storage (SURFS), enabling SRFs to deliver data directly to MSI users. By providing a designated location for this data, MSI can focus data backup and other processes to these key datasets. As part of this service, MSI will provide the storage of the data for one year from its delivery date.
It's expected that the consumers of these data sets will be responsible for discerning which data they may wish to keep past the 1-year term, and finding an appropriate place to keep it. There are several possible storage options both at MSI and the wider university. You can explore your options using OIT’s digital [storage options chooser tool](https://it.umn.edu/services-technologies/comparisons/select-digital-storage-options).
More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).</description>
<badges>
<slower />
<more_secure>University of Minnesota data security analysts have authorized this storage for the storage of human data.</more_secure>
<more_stable />
<backed_up />
</badges>
</backend>
</backends>
</object_store>
-->
