Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/list dataset extractors list #1200

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ scripts/keycloak/data/*
# ignore clowder chart deps
deployments/kubernetes/charts/clowder2/charts
deployments/kubernetes/charts/clowder2/*clowder2-software-dev.yaml
*secret*.yaml

# Environments
.env
Expand Down
3 changes: 1 addition & 2 deletions .run/uvicorn.run.xml
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="uvicorn" type="PythonConfigurationType" factoryName="Python" nameIsGenerated="true">
<module name="clowder2" />
<option name="ENV_FILES" value="" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
Expand All @@ -14,7 +13,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="uvicorn" />
<option name="PARAMETERS" value="app.main:app --reload --host 0.0.0.0" />
<option name="PARAMETERS" value="app.main:app --host 0.0.0.0 --workers 17" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="true" />
Expand Down
28 changes: 28 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,34 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [v2.0.0-beta.3] - 2024-07-29

### Added

- License management
- Release dataset with versions
- Enable and disable user account through Keycloak
- Jupyterhub integration
- Interface for creating and editing matching criteria for triggering extractors
- Interface for editing metadata definitions
- My dataset tab listing all the datasets created by the user
- Drag and drop upload multiple files
- Footer with links to documentation, source code, and contact information
- Documentation through MKDocs

### Changed

- Allow public datasets and files to be searchable
- List all the extractors with the ability to enable/disable the extractors
- Filter listeners based on their support for file or dataset
- Helm chart updated to support custom existing secret

### Fixed

- Clowder registration link on the top bar
- Case-insensitive search
- Download count immediately increments after download

## [v2.0.0-beta.2] - 2024-02-16

### Added
Expand Down
3 changes: 2 additions & 1 deletion backend/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,5 @@ ENV PATH="/code/.venv/bin:$PATH"
COPY ./app /code/app

# launch app using uvicorn
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
# Number of recommended workers is 2 x number_of_cores +1
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "17"]
2 changes: 1 addition & 1 deletion backend/app/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ class Settings(BaseSettings):
API_V2_STR: str = "/api/v2"
admin_email: str = "[email protected]"
frontend_url: str = "http://localhost:3000"
version: str = "2.0.0-beta.2"
version: str = "2.0.0-beta.3"

# Unique secret for hashing API keys. Generate with `openssl rand -hex 32`
local_auth_secret = "clowder_secret_key"
Expand Down
6 changes: 4 additions & 2 deletions backend/app/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@
description="A cloud native data management framework to support any research domain. Clowder was "
"developed to help researchers and scientists in data intensive domains manage raw data, complex "
"metadata, and automatic data pipelines. ",
version="2.0.0-beta.2",
version="2.0.0-beta.3",
contact={"name": "Clowder", "url": "https://clowderframework.org/"},
license_info={
"name": "Apache 2.0",
Expand Down Expand Up @@ -316,7 +316,9 @@ async def startup_beanie():
ThumbnailDBViewList,
LicenseDB,
],
recreate_views=True,
# If view exists, will not recreate
# When view query changes, make sure to manually drop view and recreate
recreate_views=False,
)


Expand Down
135 changes: 98 additions & 37 deletions backend/app/routers/datasets.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@
from pymongo import DESCENDING
from rocrate.model.person import Person
from rocrate.rocrate import ROCrate
from starlette.concurrency import run_in_threadpool

router = APIRouter()
security = HTTPBearer()
Expand Down Expand Up @@ -1192,16 +1193,26 @@ async def download_dataset(
bag_info_path = os.path.join(current_temp_dir, "bag-info.txt")
tagmanifest_path = os.path.join(current_temp_dir, "tagmanifest-md5.txt")

with open(manifest_path, "w") as f:
pass # Create empty file so no errors later if the dataset is empty

with open(bagit_path, "w") as f:
f.write("Bag-Software-Agent: clowder.ncsa.illinois.edu" + "\n")
f.write("Bagging-Date: " + str(datetime.datetime.now()) + "\n")
await run_in_threadpool(lambda: open(manifest_path, "w").close())
await run_in_threadpool(lambda: open(manifest_path, "w").close())
await run_in_threadpool(
lambda: open(bagit_path, "w").write(
"Bag-Software-Agent: clowder.ncsa.illinois.edu"
+ "\n"
+ "Bagging-Date: "
+ str(datetime.datetime.now())
+ "\n"
)
)

with open(bag_info_path, "w") as f:
f.write("BagIt-Version: 0.97" + "\n")
f.write("Tag-File-Character-Encoding: UTF-8" + "\n")
await run_in_threadpool(
lambda: open(bag_info_path, "w").write(
"BagIt-Version: 0.97"
+ "\n"
+ "Tag-File-Character-Encoding: UTF-8"
+ "\n"
)
)

# Write dataset metadata if found
metadata = await MetadataDB.find(
Expand All @@ -1214,6 +1225,10 @@ async def download_dataset(
metadata_content = json_util.dumps(metadata)
with open(datasetmetadata_path, "w") as f:
f.write(metadata_content)
await run_in_threadpool(
lambda: open(datasetmetadata_path, "w").write(metadata_content)
)

crate.add_file(
datasetmetadata_path,
dest_path="metadata/_dataset_metadata.json",
Expand All @@ -1236,16 +1251,20 @@ async def download_dataset(
hierarchy = await _get_folder_hierarchy(file.folder_id, "")
dest_folder = os.path.join(current_temp_dir, hierarchy.lstrip("/"))
if not os.path.isdir(dest_folder):
os.makedirs(dest_folder, exist_ok=True)
await run_in_threadpool(os.makedirs, dest_folder, exist_ok=True)
file_name = hierarchy + file_name
current_file_path = os.path.join(current_temp_dir, file_name.lstrip("/"))

content = fs.get_object(settings.MINIO_BUCKET_NAME, bytes_file_id)
file_md5_hash = hashlib.md5(content.data).hexdigest()
with open(current_file_path, "wb") as f1:
f1.write(content.data)
with open(manifest_path, "a") as mpf:
mpf.write(file_md5_hash + " " + file_name + "\n")
await run_in_threadpool(
lambda: open(current_file_path, "wb").write(content.data)
)
await run_in_threadpool(
lambda: open(manifest_path, "a").write(
file_md5_hash + " " + file_name + "\n"
)
)
crate.add_file(
current_file_path,
dest_path="data/" + file_name,
Expand All @@ -1266,23 +1285,43 @@ async def download_dataset(
current_temp_dir, metadata_filename
)
metadata_content = json_util.dumps(metadata)
with open(metadata_filename_temp_path, "w") as f:
f.write(metadata_content)
await run_in_threadpool(
lambda: open(metadata_filename_temp_path, "w").write(
metadata_content
)
)
crate.add_file(
metadata_filename_temp_path,
dest_path="metadata/" + metadata_filename,
properties={"name": metadata_filename},
)

bag_size_kb = bag_size / 1024

with open(bagit_path, "a") as f:
f.write("Bag-Size: " + str(bag_size_kb) + " kB" + "\n")
f.write("Payload-Oxum: " + str(bag_size) + "." + str(file_count) + "\n")
f.write("Internal-Sender-Identifier: " + dataset_id + "\n")
f.write("Internal-Sender-Description: " + dataset.description + "\n")
f.write("Contact-Name: " + user_full_name + "\n")
f.write("Contact-Email: " + user.email + "\n")
await run_in_threadpool(
lambda: open(bagit_path, "a").write(
"Bag-Size: "
+ str(bag_size_kb)
+ " kB"
+ "\n"
+ "Payload-Oxum: "
+ str(bag_size)
+ "."
+ str(file_count)
+ "\n"
+ "Internal-Sender-Identifier: "
+ dataset_id
+ "\n"
+ "Internal-Sender-Description: "
+ dataset.description
+ "\n"
+ "Contact-Name: "
+ user_full_name
+ "\n"
+ "Contact-Email: "
+ user.email
+ "\n"
)
)
crate.add_file(
bagit_path, dest_path="bagit.txt", properties={"name": "bagit.txt"}
)
Expand All @@ -1296,14 +1335,33 @@ async def download_dataset(
)

# Generate tag manifest file
manifest_md5_hash = hashlib.md5(open(manifest_path, "rb").read()).hexdigest()
bagit_md5_hash = hashlib.md5(open(bagit_path, "rb").read()).hexdigest()
bag_info_md5_hash = hashlib.md5(open(bag_info_path, "rb").read()).hexdigest()

with open(tagmanifest_path, "w") as f:
f.write(bagit_md5_hash + " " + "bagit.txt" + "\n")
f.write(manifest_md5_hash + " " + "manifest-md5.txt" + "\n")
f.write(bag_info_md5_hash + " " + "bag-info.txt" + "\n")
manifest_md5_hash = await run_in_threadpool(
lambda: hashlib.md5(open(manifest_path, "rb").read()).hexdigest()
)
bagit_md5_hash = await run_in_threadpool(
lambda: hashlib.md5(open(bagit_path, "rb").read()).hexdigest()
)
bag_info_md5_hash = await run_in_threadpool(
lambda: hashlib.md5(open(bag_info_path, "rb").read()).hexdigest()
)

await run_in_threadpool(
lambda: open(tagmanifest_path, "w").write(
bagit_md5_hash
+ " "
+ "bagit.txt"
+ "\n"
+ manifest_md5_hash
+ " "
+ "manifest-md5.txt"
+ "\n"
+ bag_info_md5_hash
+ " "
+ "bag-info.txt"
+ "\n"
)
)

crate.add_file(
tagmanifest_path,
dest_path="tagmanifest-md5.txt",
Expand All @@ -1317,13 +1375,16 @@ async def download_dataset(
)
zip_name = dataset.name + version_name + ".zip"
path_to_zip = os.path.join(current_temp_dir, zip_name)
crate.write_zip(path_to_zip)
f = open(path_to_zip, "rb", buffering=0)
zip_bytes = f.read()

await run_in_threadpool(crate.write_zip, path_to_zip) # takes the most time?

f = await run_in_threadpool(open, path_to_zip, "rb", 0)
zip_bytes = await run_in_threadpool(f.read)
stream = io.BytesIO(zip_bytes)
f.close()
await run_in_threadpool(f.close)

try:
shutil.rmtree(current_temp_dir)
await run_in_threadpool(shutil.rmtree, current_temp_dir)
except Exception as e:
print("could not delete file")
print(e)
Expand Down
4 changes: 2 additions & 2 deletions deployments/kubernetes/charts/clowder2/Chart.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,13 @@ description: >
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.9.1
version: 1.9.2

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "2.0.0-beta.2"
appVersion: "2.0.0-beta.3"

# List of people that maintain this helm chart.
maintainers:
Expand Down
4 changes: 2 additions & 2 deletions deployments/kubernetes/charts/clowder2/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,13 @@ hostname: { your hostname. E.g. clowder2.software-dev.ncsa.illinois.edu }
frontend:
image:
repository: clowder/clowder2-frontend
tag: 2.0.0-beta.2
tag: 2.0.0-beta.3
pullPolicy: Always

backend:
image:
repository: clowder/clowder2-backend
tag: 2.0.0-beta.2
tag: 2.0.0-beta.3
pullPolicy: Always

geoserver:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "clowder2.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}")
echo http://$SERVICE_IP:{{ .Values.frontend.service.port }}
{{- else if contains "ClusterIP" .Values.frontend.service.type }}
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "clowder2.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "clowder2.fullname" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,13 @@ Expand the name of the chart.
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Define release name.
*/}}
{{- define "clowder2.releaseName" -}}
{{- default .Release.Name .Values.releaseOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
Expand Down Expand Up @@ -46,7 +53,7 @@ app.kubernetes.io/managed-by: {{ .Release.Service }}
Selector labels
*/}}
{{- define "clowder2.selectorLabels" -}}
app.kubernetes.io/name: {{ include "clowder2.name" . }}
app.kubernetes.io/name: {{ include "clowder2.fullname" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

Expand Down
Loading
Loading