Standalone 5/N: Implement and test ZarrStream_s #297

Merged
merged 57 commits into from
Sep 26, 2024
Changes from 48 commits
c7832d7
Move driver source files and tests to a separate directory.
aliddell Sep 17, 2024
a811e22
Define the Zarr streaming API.
aliddell Sep 17, 2024
cf89e13
Define the Zarr logger.
aliddell Sep 17, 2024
fff86c6
Rename zarr.h to acquire.zarr.h.
aliddell Sep 17, 2024
61a53bc
Merge branch 'standalone-sequence-3' into standalone-sequence-4
aliddell Sep 17, 2024
bdd0c51
Instantiate the logger's mutex.
aliddell Sep 17, 2024
d23fe1a
Implement and test stream settings, with API setters and getters.
aliddell Sep 17, 2024
3cd2bb3
Wrap C API functions in extern "C" {}
aliddell Sep 17, 2024
d099795
Document the StreamSettings getters.
aliddell Sep 17, 2024
cfd412e
Merge remote-tracking branch 'upstream/standalone-sequence-3' into st…
aliddell Sep 17, 2024
4e3b470
Merge remote-tracking branch 'upstream/standalone-sequence-4' into st…
aliddell Sep 17, 2024
da98e57
call it 'type'
aliddell Sep 17, 2024
aa3489e
Merge branch 'standalone-sequence-3' into standalone-sequence-4
aliddell Sep 17, 2024
c3e419b
Merge branch 'standalone-sequence-3' into standalone-sequence-4b
aliddell Sep 17, 2024
fdf5b08
No need to double CHECK the settings pointer.
aliddell Sep 17, 2024
cbbc87d
Implement ZarrStream_s.
aliddell Sep 17, 2024
0f28cfc
Test ZarrStream_s.
aliddell Sep 17, 2024
1ca2a1d
Implement the rest of the Zarr API functions.
aliddell Sep 17, 2024
996d22e
Document that ZarrStream_append will block for compression and flushing
aliddell Sep 18, 2024
d6ea43e
Merge branch 'standalone-sequence-3' into standalone-sequence-4
aliddell Sep 18, 2024
46eb93d
Merge branch 'standalone-sequence-4' into standalone-sequence-4b
aliddell Sep 18, 2024
95973d4
Merge branch 'standalone-sequence-4b' into standalone-sequence-5
aliddell Sep 18, 2024
a8b30e1
Merge branch 'standalone-sequence-5' into standalone-sequence-6
aliddell Sep 18, 2024
e546f7d
Respond to PR comments.
aliddell Sep 18, 2024
2d6aae1
Merge branch 'standalone-sequence-2' into standalone-sequence-3
aliddell Sep 18, 2024
a1b0034
Merge remote-tracking branch 'upstream/main' into standalone-sequence-3
aliddell Sep 18, 2024
cb5414f
Merge remote-tracking branch 'upstream/main' into standalone-sequence-4
aliddell Sep 18, 2024
a4868e8
Merge branch 'standalone-sequence-4' into standalone-sequence-4b
aliddell Sep 18, 2024
784aae6
Merge branch 'standalone-sequence-4b' into standalone-sequence-5
aliddell Sep 18, 2024
eb5b0fa
Merge branch 'standalone-sequence-5' into standalone-sequence-6
aliddell Sep 18, 2024
42a4940
Respond to PR comments.
aliddell Sep 20, 2024
9a6212f
Merge branch 'standalone-sequence-3' into standalone-sequence-4
aliddell Sep 20, 2024
e5f54ec
Merge branch 'standalone-sequence-4' into standalone-sequence-5
aliddell Sep 20, 2024
5076311
Merge branch 'standalone-sequence-6' into standalone-sequence-5
aliddell Sep 20, 2024
2853ad1
wip
aliddell Sep 20, 2024
25bfb8b
Reorder and document settings fields
aliddell Sep 20, 2024
e65f889
Merge branch 'standalone-sequence-3' into standalone-sequence-5
aliddell Sep 20, 2024
b7aeda7
wip
aliddell Sep 20, 2024
e3e6e3f
Remove version specifier from ZarrStream_create.
aliddell Sep 20, 2024
44fe333
Merge branch 'standalone-sequence-3' into standalone-sequence-5
aliddell Sep 20, 2024
401231b
wip
aliddell Sep 20, 2024
d15f431
Document the settings struct a bit.
aliddell Sep 20, 2024
10a6318
Merge branch 'standalone-sequence-3' into standalone-sequence-5
aliddell Sep 20, 2024
e6ad827
wip
aliddell Sep 20, 2024
59bea6c
Fix up some parameters.
aliddell Sep 20, 2024
e984c90
Merge branch 'standalone-sequence-3' into standalone-sequence-5
aliddell Sep 20, 2024
3a0b63a
Update ZarrStream implementation to use settings struct.
aliddell Sep 20, 2024
68dcb03
Remove some redundant code
aliddell Sep 20, 2024
ca93466
Merge remote-tracking branch 'upstream/main' into standalone-sequence-4
aliddell Sep 24, 2024
977b8da
Respond to PR comments
aliddell Sep 24, 2024
d94acb6
Merge branch 'standalone-sequence-4' into standalone-sequence-5
aliddell Sep 24, 2024
39ec6a7
Respond to PR comments
aliddell Sep 24, 2024
e6e978b
Merge branch 'standalone-sequence-4' into standalone-sequence-5
aliddell Sep 24, 2024
bda14d7
Respond to PR comments
aliddell Sep 24, 2024
a2878df
Merge branch 'standalone-sequence-4' into standalone-sequence-5
aliddell Sep 24, 2024
82df0a2
Respond to PR comments
aliddell Sep 25, 2024
99148eb
Merge remote-tracking branch 'upstream/main' into standalone-sequence-5
aliddell Sep 26, 2024
112 changes: 112 additions & 0 deletions include/acquire.zarr.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
#pragma once

#include "zarr.types.h"

#ifdef __cplusplus
extern "C"
{
#endif

/**
* @brief The settings for a Zarr stream.
* @details This struct contains the settings for a Zarr stream, including
* the store path, custom metadata, S3 settings, chunk compression settings,
* dimension properties, whether to stream to multiple levels of detail, the
* pixel data type, and the Zarr format version.
* @note The store path can be a filesystem path or an S3 key prefix. For example,
* supplying an endpoint "s3://my-endpoint.com" and a bucket "my-bucket" with a
* store_path of "my-dataset.zarr" will result in the store being written to
* "s3://my-endpoint.com/my-bucket/my-dataset.zarr".
* @note The dimensions array may be allocated with ZarrStreamSettings_create_dimension_array
* and freed with ZarrStreamSettings_destroy_dimension_array. The order in which you
* set the dimension properties in the array should match the order of the dimensions
* from slowest to fastest changing, for example, [Z, Y, X] for a 3D dataset.
*/
typedef struct ZarrStreamSettings_s
Review comment (Collaborator):

Are you planning to move this type to zarr.types, or is there a specific reason for keeping it here?

Reply (Member Author):

I didn't want to bury the main settings struct at the bottom of zarr.types.h.

{
const char* store_path; /**< Path to the store. Filesystem path or S3 key prefix. */
Review comment (Collaborator):

I believe a more robust API would involve passing a store object (either as part of the settings object or separately to the "create stream" function) rather than using a single argument that could have dual meanings. I previously raised this point. However, I don't consider it a critical issue that would prevent progress.

const char* custom_metadata; /**< JSON-formatted custom metadata to be stored with the dataset. */
ZarrS3Settings* s3_settings; /**< Optional S3 settings for the store. */
ZarrCompressionSettings* compression_settings; /**< Optional chunk compression settings for the store. */
ZarrDimensionProperties* dimensions; /**< The properties of each dimension in the dataset. */
size_t dimension_count; /**< The number of dimensions in the dataset. */
bool multiscale; /**< Whether to stream to multiple levels of detail. */
ZarrDataType data_type; /**< The pixel data type of the dataset. */
ZarrVersion version; /**< The version of the Zarr format to use. 2 or 3. */
} ZarrStreamSettings;
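
The review comment on `store_path` suggests passing a store object instead of one string with two meanings. A minimal sketch of what that could look like follows, as a tagged union that names its backend explicitly. Every type and function here (`ExampleStore`, `example_s3_store`, and so on) is hypothetical and not part of this PR.

```cpp
#include <cassert>

// Hypothetical names throughout; none of these exist in the PR.
enum ExampleStoreKind
{
    ExampleStoreKind_Filesystem = 0,
    ExampleStoreKind_S3,
};

struct ExampleFilesystemStore
{
    const char* root_path; // directory the dataset is written to
};

struct ExampleS3Store
{
    const char* endpoint;
    const char* bucket_name;
    const char* key_prefix; // the role store_path plays in the S3 case
};

// Tagged union: `kind` says which member of `as` is valid.
struct ExampleStore
{
    ExampleStoreKind kind;
    union
    {
        ExampleFilesystemStore filesystem;
        ExampleS3Store s3;
    } as;
};

inline ExampleStore
example_filesystem_store(const char* root_path)
{
    assert(root_path != nullptr);
    ExampleStore store{};
    store.kind = ExampleStoreKind_Filesystem;
    store.as.filesystem.root_path = root_path;
    return store;
}

inline ExampleStore
example_s3_store(const char* endpoint,
                 const char* bucket_name,
                 const char* key_prefix)
{
    assert(endpoint != nullptr && bucket_name != nullptr &&
           key_prefix != nullptr);
    ExampleStore store{};
    store.kind = ExampleStoreKind_S3;
    store.as.s3.endpoint = endpoint;
    store.as.s3.bucket_name = bucket_name;
    store.as.s3.key_prefix = key_prefix;
    return store;
}

// Returns the path-like component, whichever backend is active.
inline const char*
example_store_location(const ExampleStore& store)
{
    return store.kind == ExampleStoreKind_S3 ? store.as.s3.key_prefix
                                             : store.as.filesystem.root_path;
}
```

With this shape the S3 settings travel with the store, so a settings struct would not need a separate optional `s3_settings` pointer to disambiguate the path.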

typedef struct ZarrStream_s ZarrStream;

/**
* @brief Get the version of the Zarr API.
* @return The version of the Zarr API.
*/
uint32_t Zarr_get_api_version(void);

/**
* @brief Set the log level for the Zarr API.
* @param level The log level.
* @return ZarrStatusCode_Success on success, or an error code on failure.
*/
ZarrStatusCode Zarr_set_log_level(ZarrLogLevel level);

/**
* @brief Get the log level for the Zarr API.
* @return The log level for the Zarr API.
*/
ZarrLogLevel Zarr_get_log_level(void);

/**
* @brief Get the message for the given status code.
* @param status The status code.
* @return A human-readable status message.
*/
const char* Zarr_get_status_message(ZarrStatusCode status);

/**
* @brief Allocate memory for the dimension array in the Zarr stream settings struct.
* @param[in, out] settings The Zarr stream settings struct.
* @param dimension_count The number of dimensions in the dataset to allocate memory for.
* @return ZarrStatusCode_Success on success, or an error code on failure.
*/
ZarrStatusCode ZarrStreamSettings_create_dimension_array(ZarrStreamSettings* settings, size_t dimension_count);

/**
* @brief Free memory for the dimension array in the Zarr stream settings struct.
* @param[in, out] settings The Zarr stream settings struct containing the dimension array to free.
*/
void ZarrStreamSettings_destroy_dimension_array(ZarrStreamSettings* settings);
Review comment (Collaborator):

I believe requiring the settings object for memory allocation and deallocation blurs ownership boundaries. Instead, could the function return a pointer to the dimension array, which the user can then assign to the settings object? This way, when users want to destroy it, they can simply pass that pointer. This approach would clarify ownership and simplify memory management.

Reply (Member Author):

It could, but two fields are set or reset together: dimensions and dimension_count. (While double-checking this I noticed that ZarrStreamSettings_destroy_dimension_array does not actually reset dimension_count. I'll push a fix for that.)
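
For comparison, the pointer-returning alternative proposed in this thread might look like the sketch below. It is only an illustration: `ExampleDimension` is a stand-in for `ZarrDimensionProperties`, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for ZarrDimensionProperties, reduced to a few fields.
struct ExampleDimension
{
    const char* name;
    std::uint32_t array_size_px;
    std::uint32_t chunk_size_px;
    std::uint32_t shard_size_chunks;
};

// Hypothetical allocator: returns a zero-initialized array of `count`
// dimensions, or nullptr if `count` is 0 or the allocation fails.
inline ExampleDimension*
example_create_dimension_array(std::size_t count)
{
    if (count == 0) {
        return nullptr;
    }
    return static_cast<ExampleDimension*>(
      std::calloc(count, sizeof(ExampleDimension)));
}

// Hypothetical deallocator: takes the pointer itself, not the settings.
inline void
example_destroy_dimension_array(ExampleDimension* dimensions)
{
    std::free(dimensions); // std::free(nullptr) is a no-op
}
```

The trade-off raised in the reply still applies: with this design the caller assigns both the pointer and the count to the settings, and must clear both after destroying the array.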


/**
* @brief Create a Zarr stream.
* @param[in, out] settings The settings for the Zarr stream.
* @return A pointer to the Zarr stream struct, or NULL on failure.
*/
ZarrStream* ZarrStream_create(ZarrStreamSettings* settings);

/**
* @brief Destroy a Zarr stream.
* @details This function frees the memory allocated for the Zarr stream.
* @param stream The Zarr stream struct to destroy.
*/
void ZarrStream_destroy(ZarrStream* stream);

/**
* @brief Append data to the Zarr stream.
* @details This function will block while chunks are compressed and written
* to the store. It will return when all data has been written.
* @param[in, out] stream The Zarr stream struct.
* @param[in] data The data to append.
* @param[in] bytes_in The number of bytes in @p data. It should be at least
* the size of a single frame.
* @param[out] bytes_out The number of bytes written to the stream.
* @return ZarrStatusCode_Success on success, or an error code on failure.
*/
ZarrStatusCode ZarrStream_append(ZarrStream* stream,
const void* data,
size_t bytes_in,
size_t* bytes_out);

#ifdef __cplusplus
}
#endif
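
The `ZarrStream_append` documentation says `bytes_in` should be at least the size of a single frame. A rough sketch of that size calculation follows. It assumes a frame spans the two fastest-changing dimensions (Y and X), which the header does not state outright. `ExampleDataType` mirrors only part of `ZarrDataType`, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Local mirror of a few ZarrDataType values so the sketch compiles alone.
enum ExampleDataType
{
    ExampleDataType_uint8 = 0,
    ExampleDataType_uint16,
    ExampleDataType_float32,
    ExampleDataType_float64,
};

// Bytes per pixel for each mirrored type; 0 for anything unrecognized.
inline std::size_t
example_bytes_of_type(ExampleDataType type)
{
    switch (type) {
        case ExampleDataType_uint8:
            return 1;
        case ExampleDataType_uint16:
            return 2;
        case ExampleDataType_float32:
            return 4;
        case ExampleDataType_float64:
            return 8;
    }
    return 0;
}

// Assumption: one frame is height_px * width_px pixels of the given type.
inline std::size_t
example_frame_size_bytes(std::uint32_t height_px,
                         std::uint32_t width_px,
                         ExampleDataType type)
{
    return static_cast<std::size_t>(height_px) * width_px *
           example_bytes_of_type(type);
}
```

A caller could compare its buffer length against this value before calling `ZarrStream_append`, and compare `bytes_out` with `bytes_in` afterwards to detect a short write.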
128 changes: 128 additions & 0 deletions include/zarr.types.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
#ifndef H_ACQUIRE_ZARR_TYPES_V0
#define H_ACQUIRE_ZARR_TYPES_V0

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C"
{
#endif

typedef enum
{
ZarrStatusCode_Success = 0,
ZarrStatusCode_InvalidArgument,
ZarrStatusCode_Overflow,
ZarrStatusCode_InvalidIndex,
ZarrStatusCode_NotYetImplemented,
ZarrStatusCode_InternalError,
ZarrStatusCode_OutOfMemory,
ZarrStatusCode_IOError,
ZarrStatusCode_CompressionError,
ZarrStatusCode_InvalidSettings,
ZarrStatusCodeCount,
} ZarrStatusCode;

typedef enum
{
ZarrVersion_2 = 2,
ZarrVersion_3,
ZarrVersionCount
} ZarrVersion;

typedef enum
{
ZarrLogLevel_Debug = 0,
ZarrLogLevel_Info,
ZarrLogLevel_Warning,
ZarrLogLevel_Error,
ZarrLogLevel_None,
ZarrLogLevelCount
} ZarrLogLevel;

typedef enum
{
ZarrDataType_uint8 = 0,
ZarrDataType_uint16,
ZarrDataType_uint32,
ZarrDataType_uint64,
ZarrDataType_int8,
ZarrDataType_int16,
ZarrDataType_int32,
ZarrDataType_int64,
ZarrDataType_float32,
ZarrDataType_float64,
ZarrDataTypeCount
} ZarrDataType;

typedef enum
{
ZarrCompressor_None = 0,
ZarrCompressor_Blosc1,
ZarrCompressorCount
} ZarrCompressor;

typedef enum
{
ZarrCompressionCodec_None = 0,
ZarrCompressionCodec_BloscLZ4,
ZarrCompressionCodec_BloscZstd,
ZarrCompressionCodecCount
} ZarrCompressionCodec;

typedef enum
{
ZarrDimensionType_Space = 0,
ZarrDimensionType_Channel,
ZarrDimensionType_Time,
ZarrDimensionType_Other,
ZarrDimensionTypeCount
} ZarrDimensionType;

/**
* @brief S3 settings for streaming to Zarr.
*/
typedef struct
{
const char* endpoint;
const char* bucket_name;
const char* access_key_id;
const char* secret_access_key;
} ZarrS3Settings;

/**
* @brief Compression settings for a Zarr array.
* @details The compressor is not the same as the codec. A codec is
* a specific implementation of a compression algorithm, while a compressor
* is a library that implements one or more codecs.
*/
typedef struct
{
ZarrCompressor compressor; /**< Compressor to use */
ZarrCompressionCodec codec; /**< Codec to use */
uint8_t level; /**< Compression level */
uint8_t shuffle; /**< Whether to shuffle the data before compressing */
} ZarrCompressionSettings;

/**
* @brief Properties of a dimension of the Zarr array.
*/
typedef struct
{
const char* name; /**< Name of the dimension */
ZarrDimensionType type; /**< Type of the dimension */
uint32_t array_size_px; /**< Size of the array along this dimension in
pixels */
uint32_t chunk_size_px; /**< Size of the chunks along this dimension in
pixels */
uint32_t shard_size_chunks; /**< Number of chunks in a shard along this
dimension */
} ZarrDimensionProperties;

#ifdef __cplusplus
}
#endif

#endif // H_ACQUIRE_ZARR_TYPES_V0
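
To make the three size fields of `ZarrDimensionProperties` concrete, here is a small arithmetic sketch of how they relate along one dimension. The helper names are hypothetical, and the rounding rule (a partial chunk or shard at the end still counts as one) is an assumption rather than something the header specifies.

```cpp
#include <cstdint>

// Ceiling division that cannot overflow; returns 0 when the divisor is 0.
inline std::uint32_t
example_ceil_div(std::uint32_t numerator, std::uint32_t denominator)
{
    if (denominator == 0) {
        return 0;
    }
    return numerator / denominator + (numerator % denominator != 0 ? 1 : 0);
}

// Number of chunks needed to cover array_size_px pixels.
inline std::uint32_t
example_chunks_along_dimension(std::uint32_t array_size_px,
                               std::uint32_t chunk_size_px)
{
    return example_ceil_div(array_size_px, chunk_size_px);
}

// Number of shards needed to hold that many chunks.
inline std::uint32_t
example_shards_along_dimension(std::uint32_t chunk_count,
                               std::uint32_t shard_size_chunks)
{
    return example_ceil_div(chunk_count, shard_size_chunks);
}
```

For example, a 1080-pixel dimension with 256-pixel chunks needs 5 chunks, and with 2 chunks per shard those 5 chunks occupy 3 shards.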
3 changes: 3 additions & 0 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
add_subdirectory(logger)
add_subdirectory(streaming)
Review comment (Collaborator):

If the "logger" is used exclusively by the streaming target and not by the driver, its add_subdirectory call should go in the streaming CMakeLists.txt instead of at the top level.


if (BUILD_ACQUIRE_DRIVER_ZARR)
add_subdirectory(driver)
endif ()
25 changes: 25 additions & 0 deletions src/logger/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

set(tgt acquire-zarr-logger)

add_library(${tgt}
logger.hh
logger.cpp
)

set(PUBLIC_INCLUDE_DIR ${CMAKE_SOURCE_DIR}/include/)

target_include_directories(${tgt}
PUBLIC
$<BUILD_INTERFACE:${PUBLIC_INCLUDE_DIR}>
PRIVATE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
)

set_target_properties(${tgt} PROPERTIES
MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
)

install(TARGETS ${tgt}
LIBRARY DESTINATION lib
)
84 changes: 84 additions & 0 deletions src/logger/logger.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
#include "logger.hh"

#include <chrono>
#include <cstdarg>
#include <cstdio>
#include <ctime>
#include <filesystem>
#include <iomanip>
#include <iostream>
#include <string>
#include <thread>

ZarrLogLevel Logger::current_level_ = ZarrLogLevel_Info;
std::mutex Logger::log_mutex_{};

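void
Logger::set_log_level(ZarrLogLevel level)
{
std::scoped_lock lock(log_mutex_); // avoid racing with concurrent log() calls
current_level_ = level;
}

ZarrLogLevel
Logger::get_log_level()
{
std::scoped_lock lock(log_mutex_);
return current_level_;
}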

std::string
Logger::log(ZarrLogLevel level,
const char* file,
int line,
const char* func,
const char* format,
...)
{
std::scoped_lock lock(log_mutex_);
if (current_level_ == ZarrLogLevel_None || level < current_level_) {
return {}; // Suppress logs
}

va_list args;
va_start(args, format);

std::string prefix;
std::ostream* stream = &std::cout;

switch (level) {
case ZarrLogLevel_Debug:
prefix = "[DEBUG] ";
break;
case ZarrLogLevel_Info:
prefix = "[INFO] ";
break;
case ZarrLogLevel_Warning:
prefix = "[WARNING] ";
stream = &std::cerr;
break;
case ZarrLogLevel_Error:
prefix = "[ERROR] ";
stream = &std::cerr;
break;
}

// Get current time
auto now = std::chrono::system_clock::now();
auto time = std::chrono::system_clock::to_time_t(now);
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
now.time_since_epoch()) %
1000;

// Get filename without path
std::filesystem::path filepath(file);
std::string filename = filepath.filename().string();

// Output timestamp, log level, filename
*stream << std::put_time(std::localtime(&time), "%Y-%m-%d %H:%M:%S") << '.'
<< std::setfill('0') << std::setw(3) << ms.count() << " " << prefix
<< filename << ":" << line << " " << func << ": ";

char buffer[1024];
vsnprintf(buffer, sizeof(buffer), format, args);
*stream << buffer << std::endl;

va_end(args);

return buffer;
}
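
The two core steps of `Logger::log` above, the level check and the printf-style formatting into a fixed buffer, can be isolated as in the sketch below. `ExampleLogLevel` mirrors `ZarrLogLevel`, and `example_should_log` and `example_format` are hypothetical helpers, not functions from this PR.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Local mirror of ZarrLogLevel so the sketch compiles on its own.
enum ExampleLogLevel
{
    ExampleLogLevel_Debug = 0,
    ExampleLogLevel_Info,
    ExampleLogLevel_Warning,
    ExampleLogLevel_Error,
    ExampleLogLevel_None,
};

// Same rule as the logger: drop everything when the current level is
// None, otherwise drop messages below the current level.
inline bool
example_should_log(ExampleLogLevel message_level,
                   ExampleLogLevel current_level)
{
    return current_level != ExampleLogLevel_None &&
           message_level >= current_level;
}

// printf-style formatting into a std::string. Output that does not fit
// in the 1024-byte buffer is truncated, as in the logger.
inline std::string
example_format(const char* format, ...)
{
    char buffer[1024];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);
    return buffer;
}
```

Returning the formatted message, as `Logger::log` does, lets a caller reuse the same text elsewhere, for instance as an exception message.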
30 changes: 30 additions & 0 deletions src/logger/logger.hh
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#pragma once

#include "zarr.types.h"

#include <mutex>
#include <string>

class Logger
{
public:
static void set_log_level(ZarrLogLevel level);
static ZarrLogLevel get_log_level();

static std::string log(ZarrLogLevel level,
const char* file,
int line,
const char* func,
const char* format,
...);

private:
static ZarrLogLevel current_level_;
static std::mutex log_mutex_;
};

#define LOG_DEBUG(...) \
Logger::log(ZarrLogLevel_Debug, __FILE__, __LINE__, __func__, __VA_ARGS__)
#define LOG_INFO(...) \
Logger::log(ZarrLogLevel_Info, __FILE__, __LINE__, __func__, __VA_ARGS__)
#define LOG_WARNING(...) \
Logger::log(ZarrLogLevel_Warning, __FILE__, __LINE__, __func__, __VA_ARGS__)
#define LOG_ERROR(...) \
Logger::log(ZarrLogLevel_Error, __FILE__, __LINE__, __func__, __VA_ARGS__)