This document is intended for Java developers familiar with writing web applications. The code uses hapi-server's TimeUtil.java class for parsing and formatting times, and its URITemplate.java class for resolving time ranges into sub-intervals and for other places where formatted times are needed.
There are presently four or more different Java server implementations. It would be useful to have a framework which handles the "HAPI parts" of the server, such as input validation and data handling, so that those setting up servers can focus on the data sources feeding into the server.
This will be a Java servlet which implements the HAPI protocol (version 3.1), using plugins written and maintained by the data steward. How these plugins connect will be documented below, and may evolve as the server matures, but the intent is that plugins for the Python server might also be used with this server implementation.
Much of this is inspired by lessons learned implementing Das2Servers, where a Perl script hands off control to a reader, which generates a stream of data on stdout that passes through the server and out to the client. The Das2Server performs additional operations, such as data reduction, before sending the stream to the client. The state-of-the-art Das2Server is implemented in Python and provides caches of reduced data. The HAPI server has similarities, except that subsetting and formatting must be done to service the request.
This will provide a number of features which the other implementations may be missing. The mature HAPI server will handle input validation, protecting the server from hostile attacks. It may provide caching of data when the readers are slow, and also validate that reader responses are correct. Readers should be allowed to be sloppy, so that boilerplate code for subsetting needn't appear in each reader.
Last, we intend to support use with Docker containers, so that the server can be set up easily at new sites. A war file will be the initial target, and a Docker container will be provided later.
This code can be cloned and compiled against a server such as Apache Tomcat, version 8.0 and up. The server will initially have one dataset, intended to give a simple way to check that everything is running properly.
The server has one variable which configures it, hapi_home. This is loaded into the server using getServletContext().getInitParameter("hapi_home"), and when it is left at the default, /tmp/hapi-server/, the environment variable HAPI_HOME is used instead. So either web.xml is edited, or your container may allow this parameter to be set externally, or the environment variable HAPI_HOME can be set; a sketch of this resolution order follows the listing below. hapi_home names a web-process-writable directory where the configuration is stored, along with computed and cached responses. When the server is first run, a configuration containing an example data source is created. Here is /tmp/hapi-server/, the default location:
spot9> ls /tmp/hapi-server/
about.json capabilities.json catalog.json config data info
spot9> ls /tmp/hapi-server/config/
catalog.json config.json
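The resolution order described above can be pictured with a short sketch. The helper class below is hypothetical; only getServletContext().getInitParameter("hapi_home"), the default /tmp/hapi-server/, and the HAPI_HOME environment variable come from the description above.

```java
import javax.servlet.ServletContext;

/**
 * Hypothetical helper illustrating the hapi_home resolution order described above:
 * the servlet init parameter is used unless it is left at the default, in which case
 * the HAPI_HOME environment variable (if set) takes over.
 */
public class HapiHomeResolver {
    private static final String DEFAULT_HAPI_HOME = "/tmp/hapi-server/";

    public static String resolveHapiHome(ServletContext context) {
        String hapiHome = context.getInitParameter("hapi_home");
        if (hapiHome == null || hapiHome.equals(DEFAULT_HAPI_HOME)) {
            String env = System.getenv("HAPI_HOME");
            hapiHome = (env != null) ? env : DEFAULT_HAPI_HOME;
        }
        return hapiHome;
    }
}
```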
The file config/catalog.json is the configuration for the catalog. This can be just the contents that should be sent when a catalog request is made. When the server is started and the first catalog request is made, this file's timestamp is compared to that of catalog.json in the main directory, and updates are accepted into the server. For example, a new dataset is added and the JSON saved. When a request is made for the catalog, validity checks are made and the update is accepted, or a JSON formatting error is caught and the server's old catalog is served until the error is corrected.
Entries can also refer to an external source which will generate the catalog. For example, the HAPI server may be wrapping another server type, simply translating responses. In this case, "x_group_id" and "x_source" identify how the catalog entries will be read in.
"catalog" : [
{
"x_group_id": "wrap_hapi_server",
"x_source": "spawn",
"x_command": "wget -O - https://jfaden.net/HapiServerDemo/hapi/catalog",
"x_config": {
"info": {
"source":"spawn",
"command":"wget -O - https://jfaden.net/HapiServerDemo/hapi/info?id=${id}"
},
"data": {
"source": "spawn",
"command": "wget -O - https://jfaden.net/HapiServerDemo/hapi/data?id=${id}&start=${start}&stop=${stop}"
}
}
}
]
In this example, where x_source is "spawn", the command "wget" is spawned and the result will be a catalog. The entries in this catalog will be merged into the dataset catalog, replacing the one entry with the set returned by the spawn call.
The source can also be "classpath", in which case a Java method will be called. The jar file, the class within the jar file, and the static method within the class are identified. The server will call this method, and the JSON string returned will be the catalog which is merged in.
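As a hypothetical illustration of the classpath case, such a static method might look like the sketch below; the class and method names are made up, and only the requirement that the method return a catalog JSON string comes from the description above.

```java
package org.example.catalog;

/**
 * Hypothetical catalog source for the "classpath" case: the server calls the configured
 * static method and merges the returned JSON string into its catalog.
 */
public class ExampleCatalogSource {
    public static String getCatalog() {
        return "{ \"catalog\": [ "
             + "{ \"id\": \"dataset1\", \"title\": \"Example dataset 1\" }, "
             + "{ \"id\": \"dataset2\", \"title\": \"Example dataset 2\" } "
             + "] }";
    }
}
```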
The configured catalog source is run any time the timestamp of the cached response is older than that of the catalog configuration. So to cause the source to rerun and update the cache file, simply touch the configuration file to update its timestamp:
touch ${HAPI_HOME}/config/catalog.json
and then reload the page. Alternatively, the cached response, ${HAPI_HOME}/catalog.json, can be deleted.
The file ${HAPI_HOME}/config/x_landing.json allows you to customize which datasets are shown on the landing page, with the JSON arrays "x_landing_include" and "x_landing_exclude", and the integer "x-landing-count".
Each dataset within the HAPI server has a unique id. For example, https://cdaweb.gsfc.nasa.gov/hapi/info?id=GOES13_EPS-MAGED_1MIN uses the id GOES13_EPS-MAGED_1MIN. This will be paired with a time range to request data from the HAPI server. In the server config area, files like <id>.json configure each identifier. This will control the info response as well as the data response.
The file <id>.json contains two nodes, "info" and "data", which specify each response. If the info node contains a well-formed info response, this response is sent out from the server when info for this id is requested. As with the catalog, sanity checks are performed to make sure the info response is properly formed. Once accepted into the server, the info directory will contain the info response. This can be thought of as a cache which is used to quickly provide responses, but it also provides transparency so you can see what the server is doing.
The info response can contain macros which allow the specification to remain unchanged even though values change. For example, "stopTime" indicates the end of a dataset, and macros like "now-PT24H" can be used to indicate that the data is valid up to 24 hours ago. Other macros include:
macro | meaning |
---|---|
now | the current time |
now-P1D | 24 hours ago |
lastday | last midnight boundary |
lastday-P5D | five days before the last midnight boundary |
lastminute, lasthour, lastmonth | other boundaries |
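The sketch below only illustrates the intended semantics of these macros using java.time; it is not the server's implementation, and the UTC midnight boundary assumed for lastday is mine.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Illustration of the stopTime macro semantics from the table above (not server code). */
public class MacroDemo {
    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println("now         = " + now);
        System.out.println("now-P1D     = " + now.minus(Duration.ofDays(1)));
        Instant lastday = now.truncatedTo(ChronoUnit.DAYS);  // last midnight boundary (UTC assumed)
        System.out.println("lastday     = " + lastday);
        System.out.println("lastday-P5D = " + lastday.minus(Duration.ofDays(5)));
        System.out.println("lasthour    = " + now.truncatedTo(ChronoUnit.HOURS));
        System.out.println("lastminute  = " + now.truncatedTo(ChronoUnit.MINUTES));
    }
}
```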
The file can also contain an info node with "source"="spawn" and a command which is run to provide the response. When this is the case, the command is run and the result is stored in the HAPI_HOME/info cache. Updating the configuration file timestamp (with the touch command) will cause the configuration to be re-read and executed.
Some servers will have many dataset ids which all have the same info response. In this case, the one info response can be specified using the file "config.json". Note this means that "config" and "catalog" cannot be used as dataset names.
Note if <id>.json is used, then updates to catalog.json or config.json are ignored.
Last, if a parameter in the info response has x_format specified to be a Python or Java-style format string, like "%.3f" or "%.4e", then this will be used to format the CSV responses.
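For example, a formatter along these lines could apply x_format when writing CSV values; the helper below is illustrative and is not the server's actual formatting code.

```java
import java.util.Locale;

/** Sketch of applying a parameter's x_format (e.g. "%.3f") when writing CSV values. */
public class FormatDemo {
    static String formatValue(double value, String xFormat) {
        if (xFormat == null) {
            return String.valueOf(value);              // no x_format: default formatting
        }
        return String.format(Locale.US, xFormat, value);
    }

    public static void main(String[] args) {
        System.out.println(formatValue(3.14159265, "%.3f"));  // 3.142
        System.out.println(formatValue(12345.678, "%.4e"));   // 1.2346e+04
    }
}
```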
The "data" node of configuration files controls how data is read in. There are a number of supported "source" values. For example, "spawn" and then "command" specify a command which is run and the csv data response is provided. Here is a list of supported sources:
source | arguments | description |
---|---|---|
spawn | command | the command is run and its response on stdout is the data response |
hapiserver | url | another HAPI server provides this data |
aggregation | files | template spec like $Y$m$d.csv is used to combine files |
classpath | class | a HapiRecordSource object is created and used to create records |
HapiRecordSource.java is an interface which describes a source of HAPI records. It is created by SourceRegistry.getSource, which looks up the data node from the configuration. A HapiRecordSource has the methods:
method | description |
---|---|
hasParamSubsetIterator() | true means the data source can return a subset of the data. |
getIterator(start,stop,params) → Iterator<HapiRecord> | returns an iterator which has a subset of the parameters. |
getIterator(start,stop) → Iterator<HapiRecord> | returns an iterator which returns all parameters. |
hasGranuleIterator() | returns true when the data source must be called one granule at a time. The server itself will aggregate the data. |
getGranuleIterator(start,stop) → Iterator<int[]> | returns an iterator which provides the time ranges which must be called. These will be 14-integer time ranges. |
getTimeStamp() | returns null or the timestamp of the most recent change to the data, so that the client can cache results. |
The HapiRecordSource interface can be found in the source repository.
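The following is a sketch of the interface reconstructed from the method table above; the package, parameter types, and exact signatures are assumptions, so consult the repository for the authoritative definition. HapiRecord is hapi-server's record interface and is not repeated here.

```java
package org.hapiserver;

import java.util.Iterator;

/**
 * Sketch of HapiRecordSource, reconstructed from the method table above.
 * Parameter types and signatures are assumptions; see the repository for the real interface.
 */
public interface HapiRecordSource {

    /** true means the data source can return a subset of the parameters. */
    boolean hasParamSubsetIterator();

    /** returns an iterator which provides a subset of the parameters. */
    Iterator<HapiRecord> getIterator(int[] start, int[] stop, String[] params);

    /** returns an iterator which returns all parameters. */
    Iterator<HapiRecord> getIterator(int[] start, int[] stop);

    /** returns true when the data source must be called one granule at a time. */
    boolean hasGranuleIterator();

    /** returns the 14-integer time ranges (7-component start plus 7-component stop) to call. */
    Iterator<int[]> getGranuleIterator(int[] start, int[] stop);

    /** returns null or the timestamp of the most recent change, so clients can cache results. */
    String getTimeStamp();
}
```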
With the "spawn" source, a command is executed to create a stream of HAPI records. Let's start with an example:
/home/jbf/ct/hapi/git/server-java/SSCWebServer/src/SSCWebReader.sh data ${id} ${start} ${stop}
In this case the start time, stop time, and id are inserted into the command line and the command is executed. Its stdout is then fed into the data servlet, which performs additional operations on the data, such as:
- making sure the response has the correct fields, parsing and reformatting times and other fields to the requested format.
- trimming data which is outside of the bounds specified by start and stop, making sure the response is compliant (see the sketch below).
- caching slow responses so that if requested again they will be faster.
This is not yet implemented.
Note that stderr can be used for debug messages. Also note that the server does not sort records, which must be monotonic in time, nor cull out-of-order records; this might be done in the future.
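The trimming step above can be illustrated with a small sketch. This is not the server's code; it assumes the first CSV field is an ISO-8601 time in a fixed-width form, so that string comparison orders times correctly.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration of trimming records whose times fall outside [start, stop). */
public class TrimDemo {
    static List<String> trim(List<String> csvRecords, String start, String stop) {
        List<String> result = new ArrayList<>();
        for (String record : csvRecords) {
            String time = record.split(",", 2)[0];
            if (time.compareTo(start) >= 0 && time.compareTo(stop) < 0) {
                result.add(record);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
            "2023-01-01T23:59:00Z,1.0",
            "2023-01-02T00:00:00Z,2.0",
            "2023-01-02T12:00:00Z,3.0",
            "2023-01-03T00:00:00Z,4.0");
        // keep only records within 2023-01-02, exclusive of the stop time
        trim(records, "2023-01-02T00:00:00Z", "2023-01-03T00:00:00Z").forEach(System.out::println);
    }
}
```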
It might also be the case that the reader doesn't accept times in arbitrary formats, so the server provides timeFormat and granuleSize parameters. In the config json file, this might look like:
{
"info":{
"source":"spawn",
"command":"/home/jbf/ct/hapi/git/server-java/SSCWebServer/src/SSCWebReader.sh info ${id}"
},
"data": {
"source": "spawn",
"command": "/home/jbf/ct/hapi/git/server-java/SSCWebServer/src/SSCWebReader.sh data ${id} ${start} ${stop}",
"timeFormat": "$Y-$m-$dZ",
"granuleSize": "P1D"
}
}
timeFormat specifies that the data reader command is expecting times to be formatted like $Y-$m-$dZ (see URI_Templates). granuleSize set to P1D means that the request is broken up into daily granules (P1D is a duration of one day), and the reader is called for each granule.
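The effect of these two settings can be sketched as follows. The sketch uses java.time for illustration only (the server itself uses the URITemplate and TimeUtil classes mentioned in the introduction), and the reader script name is taken from the example above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustration of granuleSize "P1D" with timeFormat "$Y-$m-$dZ": one reader call per day. */
public class GranuleDemo {
    public static void main(String[] args) {
        LocalDate start = LocalDate.parse("2023-06-01");
        LocalDate stop  = LocalDate.parse("2023-06-04");   // exclusive
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'Z'");  // like $Y-$m-$dZ
        for (LocalDate day = start; day.isBefore(stop); day = day.plusDays(1)) {
            String cmd = String.format("SSCWebReader.sh data ${id} %s %s",
                    day.format(fmt), day.plusDays(1).format(fmt));
            System.out.println(cmd);   // one reader invocation per daily granule
        }
    }
}
```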
The data command can also have ${parameters}, which means that the server can call the reader to read a specific set of parameters. For example, we wouldn't want to force the client to read all data out of a CDF file to read just one parameter, so the reader can handle this. When ${parameters} is missing, the reader must return all parameters and the server will subset the parameters for the client.
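A sketch of the macro substitution, for illustration only: the helper below is hypothetical and just shows ${id}, ${start}, ${stop}, and ${parameters} being filled in before the command is spawned.

```java
/** Hypothetical illustration of filling in the macros of a "spawn" command template. */
public class CommandDemo {
    static String fillCommand(String template, String id, String start, String stop, String parameters) {
        String cmd = template.replace("${id}", id)
                             .replace("${start}", start)
                             .replace("${stop}", stop);
        if (parameters != null) {
            cmd = cmd.replace("${parameters}", parameters);
        }
        return cmd;
    }

    public static void main(String[] args) {
        String template = "reader.sh data ${id} ${start} ${stop} ${parameters}";
        System.out.println(fillCommand(template, "dataset1",
                "2023-06-01T00:00Z", "2023-06-02T00:00Z", "Time,Magnitude"));
    }
}
```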
The "classpath" source allows a custom Java class to read in the data.
{
"info":{ "...":"omitted for brevity" },
"data": {
"source": "classpath",
"class": "org.hapiserver.WindSwe2mDataSource",
"args": [ "file:/home/jbf/ct/data.backup/2022/wind_swe_2m/","${id}" ]
}
}
This will create an instance of org.hapiserver.WindSwe2mDataSource, which is a HapiRecordSource used to load data. The constructor is called with the arguments specified in "args", each of which can be a String or a number. If args is missing, then four default arguments are passed in: HAPI_HOME, id, info, and data-config. The arguments info and data-config are JSONObjects.
The tag "classpath" is used to specify the name of a jar file which contains the class.
The "aggregation" source combines pre-formatted CSV responses into a stream of HapiRecords. A file template is specified using the URI_Templates specification.
{
"info":{ "...":"omitted for brevity" },
"data": {
"source": "aggregation",
"files": "https://jfaden.net/~jbf/data/gardenhouse/data/PoolTemperature/$Y/28.FF6319A21705.$Y$m$d.csv"
}
}
This will read in files within the time range specified, combining them into one stream.
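For illustration, the sketch below shows which files this template would resolve to for a two-day request. It uses java.time formatters standing in for the $Y and $Y$m$d fields; the server itself resolves the template with the URITemplate class.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustration of resolving the aggregation file template over a two-day range. */
public class AggregationDemo {
    public static void main(String[] args) {
        String template = "https://jfaden.net/~jbf/data/gardenhouse/data/PoolTemperature/"
                + "%1$s/28.FF6319A21705.%2$s.csv";   // stands in for the $Y and $Y$m$d fields
        DateTimeFormatter year = DateTimeFormatter.ofPattern("yyyy");
        DateTimeFormatter ymd  = DateTimeFormatter.ofPattern("yyyyMMdd");
        LocalDate start = LocalDate.parse("2023-07-01");
        LocalDate stop  = LocalDate.parse("2023-07-03");   // exclusive
        for (LocalDate day = start; day.isBefore(stop); day = day.plusDays(1)) {
            System.out.println(String.format(template, day.format(year), day.format(ymd)));
        }
    }
}
```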