diff --git a/docs/new_docs/api.md b/docs/new_docs/api.md new file mode 100644 index 00000000..d705264d --- /dev/null +++ b/docs/new_docs/api.md @@ -0,0 +1,193 @@ +### Base API URL: /seeder/api/ + +By default, all API endpoints required the user to be authenticated either in Session (using a cookie) or by an API Token. + +## TODO + +- [ ] Get rid of deprecated Harvest URLs both in API and in Harvest catalogue. +- [ ] https://github.com/WebarchivCZ/Seeder/issues/593 : Better endpoints for Harvests/Collections by date, should replace the current ones. +- [ ] Rewrite the rest of the Harvest URLs and perhaps Dumps to REST ViewSets so REST Framework authentication can be enforced. + +--- + +# Authentication + +### `/seeder/api/token/ [POST]` + +Get a REST Framework token by logging in. + +`POST {username: str, password: str}`: returns `{token: str}` + +### `/seeder/api/auth/login/ [GET, POST]` + +Classic HTML login page. Can also submit {username, password} with POST but a CSRF token is needed. + +### `/seeder/api/auth/logout/ [GET, POST]` + +Simple logout page, any request to it will log out the current user. + +# HarvestConfig + +**API is only accessible using Session authentication, not using a Token.** + +### `/seeder/api/harvest_config/ [GET]` + +- `GET`: List all available Harvest Configurations + +``` +[ + { + "harvest_type": "serials", + "duration": 259200, + "budget": 10000, + "dataLimit": 10000000000, + "documentLimit": 0, + "deduplication": "PATH" + }, + ... +] +``` + +- `GET ?harvest_type={type}`: \ + Retrieve only configurations with the specified `{type}`, still in a list. \ + `{harvest_type}` is a unique property (only one can exist) but if none exist, returns an empty list. + +### `/seeder/api/harvest_config/{id}/ [GET]` + +- `GET $id`: Retrieve a Harvest Configuration by its ID – **not used** + +# Blacklists + +### `/seeder/api/blacklist/ [GET]` + +- `GET`: List all available Blacklists + +``` +[ + { + "id": 1, + "active": true, + "created": "2019-07-03T10:18:30.490837Z", + "last_changed": "2019-07-08T09:56:21.955846Z", + "title": "2. uroven Whitelist Svet", + "blacklist_type": 1, + "url_list": "1229.webnode.cz\r\n14-15.cz\r\n147.228.94.30\r\n147.231.53.91\r\n..." + }, + ... +] +``` + +### `/seeder/api/blacklist/{id}/ [GET]` + +- `GET $id`: Retrieve a specific Blacklist by its ID and all its fields. + +### `/seeder/api/blacklist/lastchanged/ [GET]` + +- `GET`: Retrieve the ISO timestamp of the last change across all Blacklists. + +``` +{ "lastChanged": "2022-05-05T12:00:46.944898Z" } +``` + +# Data Retrieval & Updating + +### `/seeder/api/category/{id}/ [GET]` + +- `GET $id`: Retrieve a specific Category by its ID and all its fields. + +### `/seeder/api/source/{id}/ [GET, PATCH]` + +- `GET $id`: Retrieve a specific Source by its ID and all its fields. +- `PATCH $id {partial_fields}`: Update selected Source's fields using partial update. + +### `/seeder/api/seed/{id}/ [GET, PATCH, PUT]` + +- `GET $id`: Retrieve a specific Seed by its ID and all its fields. +- `PATCH $id {partial_fields}`: Update selected Seed's fields using partial update. +- `PUT $id {fields}`: Update selected Seed's fields in full. + +# Harvest URLs + +At the moment, these are all Django views rather than REST Framework ViewSets, since originally, these were meant to return plaintext rather than JSON. + +Dates are in the format `{YYYY-MM-DD}`, e.g. "2022-05-05". + +Most of these endpoints are now deprecated or should become such. + +### `/seeder/harvests/{date}/harvests [GET]` + +Retrieve a plaintext list of `/seeder/harvest/{id}/urls` endpoints for the Harvests scheduled on the specified date. \ +Each URL is on a separate line. + +### `/seeder/harvests/{id}/urls [GET]` + +Retrieve a plaintext list of all the Seeds (URLs) for a specific Harvest by its ID. \ +Each Seed is on a separate line. + +### `/seeder/harvests/{id}/json [GET]` + +Retrieve a more robust set of information about a specific Harvest by its ID. + +The information includes relevant dates, Harvest's metadata, and all its Seeds split into their relevant collections. + +``` +{ + "idHarvest": 32, + "dateGenerated": "2022-05-05T09:01:56.748312+00:00", + "dateFrozen": null, + "plannedStart": "2021-12-23T11:26:00+00:00", + "type": "serials", + "combined": false, + "name": "Serials_2021-12-23_M0-OneShot", + "anotation": "Serials sklizeň s frekvencí 0x ročně ~ Serials sklizen pro OneShot+Custom seminka", + "hash": "ef9a1bef68e24d9ab5481c5f685fc7b6", + "seedsNo": 94, + "duration": 259200, + "budget": 10000, + "dataLimit": 10000000000, + "documentLimit": 0, + "deduplication": "PATH", + "collections": [ + { + "name": "Serials_M0_2021-12-23", + "collectionAlias": "M0", + "annotation": "Serials sklizeň s frekvencí 0x ročně", + "nameCurator": null, + "idCollection": null, + "aggregationWithSameType": true, + "hash": "fd2c9e4e61377fb93b5ffee86d815f55", + "seedsNo": 48, + "seeds": [ + "http://casodej.cz/analyza17.htm", + "http://dejepis.pajka.info", + "http://dieceze.misto.cz", + ... + ] + }, + ... + ] +} +``` + +### `/seeder/harvests/{date}/shortcut_urls [GET]` + +Retrieve a list of `/seeder/harvests/{date}/seeds-{date}-{shortcut}.txt [GET]` endpoints for the Harvests scheduled on the specified date. \ +Each URL is on a separate line. + +### `/seeder/harvests/{date}/seeds-{date}-{shortcut}.txt [GET]` + +Retrieve a list of Seeds for a particular date and for a particular "shortcut", e.g. "V1", "OneShot", "ArchiveIt", ... + +This endpoint is now deprecated, and while it can still be accessed, it shouldn't be used as it doesn't return sensible data. + +# Dumps + +### `/seeder/source/dump [GET]` + +Retrieve a dump of all public Seeds, one per line. \ +**Accessible publicly, without authentication.** + +### `/seeder/blacklists/dump [GET]` + +Retrieve a dup of all URLs on all Blacklists, one per line. \ +**Accessible publicly, without authentication.** diff --git a/docs/new_docs/openapi_schema.yml b/docs/new_docs/openapi_schema.yml new file mode 100644 index 00000000..bdae6b98 --- /dev/null +++ b/docs/new_docs/openapi_schema.yml @@ -0,0 +1,1348 @@ +openapi: 3.0.2 +info: + title: '' + version: '' +paths: + /seeder/api/category/{id}/: + get: + operationId: retrieveCategory + description: '' + parameters: + - name: id + in: path + required: true + description: A unique integer value identifying this category. + schema: + type: string + responses: + '200': + content: + application/json: + schema: + properties: + id: + type: integer + readOnly: true + name: + type: string + maxLength: 150 + sub_categories: + type: array + items: + properties: + id: + type: integer + readOnly: true + name: + type: string + maxLength: 255 + subcategory_id: + type: string + nullable: true + maxLength: 40 + required: + - name + required: + - name + - sub_categories + description: '' + /seeder/api/source/{id}/: + get: + operationId: retrieveSource + description: Viewset that does not implement listing, deleting and creating + of sources. + parameters: + - name: id + in: path + required: true + description: A unique integer value identifying this Source. + schema: + type: string + responses: + '200': + content: + application/json: + schema: + properties: + id: + type: integer + readOnly: true + publisher: + properties: + id: + type: integer + readOnly: true + contacts: + type: array + items: + properties: + id: + type: integer + readOnly: true + active: + type: boolean + created: + type: string + format: date-time + readOnly: true + last_changed: + type: string + format: date-time + readOnly: true + name: + type: string + nullable: true + maxLength: 256 + email: + type: string + format: email + maxLength: 254 + phone: + type: string + nullable: true + maxLength: 128 + address: + type: string + nullable: true + position: + type: string + nullable: true + maxLength: 256 + publisher: + type: integer + required: + - email + - publisher + active: + type: boolean + created: + type: string + format: date-time + readOnly: true + last_changed: + type: string + format: date-time + readOnly: true + name: + type: string + maxLength: 150 + required: + - contacts + - name + type: object + readOnly: true + seed: + properties: + id: + type: integer + readOnly: true + active: + type: boolean + created: + type: string + format: date-time + readOnly: true + last_changed: + type: string + format: date-time + readOnly: true + main_seed: + type: boolean + url: + type: string + format: uri + maxLength: 200 + pattern: "^(?:[a-z0-9\\.\\-\\+]*)://(?:[^\\s:@/]+(?::[^\\\ + s:@/]*)?@)?(?:(?:0|25[0-5]|2[0-4]\\d|1\\d?\\d?|[1-9]\\d?)(?:\\\ + .(?:0|25[0-5]|2[0-4]\\d|1\\d?\\d?|[1-9]\\d?)){3}|\\[[0-9a-f:\\\ + .]+\\]|([a-z\xA1-\uFFFF0-9](?:[a-z\xA1-\uFFFF0-9-]{0,61}[a-z\xA1\ + -\uFFFF0-9])?(?:\\.(?!-)[a-z\xA1-\uFFFF0-9-]{1,63}(?