Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write up tickets #3

Closed
22 tasks done
volcan01010 opened this issue Aug 16, 2021 · 8 comments
Closed
22 tasks done

Write up tickets #3

volcan01010 opened this issue Aug 16, 2021 · 8 comments

Comments

@volcan01010
Copy link
Collaborator

volcan01010 commented Aug 16, 2021

Tasks

Tasks for V1.0 - Minimum viable product

  • HelloWorld FastAPI running in Docker container (based on what we have)
  • Define validation endpoint
  • Define conversion endpoint
  • Define isvalid endpoint
  • CI pipeline running through to test deployment
  • Define function to do validation (with tests)
  • Define function to do conversion (with tests)
  • Define function to check isvalid (with tests)
  • Wire up validation function
  • Wire up conversion function
  • Wire up isvalid function
  • Create front-end HTML form to call the API
  • Wire up form to API
  • CI pipeline builds container in GitHub #18
  • Bug: Missing file in multi-convert #15
  • Bug: Operational Error in k8s multi-validate
  • Limit Total Upload Size - 50MB

Extra tasks:

API spec

/openapi.json

Parameters: None
Response: JSON specification

/docs

Parameters: None
Response: HTML Swagger page with basic interface

/api/v1/validate

POST: Binary upload of file
Response: File download of log
Errors: 400 - some problem with uploaded file

/api/v1/convert

POST: Binary upload of file
Response: File download of log
Errors: 400 - some problem with uploaded file

/api/v1/isvalid

POST: Binary upload of file
Response: {"result": "true"}
Errors: 400 - some problem with uploaded file

/ returns HTML page with a form

Parameters: None
Response: HTML page with from

  • Other organisations will want to host their own versions
  • Form acts as simple tool
  • Minimum is boxes and text to start with

API questions

  • What do we do if processing takes a long time?
  • How do we handle multiple files upload and download (should be separate endpoint e.g. /convertmany)?
  • Should we have a limit on file size?
  • Do you have to specify the sub-version (4.x) that you are aiming for? This is read from the TRAN section, otherwise assumes version 4.1.

Backend functions

def validate(filename: Path, results_dir: Path) -> Path:
    # return synthetic log with exception details on error
    pass

def convert(filename: Path, results_dir: Path) -> Path:
    # returns converted file
    pass

def isvalid(filename: Path) -> Bool:
    pass

class Ags4CliError:
    """Exception raised in response to error with ags4_cli call."""

Notes:

  • The cli tool allows the output file to be specified. We will create name based on input file and put in common directory.
  • The cli tool has an option to specify the AGS4 dictionary file for each conversion. Should we include this at some point? We don't allow users to upload their own dictionaries.
  • The cli convert tool only does .ags <-> .xlsx, so we don't need to capture that information as it infers the transfer from the filename.
  • The cli convert tool has an option to specify numeric data formatting based on the AGS specification. Should we include this at some point. It looks like the default is much better.
  • The report log is just plain text - should we just respond with that instead each time? But then the end user has to parse it - what if they just want to know pass / fail?
@volcan01010
Copy link
Collaborator Author

volcan01010 commented Aug 16, 2021

Issues with real files:

  • A3040_03.ags - Unbound local error following duplicate headers but script continues (many of these)
  • E52A4379 (2).ags - Duplicate entries in MONR causes CLI to block and ask for user input to allow column renaming (a few of these). We should not do renaming, because then the file that is validated is not the file that was uploaded.
  • CG014058_F.ags - Crashes with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 20618: invalid continuation byte". This corresponds to a \ae symbol.
  • f12743_P.ags - Crashes with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 76069: invalid start byte". This corresponds to a ^3 (cubed) symbol in the file.

Many files default to the standard dictionary. Most files have errors. Many have thousands of errors. One had 205529 errors.

Some of the files with errors were ISO-8859 encoded. I remember Jenn having all kinds of file encoding issues before. For Unicode errors, the API could report the line number of the offending character. The exception message reports the position in bytes, but line number would be more user friendly.

@volcan01010
Copy link
Collaborator Author

Some files fail and don't get a full output e.g. A1001.ags if they fail. They instead write an error log, but it doesn't contain the full header information.

We should find that text in the output and use it to create a better log file.

@volcan01010
Copy link
Collaborator Author

Can we provide a Python / Requests code snippet in the README that shows how to upload and validate a file in a script?

@volcan01010
Copy link
Collaborator Author

Bugs:

When running against the internally deployed version:

  • Run validatemany with random_binary.ags, nonsense.ags, example1.xlsx, example1.ags, empty.ags, dummy.xlsx as the 6 input files from the landing page. It returns a server error. It should return the concatenated logs for the files (and does on my locally running version)

  • Run convertmany with example1.ags, empty.ags, dummy.xls as the 3 files from the landing page. It returns a zip with empty.xlsx and example1.xlsx. empty.xlsx is corrupt. The conversion log states that all 3 files were converted successfully. The zip should only contain example.xlsx and the log should describe errors for the other two files.

@KoalaGeo
Copy link
Collaborator

Added requirement to limit the upload size to 50MB (would cover vast majority of use cases).

This would be total size limit, so one 50 MB file or fifty 1MB files

May need to set maximum number of files if performance is poor.

Solutions

@volcan01010
Copy link
Collaborator Author

I've moved the contributions to upstream project out to a separate ticket #24

@volcan01010
Copy link
Collaborator Author

I've moved the file size limit to version 2 (#30)

@volcan01010
Copy link
Collaborator Author

All tasks done here and released as v1.1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants