unclear testing for query references #95

Open
nbrinckm opened this issue Jun 26, 2020 · 7 comments

nbrinckm commented Jun 26, 2020

Description

Hi there, I am trying to create a service that works on a GeoJSON file.
The basic idea is to split a city (with all its buildings) into parts of equal size (in terms of the number of buildings), with the buildings in each cluster close to each other.

(The background is a task to create surveys in which students check taxonomies.)

I can run the code itself in a process.
My main problem is the testing. As my test data set is the whole city of Chia, Colombia, I run into trouble with the maximum request size.

For the input data I am already able to set the config (as it is a global object) in the pywps configuration module.
For the output, I can ask the pywps server to give back a reference.

What I am really struggling with is querying this reference in the test case:

import os
from pywps import Service, configuration
from pywps.tests import client_for, assert_response_success

import time


from .common import get_output, WPS, OWS, WpsClient
from babybird.processes.wps_split_buildings import SplitBuildings

import geopandas
import requests

# Some of the test code is from here:
# https://github.com/bird-house/emu/blob/master/tests/test_wps_poly_centroid.py

def test_wps_building_splitter():

    current_dir = os.path.dirname(os.path.abspath(__file__))
    data_file = os.path.join(current_dir, 'buildings.json')
    with open(data_file, 'r') as infile:
        data_file_str = infile.read()
    n_parts = 4

    service = Service(processes=[SplitBuildings()])
    print(dir(service))
    client = client_for(service)

    process_identifier = 'splitbuildings'

    configuration.CONFIG.set('server', 'maxrequestsize', '10gb')

    output_element = WPS.Output(
        OWS.Identifier('splittedbuildings'),
    )
    output_element.attrib['asReference'] = 'true'

    response_document_element = WPS.ResponseDocument(
        output_element
    )
    response_document_element.attrib['lineage'] = 'true'
    response_document_element.attrib['status'] = 'true'

    response_form_element = WPS.ResponseForm(response_document_element)

    request_doc = WPS.Execute(
        OWS.Identifier(process_identifier),
        WPS.DataInputs(
            WPS.Input(
                OWS.Identifier('buildings'),
                WPS.Data(WPS.ComplexData(data_file_str))
            ),
            WPS.Input(
                OWS.Identifier('count'),
                WPS.Data(WPS.LiteralData(str(n_parts)))  # must be a string
            )
        ),
        response_form_element,
        version='1.0.0'
    )

    resp = client.post_xml(doc=request_doc)
    assert_response_success(resp)
    outputs = get_output(resp.xml)
    assert 'splittedbuildings' in outputs.keys()

    url_to_fetch = outputs['splittedbuildings']
    print(url_to_fetch)

    output_data = client.get(url_to_fetch)
    print(output_data)

The service itself is like this:

from pywps import Process, ComplexInput, LiteralInput, LiteralOutput, UOM, ComplexOutput
from pywps.app.Common import Metadata
from pywps import FORMATS

import geopandas

import logging
LOGGER = logging.getLogger("PYWPS")


class SplitBuildings(Process):
    """A process to split buildings in parts."""
    def __init__(self):
        inputs = [
            ComplexInput(
                "buildings", 
                "The buildings to split", 
                abstract="the buildings to split.",
                supported_formats=[
                    FORMATS.JSON,
                ]
            ),
            LiteralInput(
                "count",
                "The count of parts",
                abstract="The count of parts that we want to get.",
                data_type="integer",
            )
        ]
        outputs = [
            ComplexOutput(
                "splittedbuildings",
                "The splitted buildings",
                abstract="The buildings with an area index.",
                supported_formats=[
                    FORMATS.JSON,
                ]
            )
        ]

        super(SplitBuildings, self).__init__(
            self._handler,
            identifier="splitbuildings",
            title="Split the buildings",
            abstract="Split buildings into parts (adding an area index).",
            keywords=['json', 'buildings'],
            metadata=[
                Metadata('PyWPS', 'https://pywps.org/'),
                Metadata('Birdhouse', 'http://bird-house.github.io/'),
                Metadata('PyWPS Demo', 'https://pywps-demo.readthedocs.io/en/latest/'),
            ],
            version='1.0',
            inputs=inputs,
            outputs=outputs,
            store_supported=True,
            status_supported=True
        )

    @staticmethod
    def _handler(request, response):
        geojson_input_file = request.inputs['buildings'][0].file
        n_parts = request.inputs['count'][0].data
        data = geopandas.read_file(geojson_input_file, driver="GeoJSON")

        # some more processing...
        data['areaindex'] = n_parts

        data.to_file('outputfile.geojson', 'GeoJSON')

        response.outputs['splittedbuildings'].file = 'outputfile.geojson'
        return response

When I try to query the URL, it doesn't work. (I guess this is partly because the application may not be running on that port; however, I haven't found any documentation on which port it uses or how to change the URL.)

As I wrote, I don't know how to get the result back in the test case so that I can check the data after processing. The test cases I have found so far (in the emu repo, for example) only process literal strings or check that the WPS process executed successfully; none of them query the reference URLs.
Please help me to understand what I have to do here.
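
For context, this is roughly what extracting the reference URL from the Execute response amounts to. It is only a sketch using the standard library, not the actual get_output helper from the test suite; the sample response is hand-written, and accepting both a plain and an xlink-qualified href attribute is an assumption to cover different server versions.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WPS 1.0.0 responses.
NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def output_references(execute_response_xml):
    """Map each output identifier to its reference URL, if it has one."""
    root = ET.fromstring(execute_response_xml)
    refs = {}
    for output in root.findall("wps:ProcessOutputs/wps:Output", NS):
        identifier = output.find("ows:Identifier", NS)
        reference = output.find("wps:Reference", NS)
        if identifier is None or reference is None:
            continue
        # Assumption: the href may be plain or xlink-qualified.
        href = reference.get("href") or reference.get(XLINK_HREF)
        if href:
            refs[identifier.text] = href
    return refs


# Hand-written sample response, not captured from a real server.
SAMPLE = """<wps:ExecuteResponse
    xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1">
  <wps:ProcessOutputs>
    <wps:Output>
      <ows:Identifier>splittedbuildings</ows:Identifier>
      <wps:Reference href="http://localhost:5000/outputs/abc/outputfile.geojson"
                     mimeType="application/json"/>
    </wps:Output>
  </wps:ProcessOutputs>
</wps:ExecuteResponse>"""

print(output_references(SAMPLE))
```

The open question is then how to turn that URL into actual file content inside the test.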

Environment

  • Cookiecutter version: 5351c2f
  • Python version: 3.6.8
  • Operating System: Ubuntu 18.10

Steps to Reproduce

  • cloned the cookiecutter-birdhouse repo
  • created a virtual environment and activated it
  • installed the dependencies from the requirements*.txt files
  • ran make bake (and continued in the babybird folder)
  • installed the dependencies there from the requirements*.txt files
  • installed geopandas
  • wrote the two files mentioned above
  • ran make test

Additional Information

@huard
Collaborator

huard commented Jun 26, 2020

Hi Nils,

I think the issue is that the test server is not a file server, so it does not serve the output files. However, they should be somewhere on your disk. Note that our test config usually includes

[server]
allowedinputpaths=/

which might help in your case.
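
As a rough sketch, such a test config could also be generated from Python with the standard-library configparser. The file name test.cfg and the outputpath and outputurl values are made-up examples, and pointing PyWPS at the file through the PYWPS_CFG environment variable is an assumption about how it would get picked up in your setup.

```python
import configparser
import os
import tempfile


def write_test_config(directory):
    """Write a small PyWPS-style config file and return its path."""
    config = configparser.ConfigParser()
    config["server"] = {
        "allowedinputpaths": "/",
        "maxrequestsize": "10gb",
        # Example values only; adjust to wherever outputs should be stored.
        "outputpath": os.path.join(directory, "outputs"),
        "outputurl": "http://localhost:5000/outputs",
    }
    path = os.path.join(directory, "test.cfg")
    with open(path, "w") as handle:
        config.write(handle)
    return path


cfg_path = write_test_config(tempfile.mkdtemp())
# Assumption: PyWPS reads an additional config file named by PYWPS_CFG.
os.environ["PYWPS_CFG"] = cfg_path
```

Keeping these settings in a file avoids mutating the global configuration object from inside individual tests.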

Also, you might want to take a look at owslib to make WPS queries. I've recently added support to retrieve files from the local filesystem for exactly this purpose: geopython/OWSLib#680

HTH

@nbrinckm
Author

nbrinckm commented Jun 26, 2020

So the test server has no store functionality? :-(

I realized that the file then gets created in the main project folder (next to the Makefile). I don't think that is good behaviour, and I don't like the idea of relying on it. But thank you for your help anyway.

I also tried to run the test against the live server using owslib:

import os
import unittest

import geopandas as gpd  # used below as gpd.read_file
import owslib.wps

INPUT_FILE = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'tests',
    'buildings.json'
)

URL_WPS = 'http://localhost:5000/wps'

# identifier
IDENTIFIER_PROCESS = 'splitbuildings'
IDENTIFIER_INPUT_BUILDINGS = 'buildings'
IDENTIFIER_INPUT_COUNT = 'count'
IDENTIFIER_OUTPUT = 'splittedbuildings'

COLUMN_AREA_INDEX = 'areaindex'

class TestLiveServer(unittest.TestCase):
    def test_building_splitter(self):
        wps = owslib.wps.WebProcessingService(URL_WPS, verbose=True)

        with open(INPUT_FILE, 'r') as infile:
            input_buildings = infile.read()

        execution = wps.execute(IDENTIFIER_PROCESS,
            inputs=[
                (IDENTIFIER_INPUT_BUILDINGS, owslib.wps.ComplexDataInput(
                    value=input_buildings,
                    mimeType='application/json'
                )),
                (IDENTIFIER_INPUT_COUNT, '4'),
            ],
            output=[
                (IDENTIFIER_OUTPUT, True),
            ]
        )

        wps.monitorExecution(execution)

        outfile = gpd.read_file(execution.processOutputs[0].reference)
        self.assertTrue(COLUMN_AREA_INDEX in outfile.columns)


if __name__ == '__main__':
    unittest.main()

(In this case the file is outside the tests folder, next to the Makefile of babybird.)

I changed the default.cfg file so that it allows request sizes of 10 GB (just to be sure).

After installing and running the babybird application, I get 400 status codes for the POST requests (so even the execute request fails).

@huard
Collaborator

huard commented Jun 26, 2020

You should save process results in self.workdir.
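
A minimal sketch of what that looks like. SplitBuildingsSketch and the SimpleNamespace request and response objects are hypothetical stand-ins so the snippet runs without pywps; in the real process the handler would have to be a regular method rather than a @staticmethod, otherwise there is no self to read workdir from.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class SplitBuildingsSketch:
    """Hypothetical stand-in for the Process; only `workdir` matters here."""

    def __init__(self, workdir):
        self.workdir = workdir

    def _handler(self, request, response):
        # Build the output path inside the working directory instead of
        # writing relative to the current working directory.
        out_path = os.path.join(self.workdir, "outputfile.geojson")
        with open(out_path, "w") as handle:
            json.dump({"type": "FeatureCollection", "features": []}, handle)
        response.outputs["splittedbuildings"].file = out_path
        return response


with tempfile.TemporaryDirectory() as workdir:
    request = SimpleNamespace(inputs={})
    response = SimpleNamespace(
        outputs={"splittedbuildings": SimpleNamespace(file=None)}
    )
    result = SplitBuildingsSketch(workdir)._handler(request, response)
    print(result.outputs["splittedbuildings"].file)
```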

@nbrinckm
Author

Thanks, using the workdir I got rid of the JSON file in the main project folder (which was previously created by the tests).

But to make it completely clear for me: at the moment there is no way, within the test case, to fetch the referenced files via the URLs returned by the process?

@nbrinckm
Author

OK, the test via owslib works after setting the maxrequestsize option.

Still, it would be great to be able to handle reference outputs in the bird-house-style tests.
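
Until then, one possible workaround in a test is to translate the reference URL back into a path on disk, since the files are written locally anyway. The sketch below assumes the server's outputurl and outputpath settings are known to the test; reference_to_local_path is a hypothetical helper, not part of pywps or owslib.

```python
import os
from urllib.parse import unquote, urlparse


def reference_to_local_path(reference_url, output_url, output_path):
    """Translate a reference URL below output_url into a path below output_path."""
    ref_path = unquote(urlparse(reference_url).path)
    base_path = unquote(urlparse(output_url).path).rstrip("/") + "/"
    if not ref_path.startswith(base_path):
        raise ValueError("reference is not below the configured output URL")
    relative = ref_path[len(base_path):]
    local = os.path.normpath(os.path.join(output_path, *relative.split("/")))
    # Refuse anything that escapes the output directory (e.g. via "..").
    root = os.path.normpath(output_path)
    if os.path.commonpath([root, local]) != root:
        raise ValueError("reference escapes the output directory")
    return local


print(reference_to_local_path(
    "http://localhost:5000/outputs/abc/outputfile.geojson",
    "http://localhost:5000/outputs",
    "/tmp/pywps/outputs",
))
```

The returned path can then be passed to geopandas.read_file like any other local file.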

@huard
Collaborator

huard commented Jun 26, 2020

Agreed. Maybe we could bundle a tiny file server... @cehbrecht Is this something you have considered already?

@cehbrecht
Member

> Agreed. Maybe we could bundle a tiny file server... @cehbrecht Is this something you have considered already?

@huard I have not thought about it. But because we are using werkzeug we can probably easily configure a data file service which gets (optionally) started by the command line:
https://github.com/bird-house/emu/blob/5811119f870fab71f8df5a44e725ac11c94864fa/emu/cli.py#L83

But this means we need a running wps ... current pywps tests don't need this.
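
For tests, a file server that does not need a running WPS could look roughly like the following. This is a sketch that swaps in the standard-library http.server instead of werkzeug; serve_directory is a hypothetical helper, and the output directory and file name are made up for the example.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


def serve_directory(directory):
    """Serve `directory` over HTTP on a free local port in a daemon thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Port 0 lets the operating system pick a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


with tempfile.TemporaryDirectory() as outputs:
    with open(os.path.join(outputs, "outputfile.geojson"), "w") as handle:
        handle.write('{"type": "FeatureCollection", "features": []}')

    server = serve_directory(outputs)
    try:
        port = server.server_address[1]
        url = "http://127.0.0.1:%d/outputfile.geojson" % port
        # Bypass any proxy settings so the request stays on the loopback interface.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            body = response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()

print(body)
```

A test fixture could start this against the configured output directory and set outputurl to the chosen port, so reference URLs resolve during the test run.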
