Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Junction Management performances ideas #320

Open
sygilber opened this issue Jul 31, 2021 · 13 comments
Open

Junction Management performances ideas #320

sygilber opened this issue Jul 31, 2021 · 13 comments

Comments

@sygilber
Copy link
Contributor

Just sharing here some thoughts about junction code performances.

a) Testing if a junction exists

Using curl against a wrp with only a single junction defined, testing if the junction exists using this URI code takes systematically about 700 ms:

curl -k --user admin@local:******** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions

And using curl against the same wrp, testing if the junction exists using this URI code takes systematically about 500 ms (irrespective if it exists or not):

curl -k --user admin@local:******** -H "Content-Type:application/json" -H "Accept: application/json" -X GET https://someappliance/wga/reverseproxy/default/junctions?junctions_id=/isam

200 ms of difference can be quite a lot when trying to detect presence of 50 junctions defined on a particular WRP, and even more if it is clustered.

So I am planning some refactoring/performance testing of the method _check() (inside junctions.py) to use the second form (junctions?junctions_id) of the restapi query than the first. My only concern was if testing for a non-existing junction using the second restapi format when the junction does not exist would fill the wrp server logs with errors messages (such as DPWWM1346E) but it does not.

@sygilber
Copy link
Contributor Author

Some results are in:

With 100 iteration with current unmodified code:

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 222.55s

And with over 100 iteration with modified code (_check()):

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 200.88s

So that is already a 10% improvment right there.

ps: I did forced the code to think the junction had to be created on each iteration.

But I will need to tweek the code to handle the case where the junction does not exist and a 500 rc is returned along with this {"message":"DPWWM1346E Cannot show junction\nDPWWA1201E Junction not found"} which raises an error. We would want that to be muted in this particular case.

@Warkdev
Copy link

Warkdev commented Aug 1, 2021

Hey, nice you're attacking this.

One way to reduce the impact of deploying several junctions at once is to lower the amount of API requests. I've found that the biggest gain is doing the following :

  • retrieve list of configured data (with details if possible which is not the case for junctions)
  • force add new junctions (the ones not in the retrieved list)
  • force delete old junctions
  • update junctions (check idempotency first)

All in all, this is reducing the amount of API calls by, at worst, 49 calls for a batch of 50 existing junctions and much more for a batch of 50 new junctions.

@sygilber
Copy link
Contributor Author

sygilber commented Aug 1, 2021

Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.

But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.

But for now the approach I am seeking is to make little iterative changes.

@sygilber
Copy link
Contributor Author

sygilber commented Aug 1, 2021

Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:

def _check(isamAppliance, reverseproxy_id, junctionname):
    ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction",
                                       "{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id,
                                                                                   junctionname),
                                       requires_modules=requires_modules,
                                       requires_version=requires_version,
                                       ignore_error=True)
    # Assume that if 'message' is present, an error occured thus the junction does not exist.
    if 'message' in ret_obj['data']:
        return False

    return True

Away from the office for a while and this needs more testing before I submit it via a PR.

@sygilber
Copy link
Contributor Author

sygilber commented Aug 1, 2021

Already spoted another possible optimization refered to b):

        if _check(isamAppliance, reverseproxy_id, junction_point) is True:
            logger.debug("Junction exists. Compare junction details.")
            ret_obj = get(isamAppliance, reverseproxy_id=reverseproxy_id, junctionname=junction_point)

In the set() method, if the _check() method returned the junction data (and not just the boolean, the call to get() above would thus not be necessary, This is going in the direction you mentioned which is to reduce API call.

@Warkdev
Copy link

Warkdev commented Aug 1, 2021

Your right, There is no way right now to obtain in a single API call the full and detailled list of all junctions; only one of the other. Currently, the full list is often re-obtained over and over.

But should I understand that you have a private python code supporting a the simulated batch update method your described ? Anyhow, that could be the next step, i.e. to provide new function in junction.py to do otherwise (batch mode), thus caching and computing at once all required changes.

But for now the approach I am seeking is to make little iterative changes.

Yes, this is what we have (but tailored for our deployment approach which is slightly different than the one proposed here). We managed, for the same deployment of a wrp with 200 junctions (incl objectspace, policies,...) to reduce the time from 3h30 to 10 minutes using this technique.

@Warkdev
Copy link

Warkdev commented Aug 1, 2021

Just sharing my experimental code here for now from junctions.py's _check() method yielding 10% improvement:

def _check(isamAppliance, reverseproxy_id, junctionname):
    ret_obj = isamAppliance.invoke_get("Retrieving the parameters for a single standard or virtual junction",
                                       "{0}/{1}/junctions?junctions_id={2}".format(uri, reverseproxy_id,
                                                                                   junctionname),
                                       requires_modules=requires_modules,
                                       requires_version=requires_version,
                                       ignore_error=True)
    # Assume that if 'message' is present, an error occured thus the junction does not exist.
    if 'message' in ret_obj['data']:
        return False

    return True

Away from the office for a while and this needs more testing before I submit it via a PR.

Yes, you'd need to try and catch the error though. I think in case of a http 500, the isam appliance class will throw an exception.

@sygilber
Copy link
Contributor Author

sygilber commented Aug 1, 2021

Some more results:

In cases of 100% idempotency, with original _check() (100 junctions):

ibm.isam.set_junctions : Set an array of Junctions -------------------------------------------------------------------------- 160.26s

In cases of 100% idempotency, with revised _check() (100 junctions):

ibm.isam.set_junctions : Set an array of Junctions --------------------------------------------------------------------------- 80.08s

So indeed, a reduction of api call makes a huge difference.

Will continue testing new code for a while and will consider a PR once we are confident it is rock solid in a few weeks.

@Warkdev
Copy link

Warkdev commented Aug 1, 2021

Each api calls is 500ms. Once you know this, you know how to save time.

Also, think about a way to provide the feedback of what changed or not to ansible. 😉

@vcassidy
Copy link

vcassidy commented Aug 2, 2021

I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.

  1. Get the list of junctions
    https://{appliance_hostname}/wga/reverseproxy/{reverseproxy_id}/junctions
  2. Parse the response and create a pdadmin s t webseal-name show /junctionname for each junction
  3. Send all the junction show commands for a WebSEAL instance in a single POST to pdadmin interface
    https://{appliance_hostname}/isam/pdadmin/
  4. parse the response

the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.

Hope this is useful

@Warkdev
Copy link

Warkdev commented Aug 2, 2021

This is definitely the most effective way today.

@Warkdev
Copy link

Warkdev commented Sep 30, 2021

I tackled this problem a while ago for a gui I was writing rather than for Ansible or python api. It is a little ugly but the most effective way to get the details behind all junctions for a WebSEAL instance was to revert to pdadmin commands and parse the response.

  1. Get the list of junctions
    https://{appliance_hostname}/wga/reverseproxy/{reverseproxy_id}/junctions
  2. Parse the response and create a pdadmin s t webseal-name show /junctionname for each junction
  3. Send all the junction show commands for a WebSEAL instance in a single POST to pdadmin interface
    https://{appliance_hostname}/isam/pdadmin/
  4. parse the response

the parsing is ugly but the response format from commandline hasn't changed in many releases. Sending 1 requests to pdadmin for all the junctions will cut the total time taken to a fraction of sending a request per junction.

Hope this is useful

Hello, would you mind sharing that part of your code? I did write something very similar. And that's actually way faster and consistant than the api. Which is a pity.

Ps: isva 10.0.2.0 added 2 news server settings.

@tombosmansibm
Copy link
Collaborator

Even better would be if there was a REST endpoint to get all junctions ... https://ibmsecurity.ideas.ibm.com/ideas/ISAM-I-1061

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants