UIS Delta subscriptions & GraphQL null stripping back-end #118

Merged
merged 3 commits into from
Jun 18, 2020


dwsutherland
Member

These changes partially address:
https://github.com/cylc/cylc-admin/blob/master/docs/proposal-subscriptions.md
As a back-end solution required to bring the proposal to fruition.

Depends on: cylc/cylc-flow#3500
(They should be merged together, as one of the changes is a breaking change to the BaseResolvers.)

Doesn't break Web UI.

Features:

  • Published deltas from a single main loop iteration, grouped into a single message, are received (as topic all) and applied to the local data-store (instead of the same deltas arriving as separate multi-topic messages).
  • Use of the imported Cylc-Flow GraphQL backend and middleware to set undesired/empty field values to null, and strip these out of the execution result.
  • New/Working GraphQL subscription of received Workflow Scheduler deltas:
    uis_delta_subs_final
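
As a rough illustration of the null-stripping idea in the feature list above (not the Cylc-Flow middleware itself), the sketch below recursively drops `None` values from a result dictionary. The function name `strip_nulls` and the sample field names are hypothetical; whether the real backend also removes emptied containers is not stated here, so this sketch leaves them in place.

```python
from typing import Any


def strip_nulls(value: Any) -> Any:
    """Recursively drop None values from dicts and lists.

    Hypothetical helper for illustration only; the real stripping is done
    by the imported Cylc-Flow GraphQL backend and middleware.
    """
    if isinstance(value, dict):
        # Drop keys whose value is None, then recurse into what remains.
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Drop None items, then recurse into what remains.
        return [strip_nulls(v) for v in value if v is not None]
    # Scalars (including False, 0 and "") are kept as they are.
    return value
```

Falsy but meaningful values such as `False` or `0` survive, since only `None` is treated as null.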

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Already covered by existing tests.
  • No change log entry required (invisible to users).
  • No documentation update required.

@dwsutherland
Member Author

Note: should fail until Cylc-Flow sibling PR is merged.

Also need to fix bug in GraphiQL loading the schema:
GraphiQL_Broken_Schema_Load

@dwsutherland
Member Author

dwsutherland commented Feb 17, 2020

Ok fixed the GraphiQL problem... It was an issue with the middleware stripping nulls during Introspection.

@kinow
Member

kinow commented Feb 28, 2020

@dwsutherland any reason for having a separate section for what was removed, but not for what was added and updated?

From what I understand, it looks like added and updated are both under workflows {}. Which should work fine, but will require an extra step to check whether the task or job exists, then update if so, or add if not. More for my own curiosity 😬

ps: workflow five working fine for me after checking out both branches, I'm already able to start working on the JS branch as we did for subscription/websockets 🎉 thanks!

@dwsutherland
Member Author

From what I understand, it looks like added and updated are both under workflows {}. Which should work fine, but will require an extra step to check whether the task or job exists, then update if so, or add if not. More for my own curiosity 😬

@kinow - Trying to figure out what part of the code you're referring to 😕 (sorry)

@kinow
Member

kinow commented Mar 2, 2020

From what I understand, it looks like added and updated are both under workflows {}. Which should work fine, but will require an extra step to check whether the task or job exists, then update if so, or add if not. More for my own curiosity 😬

@kinow - Trying to figure out what part of the code you're referring to 😕 (sorry)

Hi @dwsutherland

Sorry, I am talking about the GraphQL output. The query response has deltas { pruned: ..., workflows: ...}. I was just wondering why we didn't have something like deltas { pruned: ..., added: ..., modified: ... }

Sorry for not being very clear.

@hjoliver
Member

hjoliver commented Mar 2, 2020

Trying to figure out what part of the code you're referring to

In the gif above (in the Issue description) there's a "pruned" section that singles out what's been removed in the latest data push. @kinow is wondering (I think, based on a conversation we had on Friday) if it is also possible to single out what's been added? (Then I guess the main body of the data structure would contain just what changed, but has not been added or removed, rather than what's now present, which includes what's new and what's changed.)

@hjoliver
Member

hjoliver commented Mar 2, 2020

(Doh, I was typing at the same time as @kinow 😬 )

@kinow
Member

kinow commented Mar 2, 2020

(Doh, I was typing at the same time as @kinow 😬 )

But your explanation was much better than mine. I'm working exactly on this, but still doing JS now, not querying the endpoint yet. So good timing for you to have a look at this @dwsutherland 😬

@dwsutherland
Member Author

Ah, I see.. I haven't really made that distinction at the workflow level either, but intend to do so (for the sake of our event driven future).. I guess if something's been added and then modified (if that's even possible in one push), they could be in both added and modified (is that okay? the receiver would then need to order their operations)

@kinow
Member

kinow commented Mar 2, 2020

Ah, I see.. I haven't really made that distinction at the workflow level either, but intend to do so (for the sake of our event driven future).. I guess if something's been added and then modified (if that's even possible in one push), they could be in both added and modified (is that okay? the receiver would then need to order their operations)

Ah, I guess it would still work. I would just go and:

  1. first remove the pruned items
  2. then add the new ones
  3. finally update (including anything that was added)

We can do that later. I'll start using this code for the JS code and send a functional review later (not looking much at the code changes, but more at how it works, whether it does what it is supposed to do, etc) 👍 Thanks!
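
The three steps listed above could be sketched roughly as follows. This is a minimal illustration in Python rather than the actual JS client code, and it assumes a hypothetical message shape in which `pruned` maps node types to lists of IDs while `added` and `updated` map node types to lists of node dictionaries; the name `apply_deltas` and the flat `store` are also assumptions.

```python
from typing import Dict


def apply_deltas(store: Dict[str, dict], deltas: dict) -> None:
    """Apply one delta message to a flat id -> node store, in place.

    Assumed (hypothetical) shape:
      {"pruned": {type: [id, ...]},
       "added": {type: [node, ...]},
       "updated": {type: [node, ...]}}
    """
    # 1. First remove the pruned items (pruned lists hold IDs only).
    for ids in deltas.get("pruned", {}).values():
        for node_id in ids:
            store.pop(node_id, None)
    # 2. Then add the new nodes.
    for nodes in deltas.get("added", {}).values():
        for node in nodes:
            store[node["id"]] = dict(node)
    # 3. Finally merge updates, including into anything just added.
    for nodes in deltas.get("updated", {}).values():
        for node in nodes:
            store.setdefault(node["id"], {}).update(node)
```

Because updates are merged last, a node that appears in both `added` and `updated` within one push ends up with its updated field values.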

@dwsutherland
Member Author

Ah, I guess it would still work. I would just go and:

  1. first remove the pruned items
  2. then add the new ones
  3. finally update (including anything that was added)

Maybe do the pruning last, although I would hope/think it wouldn't matter as something shouldn't be pruned and updated in the same push.

@kinow
Member

kinow commented Mar 3, 2020

@dwsutherland, the original query used by the tree view (current & infinite) has something like:

query {
  workflows (ids: ["five"]) {
    ...
    taskProxies (sort: ...) {
      ...
      firstParent { }
      task { }
      jobs (sort: ...) { }
    }
  }
}

Using GraphiQL, I created the following subscription for deltas.

subscription {
  deltas (id: "five") {
    pruned { familyProxies taskProxies jobs }
    workflow {
      id
      status
      taskProxies {
        id
        name
        state
        ...
        firstParent {}
        ... # same structure as in the query
      }
      familyProxies { }
    }
  }
}

The subscription is submitted and I am getting the results, but the deltas.workflow.taskProxies (which I am writing the code to add/update now) never includes firstParent, task, or jobs. Only a flat list of key: value.

Q1: Is that intentional? The JS code fails when I add a delta TaskProxy to the tree, because the code for the tree view is expecting a TaskProxy with task, jobs, and firstParent (the last of these is used to create the hierarchy).

Q2: just to confirm, the sorting (sort( key: ... )) is not supported in subscriptions?

Q3: in the query, I can send ids: ["..."], which I think accepts a workflow.id or workflow.name. But in the delta subscription, I think it is accepting only workflow.id. The components/views are all using the workflow.name for now (they get it from the URL, /#/tree/<workflow-name>). Should we have a standard and have id and ids both to allow the same?

Thanks!

@dwsutherland
Member Author

The subscription is submitted and I am getting the results, but the deltas.workflow.taskProxies (which I am writing the code to add/update now) never includes firstParent, task, or jobs. Only a flat list of key: value.
Q1: Is that intentional? The JS code fails when I add a delta TaskProxy to the tree, because the code for the tree view is expecting a TaskProxy with task, jobs, and the firstParent (to create hierarchy this last one).

I need to fix how this field is resolved.. I was in two minds to where the resolver would need to look, but this use case makes it clear that sub-field resolution should use the local data-store (not the other delta data) post delta application...

Q2: just to confirm, the sorting (sort( key: ... )) is not supported in subscriptions?

Should be able to reinstate this for subscriptions

Q3: in the query, I can send ids: ["..."], which I think accepts a workflow.id or workflow.name. But in the delta subscription, I think it is accepting only workflow.id. The components/views are all using the workflow.name for now (they get it from the URL, /#/tree/<workflow-name>). Should we have a standard and have id and ids both to allow the same?

Yes, use id for the update application (as ID is set for every delta element), however name should be in the initial data query and the view can display that ... We only send what's changed (other than ID of a delta element).

I'm working on some changes in the backend to distinguish between added and updated deltas.. Will also work on nested field resolvers & sorting/filtering!

@kinow
Member

kinow commented Mar 3, 2020

Thanks for the quick reply @dwsutherland ! It's been super easy to work with the deltas so far (easier than I expected). I've left most of the logic in-place in my branch, so once you have the updates for the code I will check it out and test it while finishing the JS code.

@dwsutherland
Member Author

dwsutherland commented Mar 4, 2020

@kinow - Also, I didn't implement nested resolving because it's not consistently possible and I thought that we were aiming for flat, i.e.:

a {
  b {
    c
    d
  }
}

What if a & d are updated but b isn't? Or if a is a delta but its reference to b hasn't changed (so the delta doesn't contain that ID to resolve it)?

It's technically possible, but would require an argument at each level of nesting to specify whether to resolve from the delta info or the data-store.

@dwsutherland
Member Author

dwsutherland commented Mar 4, 2020

In the case of firstParent & jobs we should be able to implement that because new taskProxy will have a reference to firstParent in the delta, and new jobs will put a reference in taskProxy .. However a change in Job state won't result in taskProxies { jobs { . . . }} showing up because the reference didn't change.

And now we are splitting things to added and updated, even this won't be possible unless we go to the data-store first and just filter on the specified level of nest.

@kinow
Member

kinow commented Mar 4, 2020

Hmmm, tricky. I can handle the flat structure on the client side. I will have to think a little. We don't have a flat structure in the initial query.

It is not easy to fetch tasks or jobs in the original workflow data. These are under workflow.taskProxies or workflow.taskProxies.jobs respectively. There is no lookup table anywhere.

So I think the options I have now are iterate in JS (which could have performance issues) or try to use a flat structure for the tree component. So instead of using workflow.taskProxies.jobs, I would use workflow.taskProxies and workflow.jobs in the query, and then associate them in JS. That way I would be able to receive deltas delta.workflow.jobs and just add to the Vuex store.
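
The "associate them in JS" option might look roughly like the sketch below (written in Python for brevity). It assumes, hypothetically, that each job carries a `firstParent` object holding the ID of its task proxy; the function name `link_jobs` is likewise made up for illustration.

```python
from typing import Dict, List


def link_jobs(task_proxies: List[dict], jobs: List[dict]) -> Dict[str, dict]:
    """Build an id -> task proxy lookup and attach each job to its parent.

    Assumes each job has a firstParent dict containing the parent's ID
    (a hypothetical shape for this sketch).
    """
    # Copy each task proxy and give it an empty jobs list to fill in.
    lookup = {tp["id"]: {**tp, "jobs": []} for tp in task_proxies}
    for job in jobs:
        parent = lookup.get(job["firstParent"]["id"])
        if parent is not None:
            parent["jobs"].append(job)
        # Jobs whose parent is unknown are skipped here; a real client
        # would need to decide whether to queue or report them.
    return lookup
```

With such a lookup table, a later `jobs` delta can be attached by a single dictionary access instead of iterating over the whole tree.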

🤔

Member

@kinow kinow left a comment

Did a functional review, testing if the deltas in this branch worked with the UI. +1, after a few iterations with @dwsutherland , we got it all sorted. Thanks a lot for the help and patience David! Will leave the code review for @hjoliver & @oliver-sanders.

Member

@oliver-sanders oliver-sanders left a comment

Looks good, I've not done heavy testing but I believe @kinow has.

One question.

Comment on lines +155 to +158
if field.name == WORKFLOW:
    self.data[w_id][field.name].Clear()
else:
    self.data[w_id][field.name].clear()
Member

What's going on here, is there a difference between Clear and clear?

Member Author

@dwsutherland dwsutherland May 20, 2020

One's a Protobuf element (i.e. PbWorkflow), so uses Clear, the other(s) a dictionary.
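
To make the distinction concrete without depending on the generated Protobuf classes, here is a small sketch in which `FakeMessage` is a hypothetical stand-in for a message such as PbWorkflow. Real Protobuf messages in Python expose `Clear()`, while plain dictionaries expose `clear()`. The value `"workflow"` for `WORKFLOW` and the helper name `clear_store` are assumptions made for this example.

```python
WORKFLOW = "workflow"  # assumed value of the constant, for illustration


class FakeMessage:
    """Hypothetical stand-in for a Protobuf message such as PbWorkflow."""

    def __init__(self, status: str = "") -> None:
        self.status = status

    def Clear(self) -> None:
        # Protobuf messages reset their fields to defaults via Clear().
        self.status = ""


def clear_store(data: dict) -> None:
    """Reset every entry, using the method that matches its type."""
    for name, value in data.items():
        if name == WORKFLOW:
            value.Clear()   # message-style object
        else:
            value.clear()   # plain dict of id -> element
```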

@kinow
Member

kinow commented May 24, 2020

From RIOT

Re-synced branches this morning after approving it some time ago. There were now null values in the GraphQL response of the UI subscription. This caused task proxies not to be removed (as a node would get a firstParent { name: null } I think).

image

Adding stripNull: true, in GraphiQL fixed the problem. But it failed with ApolloClient. See query used in this gist.

Looks like the __typename causes the initial data burst to be empty. That __typename is put in the query by ApolloClient, and is used in the UI tree component to decide how to render each node.

Having both __typename and stripNull: true results in empty initial data burst. If I remove stripNull then the query works.

@kinow
Member

kinow commented May 24, 2020

@dwsutherland another comment/question, not related to the issue above. I decided to have a quick look at the Graph component this morning, checking if the delta subscription would work.

I don't know much about the Graph component & JS cytoscape, but figured it was still worth using the momentum to try to learn and review this PR at the same time.

With this query

collapsed deltas query with edges
subscription ($workflowId: ID) {
  deltas (workflows: [$workflowId], stripNull: true) {
    added {
      edges {
        id
        source
        target
        suicide
        cond
      }
    }
    updated {
      edges {
        id
        source
        target
        suicide
        cond
      }
    }
    pruned {
      edges
    }
  }
}

I get the initial burst of data, and eventually added and pruned data (I guess for five we never get updated edges as the workflow cycles are always the same?).

However, every now and then I also get this empty response.

{
  "deltas": {
    "added": {},
    "updated": {},
    "pruned": {}
  }
}

Is that correct? I was expecting to get data iff there was some data in the server to be sent to the client.

image

@dwsutherland
Member Author

Having both __typename and stripNull: true results in empty initial data burst. If I remove stripNull then the query works.

Fixed (in cylc-flow PR)

@dwsutherland
Member Author

Is that correct? I was expecting to get data iff there was some data in the server to be sent to the client.

I get the initial burst:

{
  "deltas": {
    "added": {
      "edges": [
        {
          "id": "sutherlander|five|foo.20130808T0000Z|bar.20130808T0000Z",
          "source": "sutherlander|five|20130808T0000Z|foo",
          "target": "sutherlander|five|20130808T0000Z|bar",
          "suicide": false,
          "cond": false
        },
        {
          "id": "sutherlander|five|prep.20130808T0000Z|foo.20130808T0000Z",
          "source": "sutherlander|five|20130808T0000Z|prep",
          "target": "sutherlander|five|20130808T0000Z|foo",
          "suicide": false,
          "cond": false
        }
      ]
    },
    "updated": {},
    "pruned": {}
  }
}

And then added and pruned thereafter...

I could probably strip deltas further (i.e. not send if sub-fields are empty); it happens because the whole delta-store is yielded with its skeleton structure.
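
One possible way to express the "not send if sub-fields are empty" check is sketched below. It is only an illustration of the idea, not what the backend does; the function name `has_changes` is hypothetical, and ignoring `__typename` is an assumption based on ApolloClient adding that field to queries.

```python
def has_changes(deltas: dict) -> bool:
    """Return True if added, updated or pruned holds any real data.

    Hypothetical helper: a "__typename" entry on its own does not count
    as a change, and neither do empty lists or dicts.
    """
    for key in ("added", "updated", "pruned"):
        section = deltas.get(key) or {}
        if any(v for k, v in section.items() if k != "__typename"):
            return True
    return False
```

A sender (or a client) could call this before yielding (or applying) a message and skip the skeleton-only ones.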

@kinow
Member

kinow commented May 24, 2020

@dwsutherland synced this branch to pick up latest fix for the __typename with stripNull: true.

That appears to have fixed the initial data burst. But the JS code failed to render the tree again. A bit of debugging the queries again, it looks like another interesting issue.

Here's what I am getting in deltas.updated.jobs with stripNull: false.

image

Now deltas.updated.jobs with stripNull: true.

image

@dwsutherland it looks like when stripNull: true, the updated jobs in the deltas are always parent-less. Without the firstParent, I can still find the node parent through its ID, but so far I managed to link nodes without having to parse the ID. Is that something simple to fix in the backend?
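
For reference, the ID-parsing fallback mentioned above could be as small as the sketch below. It assumes job IDs are `|`-separated with the submit number last (for example `owner|workflow|cycle|task|submit`), which matches the IDs quoted in this thread but is not a documented contract; the function name `parent_id` is hypothetical.

```python
def parent_id(node_id: str) -> str:
    """Return node_id with its last '|'-separated part removed.

    For a job ID of the assumed form owner|workflow|cycle|task|submit
    this gives the ID of its task proxy.
    """
    head, sep, _ = node_id.rpartition("|")
    if not sep:
        raise ValueError(f"no '|' separator in ID: {node_id!r}")
    return head
```

This only recovers the job to task proxy link; a task proxy's family is not encoded in its ID, so `firstParent` is still needed there.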

@dwsutherland
Member Author

@dwsutherland it looks like when stripNull: true, the updated jobs in the deltas are always parent-less. Without the firstParent, I can still find the node parent through its ID, but so far I managed to link nodes without having to parse the ID. Is that something simple to fix in the backend?

This works:

      jobs (stripNull: true) {
        id
        state
        firstParent: taskProxy(stripNull: false, deltaStore: false) {
          id
          __typename
        }
      }

It's because the job's taskProxy field isn't updated (so it isn't in the delta). This will look up the missing relationship in the data-store.

@kinow
Member

kinow commented May 25, 2020

Tested, and indeed works! Thanks @dwsutherland !

@dwsutherland
Member Author

Tested, and indeed works! Thanks @dwsutherland !

Ok original should work now:

      jobs (stripNull: true) {
        id
        state
        firstParent: taskProxy {
          id
          __typename
        }
      }

Stripping is now skipped for a sub-field if its return type is in cylc.flow.network.schema.NODE_MAP.

Member

@kinow kinow left a comment

A few minor issues after enabling strip null in the UI. Working again (thanks to @dwsutherland !)

@oliver-sanders
Member

It's because the jobs taskProxy field isn't update/in-the-delta .. This will lookup the data-store for the missing relationship

Aaahhh, that's what the deltaStore argument is for.

@dwsutherland
Member Author

dwsutherland commented May 27, 2020

It's because the jobs taskProxy field isn't update/in-the-delta .. This will lookup the data-store for the missing relationship

Aaahhh, that's what the deltaStore argument is for.

It looks up the relationship from the data-store without the arg now, but the arg will still determine whether to try to retrieve the node from the delta-store or data-store.
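
A very rough sketch of that behaviour, using plain dictionaries in place of the real stores: the relationship (the parent's ID) always comes from the data-store, and a flag selects where the parent node itself is fetched from. All names here (`resolve_first_parent`, `task_proxy`, `task_proxies`, `delta_store_arg`) are hypothetical and do not mirror the actual resolver code.

```python
from typing import Optional


def resolve_first_parent(
    job_id: str,
    delta_store: dict,
    data_store: dict,
    delta_store_arg: bool = False,
) -> Optional[dict]:
    """Resolve a job's parent task proxy.

    The relationship is read from the data-store, which holds the full
    record even when the delta for the job did not touch that field.
    """
    parent = data_store["jobs"][job_id]["task_proxy"]
    # The flag only selects which store the parent node is fetched from.
    source = delta_store if delta_store_arg else data_store
    return source["task_proxies"].get(parent)
```

With the flag set, the result is `None` whenever the parent did not change in the current delta, which is the situation described above.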

Member

@kinow kinow left a comment

Synced today, and tested with the UI. Everything worked fine, re-approved! 🎉

@kinow
Member

kinow commented Jun 11, 2020

@dwsutherland I have both branches synced (did that after our Riot meeting some days ago).

And have also finished the code and tests in the WUI. So today and tomorrow are days that I am planning to spend a lot of time looking at the screen while it runs five (tasks attached to ROOT/cyclepoint) and families2 (a workflow I copied from @hjoliver that uses families, and has at least one family whose parent is another family, like ROOT -> FAM1 -> FAM2 -> tasks).

During development I spent a looooong time with five. So decided to focus more on families2 today. And I noticed that after several minutes running (+10min, some times +50 minutes) I would get an error as the JS code tried to add a job, but couldn't find its parent.

After adding some try/catch with a debugger statement, I managed to capture the moment the error occurs.

And it's strange that apparently I am receiving a Job from the GraphQL endpoint before the cyclepoint and tasks? See screenshot below.

image

The cycle point for the year 2772 is missing. The JS code does some business logic when adding nodes to the tree, like cycle points, tasks, families, etc. However, what you are seeing in the browser console is the output of the lookup, a helper Map that I use to access elements in the tree.

For cycle points, all that I do when adding one is check if (cyclePoint && !this.lookup.has(cyclePoint.id)) { this.lookup.set(cyclePoint.id, cyclePoint) ... }. So cycle points are always added, as long as they do not exist already.

Here's the GraphQL deltas that I received when the exception occurred.

GraphQL deltas response...
{
  "id": "kinow|families2",
  "shutdown": false,
  "added": {
    "jobs": [
      {
        "id": "kinow|families2|27720611T1203+12|qux|1",
        "firstParent": {
          "id": "kinow|families2|27720611T1203+12|qux",
          "__typename": "TaskProxy"
        },
        "batchSysName": "background",
        "host": "localhost",
        "state": "ready",
        "submitNum": 1,
        "__typename": "Job"
      }
    ],
    "__typename": "Added"
  },
  "updated": {
    "taskProxies": [
      {
        "id": "kinow|families2|27700611T1203+12|qux",
        "state": "succeeded",
        "isHeld": false,
        "latestMessage": "succeeded",
        "firstParent": {
          "id": "kinow|families2|27700611T1203+12|FAM6",
          "name": "FAM6",
          "cyclePoint": "27700611T1203+12",
          "state": "succeeded",
          "__typename": "FamilyProxy"
        },
        "task": {
          "meanElapsedTime": 20,
          "name": "qux",
          "__typename": "Task"
        },
        "__typename": "TaskProxy"
      },
      {
        "id": "kinow|families2|27720611T1203+12|qux",
        "state": "ready",
        "isHeld": false,
        "latestMessage": "",
        "firstParent": {
          "id": "kinow|families2|27720611T1203+12|FAM6",
          "name": "FAM6",
          "cyclePoint": "27720611T1203+12",
          "state": "ready",
          "__typename": "FamilyProxy"
        },
        "task": {
          "meanElapsedTime": 20,
          "name": "qux",
          "__typename": "Task"
        },
        "__typename": "TaskProxy"
      }
    ],
    "jobs": [
      {
        "id": "kinow|families2|27700611T1203+12|qux|1",
        "firstParent": {
          "id": "kinow|families2|27700611T1203+12|qux",
          "__typename": "TaskProxy"
        },
        "finishedTime": "2020-06-11T15:31:36+12:00",
        "state": "succeeded",
        "__typename": "Job"
      }
    ],
    "familyProxies": [
      {
        "id": "kinow|families2|27720611T1203+12|FAM5",
        "state": "ready",
        "firstParent": {
          "id": "kinow|families2|27720611T1203+12|root",
          "name": "root",
          "cyclePoint": "27720611T1203+12",
          "state": "ready",
          "__typename": "FamilyProxy"
        },
        "__typename": "FamilyProxy"
      },
      {
        "id": "kinow|families2|27720611T1203+12|FAM6",
        "state": "ready",
        "firstParent": {
          "id": "kinow|families2|27720611T1203+12|FAM5",
          "name": "FAM5",
          "cyclePoint": "27720611T1203+12",
          "state": "ready",
          "__typename": "FamilyProxy"
        },
        "__typename": "FamilyProxy"
      },
      {
        "id": "kinow|families2|27700611T1203+12|FAM6",
        "state": "succeeded",
        "firstParent": {
          "id": "kinow|families2|27700611T1203+12|FAM5",
          "name": "FAM5",
          "cyclePoint": "27700611T1203+12",
          "state": "succeeded",
          "__typename": "FamilyProxy"
        },
        "__typename": "FamilyProxy"
      },
      {
        "id": "kinow|families2|27700611T1203+12|FAM5",
        "state": "succeeded",
        "firstParent": {
          "id": "kinow|families2|27700611T1203+12|root",
          "name": "root",
          "cyclePoint": "27700611T1203+12",
          "state": "succeeded",
          "__typename": "FamilyProxy"
        },
        "__typename": "FamilyProxy"
      }
    ],
    "__typename": "Updated"
  },
  "pruned": {
    "__typename": "Pruned"
  },
  "__typename": "Deltas"
}

Pruned is empty, so we ignore that.

Added has one entry, the job.

Now the interesting part. The updated deltas include the families and task proxies.

I will start writing some debugging code to capture 100 GraphQL messages, and store them so that next time this exception occurs I can take a look at previous messages and confirm I didn't get an added.cyclePoints. But from looking at the code, and at the data received, it looks to me like somehow the backend skipped sending me the deltas.added.cyclePoint, and sent the family proxies in the updated key.

Does it make sense? Do you think this scenario is possible by some rare chance?

Thanks!

p.s.1: deltas.added.cyclePoint is for

    cyclePoints: familyProxies(ids: ["root"], ghosts: true) {
      cyclePoint
    }

p.s.2: once families2 is passing with no runtime issues/errors, I still need to test the deltas in the WUI with complex 😨

@kinow
Member

kinow commented Jun 12, 2020

Funny, couldn't reproduce it at home, but I noticed my families2 workflow is different, as it's using integer cycle points.

Let me try to reproduce it again next Monday (@dwsutherland don't bother looking into this, as I know you are busy with migration, and it could be something in my environment/branch).

Notes to self

  1. workflow five at home
[scheduling]
    initial cycle point = 20130808T00
#    final cycle point = 20130808T12
    [[dependencies]]
        [[[R1]]]
            graph = "prep => foo"
        [[[PT12H]]]
            graph = "foo[-PT12H] => foo => bar"

[runtime]
  [[root]]
      script="sleep 12"

[visualization]
    initial cycle point = 20130808T00
    final cycle point = 20130808T12
    [[node attributes]]
        foo = "color=red"
        bar = "color=blue"
  2. workflow families2 at home
[scheduling]
  cycling mode = integer
  initial cycle point = 1
  [[dependencies]]
     [[[R/^/P1]]]  # 1, 2, 3, 4, 5, ...
        graph = """foo[-P1] => foo
                 foo => f1 & f2 => bar"""
     [[[R/^/P2]]]  # 1, 3, 5, ...
        graph = "foo => f3 & f4 => bar"

[runtime]
  [[root]]
      script = "sleep $((5 + RANDOM % 10))"
  [[FAM]]
  [[f1, f2, f3, f4]]
     inherit = FAM

Executed the complex workflow today too, and after 10 minutes got no errors in the browser console (only warnings like "[Violation] 'message' handler took 197ms'", which indicates there are ApolloClient functions taking too long to execute; I believe these are functions apply... which apply the deltas).

Also executed five and families2, but found no error. When I started the tests, I had cylc-uiserver with latest commit 411da1ae685f472ad0983f2c09123a549185ee50. Syncing repository it had no changes. cylc-flow was on a3218bd2e43c8bab4eb5ff27157e94004134612c, and syncing I got 04d5cb3210f7c934d5528d144458f41333ee8669 as last commit (this is from this week I think).

I had also prepared a diff that might be helpful when inspecting deltas:

diff --git a/src/components/cylc/tree/deltas.js b/src/components/cylc/tree/deltas.js
index 1176836..f8d90b8 100644
--- a/src/components/cylc/tree/deltas.js
+++ b/src/components/cylc/tree/deltas.js
@@ -195,6 +195,8 @@ function handleDeltas (deltas, tree) {
   }
 }
 
+const TEMP_DELTAS = []
+
 /**
  * @param {null|{
  *   id: string,
@@ -228,6 +230,7 @@ export function applyDeltas (deltas, tree) {
       tree.clear()
       return
     }
+    TEMP_DELTAS.push(deltas)
     if (tree.isEmpty()) {
       // When the tree is null, we have two possible scenarios:
       //   1. This means that we will receive our initial data burst in deltas.added.workflow
@@ -246,6 +249,12 @@ export function applyDeltas (deltas, tree) {
       } catch (error) {
         // eslint-disable-next-line no-console
         console.error('Error applying initial data burst for deltas', error)
+        // eslint-disable-next-line no-console
+        console.log('Printing latest 10 deltas...')
+        TEMP_DELTAS
+          .slice(-10)
+          // eslint-disable-next-line no-console
+          .map(delta => console.log(delta))
         throw error
       }
     } else {
@@ -258,6 +267,12 @@ export function applyDeltas (deltas, tree) {
       } catch (error) {
         // eslint-disable-next-line no-console
         console.error('Error applying deltas', error)
+        // eslint-disable-next-line no-console
+        console.log('Printing latest 10 deltas...')
+        TEMP_DELTAS
+          .slice(-10)
+          // eslint-disable-next-line no-console
+          .map(delta => console.log(delta))
         throw error
       }
     }

That simply prints the latest 10 deltas (I think, haven't tested the code yet). On Monday I will use this code again from my NIWA laptop, and will report back what were the latest deltas, what was the difference in my families2 workflow at NIWA. That should clarify whether we could have a bug in the UIS, or if it's something in JS or in my environment.
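
The diff above appends every delta to an array that grows for as long as the page stays open. A bounded buffer avoids that; the sketch below shows the idea in Python using `collections.deque` with `maxlen`. The class name `DeltaRecorder` and its methods are hypothetical, and the default of 100 simply echoes the number of messages mentioned earlier in this thread.

```python
from collections import deque
from typing import Deque, List


class DeltaRecorder:
    """Keep only the most recent delta messages for debugging."""

    def __init__(self, maxlen: int = 100) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._recent: Deque[dict] = deque(maxlen=maxlen)

    def record(self, deltas: dict) -> None:
        self._recent.append(deltas)

    def latest(self, count: int = 10) -> List[dict]:
        """Return up to `count` most recent messages, oldest first.

        `count` is expected to be a positive integer.
        """
        return list(self._recent)[-count:]
```

On an error, the handler would print `recorder.latest(10)` and re-raise, mirroring what the diff does with `slice(-10)`.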

@kinow
Member

kinow commented Jun 14, 2020

My branches at work are up to date too. The workflow families2 I have here is different (though no idea how that happened).

[scheduling]
    initial cycle point = now

    [[dependencies]]
        [[[P1Y]]]
            graph = "FAM3:succeed-all => FAM6"

[runtime]
    [[root]]
        init-script = echo "I'm first"
        env-script = echo "Hi first, I'm second"
        script = sleep 20; echo "RubyRubyRubyRuby"
        exit-script = echo 'Yay!'
        err-script = echo 'Boo!'
    [[FAM]]
    [[FAM2]]
        inherit = FAM
    [[FAM3]]
        inherit = FAM2
    [[foo]]
        inherit = FAM3
    [[bar]]
        inherit = FAM3
    [[FAM4]]
    [[FAM5]]
    [[FAM6]]
        inherit = FAM5
    [[qux]]
        inherit = FAM6
    [[qaz]]
        inherit = FAM4, FAM
    [[qar]]
        inherit = FAM, FAM4, FAM5

However, I haven't been able to reproduce this error today after running the workflow in two browsers for ~30 minutes.

@kinow
Member

kinow commented Jun 14, 2020

Spoke too soon, just had that again! Let me apply this patch, and try to capture a sequence of deltas to see if that proves helpful.

@kinow
Member

kinow commented Jun 14, 2020

Error occurred after another 5 minutes running the workflow families2 above here at NIWA.

image

The 10 deltas in order to the last one, which caused the error: https://gist.github.com/kinow/374dee8fd5f08620c193e43fe3ab5938

Investigating now, as it could be something in the JS code 👍

@kinow
Member

kinow commented Jun 15, 2020

I repeated this test, now printing also everything that was pruned. I had a theory that maybe the parent wasn't there because I pruned it.

For the job without parent, for its cycle point, nothing was pruned. Meaning it wasn't added, and instead the WUI received the data only in the updated key. Will have to wait for an assessment of @dwsutherland here to see if it could be something in the backend 👍
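
A check like the one described above can be automated over a list of captured messages. The sketch below is a hypothetical Python helper (`find_id` is not part of the WUI or UIS code) that reports where a given ID shows up. It assumes `added` and `updated` hold lists of node dictionaries with an `id` key, and `pruned` holds lists of ID strings.

```python
from typing import List, Tuple


def find_id(captured: List[dict], node_id: str) -> List[Tuple[int, str, str]]:
    """Return (message index, section, node type) for each appearance."""
    hits = []
    for index, deltas in enumerate(captured):
        for section in ("added", "updated", "pruned"):
            for node_type, nodes in (deltas.get(section) or {}).items():
                if not isinstance(nodes, list):
                    continue  # skip scalars such as "__typename"
                for node in nodes:
                    # added/updated hold dicts; pruned is assumed to hold IDs.
                    found = node["id"] if isinstance(node, dict) else node
                    if found == node_id:
                        hits.append((index, section, node_type))
    return hits
```

If the parent's ID only ever appears under `updated` and never under `added`, that would support the observation above.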

@kinow
Member

kinow commented Jun 15, 2020

Query used: https://gist.github.com/kinow/a1f99b40d0deb3766e13b0584a84773c

Variables has only the $workflowId set to kinow|families2.

Member

@hjoliver hjoliver left a comment

👍

@dwsutherland
Member Author

dwsutherland commented Jun 18, 2020

I repeated this test, now printing also everything that was pruned. I had a theory that maybe the parent wasn't there because I pruned it.

For the job without parent, for its cycle point, nothing was pruned. Meaning it wasn't added, and instead the WUI received the data only in the updated key. Will have to wait for an assessment of @dwsutherland here to see if it could be something in the backend 👍

I think it might be something to do with job deltas being event driven and task pool updates collected at the end of a main-loop iteration(?) hmmm... not sure.. How does a job start and/or message back without the taskProxies being created in the data-store.. doesn't seem possible ..

Probably a follow on PR would be fine though
