Conversation
@@ -21,6 +21,10 @@

'use strict';

if (process.isWorkerThread) {
  throw new Error('domains are not available inside of worker threads');
}
That's a pretty big decision to make. Personally I'm +1 on it, but it means that any code that currently runs in child processes won't be able to run on workers.

Petka's original idea was to convert `cluster` to use workers.
> it means that any code that currently runs in child processes won't be able to run on workers.

You mean “some code”, right? As in, child processes and workers are incompatible? I’m okay with that as well.

> Petka's original idea was to convert `cluster` to use workers.

I would assume by now too much code relies on that not being the case anyway.
> You mean “some code”, right? As in, child processes and workers are incompatible? I’m okay with that as well.

That sentence was missing "and uses domains" :)

I think that people currently using child processes to offload work are the most obvious potential consumers of workers, and this might make their life harder.

I'm definitely OK with starting this way in any case.

> I would assume by now too much code relies on that not being the case anyway.

I'm not sure why?
> I would assume by now too much code relies on that not being the case anyway.

> I'm not sure why?

For one, the current implementation in this PR doesn’t provide stdio to Workers, but I would be pretty sure a lot of cluster-using code wants that to be there ;)
Come to think of it - can't stdio just be done through a MessagePort?
@benjamingr Yes … but that could also be done by the user, right? `stdio` in Node/Ayo is complicated, and I’m a bit scared of what happens when multiple handles to the same resource exist (like, some could be set to blocking, data would end up being intertwined, there would be different libuv buffers), and for `stdin` there can’t really be any consistent behaviour anyway.

I’ve made `console.*` work by just using a fully synchronous stream that operates directly on the file descriptors, which might not be super-efficient, but I think it’s reasonable here.
I think that's reasonable and we can always add it later.
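For illustration, here is a minimal sketch of the userland approach hinted at above: forwarding stdout-like output over the message channel instead of sharing the real stdio handles. The `worker_threads`-style module name, `parentPort`, and the message shape are assumptions made for this example, not necessarily what this PR exposes.

```js
// worker.js: runs inside the worker thread
// NOTE: module name and API shape are assumptions for this sketch.
const { parentPort } = require('worker_threads');

function workerLog(...args) {
  // Serialize to a plain string; the parent decides where it actually goes.
  parentPort.postMessage({ type: 'stdout', chunk: args.join(' ') + '\n' });
}

workerLog('hello from the worker');
```

```js
// main.js: runs on the main thread
const { Worker } = require('worker_threads');

const worker = new Worker('./worker.js');
worker.on('message', (msg) => {
  if (msg && msg.type === 'stdout') process.stdout.write(msg.chunk);
});
```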
lib/internal/worker.js (outdated)

debug(`[${process.threadId}] created Worker with ID ${this.threadId}`);

// Actually start the new thread now that everything is in place.
this[kHandle].startThread();
I'm not sure if we shouldn't provide `.start()` as an external method, so that creating and starting the worker are different steps, but I don't feel strongly about it since the user can always just keep a function returning a thread.
Yeah, it’s an arbitrary decision in my eyes, too. Right now, all communication with the Worker is asynchronous anyway, so I couldn’t find any advantage in adding a separate method, but if there is some, fine by me. :)
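A rough sketch of the workaround mentioned above (keeping a function that returns the thread), assuming a `Worker` constructor that starts the thread immediately; the module name is an assumption:

```js
const { Worker } = require('worker_threads'); // module name is an assumption

// "Create now, start later" without a separate .start() method on Worker:
// defer construction until the caller actually wants the thread running.
function deferredWorker(filename, options) {
  let worker = null;
  return {
    start() {
      if (worker === null) worker = new Worker(filename, options);
      return worker;
    },
  };
}

const handle = deferredWorker('./task.js');
// ... later, when the thread should actually run:
const worker = handle.start();
```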
lib/internal/worker.js (outdated)

const assert = require('assert');
const errors = require('internal/errors');
const { Serializer, Deserializer } = require('v8');
We need a way to pass binary data fast between isolates - given we can't pass structured data (right?) I think that we need a way to pass buffers and other node builtins fast if we can.
The custom Serializer does support Node.js Buffers. Are you saying the performance of going through the serializer might be suboptimal?
I'm saying that passing binary data efficiently between workers needs to be fast, it would be amazing to be able to pass objects faster than structured-cloning them but I don't see that happening any time soon.
One of the most commonly misunderstood problems people want to solve with workers is "offload parsing a lot of JSON", where the overhead is creating all the objects, not reading the actual JSON, so passing it to a worker won't help (since now two copies are made per worker: it's deserialized from JSON to objects, then copied between workers, then to the user).

I don't think we can fix that though.
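To make the point above concrete, a hedged sketch of why offloading `JSON.parse()` rarely helps (module name and API shape are assumptions):

```js
// worker.js: parsing happens off the main thread...
const { parentPort } = require('worker_threads'); // assumed API

parentPort.on('message', (jsonText) => {
  const parsed = JSON.parse(jsonText); // cost #1: build the object graph in the worker
  parentPort.postMessage(parsed);      // cost #2: structured-cloning it back rebuilds
                                       // the objects in the receiving thread anyway
});
```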
@benjamingr Maybe to clarify: Are you saying the serialization (that corresponds to structured cloning, and afaik is what Chrome implements structured cloning on top of) needs to be fast, or that passing in a Buffer on the one side and getting out another Buffer on the other side needs to be fast?

Anyway:

- I think we might be able to get rid of one (?) of the copy operations along the way when a Buffer is passed in by adding a fast case for “this is just an ArrayBufferView”
- I would also like to add `SharedArrayBuffer`/transferable support, now or in the future

> I don't think we can fix that though.

Yeah.
> that passing in a Buffer on the one side and getting out another Buffer on the other side needs to be fast?

That. I think that passing a buffer should be fast (and optimally a binary stream). I have no hopes for anything that requires actually copying JS objects from one heap to another.
We won’t be able to get around one of the copies, because any `Buffer` that’s being passed in could be modified in the sending thread afterwards, but I think keeping it at a single copy should work.
@benjamingr Yeah. A FAQ/explainer for questions like these should be good, though that might sound too much like a Node.js EP... sigh
@addaleax I'm fine with modifying a sent buffer `throw`ing or providing a different type.

I'm not too concerned with the API, but I definitely think workers should have the capability to pass memory without copies somehow.

For example, if I'm using workers to transcode video and I'm doing it at 20 Mbps in 4 workers, that means my process is copying 4.8 gigabits of memory every minute, which is a lot of overhead.
> I'm fine with modifying a sent buffer `throw`ing or providing a different type.

Oh, right. I think we can `Neuter()` array buffers like that (or even have to do that when using them as transferables?).

> I'm not too concerned with the API but I definitely think workers should have the capability to pass memory without copies somehow.

I think that would be doable with SharedArrayBuffers in any case (to be clear, leaving out support for SABs/transferables is not a design decision of this PR, it’s a TODO of it :D)
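For reference, a sketch of what the two copy-avoiding options discussed above look like with web-style semantics; support for them is explicitly listed as a TODO of this PR, and the module name is an assumption:

```js
const { Worker } = require('worker_threads'); // assumed module name
const worker = new Worker('./transcode.js');  // hypothetical worker script

// Option 1: transfer. The ArrayBuffer moves to the worker and is neutered here.
const frame = new ArrayBuffer(1024 * 1024);
worker.postMessage({ frame }, [frame]);
// After the call, frame.byteLength === 0 in the sending thread.

// Option 2: share. Both threads see the same memory; nothing is copied.
const shared = new SharedArrayBuffer(1024 * 1024);
worker.postMessage({ shared });
```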
Awesome work! I do have some questions about this implementation. It doesn't have to be as detailed (or contentious) as a Node.js EP, but I think these questions are likely to come up in the future. Please understand that, by asking these questions, I'm also genuinely interested in the answers and not trying to pass any judgements on the decisions made by this implementation.
EDIT: Added two more questions.
Very nice questions, I looked at the code and Anna will probably have more to add on top of it - but doing the best of my ability to answer:
I don't see why not. This design (if I understand correctly) isn't concerned with module loading semantics.
Not a lot at all - I'm not sure that's a goal though.
To be honest that's something that I think should be solved by using the
I think CPU-intensive work on in-process memory is a much more important goal for web workers. I think as a project Ayo (or Node.js) shouldn't provide any more tools for synchronous I/O, since libuv already thread-pools blocking I/O operations like DNS lookups and they look "as asynchronous" to the user. I'll let Domenic answer what he meant though :) (note he has asked not to be pinged in Node.js repos, I'm not sure how/if that holds here).
You’re not coming across any other way :) Also, it’s fair to question my design decisions; they are somewhat ad-hoc, and I wouldn’t be surprised if this PR shifted in some other direction before it lands. And @benjamingr’s answers are good answers :)
I think
Hmm, right. I was also wondering more generally about per-thread resource limits, so not necessarily those exposed through `ulimit`. I.e. more in the sense of "what else can we do with Workers?"
If you’re thinking about limiting a Worker’s access to APIs, I’d go with the combination of a Worker + a new VM context that Benjamin mentioned, and let userland handle that case from there. Otherwise, I think this really depends on what kind of limits you have in mind?
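A minimal sketch of the "Worker + new VM context" combination, as it could be run inside the worker: evaluate untrusted code against a context that only exposes what you explicitly allow. This only uses the existing `vm` module; nothing here is specific to this PR.

```js
const vm = require('vm');

// Expose only what the sandboxed code is allowed to touch.
const sandbox = vm.createContext({ console });

// The sandboxed code cannot see require(), process, etc.
vm.runInContext('console.log(typeof require, typeof process)', sandbox);
// prints: "undefined undefined"
```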
I’ve added a TODO list in the PR description; if anybody has interest in jumping in on something specific, please let me know (here, or if that doesn’t work, ping me in the discord channel/twitter/wherever).
@TimothyGu for your more recent questions:
Yes. V8 already has an implementation of proper message passing code that handles transferring ABs and sharing SABs through workers in
Just to be clear, my personal rationale is more that I’m tired and frustrated with Node’s review process. That this would give Ayo a technical edge over Node is really really nice, but not the main issue for me. I would hope that, if done right, this could also be a good chance for people who have not felt welcome around Node.js to enter the scene. I would be more than happy to answer questions newcomers have about this PR, whether very abstract or very concretely referring to pieces of code. If that fails, yes, we can go back to looking for Node people. (Also, to clarify: Do you consider yourself a person “capable of reviewing this PR”? I personally would trust your review on this).
Thank you for this rationale for posting the PR here. It makes total sense, though I agree we'll have to wait and see regarding the reviews we will get.
To a certain extent, yes, I do. On the other hand, I would not feel comfortable signing off on this if it were to go in w/o other, more capable pairs of eyes over it :)
Fwiw, I just updated the `MessagePort` implementation. (There are no docs for that yet, but the API matches the MDN descriptions, except that events are emitted via a Node.js-style `EventEmitter` API.)
Okay, resolved conflicts + got CI back to passing again after the upstream update.
Just playing with it a bit right now. One thing that might trip up new users a bit is that a `Buffer` doesn't stay a `Buffer` when it comes out on the receiving side. Not sure if there's a way to automatically restore the buffer type on the other end or if it should just be mentioned in the docs that this happens.
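If it helps, here is the small userland fix for that, re-wrapping the received bytes as a `Buffer` without copying. The event-based receiving side and the module name are assumptions about the API shape, not something this PR guarantees.

```js
const { Worker } = require('worker_threads'); // assumed module name
const worker = new Worker('./producer.js');   // hypothetical worker script

worker.on('message', (u8) => {
  // The received value arrives as a plain typed-array view rather than a
  // Buffer; wrap the same underlying memory as a Buffer (no copy).
  const buf = Buffer.from(u8.buffer, u8.byteOffset, u8.byteLength);
  // buf now has the Buffer API again and shares memory with u8.
});
```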
Just got https://gist.github.com/Qard/b0928fb4e92f53d4789702cd8ed2641a |
Yes – that is tricky. On the one hand, if we want to follow the Web’s structured cloning algorithm, we should stick to it, and we’re supporting serialization between On the other hand, I get that people might expect
Yeah… my bad. :) I’ll align the behaviour with what WebWorkers do (i.e. just ignore posted messages/terminate requests for already-terminated workers).
Taken from petkaantonov/io.js@ea143f7 and modified to fit current linter rules and coding style.
Native addons need to use flags to indicate that they are capable of being loaded by worker threads. A native addon is unloaded once all Environments referring to it have been cleaned up, unless it is also loaded by the main Environment.
This should help a lot with actual sandboxing of JS code.
Moving to #58 after we renamed branches.
edit: moved to #58
(status: currently ready for review, by anyone, including you: #40 (comment))
(If that gives the wrong impression: No, this does not conform to the browser WebWorker API, but that should be rather easily implementable on top of this, and I’m okay with that.)
Fixes: #31
Checklist
- `make -j4 test` (UNIX), or `vcbuild test` (Windows) passes

TODO
- `.start()` manually for `MessagePort`s
- performance (currently pending the V8 6.1 update in upstream Node.js)
- `MessagePort`s configurable
- `v8::Platform` implementation with multi-isolate support

(@petkaantonov please feel free to indicate whether attributing you for code that comes from your original PR is not enough/too much/just right :) )