Add multiprocessing / Split delivery code #46
* Move core_mem.h below config.h
* Adds --cpu-limit and --parallel-fail-fast arguments
* Adds disable, parallel, and setup_script keys to [test] blocks
* Move slot->gate assignment to mp_pool_task()
* Remove mmap() to slot->gate.
* Change type of ident and log_root variables for the sake of ease (fewer maps)
* Remove multiprocessing.h from other files
* Only initiate a kill if we have more than one process. The current process is already failed out, no need to terminate it again.
* Add get_task_duration()
* Add get_pool_show_summary()
* Add signaled_by member to MultiProcessingTask
* Add time_data member to MultiProcessingTask for duration tracking
* Fix child not returning result of execvp(). task->status is for program status, not fork() status.
* Remove exmain() and dead comments from main()
* When strdup fails and the temporary file handle is open, close the handle and die.
* Reported by @kmacdonald-stsci
203708e to a84b874
…tring * Fix leaks caused by css_filename path and the dirs array
Overall it looks fine. I have some questions, but no blockers. Also, I noted some areas where I think there could be memory leaks and a possible race condition.
union INIVal val;

memset(&data, 0, sizeof(data));
data.src = calloc(PATH_MAX, sizeof(*data.src));
Since this is a local array that is always allocated at the same size, maybe just define it as a stack variable instead of allocating it on the heap. That avoids potential memory leaks.
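A minimal sketch of the stack-based alternative suggested here. Only PATH_MAX and the src member come from the snippet under review; struct path_data and fill_source_path() are hypothetical names used for illustration, not STASIS code.

```c
#include <limits.h>   /* PATH_MAX */
#include <stdio.h>    /* snprintf */
#include <string.h>   /* memset */

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback for platforms that do not define it */
#endif

/* Hypothetical stand-in for the reviewed struct. The buffer is embedded in
 * the struct, so it is released automatically when the variable goes out
 * of scope. */
struct path_data {
    char src[PATH_MAX];
};

/* Builds "<dir>/<name>" into data->src.
 * Returns 0 on success, -1 if the result would not fit.
 * Nothing is heap-allocated, so no early return can leak. */
int fill_source_path(struct path_data *data, const char *dir, const char *name) {
    memset(data, 0, sizeof(*data));
    int n = snprintf(data->src, sizeof(data->src), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(data->src)) {
        return -1;
    }
    return 0;
}
```

A caller would declare `struct path_data data;` locally and pass `&data`. The trade-off is stack usage: one PATH_MAX-sized buffer per live instance.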
src/utils.c
Outdated
The functions pushd and popd don't appear to be safe for shared memory. It's possible for one or both of these functions to be called at the same time in two different processes, with undefined results.
Correct. On the bright side, these are not used by the child process(es). I think the point of confusion (or at least the point where it looks like it would cause problems) stems from using pushd/popd to enter the package's source directory. At that point the directory is recorded in shared memory, and then popd'd. This takes place before mp_task_fork() is called, so it should be safe as-is.
When the fork() occurs later on, the child runs chdir(dir_path_waiting_in_shared_memory);
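The ordering described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the STASIS implementation: struct demo_task and run_child_in_dir() are hypothetical, and the real task structure and mp_task_fork() are not shown in this thread.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <limits.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical task record kept in shared memory. */
struct demo_task {
    char working_dir[PATH_MAX];
};

/* Records `dir` in shared memory, forks, and lets the child chdir() into it.
 * Returns the child's exit code (0 if chdir succeeded, 1 if not),
 * or -1 if the mapping, fork, or wait failed. */
int run_child_in_dir(const char *dir) {
    struct demo_task *task = mmap(NULL, sizeof(*task), PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (task == MAP_FAILED) {
        return -1;
    }

    /* Parent only: no child exists yet, so nothing can race this write. */
    snprintf(task->working_dir, sizeof(task->working_dir), "%s", dir);

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the working directory is per-process, so this chdir()
         * does not affect the parent or sibling processes. */
        _exit(chdir(task->working_dir) == 0 ? 0 : 1);
    }
    if (pid > 0) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            result = WEXITSTATUS(status);
        }
    }
    munmap(task, sizeof(*task));
    return result;
}
```

The point of the sketch is the sequence: the directory is written once by the parent before fork(), and each child only reads it.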
}
recipe_type = recipe_get_type(recipe_dir);
pushd(recipe_dir);
{
Why is this segment of code within a set of curly brackets?
🤯
I think the bare pushd above the curly brackets was supposed to be if (!pushd(recipe_dir)).
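A sketch of the guarded form mentioned in this reply. The pushd() and popd() below are simplified, hypothetical stand-ins written for this example (assuming, as the suggested `if (!pushd(...))` implies, that 0 means success); they are not the versions in src/utils.c.

```c
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#define DIR_STACK_MAX 16

static char dir_stack[DIR_STACK_MAX][PATH_MAX];
static int dir_stack_top = 0;

/* Hypothetical stand-in: remember the current directory, then enter `path`.
 * Returns 0 on success, -1 on failure (the stack is left unchanged). */
int pushd(const char *path) {
    if (dir_stack_top >= DIR_STACK_MAX) {
        return -1;
    }
    if (!getcwd(dir_stack[dir_stack_top], PATH_MAX)) {
        return -1;
    }
    if (chdir(path) != 0) {
        return -1;
    }
    dir_stack_top++;
    return 0;
}

/* Hypothetical stand-in: return to the most recently remembered directory. */
int popd(void) {
    if (dir_stack_top <= 0) {
        return -1;
    }
    if (chdir(dir_stack[dir_stack_top - 1]) != 0) {
        return -1;
    }
    dir_stack_top--;
    return 0;
}

/* The braces now belong to the if statement, so the body only runs when
 * the directory change actually succeeded. */
int run_in_recipe_dir(const char *recipe_dir) {
    if (!pushd(recipe_dir)) {
        /* ... work that depends on being inside recipe_dir ... */
        return popd();
    }
    fprintf(stderr, "unable to enter %s\n", recipe_dir);
    return -1;
}
```

With the bare call, a failed pushd() would let the block run in the wrong directory and the later popd() would be unbalanced.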
}

pushd(srcdir);
{
Why this block inside curly brackets?
Same as the other one. I must have wrapped the code in brackets and then forgotten to write the if statement.
if (globals.jfrog.url) {
    guard_free(globals.jfrog.url);
}
globals.jfrog.url = strdup(jfurl);
Should this check for NULL returns?
sprintf(bottom_index, "%s/%s/index.html", ctx->storage.wheel_artifact_dir, rec->d_name);
bottom_fp = fopen(bottom_index, "w+");
if (!bottom_fp) {
    return -3;
The dp and top_fp handles are still open. For a function that can return in several places but still needs to clean up before returning, I suggest setting a return value, then jumping to a CLEANUP label at the end of the function, to ensure all necessary cleanup happens before returning.
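The single-exit pattern suggested here might look like the following sketch. write_index() is a hypothetical, much-reduced function; only the names dp, top_fp, and rec echo the code under review.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical example: list the entries of dir_path into index_path.
 * Returns 0 on success or a negative code identifying the failed step.
 * Every exit path goes through `cleanup`, so dp and top_fp cannot leak. */
int write_index(const char *dir_path, const char *index_path) {
    int status = 0;
    DIR *dp = NULL;
    FILE *top_fp = NULL;
    struct dirent *rec = NULL;

    dp = opendir(dir_path);
    if (!dp) {
        status = -1;
        goto cleanup;
    }

    top_fp = fopen(index_path, "w+");
    if (!top_fp) {
        status = -2;
        goto cleanup;
    }

    while ((rec = readdir(dp)) != NULL) {
        if (fprintf(top_fp, "%s\n", rec->d_name) < 0) {
            status = -3;
            goto cleanup;
        }
    }

cleanup:
    /* Release only what was actually acquired. */
    if (top_fp) {
        fclose(top_fp);
    }
    if (dp) {
        closedir(dp);
    }
    return status;
}
```

Initializing every handle to NULL up front is what makes the shared cleanup block safe to reach from any point in the function.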
sprintf(dpath, "%s/%s", ctx->storage.wheel_artifact_dir, rec->d_name);
struct StrList *packages = listdir(dpath);
if (!packages) {
    fclose(top_fp);
dp is still open.
src/stasis_main.c
Outdated
The main function is 500 lines long. I suggest refactoring that into smaller, easier to read and follow functions.
Agreed. I'll refactor it in a separate PR
}
char *basetemp_path = NULL;
if (get_basetemp_dir_entrypoint(f, &basetemp_path)) {
    return -2;
Could this be a memory leak for output or is data_out handled by the calling function?
data_out is allocated at the start of get_basetemp_dir_entrypoint so this might be a leak. I'll have to run it through the debugger to make sure.
task->pid = pid;
task->parent_pid = pid;

mp_global_task_count++;
Will this increment properly without locking first? Isn't there a possibility for a race condition here?
Good question. I haven't observed any log clobbering at all, but it's possible that you're right and a lock needs to exist... even for a nanosecond.
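Whether a lock is needed depends on whether more than one process can execute the increment. If so, one option (an assumption, not something this PR does) is to keep the counter in shared memory as a C11 atomic. The sketch below assumes atomic_int is lock-free on the target, which is what makes it usable across processes; count_across_processes() is a hypothetical demo function.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks `nproc` children that each increment a shared counter
 * `iterations` times. Returns the final count, or -1 on failure. */
int count_across_processes(int nproc, int iterations) {
    atomic_int *counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED) {
        return -1;
    }
    atomic_init(counter, 0);

    int forked = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            break;
        }
        if (pid == 0) {
            for (int j = 0; j < iterations; j++) {
                /* Indivisible read-modify-write: no update can be lost. */
                atomic_fetch_add(counter, 1);
            }
            _exit(0);
        }
        forked++;
    }

    /* Reap every child that was started before reading the total. */
    for (int i = 0; i < forked; i++) {
        wait(NULL);
    }

    int total = atomic_load(counter);
    munmap(counter, sizeof(*counter));
    return forked == nproc ? total : -1;
}
```

A plain `int` with `++` in the same position can lose increments when two processes interleave the load and the store. If the counter is only ever touched by the parent, no synchronization is needed at all.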
* All tasks are executed by the same machinery under the hood. So have them all react the same way.
This PR
* Adds multiprocessing to [test:*] blocks.
* Splits delivery.c into smaller, more manageable files.

Changes
* pip install in parallel will, more often than not, break the site-packages directory. In order to solve this I created a new test block key: script_setup. All of these setup scripts are executed in series prior to running any scripts.
* If the disable key is true (default: false), the script will not be executed.
* If the parallel key is false (default: true) the script will be added to the serial task pool. This is useful when you have a huge test suite and want to use pytest-xdist without oversubscribing the system.
* setup_script is always executed. This ensures all test blocks are using package versions defined in the stasis config.
* DONE, FAIL, TERM (TERM is used by --parallel-fail-fast to indicate processes have been kill()'d on tear down)
* fork()
* _GNU_SOURCE is now defined globally at compile-time instead of within the source code.
* _FORTIFY_SOURCE=1 (use cmake -DFORTIFY_SOURCE=ON [..] to enable). _FORTIFY_SOURCE=2 breaks the code. Variables of type const char * are optimized out all over the place for reasons unknown.

New CLI arguments:
* --cpu-limit defines the number of tasks that will run concurrently. If the input value is <1 it is reset to 1. The default is CPU_COUNT - 1.
* --parallel-fail-fast terminates all processes in a task pool when an error occurs. The behavior of --continue-on-error has not changed. If both "fail fast" and "continue on error" are enabled you may end up with a partially tested environment.

Notes
You can probably see that workaround.tox_posargs has been replaced by a template function tox_run... However, I suggest avoiding tox altogether. Tox generates its own virtual environments that share nothing in common with the STASIS test environment, and because dependencies are managed by tox.ini directly (often wide open with no constraints) it's not even testing anything relevant to your delivery.

In the near future I'm going to rip out tox-related code. Use pytest, or whichever test runner is appropriate for the package you're testing.
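For reference, the --cpu-limit rule listed under the new CLI arguments (values below 1 reset to 1, default of CPU_COUNT - 1) can be sketched as below. Both function names are hypothetical, the use of sysconf(_SC_NPROCESSORS_ONLN) as the CPU count is an assumption, and clamping the default on single-CPU machines is also an assumption rather than documented behavior.

```c
#include <unistd.h>

/* Hypothetical helper: values below 1 are reset to 1. */
long clamp_cpu_limit(long requested) {
    return requested < 1 ? 1 : requested;
}

/* Hypothetical helper: CPU count minus one, never less than 1.
 * sysconf() returns -1 on error, which the clamp also turns into 1. */
long default_cpu_limit(void) {
    long cpu_count = sysconf(_SC_NPROCESSORS_ONLN);
    return clamp_cpu_limit(cpu_count - 1);
}
```

Leaving one CPU free keeps the parent process and the rest of the system responsive while the task pool is saturated.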