Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create working solution for globals issue #4

Open
kotowicz opened this issue Jan 14, 2014 · 23 comments
Open

create working solution for globals issue #4

kotowicz opened this issue Jan 14, 2014 · 23 comments
Assignees

Comments

@kotowicz
Copy link
Owner

Problem statement: #2
Possibly related Matlab issues:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/163362
http://www.mathworks.com/matlabcentral/newsreader/view_thread/287891
http://mathforum.org/kb/message.jspa?messageID=7445456

I'm happy to receive any feedback from those of you working with globals.

@kotowicz
Copy link
Owner Author

detailed problem demo:

>> global a
>> a = 10

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... a
connected to 12 workers.

a =

    10

>> a

a =

    10

>> ParforProgressStressTest2

  >> execution time was 1.64s.

ParforProgressStressTest2 running time: 1.6267s.
>> a
Undefined function or variable 'a'.

the global a disappears because JAVAADDPATH clears all globals.

@kotowicz
Copy link
Owner Author

The 0.2.5 release https://github.com/kotowicz/matlab-ParforProgress2/releases/tag/v0.2.5 should address all issues. In case there are still problems with global variables, please report them here.

@kotowicz
Copy link
Owner Author

demo to show that globals do not get cleared anymore:

>> global a
>> a = 10

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... connected to 12 workers.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 1.36s.

ParforProgressStressTest2 running time: 1.3521s.

>> a

a =

    10

>> matlabpool close
Sending a stop signal to all the workers ... stopped.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 3.18s.

ParforProgressStressTest2 running time: 3.1779s.

>> a

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... connected to 12 workers.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 1.3s.

ParforProgressStressTest2 running time: 1.2994s.

>> matlabpool close
Sending a stop signal to all the workers ... stopped.

>> a

a =

    10

@chinasaur
Copy link
Contributor

I haven't had a chance to look into the code yet; will try this week. Maybe you can just tell me; was the problem with my approach that if you close and open the matlabpool then it doesn't realize it needs to add the Java path again in the child processes?

If that's the issue, I think the best solution would be to have a runonall that does the same kind of check for whether Java path is already set on all the child processes. Then just run that every time and it will normally do nothing but if it's the first time it will add the java path on all, and if the pool has been restarted it will add java path on just the children but not the master.

@chinasaur
Copy link
Contributor

Actually, perhaps you should consider an additional change that I have made in my own branch. In my version, in addition to trying to avoid the globals issue, I have also eliminated the Client Java class. Instead, the Client class is just a Matlab handle and then the increment method looks like:

function increment(o, i) %#ok<INUSD>
    % i is a fake input so we stay compatible with
    % "ParforProgressConsole2.m"
    s = java.net.Socket();
    s.connect(o.ISA);
    s.close();
end

Of course, it's a bit constraining to make the Client Matlab instead of Java, but since its job is simple I consider this an improvement. Additionally, you don't have the Java path issue on the child processes anymore, since only the Server node actually needs any extra Java path. (This is probably why I didn't catch the matlabpool restart bug when I submitted my change to you; apologies for debugging in an inconsistent context!)

(I believe I went through some iterations trying some fancier things with the java.net.Socket like setting some additional flags, but in the end this simple thing seemed best.)

Let me know if you want a PR with these more aggressive changes.

@kotowicz
Copy link
Owner Author

thanks for your feedback. At this moment I would try to avoid making too many changes at the same time. For now I'd like to see whether the current implementation (v0.2.5) does work (in theory) for everybody. Once we have a stable, working solution, I suggest we try to look at different options for the future.

@chinasaur
Copy link
Contributor

That's fine; I'll try to find time to do some tests. Take a look at my fork if you are interested in the simplified approach; I think my fork is up to date.

@mtompkins
Copy link

I noticed I'm current with your release and my globals are getting dropped. I use them to init an SMTP send email address. I'll see if I can widdle it down to something you can replicate locally.

That said, unfortunately the problem still exists.

@chinasaur
Copy link
Contributor

@mtompkins, do your globals get stomped repeatedly, or just the first time parforprogress is loaded?

@mtompkins
Copy link

@chinasaur I'm not quite sure what you are asking so I'll answer as follows:

I init a single global variable which holds a simple email address.

I then use the bar during a parametric sweep (lasts a few days).
On a failure of the sweep it should send an email

When the script ends an email should be sent notifying a successful completion.
My latest sweep ended today (successfully) but there was an error at the email send routine as the global variable had been dropped.

If I test the sections that init the variable and send the email (bypassing the parametric sweep itself) everything works as expected.

@chinasaur
Copy link
Contributor

(Answer below is based on my fork, which is somewhat different, but I believe should apply equally here; kotowicz can correct if I'm mixed up.)

I believe the globals clearing happens when parforprogress adds itself to your javaclasspath. This used to happen every time parforprogress was called, but the new changes should make it only happen the first time it's called, and only if it's not on the classpath already.

You could try initiating something that uses parforprogress, interrupting it early, init your global and then rerun; it shouldn't stomp the global then. Try with something shorter than a few days first :).

If this fixes your problem, then you could arrange to have the appropriate classes added to javaclasspath on startup and then I believe you won't have to think about it anymore.

@mtompkins
Copy link

Gotcha, thank you. So init / paint the progress bar briefly then carry on basically until there is a larger rework.

@kotowicz
Copy link
Owner Author

kotowicz commented Feb 5, 2014

@mtompkins do you run your code like this? Please notice that you need to use the run_javaaddpath = 0 setting for it to work with globals.

 >> run_javaaddpath = 0;
 >> s = 'dummy task';
 >> n = 100;
 >> percentage = 0.1;
 >> do_debug = 0;
 >> ppm = ParforProgressStarter2(s, n, percentage, do_debug, run_javaaddpath)
 >> for j = 1 : n
 >>     your_computation();
 >>     ppm.increment(i); 
 >> end
 >> delete(ppm);

In case this doesn't work, could you please post the Matlab output showing which globals disappear after running ParforProgressStarter2? Thanks!

@mtompkins
Copy link

@kotowicz Thanks for the post. No I do not use the run_java* declaration. I will add that and advise results. Much appreciated. (Sample of how I call it below before changes)

btw - LOVE this little baby!

-- code --

try
    ppm = ParforProgressStarter2('Parametric Sweep: RSI with RAVI Transformer', row, 0.1);
catch me % make sure "ParforProgressStarter2" didn't get moved to a different directory
    if strcmp(me.message, 'Undefined function or method ''ParforProgressStarter2'' for input arguments of type ''char''.')
        error('ParforProgressStarter2 not in path.');
    else
        % this should NEVER EVER happen.
        msg{1} = 'Unknown error while initializing "ParforProgressStarter2":';
        msg{2} = me.message;
        print_error_red(msg);
        % backup solution so that we can still continue.
        ppm.increment = nan(1, nbr_files);
    end
end

try
    parfor ii = 1:row
        [~,~,shTest(ii)] = rsiRaviSIG_mex(vBarsTest,[x(ii,1) x(ii,2)],x(ii,3),x(ii,4),...
            x(ii,5),x(ii,6),x(ii,7),x(ii,8), x(ii,9), x(ii,10),...
            bigPoint,cost,scaling);
        [~,~,shVal(ii)] = rsiRaviSIG_mex(vBarsVal,[x(ii,1) x(ii,2)],x(ii,3),x(ii,4),...
            x(ii,5),x(ii,6),x(ii,7),x(ii,8),x(ii,9), x(ii,10),...
            bigPoint,cost,scaling); %#ok<PFBNS>
        ppm.increment(); %#ok<PFBNS> % update progressbar
    end; %parfor
catch ER
    % An error occurred during the parallel process ...
    subj = 'ALERT - Notice of MATLAB error';
    message = sprintf('Error while executing the parallel parametric sweep.\nThe error reported by MATLAB is:\n\n%s', ER.message);
    % Send the notice
    sendNotice(subj, message);
end

@mtompkins
Copy link

Ok - unfortunately same result after adding the additional parameter.

It is a bit useless to show you the error output as the global variable that is dropped isn't used in this function.

On a thought, I added the global variable to the script that runs PPBar2 so that it exists in the same workspace and test if it might be preserved as follows:

% Prevent loss of global variables when calling parforprogress
global sendTo; %#ok<NUSED>
run_javaaddpath = 0;

I put a breakpoint at the run_java* declaration to confirm the global had a value and it does...then let the script continue.

The failure then produces the error as the variable 'sendTo' is dropped.

@kotowicz
Copy link
Owner Author

kotowicz commented Feb 5, 2014

2 things:

  • what version of Matlab are you using?
  • please run global a; a = 10, then run your code (with run_javaaddpath = 0) and finally post the output of a after your code finished running.
    • if you get an error message unknown variable a, please run the code again, this time without calling your code inside the loop.

@mtompkins
Copy link

v2013a

'a' comes back undefined / dropped.

Testing with empty parfor same result. Testing with for loop same result.

@kotowicz
Copy link
Owner Author

kotowicz commented Feb 5, 2014

please set a break point in line 115, 117, 120 and 122. Then re-run the entire loop. Keep track of the break points where execution stops. Once you hit a break point, resume operation by typing dbcont. Finally, please post the line numbers here. thanks.

@mtompkins
Copy link

What is it you want from these breakpoints please?

@mtompkins
Copy link

breaks on 115
[type dbcont]

break on 117
[command windows displays]
117 pctRunOnAll(['addpath(''' dir_to_add ''')']);

[type dbcont]
execution completes with dropped variables

.... hope that helps.

@kotowicz
Copy link
Owner Author

kotowicz commented Feb 5, 2014

please download an edited version of ParforProgressStarter2.m from here:
https://gist.github.com/kotowicz/8825641

run your loop again, and post the "@kotowicz: " output that you get. thanks.

@kotowicz
Copy link
Owner Author

kotowicz commented Feb 5, 2014

if it breaks at line 115, then run_javaaddpath must be set to 1. please make sure it's set to 0 and try again.

@mtompkins
Copy link

OK that was extremely helpful (and I sadly admit I'm likely heavily to blame for my woes after your initial info). I may have missed a change in the input parameters from a version or just have been sloppy.

In any event, I was calling

ppm = ParforProgressStarter2('Parametric Sweep: maRsiPARMETS', row, 0.1);

When adding the run_java* as explained I never submitted the debug variable. If I add that and the run_java* everything completes properly. Apologies for the mea culpa where I'm to blame. Hopefully this bit of thread helps others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants