for loops #63
Comments
In the following example we create a function called myloop:
// Setup
clear all
set trace off
set more off
parallel setclusters 4
// Test data. You can specify the elements you want to loop in parallel here:
set obs 6
quietly {
    gen ellist = "A" if _n == 1
    replace ellist = "B" if _n == 2
    replace ellist = "C" if _n == 3
    replace ellist = "D" if _n == 4
    replace ellist = "E" if _n == 5
    replace ellist = "F" if _n == 6
}
// This program copies ellist into vname
program def myloop
    args vname
    // Creating the variable
    gen `vname' = ""
    // Looping through the data
    forval i = 1/`=_N' {
        qui replace `vname' = ellist[`i'] if _n == `i'
    }
end
// Calling the program in serial fashion
myloop ellist2
// Calling the program using parallel, we need to pass the program in prog
parallel, prog(myloop): myloop ellist2_pll
// Do we get the same output?
list
// Same example but using mata --------------------------------------------------
mata
void myfunction(string scalar vname) {
    // Creating the data
    (void) st_addvar("str10", vname);
    string matrix D, A;
    D = st_sdata(., "ellist");
    A = st_sdata(.,vname);
    numeric scalar i;
    for (i = 1; i <= rows(A); i++)
        A[i] = D[i];
    st_sstore(.,vname, A);
    return;
}
end
// Serial and parallel fashion
m : myfunction("ellist_mata")
parallel, mata: m: myfunction("ellist_mata_pll")
// Do we get the same?
list
|
Thank you -- storing the element list and the output inside of variables is a neat trick that I hadn't thought of. I'll play around with this idea and see if I can make it work. |
I had this working a while ago, but I'm finding that something broke. Perhaps there is a bug in the latest version of your code? When I run the code above, I get the error
The log files have:
It appears the Mata function (myfunction) is not being passed to the child clusters. Any suggestions? I'm using the latest version of parallel from SSC. I can't figure out how to install directly from GitHub.
|
Perhaps you updated Stata? As you can see, the SSC version is pretty old. Instructions to install the dev version are here: https://github.com/gvegayon/parallel#development-version-latestmaster Try following those and let us know. |
I've never been able to install Stata packages from GitHub. I always get some type of error. Here's what I got today:
. net install parallel, from(https://raw.github.com/gvegayon/parallel/master/) replace
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable
to find valid certification path to requested target
https://raw.github.com/gvegayon/parallel/master/ either
1) is not a valid URL, or
2) could not be contacted, or
3) is not a Stata download site (has no stata.toc file).
r(5100);
Is it because the stata.toc file looks incomplete?
![image](https://user-images.githubusercontent.com/585071/53465237-46000080-3a0a-11e9-9f01-8a6703210030.png)
|
You should try downloading another version directly as a zip file as explained here: https://github.com/gvegayon/parallel/tree/sj-review#development-version-latestmaster |
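For reference, the zip-based route can be sketched roughly as follows. This is only a sketch under assumptions: the local path is a hypothetical placeholder, and it assumes the repository zip has already been downloaded from GitHub and unzipped there, so that the folder contains the package's stata.toc file.

```stata
// Hypothetical local path: replace with the folder where the zip was unzipped
net install parallel, from("C:/Users/me/Downloads/parallel-master") replace

// Rebuild the index of Mata libraries so newly installed Mata functions are found
mata mata mlib index
```

Because nothing is fetched over HTTPS in this route, the certificate error shown above should not come into play.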
I just reconfirmed that the |
Okay, thanks. I'll write Stata tech support and see if they have an idea.
I just put in a pull request with edits to the stata.toc file that I
thought might help. But I can't get the installation to work on my fork
either, so that might not be the issue.
- Keith
|
I do not understand how to implement a for loop using the parallel command. I have something like this:
... and I want to execute each instance of cmd in parallel rather than in sequence. That is, for each time through the loop, I want to fire up a new instance of Stata to run cmd. If the length of ellist is greater than the number of clusters, I'd want parallel to manage the workload so that (1) the number of loops running at one time is equal to the number of clusters and (2) the next loop starts when a cluster becomes available.
Can the parallel command do this kind of thing? How? Does it make a difference if I'm working in Stata versus Mata? (I'm working in Mata.)
Thanks,
Keith
P.S. I see a parallel_for Mata program is in development, but I don't know how to use it.
P.P.S. I considered using pll_id inside the definition of cmd, but the problem is that the number of processes run will equal the number of clusters, not the number of elements in ellist.
P.P.P.S. If you can do this with for loops, can you also do it with while and do loops?