===============================
Release Notes for PEGASUS 4.6.1
===============================
We are happy to announce the release of Pegasus 4.6.1. Pegasus 4.6.1
is a minor release of Pegasus and includes improvements and bug fixes
to the 4.6.0 release.
New features and improvements in 4.6.1 are:
- support for MOAB submissions via glite. A new tool called
pegasus-configure-glite helps users set up their HTCondor GLite
directory for use with Pegasus
- pegasus-s3 now allows for downloading and uploading folders to
and from S3
- initial support for Globus Online in pegasus-transfer
- the planner automatically copies the user catalog files into a
directory called catalogs in the submit directory
- changes to how worker package staging occurs for compute jobs
New Features
--------------
1) [PM-1045] – There is a new command line tool
pegasus-configure-glite that automatically installs the Pegasus
shipped glite local attributes script to the condor glite
installation directory
2) [PM-1044] – Added glite scripts for moab submissions via the Glite
interface
3) [PM-1054] – kickstart has an option to ignore files in
libinterpose.
This is triggered by setting the environment variables
KICKSTART_TRACE_MATCH and KICKSTART_TRACE_IGNORE. The MATCH version
only traces files that match the patterns, and the IGNORE version
does NOT trace files that match the patterns. Only one of the two
can be specified.
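The tracing filters are plain environment variables set in the job's
environment. The patterns below are illustrative only; the exact
pattern syntax should be checked against the kickstart documentation:

```shell
# Trace only files matching the pattern...
export KICKSTART_TRACE_MATCH="*.dat"
# ...or, alternatively, trace everything EXCEPT matching files.
# Only one of the two variables may be set at a time:
# export KICKSTART_TRACE_IGNORE="/proc/*"
```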
4) [PM-1058] – pegasus can now be installed via Homebrew on Mac OS X.
For details refer to documentation at
https://pegasus.isi.edu/documentation/macosx.php
5) [PM-1075] – pegasus-s3 can now download all files in a folder
pegasus-s3 has a --recursive option that allows users to download all
files from a folder in S3, or to upload all files from a local
directory to an S3 bucket.
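As a sketch of the new option (the identity, bucket, and paths below
are hypothetical, and pegasus-s3 must be installed and configured):

```shell
# Upload a local directory tree to a bucket
pegasus-s3 put --recursive ./outputs s3://user@amazon/mybucket/outputs

# Download every key under a folder to a local directory
pegasus-s3 get --recursive s3://user@amazon/mybucket/inputs ./inputs
```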
6) [PM-680] – Add support for GlobusOnline to pegasus-transfer
Details on how to configure can be found at
https://pegasus.isi.edu/docs/4.6.1/transfer.php#transfer_globus_online
7) [PM-1047] – Pegasus saves all the catalog files in the submit
directory, in a directory named catalogs. This enables easier
debugging later on, as everything is saved in the submit directory.
Improvements
--------------
1) [PM-1043] – Improve CSV file read for Storage Constraints algorithm
2) [PM-1057] – PegasusLite worker package download improvements
Pegasus exposes two additional properties to control the behavior of
worker package staging for jobs. Users can use these to control
whether a PegasusLite job downloads a worker package from the
Pegasus website, in case the shipped worker package does
not match the node architecture.
pegasus.transfer.worker.package.strict – enforce strict checks
against the provided worker package. If a job comes with a worker
package and it does not fully match the worker node architecture, it
falls back to the Pegasus download website. Default value is true.
pegasus.transfer.worker.package.autodownload – a boolean property
to indicate whether a PegasusLite job is allowed to download from
the Pegasus website. Defaults to true.
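The two properties could be set in pegasus.properties as follows (the
values shown are the stated defaults):

```properties
# Enforce a full architecture match for a shipped worker package
pegasus.transfer.worker.package.strict = true
# Allow PegasusLite jobs to fall back to downloading from the Pegasus website
pegasus.transfer.worker.package.autodownload = true
```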
3) [PM-1059] – Implement backup for MySQL databases
4) [PM-1060] – expose a way to turn off kickstart stat options
5) [PM-1063] – improve performance for inserts into database replica catalog
6) [PM-1067] – pegasus-cluster -R should report the finish time and
duration, not the start time and duration
7) [PM-1078] – pegasus-statistics should take comma separated list of
values for -s option
8) [PM-1073] – condor_q changes in 8.5.x will affect pegasus-status
pegasus-status was updated to account for changes in the condor_q
output in the 8.5 series
Bugs Fixed
--------------
1) [PM-1077] – pegasus-remove on hierarchical workflows results in jobs
from the sub workflows still in the condor queue
DAGMan no longer condor_rm's jobs in a workflow itself. Instead it
relies on the condor schedd to do it. Pegasus generated sub workflow
description files did not trigger this. As a result,
pegasus-remove on a top level workflow still left jobs from
the sub workflows in the condor queue. This is now
fixed. Pegasus generated dagman submit files have the right
expressions specified.
2) [PM-997] – pyOpenSSL v0.13 does not work with new versions of
openssl (1.0.2d) and El Capitan
3) [PM-1048] – PegasusLite should do a full version check for
pre-installed worker packages
PegasusLite now does a full version check (including the patch
version) against the Pegasus version installed on the node, when
determining whether to use the preinstalled version on the node or not.
4) [PM-1050] – pegasus-plan should not fail if -D arguments don’t
appear first
5) [PM-1051] – Error missing when nodes, cores, and ppn are all
specified
In the 4.6.0 release, there was a bug where the error message thrown
(when a user specified an invalid combination of task requirements)
was incorrect. This is fixed, and error messages have been improved
to also indicate a reason.
6) [PM-1053] – pegasus-cluster does not know about new Kickstart
arguments
7) [PM-1055] – Interleaved libinterpose records
8) [PM-1061] – pegasus-analyzer should detect and report on failed job
submissions
pegasus-monitord did not populate the stampede workflow database
with information about job submission failures. As a result,
pegasus-analyzer for the cases where a job failed because of job
submission errors did not report any helpful information as to why
the job failed. This is now fixed.
9) [PM-1062] – pegasus dashboard shows some workflows twice
In the case where HTCondor crashes on a submit node, DAGMan logs
may miss a workflow end event. When monitord detects consecutive
start events, it creates and inserts a workflow end event. The end
event had the same timestamp as the new start event, because of
which the underlying dashboard query retrieved multiple rows. This was
fixed by setting the timestamp for the artificial end event to be
one second less than that of the second start event.
10) [PM-1064] – pegasus-transfer prepends to PATH
pegasus-transfer used to prepend the system path with other
internally determined lookup directories based on environment
variables such as GLOBUS_LOCATION. As a result, in some cases,
a user's preferred copy of an executable was not picked up. This is
now fixed.
11) [PM-1066] – wget errors because of network issues
pegasus-transfer now sets the OSG_SQUID_LOCATION/http_proxy
setting only for the first wget attempt
12) [PM-1068] – monitord fails when trying to open a job error file in
a workflow with condor recovery
monitord now parses the job submit file whenever it notices a job
submission logged by DAGMan. This is done to avoid the case where,
because of HTCondor recovery, a job may not have a ULOG_ job
submission event, because of which the internal state of the job
may be uninitialized.
13) [PM-1069] – Dashboard invocation page gives an error if the task
has no invocation record
Dashboard did not display invocation records for Pegasus added
auxiliary jobs in the workflow. This was due to a bug in the query
that is now fixed.
14) [PM-1070] – monitord should handle case where jobs have missing
JOB_FAILURE/JOB_TERMINATED events
15) [PM-1072] – Worker package staging issues on OSX
16) [PM-1081] – pegasus-plan complains if output dir is set but the
site catalog entry for the local site has no storage directory specified
pegasus-plan complained if a storage directory was not specified
in the site catalog entry for site "local", even if a user
specified a --output-dir option. This is now fixed. The planner
will create a default file server based entry for this case.
17) [PM-1082] – transfer jobs don’t have symlink destination URL even
though symlink is enabled
In the case where there are multiple candidate replica locations
(some on the preferred site and some on other sites), the destination
URL for the transfer jobs did not have a symlink URL. As a result
the data was never symlinked, even though it was available locally
on the preferred site.
18) [PM-1083] – dashboard user home page requires a trailing /
To access a user home page on the dashboard, a trailing / needed to
be specified after the username in the URL. The dashboard was updated
to also handle URLs without the trailing slash.
19) [PM-1084] – credential handling for glite jobs
As part of credential handling, the environment variable for the
staged credential was set as the environment key instead of the
+remote_environment classad key. As a result, transfer jobs
running via Glite submission failed, as the appropriate environment
variable was not set. This is fixed now.
20) [PM-1085] – -p 0 options for condor_dagman sub dax jobs result in
dagman (8.2.8) dying
Pegasus was updated to generate the dagman submit files for sub
workflows to be compatible with the 8.5.x series. However, the new
arguments added broke workflows running with older
HTCondor versions. The offending argument is now set only if the
condor version is newer than 8.3.6.
21) [PM-1086] – Never symlink executables
Pegasus adds chmod jobs to explicitly set the x bit of the
executables staged. If the executable is a symlinked executable,
then chmod fails. Symlinking is never triggered for staged
executables now.
===============================
Release Notes for PEGASUS 4.6.0
===============================
We are happy to announce the release of Pegasus 4.6.0. Pegasus 4.6.0
is a major release of Pegasus and includes all the bug fixes and
improvements in the 4.5.4 release.
New features and improvements in 4.6.0 are:
- metadata support
- support for variable substitution
- constraints based cleanup algorithm
- common pegasus profiles to specify task requirements
- new command line clients: pegasus-init, to configure Pegasus, and
pegasus-metadata, to query the workflow database for metadata
- support for fallback PFNs
Migration guide available at
http://pegasus.isi.edu/wms/docs/4.6.0dev/useful_tips.php#migrating_from_leq45
Debian and Ubuntu users: Please note that the Apt repository GPG key
has changed. To continue to get automatic updates, please follow the
instructions on the download page on how to install the new key.
New Features
--------------
1) Metadata support in Pegasus
Pegasus allows users to associate metadata at
- Workflow Level in the DAX
- Task level in the DAX and the Transformation Catalog
- File level in the DAX and Replica Catalog
Metadata is specified as a key value tuple, where both key and
values are of type String.
All the metadata (user specified and auto-generated) gets
populated into the workflow database (usually in the workflow
submit directory) by pegasus-monitord. The metadata in this
database can be queried using the pegasus-metadata command
line tool, and is also shown in the Pegasus Dashboard.
Documentation: https://pegasus.isi.edu/wms/docs/4.6.0/metadata.php
Relevant JIRA items
[PM-917] - modify the workflow database to associate metadata with
workflow, job and files
[PM-918] - modify pegasus-monitord to populate metadata into
stampede database
[PM-919] - pegasus-metadata command line tool
[PM-916] - identify and generate the BP events for metadata
[PM-913] - kickstart support for stat command line options
[PM-1025] - Document the metadata capability for 4.6
[PM-992] - automatically capture file metadata from kickstart and record it
[PM-892] - Add metadata to DAX schema
[PM-893] - Add metadata to Python DAX API
[PM-894] - Add metadata to site catalog schema
[PM-895] - Add metadata to transformation catalog text format
[PM-902] - support for metadata to JAVA DAX API
[PM-903] - add metadata to perl dax api
[PM-904] - support for parsing DAX 3.6 documents
[PM-978] - Update JDBCRC with the new schema
[PM-925] - support for 4.1 new site catalog schema with metadata extensions
[PM-991] - pegasus dashboard to display metadata stored in workflow database
2) Support for Variable Substitution
The Pegasus planner supports variable expansion in the DAX
and the catalog files, along the same lines as bash variable
expansion. This is often useful when you want paths in your
catalogs, or profile values in the DAX, to be picked up from the
environment. An error is thrown if a variable cannot be expanded.
Variable substitution is supported in the DAX, the File based Replica
Catalog, the Transformation Catalog, and the Site Catalog.
Documentation: https://pegasus.isi.edu/wms/docs/4.6.0/variable_expansion.php
Relevant JIRA items
[PM-831] - Add better support for variables
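For illustration, a File based Replica Catalog entry could pick up a
path from the environment like this (the LFN, path, and site
attribute below are made up for the example):

```text
# File based Replica Catalog: ${HOME} is expanded by the planner,
# along the same lines as bash variable expansion
f.input  file://${HOME}/run01/f.input  site="local"
```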
3) Constraints based Cleanup Algorithm
The planner now has support for a new cleanup algorithm called
constraint. The algorithm adds cleanup nodes to constrain the
amount of storage space used by a workflow. The nodes remove files
no longer required during execution. The added cleanup nodes
guarantee limits on disk usage. The leaf cleanup nodes are also
added when this algorithm is selected.
[PM-850] - Integrate Sudarshan's cleanup algorithm
4) Common Pegasus Profiles to indicate Resource Requirements for jobs
Users can now specify Pegasus profiles to indicate resource
requirements for jobs. Pegasus will automatically translate these
to the appropriate condor, globus, or batch system keys based on how
the job is executed.
The task requirement profiles are documented in the configuration
chapter at
https://pegasus.isi.edu/wms/docs/4.6.0/profiles.php#pegasus_profiles
[PM-962] - common pegasus profiles to indicate resource requirements for job
5) New client pegasus-init
A new command line client called "pegasus-init" generates a
new workflow configuration by asking the user a series of
questions. Based on the responses to these questions,
*pegasus-init* generates a workflow configuration including a DAX
generator, site catalog, properties file, and other artifacts that
can be edited to meet the user's needs.
[PM-1019] - pegasus-init client to setup pegasus on a machine
6) Support for automatic failover to fallback file locations
During replica selection, Pegasus now orders all the candidate
replicas instead of selecting a single best replica. The replicas are
ordered based on the strategy selected, and the ordered list is
passed to the pegasus-transfer invocation. This allows users to
specify failover, or a preferred location, for discovering the input
files.
By default, the planner employs the following ordering of replicas:
- valid file URLs, i.e. URLs that have the site attribute
matching the site where the executable pegasus-transfer is
executed
- all URLs from the preferred site (usually the compute site)
- all other remotely accessible (non file) URLs
If a user wants to specify their own order of preference, they
should use the Regex Replica Selector and specify a ranked order
list of regular expressions in the properties.
Documentation:
https://pegasus.isi.edu/wms/docs/4.6.0/data_management.php#replica_selection
Relevant JIRA items:
[PM-1002] - Support symlinking against compute site datasets in
nonsharedfs mode with bypass of input file staging
[PM-1014] - Support for Fallback PFN while transferring raw input files
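As a sketch, a ranked preference with the Regex Replica Selector might
look like the following in the properties file; treat the exact
property keys and patterns below as illustrative, and check them
against the replica selection documentation linked above:

```properties
# Prefer file URLs, then GridFTP URLs; lower rank is matched first
pegasus.selector.replica = Regex
pegasus.selector.replica.regex.rank.1 = file://.*
pegasus.selector.replica.regex.rank.2 = gsiftp://.*
```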
7) Support SGE via the HTCondor Glite/Batch GAHP support
Pegasus now has support for submitting to a local SGE cluster via
the HTCondor Glite/Blahp interfaces. More details can be found in
the documentation at
https://pegasus.isi.edu/wms/docs/4.6.0/glite.php
[PM-955] - Support for direct submission through SGE using
Condor/Glite/Blahp layer
8) Glite Style improvements
Users don't need to set extra pegasus profiles to enable jobs to
run correctly on glite style sites. By default, condor quoting for
jobs on glite style sites is disabled. Also, the -w option to
kickstart is always used, as the batch gahp does not support
specifying a remote execution directory directly.
If the user knows that a compute site shares a file system with the
submit host, then they can get Pegasus to run the auxiliary jobs in
the local universe. This is especially helpful when submitting to
local campus clusters using Glite and users don't want the pegasus
auxiliary jobs to run through the cluster PBS|SGE queue.
Relevant JIRA items
[PM-934] - changed how environment is set for jobs submitted via
HTCondor Glite / Blahp layer
[PM-1024] - Use local universe for auxiliary jobs in glite/blahp mode
[PM-1037] - Disable Condor Quoting for jobs run on glite style
execution sites
[PM-960] - Set default working dir to scratch dir for glite style jobs
9) Support for PAPI CPU counters in kickstart
[PM-967] - Add support for PAPI CPU counters in Kickstart
10) Changes to worker package staging
By default, Pegasus now attempts to use the worker package from
the Pegasus submit host installation, unless a user has specified
finer grained attributes for the compute sites in the site catalog
or an entry is specified in the transformation catalog.
Relevant JIRA items
[PM-888] - Guess which worker package to use based on the submit host
11) [PM-953] - PMC now has the ability to set CPU affinity for multicore tasks.
12) [PM-954] - Add useful environment variables to PMC
13) [PM-985] - separate input and output replica catalogs
Users can optionally specify a different output replica catalog by
setting properties with the prefix pegasus.catalog.replica.output.
This is useful when users want to separate the replica catalog
that they use for discovery of input files from the catalog where
the generated output files are registered. For example, use a
directory backed replica catalog backend to discover file
locations, and a file based replica catalog to record the
locations of the output files.
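A minimal sketch of such a split, assuming the Directory and File
backends and hypothetical paths (the backend-specific property names
are assumptions to be checked against the catalog documentation):

```properties
# Discover input files by scanning a directory (hypothetical path)
pegasus.catalog.replica = Directory
pegasus.catalog.replica.directory = /data/inputs

# Register generated outputs into a separate file based catalog
pegasus.catalog.replica.output = File
pegasus.catalog.replica.output.file = /data/output.rc
```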
14) [PM-986] - input-dir option to pegasus-plan should be a comma
separated list
15) [PM-1031] - pegasus-db-admin should have an upgrade/downgrade
option to update all databases, from the dashboard database on, to
the current pegasus version
16) [PM-882] - Create prototype integration between Pegasus and Aspen
17) [PM-964] - Add tips on how to use CPU affinity on condor
Improvements
--------------
1) [PM-924] - Merge transfer/cleanup/create-dir into one client
2) [PM-610] - Batch scp transfers in pegasus-transfer
pegasus-transfer now batches 70 transfers in a single scp
invocation against the same host.
3) [PM-611] - Batch rm commands in scp cleanup implementation
scp rm commands are now batched together, 70 per group, to
keep the command lines short enough.
4) [PM-856] - pegasus-cleanup should use pegasus-s3's bulk delete
feature
s3 removes are now batched and passed in a temp file to pegasus-s3
5) [PM-890] - pegasus-version should include a Git hash
6) [PM-899] - Handling of database update versions from different branches
7) [PM-911] - Use ssh to call rm for sshftp URL cleanup
8) [PM-929] - Use make to build externals to make python development easier
9) [PM-937] - Discontinue support for Python 2.4 and 2.5
10) [PM-938] - Pegasus DAXParser always validates against latest supported DAX version
11) [PM-958] - Deprecate "gridstart" names in Kickstart
12) [PM-963] - Add support for wrappers in Kickstart
Kickstart supports an environment variable, KICKSTART_WRAPPER,
that contains a set of command-line arguments to insert between
Kickstart and the application.
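For instance, to interpose a wrapper between Kickstart and the
application (the wrapper command here, /usr/bin/time -v, is only an
illustration, not a Pegasus-mandated value):

```shell
# Arguments in KICKSTART_WRAPPER are inserted between Kickstart
# and the application it launches
export KICKSTART_WRAPPER="/usr/bin/time -v"
```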
13) [PM-965] - monitord amqp population
14) [PM-979] - Update documentation for new DB schema
15) [PM-984] - condor_rm on a pegasus-kickstart wrapped job does not
return stdout back
When a user condor_rm's their job, Condor sends the job a
SIGTERM. Previously this would cause Kickstart to die. Kickstart
was changed so that it catches the SIGTERM and
passes it on to the child instead. That way the child dies, but
not Kickstart, and Kickstart can report an invocation record
for the job to provide the user with useful debugging info. The
same logic is also applied to SIGINT and SIGQUIT.
16) [PM-1018] - defaults for pegasus-plan to pick up properties and
other catalogs
pegasus-plan now defaults the --conf option to
pegasus.properties in the current working directory.
In addition, the default locations for the various catalog files
now point to the current working directory (rc.txt, tc.txt,
sites.xml).
17) [PM-1038] - Update tutorial to reflect the defaults for Pegasus 4.6 release
Bugs Fixed
--------------
1) [PM-653] - pegasus.dagman.notify should be removed in favor of
Pegasus level notifications
2) [PM-897] - kickstart is reporting misleading permission error when
it is really a file not found
3) [PM-906] - Add Ubuntu apt repository
4) [PM-910] - Cleanup jobs should ignore "file not found" errors, but
not other errors
5) [PM-920] - Bamboo / title.xml problems
6) [PM-922] - Dashboard and monitoring interface contain Python that
is not valid for RHEL5
7) [PM-923] - Debian packages rebuild documentation
8) [PM-931] - For Subworkflows Monitord populates host.wf_id to be
wf_id of root_wf and not wf_id of sub workflow
9) [PM-944] - Make it possible to build Pegasus on SuSE (openSUSE and SLES)
10) [PM-1029] - Planner should ensure that local aux jobs run with the same Pegasus install as the planner
11) [PM-1035] - pegasus-analyzer fails when workflow db has no
entries
===============================
Release Notes for PEGASUS 4.5.4
===============================
We are happy to announce the release of Pegasus 4.5.4. Pegasus 4.5.4
is a minor release, which contains minor enhancements and bug
fixes. This will most likely be the last release in the 4.5 series;
unless you have specific reasons to stay with the 4.5.x series, we
recommend upgrading to 4.6.0.
New Features
--------------
1) [PM-1003] - planner should report information about what options were
used in the planner
Planner now reports additional metrics such as command line
options, whether PMC was used and number of deleted tasks to the
metrics server.
2) [PM-1007] - "undelete" or attach/detach for pegasus-submitdir
pegasus-submitdir has two new commands: attach, which adds the
workflow to the dashboard (or corrects the path), and detach, which
removes the workflow from the dashboard.
3) [PM-1030] - pegasus-monitord should parse the new dagman output
that reports timestamps from condor user log
Starting with 8.5.2, HTCondor DAGMan records the condor job log
timestamps at the end of the ULOG event log
messages. monitord was updated to prefer these timestamps for the
job events if present in the DAGMan logs.
Improvements
--------------
1) [PM-896] - Document events that monitord publishes
The netlogger messages generated by monitord, which are used for
populating the workflow database and the master database, are now
documented at
https://pegasus.isi.edu/wms/docs/4.5.4cvs/stampede_wf_events.php
2) [PM-995] - changes to Pegasus tutorial
Pegasus tutorial was reorganized and simplified to focus more on
the pegasus-dashboard, and debugging exercises
3) [PM-1033] - update monitord to handle updated log messages in dagman.out file
Starting with the 8.5.x series, some of the dagman log messages in
the dagman.out file were updated to say HTCondor instead of
Condor. This broke the monitord parsing regexes, and hence it was
not able to parse information from the dagman.out file. This is now
fixed.
4) [PM-1034] - Make it more difficult for users to break pegasus-submitdir archive
A locking mechanism was added internally to make pegasus-submitdir
more robust when a user accidentally kills an archive operation.
5) [PM-1040] - pegasus-analyzer should be able to handle cases where the workflow failed to start
pegasus-analyzer now detects if a workflow failed to start because
of DAGMan fail on NFS error setting, and also displays any errors
in *.dag.lib.err files.
Bugs Fixed
--------------
1) [PM-921] - Specified env is not provided to monitord
The environment for pegasus-monitord is now set in the dagman.sub
file. The following order is used: pick the system environment,
override it with env profiles in the properties, and then with those
from the local site entry in the site catalog.
2) [PM-999] - pegasus-transfer taking too long to finish in case of retries
pegasus-transfer has moved to an exponential back-off: min(5 **
(attempt_current + 1) + random.randint(1, 20), 300).
That means that failures for short running transfers will still
take time, but this is necessary to ensure the scalability of real
world workflows.
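The quoted back-off formula can be sketched in Python as follows (the
function name is ours for illustration, not pegasus-transfer's API):

```python
import random

def backoff_seconds(attempt: int) -> int:
    """Jittered exponential back-off as quoted above:
    min(5 ** (attempt + 1) + random.randint(1, 20), 300)."""
    return min(5 ** (attempt + 1) + random.randint(1, 20), 300)

# Delays grow exponentially per attempt, capped at 300 seconds (5 minutes)
for attempt in range(4):
    print("attempt", attempt, "->", backoff_seconds(attempt), "seconds")
```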
3) [PM-1008] - Dashboard file browser file list breaks with sub-directories
The dashboard file browser broke when there were sub directories in
the submit directory. This is now fixed.
4) [PM-1009] - File browser just says "Error" if submit_dir in workflow db is incorrect
File browser gives a more informative message when submit directory
recorded in the database does not actually exist.
5) [PM-1011] - OSX installer no longer works on El Capitan
El Capitan has a new "feature" that disables root from modifying
files in /usr with some exceptions (e.g. /usr/local). Since the
earlier installer installed Pegasus in /usr, it no longer
worked. The installer was updated to install Pegasus in /usr/local
instead.
6) [PM-1012] - pegasus-gridftp fails with "no key" error
The SSL proxies jar was updated. The error was triggered because
of the following JGlobus issue:
https://github.com/jglobus/JGlobus/issues/146
7) [PM-1017] - pegasus-s3 fails with [SSL: CERTIFICATE_VERIFY_FAILED]
s3.amazonaws.com has a cert that was issued by a CA that is not in
the cacerts.txt file bundled with boto 2.5.2. The boto bundled with
Pegasus was updated to 2.38.0.
8) [PM-1021] - kickstart stat for jobs in the workflow does not work for clustered jobs
kickstart stat did not work for clustered jobs. This is now fixed.
9) [PM-1022] - dynamic hierarchy tests failed randomly
The DAX jobs were not considered for cleanup. Because of this, if
there was a compute job that generated the DAX the subdax job
required, sometimes the cleanup of the dax file happened before the
subdax job finished. This is now fixed.
10) [PM-1039] - pegasus-analyzer fails with: TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
pegasus-analyzer threw a stacktrace when a workflow did not start
because of DAGMan NFS settings. This is now fixed.
11) [PM-1041] - pegasus-db-admin 4.5.4 gives a stack trace when run on pegasus 4.6 workflow submit dir
A clean error is displayed, if pegasus-db-admin from 4.5.4 is run
against a workflow submit directory from a higher Pegasus
version.
===============================
Release Notes for PEGASUS 4.5.3
===============================
We are happy to announce the release of Pegasus 4.5.3. Pegasus 4.5.3 is
a minor release, which contains minor enhancements and fixes bugs in
the Pegasus 4.5.2 release.
The following issues were addressed and more information can be found
in the Pegasus Jira (https://jira.isi.edu/)
Bug Fixes:
[PM-980] - pegasus-plots fails with "-p all"
[PM-982] - MRC replica catalog backend does not work
[PM-987] - noop jobs created by Pegasus don't use DAGMan NOOP keyword
[PM-996] - Pegasus Statistics transformation stats columns getting
larger and larger with more sub workflows
[PM-997] - pyOpenSSL v0.13 does not work with new versions of openssl
(1.0.2d) and El Capitan
Improvements:
[PM-976] - ignore register and transfer flags for input files
[PM-981] - register only based names for output files with deep LFN's
[PM-983] - data reuse algorithm should consider file locations while
cascading deletion upwards
[PM-984] - condor_rm on a pegasus-kickstart wrapped job does not
return stdout back
[PM-988] - pegasus-transfer should handle file://localhost/ URL's
[PM-989] - pegasus-analyzer debug job option should have a hard check
for output files
[PM-993] - Show dax/dag planning jobs in
failed/successful/running/failing tabs in dashboard
[PM-1000] - turn off concurrency limits by default
New Features:
[PM-985] - separate input and output replica catalog
[PM-986] - input-dir option to pegasus-plan should be a comma
separated list
===============================
Release Notes for PEGASUS 4.5.2
===============================
We are happy to announce the release of Pegasus 4.5.2. Pegasus 4.5.2 is
a minor release, which contains minor enhancements and fixes bugs in
the Pegasus 4.5.1 release. The release addresses a critical fix for
systems running HTCondor 8.2.9 , whereby all dagman jobs for Pegasus
workflows fail on startup.
Enhancements
--------------
1) File locations in the DAX treated as a Replica Catalog
By default, file locations listed in the DAX override entries
listed in the Replica Catalog. Users can now set the boolean
property pegasus.catalog.replica.dax.asrc to treat the DAX
locations as one more source of entries, considered alongside the
Replica Catalog entries during Replica Selection.
Associated JIRA item https://jira.isi.edu/browse/PM-973
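As a sketch, enabling this in the workflow properties file could look
like the following (the value shown is illustrative):

```
# treat file locations in the DAX as one more Replica Catalog
# source instead of having them override Replica Catalog entries
pegasus.catalog.replica.dax.asrc = true
```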
2) Pegasus auxiliary tools now have support for iRODS 4.x
Bugs Fixed
--------------
1) pegasus-dagman setpgid fails under HTCondor 8.2.9
Starting with version 8.2.9, HTCondor already sets up the process
group to match the pid, and hence the setpgid call fails in the
pegasus-dagman wrapper around condor_dagman. Because of this, all
Pegasus workflows fail to start on submit nodes running HTCondor
8.2.9.
If you cannot upgrade to Pegasus version 4.5.2 and are running
HTCondor 8.2.9, you can turn off HTCondor's process group setup
by setting the following in your condor configuration
USE_PROCESS_GROUPS = false
The pegasus-dagman wrapper now no longer fails fatally if setpgid
fails. More details at
https://jira.isi.edu/browse/PM-972
2) nonsharedfs execution does not work for GLite if auxiliary jobs
are planned to run remotely
For nonsharedfs execution on a local PBS|SGE cluster using the
GLite interface, Pegasus generated auxiliary jobs with incorrect
paths to pegasus-kickstart in the submit files, if a job was
mapped to run on the remote (non local) site.
This is now fixed.
https://jira.isi.edu/browse/PM-971
===============================
Release Notes for PEGASUS 4.5.1
===============================
We are happy to announce the release of Pegasus 4.5.1. Pegasus 4.5.1 is
a minor release, which contains minor enhancements and fixes bugs in
the Pegasus 4.5.0 release.
Enhancements
--------------
1) pegasus-statistics reports workflow badput
pegasus-statistics now reports the workflow badput time, which is
the sum of the runtimes of all failed kickstart jobs. More details
at
https://pegasus.isi.edu/wms/docs/4.5.1/plotting_statistics.php
Associated JIRA item https://jira.isi.edu/browse/PM-941
2) fast start option for pegasus-monitord
By default, when monitord starts tracking a live dagman.out file,
it sleeps intermittently, waiting for new lines to be logged in the
dagman.out file.
This behavior, however, causes monitord to lag considerably
- when starting for large workflows
- when monitord gets restarted by pegasus-dagman after a failure,
or when a rescue dag is submitted.
Users can now set the pegasus.monitord.fast_start property to
enable faster startup. In a future release, this will be the
default behavior.
Associated JIRA item https://jira.isi.edu/browse/PM-947
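For example, the property can be enabled in the workflow properties
file as follows:

```
# enable monitord fast start, avoiding startup lag on large
# workflows and on monitord restarts / rescue dag submissions
pegasus.monitord.fast_start = true
```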
3) Support for throttling jobs across workflows using HTCondor
concurrency limits
Users can now throttle jobs across workflows using HTCondor
concurrency limits. However, this only applies to vanilla universe
jobs.
Documentation at
https://pegasus.isi.edu/wms/docs/4.5.1/job_throttling.php#job_throttling_across_workflows
Associated JIRA item https://jira.isi.edu/browse/PM-933
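A rough sketch of how the two pieces fit together; the limit name
"largemem" and the value 20 are illustrative, and the profile key is
assumed to mirror the HTCondor submit file keyword (see the
documentation link above for the exact usage):

```
# in the HTCondor configuration on the central manager, define
# the limit (illustrative name and value)
LARGEMEM_LIMIT = 20

# associate the limit with jobs via a condor namespace profile,
# e.g. in the site catalog or DAX
<profile namespace="condor" key="concurrency_limits">largemem</profile>
```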
4) Support for submissions to local SGE cluster using the GLite interfaces
Preliminary support for SGE clusters has been added to Pegasus. To
use this you need to copy sge_local_submit_attributes.sh from
the Pegasus share directory and place it in your condor
installation.
The list of supported keys can be found here
https://pegasus.isi.edu/wms/docs/4.5.1/glite.php
Associated JIRA item https://jira.isi.edu/browse/PM-955
5) PEGASUS_SCRATCH_DIR set in the job environment for sharedfs deployment
Pegasus now sets an environment variable for the job that indicates
the Pegasus scratch directory the job executes in, in the case
of sharedfs deployments. This is the directory that is created by
the create dir job on the execution site for the workflow.
Associated JIRA item https://jira.isi.edu/browse/PM-961
6) New properties to control read timeout while setting up connections
to the database
Users can now set pegasus.catalog.*.timeout to set the timeout
value in seconds. This should be set only if you encounter database
locked errors for your installation.
Associated JIRA item https://jira.isi.edu/browse/PM-943
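For illustration, a 30 second timeout (an arbitrary example value)
could be set in the properties file as:

```
# wait up to 30 seconds for the database before giving up;
# set this only if you see "database is locked" errors
pegasus.catalog.*.timeout = 30
```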
7) Ability to prepend to the system PATH before launching an
application executable
Users can now associate an env profile named KICKSTART_PREPEND_PATH
with their jobs to specify the PATH where application specific
modules are installed. kickstart takes this value and prepends
it to the system PATH before launching the executable.
Associated JIRA item https://jira.isi.edu/browse/PM-957
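As a sketch, the profile can be attached to a job in the DAX; the job
name, id, and module path below are illustrative:

```
<!-- prepend the application module directory to the system PATH
     before kickstart launches the executable -->
<job id="ID000001" name="myapp">
  <profile namespace="env" key="KICKSTART_PREPEND_PATH">/opt/modules/myapp/bin</profile>
</job>
```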
8) environment variables in condor submit files are specified using the newer condor syntax
For GLite jobs, the environment is specified using the key
+remote_environment. For all other jobs, the environment is
specified using the environment key, but the value is in the newer
format (i.e. key=value pairs separated by whitespace).
Associated JIRA item https://jira.isi.edu/browse/PM-934
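For illustration, the generated submit file entries look roughly like
the following; the variable names and values are made up:

```
# newer HTCondor syntax: one quoted string of whitespace
# separated key=value pairs
environment = "PEGASUS_SITE=local APP_HOME=/opt/myapp"

# GLite style jobs carry the same information via
+remote_environment = "PEGASUS_SITE=local APP_HOME=/opt/myapp"
```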
9) pass options to pegasus-monitord via properties
Users can now specify pegasus.monitord.arguments to pass extra
options with which pegasus-monitord is launched for the workflow at
runtime.
Associated JIRA item https://jira.isi.edu/browse/PM-948
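For example, extra options (the -v verbosity flag here is
illustrative) can be passed via the properties file:

```
# extra command line options passed to pegasus-monitord
# when it is launched for the workflow at runtime
pegasus.monitord.arguments = -v
```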
10) pegasus-transfer supports OSG stashcp
pegasus-transfer has support for the latest version of stashcp
Associated JIRA item https://jira.isi.edu/browse/PM-948
11) pegasus-dashboard improvements
pegasus-dashboard now loads the directory listing via AJAX
calls, which makes the loading of the workflow details page much
faster for large workflows.
Shows the working directory for a job_instance and invocation in
the job details and invocation details pages.
Displays an appropriate error message if a pegasus-db-admin update
of a database fails.
Added an HTML error page for DB migration errors.
Configured logging so Flask log messages show up in Apache logs.
Associated JIRA item https://jira.isi.edu/browse/PM-940
12) PEGASUS_SITE environment variable is set in job's environment
https://jira.isi.edu/browse/PM-907
Bugs Fixed
--------------
1) InPlace cleanup failed if an intermediate file used as input had
its transfer flag set to false
If an intermediate file (an output file generated by a parent job)
was used as an input file to a child job with the transfer flag set
to false, then the associated cleanup job did not have a dependency
on the child job. As a result, the cleanup job could run before
the child job that required the file as input.
This is now fixed.
https://jira.isi.edu/browse/PM-969
2) Incorrect (malformed) rescue dag submitted in case the planner
dies because of memory related issues
For hierarchical workflows, if a sub workflow fails then a rescue
dag for the sub workflow gets submitted on the job retry. The .dag
file for the sub workflow is generated by the planner. If the
planner fails during code generation, an incomplete .dag file can
be submitted.
This is now fixed. The planner now writes the dag to a tmp file
and renames it to the .dag extension only when code generation is
complete.
https://jira.isi.edu/browse/PM-966
3) Mismatched memory units in kickstart records
kickstart now reports all memory values in KB. Earlier, the procs
element in the machine entry reported the value in bytes,
while the maxrss etc. values in the usage elements were in KB.
This is now fixed.
https://jira.isi.edu/browse/PM-959
4) pegasus-analyzer did not work for sub workflows
There was a bug in the 4.5.0 release where pegasus-analyzer did not
pick up the stampede database for the sub workflows correctly. This
is now fixed.
https://jira.isi.edu/browse/PM-956
5) Rescue DAGS not submitted correctly for dag jobs
There was a bug in the 4.5.0 release in how the .dag.condor.sub
file was generated. As a result, the force option was propagated
for the dag jobs in the DAX (dag jobs are sub workflows that are
not planned by Pegasus).
https://jira.isi.edu/browse/PM-949
6) nonsharedfs configuration did not work with Glite style submissions
In the case of nonsharedfs, transfer_executable is set to true to
transfer the PegasusLite script. However, in the GLite case, that
was explicitly disabled, which prevented the workflows from
running successfully.
https://jira.isi.edu/browse/PM-950
7) pegasus-analyzer catches error for wrong directory instead of
listing the traceback
https://jira.isi.edu/browse/PM-946
8) pegasus-gridftp fails with: Invalid keyword "POSTALCODE"
pegasus-gridftp was failing against the XSEDE site stampede
because of a change in certificates at TACC. This was fixed by
updating to the latest jglobus jars.