Package: matrixStats
====================
Version: 0.54.0-9000 [2019-03-28]
* ...
DEPRECATED AND DEFUNCT:
* Calling indexByRow(x) where 'x' is a matrix is now defunct. Use
indexByRow(dim(x)) instead.
Version: 0.54.0 [2018-07-23]
PERFORMANCE:
* SPEEDUP: No longer using stopifnot() for internal validation, because
it comes with considerable overhead. This was only used in weightedMad(),
col- and rowWeightedMads(), as well as col- and rowAvgsPerColSet().
BUG FIXES:
* Despite being an unlikely use case, colLogSumExps(lx) / rowLogSumExps(lx)
now also accepts integer 'lx' values.
* The error produced when using indexByRow(dim) with prod(dim) >= 2^31 would
report garbage dimensions instead of 'dim'.
DEPRECATED AND DEFUNCT:
* Calling indexByRow(x) where 'x' is a matrix is deprecated. Use
indexByRow(dim(x)) instead.
Version: 0.53.1 [2018-02-10]
CODE REFACTORING:
* Now col-/rowSds() explicitly replicate all arguments that are passed to
col-/rowVars().
DOCUMENTATION:
* Added details on how weightedMedian(x, interpolate = TRUE) works.
BUG FIXES:
* colLogSumExps(lx, cols) / rowLogSumExps(lx, rows) gave an error if 'lx'
has rownames / colnames.
* col-/rowQuantiles() would lose rownames of output in certain cases.
Version: 0.53.0 [2018-01-23]
NEW FEATURES:
* Functions sum2(x) and mean2(x) now also accept logical input 'x', which
corresponds to using as.integer(x) but without the need for either coercion
or internal extra copies. With sum2(x, mode = "double") it is possible to
count the number of TRUE elements beyond 2^31-1, which base::sum() does not
support.
* Functions col-/rowSums2() and col-/rowMeans2() now accept also logical
input 'x'.
* Function binMeans(y, x, bx) now accepts logical 'y', which corresponds
to using as.integer(y) but without the need for coercion to integer.
* Functions col-/rowTabulates(x) now support logical input 'x'.
* Now count() can count beyond 2^31-1.
* allocVector() can now allocate long vectors (longer than 2^31-1).
* Now sum2(x, mode = "integer") generates a warning if typeof(x) == "double"
asking if as.integer(sum2(x)) was intended.
* Inspired by Hmisc::wtd.var(), when sum(w) <= 1, now weightedVar(x, w)
produces an informative warning that the estimate is invalid.
CODE REFACTORING:
* Harmonized the ordering of the arguments of colAvgsPerColSet() with that
of rowAvgsPerColSet().
BUG FIXES:
* col-/rowLogSumExps() could core dump R for a "large" number of columns/rows.
Thanks Brandon Stewart at Princeton University for reporting on this.
* count() beyond 2^31-1 would return invalid results.
* Functions col-/rowTabulates(x) did not count missing values.
* indexByRow(dim, idxs) would give nonsense results if 'idxs' had indices
greater than prod(dim) or non-positive indices; now it gives an error.
* indexByRow(dim) would give nonsense results when prod(dim) >= 2^31; now
it gives an informative error.
* col-/rowAvgsPerColSet() would return vector rather than matrix if
nrow(X) <= 1. Thanks to Peter Hickey (Johns Hopkins University) for
troubleshooting and providing a fix.
DEPRECATED AND DEFUNCT:
* Previously deprecated meanOver() and sumOver() are defunct. Use mean2()
and sum2() instead.
* Previously deprecated weightedVar(x, w, method = "0.14.2") is defunct.
* Dropped previously defunct weightedMedian(..., ties = "both").
* Dropped previously defunct argument 'centers' for col-/rowMads(). Use
'center' instead.
* Dropped previously defunct argument 'flavor' of colRanks() and rowRanks().
Version: 0.52.2 [2017-04-13]
BUG FIXES:
* Several of the row- and column-based functions would core dump R if the
matrix was of a data type other than logical, integer, or numeric, e.g.
character or complex. This is now detected and an informative error is
produced instead. Similarly, some vector-based functions could potentially
core dump R or silently return a nonsense result. Thank you Hervé Pagès,
Bioconductor Core, for the report.
DEPRECATED AND DEFUNCT:
* rowVars(..., method = "0.14.2") that was added for very unlikely needs of
backward compatibility of an invalid degree-of-freedom term is deprecated.
Version: 0.52.1 [2017-04-04]
BUG FIXES:
* The package test on matrixStats:::benchmark() tried to run even if
not all suggested packages were available.
Version: 0.52.0 [2017-04-03]
SIGNIFICANT CHANGES:
* Since anyNA() has been a built-in function since R (>= 3.1.0), please use
that instead of anyMissing(), which is part of this package. The latter will
eventually be deprecated. For consistency with the anyNA() name, colAnyNAs()
and rowAnyNAs() are now also available, replacing the identical
colAnyMissings() and rowAnyMissings() functions, which will also be
deprecated in a future release.
* meanOver() was renamed to mean2() and sumOver() was renamed to sum2().
NEW FEATURES:
* Added colSums2() and rowSums2() which work like colSums() and rowSums()
of the base package but also support efficient subsetting via optional
arguments 'rows' and 'cols'.
* Added colMeans2() and rowMeans2() which work like colMeans() and rowMeans()
of the base package but also support efficient subsetting via optional
arguments 'rows' and 'cols'.
* Functions colDiffs() and rowDiffs() gained argument 'dim.'.
* Functions colWeightedMads() and rowWeightedMads() gained arguments
'constant' and 'center'. The current implementation only supports scalars
for these arguments, which means that the same values are applied to all
columns and rows, respectively. In previous versions a hard-to-understand
error would be produced if 'center' was of length greater than one; now
a more informative error message is given.
* Package is now silent when loaded; it no longer displays a startup message.
SOFTWARE QUALITY:
* Continuous-integration testing is now also done on macOS, in addition to
Linux and Windows.
* ROBUSTNESS: Package now registers the native API using also
R_useDynamicSymbols().
CODE REFACTORING:
* Cleaned up native low-level API and renamed native source code files
to make it easier to navigate the native API.
* Now using roxygen for help and NAMESPACE (was R.oo::Rdoc).
BUG FIXES:
* rowAnys(x) on numeric matrices 'x' would return rowAnys(x == 1) and
not rowAnys(x != 0). Same for colAnys(), rowAlls(), and colAlls().
Thanks Richard Cotton for reporting on this.
* sumOver(x) and meanOver(x) would incorrectly return -Inf or +Inf if the
intermediate sum would have that value, even if one of the following
elements would turn the intermediate sum into NaN or NA, e.g. with 'x'
as c(-Inf, NaN), c(-Inf, +Inf), or c(+Inf, NA).
* WORKAROUND: Benchmark reports generated by matrixStats:::benchmark() would
use any custom R prompt that is currently set in the R session, which may
not render very well. Now it forces the prompt to be the built-in "> " one.
DEPRECATED AND DEFUNCT:
* The package API is only intended for matrices and vectors of type
numeric, integer and logical. However, a few functions would still
return if called with a data.frame. This was never intended to work
and is now an error. Specifically, functions colAlls(), colAnys(),
colProds(), colQuantiles(), colIQRs(), colWeightedMeans(),
colWeightedMedians(), and colCollapse() now produce warnings if called
with a data.frame. Same for the corresponding row- functions.
The use of a data.frame will produce an error in future releases.
* meanOver() and sumOver() are deprecated because they were renamed to
mean2() and sum2(), respectively.
* Previously deprecated (and ignored) argument 'flavor' of colRanks() and
rowRanks() is now defunct.
* Previously deprecated support for passing non-vector, non-matrix objects
to rowAlls(), rowAnys(), rowCollapse(), and the corresponding column-based
versions is now defunct. Likewise, rowProds(), rowQuantiles(),
rowWeightedMeans(), rowWeightedMedians(), and the corresponding column-based
versions are also defunct. The rationale for this is to tighten up the
identity of the matrixStats package and what types of input it accepts.
This will also help optimize the code further.
Version: 0.51.0 [2016-10-08]
PERFORMANCE AND MEMORY:
* SPEEDUP / CLEANUP: rowMedians() and colMedians() are now plain functions.
They were previously S4 methods (due to a Bioconductor legacy). The
package no longer imports the methods package.
* SPEEDUP: Now native API is formally registered allowing for faster lookup
of routines from R.
Version: 0.50.2 [2016-04-24]
BUG FIXES:
* Package now installs on R (>= 2.12.0) as claimed. Thanks to Mikko Korpela
at Aalto University School of Science, Finland, for troubleshooting and
providing a fix.
* logSumExp(c(-Inf, -Inf, ...)) would return NaN rather than -Inf. Thanks to
Jason Xu (University of Washington) for reporting and Brennan Vincent for
troubleshooting and contributing a fix.
Version: 0.50.1 [2015-12-14]
BUG FIXES:
* The Undefined Behavior Sanitizer (UBsan) reported on a memcall(src, dest, 0)
call when dest == null. Thanks to Brian Ripley and the CRAN check tools for
catching this. We could reproduce this with gcc 5.1.1 but not with gcc 4.9.2.
Version: 0.50.0 [2015-12-13]
NEW FEATURES:
* MAJOR FEATURE UPDATE: Subsetting arguments 'idxs', 'rows' and 'cols' were
added to all functions such that the calculations are performed on the
requested subset while avoiding creating a subsetted copy, i.e.
rowVars(x, cols = 4:6) is much faster and more memory efficient
than rowVars(x[, 4:6]), and even more efficient than
apply(x, MARGIN = 1L, FUN = var). These features were added by Dongcan Jiang,
Peking University, with support from the Google Summer of Code program.
A great thank you to Dongcan and to Google for making this possible.
Version: 0.15.0 [2015-10-26]
NEW FEATURES:
* CONSISTENCY: Now all weight arguments ('w' and 'W') default to NULL, which
corresponds to uniform weights.
CODE REFACTORING:
* ROBUSTNESS: Importing 'stats' functions in namespace.
BUG FIXES:
* weightedVar(x, w) used the wrong bias correction factor resulting in an
estimate that was tau too large, where
tau = ((sum(w) - 1) / sum(w)) / ((length(w) - 1) / length(w)).
Thanks to Wolfgang Abele for reporting and troubleshooting on this.
* weightedVar(x) with length(x) = 1 returned 0, not NA. Same for weightedSd().
* weightedMedian(x, w = NA_real_) returned 'x' rather than NA_real_. This
only happened for length(w) = 1.
* allocArray(dim) failed for prod(dim) >= .Machine$integer.max.
DEPRECATED AND DEFUNCT:
* CLEANUP: Defunct argument 'centers' for col-/rowMads(); use 'center'.
* weightedVar(x, w, method = "0.14.2") is deprecated.
Version: 0.14.2 [2015-06-23]
BUG FIXES:
* x_OP_y() and t_tx_OP_y() would return garbage on Solaris SPARC (and possibly
other architectures as well) when input was integer and had missing values.
Version: 0.14.1 [2015-06-17]
BUG FIXES:
* product(x, na.rm = FALSE) for integer 'x' with both zeros and NAs returned
zero rather than NA.
* weightedMean(x, w, na.rm = TRUE) did not handle missing values in 'x'
properly, if it was an integer. It would also return NaN if there were
weights 'w' with missing values, whereas stats::weighted.mean() would skip
such data points. Now weightedMean() does the same.
* (col|row)WeightedMedians() did not handle infinite weights as
weightedMedian() does.
* x_OP_y(x, y, OP, na.rm = FALSE) returned garbage iff 'x' or 'y' had
missing values of type integer.
* rowQuantiles() and rowIQRs() did not work for single-row matrices.
Analogously for the corresponding column functions.
* rowCumsums(), rowCumprods(), rowCummins(), and rowCummaxs() accessed
out-of-bound elements for Nx0 matrices where N > 0. The corresponding
column methods had similar memory errors for 0xK matrices where K > 0.
* anyMissing(list(NULL)) returned NULL; now FALSE.
* rowCounts() resulted in garbage if a previous column had NAs (because it
forgot to update index kk in such cases).
* rowCumprods(x) handled missing values and zeros incorrectly for integer
'x' (not double); a zero would trump an existing missing value causing the
following cumulative products to become zero. It was only a zero that
trumped NAs; any other integer would work as expected. Note, this bug
was not in colCumprods().
* rowAnys(x, value, na.rm = FALSE) did not handle missing values in a numeric
'x' properly. Similarly, for non-numeric and non-logical 'x', row- and
colAnys(), row- and colAlls(), anyValue() and allValue() did not handle
when 'value' was a missing value.
* All of the above bugs were identified and fixed by Dongcan Jiang (Peking
University, China), who also added corresponding unit tests.
Version: 0.14.0 [2015-02-13]
SIGNIFICANT CHANGES:
* CLEANUP: anyMissing() is no longer an S4 generic. This was done as part of
the migration of making all functions of matrixStats plain R functions,
which minimizes calling overhead and it will also allow us to drop 'methods'
from the package dependencies. I've scanned all CRAN and Bioconductor
packages depending on matrixStats and none of them relied on anyMissing()
dispatching on class, so hopefully this move has little impact. The only
remaining S4 methods are now colMedians() and rowMedians().
NEW FEATURES:
* CONSISTENCY: Renamed argument 'centers' of col-/rowMads() to 'center'.
This is consistent with col-/rowVars().
* CONSISTENCY: col-/rowVars() now use na.rm = FALSE as the default
(na.rm = TRUE was mistakenly introduced as the default in v0.9.7).
PERFORMANCE AND MEMORY:
* SPEEDUP: The check for user interrupts at the C level is now done less
frequently. It is done every k:th iteration, where
k = 2^20, which is tested for using (iter % k == 0). It turns out, at
least with the default compiler optimization settings that I use, that
this test is 3 times faster if k = 2^n where n is an integer. The
following functions check for user interrupts: logSumExp(),
(col|row)LogSumExps(), (col|row)Medians(), (col|row)Mads(),
(col|row)Vars(), and (col|row)Cum(Min|Max|prod|sum)s().
* SPEEDUP: logSumExp(x) is now faster if 'x' does not contain any missing
values. It is also faster if all values are missing or the maximum value
is +Inf - in both cases it can skip the actual summation step.
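The interrupt-check pattern described in the first item of this list can be sketched in C as follows. This is an illustration only: the callback type check_fn, the helper count_check() and sum_with_checks() are hypothetical stand-ins, and in the package the callback would correspond to an R-level interrupt check such as R_CheckUserInterrupt().

```c
#include <assert.h>
#include <stddef.h>

#define CHECK_EVERY ((size_t)1 << 20)   /* k = 2^20 iterations */

/* Hypothetical callback type standing in for an interrupt check. */
typedef void (*check_fn)(void *state);

/* Example callback for this sketch: counts how often it was invoked. */
void count_check(void *state) {
    (*(size_t *)state)++;
}

/* Sums x[0..n-1], invoking `check` once every CHECK_EVERY iterations. */
double sum_with_checks(const double *x, size_t n, check_fn check, void *state) {
    assert(check != NULL);
    double sum = 0.0;
    for (size_t iter = 0; iter < n; iter++) {
        /* Because CHECK_EVERY is a power of two, a compiler can typically
         * reduce this modulo to a bit mask: (iter & (CHECK_EVERY - 1)) == 0. */
        if (iter % CHECK_EVERY == 0) check(state);
        sum += x[iter];
    }
    return sum;
}
```

With n = 2^21 + 1 elements, the callback fires at iterations 0, 2^20 and 2^21, so the per-iteration cost is a single cheap comparison.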
SOFTWARE QUALITY:
* ROBUSTNESS/TESTS: Package tests cover 96% of the code (was 91%).
CODE REFACTORING:
* CLEANUP: Package no longer depends on R.methodsS3.
BUG FIXES:
* all() and any() flavored methods on non-numeric and non-logical (e.g.
character) vectors and matrices with na.rm = FALSE did not give results
consistent with all() and any() if there were missing values. For
example, with x <- c("a", NA, "b") we have all(x == "a") == FALSE and
any(x == "a") == TRUE whereas our corresponding methods would return NA in
those cases. The methods fixed are allValue(), anyValue(), col-/rowAlls(),
and col-/rowAnys(). Added more package tests to cover these cases.
* logSumExp(x, na.rm = TRUE) would return NA if all values were NA and
length(x) > 1. Now it returns -Inf for all length(x):s.
Version: 0.13.1 [2015-01-21]
BUG FIXES:
* diff2() with differences >= 3 would *read* spurious values beyond the
allocated memory. This error, introduced in 0.13.0, was harmless in the
sense that the returned value was unaffected and still correct. Thanks
to Brian Ripley and the CRAN check tools for catching this. I could
reproduce it locally with 'valgrind'.
Version: 0.13.0 [2015-01-20]
SIGNIFICANT CHANGES:
* SPEEDUP/CLEANUP: Turned several S3 and S4 methods into plain R functions,
which decreases the overhead of calling the functions. After this there
are no longer any S3 methods. Remaining S4 methods are anyMissing() and
rowMedians().
NEW FEATURES:
* Added weightedMean(), which is ~10 times faster than stats::weighted.mean().
* Added count(x, value) which is notably faster than sum(x == value). This
can also be used to count missing values etc.
* Added allValue() and anyValue() for all(x == value) and any(x == value).
* Added diff2(), which is notably faster than base::diff() for vectors, which
it is designed for.
* Added iqrDiff() and (col|row)IqrDiffs().
* CONSISTENCY: Now rowQuantiles(x, na.rm = TRUE) returns all NAs for rows
with missing values. Analogously for colQuantiles(), colIQRs(), rowIQRs()
and iqr(). Previously, all these functions gave an error saying missing
values are not allowed.
* COMPLETENESS: Added corresponding "missing" vector functions for already
existing column and row functions. Similarly, added "missing" column and
row functions for already existing vector functions, e.g. added iqr() and
count() to complement already existing (col|row)IQRs() and (col|row)Counts()
functions.
* ROBUSTNESS: Now column and row methods give slightly more informative error
messages if a data.frame is passed instead of a matrix.
DOCUMENTATION:
* Added vignette summarizing available functions.
PERFORMANCE AND MEMORY:
* SPEEDUP: (col|row)Diffs() are now implemented in native code and notably
faster than diff() for matrices.
* SPEEDUP: Made binCounts() and binMeans() a bit faster.
* SPEEDUP: Implemented weightedMedian() in native code, which made it ~3-10
times faster. Dropped support for ties = "both", because it would have to
return two values in case of ties, which made the API unnecessarily
complicated. If really needed, then call the function twice with
ties = "min" and ties = "max".
* SPEEDUP: (col|row)Anys() and (col|row)Alls() are now notably faster compared
to previous versions.
CODE REFACTORING:
* CLEANUP: In the effort of migrating anyMissing() into a plain R function,
the specific anyMissing() implementations for data.frame:s and list:s
were dropped and are now handled by anyMissing() for "ANY", which is the only
S4 method remaining now. In a near future release, this remaining "ANY"
method will be turned into a plain R function and the current S4 generic will
be dropped. We know of no CRAN or Bioconductor packages that rely on
it being a generic function. Note also that since R (>= 3.1.0) there is a
base::anyNA() function that does the exact same thing making anyMissing()
obsolete.
BUG FIXES:
* weightedMedian(..., ties = "both") would give an error if there was a tie.
Added package test for this case.
DEPRECATED AND DEFUNCT:
* weightedMedian(..., ties = "both") is now defunct.
Version: 0.12.2 [2014-12-07]
BUG FIXES:
* CODE FIX: The native code for product() on integer vector incorrectly used
C-level abs() on intermediate values despite those being doubles requiring
fabs(). Despite this, the calculated product would still be correct (at
least when validated on several local setups as well as on the CRAN servers).
Again, thanks to Brian Ripley for pointing out another invalid integer-double
coercion at the C level.
DEPRECATED AND DEFUNCT:
* weightedMedian(..., interpolate = FALSE, ties = "both") is defunct.
Version: 0.12.1 [2014-12-06]
SOFTWARE QUALITY:
* ROBUSTNESS: Updated package tests to check methods in more scenarios,
especially with both integer and numeric input data.
BUG FIXES:
* (col|row)Cumsums(x) where 'x' is integer would return garbage for columns
(rows) containing missing values.
* rowMads(x) where 'x' is numeric (not integer) would give incorrect results
for rows that had an *odd* number of values (no ties). Analogous issues
with colMads(). Added package tests for such cases too. Thanks to Brian
Ripley and the CRAN check tools for (yet again) catching another coding
mistake. Details: This was because the C-level calculation of the absolute
value of residuals toward the median would use integer-based abs() rather
than double-based fabs(). Now fabs() is used when the values are double
and abs() when they are integers.
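The abs() versus fabs() mix-up described above can be reproduced in isolation. The following C sketch is illustrative only; the function names are hypothetical and the integer conversion is written out explicitly, whereas in the original bug it happened implicitly when a double was passed to abs().

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Buggy variant: the residual is converted to int before abs() is applied,
 * so the fractional part is lost (e.g. -0.5 becomes 0). */
double abs_residual_int(double value, double center) {
    double r = value - center;
    return (double)abs((int)r);
}

/* Fixed variant: fabs() operates on the double directly. */
double abs_residual_double(double value, double center) {
    return fabs(value - center);
}

/* Small self-check contrasting the two variants. */
void check_residuals(void) {
    assert(abs_residual_int(1.0, 1.5) == 0.0);     /* -0.5 truncated to 0 */
    assert(abs_residual_double(1.0, 1.5) == 0.5);
}
```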
Version: 0.12.0 [2014-12-05]
* Submitted to CRAN.
Version: 0.11.9 [2014-11-26]
NEW FEATURES:
* Added (col|row)Cumsums(), (col|row)Cumprods(), (col|row)Cummins(), and
(col|row)Cummaxs().
BUG FIXES:
* (col|row)WeightedMeans() with all zero weights gave mean estimates with
values 0 instead of NaN.
Version: 0.11.8 [2014-11-25]
PERFORMANCE AND MEMORY:
* SPEEDUP: Implemented (col|row)Mads(), (col|row)Sds() and (col|row)Vars() in
native code.
* SPEEDUP: Made (col|row)Quantiles(x) faster for 'x' without missing values
(and default type = 7L quantiles). It should still be implemented in
native code though.
* SPEEDUP: Made rowWeightedMeans() faster.
BUG FIXES:
* (col|row)Medians(x) when 'x' is integer would give invalid median values in
case (a) it was calculated as the mean of two values ("ties"), and (b) the
sum of those values were greater than .Machine$integer.max. Now such ties
are calculated using floating point precision. Added lots of package tests.
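The median tie fix above comes down to where the addition is carried out. A minimal C sketch, assuming 32-bit int and using the hypothetical name tie_mean():

```c
#include <assert.h>
#include <limits.h>

/* Mean of the two middle values of an integer vector (a "tie"), computed
 * in double precision. The all-integer form (a + b) / 2 overflows when
 * a + b exceeds INT_MAX, which is undefined behavior in C. */
double tie_mean(int a, int b) {
    return ((double)a + (double)b) / 2.0;
}

/* Small self-check, including the case that overflows in integer math. */
void check_tie_mean(void) {
    assert(tie_mean(INT_MAX, INT_MAX) == (double)INT_MAX);
    assert(tie_mean(1, 2) == 1.5);
}
```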
Version: 0.11.6 [2014-11-16]
PERFORMANCE AND MEMORY:
* SPEEDUP: Now (col|row)Mins(), (col|row)Maxs() and (col|row)Ranges() are
implemented in native code providing a significant speedup.
* SPEEDUP: Now colOrderStats() also is implemented in native code, which
indirectly makes colMins(), colMaxs() and colRanges() faster.
* SPEEDUP: colTabulates(x) no longer uses rowTabulates(t(x)).
* SPEEDUP: colQuantiles(x) no longer uses rowQuantiles(t(x)).
DEPRECATED AND DEFUNCT:
* CLEANUP: Argument 'flavor' of (col|row)Ranks() is now ignored.
Version: 0.11.5 [2014-11-15]
SIGNIFICANT CHANGES:
* (col|row)Prods() now uses default method = "direct" (was "expSumLog").
PERFORMANCE AND MEMORY:
* SPEEDUP: Now colCollapse(x) no longer utilizes rowCollapse(t(x)). Added
package tests for (col|row)Collapse().
* SPEEDUP: Now colDiffs(x) no longer uses rowDiffs(t(x)). Added package tests
for (col|row)Diffs().
* SPEEDUP: Package no longer utilizes match.arg() due to its overhead; methods
sumOver(), (col|row)Prods() and (col|row)Ranks() were updated.
Version: 0.11.4 [2014-11-14]
NEW FEATURES:
* Added support for vector input to several of the row- and column methods
as long as the "intended" matrix dimension is specified via argument 'dim'.
For instance, rowCounts(x, dim = c(nrow, ncol)) is the same as
rowCounts(matrix(x, nrow, ncol)), but more efficient since it avoids
creating/allocating a temporary matrix.
PERFORMANCE AND MEMORY:
* SPEEDUP: Now colCounts() is implemented in native code. Moreover,
(col|row)Counts() are now also implemented in native code for logical input
(previously only for integer and double input). Added more package tests
and benchmarks for these functions.
Version: 0.11.3 [2014-11-11]
SIGNIFICANT CHANGES:
* Turned sdDiff(), madDiff(), varDiff(), weightedSd(), weightedVar() and
weightedMad() into plain functions (were generic functions).
CODE REFACTORING:
* Removed unnecessary usage of '::'.
Version: 0.11.2 [2014-11-09]
SIGNIFICANT CHANGES:
* SPEEDUP: Implemented indexByRow() in native code and it is no longer a
generic function, but a regular function, which is also faster to call.
The first argument of indexByRow() has been changed to 'dim' such that one
should use indexByRow(dim(X)) instead of indexByRow(X) as in the past.
The latter form is still supported, but deprecated.
NEW FEATURES:
* Added allocVector(), allocMatrix() and allocArray() for faster allocation
of numeric vectors, matrices and arrays, particularly when filled with
non-missing values.
DEPRECATED AND DEFUNCT:
* Calling indexByRow(X) with a matrix 'X' is deprecated. Instead call it
with indexByRow(dim(X)).
Version: 0.11.1 [2014-11-07]
NEW FEATURES:
* Better support for long vectors.
* PRECISION: Using greater floating-point precision in more internal
intermediate calculations, where possible.
SOFTWARE QUALITY:
* ROBUSTNESS: Although unlikely, with long vectors support for binCounts()
and binMeans() it is possible that a bin gets a higher count than what
can be represented by an R integer (.Machine$integer.max = 2^31-1). If
that happens, an informative warning is generated and the bin count is
set to .Machine$integer.max. If this happens for binMeans(), the
corresponding mean is still properly calculated and valid.
CODE REFACTORING:
* CLEANUP: Cleaned up and harmonized the internal C API such that there are
two well-defined API levels. The high-level API is called by R via .Call()
and takes care of most of the argument validation and construction of
the return value. These functions dispatch to functions in the low-level
API based on data type(s) and other arguments. The low-level API is
written to work with basic C data types only.
BUG FIXES:
* Package incorrectly redefined R_xlen_t on R (>= 3.0.0) systems where
LONG_VECTOR_SUPPORT is not supported.
Version: 0.11.0 [2014-11-02]
NEW FEATURES:
* Added sumOver() and meanOver(), which are notably faster versions of
sum(x[idxs]) and mean(x[idxs]). Moreover, instead of having to do
sum(as.numeric(x)) to avoid integer overflow when 'x' is an integer vector,
one can do sumOver(x, mode = "numeric"), which avoids the extra copy
created when coercing to numeric (this numeric copy is also twice as large
as the integer vector). Added package tests and benchmark reports for
these functions.
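The idea behind summing an integer vector in "numeric" mode without first creating a numeric copy can be sketched in C. This is a simplified illustration, not the package's native code: the name sum_int_as_double() is hypothetical, and handling of missing values (NA_integer_) and index subsets is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sums an int array using a double accumulator. Each element is widened
 * on the fly, so no converted copy of the input is allocated and the
 * total may exceed INT_MAX without overflowing. */
double sum_int_as_double(const int *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (double)x[i];
    }
    return sum;
}
```

For example, summing { INT_MAX, INT_MAX, 1 } this way gives a well-defined double result, whereas an int accumulator would overflow on the second element.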
Version: 0.10.4 [2014-11-01]
PERFORMANCE AND MEMORY:
* SPEEDUP: Made anyMissing(), logSumExp(), (col|row)Medians(),
(col|row)Counts() slightly faster by making the native code assign
the results directly to the native vector instead of to the R vector,
e.g. ansp[i] = v where ansp = REAL(ans) instead of REAL(ans)[i] = v.
* Added benchmark reports for anyMissing() and logSumExp().
Version: 0.10.3 [2014-10-01]
BUG FIXES:
* binMeans() returned 0.0 instead of NA_real_ for empty bins.
Version: 0.10.2 [2014-09-01]
BUG FIXES:
* On some systems, the package failed to build on R (<= 2.15.3) with
compilation error: "redefinition of typedef 'R_xlen_t'".
Version: 0.10.1 [2014-06-09]
PERFORMANCE AND MEMORY:
* Added benchmark reports also for the non-matrixStats functions
col-/rowSums() and col-/rowMeans().
* Now all colNnn() and rowNnn() methods are benchmarked in a combined report
making it possible to also compare colNnn(x) with rowNnn(t(x)).
Version: 0.10.0 [2014-06-07]
SOFTWARE QUALITY:
* Relaxed some package tests such that they assert numerical correctness via
all.equal() rather than identical().
* Submitted to CRAN.
BUG FIXES:
* The package tests for product() incorrectly assumed that the value of
prod(c(NaN, NA)) is uniquely defined. However, as documented in
help("is.nan"), it may be NA or NaN depending on R system/platform.
Version: 0.9.7 [2014-06-05]
BUG FIXES:
* A bug introduced in v0.9.5 caused col-/rowVars() and hence also
col-/rowSds() to return garbage. Added package tests for these now.
* Submitted to CRAN.
Version: 0.9.6 [2014-06-04]
NEW FEATURES:
* Added signTabulate() for tabulating the number of negatives, zeros,
positives and missing values. For doubles, the number of negative and
positive infinite values are also counted.
PERFORMANCE AND MEMORY:
* SPEEDUP: Now col-/rowProds() utilizes new product() function.
* SPEEDUP: Added product() for calculating the product of a numeric
vector via the logarithm.
Version: 0.9.5 [2014-06-04]
SIGNIFICANT CHANGES:
* SPEEDUP: Made weightedMedian() a plain function (was an S3 method).
* CLEANUP: Now only exporting plain functions and generic functions.
* SPEEDUP: Turned more S4 methods into S3 methods, e.g. rowCounts(),
rowAlls(), rowAnys(), rowTabulates() and rowCollapse().
NEW FEATURES:
* Added argument 'method' to col-/rowProds() for controlling how the product
is calculated.
PERFORMANCE AND MEMORY:
* SPEEDUP: Package is now byte compiled.
* SPEEDUP: Made rowProds() and rowTabulates() notably faster.
* SPEEDUP: Now rowCounts(), rowAnys(), rowAlls() and corresponding column
methods can search for any value in addition to the default TRUE. The
search for a matching integer or double value is done in native code,
which is notably faster (and more memory efficient because it avoids
creating any new objects).
* SPEEDUP: Made colVars() and colSds() notably faster and rowVars() and
rowSds() slightly faster.
* Added benchmark reports, e.g. matrixStats:::benchmark('colMins').
Version: 0.9.4 [2014-05-23]
SIGNIFICANT CHANGES:
* SPEEDUP: Turned several S4 methods into S3 methods, e.g. indexByRow(),
madDiff(), sdDiff() and varDiff().
Version: 0.9.3 [2014-04-26]
NEW FEATURES:
* Added argument 'trim' to madDiff(), sdDiff() and varDiff().
Version: 0.9.2 [2014-04-04]
BUG FIXES:
* The native code of binMeans(x, bx) would try to access an out-of-bounds
value of argument 'y' iff 'x' contained elements that are left of all bins
in 'bx'. This bug had no impact on the results and since no assignment was
done it should also not crash/core dump R. This was discovered thanks to
new memtests (ASAN and valgrind) provided by CRAN.
Version: 0.9.1 [2014-03-31]
BUG FIXES:
* rowProds() would throw "Error in rowSums(isNeg) : 'x' must be an array of
at least two dimensions" on matrices where all rows contained at least one
zero. Thanks to Roel Verbelen at KU Leuven for the report.
Version: 0.9.0 [2014-03-26]
NEW FEATURES:
* Added weightedVar() and weightedSd().
Version: 0.8.14 [2013-11-23]
PERFORMANCE AND MEMORY:
* MEMORY: Updated all functions to do a better job of cleaning out temporarily
allocated objects as soon as possible such that the garbage collector can
remove them sooner, iff wanted. This increases the chance for a smaller
memory footprint.
* Submitted to CRAN.
Version: 0.8.13 [2013-10-08]
NEW FEATURES:
* Added argument 'right' to binCounts() and binMeans() to specify whether
binning should be done by (u,v] or [u,v). Added system tests validating
the correctness of the two cases.
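The two binning conventions selected by 'right' can be sketched in C. This is a deliberately naive illustration with a hypothetical bin_counts() function; it scans all bins for every value and does not reflect how binCounts() is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: counts the values x[0..n-1] falling into nbins bins
 * delimited by the increasing boundaries bx[0..nbins].
 * right != 0 uses intervals (u, v]; right == 0 uses [u, v).
 * Values outside all bins are ignored. */
void bin_counts(const double *x, size_t n, const double *bx, size_t nbins,
                int right, int *counts) {
    assert(counts != NULL || nbins == 0);
    for (size_t b = 0; b < nbins; b++) counts[b] = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t b = 0; b < nbins; b++) {
            double u = bx[b], v = bx[b + 1];
            int inside = right ? (x[i] > u && x[i] <= v)
                               : (x[i] >= u && x[i] < v);
            if (inside) {
                counts[b]++;
                break;   /* bins do not overlap, so stop at the first hit */
            }
        }
    }
}
```

The two conventions only differ for values that sit exactly on a bin boundary: with boundaries 1, 2, 3, the value 2 belongs to the second bin under [u,v) and to the first bin under (u,v].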
CODE REFACTORING:
* Bumped up package dependencies.
Version: 0.8.12 [2013-09-26]
PERFORMANCE AND MEMORY:
* SPEEDUP: Now utilizing anyMissing() everywhere possible.
Version: 0.8.11 [2013-09-21]
SOFTWARE QUALITY:
* ROBUSTNESS: Now importing 'loadMethod' from 'methods' package such that
'matrixStats' S4-based methods also work when 'methods' is not loaded, e.g.
when 'Rscript' is used, cf. Section 'Default packages' in
'R Installation and Administration'.
* ROBUSTNESS: Updated package system tests such that they can run with only
the 'base' package loaded.
Version: 0.8.10 [2013-09-15]
CODE REFACTORING:
* CLEANUP: Now only importing two functions from the 'methods' package.
* Bumped up package dependencies.
Version: 0.8.9 [2013-08-29]
NEW FEATURES:
* CLEANUP: Now the package startup message acknowledges argument
'quietly' of library()/require().
Version: 0.8.8 [2013-07-29]
DOCUMENTATION:
* The dimension of the return value was swapped in help("rowQuantiles").
Version: 0.8.7 [2013-07-28]
PERFORMANCE AND MEMORY:
* SPEEDUP: Made (col|row)Mins() and (col|row)Maxs() much faster.
BUG FIXES:
* rowRanges(x) on an Nx0 matrix would give an error. Same for colRanges(x)
on a 0xN matrix. Added system tests for these and other special cases.
Version: 0.8.6 [2013-07-20]
CODE REFACTORING:
* Bumped up package dependencies.
BUG FIXES:
* Forgot to declare S3 methods (col|row)WeightedMedians().
Version: 0.8.5 [2013-05-25]
PERFORMANCE AND MEMORY:
* Minor speedup of (col|row)Tabulates() by replacing rm() calls with NULL
assignments.
Version: 0.8.4 [2013-05-20]
DOCUMENTATION:
* CRAN POLICY: Now all Rd \usage{} lines are at most 90 characters long.
Version: 0.8.3 [2013-05-10]
PERFORMANCE AND MEMORY:
* SPEEDUP: binCounts() and binMeans() now use Hoare's Quicksort for
presorting 'x' before counting/averaging. They also no longer test in
every iteration (== for every data point) whether the last bin has been
reached or not, but only after completing a bin.