Add reducer operations (with an 'axis' parameter). (#115)
Includes the non-reducers (mean, var, etc.).

* [WIP] Add reducer operations (with an 'axis' parameter).
* Start a new studies/reducers.py to work out the 'axis != -1' logic.
* Started with a NumPy example.
* Get the shape right, first.
* Defined 'index' and 'parents', which will pass down reducer information from all types.
* Added ByteMaskedArray.
* Fixed assertion in NumpyArray shape-strides checking.
* NumpyArray with multidimensional shape is solved, but I don't think I'll use it. There are too many restrictions when we want to apply 'axis', 'semigroup', or descend from variable-length structures.
* I'll need 'carry' operations on everything.
* Getting close...
* axis=0 for a depth2 jagged array is strange
* Building an index for that 'axis=0' in jagged depth 2 case.
* Now going for conformance with NumPy.
* This is probably pretty close (and much simpler).
* I'll need some sort of 'parents', but not the basic ones.
* This 'something' might be along the right lines...
* It is!
* Properly passed it down to RawArray; now we just need to wrap up the result.
* Make offsets out of distincts.
* Correctly handling the axis=0, regular jagged-2 case.
* The nearly-regular cases are correct.
* The algorithm for 'axis=0', jagged-2 works for all irregular cases.
* Remove false starts and debugging code.
* Move established tests to a separate file.
* RegularArray and ListArray just defer to ListOffsetArray.
* A regular, depth-2 'axis=-1' case also works.
* An irregular, depth-2 'axis=-1' case also works.
* All the 'axis' cases work for jagged depths 1 and 2 (which, by induction, is everything). Regular agrees with NumPy and irregular is a meaningful generalization.
* start thinking about axis depth
* Solved the problem of reducing variable-depth RecordArrays.
* Enough studies; time to start writing C++.
* Implemented *Array::branch_depth.
* Moved some things in util and added Reducer enum class.
* Start a new file for all the kernels in the reduce_next operations.
* Working through the implementation of ListOffsetArray::reduce_next.
* Finished writing nonlocal part of IndexedOffsetArray::reducer_next (tested nothing).
* [skip ci] save work for now
* Should compile again.
* Implemented, but have not tested, ListOffsetArray::reduce_next.
* Replace util::Reducer enum with an extensible class.
* Start on ReducerProd.
* Finished writing NumpyArray::reducer_next (tested nothing).
* Add test for PR115.
* Switch.
* I have a testing procedure (and first test passed).
* Two tests pass.
* Three tests pass.
* A lot of tests work.
* All of the study tests have been moved to C++.
* Implemented and tested RecordArray::reduce_next.
* [skip ci] Implemented an idea for Record::reduce_next but have not tested it (not even compilation).
* Record::reduce_next is a good definition.
* EmptyArray::reduce_next is a good definition.
* IndexedArray::reduce_next is implemented and tested, but maybe a better definition is needed.
* *Array::reduce_next has been implemented and tested.
* [skip ci] save changes...
* Allow reducers to return different types.
* Implemented sum, prod, any, all.
* Implemented 'count' at the cost of having to hide the old 'count' (will become 'sizes' or 'lengths').
* Implemented 'count_nonzero' and avoid 'count's collision with old 'count'.
* Implemented 'min'.
* Implemented 'max', but 'sum' and 'prod' are supposed to promote to int64.
* [skip ci] save work
* 'sum' and 'prod' promote to int64 (and uint64 for unsigned types).
* Apply 'mask' so that we don't have to think of min/max as an operation with an identity (which is non-intuitive for integer types).
* Try to get Windows right.
* Implemented 'keepdims' and try again to get Windows right.
* Try to get Windows right again.
* Windows found a bug.
* Fix compilation errors in the #ifdef Windows.
* Just working on compiler errors through CI.
* Windows does not downcast 64-bit arrays into 32-bit. (Good!)
* Make 'return_type' and 'return_typesize' agree with 'apply'.
* [skip ci] Working on the non-reducers.
* The non-reducers work. I think this PR is done.
* Update README.
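The notes above mention a 'parents' index that carries reducer information down to the leaf data, the promotion of 'sum' to int64 (uint64 for unsigned types), and a 'mask' for min/max. The following is only a rough NumPy sketch of those three ideas for a depth-1 jagged array stored as `offsets` plus flat `content`. The function names `offsets_to_parents`, `sum_by_parents`, and `min_by_parents` are hypothetical and are not the C++ kernels added in this commit.

```python
import numpy as np

def offsets_to_parents(offsets):
    """For each content element, give the index of the list that contains it."""
    counts = np.diff(np.asarray(offsets))
    return np.repeat(np.arange(len(counts)), counts)

def sum_by_parents(content, parents, outlength):
    """Sum content elements that share a parent; empty lists sum to 0."""
    content = np.asarray(content)
    # Assumed promotion rule, following the commit notes:
    # booleans and signed integers -> int64, unsigned integers -> uint64.
    if content.dtype.kind in "bi":
        dtype = np.int64
    elif content.dtype.kind == "u":
        dtype = np.uint64
    else:
        dtype = content.dtype
    out = np.zeros(outlength, dtype=dtype)
    np.add.at(out, parents, content)  # unbuffered add handles repeated parents
    return out

def min_by_parents(content, parents, outlength):
    """Minimum per parent; empty lists stay None (masked), so no identity is needed."""
    out = [None] * outlength
    for value, parent in zip(np.asarray(content).tolist(), np.asarray(parents).tolist()):
        if out[parent] is None or value < out[parent]:
            out[parent] = value
    return out

# The jagged array [[1, 2, 3], [], [4, 5]] as offsets + content.
offsets = np.array([0, 3, 3, 5])
content = np.array([1, 2, 3, 4, 5], dtype=np.int32)
parents = offsets_to_parents(offsets)

sums = sum_by_parents(content, parents, len(offsets) - 1)
mins = min_by_parents(content, parents, len(offsets) - 1)
```

Here the empty middle list contributes 0 to the sum (the additive identity) but yields a missing value for the minimum, which is the motivation the notes give for masking min/max.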
Showing 43 changed files with 6,100 additions and 130 deletions.
@@ -1 +1 @@
-0.1.116
+0.1.117
@@ -0,0 +1,245 @@
# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

from __future__ import absolute_import

import numpy

import awkward1._util
import awkward1._numpy
import awkward1.layout
import awkward1.operations.convert

def count(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] + reduce(xs[1:])
        return reduce([numpy.size(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.count(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.count_nonzero)
def count_nonzero(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] + reduce(xs[1:])
        return reduce([numpy.count_nonzero(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.count_nonzero(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.sum)
def sum(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] + reduce(xs[1:])
        return reduce([numpy.sum(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.sum(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.prod)
def prod(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] * reduce(xs[1:])
        return reduce([numpy.prod(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.prod(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.any)
def any(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] or reduce(xs[1:])
        return reduce([numpy.any(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.any(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.all)
def all(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 1:
                return xs[0]
            else:
                return xs[0] and reduce(xs[1:])
        return reduce([numpy.all(x) for x in awkward1._util.completely_flatten(layout)])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.all(axis=axis, mask=False, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.min)
def min(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 0:
                return None
            elif len(xs) == 1:
                return xs[0]
            else:
                x, y = xs[0], reduce(xs[1:])
                return x if x < y else y
        tmp = awkward1._util.completely_flatten(layout)
        return reduce([numpy.min(x) for x in tmp if len(x) > 0])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.min(axis=axis, mask=True, keepdims=keepdims), behavior)

@awkward1._numpy.implements(numpy.max)
def max(array, axis=None, keepdims=False):
    layout = awkward1.operations.convert.tolayout(array, allowrecord=False, allowother=False)
    if axis is None:
        def reduce(xs):
            if len(xs) == 0:
                return None
            elif len(xs) == 1:
                return xs[0]
            else:
                x, y = xs[0], reduce(xs[1:])
                return x if x > y else y
        tmp = awkward1._util.completely_flatten(layout)
        return reduce([numpy.max(x) for x in tmp if len(x) > 0])
    else:
        behavior = awkward1._util.behaviorof(array)
        return awkward1._util.wrap(layout.max(axis=axis, mask=True, keepdims=keepdims), behavior)

### The following are not strictly reducers, but are defined in terms of reducers and ufuncs.

def moment(x, n, weight=None, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        if weight is None:
            sumw = count(x, axis=axis, keepdims=keepdims)
            sumwxn = sum(x**n, axis=axis, keepdims=keepdims)
        else:
            sumw = sum(x*0 + weight, axis=axis, keepdims=keepdims)
            sumwxn = sum((x*weight)**n, axis=axis, keepdims=keepdims)
        return numpy.true_divide(sumwxn, sumw)

@awkward1._numpy.implements(numpy.mean)
def mean(x, weight=None, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        if weight is None:
            sumw = count(x, axis=axis, keepdims=keepdims)
            sumwx = sum(x, axis=axis, keepdims=keepdims)
        else:
            sumw = sum(x*0 + weight, axis=axis, keepdims=keepdims)
            sumwx = sum(x*weight, axis=axis, keepdims=keepdims)
        return numpy.true_divide(sumwx, sumw)

@awkward1._numpy.implements(numpy.var)
def var(x, weight=None, ddof=0, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        xmean = mean(x, weight=weight, axis=axis, keepdims=keepdims)
        if weight is None:
            sumw = count(x, axis=axis, keepdims=keepdims)
            sumwxx = sum((x - xmean)**2, axis=axis, keepdims=keepdims)
        else:
            sumw = sum(x*0 + weight, axis=axis, keepdims=keepdims)
            sumwxx = sum((x - xmean)**2 * weight, axis=axis, keepdims=keepdims)
        if ddof != 0:
            return numpy.true_divide(sumwxx, sumw) * numpy.true_divide(sumw, sumw - ddof)
        else:
            return numpy.true_divide(sumwxx, sumw)

@awkward1._numpy.implements(numpy.std)
def std(x, weight=None, ddof=0, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        return numpy.sqrt(var(x, weight=weight, ddof=ddof, axis=axis, keepdims=keepdims))

def covar(x, y, weight=None, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        xmean = mean(x, weight=weight, axis=axis, keepdims=keepdims)
        ymean = mean(y, weight=weight, axis=axis, keepdims=keepdims)
        if weight is None:
            sumw = count(x, axis=axis, keepdims=keepdims)
            sumwxy = sum((x - xmean)*(y - ymean), axis=axis, keepdims=keepdims)
        else:
            sumw = sum(x*0 + weight, axis=axis, keepdims=keepdims)
            sumwxy = sum((x - xmean)*(y - ymean)*weight, axis=axis, keepdims=keepdims)
        return numpy.true_divide(sumwxy, sumw)

def corr(x, y, weight=None, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        xmean = mean(x, weight=weight, axis=axis, keepdims=keepdims)
        ymean = mean(y, weight=weight, axis=axis, keepdims=keepdims)
        xdiff = x - xmean
        ydiff = y - ymean
        if weight is None:
            sumwxx = sum(xdiff**2, axis=axis, keepdims=keepdims)
            sumwyy = sum(ydiff**2, axis=axis, keepdims=keepdims)
            sumwxy = sum(xdiff*ydiff, axis=axis, keepdims=keepdims)
        else:
            sumwxx = sum((xdiff**2)*weight, axis=axis, keepdims=keepdims)
            sumwyy = sum((ydiff**2)*weight, axis=axis, keepdims=keepdims)
            sumwxy = sum((xdiff*ydiff)*weight, axis=axis, keepdims=keepdims)
        return numpy.true_divide(sumwxy, numpy.sqrt(sumwxx * sumwyy))

def linearfit(x, y, weight=None, axis=None, keepdims=False):
    with numpy.errstate(invalid="ignore"):
        if weight is None:
            sumw = count(x, axis=axis, keepdims=keepdims)
            sumwx = sum(x, axis=axis, keepdims=keepdims)
            sumwy = sum(y, axis=axis, keepdims=keepdims)
            sumwxx = sum(x**2, axis=axis, keepdims=keepdims)
            sumwxy = sum(x*y, axis=axis, keepdims=keepdims)
        else:
            sumw = sum(x*0 + weight, axis=axis, keepdims=keepdims)
            sumwx = sum(x*weight, axis=axis, keepdims=keepdims)
            sumwy = sum(y*weight, axis=axis, keepdims=keepdims)
            sumwxx = sum((x**2)*weight, axis=axis, keepdims=keepdims)
            sumwxy = sum(x*y*weight, axis=axis, keepdims=keepdims)
        delta = (sumw*sumwxx) - (sumwx*sumwx)
        intercept = numpy.true_divide(((sumwxx*sumwy) - (sumwx*sumwxy)), delta)
        slope = numpy.true_divide(((sumw*sumwxy) - (sumwx*sumwy)), delta)
        intercept_error = numpy.sqrt(numpy.true_divide(sumwxx, delta))
        slope_error = numpy.sqrt(numpy.true_divide(sumw, delta))

        intercept = awkward1.operations.convert.tolayout(intercept, allowrecord=True, allowother=True)
        slope = awkward1.operations.convert.tolayout(slope, allowrecord=True, allowother=True)
        intercept_error = awkward1.operations.convert.tolayout(intercept_error, allowrecord=True, allowother=True)
        slope_error = awkward1.operations.convert.tolayout(slope_error, allowrecord=True, allowother=True)

        scalar = not isinstance(intercept, awkward1.layout.Content) and not isinstance(slope, awkward1.layout.Content) and not isinstance(intercept_error, awkward1.layout.Content) and not isinstance(slope_error, awkward1.layout.Content)

        if not isinstance(intercept, (awkward1.layout.Content, awkward1.layout.Record)):
            intercept = awkward1.layout.NumpyArray(numpy.array([intercept]))
        if not isinstance(slope, (awkward1.layout.Content, awkward1.layout.Record)):
            slope = awkward1.layout.NumpyArray(numpy.array([slope]))
        if not isinstance(intercept_error, (awkward1.layout.Content, awkward1.layout.Record)):
            intercept_error = awkward1.layout.NumpyArray(numpy.array([intercept_error]))
        if not isinstance(slope_error, (awkward1.layout.Content, awkward1.layout.Record)):
            slope_error = awkward1.layout.NumpyArray(numpy.array([slope_error]))

        out = awkward1.layout.RecordArray([intercept, slope, intercept_error, slope_error], ["intercept", "slope", "intercept_error", "slope_error"])
        out.setparameter("__record__", "LinearFit")
        if scalar:
            out = out[0]

        return awkward1._util.wrap(out, awkward1._util.behaviorof(x, y))

__all__ = [x for x in list(globals()) if not x.startswith("_") and x not in ("collections", "numpy", "awkward1")]
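
The non-reducers in this file are built from `count` and `sum` plus ufunc arithmetic. As an informal check of those formulas, the sketch below re-derives the variance and the unweighted linear fit on flat NumPy arrays only (no awkward1 arrays, no `axis` handling). The names `weighted_var` and `linear_fit` are hypothetical stand-ins, not functions from this commit.

```python
import numpy as np

def weighted_var(x, weight=None, ddof=0):
    """Variance from sums, following the same steps as `var` in the diff."""
    x = np.asarray(x, dtype=np.float64)
    if weight is None:
        weight = np.ones_like(x)  # unweighted case: sumw equals the count
    else:
        weight = np.asarray(weight, dtype=np.float64)
    sumw = np.sum(weight)
    xmean = np.sum(x * weight) / sumw
    sumwxx = np.sum((x - xmean) ** 2 * weight)
    # (sumwxx / sumw) * (sumw / (sumw - ddof)) reduces to sumwxx / (sumw - ddof)
    return (sumwxx / sumw) * (sumw / (sumw - ddof))

def linear_fit(x, y):
    """Unweighted least-squares intercept and slope from five sums."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    sumw = float(len(x))
    sumwx, sumwy = np.sum(x), np.sum(y)
    sumwxx, sumwxy = np.sum(x * x), np.sum(x * y)
    delta = sumw * sumwxx - sumwx * sumwx
    intercept = (sumwxx * sumwy - sumwx * sumwxy) / delta
    slope = (sumw * sumwxy - sumwx * sumwy) / delta
    return intercept, slope

data = np.array([1.0, 2.0, 3.0, 4.0])
var_population = weighted_var(data)       # compare with np.var(data)
var_sample = weighted_var(data, ddof=1)   # compare with np.var(data, ddof=1)

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0
intercept, slope = linear_fit(xs, ys)     # compare with np.polyfit(xs, ys, 1)
```

On these inputs the results can be compared against `np.var` and `np.polyfit`; the point is only that each quantity is a ratio of plain sums, which is what allows it to be expressed through per-axis reducers.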