Start the high-level layer: awkward.Array. (#28)
There is now a high-level layer, assignable and dressable types, and `StringBehavior` as a leading example.
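
As a rough illustration of what "dressing" a type means here, the following pure-Python mock mirrors the shape of the `DressedType(baretype, behavior, **parameters)` calls and the `typestr(baretype, parameters)` static method that appear in `awkward1/behavior/string.py` in this commit. `MockDressedType`, `MockArray`, `UpperBehavior`, and `wrap` are hypothetical names invented for this sketch; the real classes are provided by `awkward1.layout` and `awkward1.highlevel`, and their internals may differ.

```python
class MockDressedType(object):
    """Hypothetical stand-in: a bare type plus a behavior class and parameters."""

    def __init__(self, baretype, behavior, **parameters):
        self.baretype = baretype
        self.behavior = behavior
        self.parameters = parameters

    def __repr__(self):
        # The behavior class decides how the dressed type is printed.
        return self.behavior.typestr(self.baretype, self.parameters)


class MockArray(object):
    """Hypothetical stand-in for a high-level array wrapper."""

    def __init__(self, data, dressed=None):
        self.data = list(data)
        self.dressed = dressed

    def __iter__(self):
        return iter(self.data)


class UpperBehavior(MockArray):
    """Example behavior: same data, different presentation."""

    @staticmethod
    def typestr(baretype, parameters):
        return "upper[{0}]".format(baretype)

    def __iter__(self):
        for x in self.data:
            yield x.upper()


def wrap(data, dressed):
    # Pick the behavior class named by the dressed type, else the plain wrapper.
    if isinstance(dressed, MockDressedType):
        cls = dressed.behavior
    else:
        cls = MockArray
    return cls(data, dressed)


demo_type = MockDressedType("str", UpperBehavior, note="demo")
print(repr(demo_type))                  # upper[str]
print(list(wrap(["a", "b"], demo_type)))  # ['A', 'B']
```

The point of the pattern is that the data layout is untouched; only the class used to present it changes.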

* Start PR #28 (skipping issue number).

* All Content arrays get a 'type' parameter to override the bare type.

* The 'type' can be assigned; placeholders for all checks.

* Added 'FIXME' to all the places where Type::none() might need to be replaced.

* REMOVED *Type::compatible because that will be checked against arrays, not other types.

* Stub for tests.

* Start developing the high-level Array.

* awkward.Array.__repr__

* Corrected awkward.Array.__repr__

* Completed awkward.Array.__str__ and __repr__; they fit within GitHub and StackOverflow text boxes without scrolling.

* Drop dangling '[' and ']' at the level of tokens, not characters.

* String will be the first example of type-dressing.

* Types have 'parameters' (dict) in Python.

* Use JSON to ingest Records in __repr__ tests to avoid random key order in old versions of Python.

* Developing dressed types; need to set visibility to 'hidden'. Is that a problem for MacOS?

* Suppose all static libraries have hidden symbols. Is MacOS happy?

* Try again: CMP0063 was preventing the visibility preset from taking effect.

* Print out the CMake version to diagnose MacOS.

* Explicit EXPORT_SYMBOL for the symbols that should be exported.

* Declare shared_ptr containers for Slice and Iterator, to see if that affects their visibility issues in MacOS (*Type are now fine, which suggests the difference).

* The new (smaller) set of symbol errors were all associated with a C++ test; applying the same visibility=hidden to the C++ tests.

* Finally back to DressedTypes: PyDress and PyDressParameters both compile.

* DressedType can be constructed, but __repr__ doesn't work.

* DressedType constructor, string representation, and equality-checking work.

* Type representations appear as 'string' and 'bytes'.

* To go further, we must fix a few Type::none()s.

* All *::shallow_copy now put type_ in the type slot.

* Fix tests before major change.

* [skip ci] Broke compilation; switching to laptop.

* [skip ci] Compiles up to pyawkward.cpp.

* [skip ci] Closer, but pyawkward.cpp still doesn't compile.

* Compiles and tests pass.

* Moved id_ and innertype_ up to Content (protected).

* Replacing 'innertype' with 'type'.

* Fixed compilation bug.

* Should compile and pass tests now.

* Single 'innertype' method for bare and dressed cases.

* Correct wrapping and string representation.

* Special case for Python 2.7 (string representations).

* More fixes for Python 2.7.

* Now fix the non-Python 2.7.

* *::getitem_range, *::carry, and NumpyArray::contiguous* do not change type or lose type, but Content's creation of RegularArrays creates new ones without type.

* Renamed Record* methods as a first step toward making some of them universal.

* The 'numfields', 'fieldindex', 'key', 'haskey', 'keyaliases', and 'keys' methods are now defined for all Content types.

* The 'numfields', 'fieldindex', 'key', 'haskey', 'keyaliases', and 'keys' methods are now defined for all Type classes.

* *::getitem_field always drops Type. *::getitem_fields keeps Type if the keys retained are within the keys of that Type.

* Filled in all FIXMEs for Type transformations. NumpyArray never changes Type because it is wrapped by the appropriate number of RegularTypes when the Type is queried.

* Defined *Type::level, which drops DressedType and OptionType (through UnionTypes), but keeps everything else at the same level.

* Defined *Type::shallow_equal.

* Tested List*Array::accepts and RegularArray::accepts.

* Finished up to but NOT INCLUDING RecordArray::accepts.

* Finished implementing *Array::accepts, but haven't tested them all.

* *Array::content and RecordArray::field should return a reference so that the user can modify its 'id' and 'type' more naturally.

* Implemented all propagations down in *Array::settype (NOT TESTED).

* Fixed 32-bit warning with clearer code.

* Start tests for type propagation.

* Strings work; this may finish off the PR.

* Clean up test. Drop Python 3.4 support because Azure did.

* Renaming and improved test_string2.

* Fixed Python 2.7.

* Try to fix the lack of 'pytest' on Windows.

* Numba is not available in some parts of the testing matrix.
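
One item in the list above mentions using JSON to ingest records in the `__repr__` tests so that key order does not vary on old Python versions. The standard-library sketch below shows the underlying idea only, and is not the project's test code: JSON text fixes the key order in the document itself, and a parser can be asked to keep that order explicitly. The sample `text` is made up for illustration.

```python
import json
from collections import OrderedDict

# Key order is part of the JSON text, so it does not depend on how a
# particular Python version orders the keys of a dict literal.
text = '[{"one": 1, "two": 2.2}, {"two": 4.4, "one": 3}]'

# object_pairs_hook receives the (key, value) pairs in document order.
records = json.loads(text, object_pairs_hook=OrderedDict)

keys = [list(record.keys()) for record in records]
print(keys)  # [['one', 'two'], ['two', 'one']]
```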
jpivarski authored Dec 6, 2019
1 parent 8581bdf commit 95a0512
Showing 68 changed files with 2,608 additions and 977 deletions.
13 changes: 6 additions & 7 deletions .ci/azure-buildtest-awkward.yml
@@ -69,11 +69,12 @@ jobs:
 python -m pip install -r requirements.txt
 python -m pip install -r requirements-test.txt
 python -c "import numpy; print(numpy.__version__)"
+python -c "import pytest; print(pytest.__version__)"
 displayName: "Install"
 - script: |
 python setup.py build
-pytest -vv tests
+python -m pytest -vv tests
 displayName: "Build and test"
 - job: MacOS
@@ -119,11 +120,12 @@ jobs:
 python -m pip install -r requirements.txt
 python -m pip install -r requirements-test.txt
 python -c "import numpy; print(numpy.__version__)"
+python -c "import pytest; print(pytest.__version__)"
 displayName: "Install"
 - script: |
 python setup.py build
-pytest -vv tests
+python -m pytest -vv tests
 displayName: "Build and test"
 - job: Linux
@@ -141,10 +143,6 @@ jobs:
 python.version: "2.7"
 python.architecture: "x64"
 numpy.version: "1.16.5"
-"py34-np13":
-python.version: "3.4"
-python.architecture: "x64"
-numpy.version: "1.13.1"
 "py35-np13":
 python.version: "3.5"
 python.architecture: "x64"
@@ -182,9 +180,10 @@ jobs:
 fi
 python -m pip install -r requirements-test.txt
 python -c "import numpy; print(numpy.__version__)"
+python -c "import pytest; print(pytest.__version__)"
 displayName: "Install"
 - script: |
 python setup.py build
-pytest -vv tests
+python -m pytest -vv tests
 displayName: "Build and test"
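
The CI change above swaps the bare `pytest` command for `python -m pytest` and prints the pytest version during the install step. Running a tool as a module goes through the same interpreter that `pip` installed into, instead of depending on a `pytest` script being found on `PATH`. A small Python 3 diagnostic along those lines is sketched below; `diagnose` is a hypothetical helper for illustration and is not part of this repository or its CI.

```python
import importlib.util
import shutil
import sys


def diagnose(module_name, script_name=None):
    """Report whether a tool is importable by this interpreter and/or on PATH."""
    if script_name is None:
        script_name = module_name
    return {
        # The interpreter that "python -m <module>" would use.
        "interpreter": sys.executable,
        # True if "python -m <module>" can at least locate the module.
        "importable": importlib.util.find_spec(module_name) is not None,
        # Path of the console script, or None if it is not on PATH.
        "script_on_path": shutil.which(script_name),
    }


report = diagnose("pytest")
print(report)
```

If `importable` is true while `script_on_path` is `None`, the module form still works even though the bare command would fail.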
11 changes: 10 additions & 1 deletion CMakeLists.txt
@@ -1,7 +1,9 @@
 # BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

 cmake_minimum_required(VERSION 2.8.12.2)
+message("-- CMake version " ${CMAKE_VERSION})
 project (awkward)
+cmake_policy(SET CMP0063 NEW)

 set(CMAKE_CXX_STANDARD 11)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
@@ -30,23 +32,30 @@ enable_testing()
 macro(addtest name filename)
 add_executable(${name} ${filename})
 target_link_libraries(${name} PRIVATE awkward-static awkward-cpu-kernels-static)
+set_target_properties(${name} PROPERTIES CXX_VISIBILITY_PRESET hidden)
 add_test(${name} ${name})
 endmacro(addtest)

 add_library(awkward-cpu-kernels-objects OBJECT ${CPU_KERNEL_SOURCES})
 set_target_properties(awkward-cpu-kernels-objects PROPERTIES POSITION_INDEPENDENT_CODE 1)
 add_library(awkward-cpu-kernels-static STATIC $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
 add_library(awkward-cpu-kernels SHARED $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
+set_target_properties(awkward-cpu-kernels-objects PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-cpu-kernels-static PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-cpu-kernels PROPERTIES CXX_VISIBILITY_PRESET hidden)

 add_library(awkward-objects OBJECT ${LIBAWKWARD_SOURCES})
 set_target_properties(awkward-objects PROPERTIES POSITION_INDEPENDENT_CODE 1)
 add_library(awkward-static STATIC $<TARGET_OBJECTS:awkward-objects>)
 add_library(awkward SHARED $<TARGET_OBJECTS:awkward-objects>)
 target_link_libraries(awkward-static PRIVATE awkward-cpu-kernels-static)
 target_link_libraries(awkward PRIVATE awkward-cpu-kernels-static)
+set_target_properties(awkward-objects PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-static PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward PROPERTIES CXX_VISIBILITY_PRESET hidden)
 addtest(PR016 tests/test_PR016_finish_getitem_for_rawarray.cpp)
 addtest(PR019 tests/test_PR019_use_json_library.cpp)

 pybind11_add_module(layout src/pyawkward.cpp)
-set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET default)
+set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET hidden)
 target_link_libraries(layout PRIVATE awkward-static)
9 changes: 3 additions & 6 deletions README.md
@@ -65,7 +65,6 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
* [X] Test all (tested in mock [studies/fillable.py](tree/master/studies/fillable.py)).
* [X] JSON → Awkward via header-only [RapidJSON](https://rapidjson.org) and `awkward.fromiter`.
* [ ] Explicit broadcasting functions for jagged and non-jagged arrays and scalars.
* [ ] ~~Structure-preserving ufunc-like operation on the C++ side that applies a lambda function to inner data. The Python `__array_ufunc__` implementation will _call_ this to preserve structure.~~
* [ ] Extend `__getitem__` to take jagged arrays of integers and booleans (same behavior as old).
* [ ] Full suite of array types:
* [X] `EmptyArray`: 1-dimensional array with length 0 and unknown type (result of `UnknownFillable`, compatible with all types of arrays).
@@ -85,21 +84,19 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
* [ ] `ChunkedArray`: same as the old version, except that the type is a union if chunks conflict, not an error, and knowledge of all chunk sizes is always required. (Maybe `AmorphousChunkedArray` would fill that role.)
* [ ] `RegularChunkedArray`: like a `ChunkedArray`, but all chunks are known to have the same size.
* [ ] `VirtualArray`: same as the old version, including caching, but taking C++11 lambda functions for materialization, get-cache, and put-cache. The pybind11 layer will connect this to Python callables.
* [ ] Derived classes with ufunc-defined `Methods` and Numba extensions:
* [ ] `StringArray`: a `ListArray`/`ListOffsetArray` of characters with special methods and an optional encoding.
* [ ] `PyVirtualArray`: takes a Python lambda (which gets carried into `VirtualArray`).
* [ ] `PyObjectArray`: same as the old version.
* [X] Describe high-level types using [datashape](https://datashape.readthedocs.io/en/latest/) and possibly also an in-house schema. (Emit datashape _strings_ from C++.)
* [ ] Type compatibility: option to treat nonexistent record fields as nullable data.
* [ ] Describe mid-level "persistence types" with no lengths, somewhat minimal JSON, optional dtypes/compression.
* [ ] Describe low-level layouts independently of filled arrays (JSON or something)?
-* [ ] Layer 1 interface `Array`:
+* [X] Layer 1 interface `Array`:
* [ ] Pass through to the layout classes in Python and Numba.
* [ ] Pass through Numpy ufuncs using [NEP 13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html) (as before).
* [ ] Pass through other Numpy functions using [NEP 18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html) (this would be new).
* [ ] `RecordArray` fields (not called "columns" anymore) through Layer 1 `__getattr__`.
* [ ] Special Layer 1 `Record` type for `RecordArray` elements, supporting some methods and a visual representation based on `Identity` if available, all fields if `recordtype == "tuple"`, or the first field otherwise.
-* [ ] Mechanism for adding user-defined `Methods` like `LorentzVector`, as before, but only on Layer 1.
+* [X] Mechanism for adding user-defined `Methods` like `LorentzVector`, as before, but only on Layer 1.
+* [X] High-level classes for characters and strings.
* [ ] Inerhit from Pandas so that all Layer 1 arrays can be DataFrame columns.
* [ ] Full suite of operations:
* [X] `awkward.tolist`: same as before.
2 changes: 1 addition & 1 deletion VERSION_INFO
@@ -1 +1 @@
-0.1.26
+0.1.28
5 changes: 5 additions & 0 deletions awkward1/__init__.py
@@ -2,8 +2,13 @@

 import awkward1.layout
 import awkward1._numba
+import awkward1.highlevel
+from awkward1.highlevel import Array
+from awkward1.highlevel import Record

 from awkward1.operations.convert import *
 from awkward1.operations.describe import *

+from awkward1.behavior.string import *
+
 __version__ = awkward1.layout.__version__
2 changes: 1 addition & 1 deletion awkward1/_numba/array/recordarray.py
@@ -11,7 +11,7 @@

 @numba.extending.typeof_impl.register(awkward1.layout.RecordArray)
 def typeof(val, c):
-    return RecordArrayType([numba.typeof(x) for x in val.values()], val.lookup, val.reverselookup, numba.typeof(val.id))
+    return RecordArrayType([numba.typeof(x) for x in val.fields()], val.lookup, val.reverselookup, numba.typeof(val.id))

 @numba.extending.typeof_impl.register(awkward1.layout.Record)
 def typeof(val, c):
1 change: 1 addition & 0 deletions awkward1/behavior/__init__.py
@@ -0,0 +1 @@
+# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
67 changes: 67 additions & 0 deletions awkward1/behavior/string.py
@@ -0,0 +1,67 @@
+# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
+
+import codecs
+
+import numpy
+
+import awkward1.highlevel
+
+class CharBehavior(awkward1.highlevel.Array):
+    @staticmethod
+    def typestr(baretype, parameters):
+        encoding = parameters.get("encoding")
+        if encoding is None:
+            return "char"
+        elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
+            return "utf8"
+        else:
+            return "encoded[{0}]".format(repr(encoding))
+
+    def __bytes__(self):
+        return numpy.asarray(self.layout).tostring()
+
+    def __str__(self):
+        encoding = self.type.nolength().parameters.get("encoding")
+        if encoding is None:
+            return str(self.__bytes__())
+        else:
+            return self.__bytes__().decode(encoding)
+
+    def __repr__(self):
+        encoding = self.type.nolength().parameters.get("encoding")
+        if encoding is None:
+            return repr(self.__bytes__())
+        else:
+            return repr(self.__bytes__().decode(encoding))
+
+    def __iter__(self):
+        for x in str(self):
+            yield x
+
+class StringBehavior(awkward1.highlevel.Array):
+    @staticmethod
+    def typestr(baretype, parameters):
+        encoding = baretype.inner().parameters.get("encoding")
+        if encoding is None:
+            return "bytes"
+        elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
+            return "string"
+        else:
+            return "string[{0}]".format(repr(encoding))
+
+    def __iter__(self):
+        if self.type.nolength().inner().parameters.get("encoding") is None:
+            for x in super(StringBehavior, self).__iter__():
+                yield x.__bytes__()
+        else:
+            for x in super(StringBehavior, self).__iter__():
+                yield x.__str__()
+
+    def __eq__(self, other):
+        raise NotImplementedError("return one boolean per string, not lists of booleans per character")
+
+char = awkward1.layout.DressedType(awkward1.layout.PrimitiveType("uint8"), CharBehavior)
+utf8 = awkward1.layout.DressedType(awkward1.layout.PrimitiveType("uint8"), CharBehavior, encoding="utf-8")
+
+bytestring = awkward1.layout.DressedType(awkward1.layout.ListType(char), StringBehavior)
+string = awkward1.layout.DressedType(awkward1.layout.ListType(utf8), StringBehavior)
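
The `typestr` methods in this file decide whether an encoding is UTF-8 by comparing decoder functions with `is`, presumably so that spellings such as "utf8" and "UTF-8" are treated alike. The standalone sketch below makes the same three-way decision but, as a swapped-in technique that is not what the commit does, compares the canonical codec name reported by `codecs.lookup`. `char_typestr` is a hypothetical helper name used only here.

```python
import codecs


def char_typestr(encoding):
    """Return a type string for a character type with an optional encoding."""
    if encoding is None:
        # No encoding: plain bytes-like characters.
        return "char"
    # codecs.lookup normalizes aliases and case; its .name is canonical.
    if codecs.lookup(encoding).name == "utf-8":
        return "utf8"
    return "encoded[{0}]".format(repr(encoding))


print(char_typestr(None))     # char
print(char_typestr("utf8"))   # utf8
print(char_typestr("UTF-8"))  # utf8
```

An unknown encoding name raises `LookupError` from `codecs.lookup`, which may or may not be the desired behavior for a type-string function.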