Start the high-level layer: awkward.Array. (#28)

There is now a high-level layer, assignable and dressable types, and `StringBehavior` as a leading example.

* Start PR #28 (skipping issue number).
* All Content arrays get a 'type' parameter to override the bare type.
* The 'type' can be assigned; placeholders for all checks.
* Added 'FIXME' to all the places where Type::none() might need to be replaced.
* REMOVED *Type::compatible because that will be checked against arrays, not other types.
* Stub for tests.
* Start developing the high-level Array.
* awkward.Array.__repr__
* Corrected awkward.Array.__repr__
* Completed awkward.Array.__str__ and __repr__; they fit within GitHub and StackOverflow text boxes without scrolling.
* Drop dangling '[' and ']' at the level of tokens, not characters.
* String will be the first example of type-dressing.
* Types have 'parameters' (dict) in Python.
* Use JSON to ingest Records in __repr__ tests to avoid random key order in old versions of Python.
* Developing dressed types; need to set visibility to 'hidden'. Is that a problem for MacOS?
* Suppose all static libraries have hidden symbols. Is MacOS happy?
* Try again: CMP0063 was preventing the visibility preset from taking effect.
* Print out the CMake version to diagnose MacOS.
* Explicit EXPORT_SYMBOL for the symbols that should be exported.
* Declare shared_ptr containers for Slice and Iterator, to see if that affects their visibility issues in MacOS (*Type are now fine, which suggests the difference).
* The new (smaller) set of symbol errors were all associated with a C++ test; applying the same visibility=hidden to the C++ tests.
* Finally back to DressedTypes: PyDress and PyDressParameters both compile.
* DressedType can be constructed, but __repr__ doesn't work.
* DressedType constructor, string representation, and equality-checking works.
* DressedType constructor, string representation, and equality-checking works.
* Type representations appear as 'string' and 'bytes'.
* To go further, we must fix a few Type::none()s.
* All *::shallow_copy now put type_ in the type slot.
* Fix tests before major change.
* [skip ci] Broke compilation; switching to laptop.
* [skip ci] Compiles up to pyawkward.cpp.
* [skip ci] Closer, but pyawkward.cpp still doesn't compile.
* Compiles and tests pass.
* Moved id_ and innertype_ up to Content (protected).
* Replacing 'innertype' with 'type'.
* Fixed compilation bug.
* Should compile and pass tests now.
* Single 'innertype' method for bare and dressed cases.
* Correct wrapping and string representation.
* Special case for Python 2.7 (string representations).
* More fixes for Python 2.7.
* Now fix the non-Python 2.7.
* *::getitem_range, *::carry, and NumpyArray::contiguous* do not change type or lose type, but Content's creation of RegularArrays creates new ones without type.
* Renamed Record* methods as a first step toward making some of them universal.
* The 'numfields', 'fieldindex', 'key', 'haskey', 'keyaliases', and 'keys' methods are now defined for all Content types.
* The 'numfields', 'fieldindex', 'key', 'haskey', 'keyaliases', and 'keys' methods are now defined for all Type classes.
* *::getitem_field always drops Type. *::getitem_fields keeps Type if the keys retained are within the keys of that Type.
* Filled in all FIXMEs for Type transformations. NumpyArray never changes Type because it is wrapped by the appropriate number of RegularTypes when the Type is queried.
* Defined *Type::level, which drops DressedType and OptionType (through UnionTypes), but keeps everything else at the same level.
* Defined *Type::shallow_equal.
* Tested List*Array::accepts and RegularArray::accepts.
* Finished up to but NOT INCLUDING RecordArray::accepts.
* Finished implementing *Array::accepts, but haven't tested them all.
* *Array::content and RecordArray::field should return a reference so that the user can modify its 'id' and 'type' more naturally.
* Implemented all propagations down in *Array::settype (NOT TESTED).
* Fixed 32-bit warning with clearer code.
* Start tests for type propagation.
* Strings work; this may finish off the PR.
* Clean up test. Drop Python 3.4 support because Azure did.
* Renaming and improved test_string2.
* Fixed Python 2.7.
* Try to fix the lack of 'pytest' on Windows.
* Numba is not available in some parts of the testing matrix.
scikit-hep · Dec 6, 2019 · 95a0512
1 parent 8581bdf
commit 95a0512
Showing 68 changed files with 2,608 additions and 977 deletions.
diff --git a/.ci/azure-buildtest-awkward.yml b/.ci/azure-buildtest-awkward.yml
@@ -69,11 +69,12 @@ jobs:
  python -m pip install -r requirements.txt
  python -m pip install -r requirements-test.txt
  python -c "import numpy; print(numpy.__version__)"
+ python -c "import pytest; print(pytest.__version__)"
  displayName: "Install"
 
  - script: |
  python setup.py build
- pytest -vv tests
+ python -m pytest -vv tests
  displayName: "Build and test"
 
  - job: MacOS
@@ -119,11 +120,12 @@ jobs:
  python -m pip install -r requirements.txt
  python -m pip install -r requirements-test.txt
  python -c "import numpy; print(numpy.__version__)"
+ python -c "import pytest; print(pytest.__version__)"
  displayName: "Install"
 
  - script: |
  python setup.py build
- pytest -vv tests
+ python -m pytest -vv tests
  displayName: "Build and test"
 
  - job: Linux
@@ -141,10 +143,6 @@ jobs:
  python.version: "2.7"
  python.architecture: "x64"
  numpy.version: "1.16.5"
- "py34-np13":
- python.version: "3.4"
- python.architecture: "x64"
- numpy.version: "1.13.1"
  "py35-np13":
  python.version: "3.5"
  python.architecture: "x64"
@@ -182,9 +180,10 @@ jobs:
  fi
  python -m pip install -r requirements-test.txt
  python -c "import numpy; print(numpy.__version__)"
+ python -c "import pytest; print(pytest.__version__)"
  displayName: "Install"
 
  - script: |
  python setup.py build
- pytest -vv tests
+ python -m pytest -vv tests
  displayName: "Build and test"
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,7 +1,9 @@
 # BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
 
 cmake_minimum_required(VERSION 2.8.12.2)
+message("-- CMake version " ${CMAKE_VERSION})
 project (awkward)
+cmake_policy(SET CMP0063 NEW)
 
 set(CMAKE_CXX_STANDARD 11)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
@@ -30,23 +32,30 @@ enable_testing()
 macro(addtest name filename)
  add_executable(${name} ${filename})
  target_link_libraries(${name} PRIVATE awkward-static awkward-cpu-kernels-static)
+ set_target_properties(${name} PROPERTIES CXX_VISIBILITY_PRESET hidden)
  add_test(${name} ${name})
 endmacro(addtest)
 
 add_library(awkward-cpu-kernels-objects OBJECT ${CPU_KERNEL_SOURCES})
 set_target_properties(awkward-cpu-kernels-objects PROPERTIES POSITION_INDEPENDENT_CODE 1)
 add_library(awkward-cpu-kernels-static STATIC $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
 add_library(awkward-cpu-kernels SHARED $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
+set_target_properties(awkward-cpu-kernels-objects PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-cpu-kernels-static PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-cpu-kernels PROPERTIES CXX_VISIBILITY_PRESET hidden)
 
 add_library(awkward-objects OBJECT ${LIBAWKWARD_SOURCES})
 set_target_properties(awkward-objects PROPERTIES POSITION_INDEPENDENT_CODE 1)
 add_library(awkward-static STATIC $<TARGET_OBJECTS:awkward-objects>)
 add_library(awkward SHARED $<TARGET_OBJECTS:awkward-objects>)
 target_link_libraries(awkward-static PRIVATE awkward-cpu-kernels-static)
 target_link_libraries(awkward PRIVATE awkward-cpu-kernels-static)
+set_target_properties(awkward-objects PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward-static PROPERTIES CXX_VISIBILITY_PRESET hidden)
+set_target_properties(awkward PROPERTIES CXX_VISIBILITY_PRESET hidden)
 addtest(PR016 tests/test_PR016_finish_getitem_for_rawarray.cpp)
 addtest(PR019 tests/test_PR019_use_json_library.cpp)
 
 pybind11_add_module(layout src/pyawkward.cpp)
-set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET default)
+set_target_properties(layout PROPERTIES CXX_VISIBILITY_PRESET hidden)
 target_link_libraries(layout PRIVATE awkward-static)
diff --git a/README.md b/README.md
@@ -65,7 +65,6 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
  * [X] Test all (tested in mock [studies/fillable.py](tree/master/studies/fillable.py)).
  * [X] JSON → Awkward via header-only [RapidJSON](https://rapidjson.org) and `awkward.fromiter`.
  * [ ] Explicit broadcasting functions for jagged and non-jagged arrays and scalars.
- * [ ] ~~Structure-preserving ufunc-like operation on the C++ side that applies a lambda function to inner data. The Python `__array_ufunc__` implementation will _call_ this to preserve structure.~~
  * [ ] Extend `__getitem__` to take jagged arrays of integers and booleans (same behavior as old).
  * [ ] Full suite of array types:
  * [X] `EmptyArray`: 1-dimensional array with length 0 and unknown type (result of `UnknownFillable`, compatible with all types of arrays).
@@ -85,21 +84,19 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
  * [ ] `ChunkedArray`: same as the old version, except that the type is a union if chunks conflict, not an error, and knowledge of all chunk sizes is always required. (Maybe `AmorphousChunkedArray` would fill that role.)
  * [ ] `RegularChunkedArray`: like a `ChunkedArray`, but all chunks are known to have the same size.
  * [ ] `VirtualArray`: same as the old version, including caching, but taking C++11 lambda functions for materialization, get-cache, and put-cache. The pybind11 layer will connect this to Python callables.
- * [ ] Derived classes with ufunc-defined `Methods` and Numba extensions:
- * [ ] `StringArray`: a `ListArray`/`ListOffsetArray` of characters with special methods and an optional encoding.
  * [ ] `PyVirtualArray`: takes a Python lambda (which gets carried into `VirtualArray`).
  * [ ] `PyObjectArray`: same as the old version.
  * [X] Describe high-level types using [datashape](https://datashape.readthedocs.io/en/latest/) and possibly also an in-house schema. (Emit datashape _strings_ from C++.)
- * [ ] Type compatibility: option to treat nonexistent record fields as nullable data.
  * [ ] Describe mid-level "persistence types" with no lengths, somewhat minimal JSON, optional dtypes/compression.
  * [ ] Describe low-level layouts independently of filled arrays (JSON or something)?
- * [ ] Layer 1 interface `Array`:
+ * [X] Layer 1 interface `Array`:
  * [ ] Pass through to the layout classes in Python and Numba.
  * [ ] Pass through Numpy ufuncs using [NEP 13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html) (as before).
  * [ ] Pass through other Numpy functions using [NEP 18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html) (this would be new).
  * [ ] `RecordArray` fields (not called "columns" anymore) through Layer 1 `__getattr__`.
  * [ ] Special Layer 1 `Record` type for `RecordArray` elements, supporting some methods and a visual representation based on `Identity` if available, all fields if `recordtype == "tuple"`, or the first field otherwise.
- * [ ] Mechanism for adding user-defined `Methods` like `LorentzVector`, as before, but only on Layer 1.
+ * [X] Mechanism for adding user-defined `Methods` like `LorentzVector`, as before, but only on Layer 1.
+ * [X] High-level classes for characters and strings.
  * [ ] Inerhit from Pandas so that all Layer 1 arrays can be DataFrame columns.
  * [ ] Full suite of operations:
  * [X] `awkward.tolist`: same as before.

diff --git a/VERSION_INFO b/VERSION_INFO
@@ -1 +1 @@
-0.1.26
+0.1.28
diff --git a/awkward1/__init__.py b/awkward1/__init__.py
@@ -2,8 +2,13 @@
 
 import awkward1.layout
 import awkward1._numba
+import awkward1.highlevel
+from awkward1.highlevel import Array
+from awkward1.highlevel import Record
 
 from awkward1.operations.convert import *
 from awkward1.operations.describe import *
 
+from awkward1.behavior.string import *
+
 __version__ = awkward1.layout.__version__
diff --git a/awkward1/_numba/array/recordarray.py b/awkward1/_numba/array/recordarray.py
@@ -11,7 +11,7 @@
 
 @numba.extending.typeof_impl.register(awkward1.layout.RecordArray)
 def typeof(val, c):
- return RecordArrayType([numba.typeof(x) for x in val.values()], val.lookup, val.reverselookup, numba.typeof(val.id))
+ return RecordArrayType([numba.typeof(x) for x in val.fields()], val.lookup, val.reverselookup, numba.typeof(val.id))
 
 @numba.extending.typeof_impl.register(awkward1.layout.Record)
 def typeof(val, c):

diff --git a/awkward1/behavior/__init__.py b/awkward1/behavior/__init__.py
@@ -0,0 +1 @@
+# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
diff --git a/awkward1/behavior/string.py b/awkward1/behavior/string.py
@@ -0,0 +1,67 @@
+# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
+
+import codecs
+
+import numpy
+
+import awkward1.highlevel
+
+class CharBehavior(awkward1.highlevel.Array):
+ @staticmethod
+ def typestr(baretype, parameters):
+ encoding = parameters.get("encoding")
+ if encoding is None:
+ return "char"
+ elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
+ return "utf8"
+ else:
+ return "encoded[{0}]".format(repr(encoding))
+
+ def __bytes__(self):
+ return numpy.asarray(self.layout).tostring()
+
+ def __str__(self):
+ encoding = self.type.nolength().parameters.get("encoding")
+ if encoding is None:
+ return str(self.__bytes__())
+ else:
+ return self.__bytes__().decode(encoding)
+
+ def __repr__(self):
+ encoding = self.type.nolength().parameters.get("encoding")
+ if encoding is None:
+ return repr(self.__bytes__())
+ else:
+ return repr(self.__bytes__().decode(encoding))
+
+ def __iter__(self):
+ for x in str(self):
+ yield x
+
+class StringBehavior(awkward1.highlevel.Array):
+ @staticmethod
+ def typestr(baretype, parameters):
+ encoding = baretype.inner().parameters.get("encoding")
+ if encoding is None:
+ return "bytes"
+ elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
+ return "string"
+ else:
+ return "string[{0}]".format(repr(encoding))
+
+ def __iter__(self):
+ if self.type.nolength().inner().parameters.get("encoding") is None:
+ for x in super(StringBehavior, self).__iter__():
+ yield x.__bytes__()
+ else:
+ for x in super(StringBehavior, self).__iter__():
+ yield x.__str__()
+
+ def __eq__(self, other):
+ raise NotImplementedError("return one boolean per string, not lists of booleans per character")
+
+char = awkward1.layout.DressedType(awkward1.layout.PrimitiveType("uint8"), CharBehavior)
+utf8 = awkward1.layout.DressedType(awkward1.layout.PrimitiveType("uint8"), CharBehavior, encoding="utf-8")
+
+bytestring = awkward1.layout.DressedType(awkward1.layout.ListType(char), StringBehavior)
+string = awkward1.layout.DressedType(awkward1.layout.ListType(utf8), StringBehavior)
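
The type-dressing pattern in `awkward1/behavior/string.py` can be illustrated without the compiled `awkward1.layout` module. The following is a minimal sketch, not the library's implementation: `PrimitiveType`, `ListType`, and `DressedType` here are hypothetical pure-Python stand-ins, and the assumption that a dressed type prints itself by calling the behavior class's `typestr(baretype, parameters)` is inferred from the diff above and the commit message ("Type representations appear as 'string' and 'bytes'"). The `typestr` bodies are copied from the diff; the `latin1_string` name is added only for illustration.

```python
import codecs


class PrimitiveType(object):
    """Hypothetical stand-in for awkward1.layout.PrimitiveType."""
    def __init__(self, dtype):
        self.dtype = dtype


class ListType(object):
    """Hypothetical stand-in for awkward1.layout.ListType."""
    def __init__(self, inner):
        self._inner = inner

    def inner(self):
        return self._inner


class DressedType(object):
    """Hypothetical stand-in: a bare type plus a behavior class and parameters."""
    def __init__(self, baretype, dress, **parameters):
        self.baretype = baretype
        self.dress = dress
        self.parameters = parameters

    def __str__(self):
        # Assumed delegation: the behavior class decides how the type prints.
        return self.dress.typestr(self.baretype, self.parameters)


class CharBehavior(object):
    @staticmethod
    def typestr(baretype, parameters):
        encoding = parameters.get("encoding")
        if encoding is None:
            return "char"
        elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
            return "utf8"
        else:
            return "encoded[{0}]".format(repr(encoding))


class StringBehavior(object):
    @staticmethod
    def typestr(baretype, parameters):
        # The encoding lives on the inner (character) type, not on the list.
        encoding = baretype.inner().parameters.get("encoding")
        if encoding is None:
            return "bytes"
        elif codecs.getdecoder(encoding) is codecs.getdecoder("utf-8"):
            return "string"
        else:
            return "string[{0}]".format(repr(encoding))


char = DressedType(PrimitiveType("uint8"), CharBehavior)
utf8 = DressedType(PrimitiveType("uint8"), CharBehavior, encoding="utf-8")
bytestring = DressedType(ListType(char), StringBehavior)
string = DressedType(ListType(utf8), StringBehavior)

# Illustration only: a non-UTF-8 encoding falls through to the last branch.
latin1 = DressedType(PrimitiveType("uint8"), CharBehavior, encoding="latin-1")
latin1_string = DressedType(ListType(latin1), StringBehavior)

print(str(char), str(utf8), str(bytestring), str(string), str(latin1_string))
```

Keeping the encoding on the character type means the list-level `StringBehavior` only needs to look one level down via `inner()`, which is the same lookup the diff performs in both `typestr` and `__iter__`.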