A qc.matrix design proposal #114
My thoughts (mostly the same as when we first talked about this @tbenthompson):
Overall:
Thanks Marc!! First, an alternate proposal: don't change anything, but make it clear which methods are "supported" and which are simply inherited by accident from their parent classes (`np.ndarray` and `scipy.sparse.csr_matrix`). I would be pretty happy with this option! In some ways, I prefer it. This alternate proposal is also the least amount of work. On to the original proposal.
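The alternate proposal could be made concrete by tagging the supported methods and listing everything else as unsupported or inherited. This is a minimal sketch under stated assumptions: `supported`, `audit_api` and this `DenseMatrix` stand-in are hypothetical names for illustration, not existing qc.matrix code.

```python
import numpy as np


def supported(method):
    """Tag a method as part of the officially supported API (hypothetical helper)."""
    method.__supported__ = True
    return method


class DenseMatrix(np.ndarray):
    """Hypothetical stand-in for a qc.matrix class that subclasses np.ndarray."""

    @supported
    def sandwich(self, d):
        # X^T diag(d) X without materializing diag(d)
        x = np.asarray(self)
        return x.T @ (d[:, None] * x)


def audit_api(cls):
    """Split the public attributes of cls into supported, unsupported-own and inherited."""
    public = {name for name in dir(cls) if not name.startswith("_")}
    own = {name for name in vars(cls) if not name.startswith("_")}
    tagged = {name for name in own if getattr(vars(cls)[name], "__supported__", False)}
    return {
        "supported": sorted(tagged),
        "unsupported_own": sorted(own - tagged),
        "inherited": sorted(public - own),
    }


report = audit_api(DenseMatrix)
print(report["supported"])               # ['sandwich']
print("reshape" in report["inherited"])  # True: comes from np.ndarray
```

The "inherited" list is exactly the set of methods users get "for free" from the parent class, which is what the documentation would need to mark as unsupported.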
My basic logic is that this will:
- remove API inconsistencies, and
- give us a smaller API.
I'll expand on both these points.
Removing API inconsistencies
Smaller API
You also asked:
Yes, that or a `StandardizedMatrix`. The overhead is very low: I just measured it and it seems to be on the order of 100 microseconds. We could probably make that smaller, but it's very small already.
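One plausible reason a standardized wrapper can be this cheap is that it never needs to copy or densify X. This is a hypothetical, unweighted sketch assuming the wrapper stores X plus per-column means and scales and applies them lazily inside each operation; it is not the library's implementation.

```python
import numpy as np


class StandardizedMatrix:
    """Hypothetical sketch: represents Z = (X - 1 mu^T) diag(scale) lazily.

    Only the column means and scales are computed up front; X is stored as-is
    and Z is never materialized. Unweighted, for illustration only.
    """

    def __init__(self, X):
        self.X = X
        self.mu = X.mean(axis=0)
        std = X.std(axis=0)
        # Assumption: constant columns (std == 0) are left unscaled
        self.scale = 1.0 / np.where(std > 0, std, 1.0)

    def matvec(self, v):
        # Z v = X (s * v) - (mu . (s * v)) * 1
        sv = self.scale * v
        return self.X @ sv - self.mu @ sv

    def transpose_matvec(self, u):
        # Z^T u = s * (X^T u - mu * sum(u))
        return self.scale * (self.X.T @ u - self.mu * u.sum())

    def sandwich(self, d):
        # Z^T D Z = S [X^T D X - (X^T d) mu^T - mu (X^T d)^T + sum(d) mu mu^T] S
        xtd = self.X.T @ d
        inner = (
            self.X.T @ (d[:, None] * self.X)
            - np.outer(xtd, self.mu)
            - np.outer(self.mu, xtd)
            + d.sum() * np.outer(self.mu, self.mu)
        )
        return self.scale[:, None] * inner * self.scale[None, :]


# Check against an explicitly standardized copy
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
sm = StandardizedMatrix(X)
v, u, d = rng.normal(size=4), rng.normal(size=20), rng.uniform(size=20)
assert np.allclose(sm.matvec(v), Z @ v)
assert np.allclose(sm.transpose_matvec(u), Z.T @ u)
assert np.allclose(sm.sandwich(d), Z.T @ (d[:, None] * Z))
```

Keeping the centering out of X matters most when X is sparse, since subtracting the means would otherwise fill in every zero.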
I would rather ditch inherited methods than preserve an inconsistent API. Nevertheless, I think it's reasonable at this point to just clarify which methods are officially supported (i.e., changes will first prompt a deprecation warning and imply a major update) and which aren't (i.e., changes may occur without warning). I wouldn't mind if the …
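For the officially supported methods, "changes will first prompt a deprecation warning" can be implemented with the standard `warnings` module. A minimal sketch; the `deprecated` decorator, the `dot` alias and this `SplitMatrix` stand-in are hypothetical names chosen for illustration.

```python
import functools
import warnings


def deprecated(alternative):
    """Hypothetical decorator: warn callers before a supported method changes."""

    def decorator(func):
        message = f"{func.__qualname__} is deprecated; use {alternative} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class SplitMatrix:
    """Hypothetical stand-in; matvec pretends the matrix is 2 * identity."""

    def matvec(self, v):
        return [2 * x for x in v]

    @deprecated(alternative="SplitMatrix.matvec")
    def dot(self, v):
        # Old spelling kept for one release cycle, forwarding to the new one
        return self.matvec(v)


# DeprecationWarning is hidden by default outside __main__, so record it explicitly
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = SplitMatrix().dot([1, 2])

print(result)                       # [2, 4]
print(caught[0].category.__name__)  # DeprecationWarning
```

Unsupported methods would simply not get this treatment, which is what makes the two tiers distinguishable in practice.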
A qc.matrix design proposal:
I've been uncomfortable with the qc.matrix direction lately. This morning, I've been thinking about the API and some of the recent feature additions and API-consistency changes. I think the API surface area for qc.matrix is way too large. I don't want to create a huge library at the moment. If we do want to do that, I think we're several months away from being able to release the library.
I'd like to propose that we adopt a design goal for qc.matrix: the project is built as a backend for matrix operations in an algorithmic setting and should not be used for constructing or manipulating your data matrix -- similar in spirit to a LightGBM Dataset object. In light of that, I'd like to propose that the public API consist of two subsets:
- `from_pandas` as the main entrypoint. This is probably the most common way for users to touch qc.matrix. Most users will never use anything else.
- `from_sparse` and `from_dense` entrypoints based on the `csc_to_split` function.
- `matvec`, `transpose_matvec`, `sandwich` and `standardize` methods of `SplitMatrix` and `StandardizedMatrix`. These are the features used by quantcore.glm.
`DenseMatrix`, `SparseMatrix` and `CategoricalMatrix` should not be used directly.
Overall, the goal here is to minimize the API surface area to a total of three free functions, two classes and four methods on those classes. Behavior would be consistent and clear. I want to provide a small number of operations that are each very powerful. Deep systems with a small surface area are useful, easy to understand and easy to maintain.
If the above proposal has everyone's support, then a lot of our current issues are actually no longer needed:
- `__getitem__` (#101): I'd vote that we explicitly not support `__getitem__`.
- "… `__matmul__` for all the classes? Should we define the more basic `__mul__` and `__rmul__`?" --> I would say "no and no". If we have only `SplitMatrix` as the outward-facing API, then these methods will not exist and will naturally throw errors. That's great.
I think this will be a fairly quick modification/refactoring and will make the release process much less stressful.
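The "naturally throw errors" behavior follows from Python's operator protocol: if a class defines neither `__getitem__` nor the arithmetic dunders, indexing, `@` and `*` all raise `TypeError`. A small sketch with a hypothetical `SplitMatrix` stand-in and a hypothetical `raises_type_error` helper:

```python
class SplitMatrix:
    """Hypothetical facade that defines only the supported methods."""

    def __init__(self, rows):
        self._rows = rows  # list of lists, a stand-in for the real storage

    def matvec(self, v):
        return [sum(a * b for a, b in zip(row, v)) for row in self._rows]


def raises_type_error(fn):
    """Return True if calling fn raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


m = SplitMatrix([[1, 2], [3, 4]])
print(m.matvec([1, 1]))                  # [3, 7]
print(raises_type_error(lambda: m[0]))   # True: no __getitem__
print(raises_type_error(lambda: m @ m))  # True: no __matmul__
print(raises_type_error(lambda: 2 * m))  # True: no __mul__/__rmul__
```

One caveat: when the other operand is a NumPy array, NumPy's own operator methods run first and may raise a different error. Setting `__array_ufunc__ = None` on the class makes NumPy's binary operators return `NotImplemented`, so the result is again a `TypeError`.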