Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compressed Table Representation #19

Open
kasimebrahim opened this issue Oct 25, 2018 · 3 comments
Open

Compressed Table Representation #19

kasimebrahim opened this issue Oct 25, 2018 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@kasimebrahim
Copy link
Collaborator

kasimebrahim commented Oct 25, 2018

OverView
Data representation has been discussed a number of times opencog#12, opencog#14. This will merely conceder the idea of compressed table, In order to avoid re-execution of a program over redundant inputs.

Loose Hypothesis
To store a matrix in atomspace 'opencog/matrix' uses a mechanism that looks like Dictionary Of Keys(DOK) representation for sparse matrix. And provides sets of elementary operations to access the stored Atoms as a matrix giving an abstraction of the portion of the atomspace as a matrix.
For example a table might be stored in DOK as:

 
    _____________
    |w1 |w2 |w3 |
----+---+---+----
|w1 |0  |1  |0  |    => (w1, w2)=1
----+---+---+----
|w2 |1  |0  |1  |    => (w2, w1)=1 & (w2, w3)=1
----+---+---+----
|w3 |0  |1  |0  |    => (w3, w2)=1
_________________

And For a typical dataset

_____________
|f1 |f2 |out|
----+---+----
|1.5|F  |T  |
----+---+----
|1.5|F  |F  |
----+---+----
|1  |T  |T  |
_____________

We can use the atomese bellow to represent the redundant two rows(row 1 & 2).

// (row1, f1) and (row2, f1) are squashed
EvaluationLink (count=2, value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

// (row1, f2) and (row2, f2) are squashed
EvaluationLink (count=2, value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"

// (row1, out) and (row2, out) are squashed
EvaluationLink (count=1, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"

The next thing we need is a matrix-API that can read the above atomese as a matrix.
An example API [based on opencog/matrix] would contain:-

left_type -> row type
right_type -> column type
pair_type -> EvaluationLink in the above example

Matrix_API (Type left_type, Type right_type, Type pair_type, string identifier);

Matrix_API::get_value (Hanlde pair); // we can overload for (left, right)
Matrix_API::get_count (Handle pair);
Matrix_API::get_left (Handle pair);
Matrix_API::get_right (Handle pair);
Matrix_API::get_pair (Handle left, Handle right);
Matrix_API::get_values_of_row (Handle left);
Matrix_API::get_values_of_column (Handle right)
Matrix_API::make_pair (Handle left, Handle right);
Matrix_API::fetch_all_pairs (Handle identifier);

// and we might want to add more

And guys forgive me if I got it all wrong!

@kasimebrahim kasimebrahim added enhancement New feature or request help wanted Extra attention is needed labels Oct 25, 2018
@ngeiswei
Copy link
Member

Interesting. This representation has the side effect that input features can also represent distributions.

Question: how would the squashed out feature represent the following

_____________
|f1 |f2 |out|
----+---+----
|1.5|F  |T  |
----+---+----
|1.5|F  |F  |
----+---+----
|1.5|F  |T  |
_____________

?

Also, what feature in opencog/matrix are missing in order to allow such representation (sorry I haven't studied opencog/matrix in depth, hopefully I can do that soon)?

@kasimebrahim
Copy link
Collaborator Author

I am not sure I get every thing either. And the squashing thing is not discussed in opencog/matrix. It just encourages to add more values to the pair. And hopefully we can exploit that to compress redundancies.
For the table above, one way to squash it would be to have a value and a corresponding count.

// the out puts
EvaluationLink (count={2, 1}, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"
 // the inputs
EvaluationLink (count=3, value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"
EvaluationLink (count=3, value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

It is a bit naive and if it is going to work it is going to requires a lot of cleaning.

@ngeiswei
Copy link
Member

I see...

I wonder about the following

// the out puts
EvaluationLink (count={2, 1}, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"
 // the inputs
EvaluationLink (value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"
EvaluationLink (value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

which would even be closer to CTable representation. Not sure how well it fits with the matrix framework though (still need to look into it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants