Compressed Table Representation #19

kasimebrahim · 2018-10-25T08:50:11Z

OverView
Data representation has been discussed a number of times opencog#12, opencog#14. This will merely conceder the idea of compressed table, In order to avoid re-execution of a program over redundant inputs.

Loose Hypothesis
To store a matrix in atomspace 'opencog/matrix' uses a mechanism that looks like Dictionary Of Keys(DOK) representation for sparse matrix. And provides sets of elementary operations to access the stored Atoms as a matrix giving an abstraction of the portion of the atomspace as a matrix.
For example a table might be stored in DOK as:

 
    _____________
    |w1 |w2 |w3 |
----+---+---+----
|w1 |0  |1  |0  |    => (w1, w2)=1
----+---+---+----
|w2 |1  |0  |1  |    => (w2, w1)=1 & (w2, w3)=1
----+---+---+----
|w3 |0  |1  |0  |    => (w3, w2)=1
_________________

And For a typical dataset

_____________
|f1 |f2 |out|
----+---+----
|1.5|F  |T  |
----+---+----
|1.5|F  |F  |
----+---+----
|1  |T  |T  |
_____________

We can use the atomese bellow to represent the redundant two rows(row 1 & 2).

// (row1, f1) and (row2, f1) are squashed
EvaluationLink (count=2, value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

// (row1, f2) and (row2, f2) are squashed
EvaluationLink (count=2, value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"

// (row1, out) and (row2, out) are squashed
EvaluationLink (count=1, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"

The next thing we need is a matrix-API that can read the above atomese as a matrix.
An example API [based on opencog/matrix] would contain:-

left_type -> row type
right_type -> column type
pair_type -> EvaluationLink in the above example

Matrix_API (Type left_type, Type right_type, Type pair_type, string identifier);

Matrix_API::get_value (Hanlde pair); // we can overload for (left, right)
Matrix_API::get_count (Handle pair);
Matrix_API::get_left (Handle pair);
Matrix_API::get_right (Handle pair);
Matrix_API::get_pair (Handle left, Handle right);
Matrix_API::get_values_of_row (Handle left);
Matrix_API::get_values_of_column (Handle right)
Matrix_API::make_pair (Handle left, Handle right);
Matrix_API::fetch_all_pairs (Handle identifier);

// and we might want to add more

And guys forgive me if I got it all wrong!

ngeiswei · 2018-10-25T13:12:20Z

Interesting. This representation has the side effect that input features can also represent distributions.

Question: how would the squashed out feature represent the following

_____________
|f1 |f2 |out|
----+---+----
|1.5|F  |T  |
----+---+----
|1.5|F  |F  |
----+---+----
|1.5|F  |T  |
_____________

?

Also, what feature in opencog/matrix are missing in order to allow such representation (sorry I haven't studied opencog/matrix in depth, hopefully I can do that soon)?

kasimebrahim · 2018-10-25T13:59:18Z

I am not sure I get every thing either. And the squashing thing is not discussed in opencog/matrix. It just encourages to add more values to the pair. And hopefully we can exploit that to compress redundancies.
For the table above, one way to squash it would be to have a value and a corresponding count.

// the out puts
EvaluationLink (count={2, 1}, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"
 // the inputs
EvaluationLink (count=3, value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"
EvaluationLink (count=3, value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

It is a bit naive and if it is going to work it is going to requires a lot of cleaning.

ngeiswei · 2018-10-26T05:35:45Z

I see...

I wonder about the following

// the out puts
EvaluationLink (count={2, 1}, value={T,F})
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "out"
 // the inputs
EvaluationLink (value=F)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Predicate "f2"
EvaluationLink (value=1.5)
    Predicate "|identifier|"
    ListLink
        Schema "r1" 
        Schema "f1"

which would even be closer to CTable representation. Not sure how well it fits with the matrix framework though (still need to look into it).

kasimebrahim added enhancement New feature or request help wanted Extra attention is needed labels Oct 25, 2018

Eman22S mentioned this issue Dec 10, 2019

Correcting AS-MOSES outputs:Approach Suggestion #115

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compressed Table Representation #19

Compressed Table Representation #19

kasimebrahim commented Oct 25, 2018 •

edited by ngeiswei

Loading

ngeiswei commented Oct 25, 2018

kasimebrahim commented Oct 25, 2018

ngeiswei commented Oct 26, 2018

Compressed Table Representation #19

Compressed Table Representation #19

Comments

kasimebrahim commented Oct 25, 2018 • edited by ngeiswei Loading

ngeiswei commented Oct 25, 2018

kasimebrahim commented Oct 25, 2018

ngeiswei commented Oct 26, 2018

kasimebrahim commented Oct 25, 2018 •

edited by ngeiswei

Loading