Test Robustness of tICA with atomPairs #3

Open
kyleabeauchamp opened this issue Feb 23, 2014 · 13 comments

Comments

@kyleabeauchamp
Collaborator

I ran a simple experiment: I grabbed random selections of atom pairs, then solved tICA for each selection and compared the leading eigenvalues. They're pretty robust even with a small subset of atom pairs. This suggests that we could probably do something very affordable here and get very reproducible results.

Here is the driver code:

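import itertools
import numpy as np
import mdtraj as md
import msmbuilder as msmb

# Reference structure and the five trajectories to analyze.
trj0 = md.load("./system.subset.pdb")
trajectories = [md.load("./Trajectories/trj%d.h5" % i) for i in range(5)]

# Enumerate every unique atom pair in the system.
atom_indices = np.arange(trj0.n_atoms)
pair_indices = np.array(list(itertools.combinations(atom_indices, 2)))

n_choose = 1000  # number of atom pairs per random subset
n_trials = 10    # number of independent random subsets
for i in range(n_trials):
    # Draw a random subset of atom pairs without replacement.
    pair_indices_subset = pair_indices[np.random.choice(len(pair_indices), n_choose, replace=False)]
    metric = msmb.metrics.AtomPairs(atom_pairs=pair_indices_subset)
    # tICA with a lag of 1 on the selected atom-pair distances.
    tica = msmb.reduce.tICA(1, prep_metric=metric)
    for trj in trajectories:
        tica.train(trajectory=trj)
    tica.solve()
    # Print the five leading eigenvalues for this subset.
    print(tica.vals[0:5])
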
@kyleabeauchamp
Collaborator Author

Here is the output with n_choose = 1000:

[ 0.99846791  0.99703695  0.9949128   0.99367476  0.99108845]
[ 0.99873218  0.99752497  0.99571222  0.99401903  0.99208341]
[ 0.99866843  0.99694094  0.99412184  0.99386481  0.99124987]
[ 0.99869627  0.99747165  0.99560134  0.99424324  0.99198322]
[ 0.99863268  0.99761802  0.99451782  0.99385544  0.99209675]
[ 0.99858944  0.99738642  0.99437492  0.99334386  0.99190517]
[ 0.99879572  0.9973424   0.99469924  0.99371557  0.99128765]
[ 0.99842968  0.99708175  0.99483048  0.99432551  0.99139395]
[ 0.9985902   0.99716386  0.99439764  0.99396922  0.99203883]
[ 0.99873333  0.99725475  0.99410783  0.99406397  0.99165713]

@kyleabeauchamp
Collaborator Author

Obviously comparing the eigenvectors would be a better test, but I think comparing the eigenvalues is probably a good heuristic for now.

@kyleabeauchamp
Collaborator Author

In terms of timescales rather than eigenvalues, the uncertainty across trials looks closer to 25% or so, which is still quite reasonable.
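
For reference, that kind of estimate follows from the standard relation t = -lag / ln(eigenvalue) between a tICA eigenvalue and its implied timescale. Below is a minimal sketch, assuming a lag of 1 frame as in the driver code; the helper name `implied_timescale` is only for illustration. It takes the smallest and largest leading eigenvalues from the n_choose = 1000 output:

```python
import math

def implied_timescale(eigenvalue, lag=1.0):
    """Implied timescale t = -lag / ln(eigenvalue), in the same units as lag."""
    if not 0.0 < eigenvalue < 1.0:
        raise ValueError("eigenvalue must lie strictly between 0 and 1")
    return -lag / math.log(eigenvalue)

# Smallest and largest leading eigenvalues across the ten n_choose = 1000 trials.
lam_min, lam_max = 0.99842968, 0.99879572

# Assumption: lag = 1 frame, so the timescales are in frames.
t_min = implied_timescale(lam_min)
t_max = implied_timescale(lam_max)

# Relative spread of the slowest timescale across trials.
spread = (t_max - t_min) / t_min
print("slowest timescale: %.0f to %.0f frames (spread %.0f%%)"
      % (t_min, t_max, 100 * spread))
```

The eigenvalues only differ in the fourth decimal place, but because they sit so close to 1 the logarithm amplifies that into a spread of tens of percent in the timescale.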

@kyleabeauchamp
Collaborator Author

Here's the output for n_choose = 2000. There's one slower timescale, but again the results are pretty converged with respect to sampling the different sets of atom pairs.

[ 0.99944528  0.99863409  0.99792424  0.99704434  0.99560072]
[ 0.99947721  0.99869917  0.9973045   0.99696457  0.99604443]
[ 0.99947978  0.99865651  0.99761327  0.99720337  0.99635589]
[ 0.99942784  0.99867257  0.99760534  0.99710405  0.99591845]

@jchodera
Member

What are the vals that are printed here? Are these tICA singular values (eigenvalues)?

Can you compute a "score" from the sum of these? How many degrees of freedom would you practically use?
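
One possible reading of the "score" idea, as a sketch only: sum the k leading eigenvalues from each trial and look at the spread across trials. The function name and the choice k = 5 are assumptions for illustration; the numbers are the first three rows of the n_choose = 1000 output.

```python
import statistics

def eigenvalue_score(vals, k=5):
    """Sum of the k leading eigenvalues (assumed sorted in descending order)."""
    return sum(vals[:k])

# First three trials from the n_choose = 1000 output.
trials = [
    [0.99846791, 0.99703695, 0.9949128, 0.99367476, 0.99108845],
    [0.99873218, 0.99752497, 0.99571222, 0.99401903, 0.99208341],
    [0.99866843, 0.99694094, 0.99412184, 0.99386481, 0.99124987],
]

scores = [eigenvalue_score(v) for v in trials]
print("mean score %.4f, std %.4f"
      % (statistics.mean(scores), statistics.stdev(scores)))
```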

@kyleabeauchamp
Collaborator Author

Eigenvalues

@kyleabeauchamp
Collaborator Author

In fact, we could probably do even better by being smart about atom selection--only using heavy atoms, for example.
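
A minimal sketch of the heavy-atom idea, assuming the element symbol of each atom is available as a list (with mdtraj this could come from `[a.element.symbol for a in trj0.topology.atoms]`). The helper name `heavy_atom_pairs` is hypothetical:

```python
import itertools

def heavy_atom_pairs(element_symbols):
    """Return all index pairs (i, j), i < j, between non-hydrogen atoms."""
    heavy = [i for i, symbol in enumerate(element_symbols) if symbol != "H"]
    return list(itertools.combinations(heavy, 2))

# Toy example: a five-atom fragment with two hydrogens.
symbols = ["N", "H", "C", "H", "O"]
pairs = heavy_atom_pairs(symbols)
print(pairs)  # [(0, 2), (0, 4), (2, 4)]
```

Wrapped in `np.array`, the result could stand in for `pair_indices` in the driver code before the random subsets are drawn.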

@jchodera
Member

This looks very promising!

@jchodera
Member

Another idea: From the 1000 distances you randomly select, the ones with the highest tICA projection magnitudes could be retained and the lowest ones discarded to try other random distances. A few iterations of this might "enrich" the importance of the subset of distances.
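
A rough sketch of what such an enrichment loop could look like. Everything here is an assumption for illustration: the function name, the `keep_fraction` parameter, and the `score_fn` callable, which in practice might return the magnitude of each pair's component in the leading tICA eigenvector. The toy score below just pretends that pairs involving atom 0 matter.

```python
import itertools
import random

def enrich_pairs(all_pairs, score_fn, n_choose, n_iter=3, keep_fraction=0.5, seed=0):
    """Iteratively keep the highest-scoring pairs and redraw the rest at random.

    score_fn(subset) must return one score per pair in subset, higher = more
    important (hypothetically, the tICA projection magnitude of that distance).
    """
    rng = random.Random(seed)
    subset = rng.sample(all_pairs, n_choose)
    n_keep = int(keep_fraction * n_choose)
    for _ in range(n_iter):
        scores = score_fn(subset)
        # Rank pairs by score, highest first, and retain the top n_keep.
        ranked = sorted(zip(scores, subset), key=lambda item: item[0], reverse=True)
        kept = [pair for _, pair in ranked[:n_keep]]
        # Refill from pairs outside the current subset, so discarded pairs
        # are not redrawn immediately.
        current = set(subset)
        pool = [pair for pair in all_pairs if pair not in current]
        subset = kept + rng.sample(pool, n_choose - n_keep)
    return subset

# Toy setup: 10 atoms, 45 pairs; pairs involving atom 0 get a high score.
all_pairs = list(itertools.combinations(range(10), 2))

def toy_score(subset):
    return [1.0 if 0 in pair else 0.0 for pair in subset]

result = enrich_pairs(all_pairs, toy_score, n_choose=12, n_iter=5)
print(sum(1 for pair in result if 0 in pair), "of", len(result), "pairs involve atom 0")
```

Note that the pairs drawn in the final refill are returned unscored; a last scoring pass would be needed to rank the whole returned subset.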

@kyleabeauchamp
Collaborator Author

We could also try products of our features--nonlinear kernel tica...
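
As a rough illustration of the "products of features" idea (this is not kernel tICA proper, just an explicit second-order feature expansion, and the function name is hypothetical), each frame's feature vector could be augmented with all pairwise products before it is handed to tICA:

```python
import itertools

def with_pairwise_products(features):
    """Append all products x_i * x_j (i <= j) to a single frame's feature vector."""
    products = [a * b for a, b in itertools.combinations_with_replacement(features, 2)]
    return list(features) + products

frame = [1.0, 2.0, 3.0]
print(with_pairwise_products(frame))
# [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

With n features this adds n(n+1)/2 products, so 1000 distances would grow by 500,500 extra columns, which is the practical argument for a kernel formulation over an explicit expansion.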

@jchodera
Member

> We could also try products of our features--nonlinear kernel tica...

Is there a paper about this yet?

@kyleabeauchamp
Collaborator Author

No, haven't heard from Christian in a while.

@kyleabeauchamp
Collaborator Author

So I guess I should say "not sure"...
