This repository has been archived by the owner on Aug 15, 2022. It is now read-only.

direction of future architecture #44

Open
zrisher opened this issue Nov 17, 2014 · 2 comments

Comments

@zrisher

zrisher commented Nov 17, 2014

Hi all,

I came upon Asari while looking for the best Rails search gem for my architecture. Given the trend towards cloud computing, AWS's dominance, the amount of work they're putting into their platform, and Asari's position as the only gem compatible with the newest API, I think this gem is well-positioned for a serious amount of interest over the next few years.

Compared to the most popular Rails search gems out there, Asari is missing a lot of functionality. The big name here has always been Sunspot, an adapter for Solr. Some things Sunspot can do that Asari can't:

  • All query options - Faceting, Native Geo, Stats, and More Like This. Some of these things have been added in Asari's forks, but not to the same level of usability as Sunspot
  • A block-oriented search function. This allows for better error checking, the ability to nest arbitrary ruby logic within the search block, and greater readability especially with nested logic
  • Various thread-safe, retrying, instrumented, logged, and asynchronous (with sunspot_index_queue) sessions
  • Config generators and indexing rake tasks
  • ActiveRecord objects attached to results retrieved through a single DB request
  • A local server version for development and testing - obviously Sunspot accomplishes this with a managed local Solr install, but I think it's worth considering integrating sunspot_solr, as this feature is HUGE for saving development time, simplifying tests, and improving effective coverage. The AWS SDKs provide good stubbing ability though, if we choose to go down that route instead.
  • Schema configuration provided by a class method on Searchable classes - a schema file surely works for many people too, but some may find in-object configuration a more logical arrangement. It also provides more flexibility for index-related config functions (especially if adapted to a simpler interface than one that requires knowing what's going on in CloudSearch), and it can be adapted at runtime via metaprogramming
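To make the block-oriented search item concrete, here is a minimal sketch of what such an interface could look like on top of CloudSearch's structured query syntax. `QueryBuilder` and its `with`, `without`, and `any_of` methods are hypothetical names chosen for illustration; they are not part of Asari or Sunspot, and the sketch assumes numeric values can be written unquoted.

```ruby
# Hypothetical builder: collects conditions declared in a block and renders
# them in CloudSearch's structured query syntax.
class QueryBuilder
  def initialize(operator = :and)
    @operator = operator
    @clauses = []
  end

  # Match a field against a single value.
  def with(field, value)
    @clauses << "#{field}:#{quote(value)}"
    self
  end

  # Exclude documents where the field matches the value.
  def without(field, value)
    @clauses << "(not #{field}:#{quote(value)})"
    self
  end

  # Nest a group whose clauses are OR-ed together.
  def any_of(&block)
    @clauses << QueryBuilder.build(:or, &block)
    self
  end

  # Render the group; an empty group is rejected early instead of
  # producing a query the service would refuse.
  def to_s
    raise ArgumentError, "empty #{@operator} group" if @clauses.empty?

    "(#{@operator} #{@clauses.join(' ')})"
  end

  # Evaluate the block against a fresh builder and return the query string.
  def self.build(operator = :and, &block)
    builder = new(operator)
    builder.instance_eval(&block)
    builder.to_s
  end

  private

  # Numbers are emitted bare (an assumption of this sketch); strings are
  # single-quoted with quotes and backslashes escaped.
  def quote(value)
    return value.to_s if value.is_a?(Numeric)

    escaped = value.to_s.gsub(/['\\]/) { |ch| "\\#{ch}" }
    "'#{escaped}'"
  end
end

# Local variables and ordinary Ruby control flow are usable inside the block.
genres = %w[sci-fi fantasy]
query = QueryBuilder.build do
  with :title, "star"
  any_of { genres.each { |g| with :genre, g } }
  without :year, 1999
end
puts query
# prints: (and title:'star' (or genre:'sci-fi' genre:'fantasy') (not year:1999))
```

Because the block runs through `instance_eval`, the caller's local variables are visible but its instance variables are not; a fuller implementation would need to decide how to handle that.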

Ransack, which works with native DB search functions, has seen a lot of nice additions recently too - it has some nice form helpers (with translations) we may want to include.

So, establishing that there's a lot of functionality left to build and that hopefully a big user base will grow around this use case, how should we build this?

In my mind, the easiest solution would be to get Sunspot working directly with CloudSearch. That way we could get all this functionality without any of the coding or maintenance.

The question has been asked in a few places - will AWS ever expose the native Solr API? If so, Sunspot will work out of the box. Something tells me this will never happen - CloudSearch provides a large amount of abstraction on top of Solr, e.g. the support for array fields and the simplified option set. I have a support request in to their team to see if they'll speak to current plans. I'll bring any replies to this thread.

Another option would be creating a new gem, say sunspot_cloudsearch, that monkey-patches Sunspot to work with CloudSearch. Most of the actual client functionality comes from RSolr which could be re-implemented using existing Asari code. (That said, I think any future implementation should use the AWS SDK V2 client instead - it supports every single operation we could want and takes care of all the HTTP stuff.) Once that's done, some patches to the Sunspot core could get much or all of the above working.
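As a rough illustration of the SDK-based approach, the sketch below wraps a client object and translates page-based arguments into the keyword arguments that the AWS SDK's `Aws::CloudSearchDomain::Client#search` accepts (`query`, `query_parser`, `start`, `size`). `CloudSearchAdapter`, `search_ids`, and `EchoClient` are hypothetical names; the stand-in client exists only so the example runs without the SDK or network access.

```ruby
# Hypothetical adapter: maps page-based arguments onto the keyword arguments
# of a CloudSearch domain client's #search call.
class CloudSearchAdapter
  # client is assumed to respond to #search(**params), as the SDK client does.
  def initialize(client)
    @client = client
  end

  # Returns the matching document ids for one page of results.
  def search_ids(structured_query, page: 1, per_page: 10)
    raise ArgumentError, "page must be >= 1" if page < 1

    response = @client.search(
      query: structured_query,
      query_parser: "structured",
      start: (page - 1) * per_page,
      size: per_page
    )
    response.hits.hit.map(&:id)
  end
end

# Stand-in client (not the SDK) that records the parameters it receives and
# returns a response shaped like the SDK's: response.hits.hit[].id
Hit = Struct.new(:id)
Hits = Struct.new(:hit)
Response = Struct.new(:hits)

class EchoClient
  attr_reader :last_params

  def search(**params)
    @last_params = params
    Response.new(Hits.new([Hit.new("doc-1"), Hit.new("doc-2")]))
  end
end

client = EchoClient.new
ids = CloudSearchAdapter.new(client).search_ids("(and title:'star')", page: 3, per_page: 20)
```

With the real SDK, the adapter would presumably be handed an `Aws::CloudSearchDomain::Client` built with the domain's search endpoint in place of `EchoClient`.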

A final option would be copying the current code for most of this from Sunspot and building it into this repo. So far, this project has kept it super simple, which is great. Including all of the above makes things significantly more complex. Depending on how it's accomplished (and if the maintainers are interested in taking this next step), perhaps this should all be done in a separate gem.

After a month of taking a stab at extending Asari myself, I need to get back to tactical work. But I'd like to wrap up what I've done so far in a way that demonstrates where to go next. So I wanted to ask the question:

How do the maintainers of Asari envision it growing in the future? What's the architecture?

Sorry for the long explanation, I'm hoping to get schooled by someone who's spent a lot of time researching this problem.

@lgleasain
Collaborator

This is a tough one. If Amazon ever makes CloudSearch expose a Solr API that RSolr can talk to, then the answer will be to go fold it into Sunspot. As far as extending Sunspot to work with CloudSearch goes... When Tommy and I started working on this, Amazon wasn't using the Lucene backend. Now it is, via a very proprietary API. Could it be made to work with Sunspot? In theory, yes. For us the question is: is it worth it? There are not a lot of Ruby projects that use CloudSearch. Because it is proprietary, the consensus I've heard from talking with a lot of people is that it gives you a limited audience. That's not to say it isn't very useful for some.

I would welcome help to add more functionality to Asari and to fill in some of these gaps. That being said, if you wanted to go down the road of making Sunspot work with CloudSearch, I would probably go with a new gem. One challenge you would have no matter what is the local dev/test environment. I looked into this over a year ago and came to the conclusion that I didn't want to deal with maintaining a server that may or may not run the same as my production system. Given that you have to go through the Amazon API, you would probably have to mock this with a Sunspot solution to make sure that you are developing with the same feature set you are going to be deploying with.
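One way to read the mocking suggestion is a strict test double: it answers only queries registered up front and raises on anything else, so a test cannot silently rely on behaviour that was never checked against the real service. The sketch below is only an illustration under that assumption; `CannedSearchClient` and its `stub` method are hypothetical names, and the response shape (`hits.found`, `hits.start`, `hits.hit[].id`) imitates the AWS SDK's search response.

```ruby
# Hypothetical test double for a CloudSearch domain client. It only answers
# queries that were registered up front and raises on anything else.
class CannedSearchClient
  Hit = Struct.new(:id, :fields)
  Hits = Struct.new(:found, :start, :hit)
  Response = Struct.new(:hits)

  # Every request received, in order, for later inspection in tests.
  attr_reader :requests

  def initialize
    @canned = {}
    @requests = []
  end

  # Register the document ids to return for an exact query string.
  def stub(query, ids)
    @canned[query] = ids
    self
  end

  # Mirrors the keyword-argument style of the SDK client's #search.
  def search(**params)
    @requests << params
    ids = @canned.fetch(params.fetch(:query)) do |query|
      raise ArgumentError, "no canned response for query: #{query}"
    end
    start = params.fetch(:start, 0)
    size = params.fetch(:size, 10)
    page = ids[start, size] || []
    Response.new(Hits.new(ids.size, start, page.map { |id| Hit.new(id, {}) }))
  end
end

client = CannedSearchClient.new.stub("(and title:'star')", %w[tt1 tt2 tt3])
response = client.search(query: "(and title:'star')", query_parser: "structured", size: 2)
first_page_ids = response.hits.hit.map(&:id)
```

The canned ids still have to come from somewhere trustworthy, for example responses recorded from a real CloudSearch domain, which is the part this sketch does not solve.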

@zrisher
Author

zrisher commented Nov 19, 2014

Thanks @lgleasain. You're right, it's a pretty proprietary API and it's unfortunate they've decided to architect it this way. As an abstract, long-term solution I can see how that choice made the most sense for them. They're banking on communities creating their own solutions to plug into this API, which sucks for us, and it means that interest in these tools will probably only ever be as large as what can be supported by existing tools.

I think answering the question "can we easily replace Sunspot's Solr reliance with a CloudSearch adapter, and can that adapter's functionality be replicated by Solr in test and dev?" will take a lot of time digging through Sunspot's code and comparing the two search tools' external interfaces in depth. Do you remember exactly what CloudSearch can do that Solr can't? Looking at this blog, every feature they mention being different has converged by now. The one difference I know for sure (anecdotally through experimentation) is that CloudSearch has fulltext-searchable Text Arrays, whereas Solr deals in literal String Arrays only.

Answering that question will tell us whether it's worth coupling the logic to that degree. If we can't mock CloudSearch using Solr through Sunspot, then it's probably best just to copy in most of its (fairly static by now) functional architecture for the features above. If we can, then it may be best to start a new gem that ties into Sunspot.

If you or anyone else has time to help answer this question (while waiting to hear from AWS if they'll ever expose that Solr API), please coordinate with me and I'll provide all the info I have. I can work on this in spare time, but unfortunately my startup needs me on other projects now.

In the meantime, until we have a good idea of how to best implement the above functionality list, could we please keep this issue open?
