Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some suggested improvements for this module #9

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

drunken-monkey
Copy link

First off: Amazing job, thanks a lot for this module! Looks really good!
I did find some areas to improve, though, most of which you seem to have been aware of anyways. Would be great if you'd consider some or all of these to this project. (It also includes a fix for #7.)

@drunken-monkey
Copy link
Author

Wow, Code Climate is pretty harsh … Please tell me which of those you'd actually consider problems and want me to fix!
Also, rather unrelatedly: Note that you could also add a locked and hidden processor to implement any changes you want done on the index as a UX improvement (like disabling cron-indexing if "solr_document" is the only datasource). That plus form altering should do the trick.

@DavidACameron-USDA
Copy link
Owner

Wow, this is incredible. Thanks so much for your hard work. You fixed several things that have been bugging me for a while, but that I hadn't even noted anywhere like ID field select list. I read through the changes and they look good. I can't wait to test them. I so wish I could work on this today, but I can't. I have unrelated stuff for work that I need to get done. I will come back to it this evening.

@DavidACameron-USDA
Copy link
Owner

Yeah, CodeClimate really likes to call you out on things. I had to lock it down a bit just to stop it from doing things like complaining about non-Camel case variable names in function definitions. I wasn't going to make them different from however they're written in the API just to make CC happy. I'll look at the issues it identified and see if there's anything to be done about them.

I know a couple will go away once the default query setting is implemented. CC doesn't like that the setting's code is commented-out, neither mine or the change you added. On a related note, we do need the setting. It was necessary to fix a problem with Sarnia, which I should have remembered since I worked on that issue.

@drunken-monkey
Copy link
Author

Glad you like it. ;) It's no problem if you don't have the time to review right away, I totally understand.
Good to know the default query option is needed. Should I already add that, too, then?

@DavidACameron-USDA
Copy link
Owner

If you feel like adding the setting, sure. Otherwise I can do it.

@DavidACameron-USDA
Copy link
Owner

I've been checking out all the changes this evening. Most of it looks really good. I have two or three questions and I want to do more testing on the new field config settings. When I tried to add and display a language field Views got really mad at me and printed a bunch of illegal offset warnings. I'd like to figure out why. It's almost 2AM though and I'm done for the night.

@drunken-monkey
Copy link
Author

drunken-monkey commented May 25, 2017

You're right, I messed up a bit with the language field (and all the other getItem*()methods, really). Should be fixed now.
However, I'm not sure why you set the value of SolrField to be the Search API field, and not just it's values? Any specific reason? I've changed it to just use the values, and that also fixes the problem with the __toString() method being called and its return value displayed for empty fields. Can't see anything that broke because of it, so I assume it works like this, too.
See also this Search API Solr issue for getting rid of some notices.

@drunken-monkey
Copy link
Author

Added the "Default query" option.
However, it seems to me like that code wouldn't actually ever be triggered, since Solarium automatically defaults to the ":" query if no other is specified (cf. \Solarium\QueryType\Select\Query\Query::$options). Should the setting override that default query, too?

@DavidACameron-USDA
Copy link
Owner

DavidACameron-USDA commented May 25, 2017

However, I'm not sure why you set the value of SolrField to be the Search API field, and not just it's values? Any specific reason?

I didn't know what I was doing. I just did what I thought made sense, especially when it came to the TypedData parts of the module. If you see a better, simpler way to do anything then that's great.

However, it seems to me like that code wouldn't actually ever be triggered, since Solarium automatically defaults to the ":" query if no other is specified (cf. \Solarium\QueryType\Select\Query\Query::$options). Should the setting override that default query, too?

I suppose I should have finished investigating whether or not we actually need the setting before saying that we definitely need it. In Sarnia, the setting was necessary because other, non-eDisMax query parsers default to not selecting any records when no query has been run. So you end up with a blank search page, which some people don't want. If Solarium's default query takes care of this, then we wouldn't need the setting. It's been a few weeks since I added the requestHandler setting, but I think when I tested that, handlers with other parsers still resulted in the blank page. I may be mis-remembering.

@DavidACameron-USDA
Copy link
Owner

DavidACameron-USDA commented May 26, 2017

Ok, so the questions I had are:

  1. I have a couple of concerns about the prettier property labels. After the change on my particular test server I end up with a list that looks like this:
    prettier_fields
    My first concern is that if an admin is expecting to see the field's machine name from the schema, then it might be a little confusing. I also wonder if someone might see the field prefixes like Ss, Sm, etc. and wonder what those are and if there might be something wrong. My thought is that leaving the labels as-is with their underscores might give some hint that they're machine names. Neither of these concerns are probably a big deal, but I'd like to know why you felt this change was a good idea. That way we're on the same page.

My second concern about it is that if underscores are removed from the leading characters of the string, then that label can appear out of alphabetical order with the other labels. You can see it in the screenshot where Version is at the top of the list. I think it would be a good idea to sort the array of labels after they're changed.

  1. I understand why the Language field config setting was implemented in the datasource plugin. It was a good idea. By the way, the change you made earlier today fixed the issue I was having. Views is properly displaying the document language using the Item Language field handler now.

What I don't understand is the purpose behind implementing the Label and URL config settings. They're not hurting anything, obviously, but I don't know how they're used or what benefit they have. Unlike the language, it doesn't look like there is any Views field handler for them. At least, no handler showed up when I added fields to those settings. I'm curious to know how implementing getItemLabel() and getItemUrl() benefits us, if you don't mind explaining it to me.

@DavidACameron-USDA
Copy link
Owner

DavidACameron-USDA commented May 26, 2017

I found an issue with the language configuration setting. That field must also be added to the index's field list. I configured the language field, added that language field to the index, added the Item Language field to a view, and verified that it was working.

After that, I wanted to see what would happen if the field were removed from the index. I suspected that it might cause a problem and it did. When I refreshed the view, no results were returned and a warning message was printed: "The Solr field ss_language has not been configured in the index."

I have an idea for how this might be fixed, but I'm too tired to test it at the moment. I'm looking at the query that's sent to Solr and I see search_api_language. If this is like search_api_id, then maybe we can just re-map search_api_language to our field just like I did for the ID in search_api_solr_datasource_search_api_solr_field_mapping_alter(). I don't know if that will work or not, but that's what I'll try first.

My guess is that the label and URL fields would have the same problem, but since I don't know how to use those configured fields (see above) I don't know how to test them.

@drunken-monkey
Copy link
Author

My first concern is that if an admin is expecting to see the field's machine name from the schema, then it might be a little confusing. I also wonder if someone might see the field prefixes like Ss, Sm, etc. and wonder what those are and if there might be something wrong. My thought is that leaving the labels as-is with their underscores might give some hint that they're machine names. Neither of these concerns are probably a big deal, but I'd like to know why you felt this change was a good idea. That way we're on the same page.

As the PR title says, this is just a suggestion. If there are no prefixes or suffixes, this generally makes the labels look nicer, and doesn't really add confusion, I think. And my thinking was that if you use an external source for the Solr index, you'd be more likely to have a schema with fixed fields, not dynamic ones like with the Search API Solr module. I could easily be mistaken there, though, it's just a guess. In any case, keeping the raw field names there is probably the safer bet. (It also avoids ambiguity in case two field names map to the same "pretty label".) Or, as an alternative, we could include the raw names in the labels, like so: "Ds changed (ds_changed)".
Regarding sorting: That's true, but should just be fixed by the Search API itself – I really don't know why we haven't done that yet. This avoids sorting in the majority of cases, where the list isn't displayed to the user anyways.

What I don't understand is the purpose behind implementing the Label and URL config settings.

As you say, they're not hurting anything, and they are supported by the framework, so I don't see a reason why we shouldn't have them.
The item URL will be used if you check the "Link this field to its item" option in a Views field handler, so that's definitely useful to have.
I honestly don't know whether the item label is currently really used anywhere – however, it's there in the framework, and there are always more Search API modules appearing, so there's a good chance it will be used by some of them, and this way we'll already integrate with those, even without knowing about them.

I found an issue with the language configuration setting. That field must also be added to the index's field list. […] My guess is that the label and URL fields would have the same problem

Your suggested fix for the language field sounds good, thanks, didn't think of that. Unfortunately, label and URL don't have those mappings available – see \Drupal\search_api\Backend\BackendPluginBase::getSpecialFields(). So we'd need to find some solution for those as well.
Easier suggestion: Just include a note in the description saying that this will only work correctly if you add the field to the index, too.
Harder "proper" suggestion: Include the selected fields in the retrieved fields (in search_api_solr_datasource_search_api_solr_query_alter()), too, and then either add them to the result items manually or by adding them to the fields mapping.

@drunken-monkey
Copy link
Author

Your suggested fix for the language field sounds good, thanks

Have to go back on that: it does sound good, but unfortunately it doesn't seem like it'll work that easily. The problem is that the Solr backend still won't add the field to the item in SearchApiSolrBackend::extractResults(). There is special code there for adding the item ID, but not for any of the other "special" fields.
However, I do have a third suggestion for solving this, or a variant to the second, "proper" solution: with this Solr module patch we could just get the field values directly from the Solr document, not needing explicit fields on the Search API item. This would make the fields that can be displayed more or less independent of those for which filters can be added – we could, in theory, even make that configurable separately, or re-add the Views mappings for non-indexed Solr fields (i.e., the search_api_datasource_*_solr_document tables).
NB: It would be possible without too much hassle to this without the Solr patch, too, by just implementing something like it in our hook_search_api_solr_search_results_alter() implementation. However, as noted in the Solr issue, it's simpler with that patch. (And I don't think Markus will disapprove of it.)

@drunken-monkey
Copy link
Author

The Solr patch got committed, and I think I've fixed the problem with retrieving the language (etc.).
In this new version, the SolrDocument typed-data object also returns a property if it has its definition but can't retrieve any values for it. This, I think, is more in line with how typed data works in general – and it avoids nasty errors if not all documents have the same set of fields.
Not sure whether you still want to keep the, now unused, \Drupal\search_api_solr_datasource\Plugin\DataType\SolrField::createFromField() method? We could also refactor to move the logic that's now in \Drupal\search_api_solr_datasource\Plugin\DataType\SolrDocument::get() there again – your call.
In any case, though, I think the code that tries to retrieve the field from the item should not be based on the field ID, but on the property paths. Otherwise, this will (needlessly) break if a user changes the field's machine name in the UI. (We could of course say they just shouldn't do that, but it seems to me we can very easily support this, so why woulnd't we?) I've committed that change, too – please try it out!

@sylus
Copy link

sylus commented Jun 12, 2017

I've pulled @drunken-monkey patch to my local and updated all my cores. Everything still works great and haven't noticed any regressions :)

@DavidACameron-USDA
Copy link
Owner

Cool. Thank you for letting me know. I made it through reviewing most of the changes. I just haven't had time lately to finish. I'm sorry guys. I'll try to wrap it up in the next couple of days.

@ravi-sidd
Copy link

Where will the data come from? Can @Dcameron please help?

@DavidACameron-USDA
Copy link
Owner

@ravi-sidd You should not use this module. Its functionality was merged into Search API Solr 8.x-2.x.

But to answer your question, it allows searching of any Solr data. It doesn't matter where the data comes from. It was made for people who have non-Drupal Solr data that they need to search.

@ravi-sidd
Copy link

Hey @Dcameron Thanks for writing. Really appreciate your quick response. I have a scenario like this - Could you please suggest on how to achieve this?

  • I have a site where I have setup search_api, search_api_solr
  • I am using Acquia search module because Acquia happens to be the connector
  • I have setup Content and Media datasources on Search API index
    This is for all internal site content.

Now there is a need where I need to web scrapping and fetch data from other external non Drupal sites-

  • I am able to scrap the data
  • Now I need to insert/push the data directly to Solr
    In other words want to index this data without storing anywhere on Drupal.

Any suggestions or thoughts on how to achieve this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants