
Unsupported APPEND mode for custom datasets #78

Open
raphi opened this issue Jul 10, 2018 · 10 comments

Comments

@raphi
Contributor

raphi commented Jul 10, 2018

Hi team,

I'm using the Algolia plugin to sync my data once it has been processed in Dataiku. However, I'm not able to use the Append option instead of Overwrite.

[Screenshot: screen shot 2018-07-10 at 11 34 58 am]

[Screenshot: screen shot 2018-07-10 at 11 33 10 am]

Is there any way to circumvent this limitation?
Could append mode be allowed on Algolia datasets?
I'm happy to send a PR if you point me in the right direction!

Thanks a lot

cc. @cstenac @jereze

@alexcombessie
Contributor

alexcombessie commented Jul 20, 2018

Hi Raphi,
To best help you, I would like to understand the use case you are trying to address. Could you detail your goal and send me a screenshot of your flow? Are you using scenarios to automate it?

Cheers,
Alex

@raphi
Contributor Author

raphi commented Jul 20, 2018

Hi @alexcombessie ,

I'd like two data pipelines to push data into the same Algolia index. Here is a truncated screenshot of our current pipeline:

[Screenshot: screen shot 2018-07-20 at 8 13 37 pm]

We apply different processing to different types of data, but in the end we want to send both into the same Algolia index. Does that make sense?

No, we are not using scenarios.

Thanks for your help!

@raphi
Contributor Author

raphi commented Jul 26, 2018

Hi @alexcombessie

Sorry to ping you again, but this is a real blocker for us and we are stuck on it. Is there anything we can do on our side?

Please let me know, thanks

@alexcombessie
Contributor

How about syncing to a 'dummy' copy of your dataset in APPEND mode, then syncing that to Algolia? It adds an extra step, but requires no code.

@raphi
Contributor Author

raphi commented Jul 27, 2018

The issue is that the two datasets I want to sync to Algolia don't have the same schema. Since Algolia is schemaless, we can have different records with different attributes, but syncing the two datasets into one forces them to share the same schema/columns.

Also, that doesn't fix the issue that every sync with Algolia deletes all objects before indexing them, which is not the behaviour we expect in this case.
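For illustration, here is a rough sketch of the behaviour being requested. It is not the plugin's actual code: `AlgoliaIndexStub` is a stand-in for a real Algolia index, and the `write_mode` parameter is an assumption about how the option could be exposed. The point is simply that an append mode would skip the clear step before indexing.

```python
class AlgoliaIndexStub:
    """Minimal stand-in for an Algolia index (records keyed by objectID)."""
    def __init__(self):
        self.records = {}

    def clear_index(self):
        self.records.clear()

    def add_objects(self, objects):
        for obj in objects:
            self.records[obj["objectID"]] = obj


def sync_records(index, records, write_mode="overwrite"):
    """Push records to the index; clear it first only in overwrite mode."""
    if write_mode == "overwrite":
        index.clear_index()     # current plugin behaviour: wipe, then index
    index.add_objects(records)  # append mode keeps existing objects


# Two pipelines pushing into the same index, second one in append mode:
index = AlgoliaIndexStub()
sync_records(index, [{"objectID": "1", "type": "event"}])
sync_records(index, [{"objectID": "2", "type": "user"}], write_mode="append")
print(sorted(index.records))  # ['1', '2'] -- both datasets coexist
```

Note that the stub also shows why schemaless records are fine here: the two objects carry different attribute sets, which only becomes a problem when they are forced through a single Dataiku dataset with fixed columns.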

@alexcombessie
Contributor

Have you tried to partition your two input datasets, stack them and then sync the partitioned stack to Algolia? You may have to adapt the plugin code to take into account partitioning.

@raphi
Contributor Author

raphi commented Jul 31, 2018

We cannot use the partitioned datasets :/

@alexcombessie
Contributor

Are you using the free edition of Dataiku? Partitioning management is included in our Enterprise license.

@raphi
Contributor Author

raphi commented Jul 31, 2018

Yes we are currently using the free edition.

@alexcombessie
Contributor

Partitioning will be the simplest way for you to manage this flow without developing your own custom code. As highlighted in https://www.dataiku.com/dss/editions/, the free edition is not best suited for this type of advanced workflow. I suggest you speak to [email protected] to discuss Enterprise licensing.
