Feature: Custom extractor chaining with valid json. #13

Open
pbreah opened this issue Sep 3, 2022 · 1 comment

pbreah commented Sep 3, 2022

@ThaNarie

Thank you for sharing your very nice lib. Having such a configurable data extractor really helps.

I'd like to propose working on a feature that would take it to the next level, if you would accept a pull request: the ability to register your own predefined extractors and extend the library with them, without adding inline functions to the JSON configuration.

The reason is simple: I could register my extractor functions once, then provide a valid JSON structure that uses those custom extractors in a powerful way.

This is the idea:

import extractFromHTML from 'html-extract-data';
// immediately add my custom extractors
extractFromHTML.addExtractors({
  // p1 and p2 are just my extractor's custom parameters
  'myCustom': (extract, element, p1, p2) => {
    const d = extract({ query: '.js-image', attr: 'alt' });
    if (p1 === 'myTest1') {
      // do logic 1 to d
    }
    if (p2 === 'myTest2') {
      // do logic 2 to d
    }
    // etc.
    return d;
  },
  // in this case my custom extractor has just one parameter (p1)
  'myCustom2': (extract, element, p1) => {
    let d;
    if (p1) {
      // data received from the output of the previous extractor, or an explicit parameter
      d = p1.indexOf('dummy-data') !== -1 ? 'found' : 'not-found';
    }
    return d;
  }
});

// at a later stage I pass a pure JSON config
const data = extractFromHTML(
  html,
  {
    query: '.grid-item',
    list: true,
    self: {
      'category': 'data-category',
      'id': { attr: 'data-id', convert: 'number' },
    },
    data: {
      title: 'h2',
      description: { query: 'p', html: true },
      tags: { query: '.tags > .tag', list: true },
      price: { query: '.price', convert: parseFloat },
      date: { query: '.date', convert: 'date' },
      // myCustom runs with true as the 1st parameter and false as the 2nd; its output is then passed to myCustom2
      image: 'myCustom true false | myCustom2'
    }
  });

As you can see, this is a simple but powerful feature: it uses pure JSON and my own extractors in a chain, feeding the output of one extractor into the next. The same "pipe" could also pass data to existing functions you added (like convert), so users can build expressive string pipelines out of existing features.
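To make the proposed chaining semantics concrete, here is a minimal sketch of how a pipe expression such as 'myCustom true false | myCustom2' could be parsed and evaluated. None of this exists in html-extract-data: the registry, the local stand-in for the proposed addExtractors, and the parseChain/runChain helpers are hypothetical. Two rules are assumptions the proposal does not spell out: the tokens "true", "false" and numeric strings are coerced to booleans and numbers, and from the second step on the previous output is passed as p1, as the myCustom2 comment suggests.

```javascript
// Hypothetical stand-in for the proposed addExtractors; not part of html-extract-data.
const registry = new Map();

function addExtractors(extractors) {
  for (const [name, fn] of Object.entries(extractors)) {
    registry.set(name, fn);
  }
}

// Assumed coercion rules: "true"/"false" become booleans, numeric tokens
// become numbers, and everything else stays a string.
function parseArg(token) {
  if (token === 'true') return true;
  if (token === 'false') return false;
  if (token !== '' && !Number.isNaN(Number(token))) return Number(token);
  return token;
}

// "myCustom true false | myCustom2" ->
// [{ name: 'myCustom', args: [true, false] }, { name: 'myCustom2', args: [] }]
function parseChain(expression) {
  return expression.split('|').map((segment) => {
    const [name, ...rest] = segment.trim().split(/\s+/);
    return { name, args: rest.map(parseArg) };
  });
}

// Evaluate the steps left to right. The first step receives only its explicit
// arguments; every later step receives the previous output as p1, followed by
// its own explicit arguments.
function runChain(expression, extract, element) {
  let value;
  parseChain(expression).forEach(({ name, args }, index) => {
    const fn = registry.get(name);
    if (typeof fn !== 'function') {
      throw new Error(`Unknown extractor: ${name}`);
    }
    value = index === 0
      ? fn(extract, element, ...args)
      : fn(extract, element, value, ...args);
  });
  return value;
}

// Usage with hypothetical extractors and a fake `extract` callback.
addExtractors({
  upperAlt: (extract) => extract({ query: '.js-image', attr: 'alt' }).toUpperCase(),
  contains: (extract, element, input, needle) =>
    input.includes(needle) ? 'found' : 'not-found',
});

const fakeExtract = () => 'dummy-data';
console.log(runChain('upperAlt | contains DUMMY', fakeExtract, null)); // prints: found
```

Passing the previous output as p1 keeps the extractor signature from the proposal unchanged; the trade-off is that an extractor cannot tell whether its p1 came from the chain or from the expression, so a dedicated input argument might be worth considering.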

Would you accept a pull request with a feature like this?

Thanks,

ThaNarie (Owner) commented Sep 9, 2022

Hi @pbreah!

Thanks for your interest and well-thought-out feature request!
Unfortunately, it goes the opposite direction of where I want this to go.

Where I want to make it more "code", so that TypeScript can better help the developer when configuring, you seem to want to go in the "no-code" direction, even calling it a "json configuration", which it clearly isn't :)

However, there might be a way to make it work as an optional element, something like:

const data = extractFromHTML(
  html,
  configFromJson(config, options),
)

Here, config is your proposed JSON config, and options would at least contain the custom extractors that the function uses. configFromJson would then convert your JSON-like config into one that does use functions, which the library can already work with.
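A minimal sketch of what such a configFromJson could look like. It rests on an assumption not confirmed in this thread: that the library accepts a function of the form (extract, element) => value wherever a field config is expected, mirroring the extractor signature in the proposal. The hypothetical helper walks the config recursively, leaves plain selectors such as 'h2' and non-string values untouched, and compiles any string whose first word names a registered extractor into such a function. Arguments are passed as raw strings, and unknown extractor names fail when the config is converted rather than during extraction.

```javascript
// Hypothetical sketch of configFromJson(config, options); not part of html-extract-data.
// ASSUMPTION: the library accepts a function `(extract, element) => value`
// wherever a field config is expected.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function configFromJson(config, options = {}) {
  const extractors = options.extractors ?? {};

  // Only strings whose first word names a registered extractor are treated as
  // expressions; plain selectors such as 'h2' are left as they are.
  const isExpression = (value) =>
    typeof value === 'string' && hasOwn(extractors, value.trim().split(/\s+/)[0]);

  // Compile "a x | b y" into a function. Arguments stay raw strings, and every
  // step after the first receives the previous output as its first custom parameter.
  const compile = (expression) => {
    const steps = expression.split('|').map((segment) => segment.trim().split(/\s+/));
    for (const [name] of steps) {
      if (!hasOwn(extractors, name)) throw new Error(`Unknown extractor: ${name}`);
    }
    return (extract, element) =>
      steps.reduce(
        (input, [name, ...args], index) =>
          index === 0
            ? extractors[name](extract, element, ...args)
            : extractors[name](extract, element, input, ...args),
        undefined,
      );
  };

  // Recursively copy the config, compiling expressions along the way.
  // Functions, booleans, numbers and null pass through unchanged.
  const walk = (node) => {
    if (isExpression(node)) return compile(node);
    if (Array.isArray(node)) return node.map(walk);
    if (node !== null && typeof node === 'object') {
      return Object.fromEntries(Object.entries(node).map(([key, value]) => [key, walk(value)]));
    }
    return node;
  };

  return walk(config);
}

// Usage with hypothetical extractors and a fake `extract` callback.
const converted = configFromJson(
  { query: '.grid-item', list: true, data: { title: 'h2', image: 'alt | mark found' } },
  {
    extractors: {
      alt: (extract) => extract({ query: '.js-image', attr: 'alt' }),
      mark: (extract, element, input, label) =>
        input.includes('dummy-data') ? label : 'not-found',
    },
  },
);

console.log(converted.data.title); // prints: h2
console.log(converted.data.image(() => 'some dummy-data', null)); // prints: found
```

Because detection is based on the first word of a string, extractor names in this sketch should not collide with selectors or keywords used elsewhere in the config (an extractor named `h2`, for example, would capture `title: 'h2'`).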

Doing it like that doesn't even require any change to the library, as it's something you can create yourself and design as you see fit. But since it can be tree-shaken out when it's not used, I'm not opposed to including it here for other people to use once you finish your work :)

How does this sound to you?
