Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate use/recommendation of 'provided' scope for connectors #4455

Closed
planetf1 opened this issue Jan 6, 2021 · 15 comments
Closed

Investigate use/recommendation of 'provided' scope for connectors #4455

planetf1 opened this issue Jan 6, 2021 · 15 comments
Assignees
Labels
build-improvement Build improvements - maven, gradle, GitHub actions gradle Gradle build (new initiative) pinned Keep open (do not time out)

Comments

@planetf1
Copy link
Member

planetf1 commented Jan 6, 2021

Benefit: 3rd party connectors - Reduce likelihood of incompatibilities with base egeria code & make easier to build

When we builds a connector in egeria we express dependencies on a variety of modules - be it in gradle or maven, most likely as compile&run . Those dependencies would include various egeria packages as well as third party packages that might be unique to the connector

We then build the jar with dependencies, those classes will get pulled into the uber jar

If the connector is always hosted in the server chassis, at runtime, some of these classes are already found. Plus the version from the connector since by adding to loader.path it may also be in the classpath as defined by the spring classloader.

So there could be two versions of the same library. If they’re the same we’d not notice. If they differ we would.
(We don’t currently use shading in terms of renaming).

Where connectors are being developed outside the egeria tree the risk of this is more pronounced

Ideally we could consider clarifying what the egeria 'platform' is, and referencing any packages in that platform as 'provided'. To assist in this we would also need a strict definition of the platform, such as a maven bom

We could also recommend this approach to third party developers to reduce any issue of mismatched code

It may also be that we need to be more careful and explicit in building our assembly in expressing what is packaged. Using maven scope only may not be flexible enough. But to have nothing too much work. Gradle tries to add flexibility by having ‘configurations’ which can be treated differently (ie for a client, server etc). The net question is the same though. what exactly do we want to distribute, what do we want to load…

@planetf1 planetf1 added the build-improvement Build improvements - maven, gradle, GitHub actions label Jan 6, 2021
@planetf1
Copy link
Member Author

planetf1 commented Jan 6, 2021

Currently we only use 'provided' scope for gaian & derby - since both relate to libraries we do not ourselves ship, and instead require them to be in-situ on the target (somewhat of a similar scenario)

Gradle lends itself better to a broader range of configurations -- for example we could far more easily lay out seperate trees in the assembly for 'clients', server, utilities, and in each case define the core common set that would be suited as 'provided'.

Draft set of tasks

  • Identify core set for server chassis
  • Do we need a maven bom to reflect
  • checking if any build ordering impact
  • ensuring assembly has all 'provided' libraries ie in lib (ideally making use of scope)
  • updating build of connectors accordingly
  • reflecting same in gradle
  • looking at commonality for clients

@davidradl
Copy link
Member

Makes sense. One comment. I assume that when we have multiple database connectors, there could / would be clashing classes e.g. different jdbc drivers. And different connectors would depend on different parts of the Server Chassis. For integration connectors and governance action framework connectors. I am thinking we should group the 'provided' library content per consuming connector, for ease of consumption.

@planetf1
Copy link
Member Author

planetf1 commented Jan 7, 2021

@davidradl I think the jdbc drivers themselves will have unique names. if they have an uber jar with other dependencies there is a risk (there always is) of divergent versions, though those libs may use shading - worthy of more checks. But I don't think the top level driver per se will be an issue.

For the server chassis, even though connectors may depend on different egeria modules at runtime, I'm thinking for simplicity we need to define a limited set of 'platforms' - maybe only one - that will ALWAYS be there.

However Most recently we have created a stripped down chassis which only has the very minimal capabilities. This can be assumed to always be there, but many other egeria modules something like a view service may depend on may, or may not, be there.

In that case whilst we can use 'provided' scope in the build process, we need more... We need to ensure code is always dynamically loaded, and that errors are sensibly detected and reported, as deciding what is present in the chassis at runtime becomes more of a deployment question

In essence we need to move the focus on that assembly from the build process to deployment - but to do this in such a way that anyone deploying can do it fairly easy. With say 5, 10 modules that might be viable. with 200-400+ (in egeria) we need more.

Thinking about this more, provided scope is only part of the approach - it adds the build flexibility, but doesn't solve the problem of the deployer getting the environment right.

@bogdan-sava
Copy link
Contributor

No need to have dependencies at all.
At runtime, if jar is missing from the lib folder, will ClassNotFoundException will be catch and wrapped and logged in audit log.

For that I need to change a little the #4347 . In the PR I moved the dependencies in profile, but if they are removed works just fine if the JARs are in the lib folder.
Already test that.

PS: I kept dependencies in the PR to be able to merge it before other changes.

@davidradl
Copy link
Member

@bogdan-sava I am looking of https://github.com/odpi/egeria-connector-ibm-information-server/blob/master/datastage-adapter/src/main/java/org/odpi/egeria/connectors/ibm/datastage/dataengineconnector/DataStageConnector.java which has an import org.odpi.openmetadata.frameworks.connectors.properties.ConnectionProperties;

So this is a dependancy that needs to be resolved at compile time. It is brought in using

   <dependency>
                <groupId>org.odpi.egeria</groupId>
                <artifactId>open-connector-framework</artifactId>
                <version>${open-metadata.version}</version>
                <scope>provided</scope>
            </dependency>

This means that the dependancy should be provided at runtime.

So without the provided dependancy in the pom file it would not compile.

@planetf1
Copy link
Member Author

planetf1 commented Jan 7, 2021

Yes - provided is needed for compile time (there will be a gradle equiv).

The main assembly I think also should be changed so that we provide the dependencies in server/lib or similar - not just connectors, but all. The chassis should be much leaner by default - as indeed you added in optional profile definition @bogdan-sava . I don't think we need the uber jar for the chassis long term, though do need to be ok with the dev use case also.

Note though some experimentation/investigation is needed here -- for example if we remove dependencies, we also - in maven at least - remove the control of build ordering. We also change how the assembly is built, as it may no longer know what dependencies to include, so need to adopt an alternative mechanism for this (gradle may make this easier)

I think we should start considering making this change in the next, or next+1 dev iteration ie Feb/Mar.

@planetf1
Copy link
Member Author

planetf1 commented Jan 7, 2021

When looking at the postgres connector specifically & trying a fix for this generic issue there, I note that gradle now has 'compileOnly' which is pretty much what we want for 'provided' :-)

However clarification over what can be assumed to be present, modifying assemblies to include all the libs as jars (not an uber jar), plus docs still need to be worked here, ie in the base

@cmgrote
Copy link
Member

cmgrote commented Jan 12, 2021

I would advocate for the approach of using the provided scope and still allowing a semi-uber-jar for the connector to be built (with its unique dependencies, but leaving out the Egeria dependencies), for these reasons:

  • This will ensure that we have a singular "release" with pre-defined depedencies (critically versions) baked-in
  • Should significantly improve the ability for us to identify issues and debug them accordingly (rather than needing to second-guess what versions of libraries someone might be running alongside the connector)
  • Will also mean that our code analysis mechanisms (security, etc) will identify potential issues with the baked-in dependencies (don't think there'd be any way for us to do this if users are manually choosing their dependency versions in the /lib directory?)

This is the approach I've taken in both the IGC and Atlas connectors (at least, explicitly including only the non-Egeria dependencies in the uber jar they each build, though I may not be explicitly using the provided scope)...

@planetf1
Copy link
Member Author

Also note that any recommendations should comment on use of snapshots (and usr of -U or the gradle equiv) - see odpi/egeria-database-connectors#14 . This is more relevant in the initial stages of building a new connector alongside core dev work, whilst when using a stable base using the last released version of egeria is 100% the way to go.

Leaving any further eval/doc on this until we've made base packaging changes ~ Feb 2021 to unpack dependencies in our assembly

@planetf1
Copy link
Member Author

This is part of the work to slim down the chassis - @bogdan-sava assigned to you to manage/coordinate but feel free to reassign particular changes needed back once you've figured out the overall approach.

@planetf1 planetf1 added the gradle Gradle build (new initiative) label Jun 2, 2021
@planetf1 planetf1 removed this from the 2021.05 (2.10) milestone Jun 30, 2021
@planetf1 planetf1 self-assigned this Aug 3, 2021
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Oct 15, 2021
@planetf1 planetf1 added pinned Keep open (do not time out) and removed no-issue-activity Issues automatically marked as stale because they have not had recent activity. labels Oct 18, 2021
@bogdan-sava
Copy link
Contributor

Using 'provided' is not suitable in connectors pom.
The reason is that connector might be dependency in a different project and if that project decide to package as far, 3rd party will not be considerate unless are explicit added in project pom or present to classpath on runtime.

For the connectors that are to be loaded on demands instead of being packaged in the executable, optional assembly jar-with-dependencies with 3rd party packages inside and without core egeria frameworks is more suitable. This way we avoid making missing adding 3rd party dependencies and their transient dependencies to classpath.

If you agree, let's close this investigate issue.

@planetf1
Copy link
Member Author

planetf1 commented Oct 20, 2021

I think what we need to make this work (an what is tried to be suggested above) is

  • Clarity on what the base platform is -- ie what can we always assume is present. This can be a packaging pom (we'd need to exclude this from the dependency check)
  • Use of 'provided' or 'compileOnly' for the scope of this 'base platform' dependency
  • Building jar-with-dependencies OR inclusion in 'lib' directory of all additional modules that are regular dependencies (ie compile or Implementation) (this is where the extra dependencies needed for a particular connector come in)

This would extend not just to connectors, but everything outside the core platform, so for example 'optional' OMAS modules etc.

I think we're both saying the same thing? As is chris with reference to 'semi-uber' jar. Semi as we're eliminating all the duplicates from the core platform which aren't needed as they are 'always there'

@planetf1
Copy link
Member Author

planetf1 commented Oct 20, 2021

Other potential impacts

  • Maven assembly
  • Gradle assembly
  • CI/CD process - publishing of artifacts
  • Docs around use of additional connectors
  • Developer information on compile/debug ie in intellij (but also cli)

If this direction sounds good I"m fine we close this issue, and create a new clean issue to track the implementation. This can be a high level issue to which we attach child issues (those can then be independently assigned & worked on & merged). First step being to edit a sequence of steps to get us to the goal.

One thing I'm wondering is whether to skip the maven version of this and just go gradle. the downside is it could delay the work as gradle builds needs verification/some completion. However last time we rejigged this area the maven assembly/build was problematic. I'm somewhat neutral. The reason I mention is that then also have the option to think more carefully about what maven artifacts are published/advertised which could further make it easier for those using the platform. The

@planetf1
Copy link
Member Author

I think we should close this and focus on #4667 as there is so much overlap in the two discussions. We can open new issues
on any particular async actions we need to take.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build-improvement Build improvements - maven, gradle, GitHub actions gradle Gradle build (new initiative) pinned Keep open (do not time out)
Projects
None yet
Development

No branches or pull requests

4 participants