
Adds implementation for a Split Assigner and a Split Reader #44

Conversation

prodriguezdefino
Collaborator

The assigner is in charge of creating the units of work for the Source enumerator (in an upcoming branch); the enumerator will use the assigner output to deliver those units of work (the splits) to a split reader, which is in charge of getting the actual data from the source.
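The assigner/enumerator/reader handshake described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the PR's actual API (the PR's class is BigQuerySourceSplitAssigner, and its splits come from BigQuery ReadSession streams):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical sketch, not the PR's API: the enumerator asks the assigner
// for the next unit of work (a split) and hands it to a split reader.
public class SimpleSplitAssigner {
    private final Queue<String> pendingSplits = new ArrayDeque<>();

    // The enumerator seeds the assigner with discovered splits
    // (e.g. the stream names of a BigQuery read session).
    public void addSplits(List<String> splits) {
        pendingSplits.addAll(splits);
    }

    // Next unit of work, or empty once everything has been handed out.
    public Optional<String> getNext() {
        return Optional.ofNullable(pendingSplits.poll());
    }

    public boolean noMoreSplits() {
        return pendingSplits.isEmpty();
    }
}
```

The enumerator would call getNext() whenever a reader asks for work, and signal end-of-input once noMoreSplits() returns true.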


private static Credentials createCredentialsFromFile(String file) {
    try {
        return GoogleCredentials.fromStream(new FileInputStream(file));
    } catch (IOException ex) {
        // Catch block reconstructed for completeness; the PR's actual handling may differ.
        throw new UncheckedIOException(ex);
    }
}

PATH_TRAVERSAL_IN: This API (java/io/FileInputStream.(Ljava/lang/String;)V) reads a file whose location might be specified by user input
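A common mitigation for this class of finding is to normalize the user-supplied path and reject anything that escapes an allowed base directory. The sketch below is illustrative only; the helper and directory names are hypothetical and not part of the PR:

```java
import java.nio.file.Path;

// Hypothetical helper, not part of the PR: confine a user-supplied
// relative path to a known base directory before opening it.
public class SafePaths {
    public static Path resolveUnder(Path baseDir, String userPath) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses any ".." segments so the check below is meaningful
        Path resolved = base.resolve(userPath).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes allowed directory: " + userPath);
        }
        return resolved;
    }
}
```

Whether the mitigation is warranted here depends on whether the credentials file path can actually come from untrusted input; for operator-supplied configuration the finding is often ignored.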


ℹ️ Expand to see all @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.

/** An options object that covers the possible {@link Credentials} configurations. */
@AutoValue
@PublicEvolving
public abstract class CredentialsOptions implements Serializable {

AutoValueCouldNotWrite: Could not write generated class com.google.cloud.flink.bigquery.common.config.AutoValue_CredentialsOptions: javax.annotation.processing.FilerException: Attempt to recreate a file for type com.google.cloud.flink.bigquery.common.config.AutoValue_CredentialsOptions

❗❗ 2 similar findings have been found in this PR:

File Path | Line Number
flink-connector-bigquery/src/main/java/com/google/cloud/flink/bigquery/source/config/BigQueryReadOptions.java | 41
flink-connector-bigquery/src/main/java/com/google/cloud/flink/bigquery/common/config/BigQueryConnectOptions.java | 34

Visit the Lift Web Console to find more details in your report.



@sonatype-lift

sonatype-lift bot commented Jun 28, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your branch's local project directory to review the suggestions before committing.1

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/flink-bigquery-connector/44.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/flink-bigquery-connector/44.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

@prodriguezdefino prodriguezdefino marked this pull request as ready for review July 11, 2023 22:32
@prodriguezdefino
Collaborator Author

/gcbrun

@prodriguezdefino
Collaborator Author

/gcbrun

@davidrabinowitz (Member) left a comment

Please convert the asserts in the unit tests to use google-truth asserts (they are more readable and less error-prone). Other than that it looks OK.

@prodriguezdefino
Collaborator Author

/gcbrun

@prodriguezdefino
Collaborator Author

/gcbrun


/** A simple split assigner based on the BigQuery {@link ReadSession} streams. */
@Internal
public class BigQuerySourceSplitAssigner {

This comment was marked as resolved.

@prodriguezdefino
Collaborator Author

prodriguezdefino commented Oct 10, 2023 via email


import org.apache.flink.annotation.Internal;

import org.apache.flink.shaded.guava30.com.google.common.collect.Lists;

This comment was marked as resolved.

Collaborator

Yes, for DP 2.1 we plan on supporting Flink version 1.15.4, which uses guava30, and for DP 2.2 we can upgrade to guava31 when it is available.

@prodriguezdefino
Collaborator Author

prodriguezdefino commented Oct 10, 2023 via email

projectId,
dataset,
table);
return this.readOptions

This comment was marked as resolved.

@prodriguezdefino
Collaborator Author

prodriguezdefino commented Oct 10, 2023 via email

@prodriguezdefino
Collaborator Author

prodriguezdefino commented Oct 10, 2023 via email

@jayadeep-jayaraman
Collaborator

In this case then, should I keep the dependencies as they are?


Yes

@davidrabinowitz (Member) left a comment

I see you use the Flink built-in shaded guava. Does the package change between Flink versions? As we shade the connector, can we use our own guava, especially as versions prior to 32.0.1 are affected by security issues? See https://mvnrepository.com/artifact/com.google.guava/guava


public abstract String getRowRestriction();

@Nullable
Member

Please replace the @Nullable fields with Optional, as it better indicates that the field may have no value.
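The suggested change can be sketched as follows (illustrative names only; the PR's actual option classes are AutoValue-based):

```java
import java.util.Optional;

// Illustrative sketch of the review suggestion: expose a possibly-absent
// field through Optional instead of a @Nullable getter.
public class ReadConfig {
    private final String rowRestriction; // may legitimately be null internally

    public ReadConfig(String rowRestriction) {
        this.rowRestriction = rowRestriction;
    }

    // Callers can no longer forget the null check: absence is explicit in the type.
    public Optional<String> getRowRestriction() {
        return Optional.ofNullable(rowRestriction);
    }
}
```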

Collaborator Author

Done

@prodriguezdefino
Collaborator Author

/gcbrun

@prodriguezdefino
Collaborator Author

prodriguezdefino commented Oct 30, 2023

I see you use the Flink built-in shaded guava. Does the package change between flink versions? As we shade the connector, can we use our own guava, especially as versions previous to 32.0.1 are affected with security issues ? See https://mvnrepository.com/artifact/com.google.guava/guava

It does change between Flink versions; AFAIK the latest released one is 31.1, shaded with version 17.0: https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-guava/31.1-jre-17.0

Also, there is a Checkstyle rule that disallows the usage of guava-related replacements; we went through similar problems with the request to use the Google Truth specific libraries.

I can open another issue to cover this particular version migration, since it is not a trivial change. @davidrabinowitz please advise.

@davidrabinowitz
Member


Does it mean we will need to create different connectors for each Flink version?

@davidrabinowitz davidrabinowitz merged commit 12745ae into GoogleCloudDataproc:main Oct 31, 2023
4 checks passed
prodriguezdefino added a commit to prodriguezdefino/flink-bigquery-connector that referenced this pull request Oct 31, 2023
@prodriguezdefino
Collaborator Author


Does it mean we will need to create different connectors for each Flink version?

I guess there is a reason why the runtime wants to take control of the guava-related versions (alongside other libraries); Flink may have run into problems when connectors were free to use arbitrary dependencies. They go a long way in the project pom config trying to quickly surface errors and emit messages to avoid the usage of certain packages in the dependency tree.

We can try in the future to shade all our dependencies into the distribution jarfile; see #47, where this distro jar project is added. It will imply validating afterwards against a few versions of the runtime; that work is not trivial, since we don't know how a specific version of our dependencies may meddle with the runtime or even with other connectors that run under the same assumptions imposed by the runtime.

Also, we can try to get rid of the code's explicit dependencies on guava-related packages and rely on native Java capabilities to implement the needed functionality. That would avoid the need for vendored references in the actual code, but we would still need to see how this code's dependencies on guava (to name one) will work with the versions imposed by the runtime.
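As an illustration of that last point (a sketch, not code from this PR), a Guava helper such as Lists.partition can be replaced with JDK-only code, removing the dependency on the Flink-shaded guava package entirely:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical JDK-only replacement for the shaded Guava Lists.partition
// (org.apache.flink.shaded.guava30.com.google.common.collect.Lists).
public class Partitions {
    public static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // ceil(list.size() / size) chunks, each a defensive copy of a subList view
        return IntStream.range(0, (list.size() + size - 1) / size)
                .<List<T>>mapToObj(i -> new ArrayList<>(
                        list.subList(i * size, Math.min((i + 1) * size, list.size()))))
                .collect(Collectors.toList());
    }
}
```

The trade-off the comment alludes to still applies: even without vendored imports, transitive guava usage inside the client libraries remains subject to whatever versions the runtime imposes.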

@jayehwhyehentee jayehwhyehentee deleted the split_assigner_and_reader branch December 12, 2023 05:12