Skip to content
This repository has been archived by the owner on Jun 29, 2021. It is now read-only.

METAMODEL-1225 Add ColumnTypingStrategies and use them in CSV module. #240

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

LiveSatomi
Copy link

There are two strategies for column typing, all text and manual.
The CSV module can use manual typing to filter based on numerical
greater than and less than operations.

There are two strategies for column typing, all text and manual.
The CSV module can use manual typing to filter based on numerical
greater than and less than operations.
@LiveSatomi
Copy link
Author

As mentioned in the JIRA comment, let me know if there is too much duplication between the typing package and naming package.
Also, do we need to handle trying to update the datastore with a data type that is different than the one you configured? I haven't tested but it seems to me you can now make a write to csv data stores that you can't read back (unless you reconfigured the datacontext to use string types which will always work)

@@ -492,7 +490,7 @@ public void testMaterializeTable() throws Exception {
public void testAlternativeDelimitors() throws Exception {
File file = new File("src/test/resources/csv_semicolon_singlequote.csv");
CsvDataContext dc = new CsvDataContext(file, semicolonConfiguration);
Table table = dc.getSchemas().get(0).getTables().get(0);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tripped me up for a while, it turns out getSchemas().get(0) returns the schema for information_schema while getDefaultSchema() returns the schema for specified resource. Now that the typing on the schema matters for reads dataSet.next() includes a cast which was failing.

Copy link
Contributor

@kaspersorensen kaspersorensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, I'm especially happy to see that you've aimed for consistency with previous work in the project. The "session" concept strikes me as a bit odd TBH, but it's consistent so I don't feel like I want to advocate against it :-) A few remarks here and there.

Copy link
Contributor

@arjansh arjansh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for the most part. I did mostly have code style related comments.

*/
package org.apache.metamodel.schema.typing;

import org.apache.metamodel.schema.ColumnType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import org.apache.metamodel.schema.ColumnType is never used.

* Defines the context for configuring the type for a single column in a
* {@link ColumnTypingStrategy} session.
*/
public interface ColumnTypingContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have both a ColumnTypingContext interface and a ColumnTypingContextImpl. In this case you have exactly one implementation of an interface, for which I don't expect you want to create different implementations in the future, so I would personally drop the interface and rename the implementation to ColumnTypingContext.

*/
package org.apache.metamodel.schema.typing;

import org.apache.metamodel.schema.ColumnType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import org.apache.metamodel.schema.ColumnType is never used.

* @param table The table that contains the column
* @param columnIndex the index in the table of the column being configured.
*/
public ColumnTypingContextImpl( Table table, int columnIndex ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a styling thing, but normally we don't put spaces after and before parentheses in constructor and method definitions and calls, also we try to promote making parameters (and variables) final when possible, so then this constructor definition would look like this:

    public ColumnTypingContextImpl(final Table table, final int columnIndex) {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will go back and double check the space padding in the rest of the files. My work uses different formatting rules and I'm not used to handling multiple rulesets in my IDE.

I agree with making parameters final when possible. Does this also apply to parameters of lambda functions? for example an instance of the ColumnTypingSession after adding the default implementation to close:

        return (final ColumnTypingContext ctx) -> {
            if ( iterator.hasNext() ) {
                return iterator.next();
            }
            return null;
        };

or

        return ctx -> {
            if ( iterator.hasNext() ) {
                return iterator.next();
            }
            return null;
        };

Personally I treat parameters as final whether they are marked as such or not, and I don't see any final lambda parameters in the project so far.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a valid point, because with the second example you're able to assign a new Object (if you were able to instantiate ColumnTypingContext) to the ctx parameter. Even though I wouldn't consider doing that.

It's just that in practice I always see the second example (the one you implemented), so for lambdas let's not start adding finals (and types) to the formal parameters now.

* Ends the column typing session.
*/
@Override
void close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the close method is already defined on the Closeable interface, it's not necessary to define it here again. You could change this to a default implementation of the close method, i.e.:

    @Override
    default void close() {
    }

In this manner you wouldn't have to implement it on the anonymous inner classes returned by the startColumnTypingSession on the CustomColumnTypingStrategy and DefaultColumnTypingStrategy classes.

*
* @return a strategy that defaults to text
*/
public static ColumnTypingStrategy defaultStrategy() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java method names are by convention preferably verbs, so then this should be getDefaultStrategy and the ones below should be getCustomStrategy.

@Test
public void testTypeConfiguration() throws Exception {
ColumnTypingStrategy strategy = ColumnTypingStrategies.customTypes(ColumnType.STRING, ColumnType.NUMBER);
try ( final ColumnTypingSession session = strategy.startColumnTypingSession() ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a try with resources statement the resource is implicitly final, so here you can remove the final modifier.

@@ -22,8 +22,11 @@
import org.apache.metamodel.factory.AbstractDataContextFactory;
import org.apache.metamodel.factory.DataContextProperties;
import org.apache.metamodel.factory.ResourceFactoryRegistry;
import org.apache.metamodel.schema.ColumnType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import org.apache.metamodel.schema.ColumnType is never used.

@@ -833,8 +831,14 @@ public void testCustomColumnNames() throws Exception {
final String thirdColumnName = "third";
final String fourthColumnName = "fourth";

final ColumnType firstColumnType = ColumnType.STRING;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it adds real value to put these constants into the xxxColumnType variables, especially since they're only used once. It doesn't make the code more understandable or more elegant.


public void testCustomTyping() throws Exception {
ColumnType columnType1 = ColumnType.DATE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above. In my opinion just using the constant is more clear than adding it to a variable which doesn't add a meaning to it.

@arjansh
Copy link
Contributor

arjansh commented Apr 30, 2020

@kaspersorensen You had some comments on this pull requests, which mark it with "changes requested". Have these been addressed? If so, can we merge the pull request?

@kaspersorensen
Copy link
Contributor

Sorry guys @LiveSatomi @arjansh , had been buried in work for a bit. Gave this my approval now :-)

Copy link
Contributor

@arjansh arjansh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look at this pull request, to see why it hadn't been merged yet. I had one more comment. Once that's fixed, I'll merge it. I'd like to apologize that it took me this long to get back to this pull request.

char separatorChar, char quoteChar, char escapeChar, boolean failOnInconsistentRowLength,
boolean multilineValues) {

public CsvConfiguration(int columnNameLineNumber, ColumnNamingStrategy columnNamingStrategy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By replacing the old public constructor with this constructor, compile problems can occur in other projects using this library. Can you please retain the old constructor and have it invoke this constructor with a null value for the ColumnTypeStrategy parameter?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants