Commit

Merge pull request apache#3658 from sramazzina/STATIC-SCHEMA
fix apache#3657 Static Schema Definition
hansva authored Mar 5, 2024
2 parents 99ac781 + 29a50dc commit bc88aa5
Showing 78 changed files with 4,342 additions and 86 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -16,6 +16,7 @@ bin
.DS_Store
routes.yml
*.tgf
.vscode

rebel.xml
rebel-remote.xml
1 change: 1 addition & 0 deletions assemblies/core/lib/pom.xml
@@ -67,6 +67,7 @@
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-ui</artifactId>
8 changes: 8 additions & 0 deletions assemblies/core/lib/src/assembly/assembly.xml
@@ -30,4 +30,12 @@
<outputDirectory>.</outputDirectory>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<useProjectArtifact>false</useProjectArtifact>
<includes>
<include>org.apache.hop:hop-plugins-static-schema:jar</include>
</includes>
</dependencySet>
</dependencySets>
</assembly>
5 changes: 5 additions & 0 deletions assemblies/lib/pom.xml
@@ -58,6 +58,11 @@
<artifactId>hop-core</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-plugins-static-schema</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-engine</artifactId>
13 changes: 13 additions & 0 deletions assemblies/plugins/dist/pom.xml
@@ -2077,6 +2077,19 @@
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms-schemamapping</artifactId>
<version>${project.version}</version>
<type>zip</type>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms-samplerows</artifactId>
1 change: 1 addition & 0 deletions assemblies/plugins/transforms/pom.xml
@@ -131,6 +131,7 @@
<module>salesforce</module>
<module>sasinput</module>
<module>samplerows</module>
<module>schemamapping</module>
<module>script</module>
<module>selectvalues</module>
<module>serverstatus</module>
44 changes: 44 additions & 0 deletions assemblies/plugins/transforms/schemamapping/pom.xml
@@ -0,0 +1,44 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms</artifactId>
<version>2.9.0-SNAPSHOT</version>
</parent>


<artifactId>hop-assemblies-plugins-transforms-schemamapping</artifactId>
<version>2.9.0-SNAPSHOT</version>
<packaging>pom</packaging>

<name>Hop Assemblies Plugins Transforms Schema Mapping</name>
<description />

<dependencies>
<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-transform-schemamapping</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</project>
@@ -0,0 +1,50 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3 http://maven.apache.org/xsd/assembly-1.1.3.xsd">
<id>hop-assemblies-plugins-transforms-schemamapping</id>
<formats>
<format>zip</format>
</formats>
<baseDirectory>transforms/schemamapping</baseDirectory>
<files>
<file>
<source>${project.basedir}/src/main/resources/version.xml</source>
<outputDirectory>.</outputDirectory>
<filtered>true</filtered>
</file>
</files>
<fileSets>
<fileSet>
<outputDirectory>lib</outputDirectory>
<excludes>
<exclude>**/*</exclude>
</excludes>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<useProjectArtifact>false</useProjectArtifact>
<includes>
<include>org.apache.hop:hop-transform-schemamapping:jar</include>
</includes>
</dependencySet>
</dependencySets>
</assembly>
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->

<version>${project.version}</version>
@@ -0,0 +1,60 @@
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
:imagesdir: ../../assets/images/
:page-pagination:
:description: A Schema File Definition describes a stream layout that can be applied to a selected set of input/output transforms. It defines a recurring stream layout once so that it can be reused across multiple pipelines, sparing the user from redefining it each time.

= Schema Definition

== Description

image:icons/folder.svg[]

A Schema Definition describes a stream layout that can be applied to a selected set of input/output transforms. It defines a recurring stream layout that can be reused in multiple pipelines. This simplifies development because the user does not have to redefine the same set of fields and their attributes in every pipeline.


== Related Plugins

Transforms:

* xref:pipeline/transforms/textfileinput.adoc[Text File Input]
* xref:pipeline/transforms/textfileoutput.adoc[Text File Output]
* xref:pipeline/transforms/csvinput.adoc[CSV Input]
* xref:pipeline/transforms/excelinput.adoc[Excel Input]
* xref:pipeline/transforms/excelwriter.adoc[Excel Writer]

== Options

[options="header"]
|===
|Option |Description
|Name|The name to be used for this schema definition
|Description|The description to be used for this schema definition
|Field Separator|The separator used between fields in the schema definition
|Enclosure|The field enclosure used for fields in the schema definition
|Field Definitions|The list of fields and their attributes that describes the layout for this schema definition.
|===
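
As a rough illustration of the idea (not Hop's implementation or API), the options above can be modelled as one reusable object that several readers or writers share. Every class, attribute and value in this sketch is hypothetical, and the line parsing is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    # Hypothetical field attributes; the real definition carries more of them.
    name: str
    field_type: str
    length: int = -1


@dataclass(frozen=True)
class SchemaDefinition:
    # Mirrors the options table: name, separator, enclosure, field definitions.
    name: str
    separator: str
    enclosure: str
    fields: tuple

    def header(self):
        """Build a header line from the field names."""
        return self.separator.join(
            f"{self.enclosure}{f.name}{self.enclosure}" for f in self.fields
        )

    def parse_line(self, line):
        """Naive split: does not handle separators inside enclosed values."""
        values = [v.strip(self.enclosure) for v in line.split(self.separator)]
        if len(values) != len(self.fields):
            raise ValueError("line does not match the schema layout")
        return dict(zip((f.name for f in self.fields), values))


# Defined once, then reused by any number of (hypothetical) inputs and outputs.
customers = SchemaDefinition(
    name="customers",
    separator=";",
    enclosure='"',
    fields=(SchemaField("id", "Integer"), SchemaField("name", "String", 50)),
)

header_line = customers.header()
row = customers.parse_line('"42";"Ada"')
```

The point of the sketch is only that the layout lives in one place: changing `customers` changes what every consumer of that definition reads or writes.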

////
== Samples
* beam/pipelines/complex.hpl
* beam/pipelines/generate-synthetic-data.hpl
* beam/pipelines/input-process-output.hpl
* beam/pipelines/switch-case.hpl
* beam/pipelines/unbounded-synthetic-data.hpl
////
@@ -25,7 +25,7 @@ under the License.
|
== Description

The CSV File Input transform reads data from a delimited file.
The CSV File Input transform reads data from a delimited file. You can choose to use a xref:metadata-types/schema-file-definition.adoc[Schema Definition] or to define the required fields' layout manually.

The CSV label for this transform is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas.
Internal processing allows this transform to process data quickly.
@@ -76,6 +76,7 @@ When reading multiple files, the total size of all files is taken into considera
In that specific case, make sure that ALL transform copies receive all files that need to be read, otherwise, the parallel algorithm will not work correctly (for obvious reasons).
WARNING: For technical reasons, parallel reading of CSV files is only supported on files that don't have fields with line breaks or carriage returns in them.
|File Encoding|Specify the encoding of the file being read.
|Schema Definition|Name of the xref:metadata-types/schema-file-definition.adoc[Schema Definition] that we want to reference.
|Fields Table|This table contains an ordered list of fields to be read from the target file.
|Preview button|Click to preview the data coming from the target file.
|Get Fields button|Click to return a list of fields from the target file based on the current settings (i.e. Delimiter, Enclosure, etc.).
@@ -49,7 +49,7 @@ When you read other file types like OpenOffice ODS and using special functions l
[options="header"]
|===
|Option|Description
|transform Name|Name of the transform; the name has to be unique in a single transform.
|Transform Name|Name of the transform; the name has to be unique in a single pipeline.
|Spread sheet type (engine) a|This field allows you to specify the spreadsheet type.
Currently the following are supported:

@@ -108,15 +108,17 @@ The name of that file is <errorline dir>/filename.<date_time>.<errorline extensi

=== Fields tab

The fields tab is for specifying the fields that must be read from the Excel files.
Use Get fields from header row to fill in the available fields if the sheets have a header row automatically.
The fields tab is for specifying the fields that must be read from the Excel files. You can choose to use a xref:metadata-types/schema-file-definition.adoc[Schema Definition] or to define the required fields' layout manually.

Use _Get fields from header row_ to automatically fill in the available fields if the sheets have a header row.

The Type column performs type conversions for a given field.
For example, if you want to read a date and you have a String value in the Excel file, specify the conversion mask.
Note: In the case of Number to Date conversion (for example, 20051028--> October 28th, 2005) specify the conversion mask yyyyMMdd because there will be an implicit Number to String conversion taking place before doing the String to Date conversion.

|===
|Option|Description
|Schema Definition|Name of the xref:metadata-types/schema-file-definition.adoc[Schema Definition] that we want to reference.
|Name|The name of the field.
|Type|The field's data type; String, Date or Number.
|Length|The length option depends on the field type.
@@ -119,9 +119,14 @@ Negative numbers may be useful if you need to append to a sheet, but still prese

*Fields section*

The fields section is for specifying the fields that must be written to the Excel file. You can choose to use a xref:metadata-types/schema-file-definition.adoc[Schema Definition] or to define the required fields' layout manually.

If you decide to define the fields layout by using a xref:metadata-types/schema-file-definition.adoc[Schema Definition], use the xref:pipeline/transforms/schemamapping.adoc[Schema Mapping] transform to adjust the incoming stream according to the chosen xref:metadata-types/schema-file-definition.adoc[Schema Definition].
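
Conceptually, adjusting a stream to a schema means selecting and reordering the incoming fields so they line up with the schema's field list. The sketch below illustrates that idea only; it is not how the Schema Mapping transform is implemented, and the function, field names and mapping are all hypothetical.

```python
def map_to_schema(row, mapping, schema_fields):
    """Return the row's values in schema order.

    row           -- dict of incoming field name -> value
    mapping       -- dict of schema field name -> incoming field name
    schema_fields -- ordered list of schema field names
    """
    mapped = []
    for field in schema_fields:
        # Fall back to the same name when no explicit mapping is given.
        source = mapping.get(field, field)
        if source not in row:
            raise KeyError(f"no incoming field for schema field '{field}'")
        mapped.append(row[source])
    return mapped


# Hypothetical incoming row with differently named and ordered fields.
incoming = {"cust_name": "Ada", "cust_id": 42, "extra": "ignored"}
schema_fields = ["id", "name"]
mapping = {"id": "cust_id", "name": "cust_name"}

mapped_row = map_to_schema(incoming, mapping, schema_fields)
```

Fields that the schema does not list (here `extra`) are simply left out of the result, which is one possible design choice among several.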

[options="header"]
|===
|Option|Description
|Schema Definition|Name of the xref:metadata-types/schema-file-definition.adoc[Schema Definition] that we want to reference.
|Name|The field to write
|Type|The type of data
|Format|The Excel format to use in the sheet.
@@ -80,7 +80,7 @@ You must always specify the data type or you will have errors like the following
[options="header"]
|===
|Option|Description
|transform Name|The name of this transform as it appears in the pipeline workspace.
|Transform Name|The name of this transform as it appears in the pipeline workspace.
|Name|Name of the field.
|Variable a|Allows you to enter variables as complete strings to return rows or add values to input rows.
For example, you can specify: ${openvar}java.io.tmpdir{closevar}/hop/tempfile.txt and it will be expanded to /tmp/hop/tempfile.txt on Unix-like systems.