feat: adding spotless initial
vibhatha committed May 25, 2024
1 parent 7c8ce45 commit 1af6bbb
Showing 57 changed files with 1,503 additions and 1,297 deletions.
54 changes: 41 additions & 13 deletions docs/source/developers/java/development.rst
@@ -110,30 +110,58 @@ integration tests, you would do:
Code Style
==========

Java code style is enforced with Checkstyle. The configuration is located at `checkstyle`_.
You can also just check the style without building the project.
This checks the code style of all source code under the current directory or from within an individual module.
The current Java code styles are configured as follows:

.. code-block::
- Indent: Tabs & spaces (2 spaces per tab)
- Google Java Format: Reformats Java source code to comply with `Google Java Style`_.
- Configure license headers for Java & XML files

$ mvn checkstyle:check

Maven ``pom.xml`` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
You can also just check the style without building the project.
This checks the style of all pom.xml files under the current directory or from within an individual module.
Java code style is checked by `Spotless`_ during the build, and the continuous integration build will verify
that changes adhere to the style guide.

.. code-block::
Automatically fixing code style issues
--------------------------------------

- You can also just check the style without building the project with ``mvn spotless:check``.
- The Java code style can be corrected from the command line with ``mvn spotless:apply``.

.. code-block:: bash
$ mvn spotless:check
The following files had format violations:
src/main/java/org/apache/arrow/algorithm/rank/VectorRank.java
@@ -15,7 +15,6 @@
·*·limitations·under·the·License.
·*/
This applies the style to all pom.xml files under the current directory or from within an individual module.
-
package·org.apache.arrow.algorithm.rank;
import·java.util.stream.IntStream;
Run 'mvn spotless:apply' to fix these violations.
Code Formatter for Intellij IDEA and Eclipse
--------------------------------------------

Follow the instructions for:

- `Eclipse`_
- `IntelliJ`_

Code style is enforced with Checkstyle for most of the modules. The configuration is located at `checkstyle`_.
You can also just check the style without building the project.
This checks the code style of all source code under the current directory or from within an individual module.
Checkstyle will be removed once Spotless is fully integrated.

.. code-block::
$ mvn spotless:apply
$ mvn checkstyle:check
.. _benchmark: https://github.com/ursacomputing/benchmarks
.. _archery: https://github.com/apache/arrow/blob/main/dev/conbench_envs/README.md#L188
.. _conbench: https://github.com/conbench/conbench
.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
.. _Apache Maven pom.xml guidelines: https://maven.apache.org/developers/conventions/code.html#pom-code-convention
.. _Spotless: https://github.com/diffplug/spotless
.. _Google Java Style: https://google.github.io/styleguide/javaguide.html
.. _Eclipse: https://github.com/google/google-java-format?tab=readme-ov-file#eclipse
.. _IntelliJ: https://github.com/google/google-java-format?tab=readme-ov-file#intellij-android-studio-and-other-jetbrains-ides
88 changes: 87 additions & 1 deletion java/algorithm/pom.xml
@@ -20,6 +20,10 @@
<name>Arrow Algorithms</name>
<description>(Experimental/Contrib) A collection of algorithms for working with ValueVectors.</description>

<properties>
<spotless.version>2.30.0</spotless.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.arrow</groupId>
@@ -48,5 +52,87 @@
</dependency>
</dependencies>

<build></build>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
</plugins>
</build>

<profiles>
<profile>
<id>spotless</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<build>
<plugins>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
<version>${spotless.version}</version>
<configuration>
<formats>
<format>
<!-- configure license for xml files -->
<includes>
<include>pom.xml</include>
</includes>
<licenseHeader>
<file>${maven.multiModuleProjectDirectory}/java/spotless/asf-xml.license</file>
<delimiter>(&lt;configuration|&lt;project)</delimiter>
</licenseHeader>
</format>
<format>
<!-- configure license for java files -->
<includes>
<include>**/*.java</include>
</includes>
<licenseHeader>
<file>${maven.multiModuleProjectDirectory}/java/spotless/asf-java.license</file>
<delimiter>package</delimiter>
</licenseHeader>
</format>
</formats>
<java>
<googleJavaFormat>
<version>1.17.0</version>
<style>GOOGLE</style>
</googleJavaFormat>
</java>
<pom>
<indent>
<tabs>true</tabs>
<spacesPerTab>2</spacesPerTab>
</indent>
<indent>
<spaces>true</spaces>
<spacesPerTab>2</spacesPerTab>
</indent>
<sortPom>
<expandEmptyElements>false</expandEmptyElements>
</sortPom>
</pom>
</configuration>
<executions>
<execution>
<id>spotless-check</id>
<goals>
<goal>apply</goal>
<goal>check</goal>
</goals>
<phase>validate</phase>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
@@ -14,7 +14,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.algorithm.deduplicate;

import org.apache.arrow.memory.ArrowBuf;
@@ -26,18 +25,18 @@
import org.apache.arrow.vector.compare.RangeEqualsVisitor;
import org.apache.arrow.vector.util.DataSizeRoundingUtil;

/**
* Utilities for vector deduplication.
*/
/** Utilities for vector deduplication. */
class DeduplicationUtils {

/**
* Gets the start positions of the first distinct values in a vector.
*
* @param vector the target vector.
* @param runStarts the bit set to hold the start positions.
* @param <V> vector type.
*/
public static <V extends ValueVector> void populateRunStartIndicators(V vector, ArrowBuf runStarts) {
public static <V extends ValueVector> void populateRunStartIndicators(
V vector, ArrowBuf runStarts) {
int bufSize = DataSizeRoundingUtil.divideBy8Ceil(vector.getValueCount());
Preconditions.checkArgument(runStarts.capacity() >= bufSize);
runStarts.setZero(0, bufSize);
@@ -55,6 +54,7 @@ public static <V extends ValueVector> void populateRunStartIndicators(V vector,

/**
* Gets the run lengths, given the start positions.
*
* @param runStarts the bit set for start positions.
* @param runLengths the run length vector to populate.
* @param valueCount the number of values in the bit set.
@@ -76,15 +76,15 @@ public static void populateRunLengths(ArrowBuf runStarts, IntVector runLengths,
}

/**
* Gets distinct values from the input vector by removing adjacent
* duplicated values.
* Gets distinct values from the input vector by removing adjacent duplicated values.
*
* @param indicators the bit set containing the start positions of distinct values.
* @param inputVector the input vector.
* @param outputVector the output vector.
* @param <V> vector type.
*/
public static <V extends ValueVector> void populateDeduplicatedValues(
ArrowBuf indicators, V inputVector, V outputVector) {
ArrowBuf indicators, V inputVector, V outputVector) {
int dstIdx = 0;
for (int srcIdx = 0; srcIdx < inputVector.getValueCount(); srcIdx++) {
if (BitVectorHelper.get(indicators, srcIdx) != 0) {
Expand Down
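Note for readers of this diff: the Javadoc in DeduplicationUtils describes three steps (mark run starts, derive run lengths, copy the first value of each run). The following is a minimal plain-Java sketch of that idea, not the Arrow implementation. It uses `int[]` and `boolean[]` in place of `ValueVector` and `ArrowBuf`, the class and method names (`RunDedupSketch`, `runStarts`, `runLengths`, `deduplicate`) are hypothetical, and it assumes that index 0 always counts as a run start.

```java
import java.util.Arrays;

/** Plain-array sketch of run deduplication; hypothetical names, no Arrow classes. */
public class RunDedupSketch {

  /** Flags index i when it starts a run: i == 0 (assumed) or values[i] != values[i - 1]. */
  public static boolean[] runStarts(int[] values) {
    boolean[] starts = new boolean[values.length];
    for (int i = 0; i < values.length; i++) {
      starts[i] = (i == 0) || values[i] != values[i - 1];
    }
    return starts;
  }

  /** Derives the length of each run from the run-start flags. */
  public static int[] runLengths(boolean[] starts) {
    int runCount = 0;
    for (boolean start : starts) {
      if (start) {
        runCount++;
      }
    }
    int[] lengths = new int[runCount];
    int run = -1;
    for (boolean start : starts) {
      if (start) {
        run++;
      }
      if (run >= 0) { // skip positions before the first flagged start
        lengths[run]++;
      }
    }
    return lengths;
  }

  /** Copies the first value of each run, mirroring the srcIdx/dstIdx loop in the diff. */
  public static int[] deduplicate(int[] values, boolean[] starts) {
    int[] out = new int[values.length];
    int dstIdx = 0;
    for (int srcIdx = 0; srcIdx < values.length; srcIdx++) {
      if (starts[srcIdx]) {
        out[dstIdx++] = values[srcIdx];
      }
    }
    return Arrays.copyOf(out, dstIdx);
  }

  public static void main(String[] args) {
    int[] values = {1, 1, 2, 2, 2, 3, 1};
    boolean[] starts = runStarts(values);
    System.out.println(Arrays.toString(runLengths(starts)));
    System.out.println(Arrays.toString(deduplicate(values, starts)));
  }
}
```

Because only adjacent values are compared, the trailing `1` in the sample input forms its own run; as the VectorRunDeduplicator Javadoc notes, all duplicates are removed only when the input is sorted.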
@@ -14,7 +14,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.algorithm.deduplicate;

import org.apache.arrow.memory.ArrowBuf;
@@ -26,29 +25,28 @@
import org.apache.arrow.vector.util.DataSizeRoundingUtil;

/**
* Remove adjacent equal elements from a vector.
* If the vector is sorted, it removes all duplicated values in the vector.
* Remove adjacent equal elements from a vector. If the vector is sorted, it removes all duplicated
* values in the vector.
*
* @param <V> vector type.
*/
public class VectorRunDeduplicator<V extends ValueVector> implements AutoCloseable {

/**
* Bit set for distinct values.
* If the value at some index is not equal to the previous value,
* its bit is set to 1, otherwise its bit is set to 0.
* Bit set for distinct values. If the value at some index is not equal to the previous value, its
* bit is set to 1, otherwise its bit is set to 0.
*/
private ArrowBuf distinctValueBuffer;

/**
* The vector to deduplicate.
*/
/** The vector to deduplicate. */
private final V vector;

private final BufferAllocator allocator;

/**
* Constructs a vector run deduplicator for a given vector.
* @param vector the vector to deduplicate. Ownership is NOT taken.
*
* @param vector the vector to deduplicate. Ownership is NOT taken.
* @param allocator the allocator used for allocating buffers for start indices.
*/
public VectorRunDeduplicator(V vector, BufferAllocator allocator) {
@@ -65,17 +63,20 @@ private void createDistinctValueBuffer() {

/**
* Gets the number of values which are different from their predecessor.
*
* @return the run count.
*/
public int getRunCount() {
if (distinctValueBuffer == null) {
createDistinctValueBuffer();
}
return vector.getValueCount() - BitVectorHelper.getNullCount(distinctValueBuffer, vector.getValueCount());
return vector.getValueCount()
- BitVectorHelper.getNullCount(distinctValueBuffer, vector.getValueCount());
}

/**
* Gets the vector with adjacent duplicated values removed.
*
* @param outVector the output vector.
*/
public void populateDeduplicatedValues(V outVector) {
@@ -88,14 +89,16 @@ public void populateDeduplicatedValues(V outVector) {

/**
* Gets the length of each distinct value.
*
* @param lengthVector the vector for holding length values.
*/
public void populateRunLengths(IntVector lengthVector) {
if (distinctValueBuffer == null) {
createDistinctValueBuffer();
}

DeduplicationUtils.populateRunLengths(distinctValueBuffer, lengthVector, vector.getValueCount());
DeduplicationUtils.populateRunLengths(
distinctValueBuffer, lengthVector, vector.getValueCount());
}

@Override
@@ -14,33 +14,31 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.algorithm.dictionary;

import org.apache.arrow.vector.ValueVector;

/**
* A dictionary builder is intended for the scenario frequently encountered in practice:
* the dictionary is not known a priori, so it is generated dynamically.
* In particular, when a new value arrives, it is tested to check if it is already
* in the dictionary. If so, it is simply neglected, otherwise, it is added to the dictionary.
* <p>
* The dictionary builder is intended to build a single dictionary.
* So it cannot be used for different dictionaries.
* </p>
* A dictionary builder is intended for the scenario frequently encountered in practice: the
* dictionary is not known a priori, so it is generated dynamically. In particular, when a new value
* arrives, it is tested to check if it is already in the dictionary. If so, it is simply ignored;
* otherwise, it is added to the dictionary.
*
* <p>The dictionary builder is intended to build a single dictionary. So it cannot be used for
* different dictionaries.
*
* <p>The sample code below shows how to use the dictionary builder:
*
* <pre>{@code
* DictionaryBuilder dictionaryBuilder = ...
* ...
* dictionaryBuilder.addValue(newValue);
* ...
* }</pre>
* </p>
* <p>
* With the above code, the dictionary vector will be populated,
* and it can be retrieved by the {@link DictionaryBuilder#getDictionary()} method.
* After that, dictionary encoding can proceed with the populated dictionary.
* </p>
*
* <p>With the above code, the dictionary vector will be populated, and it can be retrieved by the
* {@link DictionaryBuilder#getDictionary()} method. After that, dictionary encoding can proceed
* with the populated dictionary.
*
* @param <V> the dictionary vector type.
*/
@@ -58,7 +56,7 @@ public interface DictionaryBuilder<V extends ValueVector> {
* Try to add an element from the target vector to the dictionary.
*
* @param targetVector the target vector containing new element.
* @param targetIndex the index of the new element in the target vector.
* @param targetIndex the index of the new element in the target vector.
* @return the index of the new element in the dictionary.
*/
int addValue(V targetVector, int targetIndex);
Expand Down
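Note for readers of this diff: the DictionaryBuilder Javadoc describes building a dictionary on the fly, where a value is added only if it is not already present. A minimal sketch of that behaviour in plain Java might look like the following. It is not the Arrow interface: the class `DictionaryBuilderSketch` is hypothetical, it stores `String` values in a list instead of a `ValueVector`, and it assumes that adding an already-present value returns the index of the existing entry, which the diff shown here does not confirm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a dynamically built dictionary; no Arrow classes involved. */
public class DictionaryBuilderSketch {

  // Maps each distinct value to its position in the dictionary.
  private final Map<String, Integer> indexByValue = new HashMap<>();

  // Distinct values in first-seen order.
  private final List<String> dictionary = new ArrayList<>();

  /**
   * Adds the value if it is not yet in the dictionary.
   *
   * @return the index of the value in the dictionary (assumed behaviour for existing values).
   */
  public int addValue(String value) {
    Integer existing = indexByValue.get(value);
    if (existing != null) {
      return existing; // already known: nothing is added
    }
    int index = dictionary.size();
    dictionary.add(value);
    indexByValue.put(value, index);
    return index;
  }

  /** Returns the dictionary populated so far. */
  public List<String> getDictionary() {
    return dictionary;
  }

  public static void main(String[] args) {
    DictionaryBuilderSketch builder = new DictionaryBuilderSketch();
    for (String value : new String[] {"a", "b", "a", "c", "b"}) {
      builder.addValue(value);
    }
    System.out.println(builder.getDictionary());
  }
}
```

The hash map keeps the membership test at expected constant time per value, which matters when the builder is fed one element at a time as in the Javadoc sample.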