feat: adding spotless initial

vibhatha · May 25, 2024 · 1af6bbb
1 parent 7c8ce45
Showing 57 changed files with 1,503 additions and 1,297 deletions.
diff --git a/docs/source/developers/java/development.rst b/docs/source/developers/java/development.rst
@@ -110,30 +110,58 @@ integration tests, you would do:
 Code Style
 ==========
 
-Java code style is enforced with Checkstyle. The configuration is located at `checkstyle`_.
-You can also just check the style without building the project.
-This checks the code style of all source code under the current directory or from within an individual module.
+The current Java code styles are configured as follows:
 
-.. code-block::
+- Indent: Tabs & spaces (2 spaces per tab)
+- Google Java Format: Reformats Java source code to comply with `Google Java Style`_.
+- Configure license headers for Java & XML files
 
-    $ mvn checkstyle:check
 
-Maven ``pom.xml`` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
-You can also just check the style without building the project.
-This checks the style of all pom.xml files under the current directory or from within an individual module.
+Java code style is checked by `Spotless`_ during the build, and the continuous integration build will verify
+that changes adhere to the style guide.
 
-.. code-block::
+Automatically fixing code style issues
+--------------------------------------
+
+- You can check the style without building the project by running ``mvn spotless:check``.
+- Code style violations can be fixed from the command line by running ``mvn spotless:apply``.
+
+.. code-block:: bash
 
-    $ mvn spotless:check
+    The following files had format violations:
+        src/main/java/org/apache/arrow/algorithm/rank/VectorRank.java
+            @@ -15,7 +15,6 @@
+            ·*·limitations·under·the·License.
+            ·*/
 
-This applies the style to all pom.xml files under the current directory or from within an individual module.
+            -
+            package·org.apache.arrow.algorithm.rank;
+
+            import·java.util.stream.IntStream;
+    Run 'mvn spotless:apply' to fix these violations.
+
+Code Formatter for Intellij IDEA and Eclipse
+--------------------------------------------
+
+Follow the instructions for:
+
+- `Eclipse`_
+- `IntelliJ`_
+
+Code style is enforced with Checkstyle for most of the modules. The configuration is located at `checkstyle`_.
+You can also just check the style without building the project.
+This checks the code style of all source code under the current directory or from within an individual module.
+Checkstyle will be removed once Spotless is fully integrated.
 
 .. code-block::
 
-    $ mvn spotless:apply
+    $ mvn checkstyle:check
 
 .. _benchmark: https://github.com/ursacomputing/benchmarks
 .. _archery: https://github.com/apache/arrow/blob/main/dev/conbench_envs/README.md#L188
 .. _conbench: https://github.com/conbench/conbench
 .. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
-.. _Apache Maven pom.xml guidelines: https://maven.apache.org/developers/conventions/code.html#pom-code-convention
+.. _Spotless: https://github.com/diffplug/spotless
+.. _Google Java Style: https://google.github.io/styleguide/javaguide.html
+.. _Eclipse: https://github.com/google/google-java-format?tab=readme-ov-file#eclipse
+.. _IntelliJ: https://github.com/google/google-java-format?tab=readme-ov-file#intellij-android-studio-and-other-jetbrains-ides
diff --git a/java/algorithm/pom.xml b/java/algorithm/pom.xml
@@ -20,6 +20,10 @@
   <name>Arrow Algorithms</name>
   <description>(Experimental/Contrib) A collection of algorithms for working with ValueVectors.</description>
 
+  <properties>
+    <spotless.version>2.30.0</spotless.version>
+  </properties>
+
   <dependencies>
     <dependency>
       <groupId>org.apache.arrow</groupId>
@@ -48,5 +52,87 @@
     </dependency>
   </dependencies>
 
-  <build></build>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-checkstyle-plugin</artifactId>
+        <version>3.1.0</version>
+        <configuration>
+          <skip>true</skip>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+
+  <profiles>
+    <profile>
+      <id>spotless</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>com.diffplug.spotless</groupId>
+            <artifactId>spotless-maven-plugin</artifactId>
+            <version>${spotless.version}</version>
+            <configuration>
+              <formats>
+                <format>
+                  <!-- configure license for xml files -->
+                  <includes>
+                    <include>pom.xml</include>
+                  </includes>
+                  <licenseHeader>
+                    <file>${maven.multiModuleProjectDirectory}/java/spotless/asf-xml.license</file>
+                    <delimiter>(&lt;configuration|&lt;project)</delimiter>
+                  </licenseHeader>
+                </format>
+                <format>
+                  <!-- configure license for java files -->
+                  <includes>
+                    <include>**/*.java</include>
+                  </includes>
+                  <licenseHeader>
+                    <file>${maven.multiModuleProjectDirectory}/java/spotless/asf-java.license</file>
+                    <delimiter>package</delimiter>
+                  </licenseHeader>
+                </format>
+              </formats>
+              <java>
+                <googleJavaFormat>
+                  <version>1.17.0</version>
+                  <style>GOOGLE</style>
+                </googleJavaFormat>
+              </java>
+              <pom>
+                <indent>
+                  <tabs>true</tabs>
+                  <spacesPerTab>2</spacesPerTab>
+                </indent>
+                <indent>
+                  <spaces>true</spaces>
+                  <spacesPerTab>2</spacesPerTab>
+                </indent>
+                <sortPom>
+                  <expandEmptyElements>false</expandEmptyElements>
+                </sortPom>
+              </pom>
+            </configuration>
+            <executions>
+              <execution>
+                <id>spotless-check</id>
+                <goals>
+                  <goal>apply</goal>
+                  <goal>check</goal>
+                </goals>
+                <phase>validate</phase>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
+  </profiles>
 </project>
diff --git a/java/algorithm/src/main/java/org/apache/arrow/algorithm/deduplicate/DeduplicationUtils.java b/java/algorithm/src/main/java/org/apache/arrow/algorithm/deduplicate/DeduplicationUtils.java
@@ -14,7 +14,6 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-
 package org.apache.arrow.algorithm.deduplicate;
 
 import org.apache.arrow.memory.ArrowBuf;
@@ -26,18 +25,18 @@
 import org.apache.arrow.vector.compare.RangeEqualsVisitor;
 import org.apache.arrow.vector.util.DataSizeRoundingUtil;
 
-/**
- * Utilities for vector deduplication.
- */
+/** Utilities for vector deduplication. */
 class DeduplicationUtils {
 
   /**
    * Gets the start positions of the first distinct values in a vector.
+   *
    * @param vector the target vector.
    * @param runStarts the bit set to hold the start positions.
    * @param <V> vector type.
    */
-  public static <V extends ValueVector> void populateRunStartIndicators(V vector, ArrowBuf runStarts) {
+  public static <V extends ValueVector> void populateRunStartIndicators(
+      V vector, ArrowBuf runStarts) {
     int bufSize = DataSizeRoundingUtil.divideBy8Ceil(vector.getValueCount());
     Preconditions.checkArgument(runStarts.capacity() >= bufSize);
     runStarts.setZero(0, bufSize);
@@ -55,6 +54,7 @@ public static <V extends ValueVector> void populateRunStartIndicators(V vector,
 
   /**
    * Gets the run lengths, given the start positions.
+   *
    * @param runStarts the bit set for start positions.
    * @param runLengths the run length vector to populate.
    * @param valueCount the number of values in the bit set.
@@ -76,15 +76,15 @@ public static void populateRunLengths(ArrowBuf runStarts, IntVector runLengths,
   }
 
   /**
-   * Gets distinct values from the input vector by removing adjacent
-   * duplicated values.
+   * Gets distinct values from the input vector by removing adjacent duplicated values.
+   *
    * @param indicators the bit set containing the start positions of distinct values.
    * @param inputVector the input vector.
    * @param outputVector the output vector.
    * @param <V> vector type.
    */
   public static <V extends ValueVector> void populateDeduplicatedValues(
-          ArrowBuf indicators, V inputVector, V outputVector) {
+      ArrowBuf indicators, V inputVector, V outputVector) {
     int dstIdx = 0;
     for (int srcIdx = 0; srcIdx < inputVector.getValueCount(); srcIdx++) {
       if (BitVectorHelper.get(indicators, srcIdx) != 0) {

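Reviewer note: the Javadoc in this file describes marking the start of each run and then copying only the marked values. A minimal sketch of that idea follows, assuming a plain `int[]` input and a `boolean[]` in place of the `ArrowBuf` bit set. The class `RunStartSketch` and its methods are hypothetical and are not part of the Arrow API; treating index 0 as always starting a run is also an assumption of the sketch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the bit-set based logic; this is not Arrow code. */
public class RunStartSketch {

  /** Marks index i when values[i] differs from values[i - 1]; index 0 is assumed to start a run. */
  public static boolean[] runStarts(int[] values) {
    boolean[] starts = new boolean[values.length];
    for (int i = 0; i < values.length; i++) {
      starts[i] = (i == 0) || values[i] != values[i - 1];
    }
    return starts;
  }

  /** Copies only the values at marked positions, which removes adjacent duplicates. */
  public static int[] deduplicate(int[] values, boolean[] starts) {
    int[] out = new int[values.length];
    int dstIdx = 0;
    for (int srcIdx = 0; srcIdx < values.length; srcIdx++) {
      if (starts[srcIdx]) {
        out[dstIdx++] = values[srcIdx];
      }
    }
    // Trim the output to the number of values actually copied.
    return Arrays.copyOf(out, dstIdx);
  }

  public static void main(String[] args) {
    int[] values = {1, 1, 2, 2, 2, 3, 1};
    System.out.println(Arrays.toString(deduplicate(values, runStarts(values))));
  }
}
```

Because only adjacent duplicates are removed, the trailing `1` in the sample input survives; sorting the input first would remove all duplicates, as the `VectorRunDeduplicator` Javadoc notes.
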
diff --git a/...algorithm/src/main/java/org/apache/arrow/algorithm/deduplicate/VectorRunDeduplicator.java b/...algorithm/src/main/java/org/apache/arrow/algorithm/deduplicate/VectorRunDeduplicator.java
@@ -14,7 +14,6 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-
 package org.apache.arrow.algorithm.deduplicate;
 
 import org.apache.arrow.memory.ArrowBuf;
@@ -26,29 +25,28 @@
 import org.apache.arrow.vector.util.DataSizeRoundingUtil;
 
 /**
- * Remove adjacent equal elements from a vector.
- * If the vector is sorted, it removes all duplicated values in the vector.
+ * Remove adjacent equal elements from a vector. If the vector is sorted, it removes all duplicated
+ * values in the vector.
+ *
  * @param <V> vector type.
  */
 public class VectorRunDeduplicator<V extends ValueVector> implements AutoCloseable {
 
   /**
-   * Bit set for distinct values.
-   * If the value at some index is not equal to the previous value,
-   * its bit is set to 1, otherwise its bit is set to 0.
+   * Bit set for distinct values. If the value at some index is not equal to the previous value, its
+   * bit is set to 1, otherwise its bit is set to 0.
    */
   private ArrowBuf distinctValueBuffer;
 
-  /**
-   * The vector to deduplicate.
-   */
+  /** The vector to deduplicate. */
   private final V vector;
 
   private final BufferAllocator allocator;
 
   /**
    * Constructs a vector run deduplicator for a given vector.
-   * @param vector the vector to deduplicate.  Ownership is NOT taken.
+   *
+   * @param vector the vector to deduplicate. Ownership is NOT taken.
    * @param allocator the allocator used for allocating buffers for start indices.
    */
   public VectorRunDeduplicator(V vector, BufferAllocator allocator) {
@@ -65,17 +63,20 @@ private void createDistinctValueBuffer() {
 
   /**
    * Gets the number of values which are different from their predecessor.
+   *
    * @return the run count.
    */
   public int getRunCount() {
     if (distinctValueBuffer == null) {
       createDistinctValueBuffer();
     }
-    return vector.getValueCount() - BitVectorHelper.getNullCount(distinctValueBuffer, vector.getValueCount());
+    return vector.getValueCount()
+        - BitVectorHelper.getNullCount(distinctValueBuffer, vector.getValueCount());
   }
 
   /**
    * Gets the vector with deduplicated adjacent values removed.
+   *
    * @param outVector the output vector.
    */
   public void populateDeduplicatedValues(V outVector) {
@@ -88,14 +89,16 @@ public void populateDeduplicatedValues(V outVector) {
 
   /**
    * Gets the length of each distinct value.
+   *
    * @param lengthVector the vector for holding length values.
    */
   public void populateRunLengths(IntVector lengthVector) {
     if (distinctValueBuffer == null) {
       createDistinctValueBuffer();
     }
 
-    DeduplicationUtils.populateRunLengths(distinctValueBuffer, lengthVector, vector.getValueCount());
+    DeduplicationUtils.populateRunLengths(
+        distinctValueBuffer, lengthVector, vector.getValueCount());
   }
 
   @Override

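Reviewer note: `getRunCount` subtracts the number of unset bits from the value count, which is the same as counting the set bits in the distinct-value buffer. The sketch below illustrates that relationship and the run-length computation, assuming a `boolean[]` in place of the `ArrowBuf` and assuming the first position of a non-empty input always starts a run. `RunLengthSketch` and its methods are hypothetical names, not Arrow API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of run counts and run lengths; this is not Arrow code. */
public class RunLengthSketch {

  /** Run count = total positions minus unmarked positions, i.e. the number of marked positions. */
  public static int runCount(boolean[] starts) {
    int unmarked = 0;
    for (boolean start : starts) {
      if (!start) {
        unmarked++;
      }
    }
    return starts.length - unmarked;
  }

  /** Length of each run: the distance between consecutive marked positions. */
  public static List<Integer> runLengths(boolean[] starts) {
    List<Integer> lengths = new ArrayList<>();
    int lastStart = 0; // assumes position 0 starts the first run
    for (int i = 1; i < starts.length; i++) {
      if (starts[i]) {
        lengths.add(i - lastStart);
        lastStart = i;
      }
    }
    if (starts.length > 0) {
      lengths.add(starts.length - lastStart); // close the final run
    }
    return lengths;
  }

  public static void main(String[] args) {
    boolean[] starts = {true, false, true, false, false, true, true};
    System.out.println(runCount(starts) + " runs with lengths " + runLengths(starts));
  }
}
```
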
diff --git a/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/DictionaryBuilder.java b/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/DictionaryBuilder.java
@@ -14,33 +14,31 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-
 package org.apache.arrow.algorithm.dictionary;
 
 import org.apache.arrow.vector.ValueVector;
 
 /**
- * A dictionary builder is intended for the scenario frequently encountered in practice:
- * the dictionary is not known a priori, so it is generated dynamically.
- * In particular, when a new value arrives, it is tested to check if it is already
- * in the dictionary. If so, it is simply neglected, otherwise, it is added to the dictionary.
- * <p>
- *   The dictionary builder is intended to build a single dictionary.
- *   So it cannot be used for different dictionaries.
- * </p>
+ * A dictionary builder is intended for the scenario frequently encountered in practice: the
+ * dictionary is not known a priori, so it is generated dynamically. In particular, when a new value
+ * arrives, it is tested to check if it is already in the dictionary. If so, it is simply neglected,
+ * otherwise, it is added to the dictionary.
+ *
+ * <p>The dictionary builder is intended to build a single dictionary. So it cannot be used for
+ * different dictionaries.
+ *
  * <p>Below gives the sample code for using the dictionary builder
+ *
  * <pre>{@code
  * DictionaryBuilder dictionaryBuilder = ...
  * ...
  * dictionaryBuild.addValue(newValue);
  * ...
  * }</pre>
- * </p>
- * <p>
- *   With the above code, the dictionary vector will be populated,
- *   and it can be retrieved by the {@link DictionaryBuilder#getDictionary()} method.
- *   After that, dictionary encoding can proceed with the populated dictionary..
- * </p>
+ *
+ * <p>With the above code, the dictionary vector will be populated, and it can be retrieved by the
+ * {@link DictionaryBuilder#getDictionary()} method. After that, dictionary encoding can proceed
+ * with the populated dictionary..
  *
  * @param <V> the dictionary vector type.
  */
@@ -58,7 +56,7 @@ public interface DictionaryBuilder<V extends ValueVector> {
    * Try to add an element from the target vector to the dictionary.
    *
    * @param targetVector the target vector containing new element.
-   * @param targetIndex  the index of the new element in the target vector.
+   * @param targetIndex the index of the new element in the target vector.
    * @return the index of the new element in the dictionary.
    */
   int addValue(V targetVector, int targetIndex);
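Reviewer note: the `DictionaryBuilder` Javadoc describes building a dictionary dynamically, where a value already present is skipped and a new value is appended. A minimal sketch of that behavior, assuming `String` values and standard collections instead of Arrow vectors; `DictionaryBuilderSketch` is a hypothetical class and does not implement the Arrow interface. Returning the existing index for a repeated value is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dynamic dictionary building; this is not Arrow code. */
public class DictionaryBuilderSketch {
  private final Map<String, Integer> indexByValue = new HashMap<>();
  private final List<String> dictionary = new ArrayList<>();

  /** Adds the value if it is not yet in the dictionary and returns its dictionary index. */
  public int addValue(String value) {
    Integer existing = indexByValue.get(value);
    if (existing != null) {
      return existing; // already present: the dictionary is left unchanged
    }
    int index = dictionary.size();
    dictionary.add(value);
    indexByValue.put(value, index);
    return index;
  }

  /** Returns the dictionary populated so far, in insertion order. */
  public List<String> getDictionary() {
    return dictionary;
  }

  public static void main(String[] args) {
    DictionaryBuilderSketch builder = new DictionaryBuilderSketch();
    for (String value : new String[] {"a", "b", "a", "c", "b"}) {
      builder.addValue(value);
    }
    System.out.println(builder.getDictionary());
  }
}
```
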