add documentation on how to build
panthony committed Mar 11, 2024
1 parent 166f0a9 commit 0ce50bf
Showing 1 changed file with 116 additions and 0 deletions.
116 changes: 116 additions & 0 deletions ONCRAWL.md
# Oncrawl

## Why this fork

- Add support for disabling statistics per column*

*: This is required when some columns are huge (e.g. HTML): truncation is only applied when the footer is written, and
we do not want to compute those statistics in memory while writing a file.

## How to build

### install Bison

If `configure` fails with:

> configure: error: Bison version 2.5 or higher must be installed on the system!

install a newer Bison and put it first on the `PATH`:

```bash
brew install bison
export PATH="$(brew --prefix bison)/bin:$PATH"
export LDFLAGS="-L$(brew --prefix bison)/lib"
```
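
To confirm that the newer Bison is the one picked up from the `PATH`, a version check along these lines may help. This is only a sketch: `version_ge` and `bison_version` are hypothetical helper names, not part of any tool, and the comparison assumes a `sort` that supports `-V` (version sort).

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings; if the smaller of the two is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: take the last field of the first line of `bison --version`,
# e.g. "bison (GNU Bison) 3.8.2" gives "3.8.2".
bison_version() {
  bison --version | head -n 1 | awk '{ print $NF }'
}

# Only run the check when a bison binary is on the PATH.
if command -v bison >/dev/null 2>&1; then
  if version_ge "$(bison_version)" 2.5; then
    echo "bison $(bison_version) is recent enough"
  else
    echo "bison $(bison_version) is older than 2.5; check your PATH" >&2
  fi
fi
```

If the check still reports an old version, the `export PATH` line above was probably not applied in the current shell.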


### install thrift

```bash
wget -nv https://archive.apache.org/dist/thrift/0.16.0/thrift-0.16.0.tar.gz
tar xzf thrift-0.16.0.tar.gz
cd thrift-0.16.0
chmod +x ./configure
./configure --disable-libs
sudo make install
```
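
Once the install finishes, it can be worth checking that the `thrift` found on the `PATH` is the version built above. The sketch below assumes `thrift --version` prints a single line ending with the version number (such as `Thrift version 0.16.0`); `thrift_version_of` is a hypothetical helper name.

```bash
# Version installed in the steps above.
EXPECTED_THRIFT_VERSION="0.16.0"

# Hypothetical helper: return the last field of a `thrift --version` line,
# e.g. "Thrift version 0.16.0" gives "0.16.0".
thrift_version_of() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# Only run the check when a thrift binary is on the PATH.
if command -v thrift >/dev/null 2>&1; then
  actual="$(thrift_version_of "$(thrift --version)")"
  if [ "$actual" = "$EXPECTED_THRIFT_VERSION" ]; then
    echo "thrift $actual is installed"
  else
    echo "found thrift $actual, expected $EXPECTED_THRIFT_VERSION" >&2
  fi
fi
```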

You might need to comment out an unused variable in the code for the compilation to succeed:
```
src/thrift/generate/t_java_generator.cc:5371:9: error: variable 'j' set but not used [-Werror,-Wunused-but-set-variable]
int j = 0;
^
```
See https://github.com/apache/thrift/pull/2855/files


You might also need to patch some function declarations:
```
In file included from src/thrift/generate/t_netstd_generator.cc:38:
./src/thrift/generate/t_netstd_generator.h:151:10: error: 'get_enum_class_name' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
string get_enum_class_name(t_type* type);
```

The fix is simply to mark the overriding functions with `override`, e.g.:

```cpp
void generate_struct(t_struct* tstruct) override;
void generate_xception(t_struct* txception) override;
void generate_service(t_service* tservice) override;
```
See https://lists.apache.org/thread/cmkys7y63kl9th8132y5d42c660s7o4d

### add support for artifact registry

In the root `pom.xml`, add:

```xml
<build>
  <extensions>
    <extension>
      <groupId>com.google.cloud.artifactregistry</groupId>
      <artifactId>artifactregistry-maven-wagon</artifactId>
      <version>2.2.1</version>
    </extension>
  </extensions>
</build>
```

### fix deprecated repository

In `$HOME/.m2/settings.xml`, add:

```xml
<settings
    xmlns="http://maven.apache.org/SETTINGS/1.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 http://maven.apache.org/xsd/settings-1.2.0.xsd">
  <mirrors>
    <mirror>
      <id>twitter</id>
      <url>https://maven.twttr.com</url>
      <mirrorOf>twitter</mirrorOf>
    </mirror>
    <mirror>
      <id>conjars.org</id>
      <url>https://conjars.wensel.net/repo/</url>
      <mirrorOf>conjars.org</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```


### build maven

```bash
LC_ALL=C mvn clean install -DskipTests -Dlicense.skip -Drat.skip -Djapicmp.skip -Dos.arch=x86_64
```

### publish

```bash
LC_ALL=C mvn clean deploy -DskipTests -Dlicense.skip -Drat.skip -Djapicmp.skip -Dos.arch=x86_64 -DaltDeploymentRepository=oncrawl.releases::default::artifactregistry://europe-maven.pkg.dev/oncrawl/java
```
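
The build and publish commands share most of their flags, so they could be kept in one place. The following is a minimal sketch, not part of the fork: `mvn_command`, `MVN_COMMON_FLAGS` and `DEPLOY_REPO` are hypothetical names, and the flag values are copied from the two commands above.

```bash
# Flags shared by the build and publish commands above.
MVN_COMMON_FLAGS="-DskipTests -Dlicense.skip -Drat.skip -Djapicmp.skip -Dos.arch=x86_64"

# Deployment target used by the publish command above.
DEPLOY_REPO="oncrawl.releases::default::artifactregistry://europe-maven.pkg.dev/oncrawl/java"

# Hypothetical helper: print the Maven command line for "build" or "publish".
mvn_command() {
  case "$1" in
    build)
      echo "mvn clean install $MVN_COMMON_FLAGS"
      ;;
    publish)
      echo "mvn clean deploy $MVN_COMMON_FLAGS -DaltDeploymentRepository=$DEPLOY_REPO"
      ;;
    *)
      echo "usage: mvn_command build|publish" >&2
      return 1
      ;;
  esac
}

# Print the build command so it can be reviewed before running it.
mvn_command build
```

To actually run a step, the printed command can be passed to a shell, for example `LC_ALL=C sh -c "$(mvn_command publish)"`.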
