After the download finishes, extract the source archive, then edit the pom file and add the following repository:
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
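Before building, it is worth confirming the repository is reachable from the build host; a small sketch (assumes curl and outbound HTTPS access):

```shell
# Reachability check for the Cloudera repository added above.
repo="https://repository.cloudera.com/artifactory/cloudera-repos"
if curl -sfI "$repo/" >/dev/null; then
  echo "repo reachable"
else
  echo "repo not reachable - check network/proxy before building"
fi
```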
Update the version properties to match the CDH components; the exact component versions can be found in the CDH web UI, as shown in the screenshot below:
<lucene-solr.version>7.4.0-cdh6.3.2</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<calcite.version>1.16.0</calcite.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
<falcon.version>0.8</falcon.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
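A quick way to confirm the overrides took effect is to grep the POM for the CDH versions. A minimal sketch against a sample fragment (in the real tree, point the grep at apache-atlas-sources-2.1.0/pom.xml):

```shell
# Sketch: count the CDH-suffixed version properties in a POM fragment.
cat > /tmp/pom-fragment.xml <<'EOF'
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
EOF
grep -cE '<(hadoop|hbase|hive)\.version>.*cdh6\.3\.2' /tmp/pom-fragment.xml
# prints 3 when all three properties carry the CDH suffix
```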
Source files that need to be patched, under apache-atlas-sources-2.1.0/addons/hive-bridge:
- org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 577
public static String getDatabaseName(Database hiveDB) {
    String dbName = hiveDB.getName().toLowerCase();
    // String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
    String catalogName = null;
    ...
- org/apache/atlas/hive/hook/AtlasHiveHookContext.java, line 81
HiveHookObjectNamesCache knownObjects,
HiveMetastoreHook metastoreHook, ListenerEvent listenerEvent) throws Exception {
this.hook = hook;
this.hiveOperation = hiveOperation;
this.hiveContext = hiveContext;
this.hive = hiveContext != null ? Hive.get(hiveContext.getConf()) : null;
this.knownObjects = knownObjects;
this.metastoreHook = metastoreHook;
this.metastoreEvent = listenerEvent;
// this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
this.metastoreHandler = null;
If you use external HBase and Solr, build with the following command:
$ mvn clean -DskipTests package -Pdist
If you use the HBase and Solr bundled with Atlas, build with the following command:
$ mvn clean -DskipTests package -Pdist,embedded-hbase-solr
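Either way, a successful build leaves the server and hook bundles under distro/target of the source tree (standard Atlas source layout); a hedged check, run from the source root:

```shell
# List the built bundles; prints a hint instead of failing when run elsewhere.
ls distro/target/apache-atlas-*.tar.gz 2>/dev/null \
  || echo "no bundles found - run from apache-atlas-sources-2.1.0 after the build"
```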
$ tar -zxvf apache-atlas-2.1.0-bin.tar.gz
# Distribute to the worker nodes
# scp -r /opt/atlas root@kino2:/opt/
# scp -r /opt/atlas root@kino3:/opt/
$ vim atlas-application.properties
# Point Atlas graph storage at the HBase ZooKeeper quorum
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=atlas
atlas.graph.storage.hostname=kino1:2181,kino2:2181,kino3:2181
$ ln -s /etc/hbase/conf/ /opt/atlas-2.1.0/conf/hbase
$ vim atlas-env.sh
# Path to the HBase configuration files
export HBASE_CONF_DIR=/etc/hbase/conf
$ vim atlas-application.properties
# Update the following setting
atlas.graph.index.search.solr.zookeeper-url=kino1:2181/solr,kino2:2181/solr,kino3:2181/solr
$ cp -r solr /opt/cloudera/parcels/CDH/lib/solr/
$ cd /opt/cloudera/parcels/CDH/lib/solr/
$ mv solr atlas_conf
# Distribute atlas_conf to the other worker nodes
$ scp -r atlas_conf root@kino2:/opt/cloudera/parcels/CDH/lib/solr/
$ scp -r atlas_conf root@kino3:/opt/cloudera/parcels/CDH/lib/solr/
# In /etc/passwd, change the solr entry to give it a login shell:
solr:x:990:988:Solr:/var/lib/solr:/bin/bash # /sbin/nologin -> /bin/bash
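The field being changed is the seventh (login shell) column of the passwd entry; a local sketch of the same edit with awk, applied to a copy of the line (on the real host, `usermod -s /bin/bash solr` is the safer equivalent):

```shell
# Replace the login-shell field of the solr passwd entry.
echo 'solr:x:990:988:Solr:/var/lib/solr:/sbin/nologin' |
  awk -F: 'BEGIN{OFS=":"} $1=="solr"{$7="/bin/bash"} {print}'
# prints solr:x:990:988:Solr:/var/lib/solr:/bin/bash
```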
$ su - solr
Run the following on a node where Solr is installed:
$ /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c vertex_index -p 8983 -d /opt/cloudera/parcels/CDH/lib/solr/atlas_conf -shards 3 -replicationFactor 2
INFO - 2021-10-25 18:23:28.091; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Created collection 'vertex_index' with 3 shard(s), 2 replica(s) with config-set 'vertex_index'
$ /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas_conf -shards 3 -replicationFactor 2
INFO - 2021-10-25 18:23:43.906; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Created collection 'edge_index' with 3 shard(s), 2 replica(s) with config-set 'edge_index'
$ /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas_conf -shards 3 -replicationFactor 2
INFO - 2021-10-25 18:23:51.357; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Created collection 'fulltext_index' with 3 shard(s), 2 replica(s) with config-set 'fulltext_index'
- shards 3: the collection is split into 3 shards
- replicationFactor 2: each shard keeps 2 replicas
- vertex_index, edge_index, fulltext_index: the collection names
Note: to delete a collection such as vertex_index, edge_index, or fulltext_index, run:
$ /opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c ${collection_name}
Log in to the Solr web console at http://kino1:8983/solr/#/~cloud; you should see something like the screenshot below:
$ vim atlas-application.properties
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=kino1:2281,kino2:2281,kino3:2281
atlas.kafka.bootstrap.servers=kino1:9092,kino2:9092,kino3:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=true
$ kafka-topics --zookeeper kino1:2281,kino2:2281,kino3:2281 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
$ kafka-topics --zookeeper kino1:2281,kino2:2281,kino3:2281 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
$ kafka-topics --zookeeper kino1:2281,kino2:2281,kino3:2281 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
$ kafka-topics --list --zookeeper kino1:2281,kino2:2281,kino3:2281
ATLAS_ENTITIES
ATLAS_HOOK
_HOATLASOK
__consumer_offsets
....
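To confirm the partition count and replication factor actually match what was requested, one of the topics can be described (a sketch; assumes the CDH kafka-topics CLI is on the PATH of a gateway node):

```shell
# Describe the hook topic when the CLI is available; otherwise print a hint.
if command -v kafka-topics >/dev/null 2>&1; then
  kafka-topics --describe --zookeeper kino1:2281,kino2:2281,kino3:2281 --topic ATLAS_HOOK
else
  echo "kafka-topics not on PATH - run on a Kafka gateway node"
fi
```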
$ vim atlas-application.properties
# Update the REST address
atlas.rest.address=http://kino1:21000
# Uncomment this line
atlas.server.run.setup.on.start=false
# Update the ZooKeeper address
atlas.audit.hbase.zookeeper.quorum=kino1:2281,kino2:2281,kino3:2281
$ vim atlas-log4j.xml
# Uncomment the following block
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
    <param name="datePattern" value="'.'yyyy-MM-dd" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d|%t|%m%n" />
    </layout>
</appender>

<logger name="org.apache.atlas.perf" additivity="false">
    <level value="debug" />
    <appender-ref ref="perf_appender" />
</logger>
$ scp atlas-application.properties root@kino2:/opt/atlas/conf/
$ scp atlas-application.properties root@kino3:/opt/atlas/conf/
$ scp atlas-log4j.xml root@kino2:/opt/atlas/conf/
$ scp atlas-log4j.xml root@kino3:/opt/atlas/conf/
$ cd /opt/atlas/atlas-2.1.0
$ bin/atlas_start.py
starting atlas on host localhost
starting atlas on port 21000
Apache Atlas Server started!!!
Log in to the Atlas web console at http://kino1:21000 to verify the server started successfully.
The default username and password are both admin.
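The startup can also be probed from the command line via the Atlas admin version endpoint (a sketch; kino1 and the admin/admin credentials are the values used throughout this guide):

```shell
# Probe the Atlas REST API; degrade gracefully when the host is unreachable.
atlas_url="http://kino1:21000/api/atlas/admin/version"
curl -sf -u admin:admin "$atlas_url" || echo "Atlas not reachable at $atlas_url"
```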
All configuration files and jars in this step must be distributed to every node that runs Hive.
$ vim atlas-application.properties
# Add the following
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
Upload the compiled apache-atlas-2.1.0-hive-hook.tar.gz to the server, then extract it:
$ tar -zxvf apache-atlas-2.1.0-hive-hook.tar.gz
Copy the hook and hook-bin directories into /app/atlas-2.1.0:
$ cp -r hook hook-bin /app/atlas-2.1.0
Add the atlas-application.properties file to atlas-plugin-classloader-2.1.0.jar:
$ cd /app/atlas-2.1.0/hook/hive
$ cp /app/atlas-2.1.0/conf/atlas-application.properties ./
$ zip -u atlas-plugin-classloader-2.1.0.jar atlas-application.properties
$ cp /app/atlas-2.1.0/conf/atlas-application.properties /etc/hive/conf/
$ rm -rf ./atlas-application.properties
Reason: do not follow the official docs here, which copy the configuration file into Hive's conf directory. Done that way, the hook never found atlas-application.properties; reading the source shows the file is loaded from the classpath, which is why it is packed into the root of the jar instead.
Edit: Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
Edit: Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
Edit: Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh
# HIVE_AUX_JARS_PATH=/opt/hook/hive
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
JAVA_HOME=/app/jdk1.8.0_191
Edit: HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
Name: hive.reloadable.aux.jars.path
Value: /opt/atlas/atlas-2.1.0/hook/hive
Edit: HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)
HIVE_AUX_JARS_PATH=/opt/atlas/atlas-2.1.0/hook/hive
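A quick sanity check that HIVE_AUX_JARS_PATH points at a populated directory (path taken from this guide; adjust to your layout):

```shell
# List the hook directory; print a hint instead of failing when it is absent.
hook_dir=/opt/atlas/atlas-2.1.0/hook/hive
ls "$hook_dir" 2>/dev/null || echo "hook directory not found at $hook_dir"
```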
Add the Hive environment variable:
$ export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
Import the existing Hive metadata into Atlas:
$ bin/import-hive.sh
Using Hive configuration directory [/opt/cloudera/parcels/CDH/lib/hive/conf]
Log file for import is /opt/atlas/atlas-2.1.0/logs/import-hive.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
17:05:07.839 [main] ERROR org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Import failed
org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147) ~[atlas-intg-2.1.0.jar:2.1.0]
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100) ~[atlas-intg-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:123) [hive-bridge-2.1.0.jar:2.1.0]
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:259) ~[commons-configuration-1.10.jar:1.10]
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:238) ~[commons-configuration-1.10.jar:1.10]
at org.apache.commons.configuration.AbstractFileConfiguration.<init>(AbstractFileConfiguration.java:197) ~[commons-configuration-1.10.jar:1.10]
at org.apache.commons.configuration.PropertiesConfiguration.<init>(PropertiesConfiguration.java:284) ~[commons-configuration-1.10.jar:1.10]
at org.apache.atlas.ApplicationProperties.<init>(ApplicationProperties.java:83) ~[atlas-intg-2.1.0.jar:2.1.0]
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:136) ~[atlas-intg-2.1.0.jar:2.1.0]
... 2 more
Failed to import Hive Meta Data!!!
Fix: the hook could not find atlas-application.properties; copy it from the Atlas conf directory into Hive's conf directory:
$ cp conf/atlas-application.properties /etc/hive/conf/
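Once the import succeeds, hive_db entities should be searchable through the Atlas v2 basic-search API (a sketch; host and credentials as used throughout this guide):

```shell
# Query for imported Hive databases; degrade gracefully off-cluster.
search_url="http://kino1:21000/api/atlas/v2/search/basic?typeName=hive_db"
curl -sf -u admin:admin "$search_url" || echo "Atlas not reachable at $search_url"
```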