
# Local Hadoop


## Transfer data between HDFS and the database

HDFS and the database are two separate entities; here is how to move data from one to the other.

### From HDFS to the database

1. Create a table in the database, using Impala or Hive (a complete worked example is sketched after this list):

   ```sql
   create table <tablename> (id string, name string)
   row format delimited
   fields terminated by ','
   lines terminated by '\n'
   stored as textfile;
   ```

   IMPORTANT: your table name should be all lowercase.

2. Copy your data file from HDFS into the warehouse directory of the table you just created (use `hdfs dfs -copyFromLocal` instead of `-cp` if the file lives on the local filesystem rather than on HDFS):

   ```bash
   hdfs dfs -cp <path_of_your_table_on_hdfs> /encrypted/warehouse/<database_name>.db/<tablename>
   ```
3. If you created the table with Hive, you need to run the following in Impala before the table shows up there:

   ```sql
   invalidate metadata <tablename>;
   compute stats <tablename>;
   ```
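
A minimal end-to-end sketch of the three steps above, assuming a hypothetical database `mydb`, a table `events`, and a CSV file `/data/incoming/events.csv` that already sits on HDFS, with the SQL run through `impala-shell`:

```bash
# Hypothetical names throughout: database "mydb", table "events",
# input file /data/incoming/events.csv already stored on HDFS.

# 1. create the table (hive -e "..." works the same way if you prefer Hive)
impala-shell -q "
create table mydb.events (id string, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile"

# 2. copy the data file into the table's warehouse directory
hdfs dfs -cp /data/incoming/events.csv /encrypted/warehouse/mydb.db/events/

# 3. only needed when the table was created through Hive instead of Impala
impala-shell -q "invalidate metadata mydb.events"
impala-shell -q "compute stats mydb.events"
```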

### From the database to HDFS

1. Concatenate all the part-files of the table in its warehouse directory.
2. Convert the default Hive field delimiter (`'\001'`) to something easier to work with, for example a tab.

Both steps are done by a single pipeline (a concrete sketch follows below); note that the shell redirect writes the output to the local filesystem, not to HDFS:

```bash
hadoop fs -cat /encrypted/warehouse/<database_name>.db/<tablename>/* | tr '\001' '\t' > <path_of_output_file>
```
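
A concrete sketch with the same hypothetical names as above (`mydb.events`), showing both a local output file and a variant that pipes the result back onto HDFS via `hdfs dfs -put -`:

```bash
# export mydb.events as a tab-separated file on the local filesystem
hadoop fs -cat /encrypted/warehouse/mydb.db/events/* | tr '\001' '\t' > /tmp/events.tsv

# same export, but keep the result on HDFS instead of on the local disk
hadoop fs -cat /encrypted/warehouse/mydb.db/events/* | tr '\001' '\t' \
  | hdfs dfs -put - /data/exports/events.tsv
```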