HBase数据导入ImportTsv-快上网网站建设公司

HBase数据导入ImportTsv

ImportTsv 工具是通过map reduce 完成的。所以要启动yarn. 工具要使用jar包，所以注意配置classpath。ImportTsv默认是通过hbase api 插入数据的

成都创新互联公司服务项目包括昭化网站建设、昭化网站制作、昭化网页制作以及昭化网络营销策划等。多年来，我们专注于互联网行业，利用自身积累的技术优势、行业经验、深度合作伙伴关系等，向广大中小型企业、政府机构等提供互联网行业的解决方案，昭化网站推广取得了明显的社会效益与经济效益。目前，我们服务的客户以成都为中心已经辐射到昭化省份的部分城市，未来相信会继续扩大服务区域并继续获得客户的支持与信任！

[hadoop-user@rhel work]$ cat /home/hadoop-user/.bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

. ~/.bashrc

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

JAVA_HOME=/usr/java/jdk1.8.0_171-amd64

PATH=$PATH:$JAVA_HOME/bin

CLASSPATH=$CLASSPATH:$JAVA_HOME/lib

HADOOP_HOME=/home/hadoop-user/hadoop-2.8.0

PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib

HBASE_HOME=/home/hadoop-user/hbase-2.0.0

PATH=$PATH:$HBASE_HOME/bin

CLASSPATH=$CLASSPATH:$HBASE_HOME/lib

ZOOKEEPER_HOME=/home/hadoop-user/zookeeper-3.4.12

PATH=$PATH:$ZOOKEEPER_HOME/bin

PHOENIX_HOME=/home/hadoop-user/apache-phoenix-5.0.0-alpha-HBase-2.0-bin

PATH=$PATH:$PHOENIX_HOME/bin

export PATH

创建表

hbase(main):033:0> create 'test','cf'

创建要导入的文件

[hadoop-user@rhel work]$ cat /home/hadoop-user/work/sample1.csv

row10,"mjj10"

row11,"mjj11"

row12,"mjj12"

row14,"mjj13"

将文件放入hdfs

[hadoop-user@rhel work]$ hdfs dfs -put /home/hadoop-user/work/sample1.csv /sample1.csv

ImportTsv导入命令

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

注: HBASE_ROW_KEY表示文件rowid的位置，后面是列的定义。这里意思是导入的列为列族为cf,列名为a。要导入的文件是hdfs中/sample1.csv

帮助中的解释

Usage: importtsv -Dimporttsv.columns=a,b,c

Imports the given input directory of TSV data into the specified table.

The column names of the TSV data must be specified using the -Dimporttsv.columns

option. This option takes the form of comma-separated column names, where each

column name is either a simple column family, or a columnfamily:qualifier. The special

column name HBASE_ROW_KEY is used to designate that this column should be used

as the row key for each imported record. You must specify exactly one column

to be the row key, and you must specify a column name for every column that exists in the

input data. Another special columnHBASE_TS_KEY designates that this column should be

used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.

You must specify at most one column as timestamp key for each imported record.

Record with invalid timestamps (blank, non-numeric) will be treated as bad record.

Note: if you use this option, then 'importtsv.timestamp' option will be ignored.

注意: ImportTsv导入的内容，phoenix看不到。事实上，hbase创建的表，Phoenix看不到。phoenix创建的表，hbase能看到，但是内容是编码后的内容。

importtsv 工具默认使用hbase put api导数据.当使用选项 -Dimporttsv.bulk.output时，将会先生成HFILE文件的内部格式的文件。

The importtsv tool, by default, uses the HBase Put API to insert data into the HBase

table using TableOutputFormat in its map phase. But when the -Dimporttsv.bulk.output option is specified, it instead generates HBase internal format (HFile) files on HDFS

by using HFileOutputFormat. Therefore, we can then use the completebulkload tool to load the generated files into a running cluster. The following steps are to use the bulk output and load tools:

生成HFILE格式的文件命令

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/hfiles_tsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

注: 生成hfile格式的文件，存放于hdfs中的/hfile_tsv目录中，目录会由命令自己创建。

[hadoop-user@rhel work]$ hdfs dfs -ls /hfiles_tsv/cf

18/06/28 10:49:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

-rw-r--r-- 1 hadoop-user supergroup 5125 2018-06-28 10:40 /hfiles_tsv/cf/0e466616d42a4a128fb60caa7dbe075a

注: 0e466616d42a4a128fb60caa7dbe075a的命名格式跟WEB中region的命名格式很像

通过

hadoop jar hbase-server-2.0.0.jar completebulkload /hfiles_tsv 'test'

出现异常； Exception in thread "main" java.lang.ClassNotFoundException: completebulkload

HBASE文档中两种导入方式:

There are two ways to invoke this utility, with explicit classname and via the driver:

Explicit Classname

$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles

Driver

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar completebulkload

本文标题：HBase数据导入ImportTsv
文章URL：http://cdkjz.cn/article/ihoiig.html

多年建站经验

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

咨询相关问题或预约面谈，可以通过以下方式与我们联系

网站建设

网站推广

案例

方案

电商网站开发

微信小程序

我们

联系

精准传达 • 有效沟通

查看其它板块

HBase数据导入ImportTsv

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接

网络推广

Network promotion

网站方案

Solution

电商网站开发

E-commerce & System

我们

About Us

联系

Contact Us

精准传达 • 有效沟通

查看其它板块

HBase数据导入ImportTsv

相关资讯

OracleScheduler能实现哪些功能

如何使用Hive合并小文件

IE和Firefox中Javascript兼容性问题的示例分析

如何在python中使用正则表达式

小程序中如何实现点击控件后选中其它反选功能

Java可重入锁的实现原理与应用场景

css怎样实现下拉菜单

DecksetforMac有什么用

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线 成都：13518219792 座机：028-86922220

友情链接 交换友情链接

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接