Hive auto increment after a certain number

Updated: 2024-10-24 12:21:08

Question

I have to insert data into a target table where all columns are populated from different source tables, except the surrogate key column, which should be the maximum value already in the target table plus an auto-increment value starting at 1. I can generate the auto-increment value with the row_number() function, but how do I get the max value of the surrogate key from the target table in the same query? Is there any concept in Hive where I can select the max value of the surrogate key and save it in a temporary variable? Or is there another simple way to achieve this result?
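The rule being asked for (new key = current maximum of the target table plus a running number starting at 1) fits in a few lines of Python. This is only a sketch of the arithmetic, not Hive code; `assign_surrogate_keys` is a hypothetical name, and treating an empty target table as maximum 0 is an assumption the question leaves open.

```python
def assign_surrogate_keys(existing_ids, names):
    """Pair each name with max(existing_ids) + 1, + 2, ... (hypothetical helper)."""
    # An empty target table has no max; assume the sequence then starts at 1.
    start = max(existing_ids, default=0)
    # enumerate(..., start=1) plays the role of row_number() / row_sequence().
    return [(start + n, name) for n, name in enumerate(names, start=1)]


if __name__ == "__main__":
    print(assign_surrogate_keys([1, 2, 3], ["d", "e", "f"]))
    # → [(4, 'd'), (5, 'e'), (6, 'f')]
```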
Solution
Here are two approaches that worked for me for the problem above (explained with an example).

Approach 1: getting the max value with a shell script and passing it to the Hive commands through a ${hiveconf} variable

Approach 2: using row_sequence(), max() and join operations

My Environment:
hadoop-2.6.0 apache-hive-2.0.0-bin
Steps (note: steps 1 and 2 are common to both approaches; from step 3 onward they differ):

Step 1: create source and target tables

source
hive>create table source_table1(name string);
hive>create table source_table2(name string);
hive>create table source_table3(name string);
target
hive>create table target_table(id int, name string);
Step 2: load data into source tables
hive>load data local inpath 'source_table1.txt' into table source_table1;
hive>load data local inpath 'source_table2.txt' into table source_table2;
hive>load data local inpath 'source_table3.txt' into table source_table3;
Sample Input:

source_table1.txt
a
b
c
source_table2.txt
d
e
f
source_table3.txt
g
h
i
Approach 1:

Step 3: create a shell script hive_auto_increment.sh
#!/bin/sh
# Jar and UDF registration, needed in every hive session that calls row_sequence()
INIT="add jar /home/apache-hive-2.0.0-bin/lib/hive-contrib-2.0.0.jar;
create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';"

for src in source_table1 source_table2 source_table3; do
  # Re-read the current max before each insert; coalesce to 0 so an empty table starts at 1
  value=$(hive -S -e 'select coalesce(max(id), 0) from target_table')
  # \${hiveconf:mx} is escaped so that Hive, not the shell, substitutes it
  hive --hiveconf mx=$value -e "$INIT
INSERT INTO TABLE target_table SELECT (\${hiveconf:mx} + row_sequence()), name FROM $src;"
done

hive -e "select * from target_table;"
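Approach 1 hinges on reading max(id) into a variable before each insert. The sketch below reproduces that idea with Python's built-in sqlite3 module in place of the Hive CLI, purely for illustration: the bound `?` parameter stands in for `${hiveconf:mx}`, and the standard `ROW_NUMBER() OVER (ORDER BY name)` window function is swapped in for `row_sequence()`. It assumes a SQLite build with window-function support (3.25 or later); `build_db` and `load_with_variable` are hypothetical helper names.

```python
import sqlite3

# Same sample data as the example above.
SOURCES = {
    "source_table1": ["a", "b", "c"],
    "source_table2": ["d", "e", "f"],
    "source_table3": ["g", "h", "i"],
}


def build_db():
    """Create the target and source tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target_table (id INTEGER, name TEXT)")
    for table, rows in SOURCES.items():
        conn.execute(f"CREATE TABLE {table} (name TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(r,) for r in rows])
    return conn


def load_with_variable(conn):
    """Read max(id) into a host variable, then pass it into the next INSERT."""
    for table in SOURCES:  # fixed, trusted names, so the f-string is safe
        # Re-read the max before every insert; COALESCE handles an empty table.
        (mx,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM target_table").fetchone()
        # The bound ? plays the role of ${hiveconf:mx}.
        conn.execute(
            "INSERT INTO target_table "
            f"SELECT ? + ROW_NUMBER() OVER (ORDER BY name), name FROM {table}",
            (mx,),
        )
    return conn.execute(
        "SELECT id, name FROM target_table ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in load_with_variable(build_db()):
        print(row)
```

Re-reading the maximum inside the loop is what keeps the second and third tables from reusing the same ids; a value read once at the start would be stale after the first insert.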
Step 4: run the shell script
> bash hive_auto_increment.sh
Approach 2:

Step 3: Add Jar
hive>add jar /home/apache-hive-2.0.0-bin/lib/hive-contrib-2.0.0.jar;
Step 4: register the row_sequence function from the hive-contrib jar
hive>create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
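row_sequence() is a stateful UDF. As I understand the hive-contrib source, `UDFRowSequence` keeps a counter in the UDF object that starts at 0 and is incremented on every call, and Hive builds a fresh instance for each query (and for each task within a query). The Python model below is a sketch of that behaviour, not the real Java class; `RowSequence` and `number_rows` are hypothetical names. It shows why numbering restarts at 1 for every INSERT, so the current max must be added each time, and why ids are only unique when the SELECT runs in a single task, as it does for tiny inputs like this example.

```python
class RowSequence:
    """Python model of a per-instance counter like UDFRowSequence (assumption)."""

    def __init__(self):
        self._value = 0  # every new instance starts from zero

    def evaluate(self):
        # Stateful: each call returns the previous value + 1.
        self._value += 1
        return self._value


def number_rows(rows, instance):
    """Number rows with one counter instance, as a single task would."""
    return [(instance.evaluate(), row) for row in rows]


if __name__ == "__main__":
    # Two tasks, each with its own instance, see half of the rows each.
    task1 = number_rows(["a", "b"], RowSequence())
    task2 = number_rows(["c", "d"], RowSequence())
    print(task1 + task2)
    # → [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd')]
```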
Step 5: load source_table1 into target_table
hive>INSERT INTO TABLE target_table select row_sequence(),name from source_table1;
Step 6: load the other sources into target_table
hive>INSERT INTO TABLE target_table SELECT M.rowcount+row_sequence(),T.name from source_table2 T join (select max(id) as rowcount from target_table) M;
hive>INSERT INTO TABLE target_table SELECT M.rowcount+row_sequence(),T.name from source_table3 T join (select max(id) as rowcount from target_table) M;
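The join in step 6 works because the subquery returns exactly one row, so joining it without an ON clause (a cross join) attaches the same max(id) to every source row. Here is a sketch of that pattern using Python's sqlite3 module instead of Hive, for illustration only: `ROW_NUMBER() OVER (ORDER BY T.name)` is swapped in for `row_sequence()`, the join is spelled `CROSS JOIN` explicitly, and `COALESCE(..., 0)` is an addition not present in the original. It assumes SQLite 3.25 or later; `load_with_join` and `run_example` are hypothetical names.

```python
import sqlite3

SOURCES = [("source_table1", "abc"), ("source_table2", "def"), ("source_table3", "ghi")]


def load_with_join(conn, source_table):
    """Insert one source table, numbering its rows after the current max id."""
    # The one-row subquery M carries max(id) into every source row, so no
    # host variable is needed. COALESCE turns the NULL max of an empty table
    # into 0, so the first table needs no special case.
    conn.execute(
        "INSERT INTO target_table "
        "SELECT M.rowcount + ROW_NUMBER() OVER (ORDER BY T.name), T.name "
        f"FROM {source_table} T "  # fixed, trusted table names only
        "CROSS JOIN (SELECT COALESCE(MAX(id), 0) AS rowcount FROM target_table) M"
    )


def run_example():
    """Build the example tables in memory and load all three sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target_table (id INTEGER, name TEXT)")
    for table, letters in SOURCES:
        conn.execute(f"CREATE TABLE {table} (name TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(c,) for c in letters])
        load_with_join(conn, table)
    return conn.execute("SELECT id, name FROM target_table ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in run_example():
        print(row)
```

With the COALESCE in place the same statement also covers source_table1. In the original, that table is probably loaded separately in step 5 because max(id) of an empty table is NULL, and NULL plus any number is NULL.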
Output:
INFO : OK
+------------------+--------------------+--+
| target_table.id  | target_table.name  |
+------------------+--------------------+--+
| 1                | a                  |
| 2                | b                  |
| 3                | c                  |
| 4                | d                  |
| 5                | e                  |
| 6                | f                  |
| 7                | g                  |
| 8                | h                  |
| 9                | i                  |
+------------------+--------------------+--+

Published: 2023-11-29 00:27:06

Article link: https://www.elefans.com/category/jswz/34/1644564.html
