How to mask columns using Spark 2?

Problem description

I have some tables in which I need to mask some of their columns. The columns to be masked vary from table to table, and I am reading them from an application.conf file.

For example, consider the employee table shown below.

+----+------+-----+---------+
| id | name | age | address |
+----+------+-----+---------+
| 1  | abcd | 21  | India   |
+----+------+-----+---------+
| 2  | qazx | 42  | Germany |
+----+------+-----+---------+

If we want to mask the name and age columns, I get those columns as a sequence:

val mask = Seq("name", "age")
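
Since the columns to mask are read from application.conf, here is a minimal sketch of loading them with Typesafe Config; the key name mask.columns is only an assumption for illustration and should be adjusted to the real configuration:

// Sketch: load the list of columns to mask from application.conf.
// The key "mask.columns" is a hypothetical name; adjust it to your config.
import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

val conf = ConfigFactory.load()
val mask: Seq[String] = conf.getStringList("mask.columns").asScala.toSeq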

The expected result after masking is:

+----+----------------+----------------+---------+
| id | name           | age            | address |
+----+----------------+----------------+---------+
| 1  | *** Masked *** | *** Masked *** | India   |
+----+----------------+----------------+---------+
| 2  | *** Masked *** | *** Masked *** | Germany |
+----+----------------+----------------+---------+

If I have the employee table as a DataFrame, what is the way to mask these columns?
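
For reference, a minimal sketch of building that employee table as a DataFrame, assuming a SparkSession named spark is already in scope (for example in spark-shell):

// Sketch: build the sample employee DataFrame shown above.
// Assumes a SparkSession called `spark` is already available.
import spark.implicits._

val employee = Seq(
  (1, "abcd", 21, "India"),
  (2, "qazx", 42, "Germany")
).toDF("id", "name", "age", "address")

employee.show()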

Similarly, if I have the payment table shown below and want to mask the name and salary columns, I get the mask columns as a sequence:

+----+------+--------+----------+
| id | name | salary | tax_code |
+----+------+--------+----------+
| 1  | abcd | 12345  | KT10     |
+----+------+--------+----------+
| 2  | qazx | 98765  | AD12d    |
+----+------+--------+----------+

val mask = Seq("name", "salary")

I tried something like mask.foreach(c => base.withColumn(c, regexp_replace(col(c), "^.*?$", "*** Masked ***"))), but it did not return anything.

Thanks to @philantrovert, I found the solution. Here is what I used:

import org.apache.spark.sql.DataFrame

def maskData(base: DataFrame, maskColumns: Seq[String]) = {
  val maskExpr = base.columns.map { col =>
    if (maskColumns.contains(col)) s"'*** Masked ***' as ${col}"
    else col
  }
  base.selectExpr(maskExpr: _*)
}
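
A short usage sketch of this function, applied to the illustrative employee DataFrame constructed earlier:

// Usage sketch: mask the name and age columns of the sample DataFrame.
val masked = maskData(employee, Seq("name", "age"))
masked.show()

Note that each masked column is replaced by the string literal '*** Masked ***', so its type becomes string regardless of the original column type.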

Recommended answer

Your statement

mask.foreach(c => base.withColumn(c, regexp_replace(col(c), "^.*?$", "*** Masked ***" ) ) )

builds a separate DataFrame for each masked column and immediately discards it (foreach returns Unit, and withColumn does not modify base in place), which doesn't sound too good: base itself is never changed, which is why you saw no result.
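
If you do want to stay with withColumn, one possible variant (a sketch, not part of the original answer) is to thread the intermediate DataFrame through foldLeft instead of discarding it:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Sketch: fold over the mask columns so each withColumn result is kept.
def maskWithColumns(base: DataFrame, maskColumns: Seq[String]): DataFrame =
  maskColumns.foldLeft(base) { (df, c) =>
    df.withColumn(c, lit("*** Masked ***"))
  }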

The approach suggested here, however, is to use selectExpr:

base.show
+---+----+-----+-------+
| id|name|  age|address|
+---+----+-----+-------+
|  1|abcd|12345|  KT10 |
|  2|qazx|98765|  AD12d|
+---+----+-----+-------+

val mask = Seq("name", "age")

val expr = base.columns.map { col =>
  if (mask.contains(col))
    s"""regexp_replace(${col}, "^.*", "** Masked **" ) as ${col}"""
  else
    col
}

For the sequence mask, this generates:

Array[String] = Array(id, regexp_replace(name, "^.*", "** Masked **" ) as name, regexp_replace(age, "^.*", "** Masked **" ) as age, address)

Now you can use selectExpr on the generated sequence:

base.selectExpr(expr: _*).show
+---+------------+------------+-------+
| id|        name|         age|address|
+---+------------+------------+-------+
|  1|** Masked **|** Masked **|  KT10 |
|  2|** Masked **|** Masked **|  AD12d|
+---+------------+------------+-------+
