I am currently experimenting with custom transformers in my spark-shell, using both Spark 2.0.1 and 2.2.1.
While writing a custom ML transformer to add to a pipeline, I noticed an issue with overriding the copy method.
In my case, the copy method is called by the fit method of TrainValidationSplit.
The error I get:

    java.lang.NoSuchMethodException: Custom.<init>(java.lang.String)
      at java.lang.Class.getConstructor0(Class.java:3082)
      at java.lang.Class.getConstructor(Class.java:1825)
      at org.apache.spark.ml.param.Params$class.defaultCopy(params.scala:718)
      at org.apache.spark.ml.PipelineStage.defaultCopy(Pipeline.scala:42)
      at Custom.copy(<console>:16)
      ... 48 elided

I then tried calling the copy method directly, but I still get the same error.
Here is my class and the call I perform:

    import org.apache.spark.ml.Transformer
    import org.apache.spark.sql.{Dataset, DataFrame}
    import org.apache.spark.sql.types.{StructField, StructType, DataTypes}
    import org.apache.spark.ml.param.{Param, ParamMap}

    // Simple DF
    val doubles = Seq((0, 5d, 100d), (1, 4d, 500d), (2, 9d, 700d)).toDF("id", "rating", "views")

    class Custom(override val uid: String) extends org.apache.spark.ml.Transformer {

      def this() = this(org.apache.spark.ml.util.Identifiable.randomUID("custom"))

      def copy(extra: org.apache.spark.ml.param.ParamMap): Custom = {
        defaultCopy(extra)
      }

      override def transformSchema(schema: org.apache.spark.sql.types.StructType): org.apache.spark.sql.types.StructType = {
        schema.add(org.apache.spark.sql.types.StructField("trending", org.apache.spark.sql.types.IntegerType, false))
      }

      def transform(df: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame = {
        df.withColumn("trending", (df.col("rating") > 4 && df.col("views") > 40))
      }
    }

    val mycustom = new Custom("Custom")

    // This call throws the exception.
    mycustom.copy(new org.apache.spark.ml.param.ParamMap())

Does anyone know if this is a known issue? I can't seem to find it anywhere.
Is there another way to implement the copy method in a custom transformer?
Thanks.
Recommended answer

These are a couple of things I would change about your custom Transformer (they also enable SerDe operations on your PipelineModel):
- Implement the DefaultParamsWritable trait
- Add a companion object that extends the DefaultParamsReadable trait
For example:

    class Custom(override val uid: String) extends Transformer with DefaultParamsWritable {
      ... ...
    }

    object Custom extends DefaultParamsReadable[Custom]
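Filling in the elided body, a fuller sketch of this pattern might look like the following. Two things here are my own assumptions and not part of the suggestion above. First, the stack trace in the question shows defaultCopy looking up a constructor that takes a single String via reflection, and that lookup is what fails for the class defined in the shell; the sketch therefore builds the copy directly with copyValues, which avoids the reflective lookup. Second, the new column is declared as BooleanType, since the comparison produces a boolean, whereas the question declares IntegerType.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

class Custom(override val uid: String) extends Transformer with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("custom"))

  // Assumption: construct the copy explicitly instead of calling defaultCopy,
  // so no reflective constructor lookup is involved.
  override def copy(extra: ParamMap): Custom = copyValues(new Custom(uid), extra)

  // Assumption: BooleanType matches what transform actually produces.
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("trending", BooleanType, nullable = false))

  override def transform(df: Dataset[_]): DataFrame =
    df.withColumn("trending", df.col("rating") > 4 && df.col("views") > 40)
}

// Companion object that provides Custom.load(path) through DefaultParamsReadable.
object Custom extends DefaultParamsReadable[Custom]
```

In the spark-shell, the class and its companion object need to be entered together (for example with :paste) so that they are treated as companions. Loading a saved instance may still run into the same constructor lookup for shell-defined classes, so the persistence part is probably best tried from a compiled jar.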
Also take a look at UnaryTransformer if you have only one input column and one output column.
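As a rough illustration of the single-column case, here is a minimal sketch built on UnaryTransformer. The class name HighRating, the 4.0 threshold, and the column names in the usage note are hypothetical and chosen only to echo the question's data; the transformer in the question reads two columns, so it would not fit this base class as it stands. The copy override is again my own addition, for the same reason as with a plain Transformer: the inherited copy goes through defaultCopy.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{BooleanType, DataType, DoubleType}

// Hypothetical transformer: flags rows whose input value is above 4.0.
class HighRating(override val uid: String)
    extends UnaryTransformer[Double, Boolean, HighRating] with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("highRating"))

  // Function applied to every value of the input column.
  override protected def createTransformFunc: Double => Boolean = _ > 4.0

  // Type of the generated output column.
  override protected def outputDataType: DataType = BooleanType

  // Reject input columns that are not doubles.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == DoubleType, s"Input column must be DoubleType but got $inputType")

  // Assumption: explicit copy instead of the inherited defaultCopy-based one.
  override def copy(extra: ParamMap): HighRating = copyValues(new HighRating(uid), extra)
}

object HighRating extends DefaultParamsReadable[HighRating]
```

With the DataFrame from the question, this could be applied as new HighRating().setInputCol("rating").setOutputCol("highRating").transform(doubles); the input and output column setters come from UnaryTransformer itself.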
Finally, why exactly do you need to call mycustom.copy(new ParamMap())?