Input type must be string type but got ArrayType(StringType,true) error in Spark using Scala
I am new to Spark and am using Scala to create a basic classifier. I am reading a text file as a dataset and splitting it into training and test data sets. When I try to tokenize the training data, it fails with the following error:
Caused by: java.lang.IllegalArgumentException: requirement failed: Input type must be string type but got ArrayType(StringType,true).
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.feature.RegexTokenizer.validateInputType(Tokenizer.scala:149)
at org.apache.spark.ml.UnaryTransformer.transformSchema(Transformer.scala:110)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:180)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:132)
at com.classifier.classifier_app.app$.<init>(app.scala:91)
at com.classifier.classifier_app.app$.<clinit>(app.scala)
... 1 more
The code is below:
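import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.Tokenizer
import spark.implicits._  // needed for .as[Sentence]; assumes a SparkSession named spark

val input_path = "path//to//file.txt"
case class Sentence(value: String)
// Each line of the text file becomes one row with a single string column "value"
val sentencesDS = spark.read.textFile(input_path).as[Sentence]
val Array(trainingData, testData) = sentencesDS.randomSplit(Array(0.7, 0.3))
val tokenizer = new Tokenizer()
  .setInputCol("value")
  .setOutputCol("words")
// regexTokenizer, remover, hashingTF and ovr are defined elsewhere (not shown here)
val pipeline = new Pipeline().setStages(Array(tokenizer, regexTokenizer, remover, hashingTF, ovr))
val model = pipeline.fit(trainingData)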
How can I solve this? Any help is appreciated.
I have defined all the stages in the pipeline, but I haven't included them in this code snippet.
Use this guide:
http://spark.apache.org/docs/latest/ml-pipeline.html#pipeline-components
And please provide the whole code. From the code snippet above it is not clear whether you have defined the required arguments for the setStages method or not.
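Going by the stack trace, the check that fails is in RegexTokenizer, not in Tokenizer. One likely cause (an assumption, since the stage definitions are not shown) is that regexTokenizer reads the "words" column produced by tokenizer, which is already an array of strings. Both tokenizers expect a plain string column, so the pipeline should either use only one of them or point both at "value". Here is a minimal sketch under that assumption. The sample sentences, the column names "filtered" and "features", the pattern, and the object name are made up for illustration, and the classifier stage is left out because the label column isn't shown in the question.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

object TokenizerSketch {

  // Builds a pipeline with a single tokenizer that reads the raw string column.
  def buildPipeline(): Pipeline = {
    // RegexTokenizer needs a StringType input, so it reads "value" directly
    // rather than the array column produced by another tokenizer.
    val regexTokenizer = new RegexTokenizer()
      .setInputCol("value")
      .setOutputCol("words")
      .setPattern("\\W+") // split on runs of non-word characters

    // These stages take an array of strings, so they can consume "words".
    val remover = new StopWordsRemover()
      .setInputCol("words")
      .setOutputCol("filtered")
    val hashingTF = new HashingTF()
      .setInputCol("filtered")
      .setOutputCol("features")

    new Pipeline().setStages(Array(regexTokenizer, remover, hashingTF))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tokenizer-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for spark.read.textFile(input_path):
    // a single string column named "value".
    val sentences = Seq(
      "Spark makes pipelines easy",
      "Tokenizers need string input"
    ).toDF("value")

    val model = buildPipeline().fit(sentences)
    model.transform(sentences).select("words", "features").show(false)

    spark.stop()
  }
}
```

If both tokenizers are really needed, giving each its own output column and keeping "value" as the input of both should avoid the type check failure as well.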