Input type must be string type but got ArrayType(StringType,true) error in Spark using Scala


I'm new to Spark and I'm using Scala to create a basic classifier. I'm reading a text file as a dataset and splitting it into training and test data sets. When I try to tokenize the training data, it fails with this error:

    Caused by: java.lang.IllegalArgumentException: requirement failed: Input type must be string type but got ArrayType(StringType,true).
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.ml.feature.RegexTokenizer.validateInputType(Tokenizer.scala:149)
        at org.apache.spark.ml.UnaryTransformer.transformSchema(Transformer.scala:110)
        at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
        at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:180)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
        at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
        at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:180)
        at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
        at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:132)
        at com.classifier.classifier_app.App$.<init>(App.scala:91)
        at com.classifier.classifier_app.App$.<clinit>(App.scala)
        ... 1 more

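Here is the code:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.Tokenizer
    import spark.implicits._ // spark is an existing SparkSession; needed for .as[Sentence]

    val input_path = "path//to//file.txt"

    case class Sentence(value: String)
    val sentencesDS = spark.read.textFile(input_path).as[Sentence]

    // 70% training, 30% test
    val Array(trainingData, testData) = sentencesDS.randomSplit(Array(0.7, 0.3))

    val tokenizer = new Tokenizer()
      .setInputCol("value")
      .setOutputCol("words")

    // regexTokenizer, remover, hashingTF and ovr are defined elsewhere (not shown)
    val pipeline = new Pipeline().setStages(Array(tokenizer, regexTokenizer, remover, hashingTF, ovr))
    val model = pipeline.fit(trainingData)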

How can I solve this? Any help is appreciated.

I have defined the other stages of the pipeline, but I haven't included them in this code snippet.
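The stack trace names `RegexTokenizer.validateInputType`, so the stage rejecting its input is the `RegexTokenizer`, not the `Tokenizer` shown. `Tokenizer` writes `words` as an array of strings, while `RegexTokenizer` only accepts a string column. The sketch below reproduces the error locally, assuming the unshown `regexTokenizer` was configured with `setInputCol("words")`; the object name `ReproduceArrayInputError`, the `tokens` output column and the sample sentences are hypothetical.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{RegexTokenizer, Tokenizer}
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object ReproduceArrayInputError {
  // Returns the error message raised by Pipeline.fit, or None if fitting succeeds
  def reproduce(spark: SparkSession): Option[String] = {
    import spark.implicits._

    // Hypothetical sample lines standing in for the contents of file.txt
    val trainingData = Seq("Spark is fast", "Scala is fun").toDF("value")

    // Tokenizer turns the string column "value" into an array<string> column "words"
    val tokenizer = new Tokenizer().setInputCol("value").setOutputCol("words")

    // ASSUMPTION: the unshown regexTokenizer reads "words", which is already an array column
    val regexTokenizer = new RegexTokenizer().setInputCol("words").setOutputCol("tokens")

    // Print the type of "words": an array of strings, not a plain string
    println(tokenizer.transform(trainingData).schema("words").dataType)

    val pipeline = new Pipeline().setStages(Array[PipelineStage](tokenizer, regexTokenizer))
    // Pipeline.fit checks every stage's input schema before fitting anything
    Try(pipeline.fit(trainingData)) match {
      case Failure(e: IllegalArgumentException) => Some(e.getMessage)
      case Failure(e)                           => throw e
      case Success(_)                           => None
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
    try println(reproduce(spark).getOrElse("pipeline fitted without error"))
    finally spark.stop()
  }
}
```

Because the schema check runs inside `Pipeline.fit`, the failure surfaces at `pipeline.fit(trainingData)` even though the data itself is never tokenized.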

Use this guide:

http://spark.apache.org/docs/latest/ml-pipeline.html#pipeline-components

Also, please provide the whole code: from the snippet above it isn't clear whether you have defined the required arguments for the `setStages` method.
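Judging from the stack trace, one way to fix it is to tokenize only once: let `RegexTokenizer` read the string column `value` and feed its array output to `StopWordsRemover` and `HashingTF`. This is a sketch under assumptions, not your actual pipeline: the object name `FixedFeaturePipeline`, the sample sentences, the `\\W+` pattern and the `numFeatures` value are hypothetical, and the `ovr` classifier is left out because it needs a label column that the text-only dataset in the question does not have.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{HashingTF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.{DataFrame, SparkSession}

object FixedFeaturePipeline {
  // Fits the feature stages and returns the transformed data
  def featurize(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample lines standing in for the contents of file.txt
    val sentences = Seq("Spark is a fast engine", "Scala is a fun language", "The cat sat").toDF("value")

    // Tokenize the raw text once; RegexTokenizer replaces the plain Tokenizer
    val regexTokenizer = new RegexTokenizer()
      .setInputCol("value")  // string column, which RegexTokenizer requires
      .setOutputCol("words") // array<string> column
      .setPattern("\\W+")    // split on runs of non-word characters

    // StopWordsRemover expects an array<string> column, so it reads "words"
    val remover = new StopWordsRemover().setInputCol("words").setOutputCol("filtered")

    // HashingTF turns the filtered terms into a fixed-size feature vector
    val hashingTF = new HashingTF().setInputCol("filtered").setOutputCol("features").setNumFeatures(1000)

    // In the question you would fit on trainingData and transform testData instead
    val pipeline = new Pipeline().setStages(Array[PipelineStage](regexTokenizer, remover, hashingTF))
    val model = pipeline.fit(sentences)
    model.transform(sentences)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fixed").getOrCreate()
    try featurize(spark).select("value", "filtered", "features").show(truncate = false)
    finally spark.stop()
  }
}
```

If you prefer to keep both tokenizers, point both at `value` and give them different output columns; what matters is that the `RegexTokenizer` input column is a string column.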

