java - What is the configuration required to get data from object storage via Swift in Spark?
I have gone through the documentation, but I am still confused about how to get data from Swift.
I configured Swift on one Linux machine. Using the command below, I am able to list the containers:
swift -a https://acc.objectstorage.softlayer.net/auth/v1.0/ -u username -k passwordkey list
I have seen many Bluemix blogs (https://console.ng.bluemix.net/docs/services/analyticsforapachespark/index-gentopic1.html#gentopprocid2) in which the following code is written:
sc.textFile("swift://container.myacct/file.xml")
I am looking to integrate this into Java Spark, so I need to configure the object storage credentials in Java code. Is there any sample code or blog for this?
This notebook illustrates a number of ways to load data using the Scala language. Scala runs on the JVM, and Java and Scala classes can be freely mixed, regardless of whether they reside in different projects or in the same one. Looking at the mechanics of how the Scala code interacts with OpenStack Swift object storage should guide you in crafting the Java equivalent.
From the above notebook, here are the steps illustrating how to configure and extract data from an OpenStack Swift object storage instance using the Stocator library in Scala. The Swift URL decomposes as follows:

swift2d   ://  container  .  myacct   /  filename.extension
   ^              ^            ^              ^
Stocator       name of     namespace     object storage
protocol      container                    filename
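To make the decomposition concrete, here is a minimal Java sketch that assembles a swift2d URL from its parts; the container, account, and file names used below are placeholders, not values from your account:

```java
public class SwiftUrl {
    // Builds a Stocator URL of the form swift2d://container.account/filename.
    // All three arguments are hypothetical placeholders for illustration.
    static String swift2dUrl(String container, String account, String filename) {
        return "swift2d://" + container + "." + account + "/" + filename;
    }

    public static void main(String[] args) {
        System.out.println(swift2dUrl("notebooks", "sparksql", "file.xml"));
        // prints swift2d://notebooks.sparksql/file.xml
    }
}
```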
Imports (the SQLContext import was missing from the original snippet):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import scala.util.control.NonFatal
import play.api.libs.json.Json

val sqlctx = new SQLContext(sc)
val scplain = sqlctx.sparkContext
Sample credentials:

// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
  "auth_url" -> "https://identity.open.softlayer.com",
  "project" -> "object_storage_3xxxxxx3_xxxx_xxxx_xxxx_xxxxxxxxxxxx",
  "project_id" -> "6xxxxxxxxxx04fxxxxxxxxxx6xxxxxx7",
  "region" -> "dallas",
  "user_id" -> "cxxxxxxxxxxaxxxxxxxxxx1xxxxxxxxx",
  "domain_id" -> "cxxxxxxxxxxaxxyyyyyyxx1xxxxxxxxx",
  "domain_name" -> "853255",
  "username" -> "admin_cxxxxxxxxxxaxxxxxxxxxx1xxxxxxxxx",
  "password" -> """&m7372!fake""",
  "container" -> "notebooks",
  "tenantId" -> "undefined",
  "filename" -> "file.xml"
)
Helper method (note: the original snippet set "auth.method" without a leading dot, which would produce a malformed property key; it is corrected below):

def setRemoteObjectStorageConfig(name: String, sc: SparkContext, dsConfiguration: String): Boolean = {
  try {
    val result = scala.util.parsing.json.JSON.parseFull(dsConfiguration)
    result match {
      case Some(e: Map[String, String]) => {
        val prefix = "fs.swift2d.service." + name
        val hconf = sc.hadoopConfiguration
        hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
        hconf.set(prefix + ".auth.url", e("auth_url") + "/v3/auth/tokens")
        hconf.set(prefix + ".tenant", e("project_id"))
        hconf.set(prefix + ".username", e("user_id"))
        hconf.set(prefix + ".password", e("password"))
        hconf.set(prefix + ".auth.method", "keystoneV3")
        hconf.set(prefix + ".region", e("region"))
        hconf.setBoolean(prefix + ".public", true)
        println("Successfully modified SparkContext with remote object storage credentials using datasource name " + name)
        println("")
        return true
      }
      case None => {
        println("Failed.")
        return false
      }
    }
  } catch {
    case NonFatal(exc) => {
      println(exc)
      return false
    }
  }
}
Load the data:

val setObjStor = setRemoteObjectStorageConfig("sparksql", scplain, Json.toJson(credentials.toMap).toString)
val data_rdd = scplain.textFile("swift2d://notebooks.sparksql/" + credentials("filename"))
data_rdd.take(5)
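For the Java side of the question: the Scala helper above does nothing more than set a handful of Hadoop configuration properties. A minimal Java sketch of the same key/value layout is below; the datasource name and credential values are placeholder assumptions, and in a real Spark job you would apply each entry with sc.hadoopConfiguration().set(key, value) before calling sc.textFile on the swift2d URL:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SwiftConfig {
    // Builds the same Stocator/Hadoop property map that the Scala helper sets.
    // "name" is the datasource name that appears in the swift2d URL
    // (e.g. "sparksql" in swift2d://notebooks.sparksql/file.xml).
    static Map<String, String> stocatorProperties(String name, Map<String, String> creds) {
        String prefix = "fs.swift2d.service." + name;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem");
        conf.put(prefix + ".auth.url", creds.get("auth_url") + "/v3/auth/tokens");
        conf.put(prefix + ".tenant", creds.get("project_id"));
        conf.put(prefix + ".username", creds.get("user_id"));
        conf.put(prefix + ".password", creds.get("password"));
        conf.put(prefix + ".auth.method", "keystoneV3");
        conf.put(prefix + ".region", creds.get("region"));
        conf.put(prefix + ".public", "true");
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        Map<String, String> creds = new LinkedHashMap<>();
        creds.put("auth_url", "https://identity.open.softlayer.com");
        creds.put("project_id", "myProjectId");
        creds.put("user_id", "myUserId");
        creds.put("password", "myPassword");
        creds.put("region", "dallas");
        stocatorProperties("sparksql", creds)
            .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Once these properties are applied to the Hadoop configuration of a JavaSparkContext, sc.textFile("swift2d://notebooks.sparksql/file.xml") should resolve through Stocator just as the Scala version does.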