PredictionIO – A Machine Learning Server in Scala – SF Scala

Engineering

predictionio
of 27
Description
Text
  • Building and Deploying ML Applications on production in a fraction of the time. A Machine Learning Server in Scala
  • Available Tools Processing Framework • e.g. Apache Spark, Apache Hadoop Algorithm Libraries • e.g. MLlib, Mahout Data Storage • e.g. HBase, Cassandra
  • Integrate everything together nicely and move from prototyping to production. What is Missing?
  • You have a mobile app A Classic Recommender Example… App Predict products You need a Recommendation Engine Predict products that a customer will like – and show it. Predictive model Algorithm - You don't need to write your own: 
 Spark MLlib - ALS algorithm
 Predictive model - based on users’ previous behaviors
  • def pseudocode () { // Read training data 
 val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match 
 { …. }) // Build a predictive model with an algorithm
 val model = ALS.train(trainingData, 10, 20, 0.01) // Make prediction 
 allUsers.foreach { user =>
 model.recommendProducts(user, 5)
 } } A Classic Recommender Example prototyping…
  • • How to deploy a scalable service that respond to dynamic prediction query? • How do you persist the predictive model, in a distributed environment? • How to make HBase, Spark and algorithms talking to each other? • How should I prepare, or transform, the data for model training? • How to update the model with new data without downtime? • Where should I add some business logics? • How to make the code configurable, re-usable and maintainable? • How do I build all these with a separate of concerns (SoC)? Beyond Prototyping
  • Engine Event Server (data storage) Data: User Actions Query via REST: User ID Predicted Result: A list of Product IDs A Classic Recommender Example on production… Mobile App
  • • PredictionIO is a machine learning server for building and deploying predictive engines 
 on production
 in a fraction of the time. • Built on Apache Spark, MLlib and HBase. PredictionIO
  • Data: User Actions Query via REST: User ID Predicted Result: A list of Product IDs Engine Event Server (data storage) Mobile App Event Server
  • • $ pio eventserver • Event-based client.create_event( event="rate", entity_type="user", entity_id=“user_123”, target_entity_type="item", target_entity_id=“item_100”, properties= { "rating" : 5.0 } ) Event Server Collecting Date
  • Query via REST: User ID Predicted Result: A list of Product IDs Engine Data: User Actions Event Server (data storage) Mobile App Engine
  • • DASE - the “MVC” for Machine Learning • Data: Data Source and Data Preparator • Algorithm(s) • Serving • Evaluator Engine Building an Engine with Separation of Concerns (SoC)
  • A. Train deployable predictive model(s) B. Respond to dynamic query C. Evaluation Engine Functions of an Engine
  • Engine A. Train predictive model(s) class DataSource(…) extends PDataSource def readTraining(sc: SparkContext) ==> trainingData class Preparator(…) extends PPreparator def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData class Algorithm1(…) extends PAlgorithm def train(prepareData: PreparedData) ==> Model $ pio train
  • Engine A. Train predictive model(s) class DataSource(…) extends PDataSource override def readTraining(sc: SparkContext): TrainingData = { val eventsDb = Storage.getPEvents() val eventsRDD: RDD[Event] = eventsDb.find(….)(sc) val ratingsRDD: RDD[Rating] = eventsRDD.map { event => val rating = try { val ratingValue: Double = event.event match {….} Rating(event.entityId, event.targetEntityId.get, ratingValue) } catch {…} rating } new TrainingData(ratingsRDD) }
  • Engine A. Train predictive model(s) class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm def train(preparedData: PreparedData): Model1 = { mllibRatings = data…. val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda) new Model1( rank = m.rank, userFeatures = m.userFeatures, productFeatures = m.productFeatures ) }
  • Engine A. Train predictive model(s) Event Server Algorithm 1 Algorithm 3Algorithm 2 PreparedDate Engine Data Preparator Data Source TrainingDate Model 3Model 1Model 2
  • B. Respond to dynamic queryEngine • Query (Input) :
 
 $ curl -H "Content-Type: application/json" -d 
 '{ "user": "1", "num": 4 }' 
 http://localhost:8000/queries.json case class Query( val user: String, val num: Int ) extends Serializable
  • B. Respond to dynamic queryEngine • Predicted Result (Output):
 
 {“itemScores”:[{"item":"22","score":4.072304374729956}, {"item":"62","score":4.058482414005789}, 
 {"item":"75","score":4.046063009943821}]} case class PredictedResult( val itemScores: Array[ItemScore] ) extends Serializable case class ItemScore( item: String, score: Double ) extends Serializable
  • class Algorithm1(…) extends PAlgorithm def predict(model: ALSModel, query: Query) ==> predictedResult class Serving extends LServing def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult B. Respond to dynamic queryEngine Query via REST
  • Engine B. Respond to dynamic query class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm def predict(model: ALSModel, query: Query): PredictedResult = { model….{ userInt => val itemScores = model.recommendProducts (…).map (….) new PredictedResult(itemScores) }.getOrElse{….} }
  • B. Respond to dynamic queryEngine Algorithm 1 Model 1 Serving Mobile App Algorithm 3 Model 3 Algorithm 2 Model 2 Predicted Results Query (input) Predicted Result (output) Engine
  • Engine DASE Factory object RecEngine extends IEngineFactory { def apply() = { new Engine( classOf[DataSource], classOf[Preparator], Map("algo1" -> classOf[Algorithm1]), classOf[Serving]) } }
  • Running on Production • Install PredictionIO
 $ bash -c "$(curl -s http://install.prediction.io/install.sh)" • Start the Event Server 
 $ pio eventserver • Deploy an Engine
 $ pio build; pio train; pio deploy • Update Engine Model with New Data 
 $ pio train; pio deploy
  • Deploy on Production Website Mobile App Email Campaign Event Server (data storage) Data Query via REST Predicted Result Engine 1 Engine 3 Engine 2 Engine 4
  • The Next Step • Quickstart with an Engine Template! • Follow on Github: github.com/predictionio/ • Learn PredictionIO: prediction.io/ • Learn Scala! Scala for the Impatient • Contribute!
  • Thanks. Simon Chan simon@prediction.io @PredictionIO prediction.io (Newsletters) github.com/predictionio mailto:simon@prediction.io
Comments
Top