PredictionIO – A Machine Learning Server in Scala – SF Scala

  • A Machine Learning Server in Scala: building and deploying ML applications in production in a fraction of the time.
  • Available Tools
      • Processing frameworks: e.g. Apache Spark, Apache Hadoop
      • Algorithm libraries: e.g. MLlib, Mahout
      • Data storage: e.g. HBase, Cassandra
  • What Is Missing? Something that integrates everything together nicely and moves you from prototyping to production.
  • A Classic Recommender Example: You have a mobile app. You need a recommendation engine that predicts products a customer will like and shows them.
      • Algorithm - you don't need to write your own: Spark MLlib - ALS algorithm
      • Predictive model - based on users’ previous behaviors
  • A Classic Recommender Example: prototyping
 def pseudocode() {
   // Read training data
   val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
   // Build a predictive model with an algorithm
   val model = ALS.train(trainingData, 10, 20, 0.01)
   // Make predictions
   allUsers.foreach { user =>
     model.recommendProducts(user, 5)
   }
 }
  • Beyond Prototyping
      • How do you deploy a scalable service that responds to dynamic prediction queries?
      • How do you persist the predictive model in a distributed environment?
      • How do you make HBase, Spark and the algorithms talk to each other?
      • How should you prepare, or transform, the data for model training?
      • How do you update the model with new data without downtime?
      • Where should you add business logic?
      • How do you make the code configurable, reusable and maintainable?
      • How do you build all of this with separation of concerns (SoC)?
  • A Classic Recommender Example in production (diagram): the Mobile App sends Data (user actions) to the Event Server (data storage), queries the Engine via REST with a User ID, and receives a Predicted Result (a list of Product IDs).
  • PredictionIO
      • PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
      • Built on Apache Spark, MLlib and HBase.
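  • Event Server (architecture diagram repeated, highlighting the Event Server): Data (user actions) from the Mobile App goes to the Event Server (data storage); queries via REST (User ID) go to the Engine, which returns the Predicted Result (a list of Product IDs).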
  • Event Server: Collecting Data
      • Start the Event Server:
        $ pio eventserver
      • Event-based:
        client.create_event(
          event="rate",
          entity_type="user",
          entity_id="user_123",
          target_entity_type="item",
          target_entity_id="item_100",
          properties={ "rating": 5.0 }
        )
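  • Sketch (not from the talk): the create_event call above ultimately sends one JSON record to the Event Server. The snippet below is a minimal, dependency-free illustration of what such a record might look like. The object name EventJsonSketch, the RateEvent case class, and the camelCase field names are assumptions made for this example; check the Event Server API documentation for the authoritative format.

```scala
// Hypothetical sketch: RateEvent and toJson are illustration-only names.
// The camelCase JSON keys are an assumption about the Event Server wire format.
object EventJsonSketch {
  final case class RateEvent(
    event: String,
    entityType: String,
    entityId: String,
    targetEntityType: String,
    targetEntityId: String,
    rating: Double
  )

  // Escapes only backslashes and double quotes, which is enough for simple IDs.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Renders the event as a single JSON object with the rating under "properties".
  def toJson(e: RateEvent): String =
    Seq(
      "\"event\":" + quote(e.event),
      "\"entityType\":" + quote(e.entityType),
      "\"entityId\":" + quote(e.entityId),
      "\"targetEntityType\":" + quote(e.targetEntityType),
      "\"targetEntityId\":" + quote(e.targetEntityId),
      "\"properties\":{\"rating\":" + e.rating + "}"
    ).mkString("{", ",", "}")
}
```

 Usage: EventJsonSketch.toJson(EventJsonSketch.RateEvent("rate", "user", "user_123", "item", "item_100", 5.0)) yields the body a client could POST to the Event Server.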
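  • Engine (architecture diagram repeated, highlighting the Engine): the Mobile App queries the Engine via REST with a User ID and receives the Predicted Result (a list of Product IDs); Data (user actions) goes to the Event Server (data storage).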
  • Engine: Building an Engine with Separation of Concerns (SoC)
      • DASE - the “MVC” for Machine Learning
      • Data: Data Source and Data Preparator
      • Algorithm(s)
      • Serving
      • Evaluator
  • Engine: Functions of an Engine
      • A. Train deployable predictive model(s)
      • B. Respond to dynamic queries
      • C. Evaluation
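  • Sketch (not from the talk): a simplified, Spark-free illustration of how the DASE roles could fit together for functions A (train) and B (respond to a query). Every trait and class name here is hypothetical and deliberately differs from the real PredictionIO base classes (PDataSource, PPreparator, PAlgorithm, LServing); the toy "mean rating" algorithm stands in for ALS only to keep the example runnable.

```scala
// Hypothetical sketch of the DASE separation of concerns; not the PredictionIO API.
object DaseSketch {
  final case class Rating(user: String, item: String, value: Double)
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)

  trait DataSource { def readTraining(): Seq[Rating] }
  trait Preparator { def prepare(training: Seq[Rating]): Seq[Rating] }
  trait Algorithm[M] {
    def train(prepared: Seq[Rating]): M
    def predict(model: M, query: Query): Seq[ItemScore]
  }
  trait Serving {
    def serve(query: Query, results: Seq[Seq[ItemScore]]): Seq[ItemScore]
  }

  // Toy stand-in for ALS: the model is each item's mean rating,
  // and every user is offered the highest-rated items.
  object MeanRating extends Algorithm[Map[String, Double]] {
    def train(prepared: Seq[Rating]): Map[String, Double] =
      prepared.groupBy(_.item).map { case (item, rs) =>
        item -> rs.map(_.value).sum / rs.size
      }

    def predict(model: Map[String, Double], query: Query): Seq[ItemScore] =
      model.toSeq
        .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
        .take(query.num)
        .map { case (item, score) => ItemScore(item, score) }
  }

  // The engine only wires the stages together; each stage can be swapped independently.
  final class Engine[M](ds: DataSource, prep: Preparator,
                        algo: Algorithm[M], serving: Serving) {
    // A. Train: data source -> preparator -> algorithm
    def train(): M = algo.train(prep.prepare(ds.readTraining()))
    // B. Respond: algorithm prediction -> serving
    def query(model: M, q: Query): Seq[ItemScore] =
      serving.serve(q, Seq(algo.predict(model, q)))
  }
}
```

 The point of the design is that replacing the data source or the algorithm does not touch the other stages.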
  • Engine: A. Train predictive model(s)
 $ pio train
 class DataSource(…) extends PDataSource
   def readTraining(sc: SparkContext) ==> trainingData
 class Preparator(…) extends PPreparator
   def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData
 class Algorithm1(…) extends PAlgorithm
   def train(preparedData: PreparedData) ==> Model
  • Engine: A. Train predictive model(s) - Data Source
 class DataSource(…) extends PDataSource {
   override def readTraining(sc: SparkContext): TrainingData = {
     // Read the stored events from the Event Server's storage as an RDD
     val eventsDb = Storage.getPEvents()
     val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)
     // Convert each event into a (user, item, rating) record
     val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
       val rating = try {
         val ratingValue: Double = event.event match {….}
         Rating(event.entityId, event.targetEntityId.get, ratingValue)
       } catch {…}
       rating
     }
     new TrainingData(ratingsRDD)
   }
 }
  • Engine: A. Train predictive model(s) - Algorithm
 class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
   def train(preparedData: PreparedData): Model1 = {
     val mllibRatings = data….
     // Train MLlib ALS with the parameters supplied to the algorithm
     val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
     new Model1(
       rank = m.rank,
       userFeatures = m.userFeatures,
       productFeatures = m.productFeatures
     )
   }
 }
  • Engine: A. Train predictive model(s) (diagram): Event Server -> Data Source -> TrainingData -> Data Preparator -> PreparedData -> Algorithm 1, Algorithm 2, Algorithm 3 -> Model 1, Model 2, Model 3
  • Engine: B. Respond to dynamic query
      • Query (Input):
        $ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

        case class Query(
          val user: String,
          val num: Int
        ) extends Serializable
  • Engine: B. Respond to dynamic query
      • Predicted Result (Output):
        {"itemScores":[{"item":"22","score":4.072304374729956},
         {"item":"62","score":4.058482414005789},
         {"item":"75","score":4.046063009943821}]}

        case class PredictedResult(
          val itemScores: Array[ItemScore]
        ) extends Serializable

        case class ItemScore(
          item: String,
          score: Double
        ) extends Serializable
  • Engine: B. Respond to dynamic query (query via REST)
 class Algorithm1(…) extends PAlgorithm
   def predict(model: ALSModel, query: Query) ==> predictedResult
 class Serving extends LServing
   def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
  • Engine: B. Respond to dynamic query - Algorithm
 class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
   def predict(model: ALSModel, query: Query): PredictedResult = {
     model….{ userInt =>
       val itemScores = model.recommendProducts(…).map(….)
       new PredictedResult(itemScores)
     }.getOrElse{….}
   }
 }
  • Engine: B. Respond to dynamic query (diagram): Mobile App -> Query (input) -> Algorithm 1 (Model 1), Algorithm 2 (Model 2), Algorithm 3 (Model 3) -> Predicted Results -> Serving -> Predicted Result (output) -> Mobile App
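  • Sketch (not from the talk): the Serving step receives one predicted result per algorithm and must return a single result. The slides do not say how results are combined, so the rule below (average each item's score across the algorithms that returned it, then keep the top `num`) is purely an assumption chosen for illustration. The case classes mirror the ones on the slides but are redefined here so the snippet stands alone.

```scala
// Hypothetical serving rule; the averaging strategy is an assumption, not PredictionIO behavior.
object ServingSketch {
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Array[ItemScore])

  def serve(query: Query, predictedResults: Seq[PredictedResult]): PredictedResult = {
    val averaged = predictedResults
      .flatMap(_.itemScores.toSeq)          // pool the scores from every algorithm
      .groupBy(_.item)                      // collect all scores for the same item
      .map { case (item, scores) =>
        ItemScore(item, scores.map(_.score).sum / scores.size)
      }
      .toSeq
      // Highest score first; ties broken by item ID so the order is deterministic.
      .sortWith((a, b) => a.score > b.score || (a.score == b.score && a.item < b.item))
    PredictedResult(averaged.take(query.num).toArray)
  }
}
```

 Business logic such as filtering out-of-stock items would also fit naturally in this step, since it sees the final candidate list.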
  • Engine: DASE Factory
 object RecEngine extends IEngineFactory {
   def apply() = {
     new Engine(
       classOf[DataSource],
       classOf[Preparator],
       Map("algo1" -> classOf[Algorithm1]),
       classOf[Serving])
   }
 }
  • Running on Production
      • Install PredictionIO
        $ bash -c "$(curl -s http://install.prediction.io/install.sh)"
      • Start the Event Server
        $ pio eventserver
      • Deploy an Engine
        $ pio build; pio train; pio deploy
      • Update Engine Model with New Data
        $ pio train; pio deploy
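  • Sketch (not from the talk): one generic way to replace a served model without pausing queries is to keep the current model behind an atomic reference and swap it once retraining finishes. This is a swapped-in illustration of the "update without downtime" idea, not a description of what `pio deploy` does internally. ModelHolder, Model, and the item-to-score map are hypothetical names and types.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical sketch of hot-swapping a model; not PredictionIO's deployment mechanism.
object ModelSwapSketch {
  // Toy model: item ID -> score.
  type Model = Map[String, Double]

  final class ModelHolder(initial: Model) {
    private val current = new AtomicReference[Model](initial)

    // Queries always read whichever model is current at the time of the call.
    def topItems(num: Int): Seq[String] =
      current.get().toSeq
        .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
        .take(num)
        .map(_._1)

    // Called after retraining; in-flight queries keep the model they already read.
    def deploy(retrained: Model): Unit = current.set(retrained)
  }
}
```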
  • Deploy on Production (diagram): clients (Website, Mobile App, Email Campaign) send Data to the Event Server (data storage) and send Queries via REST to Engine 1, Engine 2, Engine 3 and Engine 4, which return Predicted Results.
  • The Next Step
      • Quickstart with an Engine Template!
      • Follow on GitHub: github.com/predictionio/
      • Learn PredictionIO: prediction.io/
      • Learn Scala! Scala for the Impatient
      • Contribute!
  • Thanks. Simon Chan, @PredictionIO, prediction.io (Newsletters), github.com/predictionio