Alink Tutorial (Java Edition)

Section 29.3 Streaming Prediction and LocalPredictor

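This section discusses how to use a model stream in two scenarios: streaming prediction and embedded prediction with LocalPredictor. There are three ways to do this:

    • Streaming prediction components, which use the model stream produced by training with the corresponding algorithm
    • Model-type PipelineStage components, most of which provide a property for setting the model stream
    • LocalPredictor, which gains the ability to update its model dynamically by loading a PipelineModel that has a model stream set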
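
All three approaches rest on the same idea: a predictor that is already serving requests replaces its model whenever a newer one arrives. The following is a conceptual sketch of that idea in plain Java only. The class HotSwapPredictor and its methods are hypothetical names introduced for illustration; this is not Alink code and makes no claim about how Alink implements model streams internally.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Conceptual sketch (not Alink code): a predictor whose model can be replaced while it is in use. */
class HotSwapPredictor {
    // Current logistic regression weights; index 0 is the intercept.
    // The reference is swapped atomically when a new model arrives.
    private final AtomicReference<double[]> weights;

    public HotSwapPredictor(double[] initialWeights) {
        this.weights = new AtomicReference<>(initialWeights.clone());
    }

    /** Installs a newer model, for example one read from a model stream. */
    public void updateModel(double[] newWeights) {
        if (newWeights.length != weights.get().length) {
            throw new IllegalArgumentException("model dimension changed");
        }
        weights.set(newWeights.clone());
    }

    /** Returns the predicted probability of the positive class. */
    public double predict(double[] features) {
        double[] w = weights.get(); // snapshot, so one prediction uses one model
        if (features.length != w.length - 1) {
            throw new IllegalArgumentException("feature dimension mismatch");
        }
        double z = w[0];
        for (int i = 0; i < features.length; i++) {
            z += w[i + 1] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        HotSwapPredictor predictor = new HotSwapPredictor(new double[] {0.0, 1.0});
        double[] input = {0.0};
        System.out.println(predictor.predict(input)); // prints 0.5
        predictor.updateModel(new double[] {1.0, 1.0});
        System.out.println(predictor.predict(input)); // same input, different result
    }
}
```

The caller never restarts the predictor; the same input simply produces a different result after the update, which is the behavior the examples in this section observe.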



29.3.1 Streaming Prediction Components


// Load the initial logistic regression model from file.
BatchOperator initModel = new AkSourceBatchOp()
	.setFilePath(DATA_DIR + INIT_NUMERIC_LR_MODEL_FILE);

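// Predict on the streaming CSV data, starting from initModel;
// setModelStreamFilePath below points the predictor at the model stream directory.
StreamOperator<?> predResult = new CsvSourceStreamOp()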
	.setFilePath("http://alink-release.oss-cn-beijing.aliyuncs.com/data-files/avazu-ctr-train-8M.csv")
	.setSchemaStr(SCHEMA_STRING)
	.setIgnoreFirstLine(true)
	.link(
		new LogisticRegressionPredictStreamOp(initModel)
			.setPredictionCol(PREDICTION_COL_NAME)
			.setReservedCols(new String[] {LABEL_COL_NAME})
			.setPredictionDetailCol(PRED_DETAIL_COL_NAME)
			.setModelStreamFilePath(DATA_DIR + FTRL_MODEL_STREAM_DIR)
	);

// Print a small sample of the prediction results.
predResult
	.sample(0.0001)
	.select("'Pred Sample' AS out_type, *")
	.print();

// Evaluate the predictions as a binary classifier and print selected metrics.
predResult
	.link(
		new EvalBinaryClassStreamOp()
			.setLabelCol(LABEL_COL_NAME)
			.setPredictionDetailCol(PRED_DETAIL_COL_NAME)
			.setTimeInterval(10)
	)
	.link(
		new JsonValueStreamOp()
			.setSelectedCol("Data")
			.setReservedCols(new String[] {"Statistics"})
			.setOutputCols(new String[] {"Accuracy", "AUC", "ConfusionMatrix"})
			.setJsonPath(new String[] {"$.Accuracy", "$.AUC", "$.ConfusionMatrix"})
	)
	.select("'Eval Metric' AS out_type, *")
	.print();

StreamOperator.execute();


29.3.2 Model-Type PipelineStage Components


// Load the initial logistic regression model from file.
BatchOperator initModel = new AkSourceBatchOp()
	.setFilePath(DATA_DIR + INIT_NUMERIC_LR_MODEL_FILE);

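// Wrap a LogisticRegressionModel, with its initial model data and model stream directory, in a PipelineModel.
PipelineModel pipelineModel = new PipelineModel(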
	new LogisticRegressionModel()
		.setModelData(initModel)
		.setPredictionCol(PREDICTION_COL_NAME)
		.setReservedCols(new String[] {LABEL_COL_NAME})
		.setPredictionDetailCol(PRED_DETAIL_COL_NAME)
		.setModelStreamFilePath(DATA_DIR + FTRL_MODEL_STREAM_DIR)
);

// Save the PipelineModel so that it can be loaded later, for example by LocalPredictor.
pipelineModel.save(DATA_DIR + LR_PIPELINEMODEL_FILE, true);
BatchOperator.execute();

// Apply the PipelineModel to the streaming CSV data.
StreamOperator<?> predResult = pipelineModel
	.transform(
		new CsvSourceStreamOp()
			.setFilePath(
				"http://alink-release.oss-cn-beijing.aliyuncs.com/data-files/avazu-ctr-train-8M.csv")
			.setSchemaStr(SCHEMA_STRING)
			.setIgnoreFirstLine(true)
	);

predResult
	.sample(0.0001)
	.select("'Pred Sample' AS out_type, *")
	.print();

predResult
	.link(
		new EvalBinaryClassStreamOp()
			.setLabelCol(LABEL_COL_NAME)
			.setPredictionDetailCol(PRED_DETAIL_COL_NAME)
			.setTimeInterval(10)
	)
	.link(
		new JsonValueStreamOp()
			.setSelectedCol("Data")
			.setReservedCols(new String[] {"Statistics"})
			.setOutputCols(new String[] {"Accuracy", "AUC", "ConfusionMatrix"})
			.setJsonPath(new String[] {"$.Accuracy", "$.AUC", "$.ConfusionMatrix"})
	)
	.select("'Eval Metric' AS out_type, *")
	.print();

StreamOperator.execute();


29.3.3 LocalPredictor


// A single input record; the same record is predicted repeatedly below.
Object[] input = new Object[] {
	"10000949271186029916", "1", "14102100", "1005", 0, "1fbe01fe", "f3845767", "28905ebd",
	"ecad2386", "7801e8d9", "07d7df22", "a99f214a", "37e8da74", "5db079b5", "1", "2",
	15707, 320, 50, 1722, 0, 35, -1, 79};

// Load the saved PipelineModel, which has the model stream directory set.
LocalPredictor localPredictor
	= new LocalPredictor(DATA_DIR + LR_PIPELINEMODEL_FILE, SCHEMA_STRING);

for (int i = 0; i < 100; i++) {
	System.out.print(i + "\t");
	System.out.println(ArrayUtils.toString(localPredictor.predict(input)));
	// Pause two seconds between predictions so that model updates have time to arrive.
	Thread.sleep(2000);
}
localPredictor.close();

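This code corresponds to the method Chap29Pred.c_3_3(). Run the method Chap29.c_2() at the same time as this one; this method then prints the output shown below. Because it always predicts on the same record, the prediction stays the same as long as the model is unchanged, so a change in the prediction means that the model has been updated. In the output below the prediction changes several times, which shows that the model stream is taking effect.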

0	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
1	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
2	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
3	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
4	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
5	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
6	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
7	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
8	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
9	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
10	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
11	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
12	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
13	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
14	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
15	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
16	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
17	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
18	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
19	1,0,{"0":"0.8059634797835652","1":"0.1940365202164348"}
20	1,0,{"0":"0.8059059300425273","1":"0.1940940699574727"}
21	1,0,{"0":"0.8059059300425273","1":"0.1940940699574727"}
22	1,0,{"0":"0.8059059300425273","1":"0.1940940699574727"}
23	1,0,{"0":"0.8059059300425273","1":"0.1940940699574727"}
24	1,0,{"0":"0.8059059300425273","1":"0.1940940699574727"}
25	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
26	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
27	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
28	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
29	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
30	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
31	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
32	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
33	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
34	1,0,{"0":"0.8059057632191885","1":"0.1940942367808115"}
35	1,0,{"0":"0.8068633636443617","1":"0.19313663635563827"}
36	1,0,{"0":"0.8068633636443617","1":"0.19313663635563827"}
37	1,0,{"0":"0.8068633636443617","1":"0.19313663635563827"}
38	1,0,{"0":"0.8068633636443617","1":"0.19313663635563827"}
39	1,0,{"0":"0.8068633636443617","1":"0.19313663635563827"}
40	1,0,{"0":"0.8063115847658164","1":"0.19368841523418356"}
41	1,0,{"0":"0.8063115847658164","1":"0.19368841523418356"}
42	1,0,{"0":"0.8063115847658164","1":"0.19368841523418356"}
......
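
Spotting model updates by watching the prediction for a fixed input can also be automated instead of read off the console by eye. The following is a minimal sketch in plain Java; ModelChangeDetector and changePoints are hypothetical names, not part of Alink, and the sample strings are three of the probabilities from the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (not part of Alink): finds where consecutive predictions differ. */
class ModelChangeDetector {

    /** Returns every index i at which predictions[i] differs from predictions[i - 1]. */
    public static List<Integer> changePoints(List<String> predictions) {
        List<Integer> changes = new ArrayList<>();
        for (int i = 1; i < predictions.size(); i++) {
            if (!predictions.get(i).equals(predictions.get(i - 1))) {
                changes.add(i);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Probability of class "0" for the same input record at five successive iterations.
        List<String> predictions = Arrays.asList(
            "0.8059634797835652",
            "0.8059634797835652",
            "0.8059059300425273",
            "0.8059059300425273",
            "0.8059057632191885");
        System.out.println(changePoints(predictions)); // prints [2, 4]
    }
}
```

Applied to the full output above, the same comparison flags iterations 20, 25, 35, and 40 as the points where an updated model took effect.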