Alink Tutorial (Java Edition)

Section 31.6 Changing the Training Parameters


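We fix the random walk length at 20 and vary the number of walks generated from each node, trying 10, 20, and 50 in turn. For each setting we generate the corresponding embedding and check the resulting classification performance. The complete code is as follows: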

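// Build the edge list: paper-author edges plus paper-conference edges.
BatchOperator <?> edges = new UnionBatchOp().linkFrom(
	paper_author.select("paper_id AS source_id, author_id AS target_id"),
	paper_conf.select("paper_id AS source_id, conf_id AS target_id")
);

// Train, save, and evaluate one DeepWalk embedding per WalkNum setting.
for (int walkNum : new int[] {10, 20, 50}) {
	edges
		.link(
			// The walk length stays at 20; only the number of walks per node changes.
			new DeepWalkBatchOp()
				.setSourceCol("source_id")
				.setTargetCol("target_id")
				.setIsToUndigraph(true)
				.setVectorSize(100)
				.setWalkLength(20)
				.setWalkNum(walkNum)
				.setNumIter(1)
		)
		.link(
			// Each embedding goes to its own file, prefixed with the WalkNum value.
			new AkSinkBatchOp()
				.setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
				.setOverwriteSink(true)
		);
	BatchOperator.execute();

	// Reload the saved embedding and run the classifiers on it.
	classifyWithEmbedding(
		new AkSourceBatchOp()
			.setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
	);
}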
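
To make the roles of WalkNum and WalkLength concrete, here is a minimal sketch of uniform random-walk generation on a small undirected graph. It is not Alink's DeepWalkBatchOp implementation: the class RandomWalkSketch, its methods, and the toy edge list are hypothetical, and the sketch assumes that the walk length counts the nodes in a walk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not part of Alink.
public class RandomWalkSketch {

	// Build an undirected adjacency list from (source, target) pairs.
	public static Map<String, List<String>> buildUndirectedGraph(String[][] edges) {
		Map<String, List<String>> adj = new HashMap<>();
		for (String[] e : edges) {
			adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
			adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
		}
		return adj;
	}

	// Start walkNum walks from every node; each walk visits walkLength nodes
	// (assumption: length is counted in nodes), choosing neighbors uniformly.
	public static List<List<String>> generateWalks(
		Map<String, List<String>> adj, int walkNum, int walkLength, long seed) {
		Random rnd = new Random(seed);
		List<List<String>> walks = new ArrayList<>();
		for (String start : adj.keySet()) {
			for (int i = 0; i < walkNum; i++) {
				List<String> walk = new ArrayList<>();
				String cur = start;
				walk.add(cur);
				while (walk.size() < walkLength) {
					// Every node has at least one neighbor because it came from an edge.
					List<String> neighbors = adj.get(cur);
					cur = neighbors.get(rnd.nextInt(neighbors.size()));
					walk.add(cur);
				}
				walks.add(walk);
			}
		}
		return walks;
	}

	public static void main(String[] args) {
		// Toy edges in the same shape as the tutorial's source_id/target_id columns.
		String[][] edges = {
			{"paper_1", "author_1"}, {"paper_1", "conf_1"},
			{"paper_2", "author_1"}, {"paper_2", "conf_2"}
		};
		Map<String, List<String>> adj = buildUndirectedGraph(edges);
		for (int walkNum : new int[] {10, 20, 50}) {
			List<List<String>> walks = generateWalks(adj, walkNum, 20, 0L);
			System.out.println("walkNum=" + walkNum + " -> " + walks.size() + " walks");
		}
	}
}
```

The toy graph has 5 nodes, so the three settings produce 50, 100, and 250 walks. The amount of training text handed to the embedding step grows linearly with WalkNum, which is one plausible reason a larger WalkNum helps.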

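The results are summarized below. Overall, every classifier improves as WalkNum increases. For the Softmax classifier, accuracy rises markedly when WalkNum goes from 10 to 20 (0.5361 to 0.5727), even surpassing KnnClassifier (0.5669) at WalkNum=20.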

| WalkNum | Softmax | KnnClassifier |
|---|---|---|
| 10 | Accuracy: 0.5361, Kappa: 0.2547 | Accuracy: 0.5595, Kappa: 0.3649 |
| 20 | Accuracy: 0.5727, Kappa: 0.3555 | Accuracy: 0.5669, Kappa: 0.3781 |
| 50 | Accuracy: 0.5752, Kappa: 0.3647 | Accuracy: 0.5791, Kappa: 0.3999 |
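
The two metrics reported above can both be derived from a confusion matrix. The following sketch assumes that Kappa here means Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals. The class KappaSketch and the 2x2 matrix are hypothetical and are not taken from the experiment.

```java
// Hypothetical illustration only; not Alink's evaluation code.
public class KappaSketch {

	// Accuracy: share of samples on the diagonal of the confusion matrix.
	public static double accuracy(long[][] cm) {
		long total = 0;
		long correct = 0;
		for (int i = 0; i < cm.length; i++) {
			for (int j = 0; j < cm.length; j++) {
				total += cm[i][j];
				if (i == j) {
					correct += cm[i][j];
				}
			}
		}
		return (double) correct / total;
	}

	// Cohen's kappa: accuracy corrected for chance agreement.
	public static double kappa(long[][] cm) {
		int n = cm.length;
		long total = 0;
		long[] rowSum = new long[n];
		long[] colSum = new long[n];
		for (int i = 0; i < n; i++) {
			for (int j = 0; j < n; j++) {
				rowSum[i] += cm[i][j];
				colSum[j] += cm[i][j];
				total += cm[i][j];
			}
		}
		// Chance agreement: sum over classes of (row share * column share).
		double pe = 0.0;
		for (int k = 0; k < n; k++) {
			pe += ((double) rowSum[k] / total) * ((double) colSum[k] / total);
		}
		double po = accuracy(cm);
		return (po - pe) / (1.0 - pe);
	}

	public static void main(String[] args) {
		// Made-up counts: rows are actual classes, columns are predicted classes.
		long[][] cm = {
			{40, 10},
			{20, 30}
		};
		System.out.println("Accuracy: " + accuracy(cm));
		System.out.println("Kappa: " + kappa(cm));
	}
}
```

For this made-up matrix, accuracy is 70/100 = 0.7 and chance agreement is 0.5 * 0.6 + 0.5 * 0.4 = 0.5, giving a kappa of 0.4 (up to floating-point rounding). Because kappa discounts chance agreement, two classifiers with similar accuracy can still differ noticeably in kappa, as the WalkNum=10 row shows.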

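Next, we modify how the graph is built and see how that affects the embedding. As the code below shows, we add one more edge relation: edges from authors to conferences.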

BatchOperator <?> edges = new UnionBatchOp().linkFrom(
	paper_author.select("paper_id AS source_id, author_id AS target_id"),
	paper_conf.select("paper_id AS source_id, conf_id AS target_id"),
	// Author-conference edges: for each paper_author row, look up the paper's
	// conf_id in paper_conf and pair it with the author.
	new LookupBatchOp()
		.setSelectedCols("paper_id")
		.setOutputCols("target_id")
		.setMapKeyCols("paper_id")
		.setMapValueCols("conf_id")
		.linkFrom(paper_conf, paper_author)
		.select("author_id AS source_id, target_id")
);
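
The lookup step above is essentially a key-value join: each paper is mapped to its conference, and every author of that paper then gets an edge to the same conference. A minimal plain-Java sketch of this idea follows. The class AuthorConfEdgeSketch and the sample rows are hypothetical, and the sketch simply skips papers that have no conference entry, which is a choice made here rather than a statement about how LookupBatchOp treats unmatched keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Alink.
public class AuthorConfEdgeSketch {

	// paperAuthor rows are {paper_id, author_id}; paperConf rows are {paper_id, conf_id}.
	// Returns {author_id, conf_id} edges.
	public static List<String[]> authorConfEdges(String[][] paperAuthor, String[][] paperConf) {
		// Lookup table: paper_id -> conf_id.
		Map<String, String> confOfPaper = new HashMap<>();
		for (String[] row : paperConf) {
			confOfPaper.put(row[0], row[1]);
		}
		List<String[]> edges = new ArrayList<>();
		for (String[] row : paperAuthor) {
			String conf = confOfPaper.get(row[0]);
			// Sketch-level choice: drop rows whose paper has no conference.
			if (conf != null) {
				edges.add(new String[] {row[1], conf});
			}
		}
		return edges;
	}

	public static void main(String[] args) {
		String[][] paperAuthor = {
			{"paper_1", "author_1"}, {"paper_1", "author_2"}, {"paper_2", "author_1"}
		};
		String[][] paperConf = {
			{"paper_1", "conf_1"}, {"paper_2", "conf_2"}
		};
		for (String[] edge : authorConfEdges(paperAuthor, paperConf)) {
			System.out.println(edge[0] + " -> " + edge[1]);
		}
	}
}
```

With these sample rows the sketch emits three edges: author_1 to conf_1, author_2 to conf_1, and author_1 to conf_2. Such edges put an author one hop from a conference instead of two, so a random walk can move between them without passing through a paper node.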

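The results are summarized below. One clear change is that at WalkNum=10, both Softmax and KnnClassifier classify better than in the earlier experiment, with the gain most pronounced for Softmax (accuracy 0.5361 to 0.5669, versus 0.5595 to 0.5633 for KnnClassifier). As WalkNum increases further, however, classification improves only slightly.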

| WalkNum | Softmax | KnnClassifier |
|---|---|---|
| 10 | Accuracy: 0.5669, Kappa: 0.3524 | Accuracy: 0.5633, Kappa: 0.3717 |
| 20 | Accuracy: 0.5746, Kappa: 0.3748 | Accuracy: 0.565, Kappa: 0.3836 |
| 50 | Accuracy: 0.5761, Kappa: 0.38 | Accuracy: 0.5652, Kappa: 0.3936 |