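We fix the random-walk length at 20 and vary the number of walks generated from each node, trying 10, 20, and 50 in turn. For each setting we generate the corresponding Embedding and check the resulting classification performance. The complete code is as follows: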
// Build the edge list: paper-author edges plus paper-conference edges.
BatchOperator<?> edges = new UnionBatchOp().linkFrom(
    paper_author.select("paper_id AS source_id, author_id AS target_id"),
    paper_conf.select("paper_id AS source_id, conf_id AS target_id")
);
for (int walkNum : new int[] {10, 20, 50}) {
    // Train DeepWalk with a fixed walk length and a varying number of walks per node,
    // then save the embedding to a file whose name is prefixed with walkNum.
    edges
        .link(
            new DeepWalkBatchOp()
                .setSourceCol("source_id")
                .setTargetCol("target_id")
                .setIsToUndigraph(true)
                .setVectorSize(100)
                .setWalkLength(20)
                .setWalkNum(walkNum)
                .setNumIter(1)
        )
        .link(
            new AkSinkBatchOp()
                .setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
                .setOverwriteSink(true)
        );
    BatchOperator.execute();
    // Read the saved embedding back and evaluate the classifiers on it.
    classifyWithEmbedding(
        new AkSourceBatchOp()
            .setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
    );
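}
The results are summarized below. Overall, every classifier improves as WalkNum increases. For the Softmax classifier, accuracy rises sharply when WalkNum goes from 10 to 20 (0.5361 to 0.5727), and at that point it even overtakes KnnClassifier (0.5669).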
WalkNum | Softmax                      | KnnClassifier                |
10      | Accuracy:0.5361 Kappa:0.2547 | Accuracy:0.5595 Kappa:0.3649 |
20      | Accuracy:0.5727 Kappa:0.3555 | Accuracy:0.5669 Kappa:0.3781 |
50      | Accuracy:0.5752 Kappa:0.3647 | Accuracy:0.5791 Kappa:0.3999 |
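To make the roles of the two walk parameters concrete, here is a minimal plain-Java sketch of uniform random-walk generation on an undirected graph. It is not Alink's implementation: the class RandomWalkSketch, its methods, and the toy edges are hypothetical, and the sketch assumes that the walk length counts the nodes in a walk, which may differ from how DeepWalkBatchOp interprets setWalkLength.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Alink.
class RandomWalkSketch {

    // Builds an undirected adjacency list from {source, target} pairs,
    // similar in spirit to treating the edge table as an undirected graph.
    static Map<String, List<String>> buildUndirected(String[][] edges) {
        Map<String, List<String>> adj = new TreeMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    // Starts walkNum walks at every node. Each walk contains walkLength nodes
    // (assumption of this sketch) and moves to a uniformly chosen neighbor.
    static List<List<String>> generateWalks(
            Map<String, List<String>> adj, int walkNum, int walkLength, long seed) {
        Random rnd = new Random(seed);
        List<List<String>> walks = new ArrayList<>();
        for (String start : adj.keySet()) {
            for (int i = 0; i < walkNum; i++) {
                List<String> walk = new ArrayList<>();
                String cur = start;
                walk.add(cur);
                while (walk.size() < walkLength) {
                    // Every node in adj has at least one neighbor by construction.
                    List<String> neighbors = adj.get(cur);
                    cur = neighbors.get(rnd.nextInt(neighbors.size()));
                    walk.add(cur);
                }
                walks.add(walk);
            }
        }
        return walks;
    }

    public static void main(String[] args) {
        // Toy data: two papers sharing one author and one conference.
        String[][] edges = {{"p1", "a1"}, {"p1", "c1"}, {"p2", "a1"}, {"p2", "c1"}};
        Map<String, List<String>> adj = buildUndirected(edges);
        for (int walkNum : new int[] {10, 20, 50}) {
            List<List<String>> walks = generateWalks(adj, walkNum, 20, 42L);
            System.out.println("walkNum=" + walkNum + " -> " + walks.size() + " walks");
        }
    }
}
```

In this sketch the number of training sequences grows linearly with the number of walks per node: the four-node toy graph yields 40, 80, and 200 walks for the three settings. More walks give the embedding model more co-occurrence samples per node, which fits the improvement seen in the table above.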
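Next, we modify how the graph is built and see how that affects the Embedding. As the code below shows, we add one more kind of edge: from author to conference.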
BatchOperator<?> edges = new UnionBatchOp().linkFrom(
    paper_author.select("paper_id AS source_id, author_id AS target_id"),
    paper_conf.select("paper_id AS source_id, conf_id AS target_id"),
    // New author-conference edges: look up each paper's conf_id in paper_conf
    // and attach it to the rows of paper_author as target_id.
    new LookupBatchOp()
        .setSelectedCols("paper_id")
        .setOutputCols("target_id")
        .setMapKeyCols("paper_id")
        .setMapValueCols("conf_id")
        .linkFrom(paper_conf, paper_author)
        .select("author_id AS source_id, target_id")
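);
The results are summarized below. One clear change is that at WalkNum=10 both Softmax and KnnClassifier do better than in the previous experiment. The gain is most pronounced for Softmax, whose accuracy rises from 0.5361 to 0.5669, while KnnClassifier moves only from 0.5595 to 0.5633. As WalkNum increases further, however, the improvement in classification performance is small.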
WalkNum | Softmax                      | KnnClassifier                |
10      | Accuracy:0.5669 Kappa:0.3524 | Accuracy:0.5633 Kappa:0.3717 |
20      | Accuracy:0.5746 Kappa:0.3748 | Accuracy:0.565 Kappa:0.3836  |
50      | Accuracy:0.5761 Kappa:0.38   | Accuracy:0.5652 Kappa:0.3936 |
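The author-to-conference edges added in this experiment can be pictured with a small plain-Java sketch. It does not use Alink: the class AuthorConfEdgesSketch, its method, and the toy rows are hypothetical. It assumes each paper maps to exactly one conference, and it simply skips papers that have no conference entry, which is a simplification of this sketch rather than a statement about LookupBatchOp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Alink.
class AuthorConfEdgesSketch {

    // paperConf rows are {paper_id, conf_id}; paperAuthor rows are {paper_id, author_id}.
    // Returns {author_id, conf_id} pairs, one per paper-author row whose paper has a conference.
    static List<String[]> deriveAuthorConfEdges(String[][] paperConf, String[][] paperAuthor) {
        // Key-value map: paper_id -> conf_id (assumes one conference per paper).
        Map<String, String> confOfPaper = new HashMap<>();
        for (String[] row : paperConf) {
            confOfPaper.put(row[0], row[1]);
        }
        List<String[]> edges = new ArrayList<>();
        for (String[] row : paperAuthor) {
            String conf = confOfPaper.get(row[0]);
            if (conf != null) { // skip papers without a conference (sketch simplification)
                edges.add(new String[] {row[1], conf});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Toy data: author a1 wrote p1 and p2, author a2 wrote p1.
        String[][] paperConf = {{"p1", "c1"}, {"p2", "c2"}};
        String[][] paperAuthor = {{"p1", "a1"}, {"p1", "a2"}, {"p2", "a1"}};
        for (String[] e : deriveAuthorConfEdges(paperConf, paperAuthor)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

With these direct edges, an author and a conference are one step apart instead of two (previously only through a paper). That is one plausible reason the embeddings are already useful at WalkNum=10 and gain little from more walks.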