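Java class name: com.alibaba.alink.operator.batch.statistics.SomBatchOp
Python class name: SomBatchOp
The Self-Organizing Map (SOM) algorithm is a technique for visualizing high-dimensional data: it maps each input vector onto a cell of a two-dimensional grid so that similar vectors land in nearby cells.
Reference: https://clarkdatalabs.github.io/soms/SOM_NBA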
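
To make the idea concrete, here is a minimal NumPy sketch of a textbook SOM training loop. It is an illustration only and not Alink's implementation: the function name `train_som`, the linear learning-rate decay, the Gaussian neighborhood, and the random initialisation are all assumptions chosen for brevity.

```python
import numpy as np

def train_som(data, xdim, ydim, num_iters=100, learn_rate=0.5, sigma=1.0, seed=0):
    """Train a tiny SOM and return weights of shape (xdim, ydim, vdim).

    Hypothetical helper for illustration; not part of Alink.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    vdim = data.shape[1]
    # Initialise every grid cell with a randomly chosen input sample.
    picks = rng.integers(0, len(data), size=xdim * ydim)
    weights = data[picks].reshape(xdim, ydim, vdim).copy()
    # Grid coordinates of every cell, shape (xdim, ydim, 2).
    grid = np.stack(
        np.meshgrid(np.arange(xdim), np.arange(ydim), indexing="ij"), axis=-1
    )
    for it in range(num_iters):
        # Assumed schedule: learning rate decays linearly towards zero.
        lr = learn_rate * (1.0 - it / num_iters)
        for x in rng.permutation(data):
            # Best matching unit: the cell whose weight vector is closest to x.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood centred on the best matching unit.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull each cell towards x, scaled by its neighborhood weight.
            weights += lr * h[..., None] * (x - weights)
    return weights

samples = [[5.0, 2.0], [5.1, 3.7], [6.4, 2.8], [6.0, 2.9]]
weights = train_som(samples, xdim=2, ydim=4, num_iters=10)
print(weights.shape)  # (2, 4, 2)
```

Each update is a convex combination of the old weight and the sample, so the learned weights always stay inside the range spanned by the input data.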
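Name | Display name | Description | Type | Required? | Value range | Default |
---|---|---|---|---|---|---|
vdim | Vector length | Length of each input vector | Integer | ✓ | | |
vectorCol | Vector column name | Name of the vector column | String | ✓ | Selected column type must be one of [DENSE_VECTOR, SPARSE_VECTOR, STRING, VECTOR] | |
xdim | Grid size in x | Number of grid cells along the x direction | Integer | ✓ | | |
ydim | Grid size in y | Number of grid cells along the y direction | Integer | ✓ | | |
debug | Enable debugging | Whether to enable debug mode | Boolean | | | false |
evaluation | Evaluate each iteration | Whether to evaluate the result after every iteration | Boolean | | | false |
learnRate | Learning rate | Learning rate | Double | | | 0.5 |
numIters | Number of iterations | Number of training iterations | Integer | | | 100 |
sigma | Neighborhood variance | Variance of the neighborhood function | Double | | | 1.0 |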
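
As a rough illustration of what `sigma` controls, the sketch below evaluates a Gaussian neighborhood kernel at several grid distances. The Gaussian form and the helper name `neighborhood_weight` are assumptions made for this example; the exact kernel used by SomBatchOp is not stated here.

```python
import math

def neighborhood_weight(grid_distance, sigma=1.0):
    """Assumed Gaussian kernel: weight of a cell at the given grid distance
    from the best matching unit. Hypothetical helper, not an Alink API."""
    return math.exp(-(grid_distance ** 2) / (2.0 * sigma ** 2))

# Print the weights at grid distances 0..3 for a few sigma values.
for sigma in (0.5, 1.0, 2.0):
    row = [round(neighborhood_weight(d, sigma), 3) for d in range(4)]
    print(sigma, row)
```

Under this assumption, a larger `sigma` spreads each update over more neighboring cells, which gives a smoother map, while a smaller `sigma` keeps updates local to the best matching cell.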

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # A small sample of the Iris data set.
    df = pd.DataFrame([
        [5, 2, 3.5, 1, 'Iris-versicolor'],
        [5.1, 3.7, 1.5, 0.4, 'Iris-setosa'],
        [6.4, 2.8, 5.6, 2.2, 'Iris-virginica'],
        [6, 2.9, 4.5, 1.5, 'Iris-versicolor'],
        [4.9, 3, 1.4, 0.2, 'Iris-setosa'],
        [5.7, 2.6, 3.5, 1, 'Iris-versicolor'],
        [4.6, 3.6, 1, 0.2, 'Iris-setosa'],
        [5.9, 3, 4.2, 1.5, 'Iris-versicolor'],
        [6.3, 2.8, 5.1, 1.5, 'Iris-virginica'],
        [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
        [5.1, 3.3, 1.7, 0.5, 'Iris-setosa'],
        [5.5, 2.4, 3.8, 1.1, 'Iris-versicolor'],
    ])

    source = BatchOperator.fromDataframe(
        df,
        schemaStr='sepal_length double, sepal_width double, '
                  'petal_length double, petal_width double, category string')

    # Assemble the two sepal columns into a single vector column.
    va = VectorAssemblerBatchOp() \
        .setSelectedCols(["sepal_length", "sepal_width"]) \
        .setOutputCol("features")

    # Train a SOM on a 2 x 4 grid over the 2-dimensional feature vectors.
    som = SomBatchOp() \
        .setXdim(2) \
        .setYdim(4) \
        .setVdim(2) \
        .setSigma(1.0) \
        .setNumIters(10) \
        .setVectorCol("features")

    source.link(va).link(som).print()

Sample output, one row per grid cell. These rows appear to come from a run with a 4 x 4 grid on a larger data set (the `cnt` column sums to 150), so the 2 x 4 example on 12 rows will produce different numbers.

        meta     xidx  yidx  weights              cnt
    0   4,4,2,r  0     0     6.098901,2.6216097   8
    1   4,4,2,r  0     1     5.7773294,2.6960063  13
    2   4,4,2,r  0     2     5.4155393,2.6422026  7
    3   4,4,2,r  0     3     4.960137,2.6060686   6
    4   4,4,2,r  1     0     6.3758574,2.8099446  9
    5   4,4,2,r  1     1     6.1287007,2.9054923  11
    6   4,4,2,r  1     2     5.422063,2.9557745   5
    7   4,4,2,r  1     3     4.8237553,3.0189981  16
    8   4,4,2,r  2     0     6.771961,2.9707642   8
    9   4,4,2,r  2     1     6.541121,3.1317835   13
    10  4,4,2,r  2     2     5.73227,3.323552     4
    11  4,4,2,r  2     3     5.0270276,3.4258003  17
    12  4,4,2,r  3     0     7.296618,3.1325939   11
    13  4,4,2,r  3     1     6.956377,3.2311482   7
    14  4,4,2,r  3     2     5.9284854,3.5446692  4
    15  4,4,2,r  3     3     5.236388,3.7456865   11
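
One way to use such a result is to look up the grid cell whose weight vector is closest to a new sample. The sketch below does this for four corner cells copied from the sample output; it assumes the `weights` column is a comma-separated dense vector string, as printed, and the helper names `parse_weights` and `best_matching_unit` are hypothetical rather than Alink functions.

```python
import numpy as np

# (xidx, yidx, weights) for four corner cells copied from the sample output.
rows = [
    (0, 0, "6.098901,2.6216097"),
    (0, 3, "4.960137,2.6060686"),
    (3, 0, "7.296618,3.1325939"),
    (3, 3, "5.236388,3.7456865"),
]

def parse_weights(text):
    """Turn a comma-separated weight string into a NumPy vector."""
    return np.array([float(v) for v in text.split(",")])

def best_matching_unit(sample, rows):
    """Return (xidx, yidx) of the cell with the smallest Euclidean distance."""
    sample = np.asarray(sample, dtype=float)
    dists = [np.linalg.norm(parse_weights(w) - sample) for _, _, w in rows]
    xidx, yidx, _ = rows[int(np.argmin(dists))]
    return xidx, yidx

# Sepal length 5.1 and sepal width 3.7 fall closest to cell (3, 3).
print(best_matching_unit([5.1, 3.7], rows))  # (3, 3)
```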