Java 类名:com.alibaba.alink.operator.batch.associationrule.ApplySequenceRuleBatchOp
Python 类名:ApplySequenceRuleBatchOp
输入说明:一个sequence由多个element组成,element之间用分号分隔;一个element由多个item组成,item间用逗号分隔。
名称 | 中文名称 | 描述 | 类型 | 是否必须? | 取值范围 | 默认值 |
---|---|---|---|---|---|---|
selectedCol | 选中的列名 | 计算列对应的列名 | String | ✓ | 所选列类型为 [STRING] | |
modelFilePath | 模型的文件路径 | 模型的文件路径 | String | null | ||
outputCol | 输出结果列 | 输出结果列列名,可选,默认null | String | null | ||
reservedCols | 算法保留列名 | 算法保留列 | String[] | null | ||
numThreads | 组件多线程线程个数 | 组件多线程线程个数 | Integer | 1 |
from pyalink.alink import * import pandas as pd useLocalEnv(1) df = pd.DataFrame([ ["a;a,b,c;a,c;d;c,f"], ["a,d;c;b,c;a,e"], ["e,f;a,b;d,f;c;b"], ["e;g;a,f;c;b;c"], ]) data = BatchOperator.fromDataframe(df, schemaStr='sequence string') prefixSpan = PrefixSpanBatchOp() \ .setItemsCol("sequence") \ .setMinSupportCount(3) prefixSpan.linkFrom(data) prefixSpan.print() prefixSpan.getSideOutput(0).print() ApplySequenceRuleBatchOp()\ .setSelectedCol("sequence")\ .setOutputCol("result")\ .linkFrom(prefixSpan.getSideOutput(0), data)\ .print()
import org.apache.flink.types.Row; import com.alibaba.alink.operator.batch.BatchOperator; import com.alibaba.alink.operator.batch.source.MemSourceBatchOp; import com.alibaba.alink.testutil.AlinkTestBase; import org.junit.Test; import java.util.Arrays; import java.util.List; public class ApplySequenceRuleBatchOpTest { @Test public void testPrefixSpan() throws Exception { List <Row> rows = Arrays.asList( Row.of("a;a,b,c;a,c;d;c,f"), Row.of("a,d;c;b,c;a,e"), Row.of("e,f;a,b;d,f;c;b"), Row.of("e;g;a,f;c;b;c") ); BatchOperator data = new MemSourceBatchOp(rows, "sequence string"); PrefixSpanBatchOp prefixSpan = new PrefixSpanBatchOp() .setItemsCol("sequence") .setMinSupportCount(3); prefixSpan.linkFrom(data); ApplySequenceRuleBatchOp op = new ApplySequenceRuleBatchOp() .setSelectedCol("sequence") .setOutputCol("result") .linkFrom(prefixSpan.getSideOutputAssociationRules(), data); op.print(); } }
频繁项集输出:
itemset | supportcount | itemcount |
---|---|---|
e | 3 | 1 |
f | 3 | 1 |
a | 4 | 1 |
a;c | 4 | 2 |
a;c;c | 3 | 3 |
a;c;b | 3 | 3 |
a;b | 4 | 2 |
b | 4 | 1 |
b;c | 3 | 2 |
c | 4 | 1 |
c;c | 3 | 2 |
c;b | 3 | 2 |
d | 3 | 1 |
d;c | 3 | 2 |
关联规则输出:
rule | chain_length | support | confidence | transaction_count |
---|---|---|---|---|
c=>c | 2 | 0.7500 | 0.7500 | 3 |
c=>b | 2 | 0.7500 | 0.7500 | 3 |
d=>c | 2 | 0.7500 | 1.0000 | 3 |
b=>c | 2 | 0.7500 | 0.7500 | 3 |
a=>c | 2 | 1.0000 | 1.0000 | 4 |
a;c=>c | 3 | 0.7500 | 0.7500 | 3 |
a;c=>b | 3 | 0.7500 | 0.7500 | 3 |
a=>b | 2 | 1.0000 | 1.0000 | 4 |
预测结果输出
sequence | result |
---|---|
a;a,b,c;a,c;d;c,f | b |
a,d;c;b,c;a,e | c |
e,f;a,b;d,f;c;b | c |
e;g;a,f;c;b;c |