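Java class name: com.alibaba.alink.operator.batch.statistics.VectorChiSquareTestBatchOp

Python class name: VectorChiSquareTestBatchOp

Runs a chi-square test on vector data: each dimension of the selected vector column is tested against the label column.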
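
The result table printed by this operator reports a statistic (`value`), degrees of freedom (`df`), and a p-value (`p`) for each vector dimension, which is consistent with Pearson's chi-square test of independence. Assuming that is the test applied (the symbols below are introduced here for illustration and do not come from the Alink source), take one dimension whose distinct values form the $r$ rows, and the label values the $c$ columns, of a contingency table with observed counts $O_{ij}$ over $N$ samples:

$$
E_{ij} = \frac{\left(\sum_{k} O_{ik}\right)\left(\sum_{l} O_{lj}\right)}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
$$

The p-value is the upper-tail probability of the chi-square distribution with $df$ degrees of freedom, evaluated at $\chi^2$.
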
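Name | Display name | Description | Type | Required? | Value range | Default |
---|---|---|---|---|---|---|
labelCol | Label column name | Name of the label column in the input table | String | ✓ | | |
selectedCol | Selected column name | Name of the column to run the computation on | String | ✓ | The selected column's type must be one of [DENSE_VECTOR, SPARSE_VECTOR, STRING, VECTOR] | |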
**The code below is for illustration only; you may need to modify parts of it or configure your environment before it runs correctly.**
No Python interface is available.

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.batch.statistics.VectorChiSquareTestBatchOp;
    import org.junit.Test;

    import java.util.Arrays;

    public class VectorChiSquareTestBatchOpTest {

        @Test
        public void testVectorChiSquareTestBatchOp() throws Exception {
            // Each row: id, feature vector (space-separated string), label.
            Row[] testArray = new Row[] {
                Row.of(7, "0.0 0.0 18.0 1.0", 1.0),
                Row.of(8, "0.0 1.0 12.0 0.0", 0.0),
                Row.of(9, "1.0 0.0 15.0 0.1", 0.0),
            };
            String[] colNames = new String[] {"id", "features", "clicked"};
            MemSourceBatchOp source = new MemSourceBatchOp(Arrays.asList(testArray), colNames);

            // Test every dimension of "features" against the "clicked" label.
            VectorChiSquareTestBatchOp test = new VectorChiSquareTestBatchOp()
                .setSelectedCol("features")
                .setLabelCol("clicked");
            test.linkFrom(source);

            // Printing is lazy: nothing is output until the job is executed.
            test.lazyPrintChiSquareTest();
            BatchOperator.execute();
        }
    }

ChiSquareTest:
col | p | value | df |
---|---|---|---|
0 | 0.3865 | 0.75 | 1 |
1 | 0.3865 | 0.75 | 1 |
2 | 0.2231 | 3 | 2 |
3 | 0.2231 | 3 | 2 |
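
The `value` and `df` columns above can be checked by hand. The following is a minimal sketch in plain Java, not Alink's implementation: it assumes the operator applies Pearson's chi-square test of independence and treats every distinct value in a vector dimension as its own category. The class `ChiSquareSketch` and its method `chiSquare` are hypothetical names introduced for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class ChiSquareSketch {

    /** Returns {statistic, degreesOfFreedom} for one feature dimension against the label. */
    public static double[] chiSquare(double[] feature, double[] label) {
        int n = feature.length;

        // Give every distinct feature value a row index and every distinct label a column index.
        Map<Double, Integer> rowIndex = new TreeMap<>();
        Map<Double, Integer> colIndex = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            rowIndex.putIfAbsent(feature[i], rowIndex.size());
            colIndex.putIfAbsent(label[i], colIndex.size());
        }
        int r = rowIndex.size();
        int c = colIndex.size();

        // Build the contingency table of observed counts and its marginal totals.
        double[][] observed = new double[r][c];
        double[] rowTotal = new double[r];
        double[] colTotal = new double[c];
        for (int i = 0; i < n; i++) {
            int a = rowIndex.get(feature[i]);
            int b = colIndex.get(label[i]);
            observed[a][b]++;
            rowTotal[a]++;
            colTotal[b]++;
        }

        // Sum (observed - expected)^2 / expected over all cells.
        // Every row and column holds at least one sample, so expected is never zero.
        double stat = 0.0;
        for (int a = 0; a < r; a++) {
            for (int b = 0; b < c; b++) {
                double expected = rowTotal[a] * colTotal[b] / n;
                double diff = observed[a][b] - expected;
                stat += diff * diff / expected;
            }
        }
        return new double[] {stat, (r - 1) * (c - 1)};
    }

    public static void main(String[] args) {
        // Same three samples as in the example above.
        double[][] features = {
            {0.0, 0.0, 18.0, 1.0},
            {0.0, 1.0, 12.0, 0.0},
            {1.0, 0.0, 15.0, 0.1},
        };
        double[] clicked = {1.0, 0.0, 0.0};

        for (int j = 0; j < features[0].length; j++) {
            double[] column = new double[features.length];
            for (int i = 0; i < features.length; i++) {
                column[i] = features[i][j];
            }
            double[] result = chiSquare(column, clicked);
            System.out.printf("col %d: value=%.4f df=%d%n", j, result[0], (int) result[1]);
        }
    }
}
```

Under these assumptions the sketch gives a statistic of 0.75 with 1 degree of freedom for dimensions 0 and 1, and 3 with 2 degrees of freedom for dimensions 2 and 3, matching the table. The p-value is left out because the Java standard library has no chi-square distribution function; for `df = 2` it reduces to `exp(-value / 2)`, which yields the 0.2231 shown above.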