Java 类名:com.alibaba.alink.operator.batch.finance.ConstrainedRegSelectorTrainBatchOp
Python 类名:ConstrainedRegSelectorTrainBatchOp
使用带约束的Stepwise线性回归方法,进行特征筛选。
| 名称 | 中文名称 | 描述 | 类型 | 是否必须? | 取值范围 | 默认值 |
|---|---|---|---|---|---|---|
| labelCol | 标签列名 | 输入表中的标签列名 | String | ✓ | ||
| alphaEntry | 筛选阈值 | 筛选阈值 | Double | 0.05 | ||
| alphaStay | 移除阈值 | 移除阈值 | Double | 0.05 | ||
| forceSelectedCols | 强制选择的列 | 强制选择的列 | int[] | 所选列类型为 [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | ||
| l1 | L1 正则化系数 | L1 正则化系数,默认为0。 | Double | x >= 0.0 | 0.0 | |
| l2 | L2 正则化系数 | L2 正则化系数,默认为0。 | Double | x >= 0.0 | 0.0 | |
| method | 方法 | 方法 | String | “FTest”, “MarginalContribution” | “FTest” | |
| optimMethod | 优化方法 | 优化方法 | String | “SQP”, “BARRIER” | “SQP” | |
| selectedCol | 计算列对应的列名 | 计算列对应的列名, 默认值是null | String | null | ||
| selectedCols | 选中的列名数组 | 计算列对应的列名列表 | String[] | 所选列类型为 [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | null |
from pyalink.alink import *
import pandas as pd
useLocalEnv(1)
df = pd.DataFrame([
["$3$0:1.0 1:7.0 2:9.0", "1.0 7.0 9.0", 1.0, 7.0, 9.0, 1.0],
["$3$0:1.0 1:3.0 2:3.0", "2.0 3.0 3.0", 2.0, 3.0, 3.0, 1.0],
["$3$0:1.0 1:2.0 2:4.0", "3.0 2.0 4.0", 3.0, 2.0, 4.0, 0.0],
["$3$0:1.0 1:3.0 2:4.0", "2.0 3.0 4.0", 2.0, 3.0, 4.0, 0.0],
["$3$0:1.0 1:3.0 2:4.0", "1.0 5.0 8.0", 1.0, 5.0, 8.0, 0.0],
["$3$0:1.0 1:3.0 2:4.0", "1.0 6.0 3.0", 1.0, 6.0, 3.0, 0.0]
])
batchData = BatchOperator.fromDataframe(df, schemaStr='svec string, vec string, f0 double, f1 double, f2 double, label double')
selector = ConstrainedRegSelectorTrainBatchOp()\
.setAlphaEntry(0.65)\
.setAlphaStay(0.7)\
.setSelectedCol("vec")\
.setLabelCol("label")\
.setForceSelectedCols([1,2])
constraint = pd.DataFrame([
['{"featureConstraint":[],"constraintBetweenFeatures":{"name":"constraintBetweenFeatures","UP":[],"LO":[],"=":[[1,1.814],[2,0.4]],"%":[],"<":[],">":[[1,2]]},"countZero":null,"elseNullSave":null}']
])
constraintData = BatchOperator.fromDataframe(constraint, schemaStr='data string')
selector.linkFrom(batchData, constraintData)
predict = ConstrainedRegSelectorPredictBatchOp()\
.setPredictionCol("pred")\
.setReservedCols(["label"])
predict.linkFrom(selector, batchData).print()
label pred
0 1.0 7.0 9.0
1 1.0 3.0 3.0
2 0.0 2.0 4.0
3 0.0 3.0 4.0
4 0.0 5.0 8.0
5 0.0 6.0 3.0