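Java class name: com.alibaba.alink.operator.batch.feature.AddressParserBatchOp
Python class name: AddressParserBatchOp
Address parsing: parses an address string and extracts its components, such as the province and city.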
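Name | Display name | Description | Type | Required? | Value range | Default value |
---|---|---|---|---|---|---|
selectedCol | Selected column | Name of the column to parse | String | ✓ | | |
outputCol | Output column | Name of the output column; optional, defaults to null | String | | | null |
reservedCols | Reserved columns | Columns to keep in the output | String[] | | | null |
numThreads | Number of threads | Number of threads used by the component | Integer | | | 1 |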
Java example:

```java
import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.dataproc.format.JsonToColumnsBatchOp;
import com.alibaba.alink.operator.batch.feature.AddressParserBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import com.alibaba.alink.params.dataproc.format.HasHandleInvalidDefaultAsError.HandleInvalid;
import com.alibaba.alink.testutil.AlinkTestBase;
import org.apache.flink.types.Row;
import org.junit.Test;

import java.util.Arrays;

public class AddressParserBatchOpTest extends AlinkTestBase {

	@Test
	public void test() throws Exception {
		// Input rows: two Chinese addresses and one string that is not an address
		Row[] testArray = new Row[] {
			Row.of("1", "成都市高新区天府软件园B区科技大楼"),
			Row.of("2", "双流县郑通路社保局区52050号"),
			Row.of("3", "city_walk")
		};
		String[] colNames = new String[] {"id", "address"};
		MemSourceBatchOp data = new MemSourceBatchOp(Arrays.asList(testArray), colNames);

		// Parse the "address" column; the result is a JSON string written to "address_parse"
		AddressParserBatchOp parser = new AddressParserBatchOp()
			.setSelectedCol("address")
			.setOutputCol("address_parse");

		// Expand the JSON into prov/city/district/street columns;
		// with SKIP, rows whose JSON cannot be parsed get null columns instead of an error
		JsonToColumnsBatchOp jsonToColumns = new JsonToColumnsBatchOp()
			.setJsonCol("address_parse")
			.setSchemaStr("prov string, city string, district string, street string")
			.setHandleInvalid(HandleInvalid.SKIP);

		// is_address: province, city and district were all found; is_prov: a province was found
		String selectSql = "*, "
			+ "case "
			+ "  when prov is not null and city is not null and district is not null then true "
			+ "  else false "
			+ "end is_address,"
			+ "case "
			+ "  when prov is not null then true "
			+ "  else false "
			+ "end is_prov";

		BatchOperator<?> result = data
			.link(parser)
			.link(jsonToColumns)
			.select(selectSql);
		result.print();
	}
}
```

Output:
id | address | address_parse | prov | city | district | street | is_address | is_prov |
---|---|---|---|---|---|---|---|---|
1 | 成都市高新区天府软件园B区科技大楼 | {"prov":"四川省","city":"成都市","district":"高新西区","street":"天府软件园B区科技大楼"} | 四川省 | 成都市 | 高新西区 | 天府软件园B区科技大楼 | true | true |
2 | 双流县郑通路社保局区52050号 | {"prov":"四川省","city":"成都市","district":"双流县","street":"郑通路社保局区52050号"} | 四川省 | 成都市 | 双流县 | 郑通路社保局区52050号 | true | true |
3 | city_walk | null | null | null | null | null | false | false |
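The last two steps of the example, which expand the `address_parse` JSON into columns and derive `is_address` / `is_prov`, can be sketched in plain Java. This sketch does not use Alink and is not how `JsonToColumnsBatchOp` is implemented. The class `AddressFlags` and its methods `parse`, `isAddress` and `isProv` are hypothetical names. The regex extraction assumes a flat JSON object whose values are plain strings with no escaped quotes, which matches the output shown above. A `null` input, like row 3, leaves every field null, so both flags come out false.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressFlags {
    // Field names taken from the schema string "prov string, city string, district string, street string"
    static final String[] FIELDS = {"prov", "city", "district", "street"};

    // Matches "key":"value" pairs; assumes a flat object with no escaped quotes (hypothetical simplification)
    static final Pattern PAIR = Pattern.compile("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"");

    // Hypothetical helper: turn the address_parse JSON string into a field -> value map
    static Map<String, String> parse(String json) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String f : FIELDS) {
            out.put(f, null); // a missing field behaves like a null column
        }
        if (json == null) {
            return out; // unparseable address (e.g. "city_walk"): every column stays null
        }
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            if (out.containsKey(m.group(1))) {
                out.put(m.group(1), m.group(2));
            }
        }
        return out;
    }

    // Mirrors: case when prov, city and district are all not null then true else false end
    static boolean isAddress(Map<String, String> r) {
        return r.get("prov") != null && r.get("city") != null && r.get("district") != null;
    }

    // Mirrors: case when prov is not null then true else false end
    static boolean isProv(Map<String, String> r) {
        return r.get("prov") != null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "{\"prov\":\"四川省\",\"city\":\"成都市\",\"district\":\"双流县\",\"street\":\"郑通路社保局区52050号\"}",
            null
        };
        for (String json : inputs) {
            Map<String, String> r = parse(json);
            System.out.println(r + " is_address=" + isAddress(r) + " is_prov=" + isProv(r));
        }
    }
}
```

The `street` field does not affect either flag. This matches the SQL, which only checks `prov`, `city` and `district`.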