Java 类名:com.alibaba.alink.operator.stream.statistics.QuantileStreamOp
Python 类名:QuantileStreamOp
分位数将一列数据按大小排序,给出每个分位点(默认百分位)的值。
| 名称 | 中文名称 | 描述 | 类型 | 是否必须? | 取值范围 | 默认值 |
|---|---|---|---|---|---|---|
| quantileNum | 分位个数 | 分位个数 | Integer | ✓ | x >= 0 | |
| dalayTime | 延迟时间 | 延迟时间 | Integer | 0 | ||
| selectedCols | 选中的列名数组 | 计算列对应的列名列表 | String[] | 所选列类型为 [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | null | |
| timeCol | 时间列 | 时间列。如果用户输入时间列,则以此作为数据的时间;否则按照process time作为数据的时间。 | String | null | ||
| timeInterval | 时间间隔 | 流式数据统计的时间间隔 | Double | 3.0 |
from pyalink.alink import *
import pandas as pd
useLocalEnv(1)
df_data = pd.DataFrame([
[0.0,0.0,0.0],
[0.1,0.2,0.1],
[0.2,0.2,0.8],
[9.0,9.5,9.7],
[9.1,9.1,9.6],
[9.2,9.3,9.9]
])
streamData = StreamOperator.fromDataframe(df_data, schemaStr='x1 double, x2 double, x3 double')
quanOp = QuantileStreamOp()\
.setSelectedCols(["x2","x3"])\
.setQuantileNum(5)
#control data speed, 1 per second.
speedControl = SpeedControlStreamOp()\
.setTimeInterval(.3)
streamData.link(speedControl).link(quanOp).print()
StreamOperator.execute()
| starttime | endtime | colname | quantile |
|---|---|---|---|
| 2020/03/23 22:42:45 | 2020/03/23 22:42:46 | col2 | [-99.9,-2.5,-2.5,1.3,1.3,100.2] |
| 2020/03/23 22:42:45 | 2020/03/23 22:42:46 | col3 | [-0.01,0.9,0.9,1.1,1.1,100.9] |