leo_wyn
作者leo_wyn·2020-04-29 10:39
商业智能工程师·Security

Nested window functions in SQL

字数 12147阅读 1057评论 0赞 0

What nested window functions are not about

At the date of this writing, there's not much information available out there about the true standard nested window

functions. What makes it harder is that I don't know of any platform that implemented this feature yet. In fact,

running a web search for nested window functions returns mostly coverage of and discussions about nesting

grouped aggregate functions within windowed aggregate functions. For example, suppose that you want to query

the Sales.OrderValues view in the TSQLV5 sample database, and return for each customer and order date, the daily

total of the order values, and the running total until the current day. Such a task involves both grouping and

windowing. You group the rows by the customer ID and the order date, and apply a running sum on top of the

group sum of the order values, like so:

USE TSQLV5; -- http://tsql.solidq.com/SampleDatabases/TSQLV5.zip

SELECT custid, orderdate, SUM(val) AS daytotal,

SUM(SUM(val)) OVER(PARTITION BY custid

ORDER BY orderdate

ROWS UNBOUNDED PRECEDING) AS runningsum

FROM Sales.OrderValues

GROUP BY custid, orderdate;

This query generates the following output, shown here in abbreviated form:

custid orderdate daytotal runningsum


1 2018-08-25 814.50 814.50

1 2018-10-03 878.00 1692.50

1 2018-10-13 330.00 2022.50

1 2019-01-15 845.80 2868.30

1 2019-03-16 471.20 3339.50

1 2019-04-09 933.50 4273.00

2 2017-09-18 88.80 88.80

2 2018-08-08 479.75 568.55

Home Archives Authors Categories Newsletters About Search... GO

10/24/2019 Nested window functions in SQL - SQLPerformance.com

https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 2/7

2 2018-11-28 320.00 888.55

2 2019-03-04 514.40 1402.95

...

Even though this technique is pretty cool, and even though web searches for nested window functions return

mainly such techniques, that's not what the SQL standard means by nested window functions. Since I couldn't find

any information out there on the topic, I just had to figure it out from the standard itself. Hopefully, this article will

increase awareness of the true nested window functions feature, and cause people to turn to Microsoft and ask to

add support for it in SQL Server.

What nested window functions are about

Nested window functions include two functions that you can nest as an argument of a window aggregate function.

Those are the nested row number function, and the nested value_of expression at row function.

Nested row number function

The nested row number function allows you to refer to the row number of strategic markers in windowing

elements. Here's the syntax of the function:

(< argument with ROW_NUMBER() >) OVER()

The row markers that you can specify are:

BEGIN_PARTITION

END_PARTITION

BEGIN_FRAME

END_FRAME

CURRENT_ROW

FRAME_ROW

The first four markers are self-explanatory. As for the last two, the CURRENT_ROW marker represents the current

outer row, and the FRAME_ROW represents the current inner frame row.

As an example for using the nested row number function, consider the following task. You need to query the

Sales.OrderValues view, and return for each order some of its attributes, as well as the difference between the

current order value and the customer average, but excluding the first and last customer orders from the average.

This task is achievable without nested window functions, but the solution involves quite a few steps:

WITH C1 AS

(

SELECT custid, val,

ROW_NUMBER() OVER( PARTITION BY custid

ORDER BY orderdate, orderid ) AS rownumasc,

ROW_NUMBER() OVER( PARTITION BY custid

ORDER BY orderdate DESC, orderid DESC ) AS rownumdesc

FROM Sales.OrderValues

),

C2 AS

10/24/2019 Nested window functions in SQL - SQLPerformance.com

https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 3/7

(

SELECT custid, AVG(val) AS avgval

FROM C1

WHERE 1 NOT IN (rownumasc, rownumdesc)

GROUP BY custid

)

SELECT O.orderid, O.custid, O.orderdate, O.val,

O.val - C2.avgval AS diff

FROM Sales.OrderValues AS O

LEFT OUTER JOIN C2

ON O.custid = C2.custid;

Here's the output of this query, shown here in abbreviated form:

orderid custid orderdate val diff


10411 10 2018-01-10 966.80 -570.184166

10743 4 2018-11-17 319.20 -809.813636

11075 68 2019-05-06 498.10 -1546.297500

10388 72 2017-12-19 1228.80 -358.864285

10720 61 2018-10-28 550.00 -144.744285

11052 34 2019-04-27 1332.00 -1164.397500

10457 39 2018-02-25 1584.00 -797.999166

10789 23 2018-12-22 3687.00 1567.833334

10434 24 2018-02-03 321.12 -1329.582352

10766 56 2018-12-05 2310.00 1015.105000

...

Using nested row number functions, the task is achievable with a single query, like so:

SELECT orderid, custid, orderdate, val,

val - AVG( CASE

WHEN ROW_NUMBER(FRAME_ROW) NOT IN

( ROW_NUMBER(BEGIN_PARTITION), ROW_NUMBER(END_PARTITION) ) THEN val

END )

OVER( PARTITION BY custid

ORDER BY orderdate, orderid

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS diff

FROM Sales.OrderValues;

Also, the currently supported solution requires at least one sort in the plan, and multiple passes over the data. The

solution using nested row number functions has all the potential to get optimized with reliance on index order, and

a reduced number of passes over the data. This, of course, is implementation dependent though.

Nested value_of expression at row function

The nested value_of expression at row function enables you to interact with a value of an expression at the same

strategic row markers mentioned earlier in an argument of a window aggregate function. Here's the syntax of this

function:

(< argument with

VALUE OF AT [] [, ]

) OVER()

10/24/2019 Nested window functions in SQL - SQLPerformance.com

https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 4/7

As you can see, you can specify a certain negative or positive delta with respect to the row marker, and optionally

provide a default value in case a row doesn't exist at the specified position.

This capability gives you a lot of power when you need to interact with different points in windowing elements.

Consider the fact that as powerful as window functions may be compared to alternative tools like subqueries, what

window functions don't support is a basic concept of a correlation. Using the CURRENT_ROW marker you get access

to the outer row, and this way emulate correlations. At the same time you get to benefit from all of the advantages

that window functions have compared to subqueries.

As an example, suppose that you need to query the Sales.OrderValues view, and return for each order some of its

attributes, as well as the difference between the current order value and the customer average, but excluding

orders placed on the same date as the current order date. This requires a capability similar to a correlation. With the

nested value_of expression at row function, using the CURRENT_ROW marker, this is achievable easily like so:

SELECT orderid, custid, orderdate, val,

val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )

OVER( PARTITION BY custid ) AS diff

FROM Sales.OrderValues;

This query is supposed to generate the following output:

orderid custid orderdate val diff


10248 85 2017-07-04 440.00 180.000000

10249 79 2017-07-05 1863.40 1280.452000

10250 34 2017-07-08 1552.60 -854.228461

10251 84 2017-07-08 654.06 -293.536666

10252 76 2017-07-09 3597.90 1735.092728

10253 34 2017-07-10 1444.80 -970.320769

10254 14 2017-07-11 556.62 -1127.988571

10255 68 2017-07-12 2490.50 617.913334

10256 88 2017-07-15 517.80 -176.000000

10257 35 2017-07-16 1119.90 -153.562352

...

If you're thinking that this task is achievable just as easily with correlated subqueries, in this simplistic case you'd be

right. The same can be achieved with the following query:

SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,

O1.val - ( SELECT AVG(O2.val)

FROM Sales.OrderValues AS O2

WHERE O2.custid = O1.custid

AND O2.orderdate <> O1.orderdate ) AS diff

FROM Sales.OrderValues AS O1;

However, remember that a subquery operates on an independent view of the data, whereas a window function

operates on the set that is provided as input to the logical query processing step that handles the SELECT clause.

Usually, the underlying query has extra logic like joins, filters, grouping, and such. With subqueries, you either need

to prepare a preliminary CTE, or repeat the logic of the underlying query also in the subquery. With window

functions, there's no need to repeat any of the logic.

For example, say that you were supposed to operate only on shipped orders (where the shipped date is not NULL)

that were handled by employee 3. The solution with the window function needs to add the filter predicates only

once, like so:

10/24/2019 Nested window functions in SQL - SQLPerformance.com

https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 5/7

SELECT orderid, custid, orderdate, val,

val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )

OVER( PARTITION BY custid ) AS diff

FROM Sales.OrderValues

WHERE empid = 3 AND shippeddate IS NOT NULL;

This query is supposed to generate the following output:

orderid custid orderdate val diff


10251 84 2017-07-08 654.06 -459.965000

10253 34 2017-07-10 1444.80 531.733334

10256 88 2017-07-15 517.80 -1022.020000

10266 87 2017-07-26 346.56 NULL

10273 63 2017-08-05 2037.28 -3149.075000

10283 46 2017-08-16 1414.80 534.300000

10309 37 2017-09-19 1762.00 -1951.262500

10321 38 2017-10-03 144.00 NULL

10330 46 2017-10-16 1649.00 885.600000

10332 51 2017-10-17 1786.88 495.830000

...

The solution with the subquery needs to add the filter predicates twice—once in the outer query and once in the

subquery—like so:

SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,

O1.val - ( SELECT AVG(O2.val)

FROM Sales.OrderValues AS O2

WHERE O2.custid = O1.custid

AND O2.orderdate <> O1.orderdate

AND empid = 3

AND shippeddate IS NOT NULL) AS diff

FROM Sales.OrderValues AS O1

WHERE empid = 3 AND shippeddate IS NOT NULL;

It's either this, or adding a preliminary CTE that takes care of all of the filtering and any other logic. Anyway you

look at it, with subqueries, there are more complexity layers involved.

The other benefit in nested window functions is that if we had support for those in T-SQL, it would have been easy

to emulate the missing full support for the RANGE window frame unit. The RANGE option is supposed to enable you

to define dynamic frames that are based on an offset from the ordering value in the current row. For example,

suppose that you need to compute for each customer order from the Sales.OrderValues view the moving average

value of the last 14 days. According to the SQL standard, you can achieve this using the RANGE option and the

INTERVAL type, like so:

SELECT orderid, custid, orderdate, val,

AVG(val) OVER( PARTITION BY custid

ORDER BY orderdate

RANGE BETWEEN INTERVAL '13' DAY PRECEDING

AND CURRENT ROW ) AS movingavg14days

FROM Sales.OrderValues;

This query is supposed to generate the following output:

10/24/2019 Nested window functions in SQL - SQLPerformance.com

https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 6/7

Share this post: Facebook Twitter LinkedIn

orderid custid orderdate val movingavg14days


10643 1 2018-08-25 814.50 814.500000

10692 1 2018-10-03 878.00 878.000000

10702 1 2018-10-13 330.00 604.000000

10835 1 2019-01-15 845.80 845.800000

10952 1 2019-03-16 471.20 471.200000

11011 1 2019-04-09 933.50 933.500000

10308 2 2017-09-18 88.80 88.800000

10625 2 2018-08-08 479.75 479.750000

10759 2 2018-11-28 320.00 320.000000

10926 2 2019-03-04 514.40 514.400000

10365 3 2017-11-27 403.20 403.200000

10507 3 2018-04-15 749.06 749.060000

10535 3 2018-05-13 1940.85 1940.850000

10573 3 2018-06-19 2082.00 2082.000000

10677 3 2018-09-22 813.37 813.370000

10682 3 2018-09-25 375.50 594.435000

10856 3 2019-01-28 660.00 660.000000

...

At the date of this writing, this syntax is not supported in T-SQL. If we had support for nested window functions in TSQL,

you would have been able to emulate this query with the following code:

SELECT orderid, custid, orderdate, val,

AVG( CASE WHEN DATEDIFF(day, orderdate, VALUE OF orderdate AT CURRENT_ROW)

BETWEEN 0 AND 13

THEN val END )

OVER( PARTITION BY custid

ORDER BY orderdate

RANGE UNBOUNDED PRECEDING ) AS movingavg14days

FROM Sales.OrderValues;

What's not to like?

如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!

0

添加新评论0 条评论

Ctrl+Enter 发表

作者其他文章

相关文章

相关问题

相关资料

X社区推广