What nested window functions are not about
At the date of this writing, there's not much information available out there about the true standard nested window
functions. What makes it harder is that I don't know of any platform that implemented this feature yet. In fact,
running a web search for nested window functions returns mostly coverage of and discussions about nesting
grouped aggregate functions within windowed aggregate functions. For example, suppose that you want to query
the Sales.OrderValues view in the TSQLV5 sample database, and return for each customer and order date, the daily
total of the order values, and the running total until the current day. Such a task involves both grouping and
windowing. You group the rows by the customer ID and the order date, and apply a running sum on top of the
group sum of the order values, like so:
USE TSQLV5; -- http://tsql.solidq.com/SampleDatabases/TSQLV5.zip
SELECT custid, orderdate, SUM(val) AS daytotal,
SUM(SUM(val)) OVER(PARTITION BY custid
ORDER BY orderdate
ROWS UNBOUNDED PRECEDING) AS runningsum
FROM Sales.OrderValues
GROUP BY custid, orderdate;
This query generates the following output, shown here in abbreviated form:
custid orderdate daytotal runningsum
1 2018-08-25 814.50 814.50
1 2018-10-03 878.00 1692.50
1 2018-10-13 330.00 2022.50
1 2019-01-15 845.80 2868.30
1 2019-03-16 471.20 3339.50
1 2019-04-09 933.50 4273.00
2 2017-09-18 88.80 88.80
2 2018-08-08 479.75 568.55
Home Archives Authors Categories Newsletters About Search... GO
10/24/2019 Nested window functions in SQL - SQLPerformance.com
https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 2/7
2 2018-11-28 320.00 888.55
2 2019-03-04 514.40 1402.95
...
Even though this technique is pretty cool, and even though web searches for nested window functions return
mainly such techniques, that's not what the SQL standard means by nested window functions. Since I couldn't find
any information out there on the topic, I just had to figure it out from the standard itself. Hopefully, this article will
increase awareness of the true nested window functions feature, and cause people to turn to Microsoft and ask to
add support for it in SQL Server.
What nested window functions are about
Nested window functions include two functions that you can nest as an argument of a window aggregate function.
Those are the nested row number function, and the nested value_of expression at row function.
Nested row number function
The nested row number function allows you to refer to the row number of strategic markers in windowing
elements. Here's the syntax of the function:
(< argument with ROW_NUMBER() >) OVER()
The row markers that you can specify are:
BEGIN_PARTITION
END_PARTITION
BEGIN_FRAME
END_FRAME
CURRENT_ROW
FRAME_ROW
The first four markers are self-explanatory. As for the last two, the CURRENT_ROW marker represents the current
outer row, and the FRAME_ROW represents the current inner frame row.
As an example for using the nested row number function, consider the following task. You need to query the
Sales.OrderValues view, and return for each order some of its attributes, as well as the difference between the
current order value and the customer average, but excluding the first and last customer orders from the average.
This task is achievable without nested window functions, but the solution involves quite a few steps:
WITH C1 AS
(
SELECT custid, val,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate, orderid ) AS rownumasc,
ROW_NUMBER() OVER( PARTITION BY custid
ORDER BY orderdate DESC, orderid DESC ) AS rownumdesc
FROM Sales.OrderValues
),
C2 AS
10/24/2019 Nested window functions in SQL - SQLPerformance.com
https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 3/7
(
SELECT custid, AVG(val) AS avgval
FROM C1
WHERE 1 NOT IN (rownumasc, rownumdesc)
GROUP BY custid
)
SELECT O.orderid, O.custid, O.orderdate, O.val,
O.val - C2.avgval AS diff
FROM Sales.OrderValues AS O
LEFT OUTER JOIN C2
ON O.custid = C2.custid;
Here's the output of this query, shown here in abbreviated form:
orderid custid orderdate val diff
10411 10 2018-01-10 966.80 -570.184166
10743 4 2018-11-17 319.20 -809.813636
11075 68 2019-05-06 498.10 -1546.297500
10388 72 2017-12-19 1228.80 -358.864285
10720 61 2018-10-28 550.00 -144.744285
11052 34 2019-04-27 1332.00 -1164.397500
10457 39 2018-02-25 1584.00 -797.999166
10789 23 2018-12-22 3687.00 1567.833334
10434 24 2018-02-03 321.12 -1329.582352
10766 56 2018-12-05 2310.00 1015.105000
...
Using nested row number functions, the task is achievable with a single query, like so:
SELECT orderid, custid, orderdate, val,
val - AVG( CASE
WHEN ROW_NUMBER(FRAME_ROW) NOT IN
( ROW_NUMBER(BEGIN_PARTITION), ROW_NUMBER(END_PARTITION) ) THEN val
END )
OVER( PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS diff
FROM Sales.OrderValues;
Also, the currently supported solution requires at least one sort in the plan, and multiple passes over the data. The
solution using nested row number functions has all the potential to get optimized with reliance on index order, and
a reduced number of passes over the data. This, of course, is implementation dependent though.
Nested value_of expression at row function
The nested value_of expression at row function enables you to interact with a value of an expression at the same
strategic row markers mentioned earlier in an argument of a window aggregate function. Here's the syntax of this
function:
(< argument with
VALUE OF AT [] [, ]
) OVER()
10/24/2019 Nested window functions in SQL - SQLPerformance.com
https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 4/7
As you can see, you can specify a certain negative or positive delta with respect to the row marker, and optionally
provide a default value in case a row doesn't exist at the specified position.
This capability gives you a lot of power when you need to interact with different points in windowing elements.
Consider the fact that as powerful as window functions may be compared to alternative tools like subqueries, what
window functions don't support is a basic concept of a correlation. Using the CURRENT_ROW marker you get access
to the outer row, and this way emulate correlations. At the same time you get to benefit from all of the advantages
that window functions have compared to subqueries.
As an example, suppose that you need to query the Sales.OrderValues view, and return for each order some of its
attributes, as well as the difference between the current order value and the customer average, but excluding
orders placed on the same date as the current order date. This requires a capability similar to a correlation. With the
nested value_of expression at row function, using the CURRENT_ROW marker, this is achievable easily like so:
SELECT orderid, custid, orderdate, val,
val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )
OVER( PARTITION BY custid ) AS diff
FROM Sales.OrderValues;
This query is supposed to generate the following output:
orderid custid orderdate val diff
10248 85 2017-07-04 440.00 180.000000
10249 79 2017-07-05 1863.40 1280.452000
10250 34 2017-07-08 1552.60 -854.228461
10251 84 2017-07-08 654.06 -293.536666
10252 76 2017-07-09 3597.90 1735.092728
10253 34 2017-07-10 1444.80 -970.320769
10254 14 2017-07-11 556.62 -1127.988571
10255 68 2017-07-12 2490.50 617.913334
10256 88 2017-07-15 517.80 -176.000000
10257 35 2017-07-16 1119.90 -153.562352
...
If you're thinking that this task is achievable just as easily with correlated subqueries, in this simplistic case you'd be
right. The same can be achieved with the following query:
SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,
O1.val - ( SELECT AVG(O2.val)
FROM Sales.OrderValues AS O2
WHERE O2.custid = O1.custid
AND O2.orderdate <> O1.orderdate ) AS diff
FROM Sales.OrderValues AS O1;
However, remember that a subquery operates on an independent view of the data, whereas a window function
operates on the set that is provided as input to the logical query processing step that handles the SELECT clause.
Usually, the underlying query has extra logic like joins, filters, grouping, and such. With subqueries, you either need
to prepare a preliminary CTE, or repeat the logic of the underlying query also in the subquery. With window
functions, there's no need to repeat any of the logic.
For example, say that you were supposed to operate only on shipped orders (where the shipped date is not NULL)
that were handled by employee 3. The solution with the window function needs to add the filter predicates only
once, like so:
10/24/2019 Nested window functions in SQL - SQLPerformance.com
https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 5/7
SELECT orderid, custid, orderdate, val,
val - AVG( CASE WHEN orderdate <> VALUE OF orderdate AT CURRENT_ROW THEN val END )
OVER( PARTITION BY custid ) AS diff
FROM Sales.OrderValues
WHERE empid = 3 AND shippeddate IS NOT NULL;
This query is supposed to generate the following output:
orderid custid orderdate val diff
10251 84 2017-07-08 654.06 -459.965000
10253 34 2017-07-10 1444.80 531.733334
10256 88 2017-07-15 517.80 -1022.020000
10266 87 2017-07-26 346.56 NULL
10273 63 2017-08-05 2037.28 -3149.075000
10283 46 2017-08-16 1414.80 534.300000
10309 37 2017-09-19 1762.00 -1951.262500
10321 38 2017-10-03 144.00 NULL
10330 46 2017-10-16 1649.00 885.600000
10332 51 2017-10-17 1786.88 495.830000
...
The solution with the subquery needs to add the filter predicates twice—once in the outer query and once in the
subquery—like so:
SELECT O1.orderid, O1.custid, O1.orderdate, O1.val,
O1.val - ( SELECT AVG(O2.val)
FROM Sales.OrderValues AS O2
WHERE O2.custid = O1.custid
AND O2.orderdate <> O1.orderdate
AND empid = 3
AND shippeddate IS NOT NULL) AS diff
FROM Sales.OrderValues AS O1
WHERE empid = 3 AND shippeddate IS NOT NULL;
It's either this, or adding a preliminary CTE that takes care of all of the filtering and any other logic. Anyway you
look at it, with subqueries, there are more complexity layers involved.
The other benefit in nested window functions is that if we had support for those in T-SQL, it would have been easy
to emulate the missing full support for the RANGE window frame unit. The RANGE option is supposed to enable you
to define dynamic frames that are based on an offset from the ordering value in the current row. For example,
suppose that you need to compute for each customer order from the Sales.OrderValues view the moving average
value of the last 14 days. According to the SQL standard, you can achieve this using the RANGE option and the
INTERVAL type, like so:
SELECT orderid, custid, orderdate, val,
AVG(val) OVER( PARTITION BY custid
ORDER BY orderdate
RANGE BETWEEN INTERVAL '13' DAY PRECEDING
AND CURRENT ROW ) AS movingavg14days
FROM Sales.OrderValues;
This query is supposed to generate the following output:
10/24/2019 Nested window functions in SQL - SQLPerformance.com
https://sqlperformance.com/2019/07/sql-performance/nested-window-functions-in-sql 6/7
Share this post: Facebook Twitter LinkedIn
orderid custid orderdate val movingavg14days
10643 1 2018-08-25 814.50 814.500000
10692 1 2018-10-03 878.00 878.000000
10702 1 2018-10-13 330.00 604.000000
10835 1 2019-01-15 845.80 845.800000
10952 1 2019-03-16 471.20 471.200000
11011 1 2019-04-09 933.50 933.500000
10308 2 2017-09-18 88.80 88.800000
10625 2 2018-08-08 479.75 479.750000
10759 2 2018-11-28 320.00 320.000000
10926 2 2019-03-04 514.40 514.400000
10365 3 2017-11-27 403.20 403.200000
10507 3 2018-04-15 749.06 749.060000
10535 3 2018-05-13 1940.85 1940.850000
10573 3 2018-06-19 2082.00 2082.000000
10677 3 2018-09-22 813.37 813.370000
10682 3 2018-09-25 375.50 594.435000
10856 3 2019-01-28 660.00 660.000000
...
At the date of this writing, this syntax is not supported in T-SQL. If we had support for nested window functions in TSQL,
you would have been able to emulate this query with the following code:
SELECT orderid, custid, orderdate, val,
AVG( CASE WHEN DATEDIFF(day, orderdate, VALUE OF orderdate AT CURRENT_ROW)
BETWEEN 0 AND 13
THEN val END )
OVER( PARTITION BY custid
ORDER BY orderdate
RANGE UNBOUNDED PRECEDING ) AS movingavg14days
FROM Sales.OrderValues;
What's not to like?
如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!
赞0
添加新评论0 条评论