How to select only first ROW_NUMBER combined with SUM




I'd like to group my table by [ID] using SUM and also bring back the
[Product_Name] of the row where ROW_NUMBER is 1. I'm not sure whether I should use ROW_NUMBER, GROUPING SETS, or loop through everything with FETCH. This is what I tried:


DECLARE @SampleTable TABLE
(
[ID] INT,
[Price] MONEY,
[Product_Name] VARCHAR(50)
)

INSERT INTO @SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
(1, 300, 'Product_3'), (2, 500, 'Product_4'),
(2, 200, 'Product_5'), (2, 300, 'Product_6');

SELECT
[ID],
[Product_Name],
[Price],
SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM
@SampleTable T1



My desired results - only two records:


1 Product_1 100.00 600.00 1
2 Product_4 500.00 1000.00 1



Any help or guidance is highly appreciated.



UPDATE:
I ended up using what Prateek Sharma suggested in his comment: simply wrapping the query in another SELECT with WHERE [Row_Number] = 1


SELECT * FROM
(
SELECT
[ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM @SampleTable
) MultipleRows
WHERE [Row_Number] = 1
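
The same filter can also be written as a CTE instead of a derived table. This is only a minimal sketch that repeats the sample data so it runs on its own; the CTE name Numbered is made up for illustration. Because ORDER BY [ID] is constant within each partition, which row receives [Row_Number] = 1 is still arbitrary here.

```sql
-- Same sample data as above, repeated so the batch is self-contained.
DECLARE @SampleTable TABLE
(
    [ID] INT,
    [Price] MONEY,
    [Product_Name] VARCHAR(50)
);

INSERT INTO @SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
       (1, 300, 'Product_3'), (2, 500, 'Product_4'),
       (2, 200, 'Product_5'), (2, 300, 'Product_6');

-- "Numbered" is a hypothetical CTE name.
WITH Numbered AS
(
    SELECT
        [ID],
        [Product_Name],
        [Price],
        -- Total per ID, repeated on every row of that ID.
        SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
        -- ORDER BY [ID] does not distinguish rows within a partition,
        -- so the row numbered 1 is not guaranteed to be any particular row.
        ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
    FROM @SampleTable
)
SELECT [ID], [Product_Name], [Price], [Price_Total], [Row_Number]
FROM Numbered
WHERE [Row_Number] = 1;
```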





BTW - Nicely formatted question. Easy copy/paste
– John Cappelletti
Jun 30 at 16:43





There is no "top ROW_NUMBER" unless you have a column that defines ordering.
– Martin Smith
Jun 30 at 17:05





I mean top row from the result set (in case there is more than one with the same ID)
– Yovav
Jun 30 at 17:17





As ordered by what? There is no guaranteed top row from the resultset without any ordering criteria applied
– Martin Smith
Jun 30 at 17:18





Thanks Martin, I see your point now, in my case it is not important but good point.
– Yovav
Jun 30 at 17:35




3 Answers



ROW_NUMBER() needs a column to ORDER BY. In this case, if you only want to rely on the table's own index, it's OK to use the ID column for the ORDER BY.



Hence your query is correct and you can go with it.



Another option is the WITH TIES clause. But again, if you use WITH TIES with the ORDER BY on the ID column, performance will be very poor. WITH TIES only performs well if you have a well-defined index, in which case you can use that indexed column with the WITH TIES clause.


SELECT TOP 1 WITH TIES *
FROM (
SELECT [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM @SampleTable
) TAB
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY <IndexedColumn> DESC)



This query may help you a bit. But remember, it is not going to perform better than the query you wrote; it only reduces the number of lines of code.





I was trying to add WHERE [Row_Number] = 1 to pick up just the top level of results - but it's not working...
– Yovav
Jun 30 at 17:22





Use This - SELECT * FROM ( SELECT [ID] ,[Product_Name] ,[Price] ,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total] ,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number] FROM @SampleTable ) TAB WHERE [Row_Number] = 1
– Prateek Sharma
Jun 30 at 17:26





Oh cool - that works too! Maybe more efficient than WITH TIES? Checking...
– Yovav
Jun 30 at 17:32





WITH TIES is not efficient, which is why DBAs mostly avoid it. Even when your DB has a proper structure, with all the best practices applied in its architecture, we only use it for complex queries. You can give it a try.
– Prateek Sharma
Jun 30 at 17:38



One option is using the WITH TIES clause. No extra field RN.



Hopefully, you have a proper sequence number or date which can be used in either the sum() over or the final row_number() over.



Example


SELECT Top 1 with ties *
From (
Select [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM @SampleTable T1
) A
Order By ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [Price_Total] Desc)



Returns


ID Product_Name Price Price_Total
1 Product_1 100.00 600.00
2 Product_4 500.00 1000.00





This is not efficient
– Martin Smith
Jun 30 at 17:05





Thanks, I didn't know you can do that :) I wonder why it may be not efficient...
– Yovav
Jun 30 at 17:33





@Yovav I like the WITH TIES. There are many ways to do this. For example a subquery to get Price Total by ID. Also as a cte or sub-query where RN=1.
– John Cappelletti
Jun 30 at 17:36





@Yovav RE: efficiency because a simple filter on row number = 1 is more efficient than calculating the row number anyway and then sorting the whole result set with that as a sort column.
– Martin Smith
Jun 30 at 17:37






@Yovav Martin is indeed correct. However, in my benchmark testing, the WITH TIES was a very close second. The WITH TIES is just another tool in your belt.
– John Cappelletti
Jun 30 at 17:41



There is no "top ROW_NUMBER" unless you have a column that defines ordering.



If you just want an arbitrary row per id you can use the below. To deterministically pick one you would need to order by deterministic unique criteria.


DECLARE @SampleTable TABLE
(
ID INT,
Price MONEY,
Product_Name VARCHAR(50),
INDEX cix CLUSTERED (ID)
);

INSERT INTO @SampleTable
VALUES (1,100,'Product_1'),
(1,200,'Product_2'),
(1,300,'Product_3'),
(2,500,'Product_4'),
(2,200,'Product_5'),
(2,300,'Product_6');


WITH T AS
(
SELECT *,
OrderingColumn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM @SampleTable
)

SELECT ID,
SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Product_Name)), 11, 50) AS Product_Name,
CAST(SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Price)), 11, 50) AS MONEY) AS Price,
SUM(Price) AS Price_Total
FROM T
GROUP BY ID



The plan for this is pretty efficient as it is able to use the index ordered by id and has no additional sorts, spools, or passes through the table.
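
For the deterministic case, a minimal sketch could look like the following. It assumes a hypothetical [Seq] column that records the intended row order; no such column exists in the original table, so both the column and its values are invented for illustration. Any unique ordering criterion (a date plus a key, for example) would serve the same purpose.

```sql
-- [Seq] is a hypothetical column, not part of the original table:
-- it stands in for whatever unique value defines "first" in your data.
DECLARE @SampleTable TABLE
(
    Seq INT PRIMARY KEY,
    ID INT,
    Price MONEY,
    Product_Name VARCHAR(50)
);

INSERT INTO @SampleTable (Seq, ID, Price, Product_Name)
VALUES (1, 1, 100, 'Product_1'),
       (2, 1, 200, 'Product_2'),
       (3, 1, 300, 'Product_3'),
       (4, 2, 500, 'Product_4'),
       (5, 2, 200, 'Product_5'),
       (6, 2, 300, 'Product_6');

WITH Numbered AS
(
    SELECT ID,
           Product_Name,
           Price,
           -- Total per ID, repeated on every row of that ID.
           SUM(Price) OVER (PARTITION BY ID) AS Price_Total,
           -- Seq is unique, so exactly one row per ID gets RN = 1
           -- and it is the same row on every execution.
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Seq) AS RN
    FROM @SampleTable
)
SELECT ID, Product_Name, Price, Price_Total
FROM Numbered
WHERE RN = 1
ORDER BY ID;
```

With this sample data the query returns Product_1 (total 600.00) for ID 1 and Product_4 (total 1000.00) for ID 2, because those rows have the lowest Seq within each ID.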







Hi, what I mean is - if there is more than one result for the same ID then - the first result is the top record that I would like to get for all the data that is not summed up (like [Product_Name] etc.)
– Yovav
Jun 30 at 17:26





@Yovav - you keep saying "first" and "top" but you haven't addressed the point - which multiple people have told you - that there is no inherent "first" or "top"
– Martin Smith
Jun 30 at 17:28





Thanks for the answer and yes, it wasn't clear enough, in my case I just needed the first record that appears (the top level where [Row_Number] = 1) without any particular ordering, but I will probably put some good use to your answer at some point too.
– Yovav
Jun 30 at 18:17





