Combining “like” rows


Combining “like” rows



I have a table of locations_from, locations_to, and count of flights. I want to combine the rows where departure on one row equals arrival on the other (for example LA TO NY combined with NY TO LA) and SUM the like rows.



I think it would be better explained with an example.



BEFORE


locations_from locations_to # of Flights
--------------------------------------------------
San Francisco, CA Los Angeles, CA 29558
Los Angeles, CA San Francisco, CA 32389
New York, NY Los Angeles, CA 30389
Los Angeles, CA New York, NY 35484
Las Vegas, NV Los Angeles, CA 28363
Los Angeles, CA Las Vegas, NV 34455
Honolulu, HI Kahului, HI 46563
Kahului, HI Honolulu, HI 16879
San Francisco, CA New York, NY 44654
New York, NY San Francisco, CA 25882



AFTER


From/To From/To # of Flights
---------------------------------------------------
San Francisco, CA Los Angeles, CA 61947
New York, NY Los Angeles, CA 65873
Las Vegas, NV Los Angeles, CA 62818
Honolulu, HI Kahului, HI 63442
San Francisco, CA New York, NY 70536



I have tried a cross join on itself


where a.locations_from = b.locations_to



this works but I end up getting double the rows (IE one for LA TO NY and also one for NY TO LA)





Just to clarify, you just need the number of flights between 2 cities. The from/to city doesn't matter, correct?
– Ankur Patel
Jul 1 at 7:01





yes, that's correct
– pluck
Jul 1 at 7:05





Can you add a table schema. I am basically looking for a primary key in your table. So basically besides the from, to and no of flights column, do you have a primary key column?
– Ankur Patel
Jul 1 at 7:11





3 Answers
3



Simply Use CASE ... END to "sort" the airports and then group by that "sorted" pair.


CASE ... END


SELECT CASE
WHEN [locations_from] > [locations_to]
THEN [locations_to]
ELSE
[locations_from]
END [From/To],
CASE
WHEN [locations_from] > [locations_to]
THEN [locations_from]
ELSE
[locations_to]
END [From/To],
sum([# of Flights]) [# of Flights]
FROM elbat
GROUP BY CASE
WHEN [locations_from] > [locations_to]
THEN [locations_to]
ELSE
[locations_from]
END,
CASE
WHEN [locations_from] > [locations_to]
THEN [locations_from]
ELSE
[locations_to]
END;



SQL Fiddle



And here is also an alternative solution using a FULL JOIN on the opposite direction.


FULL JOIN


SELECT coalesce(t1.[locations_from], t2.locations_from) [locations_from],
coalesce(t1.[locations_to], t2.locations_from) [locations_to],
coalesce(t1.[# of Flights], 0) + coalesce(t2.[# of Flights], 0) [# of Flights]
FROM elbat t1
FULL JOIN elbat t2
ON t2.[locations_from] = t1.[locations_to]
AND t2.[locations_to] = t1.[locations_from]
WHERE (t1.[locations_from] IS NULL
AND t1.[locations_to] IS NULL
OR t1.[locations_from] < t1.[locations_to])
AND (t2.[locations_from] IS NULL
AND t2.[locations_to] IS NULL
OR t2.[locations_from] > t2.[locations_to]);



SQL Fiddle





@HABO: Works for me. SQL Fiddle It gives me 84 = 42 + 42 flights in total. I don't see any fun.
– sticky bit
Jul 1 at 17:09





Apologies, my bad. Apparently I can't read all the way across an output row.
– HABO
Jul 1 at 20:08



Using union


Declare @YourTable Table (SomeRowID int,[locations_from] varchar(50),[locations_to] varchar(50),[# of Flights] int)
Insert Into @YourTable Values
(1,'San Francisco, CA','Los Angeles, CA',29558)
,(2,'Los Angeles, CA','San Francisco, CA',32389)
,(3,'New York, NY','Los Angeles, CA',30389)
,(4,'Los Angeles, CA','New York, NY',35484)
,(5,'Las Vegas, NV','Los Angeles, CA',28363)
,(6,'Los Angeles, CA','Las Vegas, NV',34455)
,(7,'Honolulu, HI','Kahului, HI',46563)
,(8,'Kahului, HI','Honolulu, HI',16879)
,(9,'San Francisco, CA','New York, NY',44654)
,(10,'New York, NY','San Francisco, CA',25882);

select [locations_from], [locations_to], sum([# of Flights])
from
(
select [locations_from], [locations_to], [# of Flights]
from @YourTable
where [locations_from] < [locations_to]
union all
select [locations_to], [locations_from], [# of Flights]
from @YourTable
where [locations_from] > [locations_to]
) t
group by [locations_from], [locations_to]





Yup. Even better. +1 However, one should <= or >= just in case of an emergency turn-around NY to NY :)
– John Cappelletti
13 hours ago



Just another option using a CROSS APPLY and a conditional aggregation


CROSS APPLY



This examples assumes you have some sort of RowID



Example


Declare @YourTable Table (SomeRowID int,[locations_from] varchar(50),[locations_to] varchar(50),[# of Flights] int)
Insert Into @YourTable Values
(1,'San Francisco, CA','Los Angeles, CA',29558)
,(2,'Los Angeles, CA','San Francisco, CA',32389)
,(3,'New York, NY','Los Angeles, CA',30389)
,(4,'Los Angeles, CA','New York, NY',35484)
,(5,'Las Vegas, NV','Los Angeles, CA',28363)
,(6,'Los Angeles, CA','Las Vegas, NV',34455)
,(7,'Honolulu, HI','Kahului, HI',46563)
,(8,'Kahului, HI','Honolulu, HI',16879)
,(9,'San Francisco, CA','New York, NY',44654)
,(10,'New York, NY','San Francisco, CA',25882)

;with cte as (
Select A.SomeRowID
,Loc1 = min(Loc)
,Loc2 = max(Loc)
,Flights = sum(Val)
From @YourTable A
Cross Apply ( values ([locations_from],[# of Flights])
,([locations_to] ,0)
) B (Loc,Val)
Group By A.SomeRowID
)
Select Loc1
,Loc2
,Flights=sum(Flights)
From cte
Group By Loc1,Loc2



Returns


Loc1 Loc2 Flights
Honolulu, HI Kahului, HI 63442
Las Vegas, NV Los Angeles, CA 62818
Los Angeles, CA New York, NY 65873
Los Angeles, CA San Francisco, CA 61947
New York, NY San Francisco, CA 70536






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

PySpark - SparkContext: Error initializing SparkContext File does not exist

django NoReverseMatch Exception

List of Kim Possible characters