From last week, our approach to reporting has been changed. Now we use ADO.NET Data Services on top of the Entity Framework Model. Today was the first day, when I was not able to express query in terms of the ADO.NET Data Services query options.
Often this is not a problem because
Data Services supports awesome feature -
Service Operations. Even more exiting that methods return a
IQueryable<T> automatically get support for
all query options like $filter and
$orderby. So what I need is some code that will
retrieve required data and then return it as a
IQueryable<T>.
The problem
I have regular Order model. Customer has many Orders, Order has many OrderItems, many OrderItems have Product. Pretty common.
I need to retrieve all Products that Customer bought. With SQL, it is pretty easy with few joins. What about LINQ?
Short #1. Direct translation of the SQL to LINQ
Well, it might look as the following:
var r = from c in ds.Customers
join o in ds.Orders on c equals o.Customer
join oi in ds.OrderItems on o equals oi.Order
join p in ds.OrderProducts on oi.OrderProduct equals p
where c.CustomerID == customerId
select p;
Here ds is Entity Framework context. Very
familiar. Imagine my disappointment when I saw stuff generated
by Entity Framework.
exec sp_executesql N'SELECT
...
FROM [Invoicing].[Customer] AS [Extent1]
INNER JOIN [Invoicing].[Order] AS [Extent2] ON EXISTS (SELECT
cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent3].[CustomerID] AS [CustomerID]
FROM [Invoicing].[Customer] AS [Extent3]
WHERE [Extent2].[CustomerID] = [Extent3].[CustomerID] )
AS [Project1] ON 1 = 1
...
WHERE ([Extent1].[CustomerID] = [Project1].[CustomerID])
OR (([Extent1].[CustomerID] IS NULL)
AND ([Project2].[CustomerID] IS NULL))
)
INNER JOIN [Invoicing].[OrderItem] AS [Extent5] ON EXISTS (SELECT
...
)
INNER JOIN [Invoicing].[OrderProduct] AS [Extent8] ON EXISTS (SELECT
...
)
WHERE [Extent1].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'
Download full script if you interested in. The Execution Plan was terrible. Just terrible. I can send this plan as example of plan to avoid….
Of course, this is not acceptable. So next:
Short #2. Reversed order of the joins
May be change of the order will help?
var r = from p in ds.OrderProducts
join oi in ds.OrderItems on p equals oi.OrderProduct
join o in ds.Orders on oi.Order equals o
join c in ds.Customers on o.Customer equals c
where c.CustomerID == customerId
select p;
No ;(. So next:
Short #3. Lets reduce joins
When doing LINQ, it's probably worth to remember
that you have dot notation. Dot notation is pretty powerfull.
So I just rewrote where clause to reduce 75% of my joins.
var r = from p in ds.OrderProducts
join oi in ds.OrderItems on p equals oi.OrderProduct
where oi.Order.Customer.CustomerID == customerId
select p;
Looks better? Sure. Generated stuff also looks much better.
exec sp_executesql N'SELECT
...
FROM [Invoicing].[OrderProduct] AS [Extent1]
INNER JOIN [Invoicing].[OrderItem] AS [Extent2] ON EXISTS (SELECT
cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent3].[OrderProductID] AS [OrderProductID]
FROM [Invoicing].[OrderProduct] AS [Extent3]
WHERE [Extent2].[OrderProductID] = [Extent3].[OrderProductID] )
AS [Project1] ON 1 = 1
LEFT OUTER JOIN (SELECT
[Extent4].[OrderProductID] AS [OrderProductID]
FROM [Invoicing].[OrderProduct] AS [Extent4]
WHERE [Extent2].[OrderProductID] = [Extent4].[OrderProductID] )
AS [Project2] ON 1 = 1
WHERE ([Extent1].[OrderProductID] = [Project1].[OrderProductID])
OR (([Extent1].[OrderProductID] IS NULL)
AND ([Project2].[OrderProductID] IS NULL))
)
INNER JOIN [Invoicing].[Order] AS [Extent5]
ON [Extent2].[OrderID] = [Extent5].[OrderID]
WHERE [Extent5].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'
Only one crazy INNER JOIN. BTW, possible that I
am not aware of something. I do not understand why
Entity Framework generates
INNER JOIN with EXISTS in response
to LINQ joins. My expectation was
just INNER JOIN. Need to spend some time to
understand this. Anyway, the Execution Plan could be compared
to manual.
After such success, I decided to go further. So next…
Short #4. What about JOIN by ID?
var r = from p in ds.OrderProducts
join oi in ds.OrderItems
on p.OrderProductID equals oi.OrderProduct.OrderProductID
where oi.Order.Customer.CustomerID == customerId
select p;
Damn, this works!
exec sp_executesql N'SELECT
...
FROM [Invoicing].[OrderProduct] AS [Extent1]
INNER JOIN [Invoicing].[OrderItem] AS [Extent2]
ON ([Extent1].[OrderProductID] = [Extent2].[OrderProductID])
OR (([Extent1].[OrderProductID] IS NULL)
AND ([Extent2].[OrderProductID] IS NULL))
INNER JOIN [Invoicing].[Order] AS [Extent3]
ON [Extent2].[OrderID] = [Extent3].[OrderID]
WHERE [Extent3].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'
Now LINQ query looks not so cool. Hm. Entity Framework knows about primary keys. Why it cannot optimize this? I could not live with these results. So next:
Short #5. Without joins at all. Projections
We can remove last join with simple projection.
var r = from oi in ds.OrderItems
where oi.Order.Customer.CustomerID == customerId
select oi.OrderProduct;
And yes, it works. Now Entity Framework generates this:
exec sp_executesql N'SELECT
...
FROM [Invoicing].[OrderItem] AS [Extent1]
INNER JOIN [Invoicing].[Order] AS [Extent2]
ON [Extent1].[OrderID] = [Extent2].[OrderID]
LEFT OUTER JOIN [Invoicing].[OrderProduct] AS [Extent3]
ON [Extent1].[OrderProductID] = [Extent3].[OrderProductID]
WHERE [Extent2].[CustomerID] = @p__linq__2',N'@p__linq__2 uniqueidentifier',
@p__linq__2='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'
Of course, execution plans look OK.
So can we stop here? No. At least not quite. The problem that all this queries are wrong… They do not distinct products. So next:
Short #6. Intuitive where clause
Believing in the power of the Entity Framework, I decided to write everything into the where clause:
var r = from p in ds.OrderProducts
where p.OrderItems.Any(oi => oi.Order.Customer.CustomerID == customerId)
select p;
Result
exec sp_executesql N'SELECT
...
FROM [Invoicing].[OrderProduct] AS [Extent1]
WHERE EXISTS (SELECT
cast(1 as bit) AS [C1]
FROM [Invoicing].[OrderItem] AS [Extent2]
INNER JOIN [Invoicing].[Order] AS [Extent3]
ON [Extent2].[OrderID] = [Extent3].[OrderID]
WHERE ([Extent1].[OrderProductID] = [Extent2].[OrderProductID])
AND ([Extent3].[CustomerID] = @p__linq__1)
)',N'@p__linq__1 uniqueidentifier',@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'
I am fully with these results. So this code will fly in production.
Conclusions
- Do not think in SQL terms when doing LINQ
- Do check results
-
Reduce
joinclauses.Whereclause has enough power - Use projections. Entity Framework support them
Enjoy!
UPDATE Unfortunately, this post does not provide right solution. Please see The right way to do INNER JOIN in LINQ to Entity.
For comments or feedback, write at x.com/chaliy.