History of the one LINQ to Entities query

From last week, our approach to reporting has been changed. Now we use ADO.NET Data Services on top of the Entity Framework Model. Today was the first day, when I was not able to express query in terms of the ADO.NET Data Services query options.

Often this is not a problem because Data Services supports awesome feature - Service Operations. Even more exiting that methods return a IQueryable<T> automatically get support for all query options like $filter and $orderby. So what I need is some code that will retrieve required data and then return it as a IQueryable<T>.

The problem

Model

I have regular Order model. Customer has many Orders, Order has many OrderItems, many OrderItems have Product. Pretty common.

I need to retrieve all Products that Customer bought. With SQL, it is pretty easy with few joins. What about LINQ?

Short #1. Direct translation of the SQL to LINQ

Well, it might look as the following:

var r = from c in ds.Customers
	join o in ds.Orders on c equals o.Customer
	join oi in ds.OrderItems on o equals oi.Order
	join p in ds.OrderProducts on oi.OrderProduct equals p
	where c.CustomerID == customerId
	select p;

Here ds is Entity Framework context. Very familiar. Imagine my disappointment when I saw stuff generated by Entity Framework.

exec sp_executesql N'SELECT 
	...
FROM    [Invoicing].[Customer] AS [Extent1]
INNER JOIN [Invoicing].[Order] AS [Extent2] ON  EXISTS (SELECT 
	cast(1 as bit) AS [C1]
	FROM    ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
	LEFT OUTER JOIN  (SELECT 
		[Extent3].[CustomerID] AS [CustomerID]
		FROM [Invoicing].[Customer] AS [Extent3]
		WHERE [Extent2].[CustomerID] = [Extent3].[CustomerID] ) 
			AS [Project1] ON 1 = 1
	...
	WHERE ([Extent1].[CustomerID] = [Project1].[CustomerID]) 
		OR (([Extent1].[CustomerID] IS NULL) 
		AND ([Project2].[CustomerID] IS NULL))
)
INNER JOIN [Invoicing].[OrderItem] AS [Extent5] ON  EXISTS (SELECT 
	...
)
INNER JOIN [Invoicing].[OrderProduct] AS [Extent8] ON  EXISTS (SELECT 
	...
)
WHERE [Extent1].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
	@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'

Download full script if you interested in. The Execution Plan was terrible. Just terrible. I can send this plan as example of plan to avoid….

Of course, this is not acceptable. So next:

Short #2. Reversed order of the joins

May be change of the order will help?

var r = from p in ds.OrderProducts
	join oi in ds.OrderItems on p equals oi.OrderProduct
	join o in ds.Orders on oi.Order equals o
	join c in ds.Customers on o.Customer equals c
	where c.CustomerID == customerId
	select p;

No ;(. So next:

Short #3. Lets reduce joins

When doing LINQ, it’s probably worth to remember that you have dot notation. Dot notation is pretty powerfull. So I just rewrote where clause to reduce 75% of my joins.

var r = from p in ds.OrderProducts
	join oi in ds.OrderItems on p equals oi.OrderProduct
	where oi.Order.Customer.CustomerID == customerId
	select p;

Looks better? Sure. Generated stuff also looks much better.

exec sp_executesql N'SELECT 
...
FROM   [Invoicing].[OrderProduct] AS [Extent1]
INNER JOIN [Invoicing].[OrderItem] AS [Extent2] ON  EXISTS (SELECT 
	cast(1 as bit) AS [C1]
	FROM    ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
	LEFT OUTER JOIN  (SELECT 
		[Extent3].[OrderProductID] AS [OrderProductID]
		FROM [Invoicing].[OrderProduct] AS [Extent3]
		WHERE [Extent2].[OrderProductID] = [Extent3].[OrderProductID] ) 
			AS [Project1] ON 1 = 1
	LEFT OUTER JOIN  (SELECT 
		[Extent4].[OrderProductID] AS [OrderProductID]
		FROM [Invoicing].[OrderProduct] AS [Extent4]
		WHERE [Extent2].[OrderProductID] = [Extent4].[OrderProductID] ) 
			AS [Project2] ON 1 = 1
	WHERE ([Extent1].[OrderProductID] = [Project1].[OrderProductID]) 
		OR (([Extent1].[OrderProductID] IS NULL) 
			AND ([Project2].[OrderProductID] IS NULL))
)
INNER JOIN [Invoicing].[Order] AS [Extent5] 
	ON [Extent2].[OrderID] = [Extent5].[OrderID]
WHERE [Extent5].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
	@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'

Only one crazy INNER JOIN. BTW, possible that I am not aware of something. I do not understand why Entity Framework generates INNER JOIN with EXISTS in response to LINQ joins. My expectation was just INNER JOIN. Need to spend some time to understand this. Anyway, the Execution Plan could be compared to manual.

After such success, I decided to go further. So next…

Short #4. What about JOIN by ID?

var r = from p in ds.OrderProducts
	join oi in ds.OrderItems 
		on p.OrderProductID equals oi.OrderProduct.OrderProductID
	where oi.Order.Customer.CustomerID == customerId
	select p;

Damn, this works!

exec sp_executesql N'SELECT 
...
FROM   [Invoicing].[OrderProduct] AS [Extent1]
INNER JOIN [Invoicing].[OrderItem] AS [Extent2] 
	ON ([Extent1].[OrderProductID] = [Extent2].[OrderProductID]) 
		OR (([Extent1].[OrderProductID] IS NULL) 
		AND ([Extent2].[OrderProductID] IS NULL))
INNER JOIN [Invoicing].[Order] AS [Extent3] 
	ON [Extent2].[OrderID] = [Extent3].[OrderID]
WHERE [Extent3].[CustomerID] = @p__linq__1',N'@p__linq__1 uniqueidentifier',
	@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'

Now LINQ query looks not so cool. Hm. Entity Framework knows about primary keys. Why it cannot optimize this? I could not live with these results. So next:

Short #5. Without joins at all. Projections

We can remove last join with simple projection.

var r = from oi in ds.OrderItems
	where oi.Order.Customer.CustomerID == customerId
	select oi.OrderProduct;

And yes, it works. Now Entity Framework generates this:

exec sp_executesql N'SELECT 
...
FROM   [Invoicing].[OrderItem] AS [Extent1]
INNER JOIN [Invoicing].[Order] AS [Extent2] 
	ON [Extent1].[OrderID] = [Extent2].[OrderID]
LEFT OUTER JOIN [Invoicing].[OrderProduct] AS [Extent3] 
	ON [Extent1].[OrderProductID] = [Extent3].[OrderProductID]
WHERE [Extent2].[CustomerID] = @p__linq__2',N'@p__linq__2 uniqueidentifier',
	@p__linq__2='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'

Of course, execution plans look OK.

So can we stop here? No. At least not quite. The problem that all this queries are wrong… They do not distinct products. So next:

Short #6. Intuitive where clause

Believing in the power of the Entity Framework, I decided to write everything into the where clause:

var r = from p in ds.OrderProducts
	where p.OrderItems.Any(oi => oi.Order.Customer.CustomerID == customerId)
	select p;

Result

exec sp_executesql N'SELECT 
...
FROM [Invoicing].[OrderProduct] AS [Extent1]
WHERE  EXISTS (SELECT 
	cast(1 as bit) AS [C1]
	FROM  [Invoicing].[OrderItem] AS [Extent2]
	INNER JOIN [Invoicing].[Order] AS [Extent3] 
		ON [Extent2].[OrderID] = [Extent3].[OrderID]
	WHERE ([Extent1].[OrderProductID] = [Extent2].[OrderProductID]) 
		AND ([Extent3].[CustomerID] = @p__linq__1)
)',N'@p__linq__1 uniqueidentifier',@p__linq__1='F5DD85CF-CE1E-E2D1-3171-650938ABD2B7'

I am fully with these results. So this code will fly in production.

Conclusions

  1. Do not think in SQL terms when doing LINQ
  2. Do check results
  3. Reduce join clauses. Where clause has enough power
  4. Use projections. Entity Framework support them

Enjoy!

UPDATE Unfortunately, this post does not provide right solution. Please see The right way to do INNER JOIN in LINQ to Entity.