Understanding the LINQ JOIN Operator in C#

Article Summary

The LINQ JOIN operator combines data from two collections or data sources based on matching keys, performing an inner join by default. This article covers query expression syntax, fluent syntax, and using JOIN with Entity Framework. You'll gain a clear, practical understanding of how to apply the LINQ JOIN operator across real C# scenarios.

LINQ (Language Integrated Query) is a part of ADO.NET, which is Microsoft’s latest and most recommended data access technology. LINQ is a high level alternative to traditional SQL (Structured Query Language). However, Learning SQL is still considered a prerequisite for entering database administration.

Want to learn SQL? Take a SQL course right now!

LINQ has several advantages over the old SQL. LINQ allows developers to query in-memory objects as well as remote data via an extremely simple and robust interface. LINQ queries are strongly typed and have an integrated Intellisense feature. This means that you can write your LINQ code in Visual Studio and Microsoft Intellisense assists you with proper syntax. Traditional SQL, on the other hand, lacks these advanced features.

New to LINQ? Take this course and start coding today!.

LINQ JOIN Operator

If you’re familiar with traditional SQL JOINs, you are already well equipped for understanding the LINQ JOIN operator. In SQL, JOIN operators are used to fetch data from two or more tables based on a common column. In LINQ, the concept is further extended to collections in addition to database tables.

Implementing LINQ JOIN on In-Memory Collections

The LINQ join operator can be used with two or more collections that implement the IEnumerable or IEnumerable<T> interface. These collections can be simple arrays, generic lists, Dictionaries or Hashtables. A JOIN query by default performs an inner join on the two collections.

In the following example, there are two classes: Patient and TestReport. For example, in a clinic or a hospital, a patient can have many test reports. The structures of these two classes are as follows:

class Patient
{
public int id;
public string firstname;
public string lastname;
public int age;
public Patient(int id, string fname, string lname, int age)
{
this.id = id;
this.firstname = fname;
this.lastname = lname;
this.age = age;
}
}
class TestReport
{
public int id;
public string testtype;
public string remarks;
public int patientid;
public TestReport(int id, string test, string remarks, int pid)
{
this.id = id;
this.testtype = test;
this.remarks = remarks;
this.patientid = pid;
}
}

The relationship between the Patient type and the TestReport type is specified by the patientid field of the TestReport type. Suppose there are two arrays of type Patient and TestReport and the arrays hold multiple instances of these types as follows:

Patient [] patients = new Patient[]
{
new Patient(1, "James", "Milner", 45),
new Patient(2, "Chris", "Foster", 50),
new Patient(3, "Mikel", "Obi", 68),
new Patient(4, "Tania", "Grey", 35),
};
TestReport[] testreports = new TestReport[]
{
new TestReport(34, "Blood CP", "Fit", 2),
new TestReport(23, "X-Ray", "Fit", 3),
new TestReport(12, "Urine", "Unfit", 7),
new TestReport(32, "AIDS", "Fit", 3)
};

The ‘patients’ array contains four patients and the ‘testreports’ array contains four test reports. Now, in order to get the records from both of these collections at once, the JOIN operator can be used. For instance, in order to get the first names of all the patients who have any test reports, along with the name of the test type and doctor’s remarks, the following LINQ JOIN query can be used:

var JoinResult = (from p in patients
join t in testreports on p.id equals t.patientid
select new {p.firstname, t.testtype, t.remarks});

This LINQ query says that from all the Patient types in the patients collections, join those TestReport types from the testreports collection where the id field of the Patient type is equal to patientid field of the TestReport type.

This forms a group of all those Patient instances which have a corresponding TestReport instance. Patient instances with id 1, 4 would be ignored from the resultant group because they don’t have a corresponding patientid in TestReport instances. Similarly, TestReport instances with patientid 7 would be ignored because it doesn’t have any corresponding id in the Patient instances. This is basic principle of the inner JOIN operator, only those fields from both collections which have some common field between them are included in the resultant collection.

Finding this tutorial difficult? Take this introductory course in C#

The last statement of the above query creates a new anonymous IEnumerable<T> collection, which contains the firstname field from the Patient instances and testtype and remarks field from TestReport instances. This collection is stored in a JoinResult variable which can be enumerated to get the values as follows:

foreach (var group in JoinResult)
{
Console.WriteLine(group.firstname +" | "+ group.testtype +" | "+       group.remarks);
}

LINQ JOIN in Fluent Syntax

The LINQ JOIN query explained in the last section was implanted in a query expression. However, like all other LINQ query operators, the JOIN operator can also be implemented in fluent syntax. The JOIN query in the last section can be written in fluent syntax as follows:

var JoinResult2 =   patients.Join
(
testreports, p => p.id, t => t.patientid,
(p, t) => new { p.firstname, t.testtype, t.remarks }
);

The fluent query would yield the same result by returning collections of the firstname field of Patient instances and testtype and remarks fields of TestReport instances.

Implementing JOIN in LINQ to Entity Framework

In querying database tables, the JOIN operator plays an integral role as primary keys and foreign keys are implemented via JOIN operators. In LINQ to SQL and EF, associations can be used to query database tables, but JOIN is still considered affective in fetching data from multiple tables in EF model.

Suppose there are two tables in a database: Patient and Test where one patient can have multiple tests. The relationship between the Patient and Test tables is as follows:

Data contained by Patient and Test tables is as follows:

Patient Table

Test Table

The PatientID column in the Test table is a foreign key, which refers to ID column of the Patient table. You can see that every PatientID in Test table has a corresponding ID in Patient table. It means that implementing JOIN operator on these two tables will yield all the records from Test table but only those records from Patient table where ID corresponds to PatientId in Test table. The LINQ JOIN operator can be used to fetch FirstName from Patient table and Name and Remarks from the Test table as shown below:

var result = (from p in hospitaldb.Patients
join t in hospitaldb.Tests on p.ID equals t.PatientID
select new {p.FirstName, t.Name, t.Remarks});

In the above JOIN query, hospitaldb is the DataContext object for the entity framework. It can have any name. This query would yield a collection of objects containing FirstName values from Patient table and Name and Remarks values from Test tables.

For complete C# and LINQ tutorials, take a course at Udemy.com

Understanding the LINQ JOIN Operator in C#

Article Summary

Share article: