Key Word in Context

KWIC is an acronym for Key Word In Context, the most common format for concordance lines. The term KWIC was first coined by Hans Peter Luhn. The system was based on a concept called keyword in titles which was first proposed for Manchester libraries in 1864 by Andrea Crestadoro. A KWIC index is formed by sorting and aligning the words within an article title to allow each word (except the stop words) in titles to be searchable alphabetically in the index. It was a useful indexing method for technical manuals before computerized full text search became common.

An Overview of KWIC

KWIC a classic problem in Software Engineering papers in which you try to create an index of words by sorting and aligning each word in a piece of text. David Parnas has a famous paper on modularity that uses KWIC as an example.
There are some basic and advanced implementations of KWIC in different platforms but the main steps are:
  • Reading the input
  • Shifting the words in each line to get a new permutation
  • Sorting the results
  • Writing the output
Interestingly, in this case the output of each step is the input of the next step which makes this a good candidate for the Pipes and Filters pattern
kwicpipe.png

Implement the Pipes and Filters Pattern with Generics

Implementing the Pipes and Filters in the .NET Framework is based on a Generic interface and a Generic class. The Generic interface simulates the filter and the Generic class simulates the pipeline.
kwicpipe1.png
The IOperation interface has a single method called Execute that is the implementation of the filter logic. Each filter should implement this interface.

1. using System.Collections.Generic;
2. namespace KwicPipesFilters
3. {
4.public interface IOperation<T>
5.{
6.IEnumerable<T> Execute(IEnumerable<T> input);
7.}
8. }


The use of a generic IEnumerable is a good choice because it leaves a lot of space for the developers to plug in any type that they want and use various types for their filters.

The Pipeline class has an Execute and a Register method. Using the Register method, you add different filters to the pipeline and using the Execute method, you start processing the item in all the registered filters.

01. using System.Collections.Generic;
02. namespace KwicPipesFilters
03. {
04.public class Pipeline<T>
05.{
06.private readonly List<IOperation<T>> operations = new List<IOperation<T>>();
07.public Pipeline<T> Register(IOperation<T> operation)
08.{
09.operations.Add(operation);
10.return this;
11.}
12.public void Execute()
13.{
14.IEnumerable<T> current = new List<T>();
15.foreach (IOperation<T> operation in operations)
16.{
17.current = operation.Execute(current);
18.}
19.IEnumerator<T> enumerator = current.GetEnumerator();
20.while (enumerator.MoveNext());
21.}
22.}
23. }


The implementation of the Pipeline class is straightforward: it keeps a list of filters and provides a Register function that lets you add new filters to your pipeline, and then use the Execute method to execute all the filters in the list to process an input.


Reader

The Reader filter reads the input text from a file and returns an IEnumerable list of lines. Of course, for the first filter in the pipe we don’t care about the input as the input is read inside the filter itself.
01. using System;
02. using System.Collections.Generic;
03. using System.IO;
04. namespace KwicPipesFilters
05. {
06.public class Reader : IOperation<string>
07.{
08.public IEnumerable<string> Execute(IEnumerable<string> input)
09.{
10.Console.Title = "Pipes and Filters Pattern in .NET";
11.Console.WriteLine("Enter the path of the file:");
12.return File.ReadLines(Console.ReadLine());
13.}
14.}
15. }

Shifter


The Shifter filter is where the main logic of the KWIC application is implemented. It shifts the words in each line to find all the possible permutations suitable for the index.

01. using System.Collections.Generic;
02. namespace KwicPipesFilters
03. {
04.public class Shifter : IOperation<string>
05.{
06.public IEnumerable<string> Execute(IEnumerable<string> input)
07.{
08.List<string> shifts = new List<string>();
09.foreach (string line in input)
10.{
11.string[] words = line.Split(new char[] { ' ' });
12.for (int i = 0; i <= words.Length - 1; i++)
13.{
14.shifts.Add(string.Join(" ", words));
15.string firstWord = words[0];
16.for (int j = 1; j <= words.Length - 1; j++)
17.{
18.words.SetValue(words[j], j - 1);
19.}
20.words.SetValue(firstWord, words.Length - 1);
21.}
22.}
23.return shifts;
24.}
25.}
26. }


Here we have a basic implementation of the Shifter filter where we split the line into separate words based on the space between them, then shift all the words to find various permutations.


Sorter

Before returning the final results in the Writer filter, we need to sort the index alphabetically. This is done in the Sorter filter.

01. using System.Collections.Generic;
02. using System.Linq;
03. namespace KwicPipesFilters
04. {
05.public class Sorter : IOperation<string>
06.{
07.public IEnumerable<string> Execute(IEnumerable<string> input)
08.{
09.LineComparer lineComparer = new LineComparer();
10.input.ToList<string>().Sort(lineComparer);
11.return input;
12.}
13.}
14. }

Here a LineComparer class is used to implement the ICcomparer interface for the string type.
01. using System.Collections.Generic;
02. namespace KwicPipesFilters
03. {
04.public class LineComparer : IComparer<string>
05.{
06.public int Compare(string x, string y)
07.{
08.return string.Compare(x, y);
09.}
10.}
11. }

Writer

Obviously, the last filter should write the index to the output for the user and that’s the purpose of the Writer filter.

01. using System;
02. using System.Collections.Generic;
03. namespace KwicPipesFilters
04. {
05.public class Writer : IOperation<string>
06.{
07.public IEnumerable<string> Execute(IEnumerable<string> input)
08.{
09.foreach (string line in input)
10.{
11.Console.WriteLine();
12.Console.WriteLine(line);
13.}
14.Console.ReadLine();
15.yield break;
16.}
17.}
18. }

As you see, this filter uses a yield break to avoid returning any result.

Pipeline

Having all the filter implemented, it needs to implement the pipeline itself in order to register the filters and make the whole thing work. We do this in the KwicPipeline class with a simple code that it has.

01. namespace KwicPipesFilters
02. {
03.public class KwicPipeline : Pipeline<string>
04.{
05.public KwicPipeline()
06.{
07.Register(new Reader());
08.Register(new Shifter());
09.Register(new Sorter());
10.Register(new Writer());
11.}
12.}
13. }

WE inherit from the Pipeline class and register the filters in the public constructor.
Putting It TogetherThere is only one step remained and that is putting all these things together to start the pipeline. All we need to do is to create an instance of the KwicPipeline class, call its Execute method, and leave the rest to the pipes and filters

01. namespace KwicPipesFilters
02. {
03.class Program
04.{
05.static void Main(string[] args)
06.{
07.KwicPipeline pipeline = new KwicPipeline();
08.pipeline.Execute();
09.}
10.}
11. }

Conclusion

This is the Pipes and Filters pattern in the .NET Framework using a simple and generalized technique that relies on Generics to implement the KWIC application. This is one of the best ways to implement this pattern in the .NET Framework. Of course, there are other techniques to implement this pattern in .NET and one specific technique, for example, using the Windows Workflow Foundation.