It happens quite often that we assign a simple programming task for one reason or another. One thing we find is that our victims often look for the “right” answer when in reality there isn’t one. Sure some answers are better than others but even that can be subjective.
So whilst a group were busy scribbling away I thought I’d demonstrate the point by doing the same problem… in as many different ways as I could think of.
The Task
The task we set them was;
Given the string “We Wish You A Merry Xmas”, output a list of the unique letters.
So I put the string into a string variable called “leString” and got to work. My favourite implementation is this;
Console.WriteLine(String.Concat(leString.ToLower() .Where(x => char.IsLetter(x)).Distinct()));
If you understand a little about LINQ, particularly the extension method syntax, then this is about as clear and concise as you can get. It’s possibly not the fastest way of doing it, but speed of execution is not the only concern.
I then tried to guess how I thought a junior programmer would approach it. The result is this.
HashSet<char> alreadyThere = new HashSet<char>(); foreach (char c in leString) { char lc = char.ToLower(c); if (char.IsLetter(c) && !alreadyThere.Contains(lc)) { alreadyThere.Add(lc); Console.Write(lc); } } Console.WriteLine();
This is perfectly valid and indeed relatively quick. I don’t think it’s quite as obvious what it’s doing as using .Distinct(), but if you’re not a LINQ Ninja then it’s OK.
I was right here in that it’s this kind of approach that most of them went for, although we saw arrays, lists and dictionaries used to store the list of chars and nobody realised that you could output the chars as you went so the real results looked more like this;
List<char> alreadyThere = new List<char>(); foreach (char c in leString) { char lc = char.ToLower(c); if (char.IsLetter(c) && !alreadyThere.Contains(lc)) alreadyThere.Add(lc); } foreach (char c in alreadyThere) Console.Write(c); Console.WriteLine();
There’s a fundamentally different approach that we can take here. Essentially we want to remove duplicates. If we sort the string then all the duplicates will be next to each other. So all we have to do is iterate through the sorted string and avoid outputting the same char twice.
This is pretty easy to implement.
char lastchar = (char)0; foreach (char c in leString.ToLower() .Where(x => char.IsLetter(x)).OrderBy(n => n)) { if (lastchar != c) Console.Write(c); lastchar = c; } Console.WriteLine();
It’s a bit of a nasty mix of LINQ and more traditional iteration but I still think it’s pretty clear. We take the string, eliminate anything that isn’t a letter and then sort the result.
We then iterate over the result. If this iteration’s char is different from the last iteration’s char then we output it. If it’s the same, we don’t.
You can also implement this using a for loop, but you need to make the sorted result into an array or list so that you can get a count. This is all rather convoluted so the foreach is preferable in my opinion.
You can implement it in LINQ too. The problem is how to assign “lastChar” after it’s tested. I’ve used a nasty little hack here – in C# an assignment actually has a return type that is the value of the assignment. So lastchar = p actually returns the value of p as well as assigning it to lastchar – hence the final .Select.
char lastchar = (char)0; Console.WriteLine(String.Concat(leString.ToLower() .OrderBy(n => n) .Where(x => char.IsLetter(x) && lastchar != x) .Select(p => lastchar = p)));
This isn’t good code, it’s not really clear what it’s doing.
Here’s another slightly leftfield LINQ based approach. In C# the ++ actually returns a value, if it’s used postfix (i.e. i++) the value returned is what i was before it was incremented. If it’s used prefix (i.e. ++i) then it’s the value after it’s incremented.
We can use this little gem to index into a counter array…
int[] hits = new int[26]; Console.WriteLine(String.Concat(leString.ToLower() .Where(x => char.IsLetter(x) && 0 == hits[(byte)x - (byte)'a']++)));
This assumes that the string is 7 bit ASCII. It goes horribly wrong if it’s unicode. This should be a pretty quick implementation though.
Here’s another 1 liner (if you ignore the declaration of a HashSet). This works in a similar way to the previous example but is unicode safe. A HashSet is a collection of unique values, duplicates are not allowed. The HashSet.Add() method returns false if the entity you’re trying to add is already in the collection. So we can use that in the same way we used the array and the ++ operator above…
HashSet<char> alreadyThere = new HashSet<char>(); Console.WriteLine(String.Concat(leString.ToLower() .Where(x => char.IsLetter(x) && alreadyThere.Add(x))));
If you’re a fully paid up, card carrying lunatic you can of course do it parallel…
ConcurrentDictionary<char, int> alreadyThereDict = new ConcurrentDictionary<char, int>(); Console.WriteLine(String.Concat(leString.AsParallel() .Select(c=>char.ToLower(c)) .Where(x => char.IsLetter(x) && alreadyThereDict.TryAdd(x,1))));
Rather irritatingly there’s no such thing as a ConcurrentHashSet
Another different approach is this.
Console.WriteLine(String.Concat(leString.ToLower() .Intersect("abcdefghijklmnopqrstuvwxyz")));
The result of intersecting a string with a set that contains all possible characters must be a set containing the characters that are in the string. Clearly this isn’t unicode safe either, although it’s better than the array indexing example because this won’t blow up, it just won’t work with unicode.
The advantage of this approach however is that it very clearly defines what the dictionary of permissible character is. So if you wanted to define a slightly different dictionary you could – say consonants for instance.
Finally we return to the original method, but we do it slightly differently. Most of the functionality we think of as LINQ is provided through extension methods defined in the “Enumerable” class.
You don’t need to call an extension method as an extension method at all, you can in fact call it as a static method of its defining class. Thus we can perform the entire operation in a functional manner.
Console.WriteLine( String.Concat( Enumerable.Distinct( Enumerable.Where( Enumerable.Select(leString, char.ToLower), char.IsLetter))));
If you’re not familiar with this style, basically you read it from the middle out. So we take “leString” and call Enumerable.Select on it, passing char.ToLower as the predicate for selection. The result of this operation is then used as the first parameter to the call to Enumerable.Where. The other parameter to the Where call is the predicate char.IsLetter. The result of this operation is then used as the first parameter to a call to Enumerable.Distinct and the result of that goes to String.Concat.
As with many things programming it’s horses for courses and this is not the right horse. It’s useful however to know that it can be done like this.
Over to you…
Naturally this isn’t an exhaustive list – it’s just the ones I happened to think of, if you can think of another way let me know!