Try-Catch-FAIL

Failure is inevitable.

Data Storage: Why is the answer always "relational"?

January 23, 2009 03:18 by author Matt

I've been thinking a lot about persistence and storage lately, and I think I've finally come to the conclusion that relational databases are almost universally being used incorrectly, and that many times an alternative persistence medium is actually a much better solution.  I think relational databases are good for holding data that needs to be aggregated across (for reporting or other types of heavy analysis), not for things that are inherently object-oriented tasks.  We spend so much time these days trying to work around the object-relational impedance mismatch when we could instead remove the mismatch altogether.

What's happened?  Has my brain been negatively impacted by my return to academia?  Am I missing some magic explanation for why we developers seem to immediately assume that there's going to be a SQL database when we create a new application?  Why aren't we using things like db4Objects and CouchDB instead?  I've used db4Objects and found the performance to be nothing short of spectacular when compared to ORM solutions, and I've heard good things about CouchDB. Are we defaulting to relational databases because that's just "the way things are done"?

I guess maybe it will help if you understand why I'm thinking on persistence.  At my day job, our primary product is basically a combination of data warehouse, search engine, deep-web crawler, and data mining toolset.  The primary underlying storage mechanism is a SQL Server database.  This works pretty well for the reporting types of functions, but not so well for the information retrieval types of functions, such as full-text querying, document viewing, etc; those operations tend to be comparatively slow.  And even the reporting performance is pretty slow because the schema is trying to support our object-oriented view of things instead of the data-oriented view that's really needed for reports.  So, I'm going back to the drawing board and trying to come up with a better persistence strategy.  I'm thinking Lucene for indexing.  I'm thinking db4Objects or something similar for the actual document object storage.  I'm thinking SQL Server for storing only the aggregated information collected by processing the documents, no more of this object-relational mismatch garbage. 

My question is, am I thinking clearly?



When good architectures go bad...

January 7, 2009 17:44 by author Matt

Today has not been a fun day.  I have spent most of today and a large part of yesterday trying to fix a problem in our system.  The problem seems very simple at first, and indeed we came up with a dozen or so ideas for solutions to the problem.  In the end though, none of the ideas could be implemented given our time constraints.  Why?  How could something that seems so simple be so difficult to fix?  The answer lies in a decision that we made four years ago, a decision that seemed like a great idea at the time: we chose the wrong architecture.

Four years ago, we were designing the "final" version of a massive text and data mining platform after going through three previous major prototype efforts (each consisting of about 4-8 months of development and culminating in a somewhat-working product).  Based on our successes and failures in the previous prototypes, we decided we wanted to try something new.  Instead of writing a bunch of architectural infrastructure, we would jump on the web services bandwagon, make everything a service, and stick it all in IIS.  It seemed so simple at the time... we would go with the pipe & filter pattern, using asynchronous web services as the transportation mechanism.  The feeling was that we didn't really need anything overly reliable or performant.  Our system was expected to take weeks to run, processing hundreds of thousands of documents.  We did the math and thought "yeah, this architecture should handle that fine."  Plus, we really thought that using IIS and web services would reduce the amount of architectural and infrastructure plumbing and management we would have to do.  Everything would be loosely coupled (asynchronous and all), and indeed, it was and is to this very day.  It would be robust in the sense that it could recover from errors, and to a degree, this is true.  If something goes wrong while processing a document, the system will eventually try again.  And since we were using IIS as the host, we wouldn't have to write our own hosting services, and again, we sort-of hit the mark.  But there were problems.  Oh man, there are still problems.

Fast forward four years, and I am convinced that our architecture has been more of a hindrance than a help.  Everything is asynchronous and decoupled, but it was a lot of work to get it there.  Did you know you can't send SOAP messages that are 50 MB long by default (at least you couldn't with .NET 1.1)?  We found that out the hard way.  Did you know that the XML serializer, which .NET uses for web services, fails to escape a whole slew of characters?  Again, we found that out the hard way after a lot of painful debugging.  Do you know what happens when you fire a document off to an asynchronous pipeline?  Neither does the process that sent it!  Is it in there?  Did it come out the other side?  Should I resend it?  The only way we could address that was by "guessing" how long it would take the document to make it through, then essentially looking for it on the other end.  Did it come out?  No?  Then resend it!

And that, my friends, is what I have spent the last two days working on.  Let's think about that strategy for a second.  We send a document to a pipeline, wait for some amount of time, then look to see if it has come out the other end.  If not, surely that means something went wrong, and the document died somewhere in the pipeline in a burst of exceptiony goodness.  Right?  WRONG.  The document may very well still be in there.  Someone may have fed a patent document into it that contains a massive DNA sequence.  One of the processes in the pipeline may be faithfully chugging away, trying to figure out what the various letters in the sequence mean.  But we don't know that.  All we know is that the document never made it out the other end, so we have to assume the worst and send another copy in.  Great.  Now we have a second thread faithfully chugging away on the same DNA sequence.  Again, we wait, then look to see if it has come out the other end.  No?  Send it again!  We now have three copies of the document eating up three threads on a four-core machine.  One more pass like that, and we have effectively clogged the pipeline.  Throw in 8 more copies for good measure, and you can rest assured that the pipeline is now permanently blocked until IIS is reset.  This is the bug I've been trying like crazy to fix for two days: how does our controlling process (which we weren't even supposed to have to create according to our original architectural grand vision) know what's going on?  There's no good answer.  I thought of a few hacks, but most wouldn't work.  The hack I went with was basically to try to detect documents that *might* contain genetic sequences and ignore them.  In a system that will see hundreds of thousands of documents in a week, I'm pretty confident that things will be filtered that shouldn't have been.

Anyway, the moral of this rambling post is simply this: architecture, especially in an enterprise application, is critical.  You do not want to get this piece wrong.  If you do, whoever takes over for you when you finally go insane from all the hacks you've had to implement to work around the danged architecture will pay for it.  Think through everything: how it will work under normal conditions, how it will work under load, how it will work when under attack, how it will respond to every conceivable error, how flexible it needs to be, how difficult it will be to maintain... do not skimp on this step, or you will be sorry.



Unit Testing in .NET Part 3 - Asserting That Your Code Rocks

December 16, 2008 07:52 by author Matt

In the previous entries in this series, you've learned about the basics of unit testing, and you've seen how to create a very basic unit test.  In this post, you will learn how to fully use NUnit's Assert class to create a full suite of unit tests.  This post builds off the sample described in the previous post, so be sure to check it out if you want to follow along.

Asserting Equality

In the last post, you saw one example of how to use the Assert.AreEqual method to verify that two objects are equal.  As you will soon see, most methods on the Assert class have a ton of overloads.  You can think of the method as having two levels of overloads: the first for all the various types you could pass in (it has specific overloads for most primitive types as well as more generic versions that work with anything that derives from object), and the second level for controlling the message that is shown when the Assert fails.  The type-based overloads are self-explanatory and are handled by the compiler for you automatically, so we'll ignore those and focus on the overloads that look like Assert.AreEqual(expected,actual,message) and Assert.AreEqual(expected,actual,message,params).  Let's incorporate one of these overloads into our unit test from the last post.  Here's the original, unmodified test:

    /// <summary>
    /// Verifies that the balance increases
    /// by the appropriate amount.
    /// </summary>
    [Test]
    public void Deposit_AddsValueToBalance()
    {
        Account account = new Account();
        //account.Balance is currently zero.
        account.Deposit(100);

        Assert.AreEqual(100, account.Balance);
    }

Let's change our assumption about what the Deposit method should be doing.  Let's say that this is an awesome bank that automatically adds a 10% match to anything that you deposit.  We could just change the first parameter to Assert.AreEqual to 110, but what happens if we run that test?  All you will see is the 'expected (110), actual (100)' message.  It doesn't tell you much about why 110 was expected.  That's where the third parameter comes in handy:

    /// <summary>
    /// Verifies that the balance increases
    /// by the appropriate amount.
    /// </summary>
    [Test]
    public void Deposit_AddsValueToBalance()
    {
        Account account = new Account();
        //account.Balance is currently zero.
        account.Deposit(100);

        Assert.AreEqual(110, account.Balance, "Deposit bonus was not applied!");
    }

Now when you run the test, you will see the helpful message that was supplied as the third parameter to AreEqual.  The fourth parameter, a params array of objects, behaves like the String.Format method: the third parameter becomes a format string, and the fourth parameter is the set of values to insert into the format string.  This could be useful for logging additional information about the unit test failure.
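To make the format-string overload concrete, here is a minimal sketch.  It assumes the classic NUnit Assert API used throughout this post.  BonusAccount is a hypothetical stand-in for the Account class with the 10% match already applied, and the fixture name and message text are made up for illustration:

```csharp
using NUnit.Framework;

// Hypothetical stand-in for Account: Deposit adds a 10% match.
public class BonusAccount
{
    public float Balance { get; private set; }

    public void Deposit(float amount)
    {
        // amount plus one tenth of amount (100 becomes 110).
        Balance += amount + amount / 10;
    }
}

[TestFixture]
public class BonusAccountTests
{
    [Test]
    public void Deposit_AddsValueToBalance()
    {
        const float amount = 100;
        BonusAccount account = new BonusAccount();
        account.Deposit(amount);

        // The third argument is a format string; the remaining arguments
        // fill in {0} and {1}, just as they would with String.Format.
        Assert.AreEqual(110f, account.Balance,
            "Deposit bonus was not applied to a deposit of {0}; balance is {1}.",
            amount, account.Balance);
    }
}
```

The formatted message only shows up when the assertion fails, so it costs nothing to include the values you would otherwise have to dig out with a debugger.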

For both double and float types, the AreEqual method has an additional set of overloads that look like Assert.AreEqual(expected,actual,tolerance).  The tolerance parameter allows you to tell NUnit how close two floating-point values have to be in order to be considered equal.  Going back to our Deposit_AddsValueToBalance test, what if the bonus wasn't *quite* 10%, but was more like 9.75989%?  A deposit of 100 would then produce a balance of roughly 109.76.  We could calculate exactly what the balance would be and pass that in as the expected value, or we could leave the expected value as 110 and specify a tolerance wide enough to cover the difference (0.25 here), like so:

    /// <summary>
    /// Verifies that the balance increases
    /// by the appropriate amount.
    /// </summary>
    [Test]
    public void Deposit_AddsValueToBalance()
    {
        Account account = new Account();
        //account.Balance is currently zero.
        account.Deposit(100);

        Assert.AreEqual(110, account.Balance, 0.25, "Deposit bonus was not applied!");
    }

In addition to primitive types, NUnit has some "special" support for Arrays and Collections.  Typically in the .NET world, Equality is determined by an object's Equals method. Derived types are responsible for overriding that method if they wish to define equality as anything other than the default behavior inherited from the object class.  NUnit fudges this definition a bit for Collections and Arrays: two collections (or arrays) are considered equal if they have the same number of items and all their corresponding elements are equal. 
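Here is a small sketch of that difference, assuming the classic NUnit Assert API.  The fixture and test names are hypothetical:

```csharp
using NUnit.Framework;

[TestFixture]
public class ArrayEqualityExamples
{
    [Test]
    public void AreEqual_ComparesArraysElementByElement()
    {
        int[] expected = new int[] { 1, 2, 3 };
        int[] actual = new int[] { 1, 2, 3 };

        // Arrays inherit Equals from object, so this is a reference
        // comparison and two separate arrays are not "equal".
        Assert.IsFalse(expected.Equals(actual));

        // NUnit compares the lengths and then each pair of elements.
        Assert.AreEqual(expected, actual);
    }
}
```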

There is a corresponding inverse method to AreEqual called (not surprisingly) AreNotEqual.  The obvious difference is that AreNotEqual verifies that two objects are different from one another.  AreNotEqual has the same set of overloads as its complementary method.

Asserting Sameness

Next up is the AreSame method.  You may be wondering "what's the difference between 'same' and 'equal'?".  Objects A and B are equal if A.Equals(B) returns true.  Remember that by default all objects inherit an Equals method from the base object class, and that derived classes can implement custom equality checks as needed, so the exact definition of equal depends on what you are comparing.  'Same' is much simpler: objects  A and B are the same if they point to the exact same object in memory. 

The difference between 'equal' and 'same' may sound a bit confusing if you aren't comfortable with the concept of pointers and object references, so check the documentation here if you are still unclear.
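If it helps, the distinction can be sketched in plain C# without NUnit.  Badge is a hypothetical type whose equality is based on its ID alone, and Object.ReferenceEquals stands in for the 'same' check:

```csharp
using System;

// Hypothetical type for illustration: equality is based on the ID alone.
public class Badge
{
    private readonly int id;

    public Badge(int id) { this.id = id; }

    public override bool Equals(object obj)
    {
        Badge other = obj as Badge;
        return other != null && other.id == id;
    }

    public override int GetHashCode() { return id.GetHashCode(); }
}

public static class EqualVersusSameDemo
{
    // Two distinct instances with the same ID are 'equal'.
    public static bool DistinctInstancesAreEqual()
    {
        return new Badge(7).Equals(new Badge(7));
    }

    // Two distinct instances are never the 'same' object.
    public static bool DistinctInstancesAreSame()
    {
        return Object.ReferenceEquals(new Badge(7), new Badge(7));
    }

    // Two references to one instance are the 'same' object.
    public static bool AliasedReferencesAreSame()
    {
        Badge first = new Badge(7);
        Badge second = first;
        return Object.ReferenceEquals(first, second);
    }
}
```

The three methods return true, false, and true respectively: equality depends on what Equals says, while sameness depends only on whether both references lead to one instance.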

To demonstrate this difference, let's create a new method that looks up account information and verify that repeated calls to the method return the same account information instance.  NOTE: I am intentionally not doing things in a test-driven manner right now.  I don't want to muddy the waters with trying to explain that concept at the same time I'm explaining the asserts.  A proper treatment of test-driven development is coming Real Soon(tm)!

First, let's add some new properties to our Account class along with an overridden Equals method:

    /// <summary>
    /// A bank account.
    /// </summary>
    public class Account
    {
        #region Public Properties

        /// <summary>
        /// The ID of the account.
        /// </summary>
        public int AccountID { get; private set; }

        /// <summary>
        /// The name of the account owner.
        /// </summary>
        public string Owner { get; private set; }

        /// <summary>
        /// The current account balance.
        /// </summary>
        public float Balance { get; private set; }

        #endregion

        #region Public Methods

    ----Snip----

        /// <summary>
        /// Compares the current object to the specified object.
        /// </summary>
        /// <param name="obj"></param>
        /// <returns>True if the accounts have the same AccountID,
        /// false otherwise.</returns>
        public override bool Equals(object obj)
        {
            Account account = obj as Account;

            if (account == null)
            {
                return false;
            }
            else
            {
                return AccountID == account.AccountID;
            }
        }

        /// <summary>
        /// When you override Equals, you have to override
        /// GetHashCode, too...
        /// </summary>
        /// <returns></returns>
        public override int GetHashCode()
        {
            return AccountID.GetHashCode();
        }

        #endregion
    }

Next, let's add a method to look up and return a 'dummy' Account object on demand:

    /// <summary>
    /// Gets the specified account.
    /// </summary>
    /// <param name="accountID"></param>
    /// <returns></returns>
    public static Account Lookup(int accountID)
    {
        return new Account {AccountID = accountID, Owner = "John Doe"};
    }

Finally, let's create our test:

    /// <summary>
    /// The Lookup should return the exact same instance
    /// for all lookups on a specific ID.
    /// </summary>
    [Test]
    public void Lookup_ReturnsSameInstance()
    {
        Account account = Account.Lookup(1);

        Assert.AreSame(account, Account.Lookup(1));
    }

Go ahead and build the project and run the test.  What happened?  The test failed because even though we are returning an identical object for both calls, we aren't actually returning the same object.  Let's correct that by storing some static dummy Account instances; we'll return one of these instead of creating a new instance from now on:

    /// <summary>
    /// These are our dummy accounts.
    /// </summary>
    private static Account[] mAccounts = new Account[]
    {
        new Account{ AccountID = 1, Owner = "John Doe" },
        new Account{ AccountID = 2, Owner = "Jane Doe" }
    };

    /// <summary>
    /// Gets the specified account.
    /// </summary>
    /// <param name="accountID"></param>
    /// <returns></returns>
    public static Account Lookup(int accountID)
    {
        //Arrays are 0-based, accountIDs are 1-based, so we shift them.
        return mAccounts[accountID - 1];
    }

Build and re-run the test, and you should get a success message.

Similar to the AreEqual method, the AreSame method has a logical inverse: the AreNotSame method. 

Asserting Greatness

NUnit includes four methods (with overloads) for asserting various inequalities: Greater, GreaterOrEqual, Less, and LessOrEqual.  The intent of these methods should be obvious, but they differ in one major way from the other assertions we've seen so far.  Recall that the basic versions of both AreEqual and AreSame took an expected argument first and the actual value second.  Applying that same logic, you might expect that expressing the inequality x is greater than y would look like Assert.Greater(y,x), but it's actually the opposite.  I can't tell you how many times I've seen this inconsistency bite developers; it doesn't help that the parameters have less-than-helpful names, like arg1 and arg2.  I don't know why they couldn't have used something more obvious and intuitive, like maybe left and right...
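One way to remember the order is to read each call left to right with the operator in the middle.  A minimal sketch, assuming the classic NUnit Assert API (the fixture and test names are hypothetical):

```csharp
using NUnit.Framework;

[TestFixture]
public class ComparisonOrderExamples
{
    [Test]
    public void Comparisons_ReadLeftToRight()
    {
        int x = 10;
        int y = 5;

        // Reads as "x > y": the first argument must be the larger one.
        Assert.Greater(x, y);

        // Reads as "y < x".
        Assert.Less(y, x);

        // The OrEqual variants also accept equal values.
        Assert.GreaterOrEqual(x, x);
        Assert.LessOrEqual(y, y);
    }
}
```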

Enough complaining; let's write some code!  Let's add a new method to our Account class called RandomDeposit.  This method is very different from our standard Deposit method.  Instead of depositing the specified amount, RandomDeposit will deposit a random amount that is anywhere from 0.5 to 1.5 times the specified amount.  The method looks like so:

    /// <summary>
    /// Deposits a random amount that is between 0.5
    /// and 1.5 times the specified amount.
    /// </summary>
    /// <param name="amount"></param>
    public void RandomDeposit(float amount)
    {
        Random rand = new Random();
        double multiplier = rand.NextDouble();

        Balance += (float)(amount*(0.5 + multiplier));
    }

Because of the randomness in the method, it's going to be very hard to write a unit test using the Assert.AreEqual method.  Instead, we'll use the GreaterOrEqual and LessOrEqual methods to assert that the deposited amount is at least 0.5 times, and at most 1.5 times, the amount we specified.  Here's the unit test:

    /// <summary>
    /// The method should deposit between 0.5 and 1.5
    /// times the specified amount.
    /// </summary>
    [Test]
    public void RandomDeposit_DepositsExpectedAmount()
    {
        Account account = new Account();

        account.RandomDeposit(100);
        Assert.GreaterOrEqual(account.Balance, 50);
        Assert.LessOrEqual(account.Balance, 150);
    }

If you aren't already building and testing by habit, go ahead and build the project and run the new test. 

Asserting Typeiness (If Stephen Colbert can do it, so can I!)

The Assert class includes methods for asserting things about the type of an instance.  You can check whether or not an object is of a given type using the IsInstanceOfType method.  The first parameter is the expected type of the object, the second parameter is the actual object.  Let's make our Account class implement the ICloneable interface, then write a test to verify that our clone actually is of type Account:

    /// <summary>
    /// A bank account.
    /// </summary>
    public class Account : ICloneable
    {
    ----SNIP----
        /// <summary>
        /// Clones the current account.
        /// </summary>
        /// <returns></returns>
        public object Clone()
        {
            return new Account {AccountID = AccountID, Balance = Balance, Owner = Owner};
        }
    ----SNIP----
    }

Here's the corresponding test case:

    /// <summary>
    /// Verifies that a complete clone of the account
    /// is returned.
    /// </summary>
    [Test]
    public void Clone_ReturnsAccountClone()
    {
        Account account = Account.Lookup(1);

        object clone = account.Clone();

        Assert.IsInstanceOfType(typeof (Account), clone);
    }

The IsInstanceOfType and IsAssignableFrom methods are very similar.  Under the covers, they're just calling the corresponding members of the System.Type class.  Check the documentation on MSDN if you are curious about the subtle differences between the two methods.  When the object is exactly the expected type, as it is here, the two give the same result, but they can disagree once base and derived types are involved.

As with most assert methods, there are various overloads of both IsAssignableFrom and IsInstanceOfType.  Each also has a set of complementary Not methods: IsNotAssignableFrom and IsNotInstanceOfType.
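The two System.Type members involved can be tried out in plain C#.  The sketch below uses hypothetical Animal and Dog classes; the direction in which IsAssignableFrom is applied (the object's runtime type asking about the expected type) is my reading of NUnit's behavior and is worth confirming against the version you use:

```csharp
using System;

// Hypothetical types for illustration.
public class Animal { }
public class Dog : Animal { }

public static class TypeCheckDemo
{
    // Is the object an instance of the expected type, or of a type derived from it?
    public static bool IsInstance(Type expected, object actual)
    {
        return expected.IsInstanceOfType(actual);
    }

    // Could a variable of the object's runtime type hold a value of the expected type?
    public static bool IsAssignable(Type expected, object actual)
    {
        return actual.GetType().IsAssignableFrom(expected);
    }
}
```

With an exact match, such as (typeof(Dog), new Dog()), both checks return true.  With (typeof(Animal), new Dog()), IsInstance returns true while IsAssignable returns false, and with (typeof(Dog), new Animal()) the results flip.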

Asserting Nothingness

Sometimes the right result is a null result.  Let's look again at our Account.Lookup method.  Right now, we're not really handling the case of an account ID that doesn't exist.  Let's modify the code so that it returns null when given an invalid account ID:

    /// <summary>
    /// Gets the specified account.
    /// </summary>
    /// <param name="accountID"></param>
    /// <returns></returns>
    public static Account Lookup(int accountID)
    {
        if (accountID < 1 || accountID > mAccounts.Length)
        {
            return null;
        }

        //Arrays are 0-based, accountIDs are 1-based, so we shift them.
        return mAccounts[accountID - 1];
    }

And let's write a new test case to verify this:

    /// <summary>
    /// The lookup should return null when given an ID
    /// that doesn't correspond to an account.
    /// </summary>
    [Test]
    public void Lookup_ReturnsNullForInvalidId()
    {
        Assert.IsNull(Account.Lookup(0));

        Assert.IsNotNull(Account.Lookup(2));
    }

Here we've used the Assert.IsNull method.  This is a very simple assert: it simply checks that the parameter is null.  Like everything else, it has a complementary method that will test that something is not null. 

Asserting Truthiness (or Falsiness)

Everything we've asserted so far could actually be expressed using one of the most basic assertions: IsTrue.  This method asserts that a boolean input is true.  It has a complementary IsFalse method that can be used to assert that an input is false.  These methods can be used to test anything that you can express as a boolean condition.  Let's rewrite our previous Lookup test using only IsTrue instead of IsNull to see this:

    /// <summary>
    /// The lookup should return null when given an ID
    /// that doesn't correspond to an account.
    /// </summary>
    [Test]
    public void Lookup_ReturnsNullForInvalidId()
    {
        Assert.IsTrue(Account.Lookup(0) == null);
        //Assert.IsNull(Account.Lookup(0));

        Assert.IsTrue(Account.Lookup(2) != null);
        //Assert.IsNotNull(Account.Lookup(2));
    }

The test should produce identical output because it is logically equivalent to the original.  You might be tempted to just say "forget about all these other asserts, I'll just use IsTrue for everything!", but that's a terrible idea.  The various other assertions give you a lot more information when something goes wrong than IsTrue will.  For example, if Greater fails, it will tell you the values of both parameters.  If you expressed the test using only IsTrue, you would get a very unhelpful message along the lines of "Expected: True, Actual: False".  Sure, you can probably work backwards, add some logging, etc, to figure out what's going on, but why not use the more descriptive Greater method to begin with?

Asserting Failuriness

Sometimes you just want a test to fail. Maybe the test isn't finished, or the test couldn't perform some setup correctly, or maybe you need to test for something that is beyond what the built-in assertion methods can handle.  Assert.Fail to the rescue!  Calling this method will instantly fail a test (assuming you haven't done anything silly like wrapped the call with a try-catch block, which we will look at in a future post). 

Asserting Exceptioniness

We've tested that things work so far, but how do we test that things explode?  Right now, there's nothing in our Deposit method that prevents us from depositing negative amounts.  Let's add some logic to throw an exception:

    /// <summary>
    /// Deposits the specified amount.
    /// </summary>
    /// <param name="amount"></param>
    public void Deposit(float amount)
    {
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount", amount, "Must be greater than zero.");
        }

        Balance += amount;
    }

Error-handling code is good, but it still needs to be tested.  NUnit has the ExpectedException attribute that you can use to verify that an exception is thrown, but I hate this attribute.  What the attribute is really doing is verifying that something somewhere in your test case is throwing an exception, not that the exception is actually coming from where you want it to come from.  Instead, I prefer to go with this model:

    /// <summary>
    /// The method should throw an ArgumentOutOfRangeException
    /// if you pass in a negative value.
    /// </summary>
    [Test]
    public void Deposit_ThrowsExceptionOnNegativeAmount()
    {
        Account account = new Account();

        try
        {
            account.Deposit(-100);
            //The following line will only be executed if the Deposit method
            //failed to throw an exception.
            Assert.Fail("Expected ArgumentOutOfRangeException was not thrown!");
        }
        catch (ArgumentOutOfRangeException)
        {
            //Ok, this is an expected exception.
        }
    }

It requires a bit more code, but this version verifies that the correct type of exception is thrown in exactly the right spot.
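For contrast, here is a sketch of how the attribute-based style can pass for the wrong reason.  It assumes NUnit 2.x, where the ExpectedException attribute is available (it was removed in NUnit 3).  GuardedAccount is a hypothetical stand-in for the Account class, and its Withdraw method deliberately performs no validation:

```csharp
using System;
using NUnit.Framework;

// Hypothetical stand-in for Account: Deposit validates, Withdraw does not.
public class GuardedAccount
{
    public float Balance { get; private set; }

    public void Deposit(float amount)
    {
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount", amount, "Must be greater than zero.");
        }

        Balance += amount;
    }

    public void Withdraw(float amount)
    {
        // No validation here, so this method never throws.
        Balance -= amount;
    }
}

[TestFixture]
public class ExpectedExceptionPitfall
{
    // Intended to prove that Withdraw rejects a negative amount.
    [Test]
    [ExpectedException(typeof(ArgumentOutOfRangeException))]
    public void Withdraw_ThrowsExceptionOnNegativeAmount()
    {
        GuardedAccount account = new GuardedAccount();

        // Setup mistake: this call throws the expected exception type...
        account.Deposit(0);

        // ...so this line never runs, yet the test is reported as passing.
        account.Withdraw(-100);
    }
}
```

With the try-catch model, only the Withdraw call would sit inside the try block, so the bad Deposit call would surface as an ordinary test failure instead of a false pass.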

Other Ways to Assert RoXXorness

The methods we've looked at so far are just the ones that I have found myself using often over the last several years.  NUnit includes other methods that you can use to assert various things about your objects, including:

  • Assert.Contains - Given an object and a collection, this method asserts that the collection contains the specified object.
  • Assert.IsEmpty - Given a collection (or a string), asserts that it contains no items.
  • Assert.IsNaN - Both double.NaN and float.NaN represent the 'not-a-number' condition (often caused by dividing zero by zero).  You can test for this condition using the IsNaN assert.
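The NaN case deserves a dedicated check because NaN does not behave like an ordinary value.  A plain C# sketch, with hypothetical class and method names:

```csharp
using System;

public static class NaNDemo
{
    // Dividing floating-point zero by zero yields NaN rather than throwing.
    public static double ZeroOverZero()
    {
        double zero = 0.0;
        return zero / zero;
    }

    // The == operator reports false whenever either operand is NaN,
    // even when both operands are NaN.
    public static bool EqualByOperator(double left, double right)
    {
        return left == right;
    }
}
```

Here double.IsNaN(NaNDemo.ZeroOverZero()) returns true, while NaNDemo.EqualByOperator(double.NaN, double.NaN) returns false, so an ordinary == comparison cannot detect the condition.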

There is even more...

Recent versions of NUnit have added additional utilities, asserts, etc. to simplify your testing.  You can find out more about them here.  We might look at those in a future post, but I really don't find myself using most of them in my day-to-day testing, and I think most developers can get by just fine without them.

In the next post in this series, we'll look at some more complicated testing scenarios as well as common testing problems and strategies for overcoming them.



Unit Testing in .NET Part 2 - Your First Unit Tests

December 10, 2008 04:36 by author Matt

In the last post, you learned the basics of unit testing in .NET: you should understand the intent behind unit testing, you should have a vague idea of what unit tests should look like, and you should have looked at some of the unit testing API options that are available to you.  For the remainder of this series, I'll be focusing specifically on unit testing with NUnit and TestDriven.NET.  If you have chosen to use a different API or tool, you may need to translate the code samples and steps below to match.  If you don't already have NUnit downloaded and installed, do so now.

Adding NUnit To An Existing Project

For now, we'll assume that we are adding unit tests for a project that has already been created.  We'll deal with test-driven development (not to be confused with the test-running tool by the same name) in a future post.  Our tests will be built around an example taken from the NUnit Quick Start, where we have a simple bank API that includes a single Account class.  Here's the code for the class:

    using System;

    namespace NunitDemo
    {
        /// <summary>
        /// A bank account.
        /// </summary>
        public class Account
        {
            /// <summary>
            /// The current account balance.
            /// </summary>
            public float Balance { get; private set; }

            /// <summary>
            /// Withdraws the specified amount from the account.
            /// </summary>
            /// <param name="amount"></param>
            public void Withdraw(float amount)
            {
                Balance -= amount;
            }

            /// <summary>
            /// Deposits the specified amount.
            /// </summary>
            /// <param name="amount"></param>
            public void Deposit(float amount)
            {
                Balance += amount;
            }

            /// <summary>
            /// Transfers money to the specified account.
            /// </summary>
            /// <param name="destination"></param>
            /// <param name="amount"></param>
            public void TransferFunds(Account destination, float amount)
            {
                throw new NotImplementedException();
            }

        }
    }

Create a new project in Visual Studio named 'NunitDemo', add this class to it, and verify that your project builds correctly.  If all goes well, you are now ready to add NUnit to the project.  Assuming you are using Visual Studio, find the "References" folder under the project in the "Solution Explorer" pane.  Right-click on "References" and choose "Add Reference".  From the .NET tab of the "Add Reference" dialog, choose "nunit.framework", then click "Ok".  This DLL contains the attributes and assertion functionality that you need to create your actual unit tests.  At this point, you should see "nunit.framework" listed under the "References" folder.  If so, you are now ready to begin writing test cases!

Sidetrack - Where Do Unit Tests Go?

Before we actually start writing test cases, let's talk about where we're going to put them.  The conventional approach is to put test cases in a completely separate project from the code being tested.  In our example, if our Account class is in a project named "BankSystem", our unit tests would go in a completely separate project called "BankSystem.Tests".  I have two big problems with this approach.  First, it bloats the size of your Visual Studio solution, which means it will take longer to build the solution, and you have twice the number of overhead files (the project file, the documentation file, all the little temp files and folders that Visual Studio and Resharper create, etc).  Second, what if you need to test a class or method that's marked as internal?  You can't do that if your tests live in a completely separate project.

My recommendation is to put the tests in the same project as the code being tested, but do separate the tests into their own namespaces.  For example, our "Account" class is in the "NunitDemo" namespace, so its tests will go in the "NunitDemo.Tests" namespace.  The downside to this approach is that your test cases will now be distributed along with your actual API, but there's a very easy solution to this: wrap the tests with preprocessor directives.  Depending on your environment and how you deploy your production assemblies, it may be sufficient to simply test whether the DEBUG symbol is defined.  This symbol is defined by default when you build a project in debug mode in Visual Studio.  If you sometimes deploy debug versions of code, then you may want to define a custom symbol to use instead.  For now, we'll assume that we want our unit tests compiled if the DEBUG symbol is defined. 
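
If DEBUG turns out to be too coarse for your deployment process, a custom symbol works the same way.  Here's a minimal sketch; the symbol name UNIT_TESTS is something I made up for illustration, and you would have to define it yourself (for example, under "Conditional compilation symbols" on the project's Build properties page, or by passing /define:UNIT_TESTS to the compiler):

```csharp
// UNIT_TESTS is a hypothetical symbol name, not something Visual Studio
// or NUnit defines for you. Nothing between #if and #endif is compiled
// unless the symbol is defined for the current build.
#if UNIT_TESTS

namespace NunitDemo.Tests
{
    // Test fixtures for the NunitDemo namespace would live here.
}

#endif
```

The trade-off is that you now have to remember to define the symbol in whatever build configuration runs your tests.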

Writing Your First NUnit Test

At this point, your project should have a reference to nunit.framework, and you should know where you are going to put the test cases.  Let's go ahead and create a test fixture for the Account class.  Within Visual Studio solution explorer, create a new folder called "Tests".  Folders in Visual Studio are typically used to denote namespaces, so any classes you add to this folder will be created in the "NunitDemo.Tests" namespace by default (you can change the namespace after the fact, but that would be WRONG).  Right-click on the new folder, click "Add", then select "Class".  Name the class "AccountTests".  Visual Studio should create the class and open the .cs file for you.  This is just a standard class right now.  We have to make a few changes in order for it to become a test fixture.

First, let's wrap everything in a '#if DEBUG' directive (see the code below if you aren't sure how to do that).  This ensures that our new class will only be compiled if the DEBUG symbol is defined.  Next, add a using statement for the NUnit.Framework namespace.  NUnit requires that test fixtures be public classes, so go ahead and add the 'public' keyword to the class definition.  Finally, apply the "TestFixture" attribute to the class.  You now have an empty test fixture like so:

    #if DEBUG

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using NUnit.Framework;

    namespace NunitDemo.Tests
    {
        /// <summary>
        /// Test fixture for <see cref="Account"/>.
        /// </summary>
        [TestFixture]
        public class AccountTests
        {
        }
    }

    #endif

Let's go ahead and add our first test case.  We're going to create a test to verify that the deposit method correctly adds money to the account balance.  Create a method named "Deposit_AddsValueToBalance".  Yes, typically underscores in method names are BAAAD, but this isn't a normal method, it's a test case.  When a test case fails, you want to know immediately what method failed and what the test was supposed to verify, so you need to use descriptive names.  Using underscores allows you to clearly indicate the name of the method being tested and the behavior being tested.  Anyway, you've created your method, now mark it with the "Test" attribute.  Congratulations, you now have an empty test case!  You could run the test now, and it would pass (because it doesn't do anything), but let's add some meat to it first.

Before we write the logic for the test, let's think about exactly what we want to test: we should be able to call the Deposit method, passing in some value X, and the balance should be incremented by that same value X.  How do we test that?  First, we call the method, passing in some constant value.  Next, we use the Assert class from NUnit to verify that the balance increased by that amount.  There are two ways we could do that in NUnit: we can use the so-called "classic model", or we could use the newer "constraint-based" model.  I'm still on the fence about which one I prefer, but for now, we'll stick with the classic model, which means we're going to use the Assert.AreEqual method to check the value of the Balance property after we have invoked the Deposit method.

That's a long paragraph.  If you're confused, maybe this will help:

    /// <summary>
    /// Verifies that the balance increases
    /// by the appropriate amount.
    /// </summary>
    [Test]
    public void Deposit_AddsValueToBalance()
    {
        Account account = new Account();
        //account.Balance is currently zero.
        account.Deposit(100);

        Assert.AreEqual(100, account.Balance);
    }

And that's our test case.  Simple, isn't it?  We're writing code to test code, utilizing the Assert class to verify that expected things are actually happening.  The Assert.AreEqual call will throw an exception if for some reason the Balance property isn't equal to 100.  Assuming you are using TestDriven.NET, you can run your test by right-clicking on either the name of the test method or the name of the fixture class in the code window, then selecting "Run Tests" from the context menu.  TestDriven.NET will launch in the background, run your tests, then spit out the results in the Output pane within Visual Studio.
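
For comparison, here's roughly what the same test looks like in the constraint-based model.  Treat it as a sketch: it assumes the Account class from earlier in this post and NUnit 2.5 or later, where the Is helper lives in the NUnit.Framework namespace (in NUnit 2.4 it is in NUnit.Framework.SyntaxHelpers instead).  The fixture name is just one I picked to keep it separate from AccountTests.

```csharp
using NUnit.Framework;

namespace NunitDemo.Tests
{
    [TestFixture]
    public class AccountConstraintTests
    {
        [Test]
        public void Deposit_AddsValueToBalance()
        {
            Account account = new Account();
            account.Deposit(100);

            // Constraint-based equivalent of Assert.AreEqual(100, account.Balance).
            // Balance is a float, so the expected value is written as a float too.
            Assert.That(account.Balance, Is.EqualTo(100f));
        }
    }
}
```

Both styles check the same thing; the constraint version just reads a bit more like a sentence.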

EOF

So far, we've tested a single, simple method using a single method from the Assert class.  Unfortunately, testing won't always be this simple and straightforward.  In the next post, we'll begin creating more complex test cases as we try out different Assert methods.  We'll also look at some of the other attributes that NUnit supports and how they can help you create tests more efficiently. 


Unit Testing in .NET Part 1 - Introduction to Testing

clock December 9, 2008 04:43 by author Matt

Today is the first 'requested topic' post.  If you have a topic you would like to hear more about, drop me a line, and if it's something that I'm either interested in or working with anyway, I'll try to give the topic a proper post.  Serious suggestions only (yes, I'm talking to you, Evil Rob). 

Introduction

Today's post kicks off a series that's all about unit testing.  This series assumes no prior knowledge of unit testing, so it may start out a tad too basic for some, but it will carry all the way into Test Driven Development. 

Let's start out with some background.  Software testing is simply the act of verifying the behavior of a piece of software.  The goal of testing is to uncover bugs (aka defects) so that they can be evaluated and possibly corrected.  Software testing is the bane of many coders' existence; that's one of the things that makes them coders instead of developers.  Testing is not like 'coding'.  You have to approach it with a different mindset.  Instead of trying to implement a feature, you are trying to break something.  Testing can be depressing.  No matter how much you test, you will probably never uncover all the bugs in anything more complicated than a 'Hello World' application.  Still, studies have consistently shown that testing improves the quality of software systems, reduces maintenance costs, and generally makes life better and more fun once you know how to do it correctly.

There are numerous types of testing (and various ways to categorize all the different types).  An exhaustive list is beyond the scope of this post, but here are some of the more common types you may encounter:

  • Unit Testing - Tests that target a specific method or class, usually created by a single person or a small team.
  • Component Testing - Like unit testing, but taken up to the API or component level.
  • Integration Testing - Testing the component pieces together as an integrated system.  This type of testing can be difficult due to external dependencies, such as databases, web services, other network resources, etc.
  • Regression Testing - Re-running previous test cases to verify that no new defects have been introduced that break previously-working functionality.
  • System Testing - Testing the software system in its near-final configuration (beta testing could be considered a type of System Testing).
  • Load Testing - 'Loading' a system with work (such as traffic, method calls, etc.) to evaluate how it behaves under normal workloads. 
  • Stress Testing - Like load testing, except with workloads elevated beyond what are considered normal or expected levels.
  • Security Testing - Testing specifically for security exploits and vulnerabilities.

That's a long (but not exhaustive) list, and while each is probably worthy of a series of posts, I think the lowest-hanging fruit is unit testing.  It's easy to get started with (as we'll see today), enables regression testing, and has great tool support in the .NET world.  So, on to unit testing!

Unit Testing

The purpose of unit testing is to verify that a specific unit of code works as intended.  By 'unit of code', I mean a method or perhaps a class, but certainly nothing larger than that.  Typically, you will create a suite of tests for a library.  A suite is composed of multiple fixtures, which themselves are composed of multiple test cases.  Typically, a suite contains fixtures for an individual library, a fixture contains tests for an individual class, and a test case verifies one specific bit of functionality from a method.  A method should usually have multiple test cases.  You want each test case to verify a specific behavior of the method.  Take the following method as an example:

    public int DividePositiveNumberByTen(int number)
    {
        if (number < 1)
            throw new ArgumentOutOfRangeException();
        else
            return number / 10;
    }

Notice that the method actually does two things: if the parameter is less than 1, it throws an exception; otherwise, it returns the result of dividing the number by 10.  You should create two test cases for this method, one that verifies that the method throws an exception when you pass 0 as an input, and the other that verifies that it returns the correct value for a valid input.  (We'll get into the specifics of *how* you create those tests in the next post.)  In general, you want to test each distinct path of execution through your code.  This keeps your tests simple and clean, plus if a test fails, you will know exactly what is broken.
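
As a preview, here's a rough sketch of what those two test cases might look like in NUnit.  A few assumptions: the MathHelper class is a hypothetical home for the method (the example above doesn't say where it lives), I've made the method static to keep the sketch short, and Assert.Throws requires NUnit 2.5 or later.

```csharp
using System;
using NUnit.Framework;

namespace NunitDemo
{
    // Hypothetical container class, invented for this sketch.
    public static class MathHelper
    {
        public static int DividePositiveNumberByTen(int number)
        {
            if (number < 1)
                throw new ArgumentOutOfRangeException("number");
            return number / 10;
        }
    }
}

namespace NunitDemo.Tests
{
    [TestFixture]
    public class DividePositiveNumberByTenTests
    {
        // Path 1: invalid input should throw.
        [Test]
        public void DividePositiveNumberByTen_ThrowsForZero()
        {
            Assert.Throws<ArgumentOutOfRangeException>(
                () => MathHelper.DividePositiveNumberByTen(0));
        }

        // Path 2: valid input should return the quotient.
        [Test]
        public void DividePositiveNumberByTen_ReturnsQuotientForValidInput()
        {
            Assert.AreEqual(5, MathHelper.DividePositiveNumberByTen(50));
        }
    }
}
```

One test per path means a failure points straight at the branch that broke.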

Unit Testing in .NET

The .NET world has almost a one-to-one ratio between developers and unit testing frameworks.  Some of the more popular ones are MbUnit, MS Test, and NUnit.  I started unit testing with NUnit many moons ago, and I've never found myself motivated enough to switch, so, I'm still using NUnit.  When coupled with TestDriven.NET, you can run your tests directly from within Visual Studio, which makes you more likely to actually run your tests than if you had to build your code, launch an external tool, load your compiled module, then find and execute the test.  Less resistance to testing is a great thing. 

Regardless of which framework you choose, unit testing in the .NET world couldn't be easier.  Most frameworks make defining a test fixture and test cases very easy: simply create a class marked with a special test fixture attribute, then add methods marked with a special test attribute to create test cases.  In addition to a way to mark test cases, every unit testing framework I've ever seen includes an API for asserting various things: that two objects are equal/not-equal, that an exception was thrown, that one value is greater-than/less-than another, and lots more.  Using these assert methods, you can create tests that verify the behavior of classes, methods, etc.

That's it for the introduction.  Hopefully you have a basic idea of what unit testing is and how you might use it.  In the next post, we'll delve into NUnit and its API as we begin writing unit tests!



Essential Development Team Tools

clock December 2, 2008 09:54 by author Matt

Unless you are a micro-ISV or are developing on your own for fun, I doubt you are working in isolation.  You are probably working as part of a team of developers (and possibly testers, designers, etc) who are all working in parallel on the project.  I've been on the same team since my employer first started developing software, and I've been fortunate enough to watch our development process mature from a chaotic mess of E-mailed files and constant meetings to a saner, more automated process.  Here are the key tools that I think every development team needs to have (note that I have absolutely no ties whatsoever to any of the products or projects listed below; I just happen to use and like them):

A Version Control System

Any development project needs a version control system (I even keep my hobby projects in a revision control system), but once you have more than one person working on a code base, version control is required.  You cannot get by without it.  E-mailing files around or hosting them on FTP does not work; I know because I've tried.  No, you need a version control system that will track who-did-what-and-when and allow you to quickly go back to any version of any file at any time. 

There are too many version control systems to list, some commercial, some free, some open-source, some not.  I've used Visual SourceSafe, Team Foundation Server, CVS, and Subversion, and my favorite is definitely the free and open-source Subversion project.  You can get a repository up and running in no time flat with the VisualSVN Server package.  From there, I recommend using the (also free and open-source) TortoiseSVN client for interacting with the repository.

A Defect and Issue Tracking System

You need a way to track bugs that have been found, features that have been requested, and any other work that needs to be performed related to your software project.  It should be web-based so that everyone on the team can access it.  It should be lightweight enough to not interfere with people trying to do work, but it should be powerful enough to track what people are actually doing. 

For me, FogBUGZ is the way to go.  Joel pioneered the idea of simple defect tracking, and though FogBUGZ has grown more and more powerful with each release, it remains surprisingly simple to use.  The latest version includes time tracking, "evidence-based scheduling" to show you when you are really going to be ready to release that new version, a wiki, and of course, bug and feature tracking.

A Continuous Integration System

Do you have unit tests for your project?  Of course you do (if you don't, you need to correct that ASAP).  But how often are you running them?  If you're only running them when you think about it, or when you think you changed something that might impact the test, you are not getting the full benefit of unit testing.  In a complicated project, you don't know how far down a code change is going to trickle.  Best-case, a code change is going to break something downstream so badly that it won't even compile.  That kind of thing you will probably find when you build your project before releasing it.  Worst-case, a change will quietly break something downstream.  The code may still compile, but now the downstream code is no longer doing what it was intended to do.  Unless you run the unit tests for the downstream code, how will you know?  You need someone who is continuously checking the changes submitted to your version control system, verifying that they don't break anything that's already there.  Someone needs to verify that everything still compiles, and that all test cases still pass.  That can be a full-time job if you have a lot of code-churn.  But who wants that job?  Man, that would be terrible... constantly checking the version control system for new updates, downloading them, building everything, running all the unit tests, typing up reports about what's broken, mailing them to the appropriate people... I don't think many people would want that job.

That's where continuous integration systems come in.  A continuous integration system takes on the role of monitoring your project, ensuring that everything builds, and that all your tests pass.  When something goes wrong, it automatically notifies people so that corrective steps can be taken.  There aren't many good free options here (there are a lot of commercial solutions, but I haven't ever used one).  The best I've found is CruiseControl.NET.  It's easy to set up, very configurable, and very powerful.  At my present employer, we have an instance of CruiseControl.NET monitoring eight inter-related projects ranging from web applications to GUIs to backend Windows services.  It's not perfect, but if you are willing to invest a little bit of time in getting it up and running, it can save you a lot of time and pain.

That's really not a complete list.  Aside from those things, you need a reliable communication tool (GTalk and Gmail ftw), good developers (remember that writing code does not make you a developer), and a good development process.  I might try to blog about those last two things on Friday if I can't think of anything better to talk about.  Seriously, the stuff I'm working on now is 95% code reviews (which are interesting, but for the wrong reasons) and 5% mundane coding.  Anyway, are there any other critical components that you think are must-haves for a software development team?  If so, post 'em in the comments.



Peer Reviews

clock September 10, 2008 09:35 by author Matt

Peer reviews seem to be a source of seething hatred for many developers.  I can sort of understand why (no one likes rocks being pitched at something they created), but they can actually be very beneficial if done correctly.  This post will lay out the case for doing peer reviews as well as an approach to performing reviews that I happen to like (stolen from my friend, 'Poprythm').  If you have any suggestions for ways to improve this approach, please share in the comments!  Also, if you're not doing peer reviews, I'd like to hear why.

Purpose

Peer reviews may seem like a hassle, but they actually serve several purposes. First, they help identify and address deficiencies in the code that will become maintenance issues later. By finding them early (or at least earlier), the cost of fixing them is somewhat reduced. Identifying some deficiencies, such as code that is completely unreadable or that is unnecessarily complex, will help save time and money (and also sanity) down the line.

Another benefit of code reviews is that they can help spread knowledge. Both the person doing the review and the original author can learn new things via the process. The original author gets feedback on ways to improve their code, such as alternative algorithms, syntax shortcuts, and design improvements. The person doing the review also gets several of those same benefits (the author may have solved a problem elegantly in a way that the reviewer had never thought of), but they also gain experience in reading code. The ability to read and understand code, especially someone else's, is an underrated skill. Becoming proficient at it may help you to analyze and troubleshoot problems more easily in the future.

Performing the review

When performing the review, follow a few simple rules. First, do not make any code changes at all. When you spot something that needs to be addressed, place a comment in the code prefixed with "PEER:", and explain the issue with the following code block. For example:

    //PEER: This will cause a run-time exception!
    if (value == null)
    {
       logger.Log("Found a null value: " + value.Name);
    }

(Sadly yes, that is a real example I ran into from many years ago...)

Second, while no one likes a critic, it is important to identify any and all issues. If the author used a variable name that doesn't make sense (such as using 'l' for a logger instead of a more descriptive name), go ahead and note it. It may seem trivial on its own, but trivial issues add up.

Third, (obviously) try to be constructive with any criticism. If you identify a problem, try to explain why it is a problem and offer a couple of ways to address it.

Peer reviews are somewhat subjective, and everyone tends to look for slightly different things. However, here are the things you should absolutely be looking for:

  1. Failure to adhere to coding guidelines or best practices.
  2. Untested code. TestDriven.NET includes a code coverage analysis tool, so use it to examine the code in question. Note any areas that are not tested. While high test coverage doesn't necessarily equate to high-quality code, lack of coverage does indicate low-quality code.
  3. Tests that don't actually test anything (better known as smoke tests). Tests should not just make sure that something runs without throwing an exception, they should also validate the state of things after the method completes. This means they should be testing the return value (and its properties!) and verifying any other state changes the method made.
  4. Poorly designed code. This is somewhat subjective, but there are a few easy things to watch for, such as duplicated code (copy-and-paste is BAD), things that are difficult to test, and things that are coupled to a large number of other classes.
  5. Obvious inefficiencies. Are they using a List where a Dictionary or HashSet would be better?
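
To make item 5 concrete, here's a small sketch of the kind of thing I mean.  The scenario (filtering out IDs that have already been processed) and every name in it are made up for illustration; the point is only the difference between List.Contains and HashSet.Contains inside a loop.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // List<T>.Contains scans the list on every call, so this loop does
    // roughly candidates.Count * processed.Count comparisons.
    public static List<int> FilterWithList(List<int> candidates, List<int> processed)
    {
        List<int> result = new List<int>();
        foreach (int id in candidates)
        {
            if (!processed.Contains(id))
                result.Add(id);
        }
        return result;
    }

    // HashSet<T>.Contains is a hash lookup, so after building the set once
    // the loop does roughly one lookup per candidate.
    public static List<int> FilterWithHashSet(List<int> candidates, List<int> processed)
    {
        HashSet<int> seen = new HashSet<int>(processed);
        List<int> result = new List<int>();
        foreach (int id in candidates)
        {
            if (!seen.Contains(id))
                result.Add(id);
        }
        return result;
    }
}
```

Both methods return the same results; for a handful of items nobody will notice the difference, but for large collections the second version is the one worth suggesting in a review.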

Aside from that, suggest changes anywhere that you see improvements. Doing so will improve the code base (making life for everyone easier) and will help the original author to become a better developer.

Once you are finished with the review, go ahead and commit the changes to source control. The author can then review your comments by searching for "PEER:" in their code. As each issue is addressed, the author should remove the tag from the code. If there is a question about a comment, the author should ask the reviewer for clarification. If there is a disagreement (which will happen from time to time), a senior developer should be consulted.

If applied correctly, Peer Reviews can help you become a better developer, help save your company money, and make life easier for everyone in the future.  Don't forget, if you're not doing peer reviews, let me know why not in the comments.



The importance of testing

clock August 25, 2008 09:41 by author Matt

Grad school just started back for me today, so this is going to be a short post.  Hopefully I'll still be able to do three (hopefully high-quality) posts a week once things stabilize.

I ran into an issue today where one of the systems that I maintain at my day job had been rendered completely unusable by a series of code changes that had been committed over the last month.  The particular system in question is mostly a web forms app, so it's a bit difficult to unit test and therefore has very low test coverage.  Because of that, it falls to the developer making changes to thoroughly test things manually in most cases.  Today, it became evident that this unwritten rule of testing was no longer being followed AT ALL.  In one case, the code modifications sort-of worked.  It's conceivable that the developer making the changes tested them, but failed to test thoroughly enough to notice that he completely broke the vast majority of the existing functionality for a particular operation.  In another case, the code modifications didn't work.  Period.  If you tried to view the page, you were greeted with a NullReferenceException.  It didn't matter how you navigated to the page or what the environment was, there was no possible way the code was ever going to work.  That means the developer didn't even bother to load the modified page up in his browser.  

The result of these gaps in testing is a lot of pain and frustration for me personally, and wasted money for the company.  I had to roll back an entire month of new features and changes to get to a stable build.  That means our testers won't have as much time to test new features, and it means that someone now has to go back and repair all the things that were broken.  And it means that a 15 minute deployment took me three infuriating hours instead.

The moral of this story is: if you make a code change, no matter how seemingly minor, be DAMN sure you test it.  If you absolutely positively can't write a unit test for it, be sure you do very, very thorough manual testing.  Don't just fire-and-commit and assume it all works, because it probably doesn't.  As far as I'm concerned, if you can't prove that it works, it doesn't work, end of story.  If you can't test it, don't commit it.  Ask a more senior developer for advice instead.



How to run a software development company (INTO THE GROUND) - Part 3

clock August 22, 2008 07:32 by author Matt

I am really, really glad it is Friday.  It's been one of those really, really (not) awesome weeks.  Anyway, here's how you, too, can successfully take your software development ship and crash it into an iceberg, killing everyone on board!  In a change of pace, I'm going to start putting a "why you shouldn't do this" section at the end that explains things with a bit less sarcasm.

Never. Fire. Anyone.

Firing.  Even the word sounds bad.  As you pilot your software development company to guaranteed fortunes the likes of which haven't been seen since the dotcom bubble burst, you may run into a situation where someone doesn't seem to be a good fit.  Maybe they just don't jibe with the team.  Maybe they are clearly "incompetent".  Maybe they're sleeping at their desk for long stretches every single day.  Maybe it's someone that your developers warned you not to hire because they knew this person was toxic to a team, but in your brilliance, you hired them anyway because you saw right through that BS they were feeding you, and you knew that guy was gold.  Even after the guy talked smack about you behind your back to one of your friends less than 24 hours after you hired him, you still knew this person was going to help get you to the pot of gold at the end of the software development rainbow.

Anyway, so you have someone that is a problem, and they're apparently dragging the team down (or so everyone says).  Whatever you do, DO NOT LET THIS PERSON GO.  It isn't their fault that they can't stay awake in meetings, or that their code never compiles.  It's because the rest of the developers are not competent enough to keep up with this person.  You should promote this person instead!  That will show those pesky developers!

The moral of this story...

Not everyone is a good developer.  Not everyone is a good worker.  Hiring good developers is a very tricky process.  I've seen good developers interview terribly, and I've seen terrible developers that interviewed great.  That means that not everyone you hire is going to work out.  While it is important to give these people feedback and to try to help them adjust, there are going to be situations where the best thing for everyone (you, your team, and actually even the problematic individual) is to part ways.  You shouldn't be terrified of taking this step.  You shouldn't be terrified of confronting them and letting them know that what they're doing isn't up to par.  If you don't, they can't improve, and the environment suffers, and that makes things hard for everyone.



Using LINQ to elegantly initialize arrays

clock August 20, 2008 09:46 by author Matt

**CORRECTED 8/26/08: Apparently my initial code did not work correctly.  This appears to be a widespread mistake, as I found about a dozen other people doing the exact thing I was doing with reference types.  Corrected code and the non-working example are below.**

I am tired of writing array initialization code that looks like this:

    TermVector[] vectors = new TermVector[6];
    for (int i = 0; i < vectors.Length; i++)
    {
        vectors[i] = new TermVector();
    }

I couldn't believe that there wasn't a better way to handle this.  It turns out that there is: just use LINQ! Here's my first try (and the way most forum and blog posts recommend doing it):

    TermVector[] vectors = Enumerable.Repeat(new TermVector(), 6).ToArray();

Whoa, that's easy.  But there's a problem!  If you check the contents of the array, you'll notice that it is populated with 6 references to the exact same instance of TermVector!  That's not what I wanted.  That's not what I wanted at all!  Let's try again:

    TermVector[] vectors = Enumerable.Repeat(0, 6).Select(i => new TermVector()).ToArray();

If you inspect the array, you'll see that you now have references to 6 different instances instead of 6 references to the same instance.  Sweet!
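
If you'd rather check this in code than in the debugger, here's a small sketch that puts both approaches side by side.  The empty TermVector class is a stand-in for the real one (which isn't shown in this post), the method names are made up, and I've used Enumerable.Range in place of Enumerable.Repeat(0, 6) purely as a matter of taste; either one just supplies six items for Select to project.

```csharp
using System;
using System.Linq;

// Stand-in for the real TermVector class, which isn't shown in this post.
public class TermVector { }

public static class ArrayInitDemo
{
    // new TermVector() is evaluated once, before Repeat is called,
    // so every slot ends up holding the same reference.
    public static TermVector[] CreateShared(int count)
    {
        return Enumerable.Repeat(new TermVector(), count).ToArray();
    }

    // The lambda runs once per element, so each slot gets its own instance.
    public static TermVector[] CreateDistinct(int count)
    {
        return Enumerable.Range(0, count).Select(i => new TermVector()).ToArray();
    }
}
```

With these, object.ReferenceEquals(shared[0], shared[1]) is true for an array from CreateShared and false for one from CreateDistinct.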



About Matt

I am an overworked (and apparently overpaid) software developer with aspirations of acquiring a PhD in Computer Science. I started off coding in C over a decade ago.  Since then, I've migrated from C to C++ and branched out to C#, PHP, VB.NET, JavaScript, and worked with a wide assortment of other languages that I hope to never deal with again (I'm looking at you, COBOL). Oh, and yes, I've written some Java.  Does that make me a bad person?

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2009
