Apr 22 2011

Fluent XML Serialization–Part 3: How Serialization Is Performed

Category: FluentlyXML | Matt @ 00:57

Reflection is a powerful tool when used correctly.  In this penultimate chapter of Fluent XML Serialization, I will show you how Fluently-XML uses configuration data built up from a domain-specific language to perform serialization at runtime.  Keep in mind that the code is still a work in progress, but it does indeed meet most of my original requirements.  At the very least, it meets the business requirements it was built for. :)

Where we left off…

When we last visited Fluently-XML, we had taken configuration data from the domain-specific language (DSL) and stored it in configuration objects.  These objects were ‘compiled’ at runtime into type serializers and deserializers that know how to serialize and deserialize specific types.  Additional type serializers and deserializers are created on demand whenever Fluently-XML encounters a new type that wasn’t explicitly configured through the DSL.

So, we have configured Fluently-XML, and today we’ll look at an end-to-end example of actually using Fluently-XML.  We’ll start with configuring behaviors, look at how to obtain a serializer and convert an instance of a type to XML, and how to convert back from XML to an instance of the original type.

Configuring a Factory

Serialization (and deserialization) in Fluently-XML begins with a FluentSerializerFactory.  Through the factory’s constructor, you can directly specify how types should be serialized and deserialized.  However, the preferred approach is to put that configuration in dedicated serialization specification classes and have the factory load them, like so:

public static class Application
{
    public static void Main()
    {
        var factory = new FluentSerializerFactory(x =>
        {
            x.ApplyConfigFrom<BarSerialization>();
        });
    }
}

public class BarSerialization : FluentSerializationSpec
{
    /// <summary> </summary>
    public BarSerialization()
    {
        WhenSerializing<Bar>()
            .SerializeAllAncestorsAsThisType();

        WhenDeserializing<Bar>()
            .DetermineIdentityBy(r => r.BarId);
    }
}

Remember that you don’t actually need to specify how to serialize most types.  You only need to specify serialization behavior when you want to alter the default serialization or deserialization behavior. 

Serializing Objects

The type serializers introduced in the last post are not exposed to the outside world.  Instead, serialization operations are performed through the public IFluentSerializer interface, which can be obtained from a FluentSerializerFactory:

public static class Application
{
    public static void Main()
    {
        var factory = new FluentSerializerFactory(x =>
        {
            x.ApplyConfigFrom<BarSerialization>();
        });

        var serializer = factory.CreateSerializer();
        ...
    }
}

Instead of returning an XML string or writing to a stream, IFluentSerializer returns an XElement.  An XElement can be converted to a string representation of the XML by simply calling the XElement.ToString method.  It can also be saved to a stream or file using the various overloads of the XElement.Save method:

public static class Application
{
    public static void Main()
    {
        ...
        
        var serializer = factory.CreateSerializer();
        
        var originalBars = new[] { new Bar { BarId = 1 }, new Bar { BarId = 1 }, new Bar { BarId = 2 } };

        var xElement = serializer.Serialize(originalBars);
        
        //Get a string containing the serialized XML
        var xml = xElement.ToString();
        
        //Save the XML to a file
        xElement.Save("testfile.xml");
    }
}
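The XElement half of that listing can be tried without Fluently-XML at all.  Here is a minimal sketch that only uses System.Xml.Linq; the hand-built element is an assumption standing in for whatever Serialize would return, and the class and method names are made up for illustration:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class XElementDemo
{
    // Hand-built element; a stand-in for the result of a Serialize call.
    public static XElement BuildSample()
    {
        return new XElement("Bar", new XElement("BarId", 1));
    }

    // Saves the element to an in-memory stream and reads the text back.
    public static string SaveToString(XElement element)
    {
        using (var stream = new MemoryStream())
        {
            element.Save(stream);
            stream.Position = 0;
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        var element = BuildSample();

        // Indented XML, no XML declaration.
        Console.WriteLine(element.ToString());

        // Single-line XML: <Bar><BarId>1</BarId></Bar>
        Console.WriteLine(element.ToString(SaveOptions.DisableFormatting));

        // Save writes an XML declaration ahead of the element.
        Console.WriteLine(SaveToString(element));
    }
}
```

The practical difference is that ToString gives you just the element, while Save produces a complete document with a declaration, which is usually what you want on disk.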

An instance of IFluentSerializer is capable of serializing *any* type using the default conventions.  This is markedly different from the behavior of the BCL’s XmlSerializer class, which can only serialize the types it was constructed for.

Deserializing Types

Deserialization works in much the same way as serialization: simply request an IFluentDeserializer from the factory and pass in the XML to be deserialized:

public static class Application
{
    public static void Main()
    {
        var factory = new FluentSerializerFactory(x =>
        {
            x.ApplyConfigFrom<BarSerialization>();
        });

        var deserializer = factory.CreateDeserializer();

        var xml = File.ReadAllText("bars.xml");

        var deserializedBar = deserializer.Deserialize<Bar>(xml);
    }
}

Note: The current version of the library still uses a raw string for the XML instead of an XElement.  This will be changed in the near future.

The deserializer will perform deserialization according to whatever rules you’ve specified for the type being deserialized.  If no rules are configured for the type, it will simply use the default deserialization conventions.  Again, this is very different from the behavior of the BCL’s XmlSerializer. 

What’s happening internally?

Not much, actually.  When you ask Fluently-XML to serialize or deserialize a type, it looks for an ITypeSerializer or ITypeDeserializer respectively that was built when the configuration data was “compiled” (see the last post).  If no such serializer/deserializer exists, one is created and cached for reuse in the future. 
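To make the lookup-then-create-and-cache idea concrete, here is a rough sketch.  It is not the library’s code: ITypeSerializerSketch, DefaultTypeSerializer, SerializerRegistry and SampleItem are all hypothetical names, and the “default convention” (one child element per public property) is an assumption made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the library's internal type serializer interface.
public interface ITypeSerializerSketch
{
    bool CanSerialize(Type type);
    XElement Serialize(object obj);
}

// Assumed default convention: one child element per readable public property.
// Only suitable for simple, non-generic types whose names are valid XML names.
public class DefaultTypeSerializer : ITypeSerializerSketch
{
    private readonly Type _type;

    public DefaultTypeSerializer(Type type) { _type = type; }

    public bool CanSerialize(Type type) { return type == _type; }

    public XElement Serialize(object obj)
    {
        var children = _type.GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Select(p => new XElement(p.Name, p.GetValue(obj, null)));
        return new XElement(_type.Name, children);
    }
}

public class SerializerRegistry
{
    private readonly List<ITypeSerializerSketch> _serializers = new List<ITypeSerializerSketch>();

    public int Count { get { return _serializers.Count; } }

    // Reuse a serializer that claims the type; otherwise build a default one and keep it.
    public ITypeSerializerSketch GetSerializerFor(Type type)
    {
        var existing = _serializers.FirstOrDefault(s => s.CanSerialize(type));
        if (existing != null) return existing;

        var created = new DefaultTypeSerializer(type);
        _serializers.Add(created);
        return created;
    }
}

public class SampleItem
{
    public string Name { get; set; }
}

public static class RegistryDemo
{
    public static void Main()
    {
        var registry = new SerializerRegistry();
        var first = registry.GetSerializerFor(typeof(SampleItem));
        var second = registry.GetSerializerFor(typeof(SampleItem));

        Console.WriteLine(ReferenceEquals(first, second)); // True: created once, then reused
        Console.WriteLine(first.Serialize(new SampleItem { Name = "Test" })
            .ToString(SaveOptions.DisableFormatting));
    }
}
```

In this sketch, explicitly configured serializers would simply be added to the same list up front, so the lookup path is identical whether or not a type was configured.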

If you’re really curious, I recommend checking out the code.

What’s Next?

There’s no official release of Fluently-XML yet, but the project page is out there at fluentlyxml.codeplex.com.  My open-source focus remains on LiteGrid and getting its 1.0 release out the door.  After that, my efforts will turn towards polishing up Fluently-XML for its own 1.0 release.  For version 1.0, I’m planning to add better support for specifying and overriding the default conventions, and to create NuGet packages for both the library and a corresponding sample application.  While I’ve found performance to be acceptable for my use cases, some optimization is almost certainly needed to support others.  I plan to focus on performance improvements sometime after the 1.0 release.

In the meantime, please let me know what you think, and check out the code for more information.

Mar 20 2011

Fluent XML Serialization–Part 2: Compiling Serialization

Category: FluentlyXML | Matt @ 13:40

The design and implementation of Fluently-XML’s domain-specific language was covered in my last post.  The language builds up configuration information that must then be transformed into a form that’s useful for performing serialization and deserialization at runtime.  In this post, I’ll show you how that translation occurs.

The Configuration Data

As I illustrated in the previous post, Fluently-XML’s domain-specific language (DSL) does little more than translate calls into configuration data.  The most complicated translation is the DetermineIdentityBy operation, which is really just three lines of code:

/// <summary> </summary>
internal class TypeDeserializationSpec<T> : ITypeDeserializationSpec<T>
{
    private readonly IDeserializationConfig _config;

    public TypeDeserializationSpec(IDeserializationConfig config)
    {
        _config = config;
    }

    /// <summary> </summary>
    public ITypeDeserializationSpec<T> DetermineIdentityBy(Func<T, object> identitySelector)
    {
        //Selector must be converted for use with the Reflection-based core deserialization process. 
        Func<object, object> wrappedSelector = o => identitySelector((T)o);
        _config.SetIdentityFunction(typeof(T), wrappedSelector);
        return this;
    }
    ...
}

(For more about the “magic” that’s going on, see the last post.)

The underlying deserialization configuration data is stored in IDeserializationConfig.  There is a corresponding interface, ISerializationConfig, that exists for storing configuration information about performing object-to-XML serialization.  Here are both interfaces:

internal interface ISerializationConfig
{
    void SerializePropertyAsArray(Type target, PropertyInfo property);
    void SetIgnoredProperty(Type target, PropertyInfo property);
    void MarkTypeAsSealed(Type type);
    void SerializePropertyAsElement(Type target, PropertyInfo property, string elementName);
    void SerializePropertyAsElementUsing(Type target, PropertyInfo property, string elementName, Func<object, object> converter);
}

internal interface IDeserializationConfig
{
    void SetIdentityFunction(Type target, Func<object, object> selector);
    void SetPropertyDeserializer(Type target, PropertyInfo property, PropertyDeserializer propertyDeserializer);
    void AddPostDeserializationCallback(Type target, Action<object> callback);
}

Both interfaces are currently implemented by a single class, FluentSerializationConfig.  This class exists primarily to merge configuration from multiple sources.  This is important since you could potentially have multiple WhenSerializing<T> calls for a specific type, and in such cases the configuration data needs to be aggregated in order to produce the correct serializer at runtime.
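The merging described above boils down to keeping one settings object per type and having every call add to it.  The sketch below illustrates that idea only; TypeSettingsSketch and MergingConfigSketch are hypothetical names, and properties are identified by name rather than PropertyInfo to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-type settings; the real per-type config classes are not shown in this post.
public class TypeSettingsSketch
{
    public readonly HashSet<string> IgnoredProperties = new HashSet<string>();
    public readonly Dictionary<string, string> ElementNames = new Dictionary<string, string>();
}

public class MergingConfigSketch
{
    private readonly Dictionary<Type, TypeSettingsSketch> _byType =
        new Dictionary<Type, TypeSettingsSketch>();

    // Returns the existing settings for a type, creating them on first use.
    public TypeSettingsSketch SettingsFor(Type target)
    {
        TypeSettingsSketch settings;
        if (!_byType.TryGetValue(target, out settings))
        {
            settings = new TypeSettingsSketch();
            _byType[target] = settings;
        }
        return settings;
    }

    public void SetIgnoredProperty(Type target, string propertyName)
    {
        SettingsFor(target).IgnoredProperties.Add(propertyName);
    }

    public void SerializePropertyAsElement(Type target, string propertyName, string elementName)
    {
        SettingsFor(target).ElementNames[propertyName] = elementName;
    }
}

public static class MergeDemo
{
    public static void Main()
    {
        var config = new MergingConfigSketch();

        // Imagine these two calls coming from two different specs for the same type.
        config.SetIgnoredProperty(typeof(Uri), "Fragment");
        config.SerializePropertyAsElement(typeof(Uri), "Host", "host");

        var settings = config.SettingsFor(typeof(Uri));
        Console.WriteLine(settings.IgnoredProperties.Count); // 1
        Console.WriteLine(settings.ElementNames["Host"]);    // host
    }
}
```

Because both calls land in the same settings object, a later “compile” step sees the combined configuration no matter how many specs contributed to it.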

Building Up Serializers

Internally, FluentSerializationConfig builds up configuration for each type that requires either custom serialization or deserialization behavior.  This information is stored in TypeSerializationConfig and TypeDeserializationConfig respectively.  Prior to actually performing serialization or deserialization, Fluently-XML “compiles” FluentSerializationConfig into a new class, RuntimeSerializationConfig, that contains ITypeSerializer and ITypeDeserializer instances.  Here’s the high-level transformation:

internal class FluentSerializationConfig : ISerializationConfig, IDeserializationConfig
{
    ...
    
    public RuntimeSerializationConfig Compile()
    {
        var runtimeConfig = new RuntimeSerializationConfig();

        foreach (var serializerConfig in _typeSerializationConfig.Values)
        {
            var serializer = serializerConfig.BuildSerializer();
            runtimeConfig.AddSerializer(serializer);
        }

        foreach (var deserializerConfig in _typeDeserializationConfig.Values)
        {
            var deserializer = deserializerConfig.BuildDeserializer();
            runtimeConfig.AddDeserializer(deserializer);
        }

        return runtimeConfig;
    }

}

These instances are only created for types that were configured using the DSL.  However, Fluently-XML is still able to serialize and deserialize types that were not configured by dynamically creating new instances of ITypeSerializer or ITypeDeserializer on demand.

ITypeSerializer and ITypeDeserializer sport a pair of methods, one for determining if the instance applies to a type, and another to perform serialization/deserialization:

internal interface ITypeSerializer
{
    bool CanSerialize(Type type);
    XElement Serialize(Type type, object obj, ISerializationContext context);
}

internal interface ITypeDeserializer
{
    bool CanDeserialize(Type type);
    object Deserialize(Type target, XElement element, IDeserializationContext context);
    object GetIdentityFor(object obj);
}

(You’ll notice that ITypeDeserializer actually has an extra method, GetIdentityFor.  This is a bit of cruft that I plan to refactor; I just haven’t gotten around to it.  It exists to support the deserialization of complex object hierarchies in cases where the same object may appear at multiple locations within the object graph.)

These interfaces aren’t exposed externally.  Instead, Fluently-XML exposes IFluentSerializer and IFluentDeserializer, which are capable of serializing and deserializing any type, respectively. 

Performing Serialization and Deserialization

In this post, I gave you an overview of how configuration data built up by Fluently-XML’s DSL is converted into types that can perform serialization and deserialization at runtime.  I didn’t get into too many details, but I plan to go more in-depth in the future once I’ve covered the high-level design.  In the next post, I’ll show you the classes that implement IFluentSerializer and IFluentDeserializer and how they leverage the types that are “compiled” from the DSL’s configuration data.

Questions or comments?  Please comment below or feel free to contact me through my site!

Mar 13 2011

Fluent XML Serialization–Part 1: The Domain Specific Language

Category: FluentlyXML | Matt @ 13:10

Serializing objects to XML is easy in .NET thanks to the XmlSerializer class, but developers will quickly find that the built-in serializer is limited and not easy to extend.  A more flexible approach is needed to support complex serialization needs. Today I’ll show you Fluently-XML’s domain-specific language for configuring serialization behaviors, and I’ll dive (a bit) into how it’s implemented.

Warning: I confess that I am a newb at creating domain-specific languages (DSLs).  I may very well have approached the design and implementation of the DSL in completely the wrong way.  If so, please feel free to enlighten me in the comments. :)

Crafting the DSL

The Fluently-XML project was born out of a real, concrete business need. As such, I had plenty of requirements to help drive the design of the DSL.  I started from the top-down and worked as if the DSL already existed, taking an object that needed special serialization behavior and writing code that specified the desired behavior, and I created tests to verify the object was serialized correctly.  Here’s an example of using the DSL to control how Fluently-XML tracks object identity:

public class Bar
{
    public int BarId { get; set; }
    public int CustomId { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
    public string Label { get; set; }
}

public class BarSerialization : FluentSerializationSpec
{
    /// <summary> </summary>
    public BarSerialization()
    {
        WhenSerializing<Bar>()
            .SerializeAllAncestorsAsThisType();

        WhenDeserializing<Bar>()
            .DetermineIdentityBy(b => b.BarId);
    }
}

And here’s a corresponding test case:

[TestFixture]
public class When_deserializing_bar_array
{
    ...

    [SetUp]
    public void When()
    {
        var factory = new FluentSerializerFactory(x =>
        {
            x.ApplyConfigFrom<BarSerialization>();
        });

        _serializer = factory.CreateSerializer();
        _deserializer = factory.CreateDeserializer();

        _originalBars = new[] { new Bar { BarId = 1 }, new Bar { BarId = 1 }, new Bar { BarId = 2 } };

        _xml = _serializer.Serialize(_originalBars).ToString();

        _deserializedBars = _deserializer.Deserialize<Bar[]>(_xml);
    }

    ...

    [Test]
    public void Deserializing_bars_does_not_create_duplicates()
    {
        Assert.That(_deserializedBars[0], Is.SameAs(_deserializedBars[1]));
    }
}        

Approaching the design of the DSL in a top-down way helped guide me towards a language that was naturally suited to solving my particular problem. 

Implementing the DSL – Separating Specification From Execution through CQRS

I approached the implementation of the DSL in a test-driven development manner: I started with a failing test, wrote code that didn’t compile using a non-existent DSL, and began fleshing things out until I had compiling code and a passing test. Early on I decided that the DSL would build up configuration in a way that kept the DSL-related classes simple, and that I’d “compile” that configuration into a form that was better suited to the serialization/deserialization core logic. 

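[Diagram: DSL statements build up configuration, which is “compiled” for the serialization/deserialization core]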

This proved to be a very wise decision, as it kept the implementation very simple and easy to work with.  This approach effectively separated the DSL’s “commands” (the “do this when serializing this type”) from the core serialization framework’s “queries” (the “how do I serialize this property on this type?”).  If this sounds familiar, it’s probably because you are at least somewhat familiar with Command-Query Responsibility Segregation (CQRS).  This design was indeed inspired by CQRS principles as well as ideas I picked up from Jeremy Miller’s blog over the years.

Here’s an example of the DSL implementation.  In fact, this is actually one of the most complicated bits in the entire DSL API:

/// <summary> </summary>
internal class TypeDeserializationSpec<T> : ITypeDeserializationSpec<T>
{
    private readonly IDeserializationConfig _config;

    public TypeDeserializationSpec(IDeserializationConfig config)
    {
        _config = config;
    }

    /// <summary> </summary>
    public ITypeDeserializationSpec<T> DetermineIdentityBy(Func<T, object> identitySelector)
    {
        //Selector must be converted for use with the Reflection-based core deserialization process. 
        Func<object, object> wrappedSelector = o => identitySelector((T)o);
        _config.SetIdentityFunction(typeof(T), wrappedSelector);
        return this;
    }
    ...
}

I’m not exaggerating, this one method is probably the most complicated thing in the entire DSL.  The tricky bit there is the conversion from a generic Func<T,object> to a non-generic Func<object,object>.  That’s actually the solution to a problem that took me quite a while to figure out.  The DSL has to be generic in order to provide a nice, strongly-typed developer experience.  However, at run-time, the serialization framework works with the objects to be serialized and deserialized in a non-generic way, meaning it references everything as if it is a System.Object.  The core serialization classes cannot be generic because they need to access properties, methods, etc. at runtime that can’t even be guessed about at compile time.  I struggled with that problem for quite a while before finding an elegant solution.  I’m actually quite proud of that one line of code, even if the solution is trivially simple. :)
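If you want to see the Func<T, object> to Func<object, object> trick in isolation, here is a small self-contained sketch.  SelectorWrapping and SampleBar are made-up names for this example; only the one-line wrapping lambda mirrors the method above.

```csharp
using System;
using System.Collections.Generic;

public class SampleBar
{
    public int BarId { get; set; }
}

public static class SelectorWrapping
{
    // Wraps a strongly-typed selector so it can be stored and invoked without knowing T.
    public static Func<object, object> Wrap<T>(Func<T, object> selector)
    {
        return o => selector((T)o);
    }

    public static void Main()
    {
        // Non-generic storage, keyed by type, as a reflection-based core might use.
        var identityFunctions = new Dictionary<Type, Func<object, object>>();
        identityFunctions[typeof(SampleBar)] = Wrap<SampleBar>(b => b.BarId);

        // The caller only has an object and its runtime type.
        object instance = new SampleBar { BarId = 7 };
        object identity = identityFunctions[instance.GetType()](instance);

        Console.WriteLine(identity); // 7
    }
}
```

The cast inside the lambda is the only place T is needed at runtime, so the generic type parameter never leaks into the storage or the caller.  Passing an object of the wrong type would throw an InvalidCastException, which is why the dictionary is keyed by the type the selector was written for.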

Ignoring that, the important piece here is how simple the actual DSL implementation is: it’s really just a wrapper that provides a nice way to build up some configuration data that will be “compiled” into serialization/deserialization objects at runtime.  I’ll talk more about how the config is handled in the next post.

Improving the DSL – Using Interfaces To Limit Methods

One of the goals I had for the DSL was to avoid the “AutoMapper problem.”  While I do love AutoMapper, I really don’t care for the DSL it uses to specify custom mapping behavior.  It relies on nested lambdas that can look quite ugly:

Mapper.CreateMap<Widget, WidgetViewModel>()
    .ForMember(dest => dest.OwnerName, opt => opt.MapFrom(src => src.TheOwner.Name));

I wanted to avoid this nesting of lambdas.  I accomplished this by having each “token” or operation in the DSL return an interface that exposed only the tokens that were valid at that point.  Here’s the Intellisense list you are presented with immediately after a “WhenSerializing” statement:

[Screenshot: Intellisense list shown immediately after a “WhenSerializing” statement]

And again after specifying which property the statement applies to:

[Screenshot: Intellisense list shown after specifying a property]

At this point, I can either specify additional options on the property, or I can select a different property, so the Intellisense gives me both options:

[Screenshot: Intellisense list offering both additional property options and property selection]

Note that the “Using” token no longer appears in the list.  “Using” is actually a poorly-named method (which I will fix eventually), but its purpose is to allow you to completely override how Fluently-XML serializes a type.  As such, the token is only valid as the first (and only) statement for a particular type.  Once you’ve specified custom serialization behavior for a specific property, it no longer makes sense to completely override how serialization is to be performed.  There are other tokens I need to perform similar filtering on, such as “IgnoreAllProperties”, and I should really filter out methods inherited from Object as well, such as ToString… but I digress.

You might be thinking that all this mess with interfaces must make the DSL’s implementation a nightmare, but it actually doesn’t.  There are only two actual classes in the DSL for serialization: one for class-level statements, and another for property-level statements.  Each implements all the methods for all the applicable class/property interfaces.  It’s only through the return types of each method that the visibility of tokens is controlled.  This keeps the implementation quite simple while still providing a clean, filtered API for specifying serialization behavior.
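Here is a stripped-down sketch of that pattern: two classes, several interfaces, and return types that decide which tokens are visible next.  None of these names come from Fluently-XML; the interfaces, the string-based property selection and the Recorded list are all simplifications invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Tokens valid at any point in a type-level statement.
public interface ITypeSpec
{
    IPropertySpecStart Serialize(string propertyName);
}

// Tokens valid only at the very start: adds the "override everything" option.
public interface ITypeSpecStart : ITypeSpec
{
    void Using(Func<object, string> customSerializer);
}

// Tokens valid right after selecting a property.
public interface IPropertySpecStart
{
    IPropertySpec AsElement(string elementName);
}

// After configuring a property you may pick another property, but "Using" is gone.
public interface IPropertySpec : ITypeSpec
{
}

// One class for all type-level statements.
public class TypeSpecSketch : ITypeSpecStart
{
    public readonly List<string> Recorded = new List<string>();

    public IPropertySpecStart Serialize(string propertyName)
    {
        return new PropertySpecSketch(this, propertyName);
    }

    public void Using(Func<object, string> customSerializer)
    {
        Recorded.Add("custom serializer");
    }
}

// One class for all property-level statements.
public class PropertySpecSketch : IPropertySpecStart, IPropertySpec
{
    private readonly TypeSpecSketch _owner;
    private readonly string _propertyName;

    public PropertySpecSketch(TypeSpecSketch owner, string propertyName)
    {
        _owner = owner;
        _propertyName = propertyName;
    }

    public IPropertySpec AsElement(string elementName)
    {
        _owner.Recorded.Add(_propertyName + " -> <" + elementName + ">");
        return this;
    }

    public IPropertySpecStart Serialize(string propertyName)
    {
        return _owner.Serialize(propertyName);
    }
}

public static class FluentDemo
{
    public static void Main()
    {
        var spec = new TypeSpecSketch();
        ITypeSpecStart start = spec;

        start.Serialize("Name").AsElement("name")
             .Serialize("Label").AsElement("label");

        // start.Serialize("Name").AsElement("name").Using(...) would not compile:
        // IPropertySpec does not expose Using.

        foreach (var line in spec.Recorded) Console.WriteLine(line);
    }
}
```

The classes themselves implement everything; it is only the declared return type of each method that narrows what Intellisense (and the compiler) will let you chain next.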

There are advantages to AutoMapper’s DSL, namely that an incomplete statement in the DSL won’t even compile.  This example wouldn’t even compile as it’s not valid code:

Mapper.CreateMap<Widget, WidgetViewModel>()
    .ForMember(dest => dest.OwnerName);

However, an incomplete statement in Fluently-XML’s DSL will not generate a compile-time error:

WhenSerializing<Widget>()
    //Missing a token after property selection!
    .Serialize(w => w.Name);

Indeed it won’t even generate a runtime error.  I have a couple of vague ideas about how to solve this problem, but I doubt the problem is going to occur often enough to be worth solving.  In the end, I much prefer Fluently-XML’s approach over AutoMapper’s. 

Coming Up Next…

I hope this post has shed some light on the design of Fluently-XML’s DSL as well as given you some insight into the tradeoffs one must make when building a DSL.  In the next post, I’ll show you how configuration built up from the DSL is converted into objects to perform serialization at runtime.

Mar 6 2011

Fluent XML Serialization–Introduction

Category: FluentlyXML | Matt @ 09:14

The System.Xml.XmlSerializer class enables .NET applications to serialize/deserialize most types to and from XML using only a few lines of code.  This is a great capability and provides an easy API for simple persistence and interoperability scenarios.  As a developer, you have some degree of control over the XML that’s generated, but the process is mostly rigid and not easy to extend or customize.  There are also numerous “gotchas” around XML serialization, such as the inability to serialize IDictionary types, the inability to serialize and deserialize interfaces, and no support for the concept of “identity” when deserializing object graphs.  Usually one can find a way around these limitations, but on a recent project I found that the pain of working around them was too great to bear.  Out of that pain was born a new flexible XML serialization framework that overcomes the limitations of the XmlSerializer class.  Read on to find out more.

XmlSerializer’s Abilities

XmlSerializer is a useful class that all .NET developers should be at least somewhat familiar with.  Using it, you can easily transform most types to XML and back again, as this example from MSDN illustrates:

private void SerializeObject(string filename)
{
   Console.WriteLine("Writing With Stream");

   XmlSerializer serializer = 
   new XmlSerializer(typeof(OrderedItem));
   OrderedItem i = new OrderedItem();
   i.ItemName = "Widget";
   i.Description = "Regular Widget";
   i.Quantity = 10;
   i.UnitPrice = (decimal) 2.30;
   i.Calculate();

   // Create a FileStream to write with.
   Stream writer = new FileStream(filename, FileMode.Create);
   // Serialize the object, and close the TextWriter
   serializer.Serialize(writer, i);
   writer.Close();
}

You can convert almost any object to XML with a single line of code thanks to this extension method:

public static class XmlExtensions
{
    public static string ToXml(this object obj)
    {
        var serializer = new XmlSerializer(obj.GetType());

        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, obj);
            return writer.ToString();
        }
    }

    //Usage: string xml = myObject.ToXml();
}

This is great for simple persistence scenarios, for making simple configuration settings files, and for providing interoperability with other systems.  It is also very performant.  The first time you construct a new XmlSerializer for a type, a custom serializer is emitted that can process the type without using reflection each time.  This custom serializer is cached for the life of the application domain, and you can actually generate a serialization assembly if you want to avoid the performance hit the first time you serialize a new type each time your app runs.

XmlSerializer’s Inabilities

XmlSerializer is really quite useful for basic scenarios.  However, like many things in the .NET BCL, XmlSerializer was not built with extensibility in mind.  To customize how it serializes a type, you have two options: use XML attributes or implement the IXmlSerializable interface and write your own serialization logic.  If you try to use attributes to customize the behavior of XmlSerializer, you’ll quickly find that your control is very limited.  You can ignore or rename properties and perform other such simple transformations, but that’s about it.  The attribute-based approach also requires that you dirty up your objects with attributes, which some people consider a violation of the Single Responsibility Principle.  If you want to customize something that isn’t supported by the very limited set of attributes, you’re out of luck unless you want to implement IXmlSerializable, and at that point you’re basically on your own for serializing your type. 

Another weakness of XmlSerializer is that it has no concept of object identity.  When deserializing an object graph, XmlSerializer will create a new instance for each object in the graph. It has no way of knowing that an object might appear in multiple locations in the XML. 
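You can see this for yourself with a few lines of plain XmlSerializer code.  The Bar class below is a cut-down stand-in and IdentityDemo is just a name for the example; everything else is the standard BCL API.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Bar
{
    public int BarId { get; set; }
}

public static class IdentityDemo
{
    // Serializes the array to a string and immediately deserializes it again.
    public static Bar[] RoundTrip(Bar[] bars)
    {
        var serializer = new XmlSerializer(typeof(Bar[]));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, bars);
            using (var reader = new StringReader(writer.ToString()))
            {
                return (Bar[])serializer.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        var shared = new Bar { BarId = 1 };

        // The same instance appears twice in the original array.
        var result = RoundTrip(new[] { shared, shared });

        Console.WriteLine(result[0].BarId == result[1].BarId);     // True: same data
        Console.WriteLine(ReferenceEquals(result[0], result[1]));  // False: two separate instances
    }
}
```

The XML simply contains two identical Bar elements, so the deserializer has nothing to tell it they were ever the same object.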

XmlSerializer also doesn’t support serializing object graphs that contain cycles.  You can use the XmlIgnore attribute to ignore properties that cause cycles, but that property will also be ignored when deserializing the object graph, which means you’ll have to manually rebuild properties in the object graph after XmlSerializer finishes.
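The cycle limitation is just as easy to reproduce.  In this sketch, Node and CycleDemo are hypothetical names; in my experience the failure surfaces as an InvalidOperationException from Serialize, with the circular-reference message in the inner exception.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Node
{
    public string Name { get; set; }
    public Node Parent { get; set; }
    public List<Node> Children { get; set; }

    public Node()
    {
        Children = new List<Node>();
    }
}

public static class CycleDemo
{
    // Returns true and the XML on success, false if XmlSerializer rejects the graph.
    public static bool TrySerialize(Node root, out string xml)
    {
        var serializer = new XmlSerializer(typeof(Node));
        using (var writer = new StringWriter())
        {
            try
            {
                serializer.Serialize(writer, root);
                xml = writer.ToString();
                return true;
            }
            catch (InvalidOperationException)
            {
                xml = null;
                return false;
            }
        }
    }

    public static void Main()
    {
        var parent = new Node { Name = "Parent" };
        var child = new Node { Name = "Child" };
        parent.Children.Add(child);

        string xml;
        Console.WriteLine(TrySerialize(parent, out xml)); // True: a plain tree is fine

        // Adding the back-reference creates a cycle: parent -> child -> parent.
        child.Parent = parent;
        Console.WriteLine(TrySerialize(parent, out xml)); // False: the cycle is rejected
    }
}
```

Marking Parent with XmlIgnore makes the second call succeed, but as noted above, you then have to re-link every child to its parent yourself after deserializing.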

Introducing Fluently-XML

One of the many projects I’m working on is a cost modeling system known as InGauge.  InGauge uses XML to provide interoperability with other cost modeling tools.  We’re dealing with very complex object graphs that must be correctly serialized and deserialized in order to maintain the integrity of the data as it passes from one system to another.  Our team found that the limitations of XmlSerializer proved to be too painful to work around, so we created a custom XML serialization framework that gave us the control and flexibility we needed.  Fluently-XML (that’s what I’m calling this framework) provides the same basic serialization/deserialization capabilities right out of the box as .NET’s XmlSerializer, but it also sports a fluent domain specific language (DSL) that can be used to customize the serialization and/or deserialization process for any type.  It gives us complete control over how our object graphs are handled in a lightweight manner.  Here are just a couple of things it currently provides beyond XmlSerializer:

Object Identity

[Test]
public void Deserialization_respects_object_identity()
{
    var bar1 = new Bar { BarId = 1 };
    var bar2 = new Bar { BarId = 2 };
    
    var bars = new[] { bar1, bar2, bar1 };
    
    string xml = _serializer.Serialize(bars).ToString();
    
    var deserializedBars = _deserializer.Deserialize<Bar[]>(xml);
    
    //This passes!
    Assert.That(deserializedBars[0], Is.SameAs(deserializedBars[2]));
}

Serializing Proxies

[Test]
public void Proxied_types_can_be_serialized_and_deserialized()
{
    var generator = new ProxyGenerator();
    var bar = (Bar)generator.CreateClassProxy(typeof(Bar));
    bar.Name = "Test!";

    string xml = _serializer.Serialize(bar).ToString();
    
    //Even though it's a proxied type, it still gets serialized 
    //as the correct underlying type!
    Assert.That(xml, Is.StringContaining("<Bar>"));
    
    var deserializedBar = _deserializer.Deserialize<Bar>(xml);
    
    //We can deserialize the XML back to a normal bar!
    Assert.That(deserializedBar.Name, Is.EqualTo("Test!"));
}

Cycles and Parent/Child Relationships

[Test]
public void Cycles_and_parent_child_relationships_are_supported()
{
    var parent = new Foo { ID=1, Name = "Parent" };
    var child = new Foo { ID=2, Name = "Child" };
    parent.Children.Add(child);
    child.Parent = parent;

    string xml = _serializer.Serialize(parent).ToString();
    
    //The parent is serialized by its ID!
    Assert.That(xml, Is.StringContaining("<Parent>1</Parent>"));
    
    var deserializedParent = _deserializer.Deserialize<Foo>(xml);
    
    //Even though it was serialized as a single integer, 
    //the Parent property is still deserialized correctly!
    Assert.That(deserializedParent.Children[0].Parent, Is.SameAs(deserializedParent));
}

Coming up next…

This is the first of hopefully many posts on this framework.  We’ll start in the next post by looking at the fluent DSL that can be used to customize the serialization/deserialization process and how it’s used to build up custom serialization behavior at runtime. 
