Apr 10 2009

A (much) more informed view of Infer.NET

Category: MachineLearning | Matt @ 08:44

In response to my naive impression of Infer.NET, John Winn has supplied a wealth of insight into its design and implementation:

Many thanks for taking the time to look at Infer.NET and post your comments. As one of the researchers behind Infer.NET, I would like to explain a bit more about the project which may make some of the design choices more understandable. Software packages like Weka are great for loading, visualizing and applying existing machine learning algorithms to data. In these packages, the machine learning algorithms are 'black boxes' - you supply them with data in the form they expect and they perform clustering/regression/classification or whatever. In contrast, Infer.NET is a tool for making new algorithms, customized to solve particular problems that don't fit neatly into one of the existing forms. For example, a recent problem we looked at involves ranking a number of papers submitted to a conference where each paper has a certain number of reviewers and reviewers have biases - so you need to learn both the paper's quality and the reviewer's bias simultaneously. This problem does not fit into one of the above categories, but can be represented as a Bayesian model and solved in Infer.NET. You can also solve standard clustering, regression and classification problems by constructing the appropriate model - for example, clustering is achieved by creating a mixture using a Switch expression, such as in this example. But because Infer.NET opens up the black box you can then take your model and tailor it to your problem (e.g. these two classification results should be similar, some of my data is labeled and I'd like the clustering to reflect that, I know that this output must lie between these other two outputs). This ability to create an algorithm exactly matched to the problem you are trying to solve turns out to be very valuable in many application domains, since you can include rich domain knowledge in the model. We also have many examples of problems which simply cannot be solved without this ability.

To handle a large range of models efficiently, the Infer.NET engine is structured as a compiler - it takes a model definition and compiles it into C# code for solving the model. This allows Infer.NET to be efficient enough to apply to huge datasets (e.g. billions of datapoints). Infer.NET can execute the generated code for you or you can take the code and embed it in a .NET application of your own. The magic string name you mention is the name of the variable in the generated code - it is entirely optional - if you don't supply it then the system will generate a name for you, but the code will be less readable. The Variable.If construct does require you to define your model in a single thread - however the generated code *is* fully thread safe and can be executed either multi-threaded or fully distributed on a cluster. One alternative to the admittedly ugly Variable.If construct would be to have a separate text file describing the model (like in BUGS) and you could load the model from the file. However, this would prevent dynamic construction of the model in code and would require users to learn a new modeling language, whereas the slightly less elegant API version allows dynamic construction and can be called from any .NET language. Another alternative we considered is using LINQ expressions to specify the model, but these do not support statements (such as if, switch) at the moment. These are planned for future versions of LINQ, at which point we will look again at this option.

In the project so far, we have concentrated our effort on the core inference engine - making it so that it can be applied to a large range of models, making inference fast and validating the engine on a wide range of problems in different domains. We realize that, in its current form, the compiler is less approachable than GUI-based inference software, and this is something we are looking to address in future. However, if you have a problem which is not in a form that is solvable by a standard black-box machine learning algorithm then Infer.NET has a lot to offer.

Best wishes,

John Winn
Microsoft Research Cambridge

So, I think I missed the point of what Infer.NET provides: it’s not a framework of machine learning techniques (like Weka); it is a framework for building machine learning algorithms. A lot of the syntax nastiness makes a lot more sense now, too. It’s probably still beyond the reach of many developers, but those experienced in probability and machine learning should be able to build powerful models using Infer.NET. Thanks for the response, John!
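
To make the paper-reviewing example from John's letter a little more concrete, here is a rough sketch of the kind of problem he describes. It is not Infer.NET code and it is not Bayesian inference: it is plain Python using alternating least squares, swapped in only to show why paper quality and reviewer bias have to be learned together. The additive form (score = quality + bias + noise), the function name `fit_quality_and_bias`, and the toy data are all my own assumptions, not anything taken from the Infer.NET team's actual model.

```python
def fit_quality_and_bias(scores, n_papers, n_reviewers, sweeps=200):
    """Estimate paper quality and reviewer bias from review scores.

    scores is a list of (paper_index, reviewer_index, score) triples.
    Assumed (hypothetical) model: score = quality[paper] + bias[reviewer] + noise.
    Every paper and every reviewer is assumed to appear in at least one triple.
    """
    quality = [0.0] * n_papers
    bias = [0.0] * n_reviewers
    for _ in range(sweeps):
        # Re-estimate each paper's quality with the reviewer biases held fixed.
        totals = [0.0] * n_papers
        counts = [0] * n_papers
        for p, r, s in scores:
            totals[p] += s - bias[r]
            counts[p] += 1
        quality = [t / c if c else 0.0 for t, c in zip(totals, counts)]

        # Re-estimate each reviewer's bias with the paper qualities held fixed.
        totals = [0.0] * n_reviewers
        counts = [0] * n_reviewers
        for p, r, s in scores:
            totals[r] += s - quality[p]
            counts[r] += 1
        bias = [t / c if c else 0.0 for t, c in zip(totals, counts)]

        # Adding a constant to every quality and subtracting it from every
        # bias fits the data equally well, so pin the biases to mean zero.
        shift = sum(bias) / n_reviewers
        bias = [b - shift for b in bias]
        quality = [q + shift for q in quality]
    return quality, bias


if __name__ == "__main__":
    # Toy data: 3 papers, 3 reviewers, each paper seen by two reviewers.
    toy_scores = [
        (0, 0, 6.0), (0, 1, 7.5),
        (1, 1, 5.0), (1, 2, 4.0),
        (2, 2, 8.0), (2, 0, 8.5),
    ]
    quality, bias = fit_quality_and_bias(toy_scores, n_papers=3, n_reviewers=3)
    print("quality:", [round(q, 2) for q in quality])
    print("bias:", [round(b, 2) for b in bias])
```

Averaging each paper's raw scores would credit a paper for drawing generous reviewers; estimating both quantities at once avoids that. What this sketch cannot do, and what a Bayesian model in Infer.NET is meant to provide, is report uncertainty about each estimate or absorb extra domain knowledge such as priors on reviewer behaviour.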
