ML.NET in Jupyter Notebooks
At MS Ignite 2019 in Orlando, Microsoft reached a major milestone in the development of ML.NET, its open-source, cross-platform machine learning library. With the release of a long-awaited C# kernel, ML.NET can now run in Jupyter Notebooks.
So, what is Jupyter Notebook, and why is this significant? Jupyter Notebook is an open-source web application that is heavily utilized by the data science community. A notebook consists of multiple cells, which can either contain interactive code or text, with the code being executed on a separate kernel.
What really makes Jupyter Notebooks so appealing is that the code is very easy to share (it’s all contained in a .ipynb file). A notebook can contain interactive plots and charts to describe the data, and each cell can be executed in isolation. With the support of Jupyter Notebooks, we’re able to use ML.NET for the full machine learning workflow, and not only to train our model.
Alright, enough talking, let’s see some code! To get started, we need to set up our local environment:
Install Jupyter - There are many ways to install Jupyter, but the easiest way is to download and install Anaconda.
Install dotnet try - The C# kernel is based on the dotnet try tool. To install it, open either the command prompt or PowerShell and run the following command: dotnet tool install -g dotnet-try
Install the .NET Jupyter Kernel - To connect Jupyter with the dotnet try tool, run the following command in a command prompt or PowerShell to install the .NET kernel: dotnet try jupyter install
To start Jupyter Notebook, open Anaconda and click on Jupyter Notebook.
It’s always easier to learn something new using examples. To that effect, I’ve created a small GitHub repository with a couple of sample notebooks for you to explore at your convenience.
Let's walk through a multi-class classification example to get familiar with the structure of a notebook.
Declare package dependencies
Just like in regular C# code, we are able to import and consume third-party NuGet packages.
To do so, we reference the package with the #r directive, for example #r "nuget:Microsoft.ML".
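As a rough sketch, the first cell of a notebook might pull in ML.NET like this. The pinned version is only an illustration (1.4.0 was the current release around Ignite 2019); you can change it or leave it out.

```csharp
#r "nuget:Microsoft.ML,1.4.0"

// The version above is an assumption for illustration; adjust as needed.
// Bring the commonly used ML.NET namespaces into scope for this cell.
using Microsoft.ML;
using Microsoft.ML.Data;
```

Once this cell has run, the package is available to the rest of the notebook.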
Declare data types
A cell can contain either a class or a method. Once a cell has been executed, the types and methods it defines can be referenced and used by other cells in the notebook.
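To make this concrete, here is a minimal sketch of what such a cell could look like for a multi-class problem. The class names, property names, and column layout are hypothetical (loosely modeled on an Iris-style dataset) and are not taken from the sample repository.

```csharp
#r "nuget:Microsoft.ML,1.4.0"

using Microsoft.ML.Data;

// Hypothetical input schema: four numeric features plus a text label.
// LoadColumn maps each property to a column index in the source file.
public class IrisData
{
    [LoadColumn(0)] public float SepalLength { get; set; }
    [LoadColumn(1)] public float SepalWidth { get; set; }
    [LoadColumn(2)] public float PetalLength { get; set; }
    [LoadColumn(3)] public float PetalWidth { get; set; }
    [LoadColumn(4)] public string Label { get; set; }
}

// Hypothetical output schema: the predicted class and one score per class.
public class IrisPrediction
{
    [ColumnName("PredictedLabel")] public string PredictedLabel { get; set; }
    public float[] Score { get; set; }
}
```

After running the cell, later cells can use IrisData and IrisPrediction just like types defined in a regular project.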
One of the key benefits of using Jupyter Notebooks with ML.NET is its support for data exploration and plotting. In order to build a robust machine learning model, we need to know our data inside and out. In Jupyter, we can take a look at the data by dumping it to a table.
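A small sketch of the idea, using made-up in-memory rows rather than a real dataset. The Flower type and its values are hypothetical; display is the helper the .NET kernel exposes to C# cells, and the exact look of the rendered table may vary between kernel versions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; its public properties should show up as table columns.
public class Flower
{
    public float SepalLength { get; set; }
    public float PetalLength { get; set; }
    public string Label { get; set; }
}

// Made-up sample rows, only for illustration.
var rows = new List<Flower>
{
    new Flower { SepalLength = 5.1f, PetalLength = 1.4f, Label = "setosa" },
    new Flower { SepalLength = 7.0f, PetalLength = 4.7f, Label = "versicolor" },
    new Flower { SepalLength = 6.3f, PetalLength = 6.0f, Label = "virginica" },
};

// Render the first few rows as a table in the cell output.
display(rows.Take(3));
```

With real data, the same approach works on rows read from an ML.NET IDataView, for example via mlContext.Data.CreateEnumerable.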
We can also use third-party libraries such as XPlot.Plotly to create plots.
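For example, a histogram of a single feature could be sketched as follows. The values are made up, and the API names (Chart.Plot, Graph.Histogram) are the ones I know from XPlot.Plotly 3.x, so I've pinned that version as an assumption; newer releases of the library and the kernel may need small changes.

```csharp
#r "nuget:XPlot.Plotly,3.0.1"

using XPlot.Plotly;

// Made-up sepal lengths, only to have something to plot.
var sepalLengths = new[] { 5.1f, 4.9f, 7.0f, 6.4f, 6.3f, 5.8f };

// Build a histogram trace and wrap it in a chart.
var chart = Chart.Plot(new Graph.Histogram { x = sepalLengths });
chart.WithTitle("Sepal length distribution");

// Render the chart inline in the notebook.
display(chart);
```

The resulting plot is interactive, so you can hover over the bars and zoom in directly in the cell output.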
With the introduction of Jupyter support, ML.NET takes a big step in the right direction. There are still plenty of bits and pieces to improve, e.g. support for native C# IntelliSense, but I'm positive that the ML.NET team is on the right track.
I hope you've found this post useful. Should you have any questions, or just want to chat ML.NET, you can always find me on Twitter at @alexslotte!