This walkthrough is also available as a Jupyter ipynb Notebook - you can run yourself

Vega-Lite is a charting library that provides a great deal of flexibility while also allowing for very simple use cases.

The Vega-Lite JavaScript API provides a convenient way to write Vega-Lite specifications in a programmatic fashion.

(see @vega/vega-lite-api-v5 on observablehq for a great introduction)
(see vega-lite documentation for deeper options)

Screenshot of vega lite examples from vega-lite home

Under the hood, it creates complete Vega Charting specifications that further uses d3 as a visualization kernel

In our case two options are supported:

through the vega-lite JSON specification - a high level grammar that creates
through the vega-lite-api - a JavaScript api to write the vega-lite specifications

(As quick summary: Vega-Lite-Api: creates -> Vega-Lite JSON specifications: creates -> Vega charting specifications: -> using D3 as a visualization kernel)

For more:

See the excellent Vega-Lite-API Observable Notebooks and Jeffrey Heer's Curriculum
or see the Vega-Lite Examples and Vega Examples
additionally see the y-combinator article with a developer here

Libraries Used

First we will need an instance of the jupyter-ijavascript-utils library.

utils = require('jupyter-ijavascript-utils');
['utils'];

Getting Data

We'll be working with three datasets:

Explicit Data:

//-- explicit data
simpleData = [{fruit:'Apples',yield:20,year:'2020'},{fruit:'Apples',yield:22,year:'2021'},
              {fruit:'Bananas',yield:15,year:'2020'},{fruit:'Bananas',yield:12,year:'2021'},
              {fruit:'Pears',yield:18,year:'2020'},{fruit:'Pears',yield:19,year:'2021'}];
$$.json(simpleData);
//-- JupyterLab shows the data in a collapsible viewer

Datasets module / Vega-Datasets

Data loaded from the datasets / vega-datasets module

//-- note that as this is using the ijs.await function,
//-- cells afterward can be synchronous and still assume the data is loaded

utils.ijs.await(async ($$, console) => {
  cars = await utils.datasets.fetch('cars.json');
  console.log(`loaded cars:${cars.length}`);
});
//-- loaded cars:406

URL within the Spec

We can also request that vega make the request for the data, if it is within the specification.

// utils.vega.svg((vl) => vl.markCircle()
//     .data({ url: 'https://vega.github.io/vega-lite/examples/data/movies.json' })
//     ...
// });

Getting Started

The simplext example is to render a small dataset:

simpleData = [{fruit:'Apples',yield:20,year:'2020'},{fruit:'Apples',yield:22,year:'2021'},
              {fruit:'Bananas',yield:15,year:'2020'},{fruit:'Bananas',yield:12,year:'2021'},
              {fruit:'Pears',yield:18,year:'2020'},{fruit:'Pears',yield:19,year:'2021'}];

We would like to render this into a chart, say with points first:

utils.vega.svg(
    // accept the reference to the vega-lite instance
    (vl) =>
    // render as points
    vl.markPoint()
        // use simpleData as the data source
        .data(simpleData)
        // title
        .title('Fruit by Yield')
        // this is where the x, and y are encoded
        .encode(
            // use the 'yield' property as the y axis
            // note this is fieldQ - a qualitative number with a numeric value
            vl.y().fieldQ('yield'),
            // use the 'fruit' property as the x axis
            // note this is fieldN - a nominative string with no numeric value
            vl.x().fieldN('fruit')
        )
)

Screenshot

Flip the X and Y axis

Note that the order that we put the axes don't matter.

Let's swap the fields used by the x and y axes:

utils.vega.svg(
    // accept the reference to the vega-lite instance
    (vl) =>
    vl.markPoint()
        .data(simpleData)
        .title('Fruit by Yield')
        .encode(
            // use the x axis as the yield
            vl.x().fieldQ('yield'),
            // use the y axis for the fruit type
            vl.y().fieldN('fruit')
        )
)

Screenshot

Make a Line Chart

We can change the type of mark to markLine to make it a line chart

utils.vega.svg(
    // accept the reference to the vega-lite instance
    (vl) =>
    vl.markPoint()
        .data(simpleData)
        .title('Fruit by Yield')
        .encode(
            vl.x().fieldQ('yield'),
            vl.y().fieldN('fruit'),
            //-- add the shape attribute
            vl.shape().fieldN('year').title('Year'),
            //-- add the color attribute
            vl.color().fieldN('year').title('Year')
        )
)

Screenshot

Make a Line Chart

We can change the type of mark to markLine to make it a line chart

utils.vega.svg(
    (vl) =>
    // change from `markPoint` to `markLine`
    vl.markLine()
        .data(simpleData)
        .title('Fruit by Yield')
        // set the width of the graph
        .width(150)
        .encode(
            vl.y().fieldQ('yield'),
            // change the field to 'year' from 'fruit'
            vl.x().fieldN('year'),
            // add a new series coloration, using 'fruit'
            vl.color().fieldN('fruit')
        )
)

Screenshot

Make a Bar Chart

We can change the type of mark to markBar to make it a bar chart

utils.vega.svg(
    (vl) =>
    // change to `markBar`
    vl.markBar()
        .data(simpleData)
        .title('Fruit by Yield')
        // set the width of the graph
        .width(150)
        .encode(
            vl.x().fieldQ('yield'),
            vl.y().fieldN('year'),
            vl.color().fieldN('fruit')
        )
)

Screenshot

Datasets module / Vega-Datasets

We can also load data from the the datasets / vega-datasets module

//-- note that as this is using the ijs.await function,
//-- cells afterward can be synchronous and still assume the data is loaded

utils.ijs.await(async ($$, console) => {
  cars = await utils.datasets.fetch('cars.json');
  console.log(`loaded cars:${cars.length}`);
});
//-- loaded cars:406

Note that even though the following line is synchronous, is still waits for the line above to complete before rendering.

(It does this because we are using the ijs.await function - otherwise it wouldn't wait)

console.log(`how many cars types do we have? ${utils.aggregate.distinct(cars, 'Name')}`);
// how many cars types do we have? 311

Types of Fields and Scales

Types of fields:

fieldT (type: 'temporal') - time based
fieldN (type: 'nominal') - Represents a discrete category (string)
fieldQ (type: 'qualitative') - Represents a number

See the Vega-Lite Axis Documentation

utils.vega.svg((vl) => vl.markPoint()
  .data(cars)
  .title('Displacement vs Horsepower')
  .width(500)
  .height(500)
  .transform(
    vl.filter('datum.Cylinders > 4')
  )
  .encode(
    //-- Qualitiative field - a number
    //-- that can have the position determined relative to another and charted
    vl.x().fieldQ('Displacement'),
    vl.y().fieldQ('Horsepower'),
    vl.color().fieldN('Origin').title('Car Origin').scale({ range: ['green', 'red', 'blue'] })
  )
)

Screenshot

Objects / Specification

Vega-Lite is simply a JavaScript API for creating the JavaScript Object specifications.

You can also include those specifications within the definition, like this:

//-- notice how this and the specification above mirror each other.
utils.vega.svg((vl) => vl.markPoint()
    .data(cars)
    .title('Displacement vs Horsepower')
    .width(500)
    .height(500)
    .transform(
      vl.filter('datum.Cylinders > 4')
    )
    .encode(
      vl.x({field:'Displacement', type:'quantitative'}),
      vl.y({field:'Horsepower', type:'quantitative'}),
      vl.color({
       field:'Origin', title:'Car Origin', type:'nominal',
       scale: {range: ['green', 'red', 'blue']
      }}
     )
    )
  )

Screenshot

Rendering Via Specifications

Rendering via the specifications can be easier for some people to understand than through the Vega-Lite-Api

It also means, you can pass directly from the vega lite examples - such as this one using the bar charts example

sampleData = [
  {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
  {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
  {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
];
utils.vega.svgFromSpec(
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "A simple bar chart with embedded data.",
  "data": {
    //-- data listed explicitly in the spec is also supported
    "values": sampleData
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "nominal", "axis": {"labelAngle": 0}},
    "y": {"field": "b", "type": "quantitative"}
  }
});

Screenshot of object schema

Loading directly via URL

To load data by URL, simply set the data attribute like so:

    .data({ url: 'https://vega.github.io/vega-lite/examples/data/movies.json' })

For example:

utils.vega.svg((vl) => 
  vl.markPoint()
    .title('Binned Rotten Tomatoes Rating by IMDB Rating')
    .data({ url: 'https://vega.github.io/vega-lite/examples/data/movies.json' })
    .encode(
      //-- limit to 20 bins
      vl.x().fieldQ('Rotten Tomatoes Rating'),
      vl.y().fieldQ('IMDB Rating')
    ));

Screenshot IMDB all Movies

Unfortunately though, the data is really hard to read at this size.

Bins

Bins are especially helpful in numeric columns, as they can then group values based on various bin sizes.

Say for example, if we instead limited it to only 20 categories each. Would that be easier to read?

Binning becomes rather simple as we

(notice we access the movies directly - without loading)

utils.vega.svg((vl) => 
  vl.markCircle()
    .title('Binned Rotten Tomatoes Rating by IMDB Rating')
    .data({ url: 'https://vega.github.io/vega-lite/examples/data/movies.json' })
    .encode(
      //-- limit to 20 bins
      vl.x().fieldQ('Rotten Tomatoes Rating').bin({maxbins: 20}),
      vl.y().fieldQ('IMDB Rating').bin({maxbins: 20}),
      //-- set the size based on the number of records in the 'bin'
      vl.size().count()
    ));

Screenshot

Much better.

Appendix

Note on Accessing Data

Some examples, may directly reference the data

(Note that sometimes they may use a URL that you may want to download or access directly)

In our case, the Scatterplot Pan & Zoom example

the data is originally as follows:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/cars.json"},
  ...

We need to change it to the full path:

utils.vega.svgFromSpec({
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://vega.github.io/vega-lite/data/cars.json"},
  "params": [{
    "name": "grid",
    "select": "interval",
    "bind": "scales"
  }],
  "mark": "circle",
  "encoding": {
    "x": {
      "field": "Horsepower", "type": "quantitative",
      "scale": {"domain": [75, 150]}
    },
    "y": {
      "field": "Miles_per_Gallon", "type": "quantitative",
      "scale": {"domain": [20, 40]}
    },
    "size": {"field": "Cylinders", "type": "quantitative"}
  }
});

Note that the data binding can also work within vega-lite-api:

utils.vega.embed((vl) => vl.markCircle()
  .data({"url": "https://vega.github.io/vega-lite/data/cars.json"})
  .params({
    "name": "grid",
    "select": "interval",
    "bind": "scales"
  })
  .encode(
    vl.x().fieldQ('Horsepower').scale({"domain": [75, 150]}),
    vl.y().fieldQ('Miles_per_Gallon').scale({"domain": [20, 40]}),
    vl.size().fieldQ('Cylinders')
  )
)

Screenshot of zooming with data

Transformations

There are many different types of transforms, and aggregations you can also do within vega.

Personally, I tend to prefer doing the aggregation beforehand, and passing the results to the data attribute, but there are a number of rather interesting points why you may want to do it within Vega.

See the y-combinator article with a developer here

(Notice the commas separating out statements)

utils.vega.svg((vl) =>
  vl.markBar()
    .title('Gross by Director')
    .width(400).height(400)
    .data({ url: 'https://vega.github.io/vega-lite/examples/data/movies.json' })
    .transform(
      //-- filter records in dataset (datum) where .Director is found
      vl.filter('datum.Director != null'),

      //-- group by the Director field, and aggregate into the Gross field
      vl.aggregate(
          vl.sum('Worldwide Gross').as('Gross')
      ).groupby('Director'),

      //-- sort by 'Gross' and generate Rank field
      vl.window(vl.rank().as('Rank'))
        .sort(vl.field('Gross').order('descending')),

      //-- limit to only the top 20
      vl.filter('datum.Rank < 20')
    )
    .encode(
      vl.x().fieldQ('Gross'),
      vl.y().fieldN('Director')
        .sort(vl.field('Gross').order('descending'))
    ));

Screenshot

Maps / Choropleths

Choropleths are data driven maps.

Screenshot - example choropleth

See the Let's make a Data Driven Map to make your own.

Jupyter Lab support for Vega-Lite

For this tutorial, we'll be working with the vega module.

Note that Jupyter Lab has support for the vega mimetype, and can be used through the vega.vegaMimeType() and vega.vegaLiteMimeType() entries.

However, it seemed at the time of this witing that it did not work in all cases:

Such as this example.

vegaSpec = {
  //-- spec from the example
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Drag the sliders to highlight points.",
  "data": {"url": "https://vega.github.io/vega-lite/examples/data/cars.json"},
  "transform": [{"calculate": "year(datum.Year)", "as": "Year"}],
  "layer": [{
    "params": [{
      "name": "CylYr",
      "value": [{"Cylinders": 4, "Year": 1977}],
      "select": {"type": "point", "fields": ["Cylinders", "Year"]},
      "bind": {
        "Cylinders": {"input": "range", "min": 3, "max": 8, "step": 1},
        "Year": {"input": "range", "min": 1969, "max": 1981, "step": 1}
      }
    }],
    "mark": "circle",
    "encoding": {
      "x": {"field": "Horsepower", "type": "quantitative"},
      "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
      "color": {
        "condition": {"param": "CylYr", "field": "Origin", "type": "nominal"},
        "value": "grey"
      }
    }
  }, {
    "transform": [{"filter": {"param": "CylYr"}}],
    "mark": "circle",
    "encoding": {
      "x": {"field": "Horsepower", "type": "quantitative"},
      "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
      "color": {"field": "Origin", "type": "nominal"},
      "size": {"value": 100}
    }
  }]
};

This will work correctly:

// this will work because we are using
utils.vega.embedFromSpec(vegaSpec);

Screenshot of Vega-Lite with Sliders

However the following will not in all cases, as an older version of Vega Lite is embedded within Jupyter. (At least as of May 2022)

//-- use the mimetype supported by vega will fail
display.mime({ 'application/vnd.vega.v5+json': vegaSpec });
// Javascript Error: Cannot read properties of undefined (reading 'length')

This is the same as calling the vega.vegaMimeType function:

utils.vega.vegaMimeType(vegaSpec);
// Javascript Error: Cannot read properties of undefined (reading 'length')

See the vega.vegaMimeType() and vega.vegaLiteMimeType() entries for more.

See the excellent Vega-Lite-API Observable Notebooks and Jeffrey Heer's Curriculum
or see the Vega-Lite Examples and Vega Examples

Tutorial: Vega-Lite

Libraries Used

Getting Data

Explicit Data:

Datasets module / Vega-Datasets

URL within the Spec

Getting Started

Flip the X and Y axis

Make a Line Chart

Make a Line Chart

Make a Bar Chart

Datasets module / Vega-Datasets

Types of Fields and Scales

Objects / Specification

Rendering Via Specifications

Loading directly via URL

Bins

Appendix

Note on Accessing Data

Transformations

Maps / Choropleths

Jupyter Lab support for Vega-Lite