Unsupervised Classification

Tutorial: Unsupervised Classification

Install SAGA

For the unsupervised classification we will use SAGA tools. If you do not have SAGA already, please install it. An installation tutorial can be found here and the latest version can be downloaded here.

Load bands

First, we need to open a new QGIS project and add a satellite image of the area we are interested in. We will use Sentinel-2 data (bands 1-8) for this tutorial.

Click Add Layer → Raster Layer, then select and add all bands.

Build virtual raster

Next, we will build a virtual raster so that we can see a true colour image. This will help us later to assess the classification.

Go to Raster → Miscellaneous → Build Virtual Raster.

Then choose all the bands. Make sure to tick “Place each input file into a separate band” and click Run.

Now open the Layer Properties of the virtual raster and go to Symbology.

To display a true colour image, assign band 4 as the red band, band 3 as the green band and band 2 as the blue band. You can also create false colour images this way. Here’s an overview of possible combinations. Be aware that these are specific to Sentinel-2 data.
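The band-to-channel assignment can be written down as a small lookup. This is only an illustrative sketch: the true colour entry comes from the text above, while the names `COMPOSITES`, `rgb_bands` and `compose_pixel` are made up for this example, and the false colour entry (bands 8, 4, 3, a common colour-infrared combination for Sentinel-2) is added here as an assumption about which combination you might want to try.

```python
# Sentinel-2 band numbers assigned to the (red, green, blue) display channels.
# "true_colour" follows the tutorial; "false_colour_infrared" is an extra
# example added for illustration.
COMPOSITES = {
    "true_colour": (4, 3, 2),
    "false_colour_infrared": (8, 4, 3),
}


def rgb_bands(name):
    """Return the (red, green, blue) band numbers for a named composite."""
    return COMPOSITES[name]


def compose_pixel(band_values, name):
    """Pick the R, G, B values of one pixel from a {band_number: value} dict."""
    return tuple(band_values[band] for band in rgb_bands(name))


# One pixel with made-up reflectance values for bands 2, 3, 4 and 8.
pixel = {2: 0.05, 3: 0.08, 4: 0.06, 8: 0.40}
true_colour_rgb = compose_pixel(pixel, "true_colour")
infrared_rgb = compose_pixel(pixel, "false_colour_infrared")
```

In the false colour version the near-infrared band drives the red channel, which is why healthy vegetation tends to appear bright red in such images.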

K-means clustering

Open the Processing toolbox and search for K-means clustering for grids. Alternatively, look for Isodata clustering for grids. We will continue with K-means here, but you can try both. For Isodata, select an initial number of classes (e.g. 4) and a maximum number (e.g. 10). For a detailed explanation of the different methods, read this article.

Then select the bands you want to use for the classification. You can select all of them or only specific ones; try what works best. Next, select the number of clusters / classes. You can try different numbers here and see how the results differ. Choose a location to save the output. You can leave the remaining settings on their defaults, but you can also try out different values. Finally, click Run. A new layer will be added to your Layers Panel.
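To get a feeling for what the tool does with the selected bands and the number of clusters, here is a minimal sketch of the basic K-means idea in plain Python. It is not SAGA's implementation (SAGA offers several variants and works on whole grids); the function name `kmeans`, its parameters and the sample pixel values are assumptions made for this illustration. Each pixel is treated as a tuple with one value per band.

```python
import random


def kmeans(pixels, k, iterations=20, seed=0):
    """Cluster pixel vectors into k classes and return (labels, centres).

    pixels: list of equal-length tuples, one value per selected band.
    Labels start at 1 to mirror the class numbers used in the tutorial.
    """
    rng = random.Random(seed)
    centres = rng.sample(pixels, k)  # initial centres: k randomly chosen pixels
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins the class of its nearest centre.
        for i, pixel in enumerate(pixels):
            dists = [sum((a - b) ** 2 for a, b in zip(pixel, c)) for c in centres]
            labels[i] = dists.index(min(dists)) + 1
        # Update step: move each centre to the mean of its member pixels.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j + 1]
            if members:  # an empty class keeps its previous centre
                centres[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centres


# Four made-up pixels with two band values each: two dark, two bright.
sample = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
sample_labels, sample_centres = kmeans(sample, k=2)
```

The class numbers themselves carry no meaning: which group ends up as class 1 or class 2 depends on the random starting centres, which is why the classes have to be interpreted afterwards.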

In the Layer Properties, change the Symbology to Paletted / Unique values and click Classify to assign a different colour to each class. By default, colours are assigned randomly. You can change a colour by double-clicking it.

This is what the classification looks like with six different classes:

Assess the classification

Visually assess the classification by comparing it to the true colour image we built in the beginning. Which classes represent which land covers? Is the classification able to distinguish between them reasonably well? Try out different settings (e.g. a different number of classes) or Isodata clustering and compare the results.

Reclassify (optional)

Chances are that the classes identified are not exactly what you want. For example, comparing the classification above with the true colour image shows that classes 1, 2 and 6 all represent some form of grassland / agricultural fields, classes 3 and 4 both represent areas without vegetation (e.g. bare soil, infrastructure) and class 5 mainly includes forested areas.

To merge such classes, go to the Processing toolbox and search for Reclassify by table. Choose the classification you want to modify. Then click the three dots (…) next to Reclassification table and add one row for each of your classes.

For Minimum and Maximum, enter the class number itself (e.g. for class 1: row 1, Min=1 and Max=1). For Value, enter the new class number, i.e. what the class should turn into (e.g. to turn class 2 into class 4: row 2, Min=2, Max=2 and Value=4). Set Range boundaries to min<=value<=max. Then save and click Run.
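The logic of such a reclassification table can be sketched in a few lines of Python. This is an illustration of the min<=value<=max rule only, not the code QGIS runs; the function name `reclassify`, the new class numbers 1 to 3 and the tiny example grid are assumptions made here. Values that match no row are returned unchanged in this sketch.

```python
def reclassify(value, table):
    """Return the new class for value using rows of (minimum, maximum, new_value).

    Applies the min <= value <= max boundary rule. A value that matches
    no row is returned unchanged (an assumption made for this sketch).
    """
    for minimum, maximum, new_value in table:
        if minimum <= value <= maximum:
            return new_value
    return value


# One row per original class, merging the six example classes into three.
# The new class numbers (1, 2, 3) are chosen here purely for illustration.
TABLE = [
    (1, 1, 1), (2, 2, 1), (6, 6, 1),  # grassland / agricultural fields
    (3, 3, 2), (4, 4, 2),             # no vegetation
    (5, 5, 3),                        # forest
]

# A tiny made-up classified grid with the original class numbers 1 to 6.
classified = [[1, 2, 3], [4, 5, 6]]
reclassified = [[reclassify(v, TABLE) for v in row] for row in classified]
```

Because Min and Max are identical in every row, the inclusive boundary setting matters: with an exclusive boundary no class number would fall inside its own row.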

Change the Symbology of the new layer again, as before. You can also add labels.

This is the result. The number of classes has now been reduced to three. Unfortunately, the classification was not able to distinguish reliably between soil, water and infrastructure, which is why they are now all merged in the “no vegetation” class.

To quantitatively assess the accuracy of your classification, you should perform an accuracy assessment next. This will be explained in a separate tutorial.