Task 1: Detect Building Blocks
This task consists of detecting a set of closed shapes (building blocks) in the map image.
Building blocks are coarse map objects that can group several elements. Detecting them is a critical step in the digitization of historical maps because they are essential components of a city. Each building block is symbolized by a closed shape, which can enclose other objects and lines. Building blocks are surrounded by streets, rivers, fortification walls or other elements, and are never directly connected to one another. A building block can sometimes be reduced to a single special building, symbolized by a diagonally hatched area.
Given the image of a complete map sheet and a mask of the map area, you need to detect each building block, as illustrated below on an excerpt of the input.
We identified the following challenges:
- Building blocks can have very variable sizes.
- They can be reduced to a single building (diagonal hatched area).
- Their contour can be damaged (i.e. non-closed).
- Several other layers of information can be overlaid on building block outlines (text, railways, underground lines, graticule lines among others).
- There may be decompositions inside a building block, increasing the number of edges to filter.
- Building blocks may be large empty areas surrounded only by a contour. The lack of texture information and the variable sizes may be challenging for multiscale texture methods such as convolutional neural networks.
- Producing closed contours requires embedding strong guarantees in the method.
The inputs form a set of JPEG RGB images like the one illustrated below. These are complete map sheet images cropped to the relevant area, in which non-relevant content is replaced by black pixels. Those images can be large (8000x8000 pixels).
To help participants identify the non-relevant pixels, we also provide a mask image in which non-relevant pixels have value 0 and relevant pixels have value 255, as illustrated below. These masks are the expected output for task 2 (cropped to the relevant part).
Ground truth and Expected outputs
Expected output for this task is a binary mask of the building blocks. It must be stored in PNG (lossless) format with an 8-bit single channel. Background must be indicated with pixel value 0, and building block areas with pixel value 255. We will threshold the image to discard any other value.
The resulting image should look like the one below, which is the expected output for the sample input previously shown.
Results need to be output in a PNG file with the exact same format and naming conventions as the ground truth, except for the GT part of the filename which should be changed into
If the input image is named train/301-INPUT.jpg, then the output file must be named
Content for task 1 is located in the folder named 1-detbblocks in the dataset archive.
File naming conventions
The train, validation and test folders (if applicable) contain the same kinds of files:
- JPEG RGB image containing the input image to process. This image was cropped to the map content, and irrelevant content (masked out by the mask file described below) is set to black (RGB=(0,0,0)). Those images can be large (10000x10000 pixels).
- PNG GREY image containing a mask of the map area (same size as input). While the image was cropped to the meaningful area, some elements need to be discarded (map legend for instance). The map area is indicated by pixels of value 255; only predictions within this area are to be kept.
- PNG GREY image containing a mask of the expected building blocks (same size as input). Building blocks are indicated by pixels of value 255; all other pixels (background and irrelevant) are set to 0.
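The pixel-value conventions above (map area at 255 in the mask, building blocks at 255 and everything else at 0 in the output) can be illustrated with a minimal NumPy sketch. The function name `to_submission_mask` is hypothetical, and treating every non-zero prediction pixel as a building block is an assumption made for the example, not a rule stated by the organizers.

```python
import numpy as np


def to_submission_mask(prediction, map_mask):
    """Turn a raw prediction into a 0/255 single-channel uint8 mask.

    prediction: 2D array, non-zero where a building block is predicted
                (assumption: any non-zero value counts as foreground).
    map_mask:   2D array of the same size, 255 inside the map area, 0 elsewhere.
    """
    prediction = np.asarray(prediction)
    map_mask = np.asarray(map_mask)
    if prediction.shape != map_mask.shape:
        raise ValueError("prediction and map mask must have the same size")
    # Keep only predicted pixels that fall inside the map area.
    keep = (prediction > 0) & (map_mask == 255)
    # Building blocks become 255, everything else becomes 0.
    return np.where(keep, 255, 0).astype(np.uint8)
```

The returned 2D `uint8` array can then be written as a PNG with an image library; with Pillow, for instance, `Image.fromarray(out).save(path)` produces an 8-bit single-channel image for such an array.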
Number of elements per set
- train: 1 image
- validation: 1 image
- test: 3 images
Evaluation tools and illustrative notebooks provide participants with more details than the summary below. Please subscribe to updates to be notified when they are available.
For each map sheet, we will extract the connected components from the predicted label mask. We will then compute the intersection over union (IoU) between each ground truth component and each predicted one, and retain only the matches with a value of at least . When the IoU is strictly superior to we have 1-to-1 matches between ground truth components and predicted ones, which enables the computation of precision, recall and scores.
We will compute the values for each possible IoU threshold in and compute the area under the resulting curve. Such a score will not only be free of any threshold, it will also be insensitive to shape area, thus weighting large and small shapes equally and providing an accurate measure of the number of objects properly detected.
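The matching step can be sketched in pure Python on small grids. This is an illustration, not the official evaluation tool: the function names are hypothetical, 4-connectivity is an assumption, and `iou_threshold` is a parameter whose default of 0.5 is chosen only because disjoint components can have at most one partner with an IoU above 0.5. For full-size sheets, a library routine such as `scipy.ndimage.label` would be the practical way to extract components.

```python
from collections import deque


def connected_components(mask):
    """Return components as sets of (row, col); non-zero pixels are foreground.

    4-connectivity is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), set()
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def iou(a, b):
    """Intersection over union of two pixel sets."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


def match_scores(gt_mask, pred_mask, iou_threshold=0.5):
    """Precision, recall and F-score; expects iou_threshold >= 0.5."""
    gt = connected_components(gt_mask)
    pred = connected_components(pred_mask)
    # Above 0.5, each ground truth component has at most one matching prediction.
    matches = sum(1 for g in gt if any(iou(g, p) > iou_threshold for p in pred))
    precision = matches / len(pred) if pred else 0.0
    recall = matches / len(gt) if gt else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

With two ground truth blocks and two predicted components of which only one overlaps its block well enough, `match_scores` returns a precision, recall and F-score of 0.5 each.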
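The threshold sweep can be sketched as follows. This is a hedged illustration rather than the official scoring code: the function name, the threshold range `[t_min, t_max]` (0.5 to 1.0 here), the number of steps, the use of the F-score as the curve value and the trapezoidal integration are all assumptions made for the example.

```python
def f_score_curve_area(match_ious, n_gt, n_pred, t_min=0.5, t_max=1.0, steps=50):
    """Area under the F-score curve over a range of IoU thresholds.

    match_ious: IoU of each one-to-one ground truth/prediction match.
    n_gt, n_pred: number of ground truth and predicted components.
    t_min, t_max, steps: assumed sweep parameters (hypothetical defaults).
    """
    thresholds = [t_min + (t_max - t_min) * i / steps for i in range(steps + 1)]
    scores = []
    for t in thresholds:
        # A match counts as a true positive only if its IoU exceeds the threshold.
        tp = sum(1 for v in match_ious if v > t)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gt if n_gt else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    # Trapezoidal rule over the sampled curve.
    return sum(
        (thresholds[i + 1] - thresholds[i]) * (scores[i] + scores[i + 1]) / 2
        for i in range(steps)
    )
```

Because each component contributes one count regardless of its pixel area, a small block and a large block weigh the same in this score. The per-sheet values would then be averaged to obtain the global indicator.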
Finally, we will compute the average of the measures for all individual map images to produce a global indicator.
The resulting measure is a float value between 0 and . A higher value is better.