List of Data Sets

identifier Preview has imaging details has biological terms has format has Ground truth

Systems Science of Biological Dynamics database (SSBD:database)


Diverse microscopes (mostly light microscopy, with some electron microscopy as well). SSBD aims to store and curate 4D datasets, i.e. 3D images acquired over time.

It also stores all the quantitative data (i.e. segmented data or ROIs) separately as numerical datasets. Quantitative data are represented in a unified data format, the Biological Dynamics Markup Language (BDML).

Nuclei segmentation in histopathology images

1602 histopathology, nuclei

The dataset contains ground truth annotation for the segmentation of the nuclei.

MoNuSeg - Multi-organ nuclei segmentation challenge

1601 histopathology, nuclei

FAIRsharing Eurobioimaging collection

1579 Fairsharing


Some have ground truth available, such as the BBBC (Broad Bioimage Benchmark Collection).

muscle cross-sections

1577 mosaick

Immunofluorescent sections were imaged on a Nikon AR1 confocal or Nikon Widefield CCD Microscope. Each confocal image is a composite of maximum projections, derived from stacks of optical sections.

Muscle Stem Cells, Muscle tiff

No. The data were used to exemplify muscleQNT (muscle fiber counting).

Mouse embryos

1576 embryo DIC

There are 15 images. The images were acquired using a Nikon Eclipse TE200 microscope with a 20x, 0.45 NA objective lens and a 0.52 NA condenser lens, and are provided courtesy of the W.M. Keck 3D Fusion Microscope Facility at Northeastern University. Each image is 640 x 480 pixels, with an approximate pixel size of 0.42 x 0.42 μm.

embryo, cells tiff

To collect ground truth, the samples were Hoechst-stained and imaged by confocal microscopy, and the cells were counted by a single human. A tab-delimited text file contains the cell count for each of the 15 images.

two-photon images of dendritic spines

1575 MIP example

Two-photon imaging was performed using a galvanometer-based scanning system (Prairie Technologies, acquired by Bruker Inc.) on an Olympus BX61WI equipped with a 60X water immersion objective (0.9 NA), using a Ti:sapphire laser (Coherent Inc.) tuned to 910 nm and controlled by PrairieView software. Z-stacks (0.3 μm axial spacing) from secondary or tertiary dendrites of CA1 neurons were collected every 5 min for up to 4 h. The field of view was 19.8 × 19.8 μm at 1024 × 1024 pixels.
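The lateral pixel size is not stated directly, but it follows from the field of view and the pixel count given above. A minimal sketch of that arithmetic, assuming square pixels and a hypothetical helper name `pixel_size_um`:

```python
def pixel_size_um(field_of_view_um: float, pixels: int) -> float:
    """Return the lateral pixel size in micrometres.

    Assumes the field of view is sampled uniformly along one axis.
    """
    return field_of_view_um / pixels


# Values taken from the entry above: 19.8 um across 1024 pixels.
lateral_um = pixel_size_um(19.8, 1024)
print(f"{lateral_um:.4f} um per pixel")  # roughly 0.0193 um per pixel
```

Together with the 0.3 μm axial spacing, this means the voxels are strongly anisotropic, which is worth keeping in mind when resampling or measuring spine volumes.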

Dendritic Spine

Annotated data and mask labels provided

Human colon tissue

1574 Colon tissue

The dataset was generated using the virtual microscope imitating the microscope Zeiss S100 (objective Zeiss 63x/1.40 Oil DIC) attached to confocal unit Atto CARV and CCD camera Micromax 1300-YHS.

tissue, cells tiff

Ground Truth is provided as binary masks.

Clustered Cell Nuclei Data

1573 HL60 cells

The dataset was generated using the virtual microscope imitating the microscope Zeiss S100 (objective Zeiss 63x/1.40 Oil DIC) attached to confocal unit Atto CARV and CCD camera Micromax 1300-YHS.

nuclei, HL60 cells tiff

Ground Truth is provided as segmentation mask.

CSIRO science image library

1571 plant, textiles, minerals

Breast Cancer Histopathological Database (BreakHis)


Histopathology images: 700 x 460 pixels, RGB, 8-bit, stored as PNG.

breast cancer, histopathology

Image patch classified as benign or malignant

Zebrafish larvae - Widefield/Brightfield

1559 zebrafish tif

Medaka embryo in 96 well plate - Widefield Brightfield

1558 Preview medaka, embryo jpg

ANHIR: Automatic Non-rigid Histological Image Registration

1472 Logo pathology

The ground truth is given as landmarks: key points that are marked consistently across each set of images with different stains.


1434 Drosophila Kc167 cells

There are 10 fields of view of each sample, for a total of 50 fields of view. The images were acquired on a Zeiss Axiovert 200M microscope. The images provided here are a single channel, DNA. The image size is 512 x 512 pixels. The images are provided as 8-bit TIFF files.

Drosophila melanogaster, RNAi tiff

A tab-delimited text file contains the number of cells in each image, as determined by two different human counters. To compare an algorithm's results to these, first compute for each sample the algorithm's mean cell count over the 10 images of the sample. Next, calculate the absolute difference between this mean and the average of the humans' counts for the sample, then divide by the latter to obtain the deviation from ground truth (in percent). The mean of these values over all 5 samples is the final result.

Note: The two human observers vary by 16% for this image set.
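The evaluation procedure described above can be sketched as follows. This is a minimal illustration, not the benchmark's official scoring code. The function names and the input layout are hypothetical, and it assumes that "the average of the humans' counts" means the mean over all images of a sample and both counters.

```python
from statistics import mean


def percent_deviation(algo_counts, human_counts):
    """Percent deviation from ground truth for one sample.

    algo_counts: per-image cell counts produced by the algorithm.
    human_counts: per-image (counter_1, counter_2) pairs.
    """
    algo_mean = mean(algo_counts)
    # Assumption: pool both human counters over all images of the sample.
    human_mean = mean(c for pair in human_counts for c in pair)
    return abs(algo_mean - human_mean) / human_mean * 100.0


def final_score(samples):
    """Mean percent deviation over all samples.

    samples: dict mapping sample name -> (algo_counts, human_counts).
    """
    return mean(percent_deviation(a, h) for a, h in samples.values())


# Toy data (made up for illustration, two images per sample).
toy = {
    "sample_A": ([110, 90], [(100, 100), (100, 100)]),
    "sample_B": ([120, 120], [(90, 110), (95, 105)]),
}
score = final_score(toy)
print(f"{score:.1f}%")
```

With the real data, each sample would hold 10 images and the dictionary would have 5 entries, read from the tab-delimited file of human counts.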

Diadem challenge

1191 Diadem challenge

Imaging varies by neuron type and species and includes:

  1. Transmitted light brightfield
  2. Confocal
  3. In vivo two-photon laser scanning

neurite arbor

Manually traced digital neural reconstructions.

CREMI challenge

1182 cremi samples A and B

Serial section Transmission Electron Microscopy (ssTEM).

  • Neuron segmentation
  • Synaptic cleft segmentation
  • Synaptic partner annotation

Arabidopsis plants (low resolution)

1146 arabidopsis plant (low resolution) Arabidopsis thaliana jpg

Arabidopsis plants (high resolution)

1145 arabidopsis plant (high resolution) Arabidopsis thaliana

Leaves stained with GFP and RFP

1143 leaf infection image plant virus, leaf infection

Artemia color images

1139 artemia color image Artemia tif

Microtubules 3D

1137 3D microtubules microtubules

Arabidopsis thaliana seedlings

1131 thumbnail of Arabidopsis thaliana seedlings Arabidopsis thaliana seedlings jpg

2D bright field yeast cell images with ground truth annotations

41 2D bright field yeast cell images

Bright field microscopy



  1. binary segmentation masks
  2. text files with the center-point coordinates of in-focus cells