The CLEAR Benchmark: Continual LEArning on Real-World Imagery

Visio-Linguistic Dataset Curation

How to efficiently annotate a million-scale Internet image collection?



We start from the public YFCC100M collection, which contains Flickr images uploaded between 2004 and 2014. We downloaded an 8M subset to build CLEAR-10, and a 40M subset to build CLEAR-100.

We use the upload time to recreate the temporal stream and split the 8M/40M images into 11 buckets, each spanning roughly 1 year. The 0th bucket is reserved for unsupervised pre-training (e.g., MoCo), and we curate a small but high-quality labeled set (following the CLEAR-10 and CLEAR-100 ontologies) for each of the 1st to 10th buckets.
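The bucketing step above can be sketched as follows. This is a minimal illustration, not the CLEAR codebase: it assumes each image carries an upload timestamp, and the helper name `assign_bucket` is hypothetical.

```python
from datetime import datetime

def assign_bucket(upload_time: datetime,
                  start_year: int = 2004,
                  num_buckets: int = 11) -> int:
    """Map an upload timestamp to one of `num_buckets` temporal buckets.

    Bucket 0 (the earliest year) is reserved for unsupervised
    pre-training; buckets 1..10 each span roughly one year of the
    2004-2014 stream.
    """
    bucket = upload_time.year - start_year
    # Clamp so out-of-range timestamps fall into the first/last bucket.
    return max(0, min(bucket, num_buckets - 1))

# Images uploaded in 2004 fall into bucket 0 (pre-training);
# images uploaded in 2009 fall into bucket 5.
print(assign_bucket(datetime(2004, 6, 1)))   # 0
print(assign_bucket(datetime(2009, 3, 15)))  # 5
```

A real pipeline would bucket on the full upload timestamp so that each bucket holds an equal share of the stream, but the year-level cut above conveys the idea.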

To avoid excessive human annotation cost on web-scale data, we use the visio-linguistic dataset curation approach proposed in our NeurIPS'21 paper. The key idea is to leverage OpenAI's recent CLIP model and prompt engineering techniques for efficient image retrieval. The top-scoring images retrieved by CLIP are then verified by human annotators to ensure 99% precision (crowdsourced MTurk workers for CLEAR-10 and a commercial labelling service for CLEAR-100).
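The retrieval step boils down to ranking image embeddings against a text-prompt embedding by cosine similarity and keeping the top-k candidates for human verification. The sketch below assumes CLIP embeddings have already been precomputed; the function name `retrieve_top_k` and the toy data are illustrative, not part of the CLEAR release.

```python
import numpy as np

def retrieve_top_k(image_embs: np.ndarray,
                   prompt_emb: np.ndarray,
                   k: int = 5) -> np.ndarray:
    """Return indices of the k images whose (precomputed) CLIP
    embeddings score highest against a class prompt, by cosine
    similarity."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = prompt_emb / np.linalg.norm(prompt_emb)
    scores = img @ txt          # cosine similarity per image
    return np.argsort(-scores)[:k]

# Toy example with 4 fake "embeddings" in 3-D; the prompt embedding
# is a tiny perturbation of image 2, so image 2 should rank first.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(4, 3))
prompt_emb = image_embs[2] + 0.01 * rng.normal(size=3)
top = retrieve_top_k(image_embs, prompt_emb, k=2)
print(top[0])  # 2
```

Only these top-ranked candidates go to annotators, which is what keeps the labeling cost low relative to exhaustively labeling the raw buckets.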

The entire pipeline can be summarized as: (1) download the raw YFCC100M images, (2) split them into temporal buckets by upload time, (3) retrieve candidate images per class with CLIP prompts, and (4) verify the top-scoring candidates with human annotators.

Assets for Future Research

In addition to the high-quality labeled subset, we also release a wealth of assets per time bucket for future research on continual learning, including:

  • Abundant unlabeled data: For research on unsupervised continual learning. This includes ~0.8M unlabeled images per bucket for CLEAR-10, and ~3.6M unlabeled images per bucket for CLEAR-100.

  • Metadata: For research on continual multi-modal learning. This includes all the metadata released with YFCC100M, such as upload and capture timestamps, capture location, social media hashtags, user description, image title, and so on.

  • Instruction sets for human annotation: For improving dataset transparency. The CLEAR-10 instruction set used on the MTurk platform can be found in the supplemental of our NeurIPS'21 paper. The CLEAR-100 instruction set can be found here (Chinese ver.).

Check out this repo if you want to build your own dataset out of YFCC100M with a visio-linguistic approach.
