Evaluation Protocol on CLEAR

How to faithfully evaluate a continual learning algorithm on CLEAR?


For the 1st through 10th buckets, each of which comes with an annotated labeled trainset, we also release a held-out testset over the same timespan (now downloadable here).

Evaluating on CLEAR is the same as on any other continual learning benchmark: after training on each of the 10 buckets, we measure the model's performance on all 10 testsets. This produces a 10x10 accuracy matrix.
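
As a concrete illustration, here is a minimal sketch of this protocol. The names `make_model`, `train_on_bucket`, `evaluate`, `trainsets`, and `testsets` are hypothetical stand-ins for your own training and evaluation code, not part of any released API:

```python
import numpy as np

num_buckets = 10
acc = np.zeros((num_buckets, num_buckets))

model = make_model()  # hypothetical: your model constructor
for i in range(num_buckets):
    # Continually update the model on the i-th labeled trainset.
    train_on_bucket(model, trainsets[i])  # hypothetical helper
    for j in range(num_buckets):
        # Row i holds the model trained through bucket i,
        # evaluated on every held-out testset j.
        acc[i, j] = evaluate(model, testsets[j])  # hypothetical helper
```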

Different parts of the accuracy matrix capture different aspects of a model's performance. For example, the diagonal entries represent performance on a testset sampled from the same distribution as the current bucket (assuming each bucket is a locally iid distribution). The lower-triangular part of the matrix instead captures test performance on previously seen buckets, which is the focus of most state-of-the-art work on combating catastrophic forgetting. We therefore introduce 4 simplified evaluation metrics to summarize the accuracy matrix (see the code sketch after this list):

  1. In-Domain Accuracy: The average of the diagonal entries, i.e., test performance within the same domain as the current bucket.

  2. Next-Domain Accuracy: The average of the super-diagonal entries, i.e., test performance on the immediately next domain.

  3. Backward Transfer: The average of the lower-triangular entries, i.e., test performance on previously seen domains.

  4. Forward Transfer: The average of the upper-triangular entries, i.e., test performance on future unseen domains.
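
To make the definitions above concrete, here is a minimal NumPy sketch that computes all 4 metrics from an accuracy matrix `acc`, where `acc[i, j]` is the accuracy of the model trained through bucket `i` on testset `j`. It assumes Backward/Forward Transfer average the strictly lower/upper triangles, excluding the diagonal:

```python
import numpy as np

def clear_metrics(acc: np.ndarray) -> dict:
    """Summarize an N x N accuracy matrix with the 4 CLEAR metrics."""
    n = acc.shape[0]
    return {
        # Diagonal (j == i): test within the current bucket's domain.
        "in_domain": np.diag(acc).mean(),
        # Super-diagonal (j == i + 1): test on the immediate next domain.
        "next_domain": np.diag(acc, k=1).mean(),
        # Strictly lower triangle (j < i): previously seen domains.
        "backward_transfer": acc[np.tril_indices(n, k=-1)].mean(),
        # Strictly upper triangle (j > i): future unseen domains.
        "forward_transfer": acc[np.triu_indices(n, k=1)].mean(),
    }
```

On a 10x10 matrix, for instance, this averages 10 diagonal entries for In-Domain Accuracy, 9 super-diagonal entries for Next-Domain Accuracy, and 45 lower- and upper-triangular entries each for Backward and Forward Transfer.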

We hope that our proposed metrics can simplify evaluation on CLEAR and similar domain-incremental benchmarks; nonetheless, CLEAR can also be repurposed for task-/class-incremental scenarios, which could be exciting future work.

Figure: A visual illustration of the 4 evaluation metrics on a 4x4 accuracy matrix.