# Motivation of CLEAR Benchmark

## Why Continual Learning?

Most recent successes in the vision and learning community have been achieved on **static benchmarks** that have not changed since their release:

![ImageNet (2010) and COCO (2015) are the modern touchstones for visual recognition and detection algorithms. However, they do not model the temporal dynamics of the real world.](/files/3B2btJthzEuIK5YrqdJ5)

In the real world, the "IID" assumption implicit in static benchmarks (that training and test data are drawn independently from the same fixed distribution) rarely holds. Researchers have therefore invested in **continual (or incremental/lifelong) learning**, which aims for learning systems that are **more robust under distribution shifts**.

Yet most existing work focuses on combating **catastrophic forgetting** in neural networks, a phenomenon commonly observed on popular continual learning benchmarks with **extreme distribution shifts between tasks**, such as "*Permuted-MNIST*", "*Split-CIFAR*", and "*Incremental-ImageNet*".

![Popular continual learning benchmarks that do not align with practical applications.](/files/a9NBfPARwCaLkbgnfigM)

Built from existing vision datasets, these benchmarks usually contain **synthetic distribution shifts** created by randomly shuffling pixels or splitting labels into disjoint subsets. Instead, we posit that **a more practical continual learning benchmark should reflect how the real world is changing**, for example when autonomous vehicles (AVs) move to a new city or encounter brand-new car models:

![Examples of real world distribution shifts for AVs.](/files/EOaaE2xNWErf1OcNhxJ1)
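For contrast with the real-world shifts pictured above, the two synthetic shifts mentioned earlier (pixel shuffling and disjoint label splits) can be sketched in a few lines. This is a minimal illustration on toy data, not the actual construction code of Permuted-MNIST or Split-CIFAR; the function names `make_permuted_tasks` and `make_split_tasks` are hypothetical.

```python
import random


def make_permuted_tasks(images, num_tasks, seed=0):
    """Pixel-shuffling shift: each task applies ONE fixed random
    permutation to the pixels of every flattened image."""
    rng = random.Random(seed)
    num_pixels = len(images[0])
    tasks = []
    for _ in range(num_tasks):
        perm = list(range(num_pixels))
        rng.shuffle(perm)  # a new permutation per task
        tasks.append([[img[i] for i in perm] for img in images])
    return tasks


def make_split_tasks(labels, classes_per_task):
    """Label-splitting shift: partition the classes into disjoint groups
    and return, for each task, the indices of samples in that group."""
    classes = sorted(set(labels))
    groups = [
        classes[i:i + classes_per_task]
        for i in range(0, len(classes), classes_per_task)
    ]
    return [
        [idx for idx, y in enumerate(labels) if y in group]
        for group in groups
    ]


# Toy data: two "images" of four pixels each, and six labeled samples.
toy_images = [[0, 1, 2, 3], [4, 5, 6, 7]]
toy_labels = [0, 1, 2, 3, 0, 1]

permuted_tasks = make_permuted_tasks(toy_images, num_tasks=3)
split_tasks = make_split_tasks(toy_labels, classes_per_task=2)
```

In both cases the shift between tasks is abrupt and artificial: image content is scrambled, or whole classes appear and disappear at once, unlike the gradual changes seen in real-world data.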

## Temporal Evolution of Visual Concepts

In the context of visual recognition, we observe that many visual concepts in Internet imagery evolve over time, a phenomenon we refer to as the temporal evolution of visual concepts.

![The visual concept of "computer" naturally evolved from 2004 to 2014 as laptops became more popular than bulky desktops.](/files/SznQFyHT6EQ0cixJHCxt)

Therefore, we design the CLEAR benchmark to feature such natural continual learning scenarios. We select dynamic visual concepts that are common in Internet image collections to form the label space of [CLEAR-10](/the-clear-benchmark/documentation/download-clear-10-clear-100.md#clear-10-s3-download-links) and [CLEAR-100](/the-clear-benchmark/documentation/download-clear-10-clear-100.md#clear-100-s3-download-links).

![Label space of CLEAR-10 and CLEAR-100.](/files/7b9y3HmxxHnD3DuweeRy)

Next, we discuss how we curate the CLEAR benchmark with an efficient visio-linguistic dataset curation approach, as well as some of the valuable assets made available to the vision and learning community.

