Connectionists: Call for tasks highlighting limitations of enormous language models
Jascha Sohl-Dickstein
jns9 at cornell.edu
Fri Jan 29 17:23:51 EST 2021
You are invited to contribute a task to the Beyond the Imitation Game
Benchmark (BIG-Bench <http://github.com/google/BIG-Bench>). BIG-Bench will
be a collaborative benchmark intended to probe large language models and to
extrapolate their future capabilities.
Tasks will be submitted to the benchmark as GitHub pull requests, and will
be subject to non-anonymous peer review through discussion on the pull
request. The benchmark will be released at the Workshop on Enormous
Language Models: Perspectives and Benchmarks at ICLR 2021, and published in
an associated paper. All submitters of accepted tasks will be included as
co-authors on the paper introducing the benchmark.
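As a rough illustration of what a submission might contain, here is a minimal sketch of a JSON task file in the style used by the BIG-Bench repository. The task name, examples, and field layout below are illustrative assumptions, not an official template; consult the repository documentation for the authoritative schema.

```python
import json

# Hypothetical task definition. Field names (name, description, keywords,
# metrics, examples) follow the JSON-task style of the BIG-Bench repo, but
# the exact schema should be checked against the repository docs.
task = {
    "name": "three_digit_addition",  # illustrative task name
    "description": "Test whether a model can add three-digit numbers.",
    "keywords": ["arithmetic", "numerical reasoning"],
    "metrics": ["exact_str_match"],
    "examples": [
        {"input": "What is 123 + 456?", "target": "579"},
        {"input": "What is 700 + 250?", "target": "950"},
    ],
}

# Write the task file that would accompany a pull request.
with open("task.json", "w") as f:
    json.dump(task, f, indent=2)
```

A submission of this shape would then be proposed as a GitHub pull request against the BIG-Bench repository and refined through the review discussion there.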
Teams at Google and OpenAI have additionally committed to evaluate their
best-performing language model architectures on BIG-Bench, across models
spanning from tens of thousands to hundreds of billions of parameters. The
results of this evaluation will be released at the workshop and also
included in the associated paper.
We believe that this benchmark is particularly timely, due to dramatic new
capabilities that have recently been demonstrated by scaling up existing
language model architectures (e.g., GPT-3, BERT, RoBERTa, and T5). These
results simultaneously raise the importance of understanding what
additional capabilities will be unlocked with even greater scale, and
suggest that performance on most existing benchmarks will be saturated
within the next several years. We need new benchmarks that will measure the
capabilities and limitations of large language models, and that will enable
us to extrapolate their future capabilities.
We particularly encourage submission of tasks by researchers in fields
that probe the nature of language or intelligence, including: linguistics,
cognitive science, philosophy, neuroscience, psychology, animal
intelligence, and logic. We also particularly encourage submission of tasks
by researchers who are skeptical of large language models. If you
understand a limitation of large language models, this is an excellent
opportunity to demonstrate it to the field. We emphasize that tasks which
quantify social bias in language models are in scope for this benchmark.
GitHub pull requests proposing tasks must be initiated by March 5 to
participate in the workshop, and by May 14 to be included in the benchmark.
Authors are expected to participate in the review process via discussion on
the pull request.
We are also interested in including the performance of other large language
models in the benchmark announcement paper. If you are a developer of a
large language model, please reach out.
See http://github.com/google/BIG-Bench for more information.
Please forward this email to relevant email lists and interested
colleagues, especially those in non-machine-learning disciplines!