The Agentur für Innovation in der Cybersicherheit GmbH (Cyberagentur) invites you to:


Partnering Event

"Holistic Evaluation of Generative Foundation Models in a Security Context (HEGEMON)"



31 July 2024, 1:30 - 3:30 pm


Please register by 26 July 2024 using this form.

THE SUBMISSION DEADLINE FOR THE PRESENTATION SLIDES IS 28 JULY 2024.

Further information below




"Holistic Evaluation of Generative Foundation Models in a Security Context (HEGEMON)"


Background

Generative AI applications such as ChatGPT or Midjourney are currently attracting a great deal of attention. These models can be used in a wide variety of application areas without prior technical knowledge, as they can generate complex and multimodal outputs (e.g. text, image, audio, video) from free-form inputs (prompts). Given this great application potential, increasing adoption of generative AI models in the domains of internal and external security is foreseeable. The foundation models behind generative AI applications are trained at great expense, mostly by private-sector companies in the USA and China, and can then be used for many tasks with little additional training. Their underlying data sets, training mechanisms and model architectures are usually not (or no longer) published. In the security context, the high application potential is therefore offset by a currently high level of technological dependency and by risks to cyber and application security.

Evaluations and comparisons in the form of benchmarks are useful for improving the assessment of the properties of externally trained models. However, due to the high versatility and unstructured outputs of these models, such evaluations pose a complex problem, one that takes on additional urgency in the security context. In view of the recent strong growth in the capabilities of large AI models, holistic benchmarking in particular remains an open and increasingly relevant research question.

Aim

The aim of the competition is to develop comprehensive benchmark sets, consisting of tasks, metrics and suitable test data sets, that enable a holistic evaluation of pre-trained generative foundation models (e.g. text-image models) for a given use case. In addition, foundation models are to be adapted to this use case (via fine-tuning or in-context learning), evaluated with the help of the various benchmarks developed, and implemented in the form of an application demonstrator. Beyond this, conceptual insights are to be gained into the fundamental problem of evaluating universally applicable AI systems.
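For illustration only, and not part of the call itself, the following minimal Python sketch shows one possible shape for a benchmark set as described above: tasks with test data, named metrics, and an evaluation loop for a single adapted model. All names (Task, BenchmarkSet, evaluate_model) are hypothetical.

```python
# Illustrative sketch only: a possible structure for a "benchmark set"
# (tasks, metrics, test data) and an evaluation loop for one adapted model.
# All names here are hypothetical and not part of the HEGEMON call.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    prompt: str      # free-form input to the generative model
    reference: str   # reference output used by the metrics


@dataclass
class BenchmarkSet:
    name: str
    tasks: List[Task]                                  # the test data set
    metrics: Dict[str, Callable[[str, str], float]]    # metric name -> scoring function


def evaluate_model(generate: Callable[[str], str], benchmark: BenchmarkSet) -> Dict[str, float]:
    """Average each metric over all tasks for one (fine-tuned or in-context-adapted) model."""
    totals = {name: 0.0 for name in benchmark.metrics}
    for task in benchmark.tasks:
        output = generate(task.prompt)
        for name, metric in benchmark.metrics.items():
            totals[name] += metric(output, task.reference)
    return {name: total / len(benchmark.tasks) for name, total in totals.items()}


# Toy usage: a placeholder "model" and a single exact-match metric.
if __name__ == "__main__":
    bench = BenchmarkSet(
        name="toy-check",
        tasks=[Task(prompt="2 + 2 =", reference="4")],
        metrics={"exact_match": lambda out, ref: float(out.strip() == ref)},
    )
    print(evaluate_model(lambda prompt: "4", bench))
```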

Disruptive Risk Research

The development of the benchmarks and the adaptation of the foundation models, as well as their demonstrator implementation, are carried out in a unique competitive constellation in which each participant is in direct comparison with all other participants in terms of both benchmark and model development. Each model is evaluated and ranked against all benchmarks developed, in-house and third-party alike. All benchmarks are also evaluated separately in terms of their characteristics. There is a possibility that no sufficiently suitable evaluation mechanisms will be found for certain AI systems under certain (holistic) requirements, as each benchmark is specific, finite and contextual.
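Again purely as an illustration, and not as part of the call, the sketch below shows the cross-evaluation idea described above: every participant's model is scored against every benchmark, own and third-party alike, and the models are ranked per benchmark. All names are hypothetical.

```python
# Illustrative sketch only: cross-evaluation of all models against all benchmarks,
# followed by a per-benchmark ranking. All names are hypothetical.
from typing import Callable, Dict, List

Model = Callable[[str], str]          # prompt -> output
Benchmark = Callable[[Model], float]  # model -> aggregate score


def cross_evaluate(models: Dict[str, Model],
                   benchmarks: Dict[str, Benchmark]) -> Dict[str, Dict[str, float]]:
    """Score matrix: one row per model, one column per benchmark."""
    return {m_name: {b_name: bench(model) for b_name, bench in benchmarks.items()}
            for m_name, model in models.items()}


def rankings(matrix: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Per-benchmark ranking of models, best score first."""
    benchmark_names = next(iter(matrix.values())).keys()
    return {b: sorted(matrix, key=lambda m: matrix[m][b], reverse=True)
            for b in benchmark_names}


# Toy usage with placeholder models and one trivial benchmark.
if __name__ == "__main__":
    models = {"team_a": lambda p: p.upper(), "team_b": lambda p: p}
    benchmarks = {"length_bench": lambda m: float(len(m("test prompt")))}
    matrix = cross_evaluate(models, benchmarks)
    print(matrix)
    print(rankings(matrix))
```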

Contact:

hegemon@cyberagentur.de