Connectionists: Fwd: Postdoc at MPI-SWS Saarbrücken in Reliable LLM-based Data Curation Systems

Joël Ouaknine joel at mpi-sws.org
Thu Oct 31 07:49:24 EDT 2024


(see: https://people.mpi-sws.org/~joel/postdoc-LLM-DCS.html )

About the project

We invite applications for a postdoctoral research position in the
Foundations of Algorithmic Verification group led by Prof. Joël Ouaknine.
The successful candidate will work in close collaboration with an
industrial partner, focusing on the verification of software programs
based on Large Language Models (LLMs) and helping to bridge scientific
research and applications.

*Project Insight:* We are embarking on a pioneering project that aims to
develop reliable LLM-based data curation systems for data verification and
data enrichment tasks such as verifying or discovering entity relationships
from textual documents and/or the Web.

An LLM-based data curation system deconstructs complex data problems into
manageable sub-problems, each addressed using LLMs. However, these models
can introduce uncertainties and errors, including hallucinations, which
hinder their adoption in industrial production environments where high
accuracy is critical.

Consider a knowledge graph enrichment system designed to identify or infer
relationships between two entities within a document. This system may
utilize a long-context LLM, capable of processing the entire document, or
employ a Retrieval Augmented Generation (RAG) process, including GraphRAG,
to pinpoint and analyze the most relevant information. However, research
suggests that both strategies can yield inaccuracies, presenting challenges
for their deployment in production environments.

This project aims to propose a verification methodology that ensures the
reliability and accuracy of an LLM-based data curation system at both the
sub-component and whole-program levels.

*Additionally*, the project will focus on several critical research areas:

   1. Effective retrieval of pertinent information from documents.
   2. Balanced integration of RAG and long-context LLMs to mitigate
   trade-offs.
   3. Detection and correction of "hallucinations" or incorrect inferences
   by LLMs.
   4. Verification of LLM-based reasoning to ensure result accuracy.
   5. Optimization of overall system efficiency.

The postdoctoral researcher will help define, develop, and refine this
methodology, and will assist in building a system optimized for data
curation using LLMs.

*Focus of the position:*

   1. Research and development of innovative verification methods to ensure
   the reliability and accuracy of LLM-based data curation programs.
   2. Active collaboration with industrial partners and engagement in the
   creative design and development of an LLM-based data curation system.

While the successful candidate will be hired by, and work at, the Max
Planck Institute for Software Systems in Saarbrücken, the position requires
frequent collaboration with, and visits to, research partners, in
particular TU Wien (Vienna, Austria), UCL (London), and University of
Calabria (Rende, Cosenza, Italy), as well as industrial partners. In
addition, the successful candidate is expected to complete one or more
internships in industry. The project will build on methods and software
provided by our
industrial partners. We are thus looking for a candidate who is keen and
able to liaise with industry, and who is interested in transformational
research, working on practical problems of industrial relevance.

Your qualifications and responsibilities

*Required:*

   - A PhD degree (earned or near completion) in algorithmic verification,
   machine learning, information extraction, large language models, databases,
   knowledge graphs, or a related field.
   - Strong algorithm design and coding skills, along with proficiency in
   popular ML development frameworks such as TensorFlow and PyTorch, and in
   frameworks for building LLM-based applications, such as LangChain and
   LlamaIndex.
   - A thorough understanding of the techniques underlying Large Language
   Models, and experience in fine-tuning or customizing such models.
   - Research publications in top-tier journals or conferences. In
   exceptional cases, industry experience with a solid background in
   industry-based software engineering that has led to highly innovative
   products or results could partially or fully replace the publication
   requirements.
   - Ability and willingness to liaise with industrial partners and to work
   on problems of practical relevance.
   - Ability to supervise students and/or research assistants.
   - Proficiency in written and spoken English. (Knowledge of German is not
   necessary.)

*Beneficial:*

   - Proficiency in RAG or GraphRAG-related techniques, along with
   experience in building RAG-based applications.
   - Research experience in topics relevant to generating accurate results
   with LLMs, including hallucination detection and correction.
   - Relevant experience in fields such as Information Extraction from
   Unstructured Text, Knowledge Graph Enrichment, Databases, or Fuzzy Logic.
   - Industrial experience, particularly in areas such as Big Data
   Engineering and MLOps, coupled with familiarity with cloud services such as
   AWS.
   - A product-oriented mindset and product design capabilities.
   - Experience with software verification.
   - Experience leading teams or projects, as well as supervising junior
   developers or researchers.

For informal enquiries, please contact Prof. Joël Ouaknine
(joel at mpi-sws.org).

To apply, please send a cover letter and CV by email to Ms. Lena Schneider
(lschneid at mpi-sws.org).

Applications will be reviewed until a suitable candidate is found. To
ensure full consideration, please submit your application on or before
*25 Nov. 2024*. We expect to hold online interviews in early December 2024.

-- 
*Joël Ouaknine*
Max Planck Institute for Software Systems
Saarland Informatics Campus, Germany
http://mpi-sws.org/~joel/ <http://people.mpi-sws.org/~joel/>