Connectionists: Pseudocode data and search engine
Giles, C Lee
clg20 at psu.edu
Tue Apr 8 08:28:20 EDT 2025
Colleagues
Announcing PseudoSeer: A Pseudocode Search Engine for Scholarly Papers
We are pleased to announce the launch of PseudoSeer, a specialized search engine for indexing
and searching pseudocode from scholarly papers. The goal is to make algorithmic content in the
academic literature easier for researchers to find. All pseudocode is located in and extracted
from documents on arXiv, and the extracted pseudocode data is available online.
Key Features: PseudoSeer offers several advanced search capabilities:
• Contextual Search: Employs LLMs to compute semantic similarity, enabling the retrieval of
contextually relevant search results. Retrieval performance is further enhanced by leveraging pseudocode
descriptions (i.e., captions) as positive anchors in contrastive representation learning to fine-tune the
LLM-based contextual retrieval model.
• BM25-Based Search: Employs the BM25 ranking function for keyword-based queries.
• Multi-Field Weighted Search: Allows queries across multiple document fields using weighted
BM25.
• Advanced Filtering: Offers date, author, and topic-based filters for refined searches.
• Hybrid Ranking: Combines sentence transformer embeddings with BM25 rankings for more
comprehensive results.
• Exact Query Search: Supports precise phrase matching.
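To make the keyword and hybrid features above concrete, here is a minimal Python sketch of BM25 scoring, a weighted sum of BM25 scores across fields, and one common way to merge two rankings. It is an illustration under stated assumptions, not PseudoSeer's implementation: the field names (caption, body), the field weights, the k1 and b values, and the use of reciprocal rank fusion to combine rankings are all assumptions, since the announcement does not specify them. The semantic ranking is hard-coded as a stand-in for the output of an embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize further.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return the Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def weighted_field_scores(query, records, weights):
    """Weighted sum of per-field BM25 scores (weights are assumed values)."""
    total = [0.0] * len(records)
    for field, weight in weights.items():
        field_scores = bm25_scores(query, [r[field] for r in records])
        total = [t + weight * s for t, s in zip(total, field_scores)]
    return total


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


# Hypothetical records; the field names are illustrative only.
records = [
    {"caption": "Dijkstra shortest path algorithm",
     "body": "priority queue relax edges"},
    {"caption": "Gradient descent update",
     "body": "compute the shortest step along the gradient"},
    {"caption": "Quick sort partition",
     "body": "swap elements around pivot"},
]

scores = weighted_field_scores("shortest path", records,
                               {"caption": 2.0, "body": 1.0})
keyword_ranking = sorted(range(len(records)), key=lambda i: -scores[i])
semantic_ranking = [0, 1, 2]  # placeholder for an embedding-based ranking
print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
```

Rank fusion is used here because it needs only the two orderings, not comparable scores; a weighted sum of normalized BM25 and cosine-similarity scores would be an equally plausible design.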
PseudoSeer indexes more than 150,000 scholarly documents and is tailored to the needs of researchers in
multiple fields, including but not limited to computer science, mathematics, physics, and biology.
PseudoSeer can be found at
https://pseudoseer.ist.psu.edu
We hope PseudoSeer will be a useful tool for researchers working with algorithms. Any feedback is welcome as we continue to improve the platform.
Data can be found at:
https://doi.org/10.7910/DVN/EX2OCT
The related paper can be found at:
https://arxiv.org/abs/2411.12649