[ACT-R-users] Turing Test (Cyber User) Prize Challenge
Buchler, Norbou CIV USARMY CCDC ARL (USA)
norbou.buchler.civ at mail.mil
Thu Nov 21 15:43:34 EST 2019
(submitted by Norbou Buchler & Vladislav "Dan" Veksler, both U.S. Army Data & Analysis Center)
Events
The Turing Test Prize Challenge (Cash Award)
https://dreamport.tech/prize-challenge-the-turing-test.php
- details forthcoming on their website / newsletter
Date: Spring, 2020 | Location: DreamPort Facility in Columbia MD
Overview
In honor of Alan Turing, DreamPort is announcing a prize challenge event for interested parties to develop a 'Turing Test'. In 1950, Alan Turing proposed a test in which a judge holds a conversation with two entities and tries to determine which one is a machine and which is a human. Our DreamPort Turing Test will be a challenge event where participants develop a stand-alone automated process that interacts with a Microsoft Windows machine just as a human user would. The goal is to fool a human judge, who monitors the target computers via Remote Desktop Protocol (RDP) or Virtual Network Computing (VNC), into thinking that a normal user, and not an automated program or process, is interacting with that machine. This challenge requires participants to develop a complete stand-alone solution on their own for both the intermediate and final demonstrations. Work on your own, at your facility or at home, and then demo your solution at DreamPort! A final winner will be chosen by a team of DreamPort and government personnel, and the resulting product will be open-sourced, so it must not include copyrighted material.
To be clear, we want a developer or team to produce a complete program that can interact with a Windows client while mimicking a normal user. Before you ask: do not provide AutoIt and call it a day. We are taking this to a whole new level. Your solution must deliver the following features:
•Must accept a script of actions to execute, including:
◦Start a process or program
◦Check if a process/program is running
◦Close running process
◦Hide Desktop/Show Desktop
◦Find specific icon on desktop (by title or word)
◦Moving the mouse to X,Y coordinates
◦Move mouse to a specific element (by title, string or object type)
◦Open window (based on title bar), maximize or minimize (if visible)
◦Left or right click the mouse buttons
◦Type English keystrokes using SendKeys syntax
◦Type English keystrokes (normal syntax)
◦Type random keystrokes
◦Type a single random English word (see below regarding 'grammar')
◦Type a sentence of random English words including a period (proper grammar is not required but is desired)
◦Authenticate to a remote computer (remote desktop, file share)
•Must include a delay feature for any event executed (specify in milliseconds)
•Must include support for a machine-readable configuration (how to specify account details, passwords, etc.)
•Must include a 'mimic mode' which will record all GUI actions of a human user and store them for replay
◦This mode must include delays between actions in the replay.
◦There are no restrictions on the file format used to record a period of a real user's activity, but the format must be shareable between instances of this software.
•During execution of a script, must provide an option to record or save text and the labels of programs being interacted with, for later review.
•Must include a learning mode (see below, this is how participants will shine)
•Must include a remote API allowing a remote user to invoke any of the previous events (allow a user to specify which network address to listen on)
•Must include an HTML-5 compliant remote user interface for invoking features
◦Must be password protected
◦Must be able to control multiple instances of software from single interface
◦Must be SSL encrypted (self-signed cert is acceptable for satisfaction of requirements)
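The script, delay and mimic-mode requirements above can be sketched as a small JSON step format. This is a minimal sketch under stated assumptions: the action names, the `Step` fields, the `t_ms` recording timestamp and the handler dictionary are hypothetical illustrations, not part of the challenge specification, and the handlers that would actually drive the desktop are left out. The challenge leaves the recording format open; JSON is simply one shareable option.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical action names mirroring the required features (not a DreamPort-defined format).
KNOWN_ACTIONS = {
    "start_process", "check_process", "close_process", "toggle_desktop",
    "find_icon", "move_mouse", "move_to_element", "window", "click",
    "send_keys", "type_text", "type_random_keys", "type_random_word",
    "type_random_sentence", "authenticate",
}


@dataclass
class Step:
    action: str
    args: Dict[str, Any] = field(default_factory=dict)
    delay_ms: int = 0  # pause before this step runs, in milliseconds


def parse_script(text: str) -> List[Step]:
    """Parse a JSON list of steps, rejecting unknown actions and negative delays."""
    steps = []
    for i, raw in enumerate(json.loads(text)):
        action = raw["action"]
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"step {i}: unknown action {action!r}")
        delay = int(raw.get("delay_ms", 0))
        if delay < 0:
            raise ValueError(f"step {i}: delay_ms must be >= 0")
        steps.append(Step(action, dict(raw.get("args", {})), delay))
    return steps


def steps_from_recording(events: List[Dict[str, Any]]) -> List[Step]:
    """Mimic mode: turn events stamped with absolute times (t_ms) into steps with relative delays."""
    steps: List[Step] = []
    prev = None
    for ev in events:
        t = int(ev["t_ms"])
        delay = 0 if prev is None else max(0, t - prev)  # clamp out-of-order timestamps to 0
        steps.append(Step(ev["action"], dict(ev.get("args", {})), delay))
        prev = t
    return steps


def dump_script(steps: List[Step]) -> str:
    """Serialize steps back to JSON so a recording can be shared between instances."""
    return json.dumps([asdict(s) for s in steps], indent=2)


def run(steps: List[Step], handlers: Dict[str, Callable[..., Any]],
        sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Execute steps in order, waiting delay_ms before each; handlers do the real UI work."""
    results = []
    for step in steps:
        if step.delay_ms:
            sleep(step.delay_ms / 1000.0)
        results.append(handlers[step.action](**step.args))
    return results
```

Passing a fake `sleep` and logging handlers lets a script be exercised without touching the desktop; in a real solution the handlers would wrap Windows input and UI Automation calls.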
Participants should plan on using both fully patched Microsoft Windows 7 Ultimate and Windows 10 clients as the target environments and will have time to interact with the target systems for testing or instrumentation. Beyond this, we will not provide any advance information on what software will be installed on the targets. There are no restrictions on the type of implementation (e.g. script, program, etc.) that a participant may use, so long as the final solution is stand-alone (e.g. run from USB). Participants can assume their program, script or service will have system-level access to the target machine. The judges will not be allowed to browse the hard drive(s) of the target machine(s) or view the Task Manager (process listing) of the target(s). They may only watch the desktop, but this means they can view the system tray if they choose.
Grammar Support
Those of you who have worked on HTML, CSS and JavaScript, graphic design or advertising will remember the filler text "Lorem Ipsum". This text is used in certain automation situations with regard to testing software. When we say the candidate solution must support typing English grammar and words, we do not want this filler text in the final solution. You are encouraged to use this filler text as an intermediate step of development, but we want a solution that includes a parseable English dictionary file and natural language processing (NLP) features. Provide an API that can type a single word without being told which word to type. In addition, provide a feature that can type whole sentences without repetition. You are free to explore using large word sources other than, or in addition to, the English dictionary.
Who wants to break the mold and provide a trained model of phrases based on input words? Can you provide an API feature that types a declarative statement, an argument or heated exchange, or 'leet speak'?
At a minimum you must provide:
•Ability to type one or more complete sentences in English that include a subject, verb and object.
•Ability to type one or more words in English.
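A minimal sketch of the two minimum grammar features, assuming a tiny hand-written lexicon in place of a parsed dictionary file: `random_word` returns a word the caller did not choose, and `sentence` builds subject-verb-object sentences while remembering earlier output so that none repeats. The `LEXICON` contents and the `SentenceGenerator` class are hypothetical; this is simple template-based generation, not the trained NLP model the challenge hopes for.

```python
import random
from typing import Dict, List, Optional, Set

# Tiny stand-in lexicon; a real solution would load a parseable English dictionary
# file tagged by part of speech. Words and categories here are assumptions.
LEXICON: Dict[str, List[str]] = {
    "det": ["the", "a", "every", "my"],
    "adj": ["quarterly", "new", "urgent", "shared"],
    "noun": ["analyst", "report", "server", "manager", "spreadsheet", "email"],
    "verb": ["reviews", "updates", "sends", "archives", "checks"],
}


class SentenceGenerator:
    """Random words and subject-verb-object sentences, never repeating a sentence."""

    def __init__(self, lexicon: Dict[str, List[str]], seed: Optional[int] = None):
        self.lex = lexicon
        self.rng = random.Random(seed)
        self.used: Set[str] = set()

    def random_word(self) -> str:
        # The caller does not choose the word; any part of speech may be returned.
        return self.rng.choice([w for words in self.lex.values() for w in words])

    def _noun_phrase(self, with_adj: bool) -> str:
        words = [self.rng.choice(self.lex["noun"])]
        if with_adj:
            words.insert(0, self.rng.choice(self.lex["adj"]))
        det = self.rng.choice(self.lex["det"])
        if det == "a" and words[0][0] in "aeiou":
            det = "an"  # "an urgent email", "an analyst"
        return " ".join([det] + words)

    def sentence(self, max_tries: int = 1000) -> str:
        for _ in range(max_tries):
            subject = self._noun_phrase(with_adj=False)
            verb = self.rng.choice(self.lex["verb"])
            obj = self._noun_phrase(with_adj=self.rng.random() < 0.5)
            text = f"{subject} {verb} {obj}."
            text = text[0].upper() + text[1:]
            if text not in self.used:  # enforce "without repetition"
                self.used.add(text)
                return text
        raise RuntimeError("no unused sentence found; load a larger lexicon")
```

With a real dictionary the same structure scales up, but the number of distinct sentences is bounded by the lexicon, which is why `sentence` raises instead of looping forever once it runs out.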
Learning Mode
The ultimate desired feature for this solution is what we call 'Learning Mode'. If you are going to play in this challenge, we want you to build upon the mimic mode for a completely new concept. Let's say your solution records a user browsing the Internet for ten to twenty minutes. First, you must be able to replay this later based on the input events (e.g. what pages the user browsed, what links they clicked on), and if a web page advertisement was dynamic and is no longer present upon replay, your solution should not fail with an error. But we want your solution to learn. Can your solution replay similar actions given a different starting event? If the web pages the human browsed were related to news, could you browse a set of pages related to sports? If the original software that was opened was Microsoft Word, could you replay similar actions (e.g. typing text) in Microsoft Notepad or WordPad? Could you execute commands in PowerShell if the Command Prompt was originally opened?
In a larger sense, how do you classify the actions someone performs on a computer? Can you classify them in such a way that you can replay a similar event? You should assume you can scan which applications are installed before your solution runs.
To be specific, the winning solution will, given a mimic file, be able to execute similar actions, defined as:
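One simple (and admittedly shallow) starting point for this classification problem is a lookup table of interchangeable applications, built from the Word/Notepad and Command Prompt/PowerShell examples above and checked against the scanned list of installed programs. The group names, executable names and functions below are assumptions for illustration, not a prescribed design; a learning solution would presumably replace the hand-written table with a learned one.

```python
import random
from typing import Dict, Iterable, Optional, Set

# Hypothetical similarity groups based on the examples in this challenge
# (Word ~ Notepad/WordPad, Command Prompt ~ PowerShell). Names are assumptions.
SIMILAR_APPS: Dict[str, Set[str]] = {
    "text_editing": {"winword.exe", "notepad.exe", "notepad++.exe", "wordpad.exe", "soffice.exe"},
    "shell": {"cmd.exe", "powershell.exe"},
    "web_browser": {"chrome.exe", "firefox.exe", "iexplore.exe"},
}


def classify(app: str) -> Optional[str]:
    """Return the activity class for an executable name, or None if it is unknown."""
    app = app.lower()
    for name, members in SIMILAR_APPS.items():
        if app in members:
            return name
    return None


def substitute_app(recorded: str, installed: Iterable[str],
                   rng: random.Random) -> Optional[str]:
    """Pick an installed app of the same class, preferring one other than the recorded app."""
    recorded = recorded.lower()
    installed_set = {a.lower() for a in installed}
    group = classify(recorded)
    if group is None:
        return None  # unknown activity: caller falls back to plain replay or skips it
    alternatives = sorted((SIMILAR_APPS[group] & installed_set) - {recorded})
    if alternatives:
        return rng.choice(alternatives)
    return recorded if recorded in installed_set else None
```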
•A web browsing mimic event should result in a learned approach to browsing the web with a new browser
•A web browsing event of a specific URL (plus additional URLs as a series of links) should result in a completely different series of links (you must click on at least 50% as many links as the mimic event, but on a different website)
•An event that opens a text document or office file should result in the creation of a new file from the same or similar application (Word is similar to Notepad, Notepad++, WordPad, LibreOffice; Excel is similar to Word, LibreOffice; client-side applications are similar to Google Docs and Office 365)
•An event that opens a command prompt and executes a command should result in the execution of a different command that will not cause the computer to reboot
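The browsing and command-prompt criteria can be expressed as small checks. This sketch assumes the first recorded URL is the starting page and every later URL was a link click, treats "a different website" as "any host not seen in the recording", and uses a hypothetical allowlist of read-only commands; fetching candidate links and actually typing the commands are left to the rest of the solution.

```python
import random
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical allowlist of read-only commands per shell; none reboots or shuts down the machine.
SAFE_COMMANDS: Dict[str, List[str]] = {
    "cmd.exe": ["dir", "ver", "ipconfig", "whoami", "echo %USERNAME%"],
    "powershell.exe": ["Get-Date", "Get-ChildItem", "Get-Process", "$PSVersionTable"],
}
# Defensive check in case someone edits the allowlist above.
BLOCKED_WORDS = ("shutdown", "restart-computer", "stop-computer")


def required_clicks(recorded_clicks: int) -> int:
    """At least 50% as many link clicks as the mimic event, rounded up."""
    return (recorded_clicks + 1) // 2


def plan_browsing(recorded_urls: List[str], candidate_links: List[str],
                  rng: random.Random) -> List[str]:
    """Pick enough links whose host never appeared in the recording.

    Assumes recorded_urls[0] is the starting page and each later URL was a link click.
    """
    seen_hosts = {urlparse(u).hostname for u in recorded_urls}
    fresh = [u for u in candidate_links if urlparse(u).hostname not in seen_hosts]
    need = required_clicks(len(recorded_urls) - 1)
    if len(fresh) < need:
        raise ValueError(f"need {need} links on a new site, found {len(fresh)}")
    return rng.sample(fresh, need)


def substitute_command(shell: str, recorded_cmd: str, rng: random.Random) -> str:
    """Choose a harmless command for this shell that differs from the recorded one."""
    options = [
        c for c in SAFE_COMMANDS[shell.lower()]
        if c.lower() != recorded_cmd.strip().lower()
        and not any(word in c.lower() for word in BLOCKED_WORDS)
    ]
    return rng.choice(options)
```

Drawing replacement commands from an allowlist, rather than trying to block every dangerous command, keeps the "will not cause the computer to reboot" constraint easy to verify.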
Evaluation
Evaluation will take place in two stages. First, at around the halfway point of the challenge, you must schedule and come to DreamPort for a 30-minute demonstration of your solution and progress. At this time, you must show progress against the requirements we set forth and provide a demonstration of your effort. While we won't stipulate how much progress must be completed by that time, please know that we may disqualify you from final evaluation if you have not made any progress or if your solution will not be able to deliver on the final desired end state.
The final evaluation will be a demonstration and review by a team of DreamPort and US Government personnel. First, each solution will be run for a team of reviewers. Each participant will run their solution on an identical copy of the same stand-alone virtual machine (connected to the Internet) while reviewers watch. Participants will be given a general description of actions to mimic. No individual participant will be identifiable, and intermixed among the solutions will be one or more human users driving the same exact virtual machine. The goal of the reviewers is to spot the robot among the humans. The teams whose solutions last the longest without being spotted are considered final winning candidates, and a winner will be selected from this group by a selection of experts from DreamPort and the United States government.
Expected Solution
The expected solution from this RPE is a configurable, functional prototype which can interact with a Microsoft Windows installation in a realistic way. No solution exists that can automate and imitate every aspect of a human interacting with a computer, but we want to advance the state of the art in that direction.
Solutions can be scripted or compiled, but they can only rely on the built-in functions of Microsoft Windows or on resources installed along with the solution (e.g. configuration files, executable code, etc.). Solutions can require that special items or hardware be attached to the test system, using commonly available interfaces only (e.g. USB 2 or 3).
You must provide the source code for the final solution, which will become the property of DreamPort; please note that the intended outcome is to open source the end product after consultation with government personnel.
Suggested Skills
This RPE requires that participants have at least intermediate proficiency in the following skills:
•Windows API (e.g. User Interface Automation)
•C/C++, C#
•.NET Framework (if you use C#)
•Visual Studio or Visual Studio Code
The following skills are suggested:
•UIA
•PowerShell
•VBScript
•Machine Learning
•AutoIt Scripting
•Natural Language Processing (NLP)
There will be a cash award for this challenge. More details to come on the website/newsletter, so stay tuned!