ChatGPT MWP Challenge
Can we predict ChatGPT failures on DRAW-1K? Pre-register for the challenge here!
Metric: Precision at recall = 0.5 for predicting that ChatGPT was totally incorrect on our Feb. 2023 dataset (Plus results, with work).
Download the slides from AAAI-MAKE or read the paper on arXiv.
Submissions can use the ground-truth equations, any data from the problem, and any text output by ChatGPT
Baseline is 0.78 (Random Forest)
Bonus: Provide an explainable (preferably symbolic) result
Deadlines and submission instructions are forthcoming; check AAAI-MAKE or this page for updates. Pre-registration here.
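
To make the metric above concrete, here is a minimal sketch of one way to compute precision at recall = 0.5 from a list of failure scores. The function name `precision_at_recall`, the toy labels and scores, and the handling of tied scores are all assumptions made for illustration; the official scoring script may use a different convention.

```python
from typing import Sequence


def precision_at_recall(labels: Sequence[int], scores: Sequence[float],
                        target_recall: float = 0.5) -> float:
    """Precision at the first score cutoff whose recall reaches target_recall.

    labels: 1 if ChatGPT was totally incorrect (the positive class), else 0.
    scores: predicted failure scores; higher means "more likely incorrect".
    Hypothetical helper, not the challenge's official scorer.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank problems from most to least likely to be a failure.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    i = 0
    while i < len(ranked):
        # Consume every item tied at this score so the cutoff is a real threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        if true_pos / total_pos >= target_recall:
            # j items are flagged as failures at this cutoff.
            return true_pos / j
        i = j
    raise ValueError("target recall is unreachable")


# Toy data (made up): 4 of the 8 problems are true failures.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(precision_at_recall(labels, scores))
```

In the toy data, recall first reaches 0.5 after the three highest-scoring problems, two of which are true failures, so the reported precision is 2/3. A submission would be compared against the 0.78 baseline on this kind of number, computed on the actual dataset.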