ChatGPT MWP Challenge

Can we predict ChatGPT failures on DRAW-1K? Pre-register for the challenge here!

Metric: Precision at recall = 0.5 for predicting that ChatGPT was totally incorrect on our Feb. 2023 dataset (ChatGPT Plus results, with work shown).
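As an illustration of the metric, here is a minimal sketch of one way to compute precision at a fixed recall. It assumes each problem has a binary label (1 if ChatGPT was totally incorrect) and a predicted failure score, and it reads "precision when recall=0.5" as the precision at the first ranked cutoff where recall reaches at least 0.5. The function name `precision_at_recall` and that interpretation are assumptions for illustration; the challenge's official scoring may differ.

```python
from typing import Sequence


def precision_at_recall(
    labels: Sequence[int],
    scores: Sequence[float],
    target_recall: float = 0.5,
) -> float:
    """Precision at the first ranked cutoff whose recall reaches target_recall.

    labels: 1 if ChatGPT's answer was totally incorrect (positive class), else 0.
    scores: predicted failure scores; higher means "more likely a failure".
    Hypothetical helper, not the official challenge scorer.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive example")

    # Rank problems from most to least likely to be a failure.
    # Tied scores keep their input order (sorted() is stable).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    for predicted_count, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            # Everything ranked so far is predicted "failure".
            return true_positives / predicted_count

    # Only reached if target_recall > 1; fall back to predicting everything.
    return total_positives / len(ranked)


# Toy data (made up for illustration): 4 failures among 6 problems.
example_labels = [1, 0, 1, 1, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(precision_at_recall(example_labels, example_scores), 3))  # 0.667
```

In the toy data, recall first reaches 0.5 after the top three predictions, two of which are true failures, so the precision is 2/3.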

Download the slides from AAAI-MAKE or read the paper on arXiv.

Entries may use the ground-truth equations, any data from the problem, and any text output by ChatGPT.

The baseline is 0.78 (Random Forest).

Bonus: Provide an explainable (preferably symbolic) result

Deadlines and submission instructions are forthcoming. Check AAAI-MAKE or this page for updates; pre-registration here.

Click here to pre-register for the challenge!