A benchmark that tests how well AI coding agents can read web content. 10 tests, 20 points.