One major problem the researchers found is that “Many benchmarks are not valid measurements of their intended targets.” That ...