Building GenAI Benchmarks: A Case Study in Legal Applications