new evaluation metrics and improved training