How to Distinguish Between Helpful and Harmful Responses