Response is being content_filtered even though every thing is safe and not filtered

Question

I'm getting this response when I call Azure open AI service:

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'content_filter', 'index': 0, 'logprobs': None, 'message': {'role': 'assistant'}}], 'created': 1725965767, 'id': 'chatcmpl-A5spTE3rMOIkgodzQ6lcKtzQFPY7E', 'model': 'gpt-4', 'object': 'chat.completion', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'system_fingerprint': 'fp_e49e4201a9', 'usage': {'completion_tokens': 167, 'prompt_tokens': 357, 'total_tokens': 524}}

The root cause is the word "muncher" in our prompt. When I changed that to "munch" we are getting good response ('finish_reason': 'stop' and 'content' is in message). But I dont understand why we are getting this even though severity of all four(hate, self-harm, violence and sexuality) is 'safe'.

Answer

Hello Soumith Reddy Aireddi,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you are having discrepancies in the content_filter of your Azure OpenAI.

This will be a misalignment or issue in how the filtering system communicated the status of the content. I will suggest you use clear and unambiguous and since synonyms like "munch" can bypass the filter, it indicates that the prompt’s specific wording can influence the filtering outcome. So, to avoid triggering the filter you will consider rephrasing by review and adjust your prompts to avoid terms that might trigger the content filter.

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful

Answer

Hello Soumith,

Thanks for following up, yes, we have already forwarded this feedback to product team and it should be fixed in next revise, 10/15. If the issue still there after next revise, please let us know.

I hope this helps.

Regards,

Yutong

-Please kindly accept the answer if you feel helpful to support the community, thanks a lot.

Share via

Response is being content_filtered even though every thing is safe and not filtered

2 answers

Your answer