Response is being content_filtered even though every thing is safe and not filtered

Soumith Reddy Aireddi 0 Reputation points
2024-09-10T11:29:40.0166667+00:00

I'm getting this response when I call Azure open AI service:

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'content_filter', 'index': 0, 'logprobs': None, 'message': {'role': 'assistant'}}], 'created': 1725965767, 'id': 'chatcmpl-A5spTE3rMOIkgodzQ6lcKtzQFPY7E', 'model': 'gpt-4', 'object': 'chat.completion', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'system_fingerprint': 'fp_e49e4201a9', 'usage': {'completion_tokens': 167, 'prompt_tokens': 357, 'total_tokens': 524}}

The root cause is the word "muncher" in our prompt. When I changed that to "munch" we are getting good response ('finish_reason': 'stop' and 'content' is in message). But I dont understand why we are getting this even though severity of all four(hate, self-harm, violence and sexuality) is 'safe'.

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
2,949 questions
Azure AI Content Safety
Azure AI Content Safety
An Azure service that enables users to identify content that is potentially offensive, risky, or otherwise undesirable. Previously known as Azure Content Moderator.
16 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Sina Salam 10,176 Reputation points
    2024-09-10T22:41:00.9733333+00:00

    Hello Soumith Reddy Aireddi,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having discrepancies in the content_filter of your Azure OpenAI.

    This will be a misalignment or issue in how the filtering system communicated the status of the content. I will suggest you use clear and unambiguous and since synonyms like "munch" can bypass the filter, it indicates that the prompt’s specific wording can influence the filtering outcome. So, to avoid triggering the filter you will consider rephrasing by review and adjust your prompts to avoid terms that might trigger the content filter.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful


  2. YutongTie-MSFT 50,866 Reputation points
    2024-09-16T07:16:40.8266667+00:00

    Hello Soumith,

    Thanks for following up, yes, we have already forwarded this feedback to product team and it should be fixed in next revise, 10/15. If the issue still there after next revise, please let us know.

    I hope this helps.

    Regards,

    Yutong

    -Please kindly accept the answer if you feel helpful to support the community, thanks a lot.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.