Evaluating neural toxic degeneration in language models

This paper highlights how language models used to automatically generate text produce toxic, offensive and potentially harmful language. They describe various techniques that can be employed to avoid or limit this, but demonstrate that no current method is failsafe in preventing this entirely.

Share this resource:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.