OpenAI, the leading AI research organization, has been deliberating at length over whether to release a tool that can determine if an image was created with DALL-E 3, its generative AI art model. According to Sandhini Agarwal, an OpenAI researcher specializing in safety and policy, the classifier's accuracy is “really good,” but it has not yet met OpenAI's stringent quality standards.
Key Takeaway
OpenAI is grappling with how to release a reliable tool that could significantly shape how images are interpreted, particularly whether they are judged authentic or misleading.
Mira Murati, OpenAI’s chief technology officer, said during the Wall Street Journal’s Tech Live conference that the classifier identifies unmodified images generated by DALL-E 3 with 99% reliability. Agarwal declined to confirm whether OpenAI is holding out for a higher threshold, such as 100% accuracy. Notably, the classifier maintains an accuracy of over 95% even when images undergo common modifications such as cropping, resizing, or JPEG compression, or when elements from real images are superimposed on generated ones.
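Those robustness figures describe how often the detector still flags a generated image after common edits. As an illustrative sketch only (this is not OpenAI's evaluation pipeline, and `detect_is_generated` is a hypothetical stand-in for the classifier), such a check might look like this in Python:

```python
# Illustrative robustness check for an AI-image detector (not OpenAI's pipeline).
# `detect_is_generated` is a hypothetical stand-in for a classifier like the one described.
import io
from PIL import Image

def detect_is_generated(image: Image.Image) -> bool:
    """Placeholder: return True if the classifier flags the image as AI-generated."""
    raise NotImplementedError("stand-in for the real classifier")

def common_modifications(image: Image.Image) -> dict[str, Image.Image]:
    """The edits mentioned in the article: cropping, resizing, JPEG compression."""
    w, h = image.size
    cropped = image.crop((w // 10, h // 10, w * 9 // 10, h * 9 // 10))
    resized = image.resize((max(1, w // 2), max(1, h // 2)))
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=60)
    recompressed = Image.open(io.BytesIO(buffer.getvalue()))
    return {"cropped": cropped, "resized": resized, "jpeg_q60": recompressed}

def robustness(generated_images: list[Image.Image]) -> dict[str, float]:
    """Fraction of modified generated images the detector still flags, per modification."""
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for img in generated_images:
        for name, modified in common_modifications(img).items():
            totals[name] = totals.get(name, 0) + 1
            hits[name] = hits.get(name, 0) + int(detect_is_generated(modified))
    return {name: hits[name] / totals[name] for name in totals}
```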
The hesitation may stem from the controversy surrounding OpenAI's earlier public classifier for detecting AI-generated text, which the company withdrew after widespread criticism of its low accuracy. Agarwal also hinted that the philosophical question of what counts as an AI-generated image is part of the deliberation. Artwork produced entirely by DALL-E 3 is unambiguously AI-generated, but the classification becomes murkier when an image goes through multiple iterations, is merged with other images, or receives post-processing enhancements.
Agarwal emphasized the importance of gathering feedback from artists and others significantly affected by such classifier tools in order to navigate these questions. Meanwhile, other organizations are exploring watermarking and detection techniques to address the rise of AI deepfakes. DeepMind, for instance, has proposed SynthID, a method for marking AI-generated images with watermarks that are imperceptible to humans but detectable by specialized detectors. Startups such as Imatag and Steg.AI also offer watermarking tools that they claim resist image modifications.
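SynthID's actual embedding scheme has not been disclosed. As a toy illustration of the general idea of a mark that is invisible to the eye but recoverable by a detector, here is a minimal least-significant-bit watermark sketch; the payload and pixel positions are arbitrary assumptions, and real systems are designed to survive the kinds of edits this toy would not:

```python
# Toy least-significant-bit (LSB) watermark: invisible to the eye, recoverable by a
# detector that knows where to look. Purely illustrative; SynthID's scheme is not
# public, and production watermarks are built to withstand edits this toy cannot.
import numpy as np

PAYLOAD = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # hypothetical 8-bit mark

def embed(pixels: np.ndarray) -> np.ndarray:
    """Write the payload into the red-channel LSBs of the first pixels of the top row."""
    marked = pixels.copy()
    n = len(PAYLOAD)
    marked[0, :n, 0] = (marked[0, :n, 0] & 0xFE) | PAYLOAD
    return marked

def detect(pixels: np.ndarray) -> bool:
    """Return True if the expected payload is present at the agreed positions."""
    n = len(PAYLOAD)
    return bool(np.array_equal(pixels[0, :n, 0] & 1, PAYLOAD))

# Usage: marked = embed(np.asarray(rgb_image, dtype=np.uint8)); detect(marked) -> True
```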
However, the industry has yet to settle on a unified watermarking or detection standard, which raises further concerns about reliability. Asked whether OpenAI's image classifier would support images generated by non-OpenAI tools, Agarwal did not commit to an answer, but she acknowledged that, depending on how the current classifier is received, OpenAI might explore extending it to detect images created by other generative tools.