In a previous post I summarized some of the limitations I noted during conversations with an AI-based “ChatBot”. Before offering my conclusions in my next post, I need to reset the context of my study in light of what I have learned, recent technology announcements and wider reporting on the experience of others using these tools.
First, we are witnessing a fast-paced technology evolution. The limitations I observed over the last couple of months in three tools (ChatGPT, You.Com and ChatSonic) appear to have been overcome in Microsoft’s recently available New Bing ChatBot, which I have been testing for the last week. Specifically, its answers are more deterministic and, importantly, they appear to be more complete. This speaks to the dangers of generalizing from a small sample set, as well as the dynamic nature and pace of progress of this technology.
However, in covering the release of the New Bing, several publications, including Atlantic Magazine, the Washington Post, the New York Times, CBC and The Toronto Star, among many others, have reported concerns ranging from ethical issues to the bot’s apparent paranoid and possibly malignant character. This speaks to the broader implications of such a complex technology, what some might call the unintended consequences, that need to be worked through. It has been said that not even the programmers know what will come out of these learning machines. I’m left with the impression that results are fluid or, more to my fundamental point, that outcomes are non-deterministic. If this is true, can they be coherent?
My goal was to determine the coherence of images created by text-to-image tools, such as MidJourney. Struggling with the question of how to measure generated images, I chose to explore ChatBots as a surrogate, like a mouse in a drug trial, on the assumption that coherence is easier to see in text than in images. A second assumption is that coherence in one technology carries over to the other. I don’t have the technical capabilities to determine whether this is true or false.
Even though this may be a fundamental flaw, going through the test has been insightful, and it led me to ask the ChatBot what characteristics I should look for to determine the coherence of a painting. Its answer was straightforward and may be obvious:
- Composition: Look for a clear and organized arrangement of visual elements such as lines, shapes, colors, and textures. A well-composed painting should have a harmonious balance between its various elements.
- Color scheme: Look for a consistent color scheme throughout the painting. A painting with a coherent color scheme will use colors that complement each other and create a unified visual impression.
- Style: Look for a consistent artistic style in the painting. A painting with a coherent style will have a consistent use of techniques, brushstrokes, and other stylistic elements.
- Subject matter: Look for a clear and recognizable subject matter in the painting. A painting with a coherent subject matter will convey a clear message or theme that ties all the visual elements together.
- Mood or tone: Look for a consistent mood or tone in the painting. A painting with a coherent mood or tone will evoke a consistent emotional response from the viewer.
- Depth and perspective: Look for a consistent use of depth and perspective in the painting. A painting with a coherent sense of depth and perspective will create a sense of space and dimensionality that ties all the visual elements together.
While I think there are other dimensions to consider, including cultural factors, symbolism and concepts, I feel I have a checklist I can use to evaluate the coherence of generated images.
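A checklist like this could be recorded in a few lines of Python. This is only a minimal sketch: the 0-to-5 rating scale, the unweighted average and the names `CRITERIA`, `MAX_SCORE` and `coherence_score` are assumptions made for illustration, not something the ChatBot proposed. The ratings themselves would still come from a human viewer.

```python
# Hypothetical rubric: the six criteria come from the checklist above;
# the 0-5 scale and the equal weighting are illustrative assumptions.
CRITERIA = (
    "composition",
    "color_scheme",
    "style",
    "subject_matter",
    "mood_or_tone",
    "depth_and_perspective",
)

MAX_SCORE = 5  # assumed top rating for each criterion


def coherence_score(ratings):
    """Return the unweighted average rating, normalized to the range 0.0-1.0.

    `ratings` maps each criterion name to an integer from 0 to MAX_SCORE.
    """
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError("missing ratings for: " + ", ".join(missing))
    for name in CRITERIA:
        if not 0 <= ratings[name] <= MAX_SCORE:
            raise ValueError(name + " must be between 0 and " + str(MAX_SCORE))
    total = sum(ratings[name] for name in CRITERIA)
    return total / (len(CRITERIA) * MAX_SCORE)


if __name__ == "__main__":
    # Made-up ratings for one generated image, purely as a usage example.
    example = {
        "composition": 4,
        "color_scheme": 3,
        "style": 5,
        "subject_matter": 2,
        "mood_or_tone": 4,
        "depth_and_perspective": 3,
    }
    print(round(coherence_score(example), 2))
```

Equal weighting is the simplest choice. The other dimensions mentioned above, such as cultural factors or symbolism, could be added as extra entries in `CRITERIA` without changing the rest of the sketch.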