Category: Easy Wins

These are posts that go into the Easy Wins blog.

  • How “AI” Image Generators Work

    How “AI” Image Generators Work

    I’m writing this article to help artists and users understand what the current crop of AI image generators do and how they work. I won’t dive into technical details. This explanation is “right enough” for a 5,000 foot discussion of how to work with the technology and to address high level concerns, such as whether training content is “stolen”.

    I’ll be using the picture attached to this post for discussion. It was generated by xAI’s Grok using the following prompt:

    Mona is a 6 year old white Boxer dog. 
    Johnny is a 2 year old Jack Russell Corgi mix. Johnny is black and white with brown highlights. 
    Eeyore is the popular children's book character.
    Paul Bunyan is the popular children's book character.
    Make a picture where these characters play a high stakes game of go fish to raise money for renovating the dog park.

    I used a similar cue — “Make a story” — with a private IBM Granite 3 Instruct LLM running on my laptop to create a delightful story that the picture will accompany.


    Delight and Realism

    The first thing to notice about the picture is that it truly is a delightful depiction of a scene I had in mind. The Boxer looks passably like Mona, or at least her stunt double. The little dog looks amazingly like Johnny. More on that in a future post. Paul Bunyan and Eeyore are out of focus behind the main scene. Johnny and Mona seem to be playing a hybrid of Go and some card game. Literalist critics would derisively call that a “hallucination”, but to me, that’s delightful and funny. You are allowed to smile when you use these tools!

    The second thing to notice is that it’s got a well-executed photorealistic feel to it. Obviously, it’s not real, but with costumes, stage dog training, and fortunate timing, you could imagine capturing a shot like that. A Photoshop hack like me or even a professional would have a hard time putting that scene together seamlessly.


    Visualization

    The first conclusion to draw about the technology is that it is good at bringing ideas to life. If you can express the idea in a few sentences, it can probably render something close, even close enough. That makes it a powerful tool for both visualization and prototyping.

    While it might feel wrong to use an AI image generator for a final, professional product, and doing so might cast doubt on the workmanship that went into that product, it is quite valuable in the iterative process of drafting and refining. This applies to all kinds of activities that generative AI can do, from pictures to stories to analysis to code and beyond. Generative AI is excellent at providing samples and templates that can serve as placeholders for production work or even be adapted into finished products.


    How It’s Made

    The picture itself is created using a model and an algorithm. The model is a large collection of weights: billions of statistical measurements about billions of images in a training set. The algorithm is called diffusion. It starts with random pixels — think of a noise pattern on an old tube television, like the picture on the left, courtesy of Grok.

    The algorithm iteratively changes the pixels so that the picture looks more and more like what was requested. The picture on the right is that same old tube television tuned into another picture of Mona and Johnny playing Go Fish. That mashup was assembled in Photoshop.

    “Looks more and more like…” — That is the key! Again, at a high level, the algorithm takes each step between all noise and all picture and “asks” the model whether the picture looks more like what is described in the prompt. For example: Is there a white Boxer dog? Is there a Jack Russell Corgi mix, black and white with brown highlights? Are they playing Go Fish? The components that answer these yes/no questions are called “classifiers”. They are part of the model.
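
    If you like to see ideas as code, here is a deliberately tiny Python sketch of that loop. It is not how a real diffusion model is implemented (real systems use trained neural networks for the denoising and the classifying, plus a learned noise schedule); the names and the scoring function here are made up for illustration. It only shows the core idea of nudging random pixels toward whatever scores better against the prompt.

    # Toy illustration of the noise-to-picture loop described above.
    # Purely conceptual: the "classifier" is a stand-in, not a trained network.
    import numpy as np

    rng = np.random.default_rng(0)

    def toy_classifier_score(image, target):
        # Stand-in for "does this look like a white Boxer?"
        # Higher means "looks more like what was asked for".
        return -np.mean((image - target) ** 2)

    def generate(target, steps=500, step_size=0.05):
        # Start with pure noise, like static on an old tube television.
        image = rng.random(target.shape)
        for _ in range(steps):
            # Propose a small random change to the pixels.
            candidate = image + step_size * rng.normal(size=image.shape)
            # Keep the change only if the picture now scores better.
            if toy_classifier_score(candidate, target) > toy_classifier_score(image, target):
                image = candidate
        return image

    # A hypothetical 8x8 "target concept" standing in for the prompt.
    target = rng.random((8, 8))
    print("final score:", toy_classifier_score(generate(target), target))

    Run it a few times and you can watch the score improve as the noise drifts toward the target, which is the whole trick at its most stripped down.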

    When the model is trained on millions or billions of images, each image has a description, often in English, associated with it. It could also have a more structured description, with pre-assigned category names and the like. The original work of classifying images was, and still is, done by humans, often as very low-paying work performed by low-skilled workers. That may or may not give you pause, or paws if you’re a dog. The situation is more nuanced than any critical documentary shows.
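
    To make “an image with a description” concrete, here is a hypothetical training record sketched as a Python dictionary. The field names and values are invented for illustration; real training sets vary widely in how much structure they carry.

    # Hypothetical example of one labeled training record.
    # Field names and values are made up for illustration only.
    training_record = {
        "image_file": "dog_park_0001.jpg",
        "caption": "A white Boxer dog playing cards at a picnic table",
        "labels": ["dog", "boxer", "playing cards", "outdoors"],
        "labeled_by": "human annotator",
    }
    print(training_record["caption"])

    A classifier for “Boxer” learns to say yes to images whose records carry that kind of description and no to the rest.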

    To get a sense of how the classifiers work, consider the image at left. That’s not a Boxer. It’s a white Siberian Husky. As the picture changes from noise to a finished image, a Boxer classifier would keep it from evolving this way. But a Siberian Husky classifier would move the picture to a result like this one.

    Notice what Grok did with “go fish” for this image. Someone cut all the heads off and cleaned them! There is a lot of delight in the ambiguities of our English language. Image and language models are more than happy to surface them!


    Elements Are Random, Not Copied or Directly Computed

    The most important thing to know about the generated images is that the elements — such as Mona, Johnny, Eeyore, the fish, etc. — are randomly generated to pass the classifier tests, not copied from any of the original training images. If the “yes” images for a classifier form a large, diverse subset of all the images behind the model, there will be plenty of unique variation in the generated components. Each component is better thought of as a freehand interpretation of all the “yes” images for its classifiers than as a “composite” or “average”. They are most definitely not copies in production scale models.

    Creative people and copyright holders are genuinely concerned about unlicensed use of their images for training these models. The argument of the model makers is that US copyright law enumerates the rights a creator has, and doing math on an image to help build a classifier is not one of those rights. Model builders point out that the images themselves do not become part of the model and are unlikely to be generated by the model.

    Creative people also argue that these generators compete with their services. While proponents have no legal need to make a case against replacement, they point out something you can fairly assume from the example in this post. Nobody — not even me — is going to pay for a visualization of Johnny, Mona, Eeyore, and Paul Bunyan playing Go Fish. It is not an activity that is worth the time of a creative person.

    I side (vociferously) with the model builders, as you might surmise from my description of how the process works. Copyright is intended to be a limited right, and we need to be vigilant to keep it limited. You’re welcome to pick whichever side resonates with you. The technical legal issues are working their way through courts as you read this.

    Now… Just because the model itself and the process of making it don’t violate copyright, that doesn’t mean you can’t violate copyright by using the model. If you publish a generated image that contains a protected character, such as Sonic the Hedgehog, you may be (read: “are likely”) violating the copyright of Paramount Pictures or the original video game developer. And that’s a good reason to limit generated images to your own personal, non-commercial use or to prototypes that won’t be distributed to the public. You’ll notice that I use characters that belong to me (my dogs) or characters that are now in the public domain, like Eeyore and Paul Bunyan. Standard “I am not a lawyer” disclaimer for this whole section.


    Let’s Discuss!

    I hope this very high level explanation has helped you understand how image generators like Grok work and why they can produce so much pure delight. The details are very interesting too, but you don’t need to know them to have a sense of what these systems do to create delightful images.

    I’ve posted a link to this post on X, and invite you to go there and discuss this article with me if you are so moved!

    -Brad

  • When Trust is the Negative Space of Distrust

    When Trust is the Negative Space of Distrust

    One of my favorite connections on LinkedIn, Mr. Mahesh Shastry, posted this reminder of a guiding principle of the tech industry:

    In the 1990s, begging forgiveness was a secret superpower. Not everyone knew the approach. Those who applied it understood that to beg forgiveness, they were obligated to deliver something more amazing than they could have by asking permission first. Imagine a plumber you’ve called to your home to fix a low water pressure problem while you’re at work and unreachable that day.

    I’m sorry for digging up your lawn and leaving it all a mess. I had to fix the leaking portion of the pipe.

    That’s no good!

    We had to dig up your lawn to find and fix the broken water pipe. We replaced the old pipe from the meter at the street up to your home. We installed fresh sod to repair the grass area. Please give it some extra water the next couple of weeks.

    He did what he had to do to fix it right and not leave a mess.

    When begging for forgiveness was a new thing, we did it with confidence, knowing that in the end, we would delight the customer. Sometimes we’d have a difficult customer, and this approach was the only way to get them on board. We risked disapproval because failure was not going to happen. Even if we did fail, the worst case outcome was that we lost a customer. There are plenty of customers.


    This brings me to trust. When you called that plumber, you may not have had any experience with him. You needed a problem fixed, and you weren’t available to supervise. You needed to trust him to do the right thing. Trust here was not earned, and certainly not hard earned. It was granted because there was no reason to distrust. Until he didn’t fix the whole problem and left your lawn all dug up.

    In transactional dealings, trust is assumed and distrust is earned. Trust becomes the lack of distrust, or the negative space of distrust. The concept of negative space comes from photography and other visual arts. It’s best explained with an example. Have you ever noticed the arrow between the E and the x in the FedEx logo? It’s no Bob Ross happy accident!

    If you had never noticed that, you’ll never not see it again. And this is what I want to point out about trust.

    When that plumber shows up, he shows up with a blank canvas. That blank canvas is the negative space of distrust. You have no reason to expect disaster, and every reason to expect he will fix the problem correctly and adequately. If he’s able to tell you what he is doing every step of the way and completes each subtask successfully, that canvas of distrust stays blank. But when you get home to see the mess in the yard and the problem not really fixed, the canvas comes to life, and those empty areas of trust are gone and forgotten.


    Let’s look at an example where trust is earned, where it is the positive space. Your teenage kid wants to borrow your car for an unchaperoned overnight road trip for a concert with his friends.

    Don’t you trust me?

    No. You don’t. That’s the point. Trust is not the negative space here. Trust here is built from experience, preferably at a slower and more deliberate pace than driver’s license to unchaperoned road trips in less than a year. You can see potential for a bad situation leading to a bad outcome.

    Be thankful your kid didn’t see his worst case outcome as losing a customer and just ask for forgiveness when he got home.


    My hope for today is to identify these two ways of looking at trust to set the stage for an article on AI agents. If we insist that they ask permission first, we may never be able to start automating. But if we let them beg forgiveness, we’re in for disasters in short order.

    The featured picture is a collaboration between Grok and me.

    -Brad


  • LLMs are Bad at Facts but Good at Knowledge

    LLMs are Bad at Facts but Good at Knowledge

    Do you know the difference between facts and knowledge? Did you know there is a difference?

    A main criticism of large language models (LLMs) is that they “hallucinate”. That is, they make up facts that are not true. This bothers most people. It’s bad enough when people get facts wrong, but when computers, trained on the writings of people, manage to invent entirely new wrong facts, people get really concerned. Since LLMs answer with such confidence, they end up being extra untrustworthy.

    In the two years since LLMs took the world by storm, we have seen airline chat systems give incorrect discount information to customers, and a court rule that the airline was liable for bad information. We have seen legal LLMs cite court cases that don’t exist with made up URLs that don’t resolve to a valid web site or page.

    One way we deal with these wrong facts is to insist that people double-check any facts an LLM outputs. In practice, many users don’t know about the problem or just don’t do the work.


    What is knowledge, if not facts?

    This is the trap in which so many critics of LLMs get caught. The reasoning goes: Since it can’t get the facts right, and makes more work checking and fixing them, it’s not worth asking the LLM. But it turns out it is! Here is the LinkedIn post that changed my mind, dramatically.

    If you’ve read this far, you’ll know what sets off any critic: the facts will be wrong, and you, Jc, will look like an idiot coming in prepped by this ChatGPT document. See the comments. I told him as much.

    I left my comment there, and apologized for it a short time later. This was the example I needed to see that the knowledge he sought was not actual facts about the company. He sought the vibe. Factual errors can even add value.

    Be the sales guy for a moment. “I understand that ABC is a leader in LMNOP.”

    Your prospect replies, “Well, you’re too kind. DEF is the clear market leader at LMNOP, but we feel that our solution is better and we hope to overtake them this year.”

    Now you have a conversation. Perhaps XYZ can help ABC improve or sell LMNOP. Perhaps that’s why or adjacent to why you called in the first place. The fact did not matter. The error of fact sparked discussion. The knowledge was the relevance of LMNOP. LLMs are really good at surfacing this knowledge.


    At the end of 2023, when I was challenged by a friend and long-term business mentor to figure out my AI story, I spoke with a potential client about what they would expect. They expected that I could automate some important process with ChatGPT. I said that, based on my initial research, ChatGPT doesn’t really work for that.

    But I asked ChatGPT to write a story about Eeyore and Paul Bunyan using their flagship backup product to save the forest by “backing it up”. The story was delightful and surfaced many features of their offering in a fictional context. I shared the story with that potential client as an example of what they might use ChatGPT for. They were quite delighted and even more dismissive. I felt like the hero of this commercial.


    My hope for today is that y’all don’t throw out easy sources of knowledge because they get creative with facts. I also hope that you’ll use private, local LLMs for this, as they are just as good in practice at general knowledge as any LLM in the public cloud. You’ll be pleasantly surprised if you haven’t discovered that yet.

    -Brad


    Addendum: Despite being in a boxing ring, the Grok generated dogs in the picture, Knowledge and Facts, are not fighting. There’s no need for them to fight. They’re just different animals.