Author: Brad Hutchings

  • AI is Really DIY

    AI is Really DIY

    I’m sorry the robot dog isn’t wearing pants. I told Grok I needed a robot dog wearing a DIY (“Do It Yourself”) shirt. It gave me exactly what I asked for. That’s how the image creation models work. The picture passed all the classifiers I specified. It didn’t think to ask me if the dog should be wearing pants, mostly because these systems do not think. But we have a tradition of cartoony characters who go without pants, from Winnie the Pooh to Donald Duck to Wakko Warner. Thankfully humans have retained some modesty.

    If I still paid my human illustrator to make goofy pictures for my essays, he would have drawn the robot dog with pants. If not, I would have sent the picture back to him and asked for pants. With Grok, I haven’t figured out how to get it to make a change to a generated picture. Currently, I can only ask it to draw a new picture with a refined prompt. Grok gave me the ability to do it myself, so I have to work within its limitations if I’m going to do it myself. I most certainly could never draw even a terrible DIY Dog Robot.


    Managers Want to DIY

    My introduction to ChatGPT came about two years ago, when a new co-worker, ostensibly brought in to manage our small firm’s chaotic workflow, used his GED (“General Educational Development”) diploma equivalent from the Australian Outback to tell the team that AI would replace us in 6 months if we didn’t figure out how it could help us. I know that is a shared experience for many software and creative professionals.

    The great industries through history have sought to leverage the work of human beings. Small teams proved ideal for Skunk Works-type projects, but the goal of all heads of companies for almost 150 years of industrialization has been to grow a workforce in order to grow a business. Technology has chipped away at that attitude. For the 40 years of the personal computer era, jobs that could be computerized have been automated and outsourced. There were still people doing work, they were just elsewhere. Downsizing a workforce has been seen as a sign of failure, and has usually been difficult to recover from. Even today, firms try to hide downsizing with return-to-office mandates, patterned on what Marissa Mayer did at Yahoo! 12 years ago now. “We have to cut our payroll,” the C-suite people discuss grimly in their C-suite meetings, “Let’s try to do it without drawing undue attention.”

    It wasn’t until ChatGPT offered a confident answer to any question asked that managers began to think they didn’t need their workers or cheaper replacements. They could do the work themselves. At a minimum, this gave them immediate bargaining power and a heavy upper hand. Instantly, there was no shortage of people willing to hype up the potential of the technology to do any job. And no shortage of managers willing to lap up, amplify, and even act on the hype.

    Prior to ChatGPT, most managers had no hope of doing the work themselves. Few had knowledge of the work. Few firms put competent workers — developers, artists, analysts, etc. — in management. ChatGPT gave them hope. That hope was obviously misplaced to anyone familiar with the actual work.


    Leaders Need People Working for Them

    I’m no Barbra Streisand fan, but her lyric — People who need people are the luckiest people — offers insight into our situation. In the great tradition of industry as noted above, managers who don’t need people (to manage) are garbage managers. Expensive failures of so many AI initiatives are beginning to shine disinfecting light on these garbage people.

    Notice that I have not called these managers “leaders” yet. Because they are not leaders. A recent mentor of mine, a very successful entertainment industry talent manager, repeatedly emphasizes that the number one task of a leader is to provide safety. That is invariant whether you are in charge at the top of an org chart, or leading an initiative involving co-workers, customers, suppliers, etc. People look to you as a leader for safety. When you provide that, work can get done.

    You might see how a new manager with a GED from the Outback threatening the team’s livelihood didn’t instill a sense of safety. Dude was no leader, and did not move the leadership dynamic on that team a metric half inch. Nor was he or could he be any help actually doing the work, so we couldn’t even repurpose him for that. I can’t imagine how badly things would have gone had he been forced to do the work himself, as he seemed so eager to do.


    What Could Possibly Go Wrong?

    Avid DIY-ers reading this have war stories a metric marathon long of things that have gone wrong on their way to becoming kind of good at DIY. I’ll share some.

    In the early 2000s, I decided to remodel a bathroom. Although the property was a cookie cutter California condo built in the late 1980s, I wanted to spare no expense and no effort. I got a bid for $10K to not quite do what I wanted in about a month. I decided to do it myself. I tore the bathroom down to studs, moved plumbing, did back-breaking shower and tile work, etc. I ended up spending more than a year plus $25K on tools, trial and error, materials, and some professional electrical and plumbing assistance/oversight to get less of what I wanted. That tile saw was sick, and I learned to use it like an artist, not just a skilled craftsman.

    About 15 years ago, my Dad was using a screwdriver to try to repair something he should have and could have hired a handyman to repair. My Dad doesn’t have a lot of patience with tools, or a lot of things for that matter. I know he’s my Dad because every bit of patience I have was self-taught, painfully. Neither of us have that gene. While prying at something, the screwdriver slipped and he shoved it through his hand. He needed surgery to repair the damage and extensive occupational therapy to get most of his hand’s function back.

    I made a new friend in April, 2024. He was successful with his business, and we were looking at generative AI angles that were actually intellectually sound. He did a lot of things himself to build his business and his wealth. He owned a few properties and had to do some cleanup at one this summer. There was a wasp nest in the yard that he needed to get rid of. Instead of calling a beekeeper, he did it himself. He was attacked, and died in the hospital a month later.

    So to lawyers, software developers, artists, musicians, and other professionals who have been told for two years that their jobs would be replaced by AI, you should find immediate comfort in knowing that AI is just DIY. There is a tremendous amount that can go wrong, because this AI is just making up answers one metric word (“token”) at a time. It often offers very wrong answers, with the confidence of the most accomplished professional. It makes stuff up, which is great for real generative tasks, but not so great for important decisions.

    AI / DIY is great for tasks you would not pay a person to do, where correctness to specification isn’t terribly important. I hope you realized I was joking about paying an illustrator for my essay art before Grok became free to use on X. Grok can’t give me a layered .psd file I can open and adjust with Photoshop, like a human artist would. Without Grok, I’d mash up my own photos with Photoshop to make sillier meme pictures.

    AI / DIY is also great for some repetitive generative work. Literacy is an unsolved problem. Most kids, even those from well-off families, don’t have a variety of relevant reading material, especially for younger readers, available at home. I help parents create a new story every day in five minutes, one they can print and read with their kids. The stories are about people, places, and characters their kids know and care about — relevant stories that will delight young readers. No professional, recognized children’s book author can write such stories at that pace.

    AI / DIY is not great for activities that can drain your bank account, impale you, kill you, land you in jail, attract unwanted attention, etc. People are, and will remain, the more prudent choice for most existing intellectual tasks for decades to come.


    The Moral of this Story

    It took two years for me to figure out that the AI hype crowd that was and remains excited about replacing skilled workers doesn’t operate with evil intent — mostly. They are intrigued by the personal empowerment of doing tasks themselves to their own specifications. They used to have to pay for this work to be done and accept the costs of other people doing it other people’s ways. Or perhaps not do the work at all, because it was not worth paying for.

    I hope that this essay helps you appreciate that there are benefits to AI / DIY, and potentially unfathomable costs. That’s a healthy mindset for approaching this technology.

  • How “AI” Image Generators Work

    How “AI” Image Generators Work

    I’m writing this article to help artists and users understand what the current crop of AI image generators do and how they work. I won’t dive into technical details. This explanation is “right enough” for a 5,000-foot discussion of how to work with the technology and address high level concerns, such as whether training content is “stolen”.

    I’ll be using the picture attached to this post for discussion. It was generated by xAI’s Grok using the following prompt:

    Mona is a 6 year old white Boxer dog. 
    Johnny is a 2 year old Jack Russell Corgi mix. Johnny is black and white with brown highlights. 
    Eeyore is the popular children's book character.
    Paul Bunyan is the popular children's book character.
    Make a picture where these characters play a high stakes game of go fish to raise money for renovating the dog park.

    I used a similar cue — “Make a story” — with a private IBM Granite 3 Instruct LLM running on my laptop to create a delightful story that the picture will accompany.


    Delight and Realism

    The first thing to notice about the picture is that it truly is a delightful depiction of a scene I had in mind. The Boxer looks passably like Mona, or at least her stunt double. The little dog looks amazingly like Johnny. More on that in a future post. Paul Bunyan and Eeyore are out of focus behind the main scene. Johnny and Mona seem to be playing a hybrid of Go and some card game. Literalist critics would derisively call that a “hallucination”, but to me, that’s delightful and funny. You are allowed to smile when you use these tools!

    The second thing to notice is that it’s got a well-executed photo realistic feel to it. Obviously, it’s not real, but with costumes, stage dog training, and fortunate timing, you could imagine capturing a shot like that. A Photoshop hack like me or even a professional would have a hard time putting that scene together seamlessly.


    Visualization

    The first conclusion you should draw about the technology is that it is good at bringing ideas to life. If you can express the idea in a few sentences, it can probably render something close, even close enough. That makes it a powerful tool for both visualization and prototyping.

    While it might feel wrong to use an AI image generator for a final, professional product, and might cast doubt on the workmanship that went into that product, it is quite valuable in the iterative process of drafting and refining. This applies to all kinds of activities that generative AI can do, from pictures to stories to analysis to code and beyond. Generative AI is excellent at providing samples and templates that can serve as placeholders for production work or even be adapted into finished products.


    How It’s Made

    The picture itself is created using a model and an algorithm. The model is a large collection of weights, billions of statistical measurements about billions of images in a training set of images. The algorithm is called diffusion. It starts with random pixels — think of a noise pattern on an old tube television, like the picture on the left, courtesy of Grok.

    The algorithm iteratively changes the pixels so that the picture looks more and more like what was requested. The picture on the right is that same old tube television tuned into another picture of Mona and Johnny playing Go Fish. That mashup was assembled in Photoshop.

    “Looks more and more like…” — That is the key! Again, at a high level, it takes each step between all noise and all picture, and it “asks” the model if it looks more like what is described in the prompt. E.g. Is there a white Boxer dog? Is there a Jack Russell Corgi mix, black and white with brown highlights? Are they playing Go Fish? The things that can answer these yes/no questions are called “classifiers”. They are part of the model.
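    To make the loop concrete, here is a toy sketch in Python. It is my own illustration, not xAI’s actual pipeline: the “image” is just a short list of numbers, and the hand-written score function stands in for the classifiers described above. The only idea it demonstrates is the one in this section — start from pure noise and repeatedly keep changes that make the picture “look more like” what was asked for.

    ```python
    import random

    # Toy sketch of classifier-guided diffusion (my illustration, not any real
    # model). The "image" is a list of four numbers; TARGET stands in for the
    # prompt ("a white Boxer playing Go Fish"); the score stands in for the
    # classifiers' yes/no answers.

    TARGET = [0.2, 0.9, 0.5, 0.7]  # hypothetical values, chosen for the demo

    def classifier_score(image):
        """Higher when the image 'looks more like' the prompt (toy distance)."""
        return -sum((p - t) ** 2 for p, t in zip(image, TARGET))

    def denoise_step(image, step_size=0.1):
        """Propose a small random change to each pixel; keep it only if the
        classifier says the result looks more like the prompt."""
        out = list(image)
        for i in range(len(out)):
            candidate = out[i] + random.uniform(-step_size, step_size)
            trial = out[:i] + [candidate] + out[i + 1:]
            if classifier_score(trial) > classifier_score(out):
                out[i] = candidate
        return out

    def generate(steps=500, seed=42):
        random.seed(seed)
        image = [random.random() for _ in TARGET]  # pure noise, like TV static
        for _ in range(steps):
            image = denoise_step(image)
        return image

    img = generate()
    ```

    After a few hundred steps, the random noise has drifted to something the toy classifier approves of. Real diffusion models do something far more sophisticated at every step, but the shape of the loop — noise in, classifier-guided refinement, picture out — is the same.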

    When the model is trained on millions or billions of images, each image has a description, often in English, associated with it. It could also have a more structured description, with pre-assigned category names, etc. The original work of classifying images is/was done by humans, and is often very low-paying work done by very low-skilled workers. That may or may not give you pause, or paws if you’re a dog. The situation is more nuanced than any critical documentary shows.

    To get a sense of how the classifiers work, consider the image at left. That’s not a Boxer. It’s a white Siberian Husky. As the picture changes from noise to a finished image, a Boxer classifier would keep it from evolving this way. But a Siberian Husky classifier would move the picture to a result like this one.

    Notice what Grok did with “go fish” for this image. Someone cut all the heads off and cleaned them! There is a lot of delight in the ambiguities of our English language. Image and language models are more than happy to surface them!


    Elements are Random, not Copied or Directly Computed

    The most important thing to know about the generated images is that the elements — such as Mona, Johnny, Eeyore, the fish, etc. — are randomly generated to pass the classifier tests, not copied from any of the original training images. If there is a large, diverse subset of all images in the model in the “yes” images for a classifier, there will be plenty of unique variation in the outputs of generated components. Each component is better thought of as a freehand interpretation of all the “yes” images for its classifiers than a “composite” or “average”. They are most definitely not copies in production scale models.

    Creative people and copyright holders are genuinely concerned about unlicensed use of their images for training these models. The argument of the model makers is that US copyright law enumerates the rights a creator has, and doing math on an image to help build a classifier is not one of those rights. Model builders point out that the images themselves do not become part of the model, and are unlikely to be generated by the model.

    Creative people also argue that these generators compete with their services. While the model makers have no legal need to answer the replacement argument, they point out something you can fairly assume from the example in this post. Nobody — not even me — is going to pay for a visualization of Johnny, Mona, Eeyore, and Paul Bunyan playing Go Fish. It is not an activity that is worth the time of a creative person.

    I side (vociferously) with the model builders, as you might surmise from my description of how the process works. Copyright is intended to be a limited right, and we need to be vigilant to keep it limited. You’re welcome to pick whichever side resonates with you. The technical legal issues are working their way through courts as you read this.

    Now… Just because the model itself and the process of making the model doesn’t violate copyright, you certainly can violate it by using the model. If you publish a generated image that has a protected character, such as Sonic the Hedgehog in it, you may be (read: “are likely”) violating the rights of Sega or Paramount Pictures. And that’s a good reason to limit generated images to your own personal, non-commercial use or as prototypes that won’t be distributed to the public. You’ll notice that I use characters that belong to me (my dogs) or characters that are now in the public domain, like Eeyore and Paul Bunyan. Standard “I am not a lawyer” disclaimer for this whole section.


    Let’s Discuss!

    I hope this very high level explanation has helped you understand why image generators like Grok can produce so much pure delight, and how they work. The details are very interesting too, but you don’t need to know them to have a sense of what these systems do to create delightful images.

    I’ve posted a link to this post on X, and invite you to go there and discuss this article with me if you are so moved!

    -Brad

  • When Trust is the Negative Space of Distrust

    When Trust is the Negative Space of Distrust

    One of my favorite connections on LinkedIn, Mr. Mahesh Shastry, posted this reminder of a guiding principle of the tech industry:

    In the 1990s, begging forgiveness was a secret superpower. Not everyone knew the approach. Those who applied it understood that to beg forgiveness, they were obligated to deliver something more amazing than they could have by asking permission first. Imagine a plumber you’ve called to your home to fix a low water pressure problem while you’re at work and unreachable that day.

    I’m sorry for digging up your lawn and leaving it all a mess. I had to fix the leaking portion of the pipe.

    That’s no good!

    We had to dig up your lawn to find and fix the broken water pipe. We replaced the old pipe from the meter at the street up to your home. We installed fresh sod to repair the grass area. Please give it some extra water the next couple of weeks.

    He did what he had to do to fix it right and not leave a mess.

    When begging for forgiveness was a new thing, we did it with confidence, knowing that in the end, we would delight the customer. Sometimes we’d have a difficult customer, and this approach was the only way to get them on board. We risked disapproval because failure was not going to happen. Even if we did fail, the worst case outcome was that we lost a customer. There are plenty of customers.


    This brings me to trust. When you called that plumber, you may not have had any experience with him. You needed a problem fixed, and you weren’t available to supervise. You needed to trust him to do the right thing. Trust here was not earned, and certainly not hard earned. It was granted because there was no reason to distrust. Until he didn’t fix the whole problem and left your lawn all dug up.

    In transactional dealings, trust is assumed and distrust is earned. Trust becomes the lack of distrust, or the negative space of distrust. The concept of negative space comes from photography and other visual arts. It’s best explained with an example. Have you ever noticed the arrow between the E and the x in the FedEx logo? It’s no Bob Ross happy accident!

    If you had never noticed that, you’ll never not see it again. And this is what I want to point out about trust.

    When that plumber shows up, he shows up with a blank canvas. That blank canvas is the negative space of distrust. You have no reason to expect disaster, and every reason to expect he will fix the problem correctly and adequately. If he’s able to tell you what he is doing every step of the way and completes each subtask successfully, that canvas of distrust stays blank. But when you get home to see the mess in the yard and the problem not really fixed, the canvas comes to life, and those empty areas of trust are gone and forgotten.


    Let’s look at an example where trust is earned, where it is the positive space. Your teenage kid wants to borrow your car for an unchaperoned overnight road trip for a concert with his friends.

    Don’t you trust me?

    No. You don’t. That’s the point. Trust is not the negative space here. Trust here is built from experience, preferably at a slower and more deliberate pace than driver’s license to unchaperoned road trips in less than a year. You can see potential for a bad situation leading to a bad outcome.

    Be thankful your kid didn’t see his worst case outcome as losing a customer and just ask for forgiveness when he got home.


    My hope for today is to identify these two ways of looking at trust to set the stage for an article on AI agents. If we insist on them having permission, we may never be able to start automating. But if we let them beg forgiveness, we’re in for disasters in short order.

    The featured picture is a collaboration between Grok and me.

    -Brad


  • LLMs are Bad at Facts but Good at Knowledge

    LLMs are Bad at Facts but Good at Knowledge

    Do you know the difference between facts and knowledge? Did you know there is a difference?

    A main criticism of large language models (LLMs) is that they “hallucinate”. That is, they make up facts that are not true. This bothers most people. It’s bad enough when people get facts wrong, but when computers, trained on the writings of people, manage to invent entirely new wrong facts, people get really concerned. Since LLMs answer with such confidence, they end up being extra untrustworthy.
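    Why hallucination is baked in becomes clearer with a toy sketch. This is my own illustration, not any real LLM: a tiny “language model” whose whole brain is a hand-written table of which word tends to follow which. Like a real LLM, it emits one token at a time by sampling from learned statistics — and nothing anywhere in the loop checks whether the resulting sentence is true, only whether each next word is plausible after the previous one.

    ```python
    import random

    # Toy next-token sampler (my illustration). BIGRAMS is a made-up table of
    # word-to-word probabilities; a real LLM learns billions of such statistics.
    # Note there is no notion of truth here -- only statistical plausibility.

    BIGRAMS = {
        "the":     [("court", 0.5), ("airline", 0.5)],
        "court":   [("ruled", 1.0)],
        "airline": [("offered", 1.0)],
        "ruled":   [("the", 0.6), ("in", 0.4)],
        "offered": [("the", 1.0)],
        "in":      [("favor", 1.0)],
    }

    def next_token(token, rng):
        """Sample the next word from the table -- plausibility, not truth."""
        choices = BIGRAMS.get(token)
        if not choices:
            return None  # no statistics for this word; stop generating
        words, weights = zip(*choices)
        return rng.choices(words, weights=weights)[0]

    def generate(prompt="the", max_tokens=6, seed=7):
        rng = random.Random(seed)
        out = [prompt]
        for _ in range(max_tokens):
            tok = next_token(out[-1], rng)
            if tok is None:
                break
            out.append(tok)
        return " ".join(out)

    sentence = generate()
    ```

    Every sentence this toy produces is grammatical-sounding and confidently assembled, and none of it is fact-checked. Scale the table up by a few billion entries and you have the shape of the problem this section describes.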

    In the two years since LLMs took the world by storm, we have seen airline chat systems give incorrect discount information to customers, and a court rule that the airline was liable for bad information. We have seen legal LLMs cite court cases that don’t exist with made up URLs that don’t resolve to a valid web site or page.

    One way we deal with these wrong facts is to insist that people double check any facts an LLM outputs. In practice, many users don’t know about the problem or just don’t do the work.


    What is knowledge, if not facts?

    This is the trap in which so many critics of LLMs get caught. The reasoning goes: Since it can’t get the facts right, and makes more work checking and fixing them, it’s not worth asking the LLM. But it turns out it is! Here is the LinkedIn post that changed my mind, dramatically.

    If you’ve read this far, you’ll know what sets off any critic: The facts will be wrong, and you, Jc, will look like an idiot coming in prep’d by this ChatGPT document. See the comments. I told him as much.

    I left my comment there, and apologized for it a short time later. This was the example I needed to see that the knowledge he sought was not actual facts about the company. He sought the vibe. Factual errors can even add value.

    Be the sales guy for a moment. “I understand that ABC is a leader in LMNOP.”

    Your prospect replies, “Well, you’re too kind. DEF is the clear market leader at LMNOP, but we feel that our solution is better and we hope to overtake them this year.”

    Now you have a conversation. Perhaps XYZ can help ABC improve or sell LMNOP. Perhaps that’s why or adjacent to why you called in the first place. The fact did not matter. The error of fact sparked discussion. The knowledge was the relevance of LMNOP. LLMs are really good at surfacing this knowledge.


    At the end of 2023, when I was challenged by a friend and long-term business mentor to figure out my AI story, I spoke with a potential client about what they would expect. They expected that I could automate some important process with ChatGPT. I said that in my initial research, ChatGPT doesn’t really work for that.

    But I asked ChatGPT to write a story about Eeyore and Paul Bunyan using their flagship backup product to save the forest by “backing it up”. The story was delightful and surfaced many features of their offering in a fictional context. I shared the story with that potential client as an example of what they might use ChatGPT for. They were quite delighted and even more dismissive. I felt like the hero of this commercial.


    My hope for today is that y’all don’t throw out easy sources of knowledge because they get creative with facts. I also hope that you’ll use private, local LLMs for this, as they are just as good in practice at general knowledge as any LLM in the public cloud. You’ll be pleasantly surprised if you haven’t discovered that yet.

    -Brad


    Addendum: Despite being in a boxing ring, the Grok generated dogs in the picture, Knowledge and Facts, are not fighting. There’s no need for them to fight. They’re just different animals.