Tech Life I typed "Thank you"?

The Robot Report #2 — Her

There are two classes of songs I listen to when writing. Words and no words. Word songs are used when the writing does not require flow (deep thought). The problem with words is they get in your head, bump around, and start creating more words. At a time when I am attempting to focus on a specific set of words, word songs are not the solution.

Nonword songs have no words and are deployed to encourage the correct words. The lack of words but the presence of melodic feeling (plus one to three cups of coffee) is the perfect partner for new word writing.

Since the release of Spike Jonze’s Her in 2013, the reigning champion of non-word songs continues to be Dimensions by Arcade Fire or Owen Pallett; it’s unclear who wrote it [1].

“Her” was back in the news recently with the report that Scarlett Johansson was super pissed that OpenAI allegedly trained on her voice for the 4o [2] release of their model, which included scary impressive voice interactions. Since these allegations were revealed, OpenAI has been diligently releasing information that proves they trained on a totally different person… who kinda sounds like Scarlett Johansson. It was reported that one of Sam Altman’s favorite movies was Her, where Johansson uses her trademark semi-gravelly voice to give life to Joaquin Phoenix’s AI companion.

This important kerfuffle regarding protecting an actor’s persona is not my point. My point is: Her is a profoundly sad piece of cinema (that I deeply love). It beautifully documents a not-too-distant future where we no longer ignore each other with our faces jammed into our phones; we’ve been liberated and now freely walk the world talking to our phones… ignoring each other.

The Proper Interface

Yesterday, I was working on a future piece about my beliefs on team size and organization depth. I have shared this information for years: the ideal team size is 7 +/- 3, and the ideal organization depth is 5, not including the CEO. I concluded this thought with the closing point, “And these constraints fan out nicely; you can build quite a large organization following these guidelines.”

But I didn’t do the actual math. I estimated. With ChatGPT omnipresent on my desktop, I described the above constraints and asked, “How big of an organization can I build?” There were typos and colloquialisms in my question, and ChatGPT answered it instantly and correctly. When it was done, I typed, “Thank you.”

I typed “Thank you”?

Who was I thanking?

There is a spectrum of how humans think about large language models (“LLMs”). On one end, some declare, “They are superhuman-level autocomplete engines.” On the other end, we have those who believe, “They are partially sentient future destroyers of the world.” As is custom, the answer is somewhere in the middle.

Wherever you lie on that spectrum, you are skipping the most important innovation of these LLMs: the conversation.

Return to the example above and explain how I would complete the same task in Google. I wouldn’t. I’d start scribbling the math on a piece of paper and figure out the potential size of these organizations. Maybe if I were stuck, I would type “common math equations regarding measuring organization sizes” and stare dumbly at a wall of ads and possibly valuable equations.

Read that last sentence again. I wrote, “I would type,” not “I would ask.” Typing keywords versus asking a question. It’s an entirely different mode of thinking for me. If I’m typing something in a search engine, I’m trying to figure out the keywords that give me a page that might answer my question. If I’m asking a question, I’m using my natural and familiar language to describe the problem I am trying to solve or the question I am attempting to answer.

Here’s my prompt:

“If I have a rule that teams can only be seven to ten in size and there can only give five levels of management, what is my maximum organziation [sic] size?”

Google’s response included:

  • A “Did you mean?” prompt that corrected my typo.
  • A bunch of ads for Microsoft Teams.
  • Then, there is a link to an article from a VP at Stripe who describes how to size and assess teams.

ChatGPT’s response answered the question and showed its work so I could verify the math.
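The math itself is small enough to check by hand or in a few lines of code. Here is a minimal sketch, not ChatGPT’s actual output, under two assumptions of mine: “team size” means direct reports per manager (the manager is not counted in the team), and every team at every level is full. The function name max_org_size and its parameters are hypothetical.

```python
def max_org_size(team_size: int, depth: int) -> int:
    """Headcount of a fully staffed org chart.

    Assumes one CEO at the top, `depth` levels below the CEO, and every
    person above the bottom level managing exactly `team_size` people.
    """
    # Level 0 is the CEO (1 person); level k holds team_size ** k people.
    return sum(team_size ** level for level in range(depth + 1))


if __name__ == "__main__":
    # The 7 +/- 3 rule gives team sizes from 4 to 10; depth is 5 below the CEO.
    for team_size in (4, 7, 10):
        print(team_size, max_org_size(team_size, depth=5))
```

Under those assumptions, teams of ten and five levels top out at 111,111 people, and teams of seven at 19,608. If you count the manager as part of the team, the numbers shrink, which is exactly the kind of ambiguity worth verifying in any answer.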

There’s more. Because I was in a chat mindset, I asked follow-ups. What were the organizational size caps with different constraints? What if I added one more layer? And when I was done with my queries, I typed, “Thank you.”

That final thank you feels like a throwaway conversational flourish until you think like a robot. It didn’t parse that as thanks; it parsed it as “This human believes my answer was correct.” This is essential data to help future queries.

Comparing Google and ChatGPT is not a fair comparison. It’s comparing a search engine to a large language model. Two vastly different stacks of technology. Guess what, it doesn’t matter. Your average human is searching for the lowest possible friction means to get the highest possible quality answer. I’ve been twisting my brain into mental knots for decades, trying to figure out the proper set of keywords and searching for the proper web page that might answer my questions.

ChatGPT answers my question because I ask my question like I’m talking to a human.

Yes, ChatGPT is aggressive and confident even when it’s impressively wrong. Guess what? That makes it more human than robot.

Profoundly Sad

When the iPhone was first announced, a recurring debate amongst my friends was, “The touch screen makes or breaks this device.” See, we’d been promised touch screens for years before the iPhone — they existed, but every single screen before the iPhone had discernible lag. From the moment you began to touch and drag on the screens, there was a bit of distracting lag. This brief moment of dissatisfaction ruined the magic. This is another touch screen that reminds me of its technology.

Technology is magic when it meets our expectations and reflects our reality. When you touch and drag a screen, you expect it to react precisely as when you perform the same actions in the real world.

The magic of the ChatGPT 4o voice interaction demo wasn’t the bajillions of engineering hours that went into powering the models that allowed the robot to respond; it was that she responded how you expected. She responded instantly. She understood your half-words. She stopped when you interrupted. She laughed at your dumb jokes. It was magic because it met our expectations and reflected our reality. This is how I expect a conversation to work.

It is still just a tool.

Her is a profoundly sad movie because it intimately describes how these tools we love have driven us apart. Scene after scene shows vast crowds walking through public areas in a low-grade conversational murmur. All the humans are pleasantly talking… to their devices. This was not the primary intent of the movie; it’s a story of the search for love, but it also describes the need we humans have to connect.

As with social media before it, I remain steadfastly exuberant about the potential for this next generation of technology to help. Still, I’m now properly educated that the potentially unimaginable consequences could outweigh the benefits.

It’s a tool. It’s not a human. It’s not Her.

  1. True story. For years and years, the soundtrack for this movie was not available. A handful of the jams were available, but the album, the bulk of the music, was strangely not released. Someone was fighting with someone about something. The original score was finally released in 2021, which means from 2013 until 2021, I searched the dark corners of the internet for this song. This contributed to its uniqueness.
  2. Version numbers are arbitrary, and I’m confident they didn’t call this ChatGPT 5 because they wanted to quench the “they’re moving too fast” vibe. 
