Dear friends,
In JC’s Newsletter, I share the articles, documentaries, and books I enjoyed the most in the last week, with some comments on how we relate to them at Alan. I don’t endorse all the articles I share; they are up for debate.
I’m doing it because a) I love reading, it is the way that I get most of my ideas, b) I’m already sharing those ideas with my team, and c) I would love to get your perspective on those.
If you are not subscribed yet, it's right here!
If you like it, please share it on social networks!
👉 OpenAI’s plans according to Sam Altman (Humanloop)
❓ Why am I sharing this article?
Interesting to understand OpenAI’s roadmap and what they plan to build, so we can assess how to guide our own roadmap using LLMs.
Longer contexts and cheaper cost per inference seem to be the direction things are going.
We should also consider that finetuning is going to be standard by the end of the year.
I wonder what multimodality changes in our approach.
Technical information:
The longer 32k context can’t yet be rolled out to more people. OpenAI hasn’t overcome the O(n²) scaling of attention, so while it seems plausible they will have 100k–1M token context windows soon (this year), anything bigger would require a research breakthrough (a toy sketch of why attention scales quadratically follows this list).
The finetuning API is also currently bottlenecked by GPU availability.
Dedicated capacity offering is limited by GPU availability.
OpenAI also offers dedicated capacity, which provides customers with a private copy of the model.
To access this service, customers must be willing to commit to a $100k spend upfront.
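The O(n²) point above is worth making concrete. Below is a toy NumPy sketch (shapes and sizes are illustrative, not OpenAI’s internals) showing that the attention score matrix, not the model weights, is what blows up with context length:

```python
# Illustrative only: why attention cost grows as O(n^2) with context length n.
# Plain NumPy, single head, no batching.
import numpy as np

n, d = 1024, 64                      # sequence length, head dimension
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)

scores = Q @ K.T                     # shape (n, n): every token attends to every token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                    # shape (n, d)

# The (n, n) score matrix is the bottleneck: doubling the context
# quadruples its size, which is why very long contexts are hard to scale.
print(scores.shape)                  # (1024, 1024)
```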
OpenAI’s near-term roadmap
Sam shared what he saw as OpenAI’s provisional near-term roadmap for the API.
2023:
Cheaper and faster GPT-4 — This is their top priority. In general, OpenAI’s aim is to drive “the cost of intelligence” down as far as possible and so they will work hard to continue to reduce the cost of the APIs over time.
Longer context windows — Context windows as high as 1 million tokens are plausible in the near future.
Finetuning API — The finetuning API will be extended to the latest models but the exact form for this will be shaped by what developers indicate they really want.
A stateful API — When you call the chat API today, you have to repeatedly pass through the same conversation history and pay for the same tokens again and again. In the future there will be a version of the API that remembers the conversation history.
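To make the statefulness point concrete, here is a minimal sketch using the 2023-era `openai` Python SDK (`openai.ChatCompletion`, since deprecated); the prompts are illustrative:

```python
# Why today's chat API is stateless: the full history is re-sent,
# and re-billed, on every single call.
import openai

history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # The FULL conversation so far goes over the wire every time.
    resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
    answer = resp["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer

ask("Summarize our refund policy.")
ask("Now shorten it to one sentence.")  # pays again for every earlier token
```

A stateful API would let the server hold `history`, so each call would pay only for the new tokens.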
2024:
Multimodality — This was demoed as part of the GPT-4 release but can’t be extended to everyone until after more GPUs come online.
The usage of plugins, other than browsing, suggests that they don’t have product-market fit (PMF) yet.
He suggested that a lot of people thought they wanted their apps to be inside ChatGPT but what they really wanted was ChatGPT in their apps.
👉 Meta Open Sources Another AI Model, Moats and Open Source, Apple and Meta (Stratechery)
❓ Why am I sharing this article?
More open source models to come!
Multimodality is going to come sooner rather than later.
Meta says that its model, ImageBind, is the first to combine six types of data into a single embedding space. The six types of data included in the model are: visual (in the form of both image and video); thermal (infrared images); text; audio; depth information; and — most intriguing of all — movement readings generated by an inertial measuring unit, or IMU. (IMUs are found in phones and smartwatches, where they’re used for a range of tasks, from switching a phone from landscape to portrait to distinguishing between different types of physical activity.)
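The “single embedding space” idea is the interesting part. Here is a toy sketch of it; `encode_text`, `encode_image`, and `encode_audio` are hypothetical stand-ins (not ImageBind’s actual API), and the vectors are random, but the pattern is real: every modality lands in one vector space, so similarity is comparable across modalities.

```python
# Toy sketch of a shared cross-modal embedding space (NOT ImageBind's API).
import numpy as np

rng = np.random.default_rng(0)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Pretend encoders: each maps its modality into the same 1024-d space.
def encode_text(s: str) -> np.ndarray:     return normalize(rng.standard_normal(1024))
def encode_image(path: str) -> np.ndarray: return normalize(rng.standard_normal(1024))
def encode_audio(path: str) -> np.ndarray: return normalize(rng.standard_normal(1024))

query = encode_text("a dog barking")
candidates = {
    "photo.jpg": encode_image("photo.jpg"),
    "clip.wav":  encode_audio("clip.wav"),
}
# Cosine similarity works across modalities because the space is shared:
# a text query can rank images and audio clips on the same scale.
best = max(candidates, key=lambda k: float(query @ candidates[k]))
print(best)
```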
What I am particularly interested in, though — and building on last week’s Update about Facebook’s earnings — is the license. ImageBind is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA).
👉 Financial Services Will Embrace Generative AI Faster Than You Think (Andreessen Horowitz)
❓ Why am I sharing this article?
Let’s not behave like the incumbent: let’s move fast enough to build things!
In the battle between incumbents and startups, the incumbents will have an initial advantage when using AI to launch new products and improve operations, given their access to proprietary financial data, but they will ultimately be hampered by their high thresholds for accuracy and privacy.
👉 How Google is making up for lost time (Platformer)
❓Why am I sharing this article?
Some interesting AI tools from Google to explore.
Google’s new “AI snapshot” feature is a major overhaul to its standard search results page, replacing links to third-party sites with a screen-filling generative response. It’s a radical new approach to search that may have drastic effects on publishers. (David Pierce / The Verge)
➡️ We should work on being the first answer on key topics.
Codey is a new code generation tool based on PaLM 2, specifically trained to work on programming-related prompts to challenge GitHub’s Copilot. (Frederic Lardinois / TechCrunch)
Google is testing a new “Universal Translator” that redubs video footage in a new language and syncs a speaker’s lips accordingly. The company said it is aware of how the feature could be misused for deepfakes. (Devin Coldewey / TechCrunch)
➡️ Very powerful.
👉 The Post GPT Software Era (Inside My Head)
❓Why am I sharing this article?
“For the long tail, most UIs will be generated and iterated on the fly based on the task, used, and then disappear into the ether.” An interesting take on how UI is going to change with LLMs.
Could we have mini-apps being automatically built in Alan?
How much are we increasing the productivity of engineering thanks to copilot and co?
Below is an example from Sean Grove, where he talks about ephemeral UI and how, at runtime, we will have apps conjured from nothing that then simply fade away.
Single-use apps: Imagine an AI that at runtime gets a request from a human to complete a task or a goal. It then gets to work, figures out what it needs to do, and writes a small backend and frontend, all in mere seconds. It then presents you with the UI necessary to complete the task.
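As a thought experiment, here is a minimal sketch of that loop, using the 2023-era `openai` SDK; the prompt, the file-based flow, and the task are all illustrative assumptions, not Sean Grove’s implementation:

```python
# A "single-use app": ask an LLM to generate a tiny one-file UI for a
# stated task, open it, use it once, then let it fade away.
import openai, os, tempfile, webbrowser

def single_use_app(task: str) -> None:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Generate a single self-contained HTML file (inline "
                       f"CSS/JS, no external dependencies) with a UI for: "
                       f"{task}. Return only the HTML.",
        }],
    )
    html = resp["choices"][0]["message"]["content"]
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write(html)
        path = f.name
    webbrowser.open(f"file://{path}")   # use it once...
    input("Press Enter when done...")
    os.remove(path)                     # ...then it disappears into the ether

single_use_app("split a restaurant bill between 4 people with tip")
```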
Today we have GPT-4, PaLM 2, GitHub Copilot X, Hugging Face StarCoder, and Replit Ghostwriter, to mention a few of the LLMs that can code. There are of course more.
👉 Intercom’s AI Evolution (The Generalist)
❓Why am I sharing this article?
Good benchmark on conversations and cost per conversation
How they approach Fin and AI
In March, Traynor’s firm debuted Fin, a customer service bot powered by GPT-4. Though in its early days, Fin is seeing “insane levels of demand,” according to the Intercom executive. It’s purportedly capable of resolving 50% of customer queries.
We used new AI models to summarize conversations that agents could expand or shrink as they liked.
It’s funny, but a lot of the work we did on Fin was getting it to not do things.
The capabilities of a lot of AI models are great, but they have an inflated sense of their knowledge.
We needed to tame that tendency so that Fin understood its confidence level when answering.
Sometimes it should say, “I’m not sure, but here are two articles that might help,” other times, it should say, “I know the answer – here you go.”
Most crucial was getting Fin to learn when to say, “I have no clue, I’m going to pass you over to a human agent and they’ll take it from here.” Most of the research we did focused on helping it make that hand-off.
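This confidence-gated hand-off pattern is easy to picture in code. The sketch below is not Intercom’s implementation: `answer_with_confidence` is a hypothetical stand-in for retrieval plus an LLM call, and the thresholds are made up; only the three-way behavior (answer / hedge with articles / escalate) comes from the quote.

```python
# Sketch of a confidence-gated support bot (NOT Intercom's actual code).
from dataclasses import dataclass

@dataclass
class BotReply:
    text: str
    escalate: bool = False

def answer_with_confidence(question: str) -> tuple[str, list[str], float]:
    # Hypothetical stand-in for retrieval + an LLM call that also
    # self-reports a confidence score.
    return "You can update billing in Settings.", ["billing-faq", "plans"], 0.6

def reply(question: str) -> BotReply:
    answer, articles, confidence = answer_with_confidence(question)
    if confidence >= 0.9:
        return BotReply(f"I know the answer – here you go: {answer}")
    if confidence >= 0.5 and articles:
        links = ", ".join(articles[:2])
        return BotReply(f"I’m not sure, but here are two articles that might help: {links}")
    # The crucial case: admit defeat and hand off to a human agent.
    return BotReply("I have no clue, I’m going to pass you over to a human agent.",
                    escalate=True)

print(reply("How do I change my billing details?"))
```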
Why is that happening? The best way I can say it is: society is ready for a bot. ChatGPT and the advancements it represents have gotten people acquainted with the idea. Now, companies are realizing that this can be the frontline of their support, and it’s smart enough to escalate to humans when needed. The combination of automation plus humans is the pitch customers respond to the most.
There are features we could build that we won’t because they’re too expensive. For example, we could use GPT-4 to summarize every conversation that every customer has with every business on Intercom. We could do that, but it would cost a lot of money because Intercom powers 500 million conversations a month. That’s a lot of API calls, right?
Depending on where a support agent is and how complex the issue they’re addressing is, it costs between $5 to $25 per conversation, and each agent is tasked with closing between 50 and 100 conversations a day. The cost of calling an API is a rounding error compared to a company’s fully loaded expenses.
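A quick back-of-envelope check makes both quotes concrete. The token counts are my guesses, and I’m assuming 2023 GPT-4 8k pricing (~$0.03 per 1K prompt tokens, ~$0.06 per 1K completion tokens); only the $5–$25 human cost and the 500M conversations come from the interview.

```python
# Per-conversation API cost vs. human cost, then the same cost at scale.
prompt_tokens, completion_tokens = 1_500, 150     # assumed conversation + summary
api_cost = prompt_tokens / 1000 * 0.03 + completion_tokens / 1000 * 0.06
print(f"API cost per summarized conversation: ${api_cost:.3f}")   # ~$0.054

print("Human cost per conversation: $5-$25")      # per the quote: a rounding error

# But summarizing everything at Intercom's scale is real money:
monthly_conversations = 500_000_000
print(f"Summarizing all of them: ~${api_cost * monthly_conversations:,.0f}/month")
# ~$27,000,000/month — cheap per call, expensive in aggregate.
```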
It’s already over! Please share JC’s Newsletter with your friends, and subscribe 👇
Let’s talk about this together on LinkedIn or on Twitter. Have a good week!