How Convictional Is Leveraging AI

An image of a top secret folder with the text Convictional AI Strategy.

The recent wave of generative AI tools like ChatGPT and DALL-E has captured the imagination of technologists, companies, and consumers. Every company claims to be building AI into their products, but what does that mean for Convictional?

In this roundtable, Roger Kirkness (CEO), Adam McCabe (AI Lead), and Bill Tarbell (VP of Market Development) talk to Sam Beale about our approach to AI, opportunities this technology enables, precautions users should take with the current state of AI, and examples of how we're applying AI to our product at Convictional.

Watch the conversation below or read on for the highlights and a full transcript.

Roundtable Highlights

The biggest opportunities in AI

According to Roger, generative AI elevates knowledge work to the equivalent of the management level of that knowledge work. Instead of performing a work task, future knowledge workers may be tasked with overseeing an AI to perform that same work task.

For groups of people working together in companies, AI enables a higher volume of information to be shared faster. For example, at Convictional, we're using AI to share more customer insights with the team faster. The implication is that with a better understanding of customer insights, groups of people can work on the right problems and generate more value for customers faster.

Another opportunity for us with AI is using transformer models to create instantaneous trade relationships. Connecting systems that operate on different file formats will be a solved problem because AI will be able to translate the data that lives on those systems.

Adam and Bill shared their vision for the future of retail enabled by AI. It includes a high degree of personalization for the consumer where the training data is past purchase history, browsing activity, and sizing preferences.

What makes us skeptical about the current state of AI

The current state of AI isn't the vision of artificial general intelligence (AGI) that exists in science fiction. But after using ChatGPT and DALL-E 2, some consumers think it is. At the same time, there's another group of people frustrated with the limits of these tools.

Our perspective on the current state of AI is to take everything with a grain of salt. Understand how existing models work and what they're capable of and you'll be able to set your expectations accordingly.

Another area of skepticism with the current state of AI is that it's extremely energy inefficient, which makes current AI tools fairly inaccessible and hard to diffuse widely.

Finally, AI tools today aren't good at citing their sources. They're so good at compressing information and responding to questions efficiently that they can't attribute their answers to individual sources. Future AI models might have to be less efficient so that they can draw a relationship to their source material. We expect this to happen with narrow AI models, where a model is trained on a specific corpus of information, and can correctly attribute answers to their source text.

How we're leveraging AI for our team and customers

Multiple teams in Convictional are leveraging AI to work more efficiently or to deliver higher impact for customers:

The Seller Sourcing team is using ChatGPT to write more persuasive pitch emails to brands at scale.
The People team is using Whisper to transcribe customer calls and GPT-3 to summarize them so that team members can access customer insights faster.
The Product team is identifying opportunities to incorporate AI in our product. One early use case is auto-assigning product taxonomies to vendor products based on product titles and descriptions.
The Engineering team is using GitHub Copilot to draft a first pass of code for specific features, after which they review the code and make edits based on nuances in our codebase.
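
As a rough illustration of the taxonomy use case above, the sketch below assigns a category by counting the words that a product's title and description share with a short keyword list per category. The category names, keywords, and the `assign_category` helper are all hypothetical, and a production system would more likely rely on a trained model or embeddings. This is not a description of Convictional's implementation.

```python
import re

# Hypothetical retailer taxonomy: category -> keywords that hint at it.
TAXONOMY = {
    "Footwear": {"shoe", "shoes", "sneaker", "sneakers", "boot", "boots"},
    "Outerwear": {"jacket", "coat", "parka", "raincoat"},
    "Kitchen": {"pan", "skillet", "knife", "kettle"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_category(title, description, taxonomy=TAXONOMY):
    """Return the category whose keywords overlap most with the product text.

    Returns None when no keyword matches, so a person can review the product.
    """
    words = tokenize(title + " " + description)
    scores = {name: len(words & keywords) for name, keywords in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Calling `assign_category("Trail Running Sneakers", "Lightweight shoes with grip")` scores the Footwear keywords highest, while a product with no matching keywords comes back as `None` and can be routed to manual review.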

Roundtable Transcript

Samuel (00:00):

Hello and welcome to the third episode of Convictional People Podcast. Today I got Adam, Bill, and Roger together to talk about their perspectives on AI, how it will impact Convictional and how you can apply AI in your role. So to get us started off, let's talk about what AI is. So Adam, I think that you had a pretty good answer to this one. So how do you explain AI to someone who doesn't know much about it?

Adam (00:27):

Yeah, that's a good question. And I think AI is a pretty broad term, it's used kind of everywhere to cover a lot of different types of algorithms. And when we say artificial intelligence, what we mean are systems or machines, and they could be organic, which can mimic human thinking, meaning that it's not deterministic necessarily in how it's put together, but it's able to measure and estimate very complicated probability distributions.

(01:01):

Machine learning is a subset of AI, which has exploded over the last 20, 40 years, I guess particularly as compute and storage has come down in price. And in that case, we're using computers, physical machines like we're used to, to do learning, which is again, it's not deterministic. So instead of giving an algorithm add two plus two, then multiply by five and subtract four. What we do is we give a set of steps that allows the computer to figure out those rules by itself and we don't have to intervene. So it's non-deterministic programming, which allows, again, for the estimation of very complicated probability distribution. So that given an input, we can say we think this is the right answer. And it turns out that this has a lot of applications across images, video, sound, really anything that can be coded in the numbers, even language. Yeah, so that's about everything. So it's a great way to make predictions, answer questions that are not totally tractable from a traditional programming standpoint.
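
To make Adam's contrast concrete, here is a minimal sketch. The first function spells out fixed steps in the spirit of his example (add two, multiply by five, subtract four); the second recovers the same rule purely from input/output examples. The function names and the choice of ordinary least squares are assumptions made for illustration, and the fit itself is deterministic; the point is only that the rule is inferred from data rather than written by hand.

```python
def hand_written_rule(x):
    """Traditional programming: the steps are spelled out explicitly."""
    return (x + 2) * 5 - 4


def fit_line(xs, ys):
    """Learn a slope and intercept from examples with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Example data: inputs paired with the outputs we observed for them.
inputs = [0, 1, 2, 3, 4, 5]
outputs = [hand_written_rule(x) for x in inputs]

slope, intercept = fit_line(inputs, outputs)


def learned_rule(x):
    """The rule recovered from data rather than written by hand."""
    return slope * x + intercept
```

The learned slope and intercept reproduce the hand-written rule on inputs the fit never saw, which is the sense in which the computer "figures out the rules by itself."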

Samuel (02:08):

Awesome. Thanks for that. That's an awesome intro. Gives you sort of the reason why AI exists and also a little bit of the technical aspect so you can understand why it works. What do we think are some of the biggest opportunities in AI?

Roger (02:24):

I think inside of the business there's a few different time horizons to think about. So short term, it seems like it's able to compress text significantly without reducing the meaning that's conveyed, which is very useful for communicating with broad groups of people. So many aspects of how management science, like how people and groups of people coordinate and work together assume that you can't do that basically. And so being able to do that, you can, for example, allow everyone in a company of any size to hear from a significant number of customers on a weekly basis in a way that would not be possible if not, or would not necessarily be feasible if not. So we're building something that will enable us to do that very soon. And I think that will make everyone smarter about customers long term.

(03:10):

My observation is that generative AI elevates knowledge work to the equivalent of the management level of that knowledge work. So rather than performing a work task, you might be overseeing an AI performing a work task. At this point it still requires a smart user with the judgment to prompt it for the right information. And so you'd be coaching your AI on the tone to use and the context to use certain things in. That will definitely benefit the company and customers. And I think also anyone that chooses to learn how to harness AI to be more effective in their role. And it's very likely that it will actually amplify the effect that domain experts can have because the more experience someone has with a topic, the easier it is to discern the quality of the output of the AI.

Bill (03:59):

One of the things I've been thinking about, and especially with these new generative language models where you can ask a question and the neural network has been trained to give a pretty decent text answer, and it's kind of making me think about the mission of the company, which is to integrate and connect the world's trading partners. A big part of what we do at Convictional is essentially map their domain for orders and products. And that's all sort of currently established in computer language. We have structured data in JSON files or APIs. And just thinking about why that is today, that's because we've had to, similar to what Adam said, deterministically tell computers what information means. So what I'm interested in is can this technology actually improve integrations between businesses using natural language as sort of a least common denominator as opposed to some more standard structured things like EDI or even APIs. So that's an interesting thing that I think we should keep an eye on going forward.

Roger (05:17):

Building on what Bill said, I've been thinking about what a retailer is actually made of in the future. What is a retailer versus a marketplace? To me, they're actually all the same thing. They aggregate value by bringing products together that people want to buy, usually by focusing on a specific type of customer. It's made out of a search engine of some kind, personalization, some kind of e-commerce system like checkout and commerce that's low friction. And all of that is backed by supplier enablement, which is the role I foresee us playing. We call that the supplier universe. And while I think prior to this, it seemed impossible to me that you could eliminate the barriers to forming these new trade relationships, as in it could not happen instantaneously. I no longer believe that. I think there's a state we'll reach where the trade relationships can be instantaneously formed and it's like a bot on both sides performing discrete parts of the commercial relationship forming role.

Samuel (06:17):

That's really amazing to think about. Adam, did you have something to add?

Adam (06:21):

I was just going to say, yeah, I really like Roger's vision there, the future where as a consumer, when I'm searching for a product, if there's a brand that I love or a retailer that I want to buy that product from, that's aggregating everything together. In theory, I could be completely unlimited in my choice, but the products returned, again because of high degrees of personalization and the role that Convictional can play in helping to identify the right brands and partners, are going to match the voice, the tone, and the quality standards that you'd expect out of that retailer. But it becomes almost a seamless shopping experience. And for retailers, the days of losing carts or checkouts to unavailability of stock, that's just going to go away. And we should be able to do this almost in real time, not yet, but hopefully soon.

Bill (07:10):

I've been thinking about where the technology will reside across the entire stack. So from the end consumer, are they going to have their own trained taste models in terms of what I like to buy and where or are retailers going to keep that for me or is that going to be something that they build on the fly? I've seen a lot of folks play around with the image generation stuff where they can make avatars of their own face, and I wonder if that translates to commerce tastes or clothing sizes or style or that kind of stuff. It's just interesting to think about.

Adam (07:50):

Pre-trained prompts that you store on your mobile phone that become almost your identity to feed into these transformers as you enter a site that tells it your size, your previous purchasing history, your preferences.

Bill (08:03):

Yeah.

Adam (08:03):

Yeah.

Samuel (08:06):

Wow. What would you say you're most skeptical of within AI?

Adam (08:12):

Yeah, that's a good question. And it's actually one that I get a lot talking to friends and family about this kind of thing. A lot of them equate it to some of the smoke and mirrors that you might see in crypto or Web3, but I think the point they're missing there is that the technology, the underlying math that's really powering these algorithms, is solid and it's been around for a long time. The perceptron, the kind of atomic unit of a neural network, was first published in the fifties, the 1950s. It's only recently that we've actually had scaled compute and storage at a cheap enough cost to really proliferate the research. But that being said, this is the solid foundation that this is all built on.

(08:53):

I think where I get skeptical is around some of the hype. You'll see a lot of people out there in the community who see ChatGPT come out and praise it as the next AGI, and it's clear that it's not. It's not there yet. This is a large language model that's making statistical predictions on the next token in a sequence of tokens. And then on the other side, you have people who are looking at things like, again, let's use ChatGPT as an example, and getting frustrated at times with some of the results, not understanding what it was trained or intended to do and trying to push it beyond its capabilities and having these unreasonable expectations on both the lower and upper extreme, I think.

(09:31):

So for me, it's not necessarily skepticism, but it's a healthy caution around the expectations on the quality of output that I'll get. I take everything with a grain of salt and I'm always sure to check if I don't have a great understanding of what's happening underneath the hood or I'm not being served real time evaluation metrics, helping me to know what the quality or confidence in that answer is. And some tools, again, coming back to OpenAI and some of the work that they've done, are fantastic at doing that. So yeah.

Roger (09:59):

The only thing I would add to what Adam said is that AI is still extremely energy inefficient. So the return on energy invested is to me still low. While useful, it makes it fairly inaccessible and hard to diffuse, but I think Bill's point about having a local model that's trained to your preferences is likely where this goes once it gets much more efficient and huge efficiency breakthroughs seem to be happening quite regularly. It's just not there yet to be able to have these things like trained locally on device beyond very simple applications.

Adam (10:34):

Yeah, we've all got to brush up on our feed forward algorithm now for edge and mobile computing.

Bill (10:44):

Yeah, I mean, I've been thinking about what additional new responsibility this technology brings on users, on application developers, on the people who actually design and train the models. I think what's exciting and hopeful about this is that a lot of the information to learn how these things work is readily available and actually fairly accessible to learn, even if you don't have a massive math or computer science background and would definitely encourage taking some time to do that. But yeah, once you understand how it works, I think that kind of coupled with ethics and responsibility, you'll be in a better position to kind of decide where it fits in your life or in a business or in policy.

Adam (11:36):

And I mean further to that point, Bill, I think it's really nice to see that a lot of the large organizations that are becoming model platforms, the Googles, the Amazons, the Microsofts, the OpenAIs of the world are trying hard to publish what their principles and policies are. And I know that we've implemented our own here at Convictional. So I think it isn't just the fun of deploying the models, it's thinking about some of these big questions around how they should be used and potential biases that can be introduced into these predictions based on the training that it saw, et cetera. Yeah.

Samuel (12:11):

I have heard a lot of concern about how AI can give you the wrong answer or when you ask it for a source, it gives you the wrong source. Does anyone have any hopeful input to that or any ways to manage that?

Adam (12:29):

Yeah, you might have heard this being called hallucinating, these large language models. It's a term that's starting to be coined, I guess. And it speaks to the fact that, again, when you think about ChatGPT, it's a large language model which has learned how to A, embed all of human sentences and words effectively into this very high dimensional vector space. And then using that to basically train a neural network to retrieve what the next token in a sequence of words might be. So when I submit a question, all it's trying to do is predict what that next word might be, and each time it's updating its prediction.

(13:07):

So it's not unusual that when you ask it for something where, okay, so if you ask it for a research paper supporting some topic and it doesn't have sources that it's aware of to provide you, it just knows that the most probable words that are going to be output are going to be formatted as a source. There are going to be words that look like a name. There are going to be words that look like a title. There's probably some brackets and a date, and that's what it's going to spit out for you because it doesn't actually have, again, ChatGPT and nothing that we've seen today is true AGI, where it's actually consciously thinking about what it is providing you. It doesn't understand that output necessarily. All it understands so far is what the next word is going to be, really.
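
The behavior Adam describes can be sketched with a toy next-token predictor. The example below is a bigram counter, which is far simpler than the transformer behind ChatGPT, and the corpus, names, and titles are invented for illustration. It only shows that repeatedly picking the most likely next token can produce a well-formed reference that appears nowhere in the training text.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus of citation-shaped text. The names and titles are
# invented for illustration and do not refer to real papers.
CORPUS = [
    "smith ( 2019 ) deep learning for retail",
    "jones ( 2019 ) language models for logistics",
    "smith ( 2021 ) language models for logistics",
]


def train_bigrams(lines):
    """Count, for every token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, max_tokens=12):
    """Greedily emit the most frequent next token until the end marker."""
    token, output = "<s>", []
    for _ in range(max_tokens):
        token = follows[token].most_common(1)[0][0]
        if token == "</s>":
            break
        output.append(token)
    return " ".join(output)


model = train_bigrams(CORPUS)
citation = generate(model)
```

Here `citation` has the shape of a source (a name, a bracketed year, a title), yet that exact line does not appear in `CORPUS`: each step was locally probable, and nothing checked whether the whole exists.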

Roger (13:50):

I read also that in order to make citation possible for these very large language models, you have to make it less efficient because it can't compress things as much. It has to somehow draw some relationship back to the source material. But what it's doing, the only way it's even possible now is that it's compressing things so meaningfully down that it can draw these relationships and infer them. So it's likely that citation will start with narrow AI. So you'll have an AI that's great at Delaware corporate law, and then it'll slowly kind of get better as the math makes it more efficient. But there may be some natural limit right now on a ChatGPT thing that's able to cite its sources, may just require too much compute.

Samuel (14:34):

Interestingly similar to asking a human to cite their sources if you're just having a conversation. So I think that goes back into how the neural networks work. Switching gears a bit into applying AI at Convictional. How have you seen our team start to use AI to become more efficient already?

Bill (14:54):

Yeah, I mean, I think since our common days focused week late last year, I've noticed a lot of different teams coming up with a lot of creative use cases, both in terms of projects as well as personal use. So for example, I know our seller sourcing and sales team have used it to write better, more persuasive emails or to look up brand recommendations based on a retailer. We're currently using it to summarize transcripts for calls so we can skim and double click on a piece of content if we want to spend more time looking at it. It seems to be doing a good job at that. And then on the product side, we've started to get into some really useful use cases like automatically assigning products to a retailer's categories or taxonomy based on the content within that product, which is going to save a ton of time for our customers who tend to do that manually. And I think it's exciting that this is just the first couple months of us focusing on this. There's probably a lot more use cases.

Adam (16:04):

Yeah. So would you boil it down, Bill, into two buckets, like knowledge work enhancements, whatever you want to call it, and the productization or customer value coming out of AI?

Bill (16:18):

Yeah, I think that's a good way to think about it. I think for me personally, I usually have the ChatGPT tab open now and use it alongside Google for a lot of things. So I think it's going to be integrated into each individual's personal workflow based on their preferences and then more formally within the business. I think maybe you and Roger could speak to how it's changing our engineering and product process in terms of the data architecture and what you're thinking there.

Adam (16:49):

Yeah, I mean, I know in terms of our engineers, they're adopting tools as well, like ChatGPT has been helpful, GitHub's Copilot I've heard has been fantastic and Roger can probably speak more specifically to some of the gains that some of our engineers have seen and some of the interesting trends that we've seen as well around seniority and ability to leverage that. But then even ourselves within our data model and what we call our data engine here, there's pieces that we now recognize need to be made more robust to allow for true deployment of many models at scale with feedback loops coming in. And it's a really kind of undervalued, underestimated scope of work that when done really well can just unleash the power of machine learning on your products. So that's where we're hoping to get. The automated taxonomy that you've mentioned already, kind of the first example, and it's already proven to be powerful, is going to give us some great learnings.

Roger (17:47):

On the Copilot thing, my observation was more experienced engineers actually found it easier to use because they could snap judge the output so they could say, I kind of know what I'm looking for and I could write it, but I'll ask Copilot for it. And then what comes back, they're able to immediately say whether or not it's what they were looking for. And I heard an anecdote from the former head of AI at Tesla Autopilot saying that he used it for about 80% of coding tasks, which implies some relationship again between coding ability and ability to discern if the resulting outputs are accurate.

(18:24):

And then on the question of how we're applying it to product, I think over time, my observation is that we actually have a data advantage over customers because the amount of data we have in the domain is the sum of the data that our customers give us. And so we can be theoretically better at any AI application than our customers can. They would have to rely on other third party models. So it doesn't seem like something that our customers will have as a competency in their business long term. It seems like something that would actually make sense for them to hire for. And that's true of us as well, where we don't have data advantages. So there'll be types of models like transcription where we should use some kind of high level off-the-shelf product rather than roll it ourselves because we will always be at a data disadvantage.

Samuel (19:12):

What advice do you have for people who are wanting to learn to apply AI?

Adam (19:16):

I think there's a million and one resources out there that you can probably get into. There's community forums where you can ask for help on Stack Exchange or Reddit or Twitter, take your pick. But I think in terms of getting started, you need to really identify first, coming back to what Bill had mentioned, are you going to use this for knowledge work, is that what you're interested in, learning more about how you can leverage this in knowledge work and make yourself faster? Or are you actually wanting to get into Python and start to code an actual neural network? Like some of us have just for the fun of it.

(19:56):

I think if you're getting into the knowledge work, again, the communities are probably a great place to start. It's just awareness, really. ChatGPT kind of put this on everyone's radar, but a lot of these tools have been around for a while and are pretty solid. It's just finding them and understanding what should be possible, so what might be out there.

(20:15):

And then on the productization part, I mean, again, million and one resources. There's two that we've really liked here. One is the MIT Intro to Deep Learning course. It's like a two week crash course that they give students who take it and they film the entire thing. It's just about the right level of depth. You don't need a ton of math, but a bit would help. You don't need a ton of computer science, but a bit would help. And then the other, which has already been mentioned by Bill, or maybe Roger: Andrej Karpathy, previous head of Autopilot at Tesla, just released a video building up nanoGPT, which is a smaller version of ChatGPT that he builds himself from scratch and walks people through. And so finding resources like this, you can dive in pretty quickly.

Roger (21:06):

My answer there would be that I think understanding the underlying math can be valuable if you're applying it to understand why it might be giving you certain results. If you're not applying it directly, it may not be necessary. And I would equate it to understanding bits and how CPUs work in computers as being not necessary to apply a software as a service product. I'm more kind of verbally fluent than I am math fluent. And so for me, it's very interesting the notion of the chatbot going from a place where I probably could not build my own deep learning library despite having a technical background, but I can definitely figure out what prompts I can use to get what I want from the machine and potentially teach myself things that I would not otherwise know.

(21:56):

So I think for most people in the company, a baseline understanding of how to use prompts and how to apply it to your domain will be what they will need in the future. And the UI for that may not always be chatbots, it could be other forms, but generally speaking, knowing what prompts to put in order to get the expected output is important. That is a new field of engineering called prompt engineering. I think it's very likely in 10 years there'll be undergrad programs in prompt engineering.

(22:24):

The other thing I'll say is that what effect this will have in general is that knowledge work in many cases is somewhat duplicative across companies. So someone somewhere has solved that problem before and we will experience deflation in the value of duplicative work. So doing work that has occurred in the past will be less valuable than it is today. And by extension, doing work that hasn't been done in the past will be more valuable. You could call it originating work. So to me, over time, it'll kind of pull a lot of duplicative work out of people's calendar and replace it with time to do originating work, which is better, again, for all the stakeholders of a business. And again, better for the people in the business, assuming they are proficient in either prompt engineering or potentially going deeper if they have those interests and abilities.

Samuel (23:18):

Yeah, those are awesome points, especially around prompt engineering and origination. I think a big takeaway, at least for me, is that most of us should focus on getting great at adapting our work to AI rather than going deep into how specific models work. That might not be true for everyone, but it is for most of us. I think that's all we had to go over. But thank you all for being here. This was so interesting. And if you're listening, thanks for listening through. I hope you're coming away with some new learnings and ideas for how to apply AI.

AI learning resources we recommend

While we mention a couple of resources in this roundtable, Roger, Bill and Adam have circulated several learning resources internally. We're sharing them here for anyone interested in learning more about the current AI landscape:

Roger

Reading:
Tim Urban (2015) — The AI Revolution: The Road to Superintelligence
Andrej Karpathy (2017) — Software 2.0
Gwern (2020) — The Scaling Hypothesis
Wikipedia — The History of Artificial Intelligence
Google Research, Vaswani et al. (2017) — Attention Is All You Need

Podcasts:
Sam Altman | Greymatter — AI for the Next Era
Connor Leahy | Machine Learning Street Talk — AI Alignment & AGI Fire Alarm
Demis Hassabis | Lex Fridman — Demis Hassabis: DeepMind
George Hotz | Lex Fridman — Comma.ai, OpenPilot, and Autonomous Vehicles
Ilya Sutskever | Lex Fridman — Deep Learning
Emad Mostaque | Good Time Show — Founder of Stable Diffusion on AI ethics, religion, India’s AI future and open source
Ben Goertzel | Machine Learning Street Talk — Artificial General Intelligence
Andrej Karpathy | Lex Fridman — Tesla AI, Self-Driving, Optimus, Aliens, and AGI

Video:
Eliezer Yudkowsky (2007) — Introducing the Singularity: Three Major Schools of Thought
Eliezer Yudkowsky (2017) — Difficulties of Artificial General Intelligence Alignment
Eric Elliott (2020) — What it’s like to be a Computer: An Interview with GPT-3
Ethan Caballero (2022) — Broken Neural Scaling Laws

Books:
Deep Learning (2016)
Superintelligence (2014)
The Singularity is Near (2005)

Movies:
AlphaGo (2017)
Her (2013)
2001: A Space Odyssey (1968)

Adam

Reading and Papers:

Cramming: Training a Language Model on a Single GPU in One Day: This paper trended on Twitter for a while because it bucked a trend of bigger and bigger models with more params and FLOPs, instead looking at what could be accomplished on a single consumer GPU in a day. They show that (i) architecture is important and (ii) you can get a high level of quality quickly - higher FLOPs lead to diminishing returns under current setups (which is potentially still necessary).
Data Distributional Properties Drive Emergent In-Context Learning in Transformers: Absolutely loved this paper - it blew my mind and in my opinion asks some pretty deep questions around the nature of language. tldr, the ability of LLMs built on Transformer arch. to generate responses formatted based only on context (e.g. without having to retrain weights) is a completely emergent property that wasn't intentionally trained for. More interestingly it is unique to Transformers - BUT - it also requires training data to be distributed with particular properties (burstiness, Zipfian distributed, etc)
Two additional papers have popped up on this topic: Transformers learn in-context by gradient descent and Why Can GPT Learn In-Context?
Towards Principled Methods for Training Generative Adversarial Networks: Not the original paper on GANs but a highly cited paper for modern GAN implementations. As a follow up read, the Wasserstein GAN is the basis of current state of the art models.
PyTorch library: Not a paper, but spent some time getting to know Torch as well - my takeaway is that it is better suited to low-level granular control over your networks and is optimized for speed. TensorFlow seems more high level/more finicky at these lower level tasks (although, not incapable).
Sparse GPT: Hints at the fact that LLM Transformer architecture is still highly inefficient. Researchers show that massive LLMs can be pruned to less than half their weights without material performance impacts.
The topic of finding more efficiencies in the architecture of our models is growing. The first item in this list also speaks to this, and this paper, 'Tuning LLMs via zero shot hyper parameter transfer', shows that hyper-parameters can be learned/tuned on smaller (subset) models and then transferred in a zero-shot method to larger models.
Neuro-Symbolic AI: Some large shops (IBM included) state this could be the path to AGI as it gives Transformers logic capabilities which today are their biggest weakness.
Wolfram + ChatGPT: Stephen Wolfram lays out proofs of concept for how Wolfram can give ChatGPT a logic implant - effectively proposing that Wolfram is introduced into ChatGPT architecture with fine-tuning causing ChatGPT to consult Wolfram Alpha to provide additional context - correcting mathematical, logical and numerical mistakes.
Reinforcement Learning from Human Feedback: RLHF was a key component to making ChatGPT work and took some novel work in the field of reinforcement learning to optimize gradient descent (given parameter count) and in design of the reward function.
Wide FeedForward or RNNs are Gaussian Processes: I've referenced Greg Yang previously (see the hyperparameter transfer paper above); he is a mathematician at Microsoft working to formalize the theory of NNs. His work, and that of others in his field, will give us a better understanding of the architecture so that we can design networks more predictably vs. how we do it today with, broadly, guess-and-check approaches. This paper shows that all feedforward networks and RNNs (the large majority of the atomic components in ML) are Gaussian processes, effectively meaning that a randomly initialized NN can be thought of as a function which can take any random subset of the input and output a Gaussian distribution. Why is this important? We understand Gaussians really well - and although it does not remain Gaussian - it shows that the process of training is effectively nudging a Gaussian in a new direction - and moreover, it means that we can understand what the 'sum' of neural networks means.
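
A small numerical check can make the last point tangible. The sketch below is an illustration under stated assumptions, not the paper's proof or method: it draws many freshly initialized one-hidden-layer tanh networks (standard normal weights, output scaled by 1/sqrt(width)), evaluates each at one fixed input, and measures how close the resulting outputs are to a Gaussian via their excess kurtosis. The function names, the width of 200, the input 0.7, and the seed are arbitrary choices.

```python
import math
import random


def random_network_output(x, width, rng):
    """Output of a freshly initialized one-hidden-layer tanh network at input x.

    Weights and biases are drawn from a standard normal distribution, and the
    output is scaled by 1/sqrt(width) so its variance stays bounded.
    """
    total = 0.0
    for _ in range(width):
        w = rng.gauss(0.0, 1.0)  # input-to-hidden weight
        b = rng.gauss(0.0, 1.0)  # hidden bias
        v = rng.gauss(0.0, 1.0)  # hidden-to-output weight
        total += v * math.tanh(w * x + b)
    return total / math.sqrt(width)


def mean_and_excess_kurtosis(samples):
    """Sample mean and excess kurtosis (0 for an exact Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    second = sum((s - mean) ** 2 for s in samples) / n
    fourth = sum((s - mean) ** 4 for s in samples) / n
    return mean, fourth / second ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the run is repeatable
outputs = [random_network_output(x=0.7, width=200, rng=rng) for _ in range(2000)]
mean, excess_kurtosis = mean_and_excess_kurtosis(outputs)
```

With a wide hidden layer, the mean should sit near zero and the excess kurtosis should be small, consistent with the outputs at initialization being approximately Gaussian. Shrinking `width` to a handful of units is a quick way to see the approximation loosen.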