OpenAI shared at its Nov 6 DevDay that 92% of Fortune 500 companies are using the OpenAI API to build assistants that tackle various business problems. It’s safe to say that using language models to alleviate support debt, power knowledge sharing in relationships and provide internal co-pilots that accelerate workflows is now a strategic imperative for remaining competitive. Anecdotally, from speaking with friends and network peers, it is clear that the largest players are not the only ones contemplating how best to incorporate the power of these models into their businesses.
At Convictional, we saw the potential of these models, and how we could begin to leverage them, in late 2022; our first retrieval augmented generation (RAG) chatbot, powered by GPT-4, launched to users in June 2023. RAG takes the vanilla ‘ChatGPT’ experience a step further by giving the model (secure) access to uploaded documents, which are made available when a user asks a question. Since then, we have been rapidly learning and iterating on our approach, allowing us to offer stable, reliable and trustworthy production-ready assistants. With thousands of hours invested in research and development, we have learned that the 80/20 rule rings as true as ever - you will spend 20% of your time getting to something 80% satisfactory for release, and the remaining 80% striving for the final 20% of performance.
The non-deterministic nature of generative AI, which makes it so powerful, can also be its Achilles heel when deploying to production. While users have quickly learned to accept some level of error in language model output, their patience wears thin when they encounter repeated errors early in their ‘relationship’ with an assistant, particularly when their role depends on it.
With our material investment in R&D in this space, and lots of testing, we’ve released four modes (workflows) in which our assistant can help users: Support, Partner Onboarding, Insights and Compliance (read more about them below). If you’re interested in our journey, read on, or if you’d rather just check it out for yourself, you can sign up for a free account here.
Why it is so difficult to get the last 20%
In short, a growing number of offerings make it easier to launch assistants that work; however, the challenge begins when you want to incorporate the technology into more business-critical or automated situations that require high degrees of accuracy and repeatability.
This process requires iterative, repeated testing across all the choices or parameters that become available along the way. The entire project quickly turns into an optimization problem: even small changes to how source documents are parsed, or to the model temperature (a setting that controls how random, and therefore how repeatable, the model’s output is), can have a material impact on output performance. Given the workflows we’ve built our assistants for, our focus has been on the accuracy of the information returned and on repeatability - it doesn’t help if the assistant gives two team members completely different answers to the same question.
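To make repeatability concrete, one option is to ask the assistant the same question several times and measure how closely the answers agree. The sketch below is illustrative only and is not a description of our production tests: `repeatability_score` is a hypothetical helper, the sample answers are made up, and character-level similarity from Python’s standard `difflib` module stands in for the semantic comparison a real evaluation would need.

```python
from difflib import SequenceMatcher
from itertools import combinations


def repeatability_score(answers):
    """Mean pairwise similarity (0.0 to 1.0) across answers to one question."""
    if len(answers) < 2:
        return 1.0  # a single answer trivially agrees with itself
    pairs = list(combinations(answers, 2))
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# Made-up answers, as if the same question had been asked three times.
consistent = ["Orders ship in 2 days."] * 3
inconsistent = [
    "Orders ship in 2 days.",
    "Shipping takes about a week.",
    "Please contact support.",
]

print(repeatability_score(consistent))
print(repeatability_score(inconsistent))
```

A score near 1.0 means the answers are nearly identical. Tracking a number like this while varying one parameter at a time (temperature, for example) is one way to see which choices hurt repeatability.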
Charting a course through the evolving AI landscape requires more than general tools; it demands specialized expertise. At Convictional, we’ve dedicated ourselves to fostering a deep understanding of AI that evolves with the technology. Our approach is proactive: by investing in knowledge and skills, we ensure that our offerings remain at the forefront of innovation.
The Optimization Problem
Toolkits such as LangChain (an open source development library for building assistants on top of foundation models such as OpenAI’s), as well as the recent release of OpenAI’s GPT Builder and Assistants API, mean that standing up a simple prototype can be done with a few dozen (or fewer) lines of code in a Python notebook. Further, a competent engineer can quickly build a web-serving app that can be incorporated into a frontend chat (typically built by another engineer). However, along the way you’ll have to make a lot of decisions:
- What model will you use? OpenAI’s GPT-4 is still the highest-benchmarked model for most generalized tasks; however, it comes with tradeoffs in cost, privacy, model willingness to reply and availability. Maybe you want to explore hosting your own open-source model (e.g. Llama 2; we did, and GPT-4 is still worth it in most cases), or maybe try other foundation models such as Anthropic’s offerings or Google’s PaLM/Bard (we also tried these, and GPT-4 is still better in our testing).
- How will you get your documents? Sharing files is easy, and most users are used to doing this - but what if the content is on the web? Are you prepared to stand up technology just to retrieve this and keep it up to date? Furthermore, some tasks require generatively augmented documentation, where documents are ‘pre-read’ by a language model and augmented with metadata or additional generated content to better aid the model at the time of the user’s question.
- How will you store documents for retrieval? To make them available to your assistant when users ask a question, you’ll need to parse, chunk and vectorize the docs. You’ll need to decide which parser (document reader) to use for each doc type (here’s a short list of LangChain supported parsers; we have tested many), which chunking algorithm to use (this breaks your documents up into smaller chunks with more focused topics), and finally which embedding algorithm to use, along with where you’ll store the embeddings. Embedding your chunks is required to perform a similarity search against a user’s question, and the embeddings can be stored in many ways: flat files, cloud providers such as Google’s Vector Search, or specific vector database providers such as Pinecone.
- How will you test? We’ve seen many offerings for building and testing chains - but they all seem to fall down for production workflows. We ultimately decided to build our own tests to evaluate the effects of changes to parameters or supporting technology, while still following best practices for evaluating language model output.
- Prompt Engineering is real - I’ll admit that I was part of the crowd poking fun at the practice of prompt engineering early on; however, it is clear that prompts are an extremely powerful way to tune and adjust the behaviour or outputs of the model, and the art of doing so is one grokked through iteration and trials. Ultimately, the system prompts you provide become yet another (continuous and infinite-dimensional) parameter space you can explore.
- Finally, are you ready for the added tech debt? Like any production application, an assistant requires ongoing support and spend, both of which are exacerbated by the specialized expertise needed to maintain and iterate on language modeling.
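To make the parse, chunk and vectorize decision from the list above more concrete, here is a minimal, dependency-free sketch. It rests on loud assumptions: the function names are hypothetical, the chunker splits on a fixed word count rather than on topics, and a bag-of-words count stands in for a real embedding model, so it only matches on shared words. It is not how our production chain works.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping chunks of up to `size` words (size > overlap)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)  # avoids a trailing chunk that is pure overlap
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up support content for illustration.
docs = (
    "Refunds are issued within five business days of receiving the returned item. "
    "Shipping labels can be printed from the orders page. "
    "API keys are created in the settings page under integrations."
)
chunks = chunk_words(docs, size=12, overlap=3)
print(retrieve("How long do refunds take?", chunks, k=1))
```

Each function here corresponds to one of the decisions above: swapping the chunk size, the embedding or the storage layer changes which chunks reach the model, which is why these choices need to be tested together rather than in isolation.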
Putting this all together, it becomes clear that building a chain that behaves reliably and replies accurately is akin to searching a parameter space, as you would in any other machine learning optimization problem. However, given the need for user feedback, the iteration speed can be slow (although using a separate moderation agent goes a long way), meaning your search is limited by how quickly you can garner said feedback; it is not nearly as simple as hitting run on a Bayesian parameter search algorithm and getting a coffee.
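As a rough illustration of treating these decisions as a search space, the sketch below enumerates every combination of a few parameters and keeps the best-scoring one. The parameter names, candidate values and `placeholder_score` function are all made up for illustration; in practice, each call to the scoring function would mean running a test set through the chain and often waiting on human feedback, which is exactly what makes the search slow.

```python
from itertools import product


def grid(space):
    """Yield every configuration from a dict of parameter name -> candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def best_config(space, score_fn):
    """Score every configuration and return (best_config, best_score)."""
    scored = [(cfg, score_fn(cfg)) for cfg in grid(space)]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical search space: three small choices already give 18 combinations.
space = {
    "chunk_size": [200, 500, 1000],
    "temperature": [0.0, 0.3, 0.7],
    "top_k": [2, 4],
}


def placeholder_score(cfg):
    """Made-up scoring rule. A real one would evaluate the chain on a test set."""
    return -abs(cfg["chunk_size"] - 500) / 1000 - cfg["temperature"] + 0.01 * cfg["top_k"]


print(len(list(grid(space))))
print(best_config(space, placeholder_score))
```

Even this tiny grid shows how quickly the number of combinations grows once parsers, models and prompts are added, and why an exhaustive search is rarely affordable when each evaluation is slow.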
At Convictional, we have invested both people and money into searching these parameter spaces for our users’ problems. We’ve learned enough so far to introduce multiple parameter-specific modes to support the Seller - Retailer relationship. Best of all, they’re available to all users of Convictional - whether you’re looking for support in onboarding to Convictional or to one of your partners, want to understand your Convictional data or are looking to navigate the murky and confusing waters of Retailer Compliance, you can do it with Convictional - while trusting that we’ve done, and continue to do, the hard work necessary to keep the experience state of the art.
An early lesson we learned was that the more specifically we could define the task or problem, the more reliable the Assistant is when working in that mode. Learning from user feedback and behaviour, we have (so far) four modes that users can select from:
Support was the first mode we launched. It dates back to when we only offered it internally and via an integration with our Support platform, using a completely custom retrieval chain in which we vectorized documents ourselves and stored them as simple flat files in the cloud. It has come a long way: available in production since June, the Support mode helps users who need help getting things done in Convictional. With access to all of our support documentation, API documentation and other peripheral guides (e.g. third party support documentation), users have been finding success without needing to wait on a support ticket and its resolution.
Onboarding came next as a natural extension of the Support mode, as both Retailer and Seller users wanted to add their own documentation to better support the onboarding process. We allow users to easily upload files, which we then process behind the scenes to make them available to the Onboarding mode in a secure way - documents are only shared with connected partners in Convictional.
Insights mode was the toughest nut to crack. Rather than relying on RAG, Insights has access to tools it can use to write SQL queries against the user company’s gated and filtered Convictional data. Not only that, but users can save questions they want to see refreshed regularly (e.g. “Show me all of my delayed orders from the previous week”), and export those results to CSV for downstream analysis. Think of Insights mode as your own personal data analyst.
Navigating to the reporting section, users enter an analytics question, which can then be saved so it refreshes daily (or run ad hoc). Users can also always click the familiar robot and choose ‘Insights’ to ask the same questions in a more conversational style.
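For readers curious how an SQL tool can be kept within one company’s data, here is a minimal sketch using Python’s built-in `sqlite3` module. It is an assumption-laden illustration rather than our implementation: the `orders` schema, the company names and both helper functions are hypothetical, and gating is done by copying a single company’s rows into a separate in-memory database before any model-written query runs.

```python
import sqlite3


def gated_connection(source, company_id):
    """Copy only one company's rows into a fresh in-memory database."""
    gated = sqlite3.connect(":memory:")
    gated.execute("CREATE TABLE orders (id INTEGER, status TEXT, days_late INTEGER)")
    rows = source.execute(
        "SELECT id, status, days_late FROM orders WHERE company_id = ?",
        (company_id,),
    ).fetchall()
    gated.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return gated


def run_query_tool(gated, sql):
    """Tool a model could call: run one read-only SELECT and return the rows."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # sqlite3's execute() refuses to run more than one statement per call.
    return gated.execute(sql).fetchall()


# Made-up data for two companies sharing one source table.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, company_id TEXT, status TEXT, days_late INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "acme", "delayed", 3), (2, "acme", "shipped", 0), (3, "other", "delayed", 5)],
)

gated = gated_connection(source, "acme")
# Stand-in for SQL the model might write for "show me my delayed orders".
print(run_query_tool(gated, "SELECT id FROM orders WHERE status = 'delayed'"))
```

Because the query only ever runs against the gated copy, a badly written or overly broad query can at worst return that company’s own rows. A saved question would simply re-run the same SQL against a freshly gated copy on each refresh.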
Compliance is our newest offering and is still very much in ‘iterate and improve’ mode; we are looking for companies willing to partner with us as we develop the future of B2B chargeback mitigation and dispute resolution. Compliance combines all we have learned about language modelling to retrieve up-to-date retailer merchandising requirement documents, intelligently chunk them and provide a Seller’s users with a portal to search or filter chunks based on their relevance to role, lifecycle stage, product type or any keyword. Within the ‘Guides’ section of the platform, the user can then select individual requirements to generate task lists specific to their PO, and leverage a built-in co-pilot along the way to help them understand and find relevant chunks.
I’d invite you to check out what we’re building and what else Convictional can do for your retail business here.