Introduction and Topic Overview
Priya Balasubramanian
Welcome, everyone, and thank you for joining our webinar. My name is Priya Balasubramanian, and I'm going to be your host today. We are very excited about today's topic: developing with Gen AI, from strategy to implementation.
And you know, this is almost the end of the year 2024, and pretty much every software company we know of has been announcing a Gen AI product. But what does it really take to develop one, right? What does it take to make it work well enough so that we can use it in a production environment?
What are the security and privacy considerations, and how do we take care of those? Does building with Gen AI even fit your product strategy? And if so, how do you go about choosing models? What are those considerations? What drives cost?
These are just some of the questions that we are going to be talking about today. To help us do that, we are joined by two of Yotascale's very own: Jim Meyer, our VP of Engineering, and Jeff Harris, our Director of Strategy and Operations. Jim and Jeff have been instrumental in building Yotascale's Gen AI Copilot product.
Before I hand it over to them, a little housekeeping: if, during the webinar, you have a question, please type it into the chat, and I will ask the panel. We’re also going to have a Q&A at the end of the session, so if you want to wait until then, you’re absolutely welcome to do so.
With that, let me hand it over to Jeff. Jeff, why don’t you start by telling us a little bit more about yourself?
Speaker Introductions and Backgrounds
Jeff Harris
Great, thanks, Priya. Excellent to be here today. My name is Jeff Harris, and I have been working at Yotascale with our customers and products for over 7 years now, so I've had a lot of time in this space. Prior to Yotascale, I spent some time at Google Cloud and VMware, and a startup before that.
Most of my time at Yotascale has been spent working with our customers, helping them adopt the technology, and taking their insights to turn that into useful products. I'll pass it to Jim.
Jim Meyer
Hi, I’m Jim Meyer. I’ve been helping out here at Yotascale since early this year. Prior to that, I was with Salesforce and Slack for about 6 years, and I’ve spent about a decade in the B2B enterprise software space with folks like HP and, even further back, Rackspace, running a public cloud.
I'm excited to be here to talk about this burgeoning landscape of Gen AI and the things that folks are doing with it. I hope to help folks start to understand how to think about it and how to go about making it something practical, all while managing it within a budget—because I think that's something we all seek to do.
With that, why don’t we get started? Jeff, as the product lead for Yotascale’s Copilot implementation, I'd love to hear your thoughts on how you looked around at the landscape of AI technologies and how you thought about the problem you were trying to solve.
Developing Gen AI Products: Strategy and Tools
Jeff Harris
Yeah, I mean, we started by first trying to figure out, as a team, what exactly was happening with this technology. This probably happened about a year and a half ago, when our leadership team pointed us toward Gen AI, suggesting it was something we should explore. It was all pretty new and nascent, so there wasn't really an expert we could consult on this in a lot of ways.
We wanted to truly understand the technology, so we began by spinning up a small team to learn, experiment, and survey what was out there. What we discovered was that there are API models you can call directly, as well as options to host your own model and run inference on your own or cloud infrastructure. There’s also the possibility of training your own models and fine-tuning them, so there are many different directions you can go, depending on your needs and capabilities.
Jim Meyer
That matches my experience over the last 24 months as well, experimenting with both popular large models and some of the open-source models, even doing our own fine-tuning. I'm curious, as you explored this, what interesting technologies did you come across that felt promising initially but didn’t quite align with our needs? How did you approach deciding on the direction that ultimately felt right for us?
Jeff Harris
Yeah, that’s a good question.
When it comes to capabilities, it’s about understanding what resources you have and what’s the best fit for your needs. Initially, we wanted to understand the technology so we could look at our product and determine where it could genuinely be useful. We didn’t want to simply add a chatbot and say, “Hey, it can talk to our API, so it’s going to improve everything.” We wanted to be specific about areas where we thought Gen AI could add real value.
During our initial exploration, we encountered many tools. For example, LangChain was a popular one that we started experimenting with. It was great for prototyping, but we found it challenging to have full control over it. Ultimately, we realized we didn’t need to build our own model. What we were really trying to do was create a new user interface layer within the product, which didn’t require us to be at the cutting edge of model development.
Given our more generic use case, we felt the availability of APIs, coupled with the minimal maintenance required to access this technology, was a great balance for the use cases we had identified.
Build vs. Buy: Choosing the Right Gen AI Approach
Jim Meyer
You know, that’s a really good point. You touch on how a lot of folks are looking at Gen AI’s popularity and trying to make a good build-versus-buy decision. Last year, when we looked at this inside of Salesforce, there were similar considerations. Even if you have the budget to build and train your own models, it’s frequently not the right choice.
It’s incredibly costly. You have to find good training data, and there’s a lot of work involved to ensure your training dataset aligns with the types of problems you want to solve. You’ve probably seen articles about the literal billions of dollars being spent to bring a trained model to market. Even if you’re focusing on a narrower, less general-purpose model, you’re still looking at costs that easily reach into the millions.
It’s one of those choices where you need to be sure you’re going to get considerable impact and better results before you decide to invest. I’m curious, Jeff. Did Yotascale go through much of a deep consideration around this? Was there ever a serious question of whether you needed to train your own model, or did it quickly become clear it didn’t fit the strategy? How did you approach forming that strategy?
Jeff Harris
Yeah, it was a pretty quick “no” on training our own model. We saw some potential value in fine-tuning a model, but even at this stage, that doesn’t make sense for us. There’s a cost to that as well, and you still need training data. From what I’ve understood, the benefits aren’t significant—at least for the types of use cases we’re looking at. You can achieve similar outcomes by providing guidance within the prompt itself.
Fine-tuning is more useful when you need a model to respond in a specific way or tone, and we found that for our use cases, tuning through prompts has been sufficient.
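To make that concrete, here is a minimal sketch of steering tone and format through the prompt rather than fine-tuning, assuming the OpenAI Python client; the system message and the few-shot example are illustrative, not Yotascale's actual prompts.

```python
# Steering the model through the prompt instead of fine-tuning.
# The system message and few-shot example below are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # Guidance that might otherwise motivate fine-tuning lives in the prompt:
    {"role": "system", "content": (
        "You are a cloud cost analyst. Answer in two sentences or fewer, "
        "in a neutral tone, and state all amounts in USD."
    )},
    # One few-shot example establishing the desired response shape:
    {"role": "user", "content": "How much did staging spend last week?"},
    {"role": "assistant", "content": "Staging spent $1,240 last week, up 8% from the prior week."},
    # The actual question:
    {"role": "user", "content": "What drove the EC2 increase this month?"},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```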
Optimizing User Experience with AI Integration
Jeff Harris
Regarding our strategy, it was about finding points where we could interact with users in ways that reduce friction. For example, we looked at areas where interactions were complex—like trying to select through multiple filters or adding a filter for a specific tag value. If a user could simply type their question and the LLM translates it into an API call, understanding not only the question but also the cloud context, the business context, and the user’s role within the organization—that’s where we saw value.
Another area we identified was proactively reaching out to users to bring a delightful experience. For instance, contextualizing information based on the user’s role and organizational context. So, our strategy was really based around enhancing the user experience through Gen AI APIs.
Jim Meyer
That makes sense. You also touched on how Gen AI can help manage complexity without removing it, especially in fields like cloud cost management. There’s always a high level of detail, and users want the ability to drill down precisely where they need visibility. So, while you can’t remove complexity, Gen AI can help users navigate it.
I’m curious—what challenges did you face in helping users navigate that complexity? What tools did you use to guide users’ questions and ensure they get responses with the information they need?
Jeff Harris
Yeah, no, this is not straightforward. The first step was, let’s just see what happens when you ask it a question. Then we extended that, thinking, “What if we gave it the ability to create the JSON for an API call? What if we let it get the response to that API call and translate it?”
When we first started, the model didn’t understand the context of what the user was asking. It couldn’t consistently relate the questions back to the cloud computing world, the tag keys, or the attribute keys within that space. In Yotascale’s context, users can also enter their unique business contexts—business units, teams, applications, systems—which it also didn’t understand.
We had to inject that information at the right time, within the right context, for the right user. So, when they ask a question, the system knows what attributes are available to them. Initially, the responses weren’t very accurate because it couldn’t fully grasp what the user was asking. We had to think of ways to gradually chip away at this issue by providing context to it in an efficient and accurate way.
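A rough sketch of that context-injection step might look like the following; the data structures, attribute lists, and helper are hypothetical stand-ins for a real tenant's metadata, not Yotascale's implementation.

```python
# Assembling a per-user system prompt from the attributes that user is
# allowed to see. All names and sample data here are hypothetical.
from dataclasses import dataclass

@dataclass
class UserContext:
    tag_keys: list[str]        # cloud tag keys visible to this user
    business_units: list[str]  # business-context entities visible to this user

def build_system_prompt(ctx: UserContext) -> str:
    """Inject the user's available attributes so the model can ground its answers."""
    return (
        "You translate cloud cost questions into API calls.\n"
        f"Tag keys available to this user: {', '.join(ctx.tag_keys)}.\n"
        f"Business units available to this user: {', '.join(ctx.business_units)}.\n"
        "Only reference attributes from these lists; never invent new ones."
    )

# Example: a finance user who can see two business units.
ctx = UserContext(tag_keys=["env", "team", "service"],
                  business_units=["Product Development", "Research and Development"])
print(build_system_prompt(ctx))
```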
Jim Meyer
That makes sense. We’ve touched on a lot of individual pieces along this journey. I’d like to pull us back a bit to help listeners think about their strategy—what makes sense for them, their product, and their company, and how they might evaluate how deeply to go and where.
I’ll just share an observation: right now, I have a bunch of friends in startups who recognize that if “AI” doesn’t appear in your pitch deck, funding doesn’t appear in your bank account. There’s an overwhelming drive to say, “Hey, we’re an AI company.” But there’s also a deep need to understand how it actually fits for you. It’s easy to think, “Oh, we don’t do something that AI would help with,” only to find yourself six months or a year from now facing a market where you’re no longer relevant because others are solving the problems you aim to solve, only faster or better with AI.
From a strategic level, how would you guide someone in thinking about whether and where they should incorporate AI?
Jeff Harris
Yeah, that’s a good question. So, deciding whether you should implement AI and where it would be beneficial—it’s important to avoid starting from a place where you’re simply feeling pressured to add AI because the organization says, “Hey, we need AI in the product to gain traction.” If that’s your starting point, it might not be the best approach. I understand that pressure exists, but it’s better to focus on what you’re trying to accomplish and then see where AI can make that easier for the user.
Gen AI is often excellent at tasks like categorization. For instance, if a task doesn't warrant a purpose-built categorization algorithm, and being right 80-90% of the time is acceptable, Gen AI can work well.
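As a hedged illustration of that categorization pattern (the labels, model choice, and line item are made up for the example):

```python
# Constrain the model to a fixed label set and accept that a small
# fraction of answers will be wrong. Labels and wording are illustrative.
from openai import OpenAI

client = OpenAI()
LABELS = ["compute", "storage", "networking", "other"]

def categorize(line_item: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                f"Classify the cloud line item into exactly one of: {', '.join(LABELS)}. "
                "Reply with the label only."
            )},
            {"role": "user", "content": line_item},
        ],
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "other"  # guard against off-list replies

print(categorize("AmazonEC2 USW2-BoxUsage:m5.xlarge"))  # -> "compute" (usually)
```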
Before deciding to incorporate AI, it’s crucial to understand your specific use cases and user expectations. My personal view is that there’s significant opportunity in enhancing user experience and interaction. It’s not just about adding AI to everything possible; be strategic and think about the value it adds for the user. Otherwise, you’re just checking a box, and it might not deliver the value you’re aiming for.
Jim Meyer
I think that’s very true. You remind me of something from Bret Taylor, former Co-CEO of Salesforce, now leading Sierra, an AI startup aimed at helping support agents meet customer needs. It’s what I’d call an “art-directable” approach, where AI-enabled agents assist in resolving issues, allowing one person to operate at a much larger scale.
Bret’s observation was that chat-based Gen AI—essentially LLMs in conversation—represents as revolutionary a user interface as the graphical user interface did in the ’80s and ’90s. This isn’t just a matter of application; it’s a whole new interface and way of interacting with products. That’s an essential consideration, as you noted: what can Gen AI change about the user experience? How can it help users get what they want more quickly?
Another smart step is finding AI-enabled tools that fit your company and letting people use them. This experience helps develop intuition about where AI is effective. There are, of course, areas where these tools struggle. We’re all familiar with the concept of “hallucinations”—where the AI confidently provides answers that are 100% wrong.
For instance, I have a project with Claude where I ask about local stores. It sometimes recommends stores that don’t exist, but often it helps me discover new places I can actually visit. And as you mentioned, Jeff, if it’s right 80-90% of the time, that remaining 10-20% might require a quick Google Maps check. Gaining this experience helps you look at your product and see where AI would make sense, rather than just “AI-washing” it—slapping an AI label on without real value.
Customers will quickly recognize if AI just looks shiny without adding any real benefit. They’ll know it’s adjunct or useless if it doesn’t improve the experience.
Jeff Harris
Yeah.
Jim Meyer
Any other advice for someone considering incorporating Gen AI into their product? If not, we can talk about cost next.
Jeff Harris
I do want to add one more thing, which I think is essential: the “human-in-the-loop” strategy. This approach is critical, especially since AI isn’t 100% accurate but still helpful. It’s about advancing the workflow in ways that make it easier for the user to proceed.
For instance, AI can offer suggestions that are mostly relevant, saying, “We think these items should be grouped together. Do you agree?” Even if the user disagrees on some items, they can still approve the ones that make sense. It encourages actions they might not have taken otherwise, like grouping items without needing to navigate to a specific section of the product. Finding these opportunities can be a huge enabler.
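A minimal sketch of that human-in-the-loop flow, with illustrative data: the model proposes groupings, and nothing is applied until a person approves each suggestion.

```python
# Human-in-the-loop approval over AI-proposed groupings. Data is illustrative.
suggestions = [
    {"group": "Checkout", "items": ["cart-svc", "payments-svc"]},
    {"group": "Checkout", "items": ["legacy-batch"]},
]

applied = []
for s in suggestions:
    answer = input(f"Group {s['items']} under '{s['group']}'? [y/n] ")
    if answer.strip().lower() == "y":
        applied.append(s)
    # Rejections can be logged as feedback to improve future suggestions.

print(f"Applied {len(applied)} of {len(suggestions)} suggestions.")
```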
Jim Meyer
Yeah, that makes a lot of sense. I often notice that when technology can handle repetitive work, like grouping items, it’s fantastic. But you never want to eliminate human judgment—it’s something we haven’t automated, and I hope we never will.
Providing a judgment tool that suggests groupings, while allowing users to adjust, is valuable. Plus, you can feed that data back into the system, improving future assumptions and making the AI even more effective at suggesting accurate groupings.
Jeff Harris
Yep.
Managing AI Complexity and Costs
Jim Meyer
Well, let’s switch topics a bit. So, you’ve just implemented your brand-new, shiny AI-driven features, and then you get your bill. Suddenly, you’re faced with understanding how everything is priced and how to use AI efficiently. I know you have a lot of experience in this area, so I’d love for you to talk about that.
Jeff Harris
Yeah, this is where Yotascale’s background crosses over with cloud computing and usage-based costs for AI APIs. When we started, we thought, “Let’s keep an eye on the cost, but we don’t have to worry too much.” We weren’t operating at a large scale yet. But as we began rolling it out in production, we wanted to monitor costs more closely to see if we were doing things efficiently.
There's not much guidance on how these APIs work if you’re unfamiliar with them, especially those from providers like Anthropic or OpenAI. They charge based on token usage, but token usage is not as straightforward as just paying for the message sent and received. There are many components that can add to the cost, depending on how you use the APIs. OpenAI, for example, offers many additional services that can significantly increase costs.
It’s not only about model choice; what you send and receive impacts cost. I’ll share a graphic to illustrate this concept. Think about asking a question like, “How much does an elephant weigh?” Beneath that question, there are system instructions going to the LLM, message history, and possibly other data if you’re using a retrieval-augmented generation (RAG) system, where stored data is accessed to add context to the prompt. Each of these elements costs money, as does the storage and retrieval of data.
There’s a helpful tool called Tiktokenizer that breaks down how token counting works. It shows you how each message—including the system message, chat history, and additional context—accumulates tokens. Your question might be only a few tokens long, but the system message and context can add three to five times that amount, plus the cost of the response itself.
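For readers who want to reproduce that accounting locally, OpenAI's open-source tiktoken library (the same tokenization the Tiktokenizer web tool visualizes) can count tokens per message; the messages below are illustrative.

```python
# Counting tokens per message with OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # needs a recent tiktoken release

system = ("You are a cloud cost analyst for this tenant. "
          "Available tag keys: env, team, service.")
history = ("User: What did we spend last month?\n"
           "Assistant: You spent $52,310 last month.")
question = "How much does an elephant weigh?"

for name, text in [("system", system), ("history", history), ("question", question)]:
    print(f"{name}: {len(enc.encode(text))} tokens")
# The few-token question rides along with all of this context on every request.
```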
For pricing, OpenAI makes it look simple: GPT-4o, for example, runs $2.50 per million input tokens. But how many conversations fit into a million tokens? That depends heavily on how you’re using the LLM, and tracking token usage at granular levels isn’t easy. There are tools that can report back the total tokens used, but tracking these things in detail is still challenging.
GPT-4o mini is among the cheaper options. Still, choosing between the full GPT-4o model and the mini version requires a strategic approach. Sometimes we need the full model, but often we can work with the mini or an older version, which is nearly 10 times cheaper.
Then, you start to think more about efficiency: Are we sending too much information in the prompts? Can we get by with a one-time prompt instead of maintaining a full message history? The details of each API call and the information in the context matter a lot when optimizing costs.
The tool I mentioned, Tiktokenizer, is quite useful. You can type in a prompt, see the token count, and break down how many tokens each message uses. It’s striking how complex this can get, but we tend to ignore some of it initially to get things done, then focus on efficiency once it’s in production.
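As a back-of-the-envelope illustration of that math (the rate and message sizes are assumptions; check your provider's current rate card):

```python
# Rough per-exchange cost math. All figures are illustrative assumptions.
PRICE_PER_M_TOKENS = 2.50  # USD per million input tokens, GPT-4o class

# system prompt + injected context + question + response
tokens_per_exchange = 250 + 1200 + 40 + 300
# (Output tokens are usually billed at a higher rate; this sketch lumps
# everything at the input rate for simplicity.)
cost = tokens_per_exchange / 1_000_000 * PRICE_PER_M_TOKENS
print(f"~${cost:.4f} per exchange")                 # ~$0.0045
print(f"~{int(1 / cost):,} exchanges per dollar")   # ~220
```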
Jim Meyer
Yeah, in engineering, we often talk about “shift left,” where we aim to move processes earlier in the workflow. A common example is testing. Instead of testing at the UI level, which is intensive and fragile, we try to shift it further back in the stack, so we can more efficiently assess software quality. This approach with token usage feels similar. How much of the conversation can we move into a system prompt to avoid carrying the full conversation history and reduce token usage?
I also like your iceberg illustration as a clear representation of how much is hidden below the surface—and the water is indeed a bit murky. For example, GPT-4o charges around $2.50 per million tokens, but GPT-4 Turbo is closer to $10. One of my go-to system prompts is: “Give me brief responses. Don’t provide code or new text until I say please do.” That way, it limits verbosity, which can add up in terms of tokens and cost, especially when you’re using the API to build an app rather than chatting directly in the interface.
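A small sketch of that verbosity-limiting idea, assuming the OpenAI Python client: pair a brevity system prompt with a hard cap on output tokens (the wording and the 150-token limit are illustrative).

```python
# Limiting verbosity via a system prompt plus a hard output-token cap.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=150,  # hard ceiling on billed output tokens
    messages=[
        {"role": "system", "content": (
            "Give me brief responses. Don't provide code or new text "
            "until I say please do."
        )},
        {"role": "user", "content": "Why do long chat histories raise API costs?"},
    ],
)
print(response.choices[0].message.content)
```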
Jeff Harris
Yeah, and another cost tip: don’t try all the new features without checking the price. We learned that the hard way when we activated the live voice conversation preview, which turned out to be quite expensive. It’s interesting how many services they showcase, like in ChatGPT, which come with hefty price tags.
Jim Meyer
That makes sense—they’re trying to recover the costs from training these models, so they’re adding features across different interfaces, whether that’s text, voice, or images. We’ve spent a lot of time discussing Yotascale’s Copilot. Would you mind giving us a quick demo to show what we’ve built?
Jeff Harris
Yeah, happy to. Let’s take a look at some of these features. I’ll share my screen and show a few places where we’ve integrated Copilot within the product.
One of these areas is the Cost Analytics view. Here in our demo tenant, SAS Tech, the finance team has a specific view based on business units, teams, and services. This contextual information is provided to the assistant, so it already knows about our organization’s structure and cost allocation. It also understands the nomenclature for AWS, GCP, and Azure.
For example, I might ask, “Which business units have been driving costs over the last three months?” The assistant responds with a straightforward answer, listing departments like Product Development and Research and Development. These business units are directly tied to how our finance team sees the world, and they’re kept up-to-date programmatically through APIs. We also provide supporting charts and details, including specifics on the API call and the data used to construct this answer, which helps address data trust and hallucination concerns. We’ve architected this so that the data presented is derived directly from API calls, effectively eliminating the chance for hallucinations.
Additionally, I can follow up with questions. For instance, if I ask, “Which teams are driving costs within Product Development?” the assistant interprets my question, even if it’s slightly misspelled or abbreviated, and provides an accurate response listing the teams within that business unit. This allows for a more conversational flow to obtain deeper insights.
Another area we’ve integrated Gen AI capabilities is within our Lens Allocation view. A “lens” in Yotascale is a cost breakdown, which can be overwhelming if you’re unfamiliar with it. Here, the assistant can help by answering requests like, “Can I see my costs by environment and region?” It automatically creates a hierarchy based on our organization’s tags and the regions where those tags are found. This feature allows users to set budgets, monitor anomalies, and get visibility into costs across environments without manually creating rules. It’s a significant time saver.
Ensuring Data Privacy, Security, and Compliance
Jim Meyer
That’s a great example of using the tool to navigate complexity. You can see the environments and teams and sort things to get the specific view you need. It makes a lot of sense. You also touched on an interesting feature—when companies add AI, customers often have concerns about security and privacy. Questions like, “Where’s my data going?” and “What are the compliance levels?” come up frequently. Would you talk about how we’ve architected this to ensure data security and privacy, especially regarding not sending customer data down the pipeline?
Jeff Harris
Absolutely. As an enterprise SaaS company, we take data privacy very seriously. One of the first questions customers ask when they see Gen AI in the product is, “Who’s getting my information, and will it be used for training?” We have a strict policy: your data is not used for training. We’re using inference models purely for the user interface and not for data computation.
Our setup means that the AI generates an API call structure, which then internally calls Yotascale APIs to retrieve data. The LLM then provides a wrapper explaining that data back to the user. This approach ensures that role-based access control (RBAC) is inherently maintained. There’s no risk of data crossover, where one customer’s data could be accessed by another or viewed by an unauthorized user. By keeping the LLM as just the user interface, we’ve effectively isolated sensitive operations from the AI layer.
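A schematic sketch of that "LLM as user interface" pattern follows; every function here is a hypothetical stand-in for the real model calls and Yotascale services, included only to show where access control sits.

```python
# The model only produces a structured request; the backend checks the
# caller's own permissions and fetches the real data, so RBAC holds and
# the numbers come from the API, not from the model. All stand-ins.
import json

def llm_generate_api_call(question):
    """Stand-in for the model turning a question into a structured request."""
    return json.dumps({"resource": "costs", "group_by": "business_unit", "months": 3})

def execute_api_call(call, user_permissions):
    """The backend, not the model, enforces access control."""
    if call["resource"] not in user_permissions:
        return None
    return {"Product Development": 120000, "R&D": 87000}  # real API data

def llm_summarize(result):
    """Stand-in for the model wrapping real API data in prose."""
    return "Top cost drivers: " + ", ".join(f"{k} (${v:,})" for k, v in result.items())

call = json.loads(llm_generate_api_call("Which business units drove costs?"))
result = execute_api_call(call, user_permissions={"costs"})
print(llm_summarize(result) if result is not None
      else "You don't have access to that data.")
```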
Jim Meyer
Yeah. What I like about that is it makes it very easy to address customer concerns. Their data never actually leaves our “four walls.” We teach the LLM how to ask us questions, and it then responds by filling in the blanks, almost like a Mad Lib. So, from a user’s perspective, they get data in the right place and an accurate answer sourced directly from our APIs. We didn’t have to send cost data out, nor did we risk issues with how LLMs sometimes handle numbers.
There was a funny example recently—a toy calculator that looked like a basic 10-key pocket calculator but converted the math into words and ran it through an LLM. The answers it provided were often amusingly off. Though these models are improving, as Sam Altman himself has said, we’re currently dealing with the “stupidest models we’ll ever have.” They’ll only get smarter as we identify issues and progress, though it might come with a higher cost. Fortunately, economizing methods can help manage that.
Priya Balasubramanian
We have a question that just came in on this topic: How do you think about Gen AI features from a compliance and governance perspective?
Jim Meyer
Oh, that’s a great question. I’d start by saying that if you haven’t already automated most of your compliance processes—meaning, if you’re not in a state of continuous compliance and using smart tools for it—I strongly recommend that. When you bring in tools like Gen AI, you need to feed data into those compliance tools to monitor and ensure continued compliance.
Much of compliance is managed in the architecture, as we just discussed. Make sure the promises around data privacy, data separation, and security are being met within the product’s design. That includes implementing these features, running tests in your software development lifecycle, and maintaining checks in your production environment to confirm data stays where it should.
It’s essential to build security and compliance into the architecture so it’s maintained by default, only straying from compliance in extraordinary circumstances. Anything you’d add, Jeff?
Jeff Harris
You know, that covers it pretty well.
Jim Meyer
Fair enough. Priya, do we have any other questions? We’ve kind of reached the end of the main points we planned to discuss.
Measuring ROI and Budgeting for AI Development
Priya Balasubramanian
Yes, we do have some more general questions, especially around ROI—like why someone would want to adopt AI and what monetary gains they might expect. One question we received was: How do you think about ROI for your Gen AI features?
Jeff Harris
Yeah, it’s a great question. From the product side, specifically, we focus on engagement. We’re aiming to drive more engagement and adoption of features within the product, making it easier for people to get answers. So right now, we’re looking at metrics like the number of questions asked and interactions with Gen AI to see if people are using it more.
We also do some qualitative analysis to ensure those interactions lead to positive outcomes for users. Then there’s tracking the cost—monitoring what we’re actually spending on providing these features. So far, it’s been worth it; we’ve been surprised by how efficiently we can deliver answers at a relatively low cost once the features are live.
Priya Balasubramanian
OK, thanks. The next question was: How do you budget for development and production costs?
Jeff Harris
Yeah, that’s probably more of a question for Jim. During the development phase, we monitored costs and had a general idea of the limits we didn’t want to exceed. But after the exploration phase, we handed it off to the development team, and Jim handled the implementation, so he can explain their approach to budgeting on that side.
Jim Meyer
Well, when you’re busy experimenting, you set a sandbox, right? You decide, “We’re going to try to stay within this budget,” but you’re learning as much about the costs as you are about how to use the tool effectively. You might get an outcome and then ask, “Is there a way to achieve that same outcome at a lower cost?”—maybe by managing tokens better or optimizing usage.
When you project a budget, everyone understands that nothing goes into production exactly as expected. So, you immediately set up monitoring, watch costs carefully, adjust expectations, and look for optimization opportunities. There’s always a trade-off: spend engineering time optimizing or spend it extending the feature. You balance that based on your needs, and sometimes, you even welcome a surge in engagement, even if it pushes costs higher than anticipated. It’s a great problem to have.
Priya Balasubramanian
Mm-hmm. That segues nicely into the next question: When do you know you’re done—ready to stop experimenting and go into production?
Jeff Harris
Well, that’s maybe a different question, because we’re never really “done.” But as for going into production, that’s about building confidence through testing. When we first released the Copilot feature, we focused on use cases we felt confident it could handle well. At first, it was about answering cost questions based on the user’s context. Once we got that initial use case working, we decided it was good enough to release—it solved a real pain point.
From there, we looked at what additional use cases we wanted to tackle and what users were asking that it couldn’t yet handle, then continued to build from that foundation. So, it’s about identifying one solid use case, gaining confidence in its performance, and then releasing it to solve that immediate need.
Jim Meyer
And that’s exactly it. It’s about focusing on the value you want to deliver. I always go with Reid Hoffman’s definition: if you’re not a little bit embarrassed when you launch, you waited too long. You need to get feedback and let customers pull you in the direction it needs to go.
Challenges, Lessons, and Future Opportunities in Gen AI
Priya Balasubramanian
OK. The next question is: What’s the most surprising discovery you’ve made as you’ve progressed in this journey? Looking back, what would you have done differently?
Jim Meyer
That’s a good one. Jeff, why don’t you start us off on that, and I’ll follow up.
Jeff Harris
Yeah, I mean, I think the most surprising thing was how quickly you could get something really basic out there, and then how much work it took to tune it to a point where you were happy. It’s amazing how fast you can get something that works. At first, we thought, “Hey, we might be able to get this out in a month.” But then, as you start refining it, you realize, “Oh man, there’s so much more to do here.”
You get confident with the initial response, but then you start finding little issues around the edges that need fixing. We ended up spending more time than expected refining it after that initial quick success.
Jim Meyer
Yeah. And as for the second part—what we’d do differently—I think I’d have planned differently for iterative follow-ons. You want feedback to guide you, but once you find enough value, it’s about extending that value across your product ecosystem. I think we could have done a better job preparing to spread its wings a bit faster, balancing that with deepening what we already had.
Overall, I’m happy with what we’ve achieved, but if I were to do it again, I might focus on breadth as fast as possible.
Priya Balasubramanian
OK. And here’s another interesting question that came in: How do you hire or develop good prompt engineers or prompt engineering skills?
Jeff Harris
Jim, what do you think? You handle the hiring side much more than I do.
Jim Meyer
Well, that’s the thing—it’s all new, and each time a model iterates, the core principles of prompt engineering stay the same, but what’s most effective can shift based on how the new model responds to different prompt structures. I think the main approach is similar to any young practice: get involved in communities where people are actively discussing prompt engineering.
There are numerous communities where people swap prompts and learn from each other’s experiences. Getting involved in those spaces helps with understanding current trends, and it’s also where talent gathers. Being part of those conversations, sharing what you’re trying to achieve, and seeing who gravitates toward it is valuable. It’s very similar to the approach of hiring for specialized knowledge in a startup—find where the expertise is, be present in that community, and become a known entity.
Also, it’s important to recognize that prompt engineering is a unique type of engineering—it’s strategic and valuable but not quite like traditional coding. It requires a toolchain mindset, where you treat prompts almost like code by versioning and organizing them. You need people who are comfortable with some ambiguity because you’re often at the frontier, trying things out and seeing what works best.
So, you’re looking for individuals who can work methodically in research, then translate that into effective development practices. Sorry, that was a bit lengthy!
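One concrete version of the "prompts as code" toolchain idea Jim mentions might look like this; the file layout and names are illustrative.

```python
# Prompts as named, versioned artifacts that get reviewed and diffed like
# source code, rather than strings scattered through the codebase.
from pathlib import Path

PROMPT_DIR = Path("prompts")  # checked into version control next to the code

def load_prompt(name, version):
    """e.g. prompts/cost_question/v3.txt"""
    return (PROMPT_DIR / name / f"{version}.txt").read_text()

# Setup for the example; in practice these files already live in the repo.
(PROMPT_DIR / "cost_question").mkdir(parents=True, exist_ok=True)
(PROMPT_DIR / "cost_question" / "v3.txt").write_text(
    "You translate cloud cost questions into API calls...")

print(load_prompt("cost_question", "v3"))
```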
Priya Balasubramanian
No, thank you—that makes sense.
Jeff Harris
Yeah, and just to add to that, I’d say an experimentation mindset is crucial. Jim alluded to this too. You need to form a hypothesis, test it, and be ready to iterate. Since these models are non-deterministic, even minor changes to wording can lead to different responses, so having that experimental approach is essential.
We don’t have a dedicated prompt engineer; it’s a combination of engineers who know Yotascale well and product input from our side. On the product side, we can tweak prompts easily, which lets us add value to prompt engineering efforts without needing a specific role for it. But yes, experimentation is key.
Jim Meyer
And you touched on two other important points. First, once you have a working prompt, you’ll want to think about optimizing it—maybe by reducing tokens or refining the prompt to get the same result more efficiently. And second, having domain knowledge is invaluable. Our engineers, as you might imagine, have a deep understanding of cost management and FinOps. That expertise allows them to craft better prompts than someone who’s just a prompt engineer without domain knowledge. This depth helps us hit the target faster.
Priya Balasubramanian
You’ve highlighted several challenges—ambiguity, the newness of the field. Are there any other challenges around prompt engineering that we haven’t covered?
Jeff Harris
I don’t know how others approach it, but when I first started working on prompts, it felt like just a wall of text. Over time, though, I realized you can inject dynamic information into the prompt, and it behaves differently depending on what’s included. So, you have a lot of control beyond just the static text—considering what context to add, for instance.
Sometimes, you may even need a chain of prompts. We use this approach for some cases: if a user asks a question, the system might first need to clarify an attribute before it can answer accurately. So, you can get creative with setting up prompts and designing the process from the moment a user asks a question to when the LLM receives the prompt.
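A minimal sketch of such a two-step prompt chain, assuming an OpenAI-style API; the routing rule, tag keys, and wording are illustrative.

```python
# A two-step chain: a cheap routing prompt decides whether the question is
# answerable as-is or needs clarification, then a second prompt answers.
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return r.choices[0].message.content.strip()

question = "What are prod costs?"

# Step 1: routing prompt.
verdict = ask(
    "Known tag keys: env, team, service. Reply READY if the question maps "
    "cleanly onto these keys; otherwise reply with one clarifying question.",
    question,
)

# Step 2: either answer, or surface the clarification to the user.
if verdict.upper().startswith("READY"):
    print(ask("Answer the cost question using the known tag keys.", question))
else:
    print(f"Need clarification first: {verdict}")
```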
Jim Meyer
Yeah, and there’s an interesting future opportunity here. In programming, when you want to achieve a certain outcome, there are often multiple ways to express it. Compilers in mature programming languages can recognize these variations and optimize code behind the scenes, making it faster and more efficient.
I think prompt engineering might see similar advancements. For instance, if the system recognizes that certain phrases yield better results, it could automatically translate user inputs into optimized prompts for more accurate answers. This kind of behind-the-scenes prompt optimization could improve user experience by fine-tuning responses based on subtle language adjustments.
Priya Balasubramanian
Hmm, interesting. It reminds me of the early days of Google, when we had to phrase questions correctly to get the right answers. Now it sounds like there’s a similar need for “translation” in prompt engineering.
Jim Meyer
Technology is training us better and better every day.
Priya Balasubramanian
And one last question—this one’s from me! We have all these different foundational models, and I’ve always been curious: are certain models suited for specific use cases, or are they all pretty much the same?
Jeff Harris
I think most popular models today are quite generic, especially the advanced ones—they’re versatile enough for a wide range of applications. OpenAI, for example, also offers an image generation API alongside GPT-4, so the same platform can handle a variety of tasks.
However, there are more specialized models out there, such as models tuned specifically for coding, image generation, or diffusion-based tasks. These are generative in nature but tailored for particular outputs. Another thing to consider is that you might not always need the “Cadillac” model—a more basic “Toyota” model can sometimes suffice.
For smaller tasks, like basic categorization, deploying a local model, such as a LLaMA-based LLM, might be more efficient than relying on large-scale APIs. So, yes, there are definitely specific models for certain use cases.
Priya Balasubramanian
OK, thanks. That brings us to the end of the questions we have. Jim, Jeff, thank you very much. This has been both educational and enlightening. I found it very interesting. For everyone watching, I hope you enjoyed this as much as we did putting it together for you.
If you enjoyed this webinar, please follow us on LinkedIn, where you’ll find all the interesting content we have coming your way. Jeff, do you want to share contact information for you and Jim in case people have any questions?
Jeff Harris
Yes, we’re putting that up on the screen now.
Priya Balasubramanian
Great, we’re putting it on the screen, so if you have any questions, please feel free to reach out to them, and they’d be happy to answer. So that’s Jeff Harris and Jim Meyer.
Thank you both very much. And, of course, if you’d like to see Yotascale’s Copilot in more detail, feel free to contact us at Yotascale.com to schedule a demo. Other than that, thank you, everyone, and see you in our next webinar!