Microsoft’s Build developer conference has pulled back the curtain on how it wants developers to add custom content and application integration to its Copilot applications. It’s an approach that should make those Copilots more relevant and less likely to go off the rails by focusing their output on specific tasks.
It’s important to understand that once trained, a large language model like GPT-4 needs additional data to stay focused. That’s why Microsoft’s various Copilots are built on top of its own data sources: GitHub, Power Platform, Microsoft Graph, and, most obviously, Bing. It’s a mostly successful approach that reduces the risk of hallucinations and prompt overruns, but it still puts Microsoft-defined limits on its AI platform.
As it stands, Bing’s Copilot can only answer questions about Bing’s search database. And while that’s massive, it can’t answer questions about data inside a user’s firewall or from the applications they want to use. Nor can the service take those responses and feed them into other applications, using the additional results to format its output or to run an interaction on the user’s behalf. Users can ask Bing Chat to name the best restaurants in New Orleans or give them an itinerary for a three-day trip, but it won’t book them a table.
That’s where plugins can help, providing additional data sources and new interactions. Users can already use plugins that have been built for ChatGPT, and Microsoft is building on the same plugin architecture for its new Bing plugins. Initially, it’s offering OpenTable and Wolfram Alpha support, with plugins from services including Expedia, Instacart, Zillow, TripAdvisor and more to follow. So, for example, if someone is using the Instacart plugin, they can quickly turn a menu from Bing into a shopping list and then into a delivery order for ingredients that are not in their cupboard. Amusingly, these plugins will include one for ChatGPT itself.
Microsoft is going further: that common plugin model is also being used for Microsoft 365’s Copilot and the AI tooling in Microsoft’s Edge browser. Having a common model for LLM plugins makes a lot of sense, as it allows code to be written once and reused across all the different applications a user works in.
Working with a standard plugin architecture also lets developers offer their code to other users and organizations: if they have built a tool that integrates a Salesforce app with Bing Chat, they can sell it as a product, or make it open source and share it.
So how do developers build a ChatGPT plugin? A plugin is an interface between an existing application API and ChatGPT, described by a manifest and an OpenAPI specification for the API it uses. The Bing Chat service acts as an orchestration tool, calling the APIs as needed and formatting responses using its natural language tools.
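In practice, a plugin is little more than two files served from the plugin owner’s own domain. As a rough sketch, following the OpenAI plugin convention (the domain and file contents here are placeholders):

```
example.com/
├── .well-known/
│   └── ai-plugin.json   # plugin manifest: names, descriptions, authentication, link to the API spec
└── openapi.yaml         # OpenAPI definition of the endpoints the model is allowed to call
```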
With these tools, users can ask, “Can you tell me all the deals that closed in the first quarter?” and have Bing Chat connect to their customer relationship management system and pull the required information from their sales data, displaying it as a chat response. They can then follow up, asking if they need to order more raw materials, with another plugin linking to an enterprise resource planning platform, checking stock levels and then asking if they approve ordering any required materials and components.
The result is to support users working in the applications they normally use, orchestrating interactions and turning what could be complex tasks into microwork, leaving them free to concentrate on other tasks in depth.
Building extensions on existing API definitions and a standard definition format should simplify development. Developers who haven’t already built an OpenAPI definition of a REST API can use tools like Postman to generate one automatically. The description fields of the OpenAPI definition help Bing or ChatGPT generate text around the queries and help the model choose which API call to make. The resulting plugin definition is added to the LLM’s prompt, hidden from the chat UI, but it still counts against the model’s context and uses up tokens. It’s important to remember that plugins need to be called directly by users; they’re not available to all queries.
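To make that concrete, a fragment of such an OpenAPI definition for the CRM example above might look like the following. The endpoint, parameters and wording are hypothetical; the point is that the summary and description fields are what the model reads when deciding whether and how to call the API.

```yaml
openapi: 3.0.1
info:
  title: Deals API
  description: Look up closed deals from the company CRM.
  version: "1.0"
paths:
  /deals:
    get:
      operationId: listClosedDeals
      summary: List deals closed within a date range
      description: Use this to answer questions about deals that have already closed.
      parameters:
        - name: from
          in: query
          required: true
          schema: { type: string, format: date }
        - name: to
          in: query
          required: true
          schema: { type: string, format: date }
      responses:
        "200":
          description: A JSON array of closed deals
```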
The first step is to build a manifest for the plugin in YAML or JSON. Developers host it themselves, in a specific folder at the top of their domain and with a pre-defined name, so it’s easy for the GPT host to find. Usefully, the OpenAI plugin specification includes ways of handling authentication, so developers can ensure that only authenticated users have access to internal APIs. Using OpenAPI descriptions also lets them restrict GPT access to parts of their APIs, since the API definition can be edited to hide calls they don’t want the model to make. For example, a plugin could allow only reads on an API that also has update and delete capabilities.
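Following OpenAI’s published format, a minimal JSON manifest looks something like the sketch below, served as ai-plugin.json from the .well-known folder at the root of the domain; the names, URLs and the bearer-token authentication choice are illustrative rather than a recommendation.

```json
{
  "schema_version": "v1",
  "name_for_human": "Sales Deals",
  "name_for_model": "sales_deals",
  "description_for_human": "Look up closed deals from the company CRM.",
  "description_for_model": "Use when the user asks about closed deals or sales figures.",
  "auth": {
    "type": "service_http",
    "authorization_type": "bearer",
    "verification_tokens": { "openai": "token-issued-at-registration" }
  },
  "api": {
    "type": "openapi",
    "url": "https://example.com/openapi.yaml"
  },
  "logo_url": "https://example.com/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "https://example.com/legal"
}
```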
Plugins don’t add data to Bing or ChatGPT; they add direction and focus to its output, running only when requested by a user and returning only data that forms part of a response to the original query. Developers should avoid returning natural language responses from their APIs, as the GPT model will generate its own responses around the data the API provides.
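In other words, the API behind a plugin should hand back plain structured data and let the model do the writing. For the CRM example, a response might look like this, with the field names and figures invented for illustration:

```json
{
  "deals": [
    { "id": "D-1042", "account": "Contoso", "amount": 18200, "closed": "2023-03-28" },
    { "id": "D-1055", "account": "Fabrikam", "amount": 9400, "closed": "2023-03-30" }
  ]
}
```

Bing Chat or ChatGPT then writes the conversational answer around it, for example noting that two deals worth $27,600 closed at the end of March.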
One useful feature of the plugin manifest is a “description for model” attribute that lets developers refine the prompt generated from the API description, providing a place to add more instructions. As developers test their plugin, this is how they can add further control over how it gets used. ChatGPT provides a way to debug plugins by showing the requests and responses, usually in JSON format. That helps developers understand what data from their applications is used by the AI, if not exactly how it’s used or how the original request was generated.
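In the manifest format this attribute appears as the description_for_model field, and refining it mostly means adding plainly worded instructions. A hypothetical refinement for the CRM plugin might read:

```json
"description_for_model": "Use only when the user explicitly asks about closed deals, pipeline or revenue. Always pass ISO 8601 dates. If no date range is given, ask the user for one rather than guessing."
```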
More complex plugins can work with vector databases to extract and use documents. This approach is best suited to applications that work with a user’s document stores: the documents can be pre-processed into embeddings and indexed for vector search, speeding up access to complex business information. A plugin can then generate documents based on responses from other applications, using the most relevant retrieved content to structure any generated text.
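As a minimal sketch of that pre-processing and retrieval step, the Python below shows the general shape of the approach. It is not Microsoft’s implementation: embed() is a toy stand-in for a real embeddings model, and the documents and field names are invented.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in embedding (bag-of-words hashing) so the sketch runs end to end;
    # in practice this would be a call to a real embeddings model.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    return vec

# Indexing step: pre-process each document chunk into a vector stored alongside its text.
documents = [
    {"id": "policy-42", "text": "Purchase orders over $10,000 need director approval."},
    {"id": "faq-7", "text": "Raw material stock levels are refreshed nightly from the ERP system."},
]
index = [(doc["id"], embed(doc["text"]), doc["text"]) for doc in documents]

def search(query: str, top_k: int = 3):
    # Retrieval step: rank stored chunks by cosine similarity to the query embedding.
    q = embed(query)
    scored = [
        (float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec) + 1e-9)), doc_id, text)
        for doc_id, vec, text in index
    ]
    return sorted(scored, reverse=True)[:top_k]

# The top-ranked chunks go back in the plugin response so the model can structure
# its generated text around the most relevant content.
print(search("Do I need approval to reorder raw materials?"))
```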
Another interesting option is using existing Teams message extensions with the Microsoft 365 Copilot. This approach offers a quick way to add AI to existing Teams bots, linking a developer’s web services to the Copilot via the Bot Framework. What’s most important here is ensuring the app description and the skill parameters are used to construct the Copilot LLM prompt, along with any content requests in the extension. Outputs are delivered as Adaptive Cards embedded in chat sessions. There’s even the option of modifying an extension to make it a fully conversational system, working through the GPT-4 model that underlies most Microsoft Copilots.
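For context, an Adaptive Card is simply a JSON payload describing the card’s layout, which the Copilot renders in the chat. A minimal card for the earlier deals query might look like this, with the text and URL as placeholders:

```json
{
  "type": "AdaptiveCard",
  "version": "1.5",
  "body": [
    { "type": "TextBlock", "text": "Deals closed in Q1", "weight": "Bolder" },
    { "type": "TextBlock", "text": "Contoso: $18,200 (closed March 28)", "wrap": true },
    { "type": "TextBlock", "text": "Fabrikam: $9,400 (closed March 30)", "wrap": true }
  ],
  "actions": [
    { "type": "Action.OpenUrl", "title": "Open in CRM", "url": "https://example.com/deals" }
  ]
}
```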
Microsoft’s approach to extending Bing and its other Copilots is a good one for now. These are still the early days of generative AI, so having a standard plugin format makes a lot of sense, allowing APIs to support more than one AI platform and reducing the need to build the same plugin many times over. Code that works with ChatGPT will work in Bing Chat, in Microsoft 365 and anywhere else Microsoft adds Copilot functionality in the future.