Calling Azure AI APIs from a Function App: Pt1 (Key Concepts)

It’s a game of 2 halves

This article ended up being even longer than my previous one, so I have decided to split it in two.

This post deals with all the conceptual stuff. I think this is important, so I would urge you to read it, as it explains the rationale for the techy configuration part that is to follow.

However, if you just can’t wait, or if you simply want to jump straight into the write up of the technical configuration, head on over to Calling Azure AI APIs from a Function App: Pt2 (Technical Configuration)

Setting the Scene

In a previous post, I outlined my submission for the 2026 SharePoint Hackathon, K-Docs Publish (screenshot above), and promised to follow up with some detail on how I built it.

K-Docs Publish is a knowledge base solution that is hosted in a SharePoint site. The idea is that content managers maintain information in source Word documents (which will likely be hosted in some other site) and publish them as HTML articles, viewable in SharePoint Site Pages. These articles can then be assembled into structured navigation (the tree view navigator on the left in the above screenshot).

When users select an article, the document content (the HTML) is loaded into the main section of the page, with an in-document navigator and search/ask tools on the right, as shown below:

Everything is “AI” right now and I thought, cool as K-Docs Publish is, it would be sub-zero if I could leverage AI so that users can ask questions of an AI agent that provides useful responses firmly grounded in the contents of the knowledge base.

That was the idea, and it worked, forming a core part of my submission for the Hackathon. Sadly, the submission failed to rank, but that’s ok.

“…meet with Triumph and Disaster, and treat those two impostors just the same…”

Being a glass-half-full kind of guy, I am not deterred, as I think this solution has real potential to be useful in many business scenarios.

Why not just use Copilot?

Well, I could have built a Copilot agent for this (or at least I think I could) and it would probably work just fine, but I didn’t take this path because:

  • Not everyone can afford Copilot licensing for all their users (it ain’t cheap).
  • I envisage use cases where organisations might want to share knowledge with external guests and partner organisations and there can be no guarantee that they would be licensed for Copilot.
  • I figured that going directly to the AI services would give me more options (as indeed it does).

In short, I didn’t want a Copilot dependency. I’m sure Microsoft wouldn’t be so petty as to hold that against me. After all, they still charge customers to call their AI APIs; it’s just that they get paid based on usage rather than a flat per-seat license fee, under which some users would never use Copilot at all.

I also wanted to learn how to do this stuff – after all, SharePoint is a journey, not a destination – feeling sick, anyone?

The Key Concepts

In my previous (rather long, but ever so important) article, Calling an Azure Function App from the SharePoint Framework, I told only half the story of my journey of discovery on how to AI-enable a SharePoint Framework (SPFx) solution.

In that first article I explained that you can’t call the Azure AI services (provisioned in Azure AI Foundry) directly from SPFx; instead, they must be called indirectly via an Azure Function App.

A Function App is not the only route I could have taken. I did consider a custom managed Azure API, but that sounded even more complex than setting up a Function App secured by Entra ID, and if you read my previous post you will appreciate that that process was convoluted enough. Maybe I’ll explore the managed API option in a future post, but for now I’ll stick with the Function App approach.

The Function App acts as a kind of broker in the middle and the process is like a figure-of-8 dance.

  1. The SPFx solution (Web Part, Command Extension, etc.) calls functions defined in the Azure Function App, passing them data harvested from the SPFx solution.
  2. The function passes that data on to an AI API call.
  3. The AI model processes the input and sends the response back to the function.
  4. The function relays the response back to SPFx.

The data required to feed the AI API depends on which AI model you will be calling, and I’ll cover that in the last two sections of this post.
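To make the broker idea concrete, here is a minimal sketch of a relay function using the Azure Functions v4 Node.js programming model. The function name, environment variable names, request shape and API version are my own illustrative choices, not something prescribed by K-Docs Publish:

```typescript
import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";

// Illustrative settings – point these at your own Azure AI Foundry resource.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT;   // e.g. https://my-resource.openai.azure.com
const apiKey = process.env.AZURE_OPENAI_API_KEY!;     // keep secrets in app settings, never in SPFx
const deployment = process.env.CHAT_DEPLOYMENT_NAME;  // the name of your model deployment

app.http("askKnowledgeBase", {
  methods: ["POST"],
  authLevel: "anonymous", // Entra ID auth is enforced separately via App Service Authentication
  handler: async (req: HttpRequest, ctx: InvocationContext): Promise<HttpResponseInit> => {
    // Step 1: receive data harvested from the SPFx solution.
    const { question } = (await req.json()) as { question: string };
    ctx.log("Relaying question to the AI API");

    // Step 2: pass that data on to the AI API (Azure OpenAI chat completions endpoint).
    const aiResponse = await fetch(
      `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=2024-02-01`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json", "api-key": apiKey },
        body: JSON.stringify({ messages: [{ role: "user", content: question }] }),
      }
    );

    // Steps 3 and 4: take the model's response and relay it back to SPFx.
    const result = await aiResponse.json();
    return { jsonBody: { answer: result.choices[0].message.content } };
  },
});
```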

The Second Twirl

The process is deceptively simple, conceptually, right? And it is, but as always, the devil is in the detail.

In my previous post I explained how to set up the Function App in Azure and secure it with Entra ID. You will want to do this for a production system, as building unauthenticated Function Apps is a bad idea. A malicious actor could quite easily discover your function endpoint URL, call it anonymously and potentially rack up a huge bill by making bogus calls to your AI models – and we wouldn’t want that, would we?

So the first article was all about how to set up a secure Function App, one that can only be called by an authenticated user from an approved SPFx application. But in that article, I only covered half of the figure-of-eight. We did a simple spin of steps 1 and 4 and established that the system worked. I used an SPFx web part as a kind of test harness to confirm that we could indeed call a function defined in an Azure Function App secured by Entra ID.

Between the rest of this post and Pt2, I will address the second part of the dance, how to:

  • Provision AI models in the Azure AI Foundry and pass useful information from an SPFx solution via functions in the Function App.
  • Handle the response sent back from the AI model and return that back to the SPFx solution, via the calling functions.

In other words, we’re going to tackle steps 2 and 3 in my diagram, the second twirl if you will, and so finish the dance.

Before we get on to that, I need to talk about the AI models we need to call, because there are lots to choose from.

However, in this article I’m just going to focus on the two that I needed to build K-Docs Publish.

  • Text Embedding
  • Retrieval-Augmented Generation (RAG)

Text Embedding

Before I embarked on this journey, I had never stopped to think about how an AI might process the input we provide it (in a chat box, say) and figure out the semantics – my meaning. If I ask the AI agent, “Hey, I’ve forgotten my password, what do I do?”, how does it determine that this is semantically close to “How do I reset my password?” but semantically distant from “Why do cats sleep so much?”?

It turns out that this is where an AI text embedding model comes in. The model can accept any text input and convert it into a vector of numbers. In our solution, that vector will simply be an array of numbers.

The length of the vector is determined by the model we choose and does not vary for that model. The text-embedding-3-small model, which is the one I will be setting up, always returns 1536 numbers.
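As a quick sketch, here is what generating an embedding looks like via the Azure OpenAI REST API, assuming the deployment is named after the model; the environment variable names are placeholders for your own settings:

```typescript
// Minimal sketch: generate an embedding vector for a piece of text.
// endpoint/apiKey are illustrative names for your own app settings.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT;
const apiKey = process.env.AZURE_OPENAI_API_KEY!;

async function getEmbedding(text: string): Promise<number[]> {
  const res = await fetch(
    `${endpoint}/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-02-01`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      body: JSON.stringify({ input: text }),
    }
  );
  const json = await res.json();
  return json.data[0].embedding; // always 1536 numbers for this model
}
```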

Now, we are encouraged not to think of this vector as simply an array of numbers (though that’s what they are) but rather to think of them as “dimensions”. So a vector is actually a point in 1536-dimensional space – IKR, it blows your mind! I imagine it as a 3D model of a digital landscape, where each number is a node with an X, Y and Z coordinate and imaginary lines between nodes sketch out the terrain. Hey, that’s just me – go figure out your own way of dreaming electric sheep!

Let’s get back to a simpler but essential concept, and you will see why this is important.

Our knowledge base consists of distinct articles, now converted to HTML from their original MS Word format. When a user provides input (asks a question) we need to somehow determine, in a semantic way, which of these articles are relevant and so should be used to generate a useful response.

If we generate a vector for the user input, we can compare that (in terms of semantic relevance) to the vector for each article in the knowledge base, using what is called cosine similarity. In my head this is a way of looking at my digital landscapes and picking the ones that most closely resemble the landscape generated by the user input. I won’t get bogged down in the maths here, mainly because I don’t understand it, but suffice it to say, it is a tried and proven method for establishing the semantic relevance between vectors.

Now, the embedding model can accept text of (almost) arbitrary length – up to the model’s token limit – and always transforms it into a vector of 1536 dimensions. So if we generate a vector for the user input and compare that with the vector for each article, we can (using cosine similarity) determine a ranked order of article relevance.
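Cosine similarity itself is only a few lines of code. Here is a minimal sketch of the comparison and ranking step (the article shape is illustrative):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Both vectors come from the same
// model, so they are always the same length (1536 for text-embedding-3-small).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Rank stored article vectors against the user-input vector, most relevant first.
function rankByRelevance(
  queryVector: number[],
  articles: { title: string; vector: number[] }[]
): { title: string; score: number }[] {
  return articles
    .map((a) => ({ title: a.title, score: cosineSimilarity(queryVector, a.vector) }))
    .sort((x, y) => y.score - x.score);
}
```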

We would of course need to generate the vector for user input at run time, because we can’t know in advance what input they will provide, although we could pre-calculate and save vectors for Frequently Asked Questions (FAQs) – we’ll come back to that later.

However, we wouldn’t want to generate the vector for each article at run time – that would be way too slow (and expensive). What we would do instead is calculate the vector when the document is published, recalculate it only when the document is subsequently updated and republished, and store that vector somewhere in SharePoint, such as in a custom column on the Site Pages library.

Nice idea, but the problem here is one of granularity. Generating a search vector at the article level would be fine if we were implementing a search engine where each search result is an article. But for a knowledge base we’re going to need smaller chunks to process. This is because there might very well be parts of an article which are highly relevant to the user input, but which could get drowned out by surrounding, less relevant text.

Fortunately, most documents are already structured into chunks by way of headers. For K-Docs Publish, the headings of the source Word document get translated into HTML header tags <h1>, <h2>, etc. Unless the document is appallingly written, it is safe to assume that the content within a header section is related to, or at least has some relevance to, the rest of the information in that section of the document.

So if we can separate our articles into smaller consumable chunks, based on headers, we have our Unit of Knowledge (UoK) – hey I just made that TLA up, I wonder if it will stick? TLA stands for Three Letter Abbreviation in case you were wondering.

Each UoK (which I have just decided to pronounce “oik”) is then a chunk of a source article, defined as either:

  • From the start of the document to the start of the first header.
  • From the start of a header to the start of the next header, or to the end of the document.

There might be refinements on this, such as discarding tables of contents and other such distracting artefacts, but you get the basic idea: we are chunking up each article into UoKs, and it is these UoKs for which we need to generate a search vector.
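Here is a minimal sketch of that header-based chunking, assuming it runs in the browser (SPFx), where DOMParser is available, and that the published HTML is flat, with headers and content as sibling elements; in a Function App you would swap in a library such as jsdom:

```typescript
// A sketch of header-based chunking into UoKs. Assumes "flat" HTML
// (headers and content as siblings), as Word-to-HTML output tends to be.
interface Chunk {
  headerId: string | null; // null for content before the first header
  headerText: string;
  text: string;            // the raw text of the UoK
}

function chunkArticle(html: string): Chunk[] {
  const doc = new DOMParser().parseFromString(html, "text/html");
  const chunks: Chunk[] = [];
  let current: Chunk = { headerId: null, headerText: "", text: "" };

  doc.body.childNodes.forEach((node) => {
    const el = node as HTMLElement;
    if (el.tagName && /^H[1-6]$/.test(el.tagName)) {
      // A new header closes off the previous chunk and starts the next one.
      if (current.text.trim()) chunks.push(current);
      current = { headerId: el.id || null, headerText: el.textContent ?? "", text: "" };
    } else {
      current.text += (node.textContent ?? "") + "\n";
    }
  });
  if (current.text.trim()) chunks.push(current);
  return chunks;
}
```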

Hang on though – if we’re going to generate a search vector for each UoK, we can’t easily store them in a column for each site page. That strategy works fine when there is a one-to-one mapping between site page articles and search vectors, but once we chunk the document up, that’s no longer what we have. Now we have a one-to-N mapping, where N is the (variable) number of UoKs we have extracted from each article.

The answer I settled on, to resolve this challenge, was to chunk up my documents into UoKs when they are published and then save them as separate items in a list created in the knowledge base site for this purpose. I call this the Extracts list, and it means that we can use it to (see the sketch after this list):

  • Store the raw text extracted from each header section as an UoK
  • Generate a search vector for each UoK
  • Save that vector and its UoK in the same list item
  • Store a reference to the parent article
  • Store a reference to the parent book
  • Store a header reference (DOM Id)
  • Store an article summary
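Modelled as a TypeScript interface, an Extracts list item might look like the sketch below; the field names are my illustration, not the actual column names in K-Docs Publish:

```typescript
// Illustrative shape of an Extracts list item. The vector would be
// serialised (e.g. as JSON) for storage in a multi-line text column.
interface ExtractItem {
  rawText: string;        // the UoK: raw text of one header section
  vector: number[];       // the UoK's search vector
  articleId: number;      // reference to the parent article (Site Page)
  bookId: number;         // reference to the parent book (navigation tree)
  headerDomId: string;    // DOM Id of the header, for deep linking
  articleSummary: string; // pre-generated summary of the parent article
}
```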

Now you might be thinking, what the hell is the “parent book”? Let me explain.

In K-Docs Publish, I envisaged that the knowledge base would be defined with 3 distinct scopes:

  • Article Scope: The scope of relevance is bound to a specific article, and therefore the only UoKs that should be considered are those extracted from that source article alone.
  • Book Scope: A book is an arbitrary collection of articles which are in some way logically related to each other. In K-Docs Publish all the articles defined in the same tree navigation structure are related, or else you wouldn’t have assembled them in the same tree navigation. We call this a book, and it provides a scope that extends beyond the current article to include all other articles within the same book (navigation).
  • Knowledge Base Scope: A knowledge base can logically consist of several books, and it could be that any specific article might be included in more than one book. When we step the scope out to the entire KB we need to consider all UoKs extracted from all articles.

Right about now, you might be starting to worry about scale – I was!

Let’s say my KB will contain as many as 10 books and each book has 20 articles, on average. And say each article has 15 header sections (again on average); that makes for 3,000 UoK extracts – well within the capacity limits for a SharePoint list, but that’s not what worries me.

Every time a user submits input, I need to query the UoKs within the scope, find (say) the 10 most relevant, and send those as context text to the RAG AI (see next section). When the scope is a single article, that’s just 15 or so UoKs – no problem. When it’s a book, I need to cosine-compare 300 UoKs. OK, that’s getting up there, but probably still manageable.

The problem comes when the scope is the entire KB, as that means the vectors of all 3,000 UoKs would need to be cosine-compared, and that is likely to cause an unacceptable delay in the processing pipeline. If it takes 20 seconds just to find the top 10 relevant UoKs, that’s unlikely to be an acceptable response time – given that we haven’t even sent anything to the RAG yet for evaluation!

The solution I settled upon was to build a system with two levels of vectors. I decided to create a vector for each article as a whole, as well as a vector for each UoK chunk.

The idea is that when my scope is the whole KB, I’d do a first pass and fetch, say, just the top 10 ranking articles by comparing the user input with the article-level vectors. Then I’d fetch the UoK vectors for just these top-ranked articles.

In summary, the approach is to:

  1. Perform an initial cosine comparison against the article-level vectors, all 200 of them (10 books x 20 articles in each).
  2. Fetch the top 10 most relevant articles and discard the rest, leaving 150 UoKs to evaluate (10 best articles x 15 chunks in each).

This double-pass method does mean that I might potentially lose some quality, but I’m thinking that this (hopefully modest) degradation will be a small price to pay for a responsive system.
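Here is a sketch of that two-pass retrieval, reusing the cosineSimilarity helper and ExtractItem shape from the earlier sketches; the counts are illustrative defaults taken from my example numbers:

```typescript
// Two-pass retrieval: rank whole articles first, then rank only the UoKs
// belonging to the best articles. Data shapes are illustrative.
function findRelevantUoKs(
  queryVector: number[],
  articles: { id: number; vector: number[] }[],
  extracts: ExtractItem[],
  topArticles = 10,
  topUoKs = 10
): { extract: ExtractItem; score: number }[] {
  // Pass 1: rank article-level vectors (~200 comparisons) and keep the best.
  const bestArticleIds = new Set(
    articles
      .map((a) => ({ id: a.id, score: cosineSimilarity(queryVector, a.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topArticles)
      .map((a) => a.id)
  );

  // Pass 2: rank only the UoKs inside those articles (~150 comparisons).
  return extracts
    .filter((e) => bestArticleIds.has(e.articleId))
    .map((e) => ({ extract: e, score: cosineSimilarity(queryVector, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topUoKs);
}
```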

When outlining the attributes for the Extracts list, I indicated my intention to store a reference to the header in the parent article HTML, and you may be wondering why. I can’t rely on headers in the source document being tagged with a unique identifier, but because I process the HTML before I store it, I can attach a unique Id attribute to every header and save that along with the raw text and vectors.

But why would I want to do this? The answer is that this gives me the possibility of generating deep links to header sections within the articles. The UoKs I send to the RAG model essentially amount to article references, and when I process the response I can include a list of clickable references to the exact point in the article that is most relevant, not just the start of the article.

The Extracts list also contains a field for an article summary. We don’t need a summary for each UoK but a summary for each article would be very useful as it means that we might provide users with a pre-defined summary of each article, without the need to call the RAG AI model. K-Docs Publish also provides a simple traditional search capability and it turns out that having a ready-made article summary is useful for that, as it automatically provides the equivalent of a search result snippet.

In summary, what we have is a system which chunks up the information stored within a repository of documents (HTML renditions of source Word documents) into Units of Knowledge (UoKs). Each UoK is processed by an AI-driven text embedding model, which returns a search vector for each. When a user provides input, the system finds the most relevant UoKs and sends them to the RAG.

One final point: when a document is republished, we will need to regenerate the vector for the article, as something will have changed in the document as a whole. But we most likely don’t need to update the search vector for every UoK. When we regenerate the UoKs, we can compare the raw text of each UoK and only update those which have changed and so need to be reprocessed. This speeds up processing and reduces costs – winner, winner, chicken dinner!
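One cheap way to implement that raw-text comparison is to store a hash of each UoK’s text and compare hashes on republish. Hashing is my own suggestion here, not necessarily how K-Docs Publish does it; this sketch assumes it runs in the Function App under Node.js:

```typescript
import { createHash } from "crypto";

// Hash each UoK's raw text so the republish process can detect changes
// without keeping the previous text around for comparison.
function hashUoK(rawText: string): string {
  return createHash("sha256").update(rawText, "utf8").digest("hex");
}

// Only re-embed a UoK whose text has actually changed since the last publish.
function needsReprocessing(newText: string, storedHash?: string): boolean {
  return storedHash === undefined || hashUoK(newText) !== storedHash;
}
```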

Retrieval-Augmented Generation (RAG)

A RAG model is used to facilitate the communication between the AI and the engaged user. This might be to generate a quick response to a simple question, or it might be a protracted dialogue, as most of us are now used to when using ChatGPT or Copilot.

We use the text embedding model to provide us with a means of assembling relevant context information so that we can submit it to a RAG model. If we didn’t do this, our RAG would have no awareness of our knowledge base, or if it did, it would not attach any specific importance or relevance to that information source over any other information source it can access, i.e. the entire Internet.

When calling a RAG AI model such as a GPT variant, we will pass in the following:

  • User input: The user provided query or instructions.
  • Thread History: What has previously been supplied as user input, together with the AI generated response to that input, so as to maintain the ongoing discussion context.
  • Grounding Text: To tell the model how to behave and what is expected of it, such as:
    • “You are a helpful and friendly agent called Brian” – you can name your assistant, why not?
    • “You are a cat” – add a bit of fun by giving your assistant a personality and some character – you’ll be amazed at how the AI will play along.
    • “Only provide answers based on the context text” – tell it not to be a stray cat.
    • “If you don’t know the answer just respond with ‘Sorry, I can’t answer that from the current information context’” – tell it what to do if it doesn’t have a reasonable answer.
    • “Do not hallucinate” – don’t make shit up!
  • Context Data: Text from the parts of the knowledge base most relevant to the user input. This is what we need the UoKs and their search vectors for.
  • Format Instructions: To tell the AI model whether to return the response as a table, summary dot-points, headers with a bullet list, in Markdown or as HTML, etc.

We might be tempted to send the entire KB as the context data, but this would be a bad idea, because:

  • There will likely be too much of it and so processing will take too long.
  • The more context data you send the more AI tokens you use and so the costs go up.
  • If we use the text-embedding technique described above, so that we send only relevant data, we not only improve performance and reduce costs but also make the system far less likely to hallucinate – sadly, just telling our assistant not to hallucinate in its grounding text does not guarantee that it won’t misbehave.

Sending through the thread history is important, as this provides the means by which the AI assistant retains the context within a session. It is important to include both sides of the conversation, i.e. the user input as well as the AI responses.

However, it turns out that we don’t need to repeatedly send back the entire thread every time, just the last half dozen or so questions and responses will be sufficient. This also cuts down the payload, improves performance and helps to reduce costs.
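Pulling all of that together, here is a sketch of the RAG call. The message roles follow the Azure OpenAI chat completions API; the grounding text, trimming depth and variable names are illustrative:

```typescript
// endpoint, apiKey and deployment are illustrative, as in the earlier relay sketch.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT;
const apiKey = process.env.AZURE_OPENAI_API_KEY!;
const deployment = process.env.CHAT_DEPLOYMENT_NAME;

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Illustrative grounding and format instructions, echoing the examples above.
const GROUNDING = [
  "You are a helpful and friendly agent called Brian.",
  "Only provide answers based on the context text.",
  "If you don't know the answer, respond with 'Sorry, I can't answer that from the current information context'.",
  "Respond in Markdown.",
].join(" ");

async function askRag(
  userInput: string,
  contextText: string,    // the top-ranked UoKs, concatenated
  history: ChatMessage[]  // prior user inputs and assistant responses
): Promise<string> {
  const messages: ChatMessage[] = [
    // Grounding, format instructions and context data go in the system message.
    { role: "system", content: `${GROUNDING}\n\nContext:\n${contextText}` },
    // Trim the thread: the last half dozen exchanges (12 messages) is plenty.
    ...history.slice(-12),
    { role: "user", content: userInput },
  ];

  const res = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=2024-02-01`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      body: JSON.stringify({ messages }),
    }
  );
  const json = await res.json();
  return json.choices[0].message.content;
}
```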

What’s next?

This article has described (in some detail) what we need to do to AI-enable an SPFx solution. In Part 2, I move on to a walkthrough of how to actually set up the Text Embedding and RAG models in Azure AI Foundry, and how to integrate them into the necessary figure-of-8 dance between an SPFx client, an Azure Function App and the AI models.
