SharePoint platform as a source for AI modeling

Sergii Bielskyi
Nov 5, 2023


It is no secret that SharePoint is a great place to keep personal or corporate information, including documents, presentations, images, and notes. That means that over the years you or your company have probably accumulated a huge amount of data in one place.

These days a lot of companies are optimizing their processes by integrating AI into their internal workflows. The question is how SharePoint can help with that. In other words, how can we use SharePoint as the main data source for AI tools?

Right now Microsoft provides Azure OpenAI as its main service for building AI tools on top of ChatGPT models. One example is the chat playground, where you can build your own AI assistant.

Reference: "Data, privacy, and security for Azure OpenAI Service" (Azure AI services, Microsoft Learn).

The benefit is that you can add your own data source to the base model and let it use your data. Several types of connections can be added:

  • Blob storage
  • Cognitive Search

How about SharePoint? In this case, we can use the Azure Cognitive Search service to connect to SharePoint, but the service has to be configured first. There is a nice article about this on Microsoft Learn: "SharePoint indexer (preview)" in the Azure Cognitive Search documentation. In my case, however, SharePoint and the Azure services lived in different cloud tenants and subscriptions, and the approach from the article did not work for me as written. I will explain this issue later.

Following the article, you first need to configure an App Registration that has all the necessary permissions. Then you need to create a data source, an index, and an indexer. All three are created with REST API requests.
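All three requests share the same shape: a POST to the search service with an `api-key` header and a JSON body. Here is a minimal Python sketch of that shape, assuming the standard collection paths (`/datasources`, `/indexes`, `/indexers`). The function name, the service name `my-search`, and the key are placeholders I made up for illustration. The request is only built, not sent.

```python
import json
import urllib.request

API_VERSION = "2020-06-30-Preview"  # preview version used in this article


def build_create_request(service_name, collection, api_key, body):
    """Build (but do not send) a POST request that creates a search resource.

    collection is one of "datasources", "indexes" or "indexers".
    """
    if collection not in ("datasources", "indexes", "indexers"):
        raise ValueError(f"unexpected collection: {collection}")
    url = (
        f"https://{service_name}.search.windows.net/"
        f"{collection}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical service name and key, for illustration only.
req = build_create_request(
    "my-search", "datasources", "<admin-key>", {"name": "sharepoint-datasource"}
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The order matters: the indexer refers to the data source and the index by name, so it has to be created last.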

DataSource

POST https://{your name}.search.windows.net/datasources?api-version=2020-06-30-Preview

Header api-key: {your key}

{
  "name" : "sharepoint-datasource",
  "type" : "sharepoint",
  "credentials" : {
    "connectionString" : "SharePointOnlineEndpoint=https://{your tenant}.sharepoint.com/sites/site;ApplicationId={ClientID};ApplicationSecret={ClientSecret};TenantId={TenantID}"
  },
  "container" : { "name" : "allSiteLibraries", "query" : null }
}
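The connection string has to be one single-line JSON string, which is easy to break when you edit it by hand. A small sketch of how it could be assembled in Python is below. The helper names and the `contoso` tenant are my own placeholders. The keys and the `allSiteLibraries` container are the ones from the request above.

```python
import json


def sharepoint_connection_string(site_url, client_id, client_secret, tenant_id):
    """Join the four settings into one single-line connection string."""
    parts = {
        "SharePointOnlineEndpoint": site_url,
        "ApplicationId": client_id,
        "ApplicationSecret": client_secret,
        "TenantId": tenant_id,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def datasource_body(name, connection_string):
    """Return the JSON body for the data source request as a dict."""
    return {
        "name": name,
        "type": "sharepoint",
        "credentials": {"connectionString": connection_string},
        "container": {"name": "allSiteLibraries", "query": None},
    }


# Placeholder values, not real credentials.
conn = sharepoint_connection_string(
    "https://contoso.sharepoint.com/sites/site",
    "<client-id>",
    "<client-secret>",
    "<tenant-id>",
)
body = datasource_body("sharepoint-datasource", conn)
print(json.dumps(body, indent=2))
```

In a real setup you would read the client secret from an environment variable or a key vault instead of keeping it in the script.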

Index

POST https://{your name}.search.windows.net/indexes?api-version=2020-06-30-Preview

Header api-key: {your key}

{
  "name" : "sharepoint-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
    { "name": "metadata_spo_item_title", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
    { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
    { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
    { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
    { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
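An index needs exactly one key field, and that key has to be a string. That is why `id` is the only field with `"key": true`. If you change the field list, a quick local check before sending the request can save a round trip. The sketch below is my own helper, not part of any SDK, and it only covers these two rules plus duplicate names.

```python
def check_index_schema(index):
    """Return a list of problems found in an index definition (empty if none)."""
    fields = index.get("fields", [])
    problems = []

    # Field names must be unique within the index.
    names = [f["name"] for f in fields]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate field name: {name}")

    # Exactly one field may be marked as the document key, and it must be a string.
    keys = [f for f in fields if f.get("key") is True]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    elif keys[0].get("type") != "Edm.String":
        problems.append(f"key field {keys[0]['name']} must be Edm.String")

    return problems


# A shortened version of the index from this article.
index = {
    "name": "sharepoint-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "metadata_spo_item_name", "type": "Edm.String", "key": False, "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}
problems = check_index_schema(index)
print(problems or "schema looks consistent")
```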

Indexer

POST https://{your name}.search.windows.net/indexers?api-version=2020-06-30-Preview

Header api-key: {your key}

{
  "name" : "sharepoint-indexer",
  "dataSourceName" : "sharepoint-datasource",
  "targetIndexName" : "sharepoint-index",
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "base64EncodeKeys": null,
    "configuration": {
      "indexedFileNameExtensions" : ".pdf, .docx, .pptx, .ppt",
      "excludedFileNameExtensions" : ".png, .jpg",
      "dataToExtract": "contentAndMetadata"
    }
  },
  "schedule" : { },
  "fieldMappings" : [
    {
      "sourceFieldName" : "metadata_spo_site_library_item_id",
      "targetFieldName" : "id",
      "mappingFunction" : {
        "name" : "base64Encode"
      }
    }
  ]
}
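The two extension settings are plain comma-separated strings, and each extension is expected to start with a dot. A missing dot is easy to overlook, so here is a small sketch that normalizes the lists before they go into the indexer body. Both helper names are hypothetical. The configuration keys are the ones from the request above.

```python
def extension_list(extensions):
    """Return a comma-separated list in which every extension has a leading dot."""
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in cleaned:  # drop duplicates, keep the original order
            cleaned.append(ext)
    return ", ".join(cleaned)


def indexer_configuration(included, excluded):
    """Build the "configuration" section of the indexer parameters."""
    return {
        "indexedFileNameExtensions": extension_list(included),
        "excludedFileNameExtensions": extension_list(excluded),
        "dataToExtract": "contentAndMetadata",
    }


config = indexer_configuration([".pdf", ".docx", ".pptx", "ppt"], ["PNG", ".jpg"])
print(config["indexedFileNameExtensions"])  # .pdf, .docx, .pptx, .ppt
print(config["excludedFileNameExtensions"])  # .png, .jpg
```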

There is one important thing to do so that you do not get an error such as "Tenant does not have a SPO license" when the indexer tries to access drive items through the Graph API: grant the SharePoint account that you use Contributor access to the Cognitive Search service. After granting access, you can execute the last REST API request to create the indexer.

As a result, you can see all the created resources (data source, index, and indexer) inside the Cognitive Search service.

[Screenshots: index, indexer, data source]
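Besides looking in the portal, you can ask the service how the indexer run went through the indexer status endpoint. The sketch below builds that GET request and summarizes a response. The service name and key are placeholders, the `sample` dictionary is written by hand for illustration, and I only read a few fields that I expect in the response, so compare it with what your service actually returns.

```python
import urllib.request

API_VERSION = "2020-06-30-Preview"  # same preview version as the requests above


def status_request(service_name, indexer_name, api_key):
    """Build (but do not send) a GET request for the indexer status."""
    url = (
        f"https://{service_name}.search.windows.net/"
        f"indexers/{indexer_name}/status?api-version={API_VERSION}"
    )
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def summarize(status):
    """Turn a status response (already parsed from JSON) into one line of text."""
    last = status.get("lastResult") or {}
    return (
        f"{status.get('status', 'unknown')}: last run {last.get('status', 'n/a')}, "
        f"{last.get('itemsProcessed', 0)} processed, {last.get('itemsFailed', 0)} failed"
    )


# Hypothetical names, for illustration only.
req = status_request("my-search", "sharepoint-indexer", "<admin-key>")
print(req.get_method(), req.full_url)

# Hand-written sample, not real service output.
sample = {
    "status": "running",
    "lastResult": {"status": "success", "itemsProcessed": 12, "itemsFailed": 0},
}
print(summarize(sample))
```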

There is one last thing to configure before you start setting up the AI tools: you need to enable semantic search on the index.

[Screenshot: semantic search]
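I enabled this in the portal, but the same setting can also live in the index definition as a semantic configuration. The sketch below adds one to an index dictionary. Treat it as an assumption to verify: as far as I know, the `semantic` section needs a newer API version than `2020-06-30-Preview`, the configuration name `default` is made up, and the choice of title and content fields is only my suggestion for this index.

```python
import copy


def with_semantic_configuration(index, config_name, title_field, content_fields):
    """Return a copy of the index definition with a semantic configuration added."""
    names = {f["name"] for f in index["fields"]}
    missing = [n for n in [title_field, *content_fields] if n not in names]
    if missing:
        raise ValueError(f"fields not in index: {', '.join(missing)}")

    updated = copy.deepcopy(index)  # leave the original definition untouched
    updated["semantic"] = {
        "configurations": [
            {
                "name": config_name,
                "prioritizedFields": {
                    "titleField": {"fieldName": title_field},
                    "prioritizedContentFields": [
                        {"fieldName": n} for n in content_fields
                    ],
                },
            }
        ]
    }
    return updated


# A shortened version of the index from this article.
index = {
    "name": "sharepoint-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "metadata_spo_item_title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}
updated = with_semantic_configuration(
    index, "default", "metadata_spo_item_title", ["content"]
)
print(updated["semantic"]["configurations"][0]["name"])
```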

That is pretty much all. Now you can open Azure AI Studio and create a data source to use in your AI chatbot playground.

[Screenshot: wizard to configure a new data source for the AI tool]

After completing these steps you can play with your chatbot, which now relies not only on ChatGPT but also on the data from your own SharePoint space.

Happy coding.
