Let’s see how we can share files between two cloud storage solutions — SharePoint and Azure Blob Storage — using Python. We will simply be copying all the files from a SharePoint site to an Azure Blob Storage container.

Each SharePoint site has a drive associated with it, and we can access the files in that drive using the Graph API. We will be using the requests library to interact with the Graph API.

Create an application in Microsoft Entra ID

  1. Go to the Entra website and create a new application.
  2. Copy the client_id, client_secret, and tenant_id from the application.

For multi-tenant apps, use `common` as the tenant_id; note that such apps need additional approval from the tenant admin before they can be used.

These credentials will be used to retrieve the access token, which in turn will be used to send requests to the Graph API.

Make sure the application has been granted the Sites.Read.All and Files.ReadWrite.All permissions.

Obtaining Access token

We will be using the requests library to interact with the Graph API. We will be sending a POST request to the https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token endpoint to get the access token.

# Obtain the access token using the client credentials
def obtain_access_token(tenant_id, client_id, client_secret):
    """Request an app-only Graph access token via the OAuth2 client-credentials flow.

    Raises requests.HTTPError if the token endpoint rejects the request
    (e.g. bad client secret), instead of failing later with an opaque KeyError.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    data = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        # .default requests all application permissions already granted to the app
        'scope': 'https://graph.microsoft.com/.default'
    }
    response = requests.post(url, data=data)
    response.raise_for_status()  # surface auth failures early
    return response.json()['access_token']

Finding the Site ID

We need to find the site ID of the SharePoint site we want to copy files from. We can get the site ID by sending a GET request to the https://graph.microsoft.com/v1.0/sites/{site_path} endpoint.

let’s write a function to transform the site URL to the site path as well,

# Transform the site URL to site path
# Eg: https://example.sharepoint.com/sites/test -> example.sharepoint.com:/sites/test
def transform_url(url):
    """Convert a SharePoint site URL into the host:/path form the Graph API expects."""
    has_scheme = url.startswith(('http://', 'https://'))
    normalized = url if has_scheme else f"https://{url}"
    parts = urlparse(normalized)
    return f"{parts.netloc}:{parts.path}"

# Get the site ID of the SharePoint site
# It will be used to get the files in the site
def get_site_id(access_token, site_url):
    """Resolve a SharePoint site URL to its Graph site ID.

    Raises requests.HTTPError on a bad site path or token instead of
    failing later with an opaque KeyError.
    """
    site_path = transform_url(site_url)

    url = f"https://graph.microsoft.com/v1.0/sites/{site_path}"
    headers = {
        'Authorization': f'Bearer {access_token}'
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail fast on bad site path / token
    return response.json()['id']

Retrieving the Files

We can retrieve the items in the SharePoint site’s root by sending a GET request to the https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/root/children endpoint.

The items can be files or folders. We can check whether an item is a folder by checking for the folder key in the item.

Also, the response will be paginated, and we can get the next page by sending a GET request to the @odata.nextLink URL.

Let’s write a function to recursively retrieve all the files in the SharePoint site.

# Module-level accumulator: get_sharepoint_site_files appends every
# file item (folders are recursed into, not stored) to this list.
files = []

# Get the items in the SharePoint site recursively
def get_sharepoint_site_files(access_token, site_id, path='root'):
    """Walk the site's default drive and append every file item to `files`.

    Folders are recursed into (addressed by their item id); only non-folder
    items are collected. Follows @odata.nextLink pagination on each level.
    Raises requests.HTTPError on a failed Graph call.
    """
    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/{path}/children"
    headers = {
        'Authorization': f'Bearer {access_token}'
    }

    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()

        for item in data['value']:
            if 'folder' in item:
                # Recurse into the folder by its item id
                get_sharepoint_site_files(access_token, site_id, f"items/{item['id']}")
            else:
                files.append(item)

        # Graph paginates: nextLink is absent on the last page
        url = data.get('@odata.nextLink')

The files will have a download URL which can be used to download the file. We can download the file by sending a GET request to the download URL.

Uploading the Files to Azure Blob Storage

Uploading to azure blob storage requires a connection string. You can get the connection string from the Azure Portal.

There is a client library called azure-storage-blob which can be used to interact with the Azure Blob Storage.

pip install azure-storage-blob

Let’s write a function to upload the files to Azure Blob Storage (stream)

# To determine the path of the file in the SharePoint site
# Eg: /drive/root:/folder1/folder2/file.txt -> folder1/folder2/file.txt
def get_item_path(item_data):
    """Build the drive-relative path of an item for use as the blob name.

    Joins with '/' rather than os.path.join so blob names stay correct on
    Windows (where os.path.join would insert backslashes), and strips the
    '/drive/root:' prefix only when it appears at the start of the path
    (str.replace would remove it anywhere in the string).
    """
    item_name = item_data["name"]
    parent_path = item_data["parentReference"]["path"]
    prefix = "/drive/root:"
    if parent_path.startswith(prefix):
        parent_path = parent_path[len(prefix):]
    return f"{parent_path}/{item_name}".lstrip("/")

# Download the file from the download URL and upload it to Azure Blob Storage
# The file will be uploaded to the container with the same path as in the SharePoint site
def upload_file_to_azure(item_data, connection_string, container_name):
    """Stream a SharePoint file from its download URL into the blob container.

    Creates the container on first use. Raises requests.HTTPError if the
    download fails.
    """
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)
    if not container_client.exists():
        container_client.create_container()

    download_url = item_data['@microsoft.graph.downloadUrl']
    file_path = get_item_path(item_data)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_path)

    with requests.get(download_url, stream=True) as response:
        response.raise_for_status()
        # response.raw streams the body into upload_blob; response.content
        # would buffer the entire file in memory, defeating stream=True
        blob_client.upload_blob(response.raw, overwrite=True)


## Usage
# `files` is populated by get_sharepoint_site_files; connection_string and
# container_name come from your Azure storage account configuration.
for file in files:
    upload_file_to_azure(file, connection_string, container_name)

Final Code

Overall our final program might look something like this

import os
import requests
from urllib.parse import urlparse
from azure.storage.blob import BlobServiceClient


# Obtain the access token using the client credentials
def obtain_access_token(tenant_id, client_id, client_secret):
    """Request an app-only Graph access token via the OAuth2 client-credentials flow.

    Raises requests.HTTPError if the token endpoint rejects the request
    (e.g. bad client secret), instead of failing later with an opaque KeyError.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    data = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        # .default requests all application permissions already granted to the app
        'scope': 'https://graph.microsoft.com/.default'
    }
    response = requests.post(url, data=data)
    response.raise_for_status()  # surface auth failures early
    return response.json()['access_token']


# Transform the site URL to site path
# Eg: https://example.sharepoint.com/sites/test -> example.sharepoint.com:/sites/test
def transform_url(url):
    """Convert a SharePoint site URL into the host:/path form the Graph API expects."""
    has_scheme = url.startswith(('http://', 'https://'))
    normalized = url if has_scheme else f"https://{url}"
    parts = urlparse(normalized)
    return f"{parts.netloc}:{parts.path}"


# Get the site ID of the SharePoint site
# It will be used to get the files in the site
def get_site_id(access_token, site_url):
    """Resolve a SharePoint site URL to its Graph site ID.

    Raises requests.HTTPError on a bad site path or token instead of
    failing later with an opaque KeyError.
    """
    site_path = transform_url(site_url)

    url = f"https://graph.microsoft.com/v1.0/sites/{site_path}"
    headers = {
        'Authorization': f'Bearer {access_token}'
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail fast on bad site path / token
    return response.json()['id']


# Get the items in the SharePoint site recursively
def get_sharepoint_site_files(access_token, site_id, path='root'):
    """Return every file item in the site's default drive, recursively.

    Folders are recursed into (addressed by their item id); only non-folder
    items are collected. Follows @odata.nextLink pagination on each level.
    Raises requests.HTTPError on a failed Graph call.
    """
    sharepoint_files = []

    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/{path}/children"
    headers = {
        'Authorization': f'Bearer {access_token}'
    }

    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()

        for item in data['value']:
            if 'folder' in item:
                # Recurse into the folder by its item id
                folder_items = get_sharepoint_site_files(
                    access_token, site_id, f"items/{item['id']}"
                )
                sharepoint_files.extend(folder_items)
            else:
                sharepoint_files.append(item)

        # Graph paginates: nextLink is absent on the last page
        url = data.get('@odata.nextLink')

    return sharepoint_files


# To determine the path of the file in the SharePoint site
# Eg: /drive/root:/folder1/folder2/file.txt -> folder1/folder2/file.txt
def get_item_path(item_data):
    """Build the drive-relative path of an item for use as the blob name.

    Joins with '/' rather than os.path.join so blob names stay correct on
    Windows (where os.path.join would insert backslashes), and strips the
    '/drive/root:' prefix only when it appears at the start of the path
    (str.replace would remove it anywhere in the string).
    """
    item_name = item_data["name"]
    parent_path = item_data["parentReference"]["path"]
    prefix = "/drive/root:"
    if parent_path.startswith(prefix):
        parent_path = parent_path[len(prefix):]
    return f"{parent_path}/{item_name}".lstrip("/")


# Download the file from the download URL and upload it to Azure Blob Storage
# The file will be uploaded to the container with the same path as in the SharePoint site
def upload_file_to_azure(item_data, connection_string, container_name):
    """Stream a SharePoint file from its download URL into the blob container.

    Creates the container on first use. Raises requests.HTTPError if the
    download fails.
    """
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)
    if not container_client.exists():
        container_client.create_container()

    download_url = item_data['@microsoft.graph.downloadUrl']
    file_path = get_item_path(item_data)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_path)

    with requests.get(download_url, stream=True) as response:
        response.raise_for_status()
        # response.raw streams the body into upload_blob; response.content
        # would buffer the entire file in memory, defeating stream=True
        blob_client.upload_blob(response.raw, overwrite=True)


# Main function
def main():
    """Copy every file from the SharePoint site into an Azure Blob container."""
    tenant_id = "your_tenant_id"
    client_id = "your_client_id"
    client_secret = "your_client_secret"
    site_url = "https://example.sharepoint.com/mysite_name"
    connection_string = "your_connection_string"
    # Container names must be lowercase and may contain no special
    # characters other than hyphens
    container_name = "your-container-name"

    access_token = obtain_access_token(tenant_id, client_id, client_secret)
    site_id = get_site_id(access_token, site_url)
    sharepoint_files = get_sharepoint_site_files(access_token, site_id)

    for file in sharepoint_files:
        print(f"Uploading {file['name']} to Azure Blob Storage")
        upload_file_to_azure(file, connection_string, container_name)
        print("Done.")  # plain string — no placeholders, so no f-prefix


if __name__ == "__main__":
    main()

This is how you can share files between SharePoint and Azure Blob Storage using Python. You can modify the code to suit your requirements and use case.