Connect Sharepoint Online from Azure Data Factory via API Calls
In this post I will show you how to connect Sharepoint Online from Azure Data Factory via REST API calls, it would be worth mentioning here that as of now there is no direct connector for Sharepoint Online available in Data Factory and hence these APIs come in handy.
This method uses the App only registration but there are other ways of doing this as well like using an App registered in the AAD to request an access token from Graph API and using the same for authenticating into the Sharepoint Online Site. Will cover those in separate posts.
So lets get going!!
Generate the Client ID and Client Secret of the Sharepoint Online Site that you wish to connect:
Head on to the Sharepoint site which you wish to connect
the Url would look something like this:
Now go to Settings ->Site Contents:
Now in the Url of this page replace viewlsts.aspx and anything after that with appregnew.aspx and click enter:
Now generate a Client ID and a Client Secret for this site and also fill in the other details like Title, App Domain and Redirect URI (They can be anything) and click create:
Note down and save this Client ID and Secret as we will need them very soon.
Now that we have created a new principal, we need to now give this principal permissions from the O365 Admin Center.
For navigating to the admin portal just replace "contoso" with your Sharepoint account name
Once there just lookup for the App ID you just created in the last step (ps. I might use the terms App ID and Client ID interchangeably):
As shown above in the Permission Request XML textbox just add the following xml code to provide the principal the required permissions:
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
Click Create and if you dont have any trust issues with your principal just click Trust It and you will be done:
Now Next Step is to get the Tenant ID of Azure Active Directory, there are multiple ways to do this but we will do it via a dummy API call through postman tool because we also need another piece of information along with the Tenant ID to make the actual API call.
you can download postman tool from here for free:
Now in Postman create a new request with GET Method to the following URL
https://<Your SP Account Name>.sharepoint.com/sites/<Your Site name>/_vti_bin/client.svc/
and add the flowing Key and value in the Header of the request:
Now click on Send
Now the request will fail due to unauthorized access but we are just interested in the Headers of the response.
So in the response go to headers, you will find several headers but look out for the key "WWW-Authenticate"
Copy its entire value in a notepad and extract the realm and the Client ID(We will call this as resource ID from now on to differentiate it from the actual client ID that we have for our SPO Site) field values from it as shown below:
Now just to review, we have the Client ID, Client Secret , Tenant ID and the Resource ID with us which gets us in a position to make a POST request to https://accounts.accesscontrol.windows.net/ in order to get an access token which can be used to authenticate our Rest API call to the Sharepoint Site.
So now will head on to our Azure Data Factory to make these couple of API calls and copy the data from a list in this site and load into a file in ADLS, here we go....
Create a new pipeline and add a web activity to it and name it as "Get Token"..
Below are the settings of the Web Activity:
Url : https://accounts.accesscontrol.windows.net/<Your Tenant ID>/tokens/OAuth/2
Method : POST
Name- Content-Type ;Value- application/x-www-form-urlencoded
grant_type=client_credentials&client_id=<Your Client ID>@<Your Tenant ID>&client_secret=<Your Client Secret>&resource=<Your Resource ID>/nishantsethi.sharepoint.com@<Your Tenant ID>
Now create a new linked service of REST type with AutoResolveIntegrationRuntime IR and the below properties:
Base URL: https://<Your Sharepoint Account>.sharepoint.com/sites/<Your Site Name>/_api/lists/
Authentication Type: Anonymous
Now create a new REST Dataset with the above linked service and give its relative URL as below:
getbytitle('<Your Sharepoint List Name>')/items?$top=10
Now go back to the earlier pipeline where we had created the web activity and add a new copy activity in it with the following settings:
Source Dataset: The REST Dataset you just created
Request Method: GET
Value- @concat(string(activity('Get Token').output.token_type),' ',string(activity('Get Token').output.access_token))
Rest leave all default:
Create a new dataset of type json with a linked service connecting to your Azure Data Lake Store or Blob storage, where ever you want to dump the SPO List data.
In my case I am loading it into a json file in my ADLS.
Thats it!!! we are done, now lets go ahead and run the beast(I mean the pipeline) to see if the data from my Sharepoint List called AzureConnect gets loaded into the specified ADLS location or not.
And the Pipeline ran successfully without any errors, now lets check the json file on the ADLS which should contain the List data:
So to summarize, we first registered the Sharepoint Online site that we need to connect and generated its Client ID and Client Secret, then we granted this principal permissions from the O365 Admin center.
We then used these credentials along with the AAD tenant ID to request an access token which we then use to access our Sharepoint Online Site by making a REST API Call to it and in this way we were able to extract all the data of the specified list using plainly Azure Data Factory.
In the next post I will show you how to do the same thing not by the App registration method but by requesting a token from the Graph API for which we will have to register an App in the Azure Active directory and provide Graph API access permissions to it.
Data Engineer, KPMG India