Many businesses and organisations use Microsoft SharePoint in one guise or another to store company-centric information. Most of this information is locked in and it can be difficult to extract, and yet much of it could be used to report on the health of the business, usage, and trends.
Getting SharePoint information using standard web services is normally only for the realms of a developer. Shown below is one method of utilising Talend to extract items in a list by using a simple tSOAP call. The same or similar techniques can be used to pull down documents and to add items into an existing or new list.
Extract List Data
1. Create a New Talend MDM Job and drag in a tSOAP and a tLogRow component.
2. Make sure that you have access to a SharePoint list with some items in it. In this example, the “TITLE” and “ID” fields are going to be extracted.
3. The list’s web service is found at the SharePoint site or subsite URL:
4. Double-click the tSOAP to get to the code. Find out about tSOAP here.
5. In the Authentication area, check the NTLM box and enter the user, password and domain details of the user account that has access to the List.
6. Enter the URL of the site that contains the list you are interested in plus the /_vti_bin/Lists.asmx call. For example:
7. Enter the SOAP action as “http://schemas.microsoft.com/sharepoint/soap/GetListItems“. You can find a reference to GetListItems here and the SOAP Version used in this example is 1.1.
8. Enter the following code into the SOAP message section:
“<soapenv:Envelope xmlns:soapenv=’http://schemas.xmlsoap.org/soap/envelope/’><soapenv:Body><GetListItems xmlns=’http://schemas.microsoft.com/sharepoint/soap/’>
<FieldRef Name=’ID’ />
<FieldRef Name=’Title’ />
The tSOAP_1 component settings will end up looking similar to this:
9. Connect the two components by right-clicking and dragging a line between their centres with the mouse.
10. Group them and add a title for presentation.
11. Double click the tLogRow and click on the Sync Columns button. Choose Basic or Table, as you wish.
12. Run your job; you should find that the list items “ID” and “TITLE” fields have been printed out.
The results are:
Formatting the Data
The results can be formatted and exported to a file or another system using standard Talend techniques such as a tXMLMap to extract the fields, alternatively use a tExtractXMLField. An example is below.
1. Add a tXMLMap and double-click to open the settings.
2. Import the schema.
3. Right click the x:row and set it as a loop item.
4. Add variables and map them to the required fields.
5. Add a tLogRow and sync the Schema.
6. When the code is run, the following will be output:
The method shown above is ideal to extract, transform and load SharePoint data. Although this is a simple scenario, quite intensive copy, read, update and delete (CRUD) operations could be handled using Talend SOAP or REST components. Forthcoming blogs in this series will explain authentication for SharePoint 2013 On-Line which has its own intricacies.