How to Extract List Items from SharePoint On-Premise into Talend

by | Jun 9, 2015 | BlogPosts, Tech Tips | 5 comments

Many businesses and organisations use Microsoft SharePoint in one guise or another to store company-centric information. Most of this information is locked in and it can be difficult to extract, and yet much of it could be used to report on the health of the business, usage, and trends.

Getting SharePoint information using standard web services is normally only for the realms of a developer. Shown below is one method of utilising Talend to extract items in a list by using a simple tSOAP call. The same or similar techniques can be used to pull down documents and to add items into an existing or new list.

Extract List Data

1. Create a New Talend MDM Job and drag in a tSOAP and a tLogRow component.

2. Make sure that you have access to a SharePoint list with some items in it. In this example, the “TITLE” and “ID” fields are going to be extracted.

3. The list’s web service is found at the SharePoint site or subsite URL:

http(s)://<servername>/<site>/(<subsite>)/_vti_bin/Lists.asmx?wsdl

4. Double-click the tSOAP to get to the code. Find out about tSOAP here.

5. In the Authentication area, check the NTLM box and enter the user, password and domain details of the user account that has access to the List.

6. Enter the URL of the site that contains the list you are interested in plus the /_vti_bin/Lists.asmx call. For example:

“http://<FQDN>/sites/test/_vti_bin/Lists.asmx”

7. Enter the SOAP action as “http://schemas.microsoft.com/sharepoint/soap/GetListItems“. You can find a reference to GetListItems here and the SOAP Version used in this example is 1.1.

8. Enter the following code into the SOAP message section:

“<soapenv:Envelope xmlns:soapenv=’http://schemas.xmlsoap.org/soap/envelope/’><soapenv:Body><GetListItems xmlns=’http://schemas.microsoft.com/sharepoint/soap/’>

<listName>testlist</listName>

<query><Query></Query></query>

           <viewFields>

             <ViewFields>

               <FieldRef Name=’ID’ />

               <FieldRef Name=’Title’ />

             </ViewFields>

           </viewFields>

         </GetListItems>

       </soapenv:Body>

     </soapenv:Envelope>”

The tSOAP_1 component settings will end up looking similar to this:

2015-05 Blog - Extract List Items - Image 1

9. Connect the two components by right-clicking and dragging a line between their centres with the mouse.

10. Group them and add a title for presentation.

2015-05 Blog - Extract List Items - Image 2

11. Double click the tLogRow and click on the Sync Columns button. Choose Basic or Table, as you wish.

12. Run your job; you should find that the list items “ID” and “TITLE” fields have been printed out.

2015-05 Blog - Extract List Items - Image 3

The results are:

2015-05 Blog - Extract List Items - Image 4

Formatting the Data

The results can be formatted and exported to a file or another system using standard Talend techniques such as a tXMLMap to extract the fields, alternatively use a tExtractXMLField. An example is below.

1. Add a tXMLMap and double-click to open the settings.

2. Import the schema.

3. Right click the x:row and set it as a loop item.

4. Add variables and map them to the required fields.

2015-05 Blog - Extract List Items - Image 5

5. Add a tLogRow and sync the Schema.

6. When the code is run, the following will be output:

2015-05 Blog - Extract List Items - Image 6

The method shown above is ideal to extract, transform and load SharePoint data. Although this is a simple scenario, quite intensive copy, read, update and delete (CRUD) operations could be handled using Talend SOAP or REST components. Forthcoming blogs in this series will explain authentication for SharePoint 2013 On-Line which has its own intricacies.

5 Comments

  1. I’ve followed your instructions, but I receive this error. Any thoughts? I changed the SOAP envelope contain my list name and 2 of my list item names, so I’m stumped, also, don’t know anything about how SOAP works.

    ​Below is the error I receive:

    com.sun.xml.internal.messaging.saaj.client.p2p.HttpSOAPConnection post
    SEVERE: SAAJ0008: Bad Response; Bad Request
    Exception in component tSOAP_1
    com.sun.xml.internal.messaging.saaj.SOAPExceptionImpl: com.sun.xml.internal.messaging.saaj.SOAPExceptionImpl: Bad response: (400Bad Request
    at com.sun.xml.internal.messaging.saaj.client.p2p.HttpSOAPConnection.call(Unknown Source)

    Reply
    • The error 400 is a common http connection error and the exact cause can be pinpointed by downloading and using fiddler or SOAPUI.

      Error 400 explanation:
      http://www.checkupdown.com/status/E400.html

  2. If you are trying this and not having success connecting to tXMLMap try ticking the box “Output in document” in tSoap to get an xml document as output

    Reply
  3. Hi,

    I am not able to import schema in tXMLMap_1 component. Getting error : connot find the declaration of element soap:Envelope

    Please let me know what went wrong or How to generate and import schema in tXMLMap.

    Thanks

    Reply
  4. Hi,
    I have the following error:

    soap:ServerException of type ‘Microsoft.SharePoint.SoapServer.SoapServerException’ was thrown.Value cannot be null.
    Parameter name: SPContext|

    Reply

Submit a Comment

Your email address will not be published. Required fields are marked *