Skip to content

Datasources🔗

A datasource define from where comes the data to ingest and how to fetch it.

There is a few types of supported datasources, each with its specifics configurations:

  • Webservice REST & SOAP
  • ODBC
  • PostgreSQL
  • Oracle
  • Oracle Stored Procedure & Stored Function
  • CSV
  • XML
  • RabbitMQ
  • Collection

Configurations are written in a JSON document.

Webservice REST & SOAP🔗

Webservice REST & SOAP datasources allows Slurpers to fetch data from a webservice using an HTTP request.

A JSON response is expected from a REST webservice, while an XML response is expected from SOAP webservices.

Note for SOAP datasources in the context of multiple-datasources Slurper

In such a case, when accessing the data of the SOAP datasource, you will face a list of one element instead of that element itself. Make sure you take the first element of that list using [0] to get your data. In SOAP, the element is probably a dict.

def slurp(ctx, data_map):
    users = data_map.get("Users")[0]
    posts = data_map.get("Posts")[0]
    return ...

Configurations🔗

Configurations options are:

  • url : url of the endpoint to query, with or without query parameters.

    The url can be generated dynamically with date and time parameters. Check templating for more information.

  • method : request method / verb, GET or POST

  • auth : object defining authentication method, if applicable
    • type : one of [basic, bearer, oauth2_client_credentials, oauth2_password]
    • If type is basic, the Authorization header will be filled according to the Basic HTTP authentication scheme
      • username : username to use in basic authentication header
      • password : password to use in basic authentication header
    • If type is bearer, the Authorization header will contains the Bearer token
      • token: token to use in bearer authentication header
    • If type is oauth2_client_credentials, a request to the authorization server will be made to get the OAuth2 token to use in the call to the webservice, using Client Credentials grant
      • oauth2_token_url: url to get the token on the authorization server
      • oauth2_client_id: OAuth2 client id
      • oauth2_client_secret: OAuth2 client secret
    • If type is oauth2_password, a request to the authorization server will be made to get the OAuth2 token to use in the call to the webservice, using Password grant
      • oauth2_token_url : url to get the token on the authorization server
      • oauth2_client_id : OAuth2 client id
      • oauth2_client_secret : OAuth2 client secret
      • username : username to use in the Password grant flow
      • password : password to use in the Password grant flow
  • timeout : timeout in seconds before cancelling request to the endpoint
  • headers : object defining custom headers to add to the HTTP request to the webservice
  • req_body : request body to send when using method POST as a string, " character must be escaped whith \", see example below

    The request body can be generated dynamically with date and time parameters. Check templating for more information.

    If you want to use a json body (opposed to a string for example), please also set req_body_type with the value json. Otherwise, templating might not work as expected.

  • pagination: pagination of the data source. See Data source pagination for more information

  • data_path: If the data you want is not at the root of your request, you can specify a JSONPath expression. This only works for REST datasource at the moment.
  • parameters : an array of parameters to pass from the endpoint (direct) to the datasource, each having the following properties
    • name : the name of the parameter
    • location : query or body, to inject the parameter in the request query params or the request body respectively

Examples🔗

Whithout authentication, with timeout

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "timeout" : 30
}

With Basic authentication and custom headers

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "auth": {
    "type": "basic",
    "username": "alice",
    "password": "Password!"
  },
  "headers": {
    "Accept-Language": "fr-CH"
  }
}

With Bearer authentication

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "auth": {
    "type": "bearer",
    "token": "very-sercret-token"
  }
}

With OAuth2.0 Client Credentials grant

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "auth": {
    "type": "oauth2_client_credentials",
    "oauth2_token_url": "https://auth.webservice-example.org/oauth2/token",
    "oauth2_client_id": "client-id",
    "oauth2_client_secret": "very-sercret-client-secret"
  },
}

With OAuth2.0 Password grant

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "auth": {
    "type": "oauth2_password",
    "oauth2_token_url": "https://auth.webservice-example.org/oauth2/token",
    "oauth2_client_id": "client-id",
    "oauth2_client_secret": "very-sercret-client-secret",
    "username": "alice",
    "password": "Password!"
  },
}

Using POST request with a body

{
  "url": "https://my.webservice-example.org/api/v1/?limit=1000",
  "method": "POST",
  "req_body": "{\"key\": \"value\"}"
}

Using parameters in query

{
  "url": "https://jsonplaceholder.typicode.com/posts",
  "parameters": [
    {
      "name": "userId",
      "location": "query"
    },
    {
      "name": "id",
      "location": "query"
    }
  ]
}

When calling the endpoint with $ds_param:userId=1 and $ds_param:id=1 query parameters, the URL called by Slurpy will be https://jsonplaceholder.typicode.com/posts?userId=1&id=1.

Using parameters in request body

{
  "url": "https://jsonplaceholder.typicode.com/posts",
  "req_body": "My user ID is {{ userId }}",
  "parameters": [
    {
      "name": "userId",
      "location": "body"
    }
  ]
}

When calling the endpoint with $ds_param:userId=1 query parameters, the request body send by Slurpy will be My user ID is 1.

ODBC🔗

ODBC datasources allows Slurpers to fetch data from an ODBC compatible database server using a SQL query.

Complex queries such as joins and subqueries are totally supported.

Configurations🔗

Configurations options are:

  • connstr : An ODBC connection string
  • init_query : A SQL query executed before the main query, can be ignored
  • query : A SELECT SQL query
  • fetch_many_size : if set, it will fetch only fetch_many_size rows from the result of the query, call the slurper with this subset, save the partial result, and repeat the operation until the cursor is exhausted.Warning: the slurper will will not have all the data at once !

The Driver value in the connection string must reference a driver installed on the Slurpy server. Available values are:

  • PostgreSQL ANSI
  • PostgreSQL Unicode
  • ODBC Driver 17 for SQL Server
  • ODBC Driver 18 for SQL Server

Examples🔗

Select all columns from a table of a PostgreSQL database

{
  "query": "SELECT * FROM mytable",
  "connstr": "Driver={PostgreSQL Unicode};Server=db.psql-example.org;Port=5432;Database=mydatabase;Uid=myusername;Pwd=mypassword;"
}

Select some columns from a table of a MS SQL Server database

{
  "query": "SELECT [id], [username] FROM [dbo].[users]",
  "connstr": "Driver={ODBC Driver 17 for SQL Server};Server=db.mssql-example.org;UID=myusername;PWD=mypassword;"
}

PostgreSQL🔗

PostgreSQL datasources allows Slurpers to fetch data from a PostgreSQL database server using a SQL query.

Complex queries such as joins and subqueries are totally supported.

Configurations🔗

Configurations options are:

  • connstr : A PostgreSQL connection string (postgresql://user:password@server:port/database)
  • init_query : A SQL query executed before the main query, can be ignored
  • query : A SELECT SQL query
  • fetch_many_size : if set, it will fetch only fetch_many_size rows from the result of the query, call the slurper with this subset, save the partial result, and repeat the operation until the cursor is exhausted.Warning: the slurper will will not have all the data at once !

Examples🔗

Select all columns from a table of a database

{
  "query": "SELECT * FROM invoices",
  "connstr": "postgresql://postgres:xyzABC123$!#@8.8.8.8:5432/sales"
}

Oracle🔗

Oracle datasources allows Slurpers to fetch data from a Oracle database server using a SQL query.

Complex queries such as joins and subqueries are totally supported.

Configurations🔗

Configurations options are:

  • connstr : An Oracle data source name or DSN (127.0.0.1:1521/EXAMPLE)
  • username : Username used for connection
  • password : Password used for connection
  • init_query : A SQL query executed before the main query, can be ignored
  • query : A SELECT SQL query
  • fetch_many_size : if set, it will fetch only fetch_many_size rows from the result of the query, call the slurper with this subset, save the partial result, and repeat the operation until the cursor is exhausted.Warning: the slurper will will not have all the data at once !

Examples🔗

Select all columns from a table of a database

{
  "query": "SELECT * FROM mytable",
  "connstr": "127.0.0.1:1521/EXAMPLE",
  "username": "test",
  "password": "secret"
}

Oracle Stored Procedure & Stored Function🔗

Oracle Stored Procedure & Stored Function datasources allows Slurpers to call stored procedure and stored function on an Oracle database server.

Configurations🔗

Configurations options are:

  • connstr : An Oracle data source name or DSN (127.0.0.1:1521/EXAMPLE)
  • username : Username used for connection
  • password : Password used for connection
  • parameters : an array of parameters for the procedure or function, each having the following properties
    • name : the name of the parameter
    • named : boolean to switch from positional or named parameters
    • direction : in or out
  • For Stored Procedure:
    • procedure_name : the name of the procedure to call
  • For Stored Function:
    • function_name : the name of the function to call
    • return_type : the return type of the function, one of str, int, float or bool

Examples🔗

Stored Procedure🔗

Call the stored procedure TEST_SLURPY_IN_OUT, using two "in" and one "out" positional parameters.

{
  "connstr": "localhost:1521/EXAMPLE",
  "username": "test",
  "password": "secret",
  "parameters": [
    {
      "name": "p_in_1",
      "named": false,
      "direction": "in"
    },
    {
      "name": "p_in_2",
      "named": false,
      "direction": "in"
    },
    {
      "name": "p_out_1",
      "named": false,
      "direction": "out"
    }
  ],
  "procedure_name": "TEST_SLURPY_IN_OUT"
}

Stored Function🔗

Call the stored function TEST_SLURPY_IN_OUT_FCT, using two "in" positional parameters.

{
  "connstr": "localhost:1521/EXAMPLE",
  "username": "test",
  "password": "secret",
  "parameters": [
    {
      "name": "p_in_1",
      "named": false
    },
    {
      "name": "p_in_2",
      "named": false
    }
  ],
  "function_name": "TEST_SLURPY_IN_OUT_FCT"
}

CSV🔗

CSV datasource allows Slurpers to read data from a CSV file, either uploaded via the datasource create / update form or by providing a path to the file in the configurations.

Configurations🔗

  • path : unix path to the CSV file to read data from, can be omitted if the file is uploaded directly
  • has_header : boolean to tell if the CSV has a header, if omitted it defaults to true, if false a header in the form ["col1", "col2", "col3", …] is generated
  • replace_periods_with_underscores : boolean to tell if . should be replaced by _ in the header keys for storage in MongoDB. It defaults to false. The replacement allows filtering on such keys later.

Examples🔗

Absolute path to the CSV file without header

{
  "path": "/path/to/my_csv_file.csv",
  "has_header": false
}

An uploaded file (no configurations needed)

null

Uploaded file with . to _ replacement in header keys

{
  "replace_periods_with_underscores": true
}

XML🔗

XML datasource allows Slurpers to read data from a XML file, either uploaded via the datasource create / update form or by providing a path to the file in the configurations.

Configurations🔗

  • path : unix path to the XML file to read data from, can be omitted if the file is uploaded directly

Examples🔗

Absolute path to the XML file

{
  "path": "/path/to/my_xml_file.xml"
}

An uploaded file (no configurations needed)

null

RabbitMQ🔗

RabbitMQ datasource allows Slurpers to read data from a queue in RabbitMQ.

Check this link for more information on queues.

Configurations🔗

  • connstr: Connection string to RabbitMQ including virtualhost: amqp://USERNAME:PASSWORD@HOST/VIRTUAL_HOST
  • exchange_name: Name of the exchange on which the queue is bound.
  • exchange_durable: Boolean indicating if the exchange is durable. Default to False.
  • queue_name: Name of the queue
  • queue_durable: Boolean indicating if the queue is durable. Default to False.
  • queue_type: Type of the queue.
  • routing_key: Routing key used to route messages in the queue. More information can be found here.
  • skip_ack: Use this to avoid sending the ACK signal to RabbitMQ when reading a message. This way, the message is not erased from the queue. Usefull for testing.
  • timeout: Used in conjuction with skip_ack. As we're normally reading all messages in the queue, skipping ACK makes the queue infinite (message are never removed). This parameter defines for how long we should pick messages in the queue before exiting. By default 1s.

Examples🔗

{
  "connstr": "amqp://USERNAME:PASSWORD@HOST/VIRTUAL_HOST",
  "queue_name": "SensorsUplink",
  "queue_type": "quorum",
  "routing_key": "amqp.things.#",
  "exchange_name": "SensorsExchange",
  "queue_durable": true,
  "exchange_durable": true
}

Collection🔗

Collection datasource allows Slurpers to read data from a collection of data or one of its versions.

Collections are the result of an async job.

Configurations🔗

  • collection_id: Id of the collection to read from. Latest version will be used if not specified using mongo_collection_id.
  • mongo_collection_id: Unique id of the mongo collection. This will correspond to the version of a collection.
  • batch_size: Instead of loading the whole collection at once, load batch of this size. Use this to improve performances.

Warning

You cannot use mongo_collection_id if it refers to a "version" of a Normal collection, as the mongo_collection_id of such collection will change at each execution.

Examples🔗

{
  "mongo_collection_id": "66aabe11-468b-49ee-b05f-f515a6776645",
  "batch_size": 5000
}