
Large Data Searches with the Kustomer API

  • 19 April 2023



 

Kustomer allows businesses to access their customer data using an API (Application Programming Interface) that enables them to integrate the platform's features into their own applications. One of the challenges businesses face is downloading large datasets from Kustomer, especially when they need to filter the data by a specific date range. In this blog post, we will explore how to use Kustomer's API to download large datasets using date range searches and pagination. We will dive into the technical details of how to query the Kustomer API to retrieve the data you need and discuss best practices for managing large datasets. By the end of this post, you will have a clear understanding of how to efficiently download large datasets from Kustomer, making it easier for you to analyze your customer data and improve your business operations.

 

Creating searches via API

 

Kustomer includes a powerful search functionality that allows users to filter data based on specific criteria, such as date ranges and other customer and message attributes. Additionally, our sorting function enables users to sort the data by any column, making it easy to find the information they need. Once the data is filtered and sorted, users can export the results to a CSV file for further analysis in their preferred data analysis tool.

Not only is the search functionality available within the app, but it's also accessible through our API. By using an API, businesses can automate the data retrieval process and incorporate it into their workflows. This enhanced automation capability can save businesses significant time and effort by eliminating the need for manual data extraction. Additionally, using a scripting language like Python or JavaScript with our API can unlock even more automation capabilities, allowing businesses to perform complex data manipulations and analyses with ease. The flexibility of our search function and API allows businesses to tailor their data retrieval process to their specific needs, whether it be a one-time download or an ongoing, automated process.

 

From the GUI to the API

 

The following search shows customer objects that have had any activity in the past hour.

 

[Screenshot: search in the Kustomer GUI with the filter Last Activity At, Relative - 1 hours ago]

 

The code examples use Python, a popular and beginner-friendly programming language, together with the `requests` library, which is commonly used when working with APIs.

 

Below is the same search using the API from a Python script.

 

```Python
import requests

INSTANCE_NAME = '<YOUR-INSTANCE-NAME>'
API_KEY = '<YOUR-API-KEY>'

url = "https://%s.api.kustomerapp.com/v1/customers/search" % INSTANCE_NAME

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer %s" % API_KEY
}

# Customers with any activity in the past hour
payload = {
    "and": {"customer_last_activity_at": {"gte": "now-1h"}},
    "queryContext": "customer",
    "includeDeleted": False,
    "timeZone": "GMT"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # stop here on an HTTP error, such as a rejected API key

result = response.json()
print(len(result['data']))  # number of customers returned in this response
```

 

The `payload` object contains the parameters for the search selected in the GUI, as follows:

- **and**: this object contains the filters that correspond to the **ALL of the following** filters in the GUI.

   - `customer_last_activity_at`: search filters can be based on attributes of different objects, so in the API each attribute is prefixed with the object name. This one corresponds to `Last Activity At`.

   - `gte`: means *greater than or equal to*. For a date attribute, the GUI shows this as the friendlier "on or after".

   - `now-1h`: this expression means the current date and time minus 1 hour. It corresponds to *Relative - 1 hours ago*.

- **queryContext**: this parameter indicates which object type is returned as the result of the search. It makes it possible to filter by conversation attributes and get customer objects back, or to filter by message attributes and get conversation objects.

- **includeDeleted**: searches created in the GUI do not include deleted objects. With the Kustomer API, you can change this behavior by specifying `True`. The default value is `False`; it is included here only as an example.

- **timeZone**: this specifies the time zone used by the date expressions. Since `GMT` is specified, `now` is evaluated in the GMT time zone.

 

 

To make requests to the `/customers/search` endpoint, you need an API key with the proper permissions. API keys created under Security / API Keys also require the instance name as a prefix in the URL. See https://developer.kustomer.com/kustomer-api-docs/reference/customersearch for additional information.
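The URL, headers, and payload described above can be gathered into one small helper so that later searches only change the filters. This is a minimal sketch, not part of the Kustomer API: the function name `build_search_request` and its argument names are made up for illustration, and the instance name and API key below are placeholders.

```python
def build_search_request(instance_name, api_key, filters, query_context,
                         include_deleted=False, time_zone="GMT"):
    """Return (url, headers, payload) for a POST to the search endpoint.

    Hypothetical helper: it only assembles the same values used in the
    script above and does not send anything over the network.
    """
    url = "https://%s.api.kustomerapp.com/v1/customers/search" % instance_name
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": "Bearer %s" % api_key,
    }
    payload = {
        "and": filters,                    # the "ALL of the following" filters
        "queryContext": query_context,     # object type returned by the search
        "includeDeleted": include_deleted,
        "timeZone": time_zone,
    }
    return url, headers, payload


url, headers, payload = build_search_request(
    "example-instance",   # placeholder instance name
    "example-api-key",    # placeholder API key
    {"customer_last_activity_at": {"gte": "now-1h"}},
    "customer",
)
print(url)
print(payload)
```

The returned values can be passed straight to `requests.post(url, json=payload, headers=headers)`, as in the script above.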

 

Large search results

 

For performance reasons, a large result set cannot be retrieved with a single request. The search endpoint of Kustomer's API has a limit of 10,000 results per search, so retrieving a larger dataset takes multiple requests, each with different filters. While this approach may seem tedious, it keeps the API stable and responsive: breaking a large request into smaller ones avoids overloading the system.

If a search matches more than 10,000 records, that search alone cannot return all of them, and the remaining records have to be fetched with additional filtered searches. The UI has the same ceiling: the result count stops at 10,000 even when more results are available.

 

[Screenshot: search results in the Kustomer UI with the result count capped at 10,000]

 

To work around the 10,000-record limit, one option is to add a filter on the object's "createdAt" property and split the search into several searches, each covering a different date range. Each search then retrieves a smaller, more manageable chunk, and together the date ranges cover the whole dataset so that no records are missed. For example, a query can be limited to a month or a quarter at a time. In the following example, a three-month filter reduces the results to 3,647 conversations.

 

[Screenshot: conversation search filtered to a three-month date range, showing 3,647 results]

 

The equivalent Python script is:

 

```Python
# The first part of the script (imports, url, and headers) is the same as above

# Chat conversations created in the first three months of 2022
payload = {
    "and": {
        "conversation_channels": {"equals": "chat"},
        "conversation_created_at": {"between": ["2022-01-01T00:00:00.000Z", "2022-03-31T23:59:59.999Z"]}
    },
    "queryContext": "conversation",
    "includeDeleted": False,
    "timeZone": "GMT"
}

response = requests.post(url, json=payload, headers=headers)

result = response.json()
print(result['meta'])       # pagination details
print(len(result['data']))  # number of conversations returned in this response
```

 

The `meta` object in the response contains useful details about the pagination of the search results.
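The date-range split described above can be automated by generating consecutive windows and building one payload per window. The sketch below is only an illustration under two assumptions: that `between` includes both endpoints (which is why each window ends one millisecond before the next one starts, mirroring the `23:59:59.999Z` bound used above), and that the datetimes are already in GMT. The helper names `date_windows`, `to_iso`, and `window_payloads` are made up for this example.

```python
from datetime import datetime, timedelta


def date_windows(start, end, days):
    """Yield (lower, upper) datetimes that cover [start, end) without overlap."""
    step = timedelta(days=days)
    one_ms = timedelta(milliseconds=1)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        # Assumption: "between" is inclusive, so stop 1 ms before the next window
        yield cursor, upper - one_ms
        cursor = upper


def to_iso(moment):
    """Format a datetime like the timestamps used in the payload above."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (moment.microsecond // 1000)


def window_payloads(start, end, days):
    """Build one search payload per date window (same filters as above)."""
    for lower, upper in date_windows(start, end, days):
        yield {
            "and": {
                "conversation_channels": {"equals": "chat"},
                "conversation_created_at": {"between": [to_iso(lower), to_iso(upper)]},
            },
            "queryContext": "conversation",
            "includeDeleted": False,
            "timeZone": "GMT",
        }


# Split January through March 2022 into 30-day windows
payloads = list(window_payloads(datetime(2022, 1, 1), datetime(2022, 4, 1), days=30))
for p in payloads:
    print(p["and"]["conversation_created_at"]["between"])
```

The window length is a tuning choice: it needs to be short enough that each window stays under the 10,000-result limit for your data volume.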

 

Search pagination

 

When submitting a search request, there is a limit on the number of results that can be returned per page and the total number of results. In this case, the total limit of results is 10,000, but the endpoint only allows a maximum of 500 results per page.

 

To obtain the complete set of search results, you will need to fetch multiple pages. Since each page can display up to 500 results, you may need to fetch up to 20 pages (10,000 results / 500 results per page) to retrieve all possible results.

 

To fetch subsequent pages, add the pagination parameters `page` and `pageSize` to the search request URL. These parameters indicate which portion of the results you want to retrieve. For example, with `pageSize` set to 500, setting `page` to 2 fetches the second set of 500 results, and so on.
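The page arithmetic can be checked with a few lines of Python. This is a rough sketch that uses the two limits quoted in this post (10,000 results per search, 500 per page); the function name `page_urls` and the instance name are placeholders for illustration.

```python
import math
from urllib.parse import urlencode

MAX_RESULTS = 10000   # results per search, as stated in this post
MAX_PAGE_SIZE = 500   # results per page, as stated in this post


def page_urls(base_url, expected_results, page_size=MAX_PAGE_SIZE):
    """Return the request URL for every page needed to read the results."""
    reachable = min(expected_results, MAX_RESULTS)  # results past the limit are not reachable
    pages = math.ceil(reachable / page_size)
    return [
        "%s?%s" % (base_url, urlencode({"page": page, "pageSize": page_size}))
        for page in range(1, pages + 1)
    ]


base_url = "https://example-instance.api.kustomerapp.com/v1/customers/search"
urls = page_urls(base_url, 3647)   # the 3,647 conversations from the earlier example
print(len(urls))
print(urls[-1])
```

For 3,647 results this yields 8 pages, and any result count at or above 10,000 yields the maximum of 20 pages.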

 

By iterating through the pages and fetching the results accordingly, you can ensure that you receive the complete set of search results, up to the 10,000-result limit, as shown in the following Python script.

 

```Python
# The first part of the script (imports, INSTANCE_NAME, and headers) is the same as above

payload = {
    "and": {
        "conversation_channels": {"equals": "chat"},
        "conversation_created_at": {"between": ["2022-01-01T00:00:00.000Z", "2022-03-31T23:59:59.999Z"]}
    },
    "queryContext": "conversation",
    "includeDeleted": False,
    "timeZone": "GMT"
}

base_url = "https://%s.api.kustomerapp.com/v1/customers/search" % INSTANCE_NAME

page = 1
page_size = 500  # maximum page size for /customers/search
results = []

while True:
    url = base_url + "?page=%s&pageSize=%s" % (page, page_size)
    response = requests.post(url, json=payload, headers=headers)
    result = response.json()
    print(result['meta'])
    print(len(result['data']))
    results.extend(result['data'])

    # Stop on the last page; ">=" also ends the loop when totalPages is 0
    if result['meta']['page'] >= result['meta']['totalPages']:
        break
    page += 1
```
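The same loop can be wrapped in a function that receives the HTTP call as an argument, which makes it reusable for one search per date range and testable without network access. This is a sketch under assumptions: `fetch_all_pages` and `fake_fetch_page` are hypothetical names, the response is assumed to have the `data` and `meta` keys used above, and the 20-page cap comes from the 10,000 / 500 limits quoted in this post.

```python
def fetch_all_pages(fetch_page, payload, page_size=500, max_pages=20):
    """Collect the results of every page of a single search.

    fetch_page(payload, page, page_size) is supplied by the caller and must
    return the decoded JSON body (a dict with 'data' and 'meta' keys).
    """
    results = []
    page = 1
    while page <= max_pages:   # 20 pages x 500 results = 10,000-result limit
        body = fetch_page(payload, page, page_size)
        results.extend(body["data"])
        if page >= body["meta"]["totalPages"]:
            break              # last page reached, or the search was empty
        page += 1
    return results


def fake_fetch_page(payload, page, page_size):
    """Offline stand-in for the real request; serves 1,200 fake records."""
    records = [{"id": n} for n in range(1200)]
    start = (page - 1) * page_size
    total_pages = -(-len(records) // page_size)   # ceiling division
    return {
        "data": records[start:start + page_size],
        "meta": {"page": page, "totalPages": total_pages},
    }


results = fetch_all_pages(fake_fetch_page, payload={})
print(len(results))
```

With the real API, `fetch_page` could be a small function that calls `requests.post(base_url, params={"page": page, "pageSize": page_size}, json=payload, headers=headers).json()`. Calling `fetch_all_pages` once per date-range payload then retrieves a dataset larger than 10,000 records in chunks.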

 

Conclusions

 

In conclusion, managing large data searches with the Kustomer API can be efficiently achieved by leveraging the platform's powerful search functionality, pagination, and date range filters. By understanding the limitations of the API, such as the maximum number of results per search and per page, businesses can tailor their data retrieval process to ensure smooth and efficient access to large datasets. Utilizing an API also enables automation and integration with other applications, saving time and effort while providing a more streamlined workflow. By following the best practices outlined in this post, businesses can effectively analyze their customer data, improve operations, and make informed decisions.

