Introduction

One-click connectors are an easy way to fetch data from different sources without any configuration on the user side. Thanks to OAuth2 authentication and predefined custom mappings, the loading process is intuitive and takes almost just one click.

Excerpt
This tutorial goes step by step through building a custom one-click connector. It describes the whole process in detail, from setting up the environment to uploading the custom plugin. By providing code from already-built examples, it gives insight into how particular parts work and shows best practices for building one-click connectors.

Prerequisites

...

To get started, install the Squirro Toolbox. You can work either with the Toolbox or with the setup described at https://squirro.atlassian.net/wiki/spaces/BOX.

...

Example file tree for the OneDrive connector:

File	Required?	Purpose
__init__.py		Marks the directory as a Python package
README.md		Describes plugin installation steps, such as OAuth2 app configuration
auth.py		Handles OAuth2 or another authorization process
dataloader_plugin.json	Yes	Contains the plugin name and references to other files
facets.json	OCC only	Describes a common selection of facets
icon.png	Yes	Plugin icon
mappings.json	Yes	Maps data loader items to Squirro fields
onedrive_plugin.py	Yes	Core code
pipeline_workflow.json	OCC only	Defines the pipeline workflow steps
requirements.txt		Describes Python dependencies
scheduling_options.json	OCC only	Defines sane scheduling defaults

When creating your own plugin, replace onedrive in the file and folder names with a name that corresponds to the connector's target service. Use only lowercase alphanumeric characters.
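The naming rule above can be checked before the files are created. The following is only a minimal sketch: the helper names are hypothetical, and it assumes "lowercase alphanumeric" means the ASCII characters a-z and 0-9.

```python
import re

# Assumption: "lowercase alphanumeric" is read as ASCII a-z and 0-9 only.
_NAME_RE = re.compile(r"[a-z0-9]+")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the name contains only lowercase letters and digits."""
    return _NAME_RE.fullmatch(name) is not None


def plugin_file_name(name: str) -> str:
    """Build the core plugin file name, following the onedrive_plugin.py pattern."""
    if not is_valid_connector_name(name):
        raise ValueError(f"Invalid connector name: {name!r}")
    return f"{name}_plugin.py"


print(plugin_file_name("onedrive"))
```

Names such as OneDrive or one-drive are rejected by this check, while onedrive passes.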

...

This file specifies general information about the plugin, such as its title, description, and category. It also specifies which files are loaded for authorization, scheduling, and so on.

...

mappings.json

Code Block

language	json

{
    "map_id": "id",
    "map_title": "name",
    "map_created_at": "createdDateTime",
    "map_file_name": "name",
    "map_file_mime": "file.mimeType",
    "map_file_data": "content",
    "map_url": "webUrl",
    "facets_file": "facets.json"
}

This file maps the fields coming from the source to the corresponding Squirro item fields. It can also reference the file that contains the facets.
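To illustrate what such a mapping expresses, here is a minimal sketch that applies map_* entries to a source record. It is not how the data loader implements mapping. The helpers resolve and apply_mappings are hypothetical, the sample record is made up, and reading a value such as file.mimeType as a path into nested dictionaries is an assumption.

```python
from typing import Any, Dict, Optional

# Subset of the mappings.json example: "map_*" keys name the target item field,
# values name the source field.
MAPPINGS: Dict[str, str] = {
    "map_id": "id",
    "map_title": "name",
    "map_created_at": "createdDateTime",
    "map_file_mime": "file.mimeType",
    "map_url": "webUrl",
    "facets_file": "facets.json",
}


def resolve(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts (assumed interpretation)."""
    current: Any = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def apply_mappings(record: Dict[str, Any], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Build a dict of target fields from a source record (illustration only)."""
    item: Dict[str, Any] = {}
    for key, source_path in mappings.items():
        if not key.startswith("map_"):
            continue  # "facets_file" references a file, it is not a field mapping
        item[key[len("map_"):]] = resolve(record, source_path)
    return item


# Made-up source record for demonstration purposes.
record = {
    "id": "42",
    "name": "report.pdf",
    "createdDateTime": "2021-01-01T00:00:00Z",
    "file": {"mimeType": "application/pdf"},
    "webUrl": "https://example.com/report.pdf",
}
item = apply_mappings(record, MAPPINGS)
print(item)
```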

facets.json

Code Block

language	json

{
    "createdBy.user.displayName": {
        "name": "creator",
        "display_name": "Creator",
        "visible": true,
        "searchable": true,
        "typeahead": true,
        "analyzed": true

...

    }
}

This file defines which facets are used in the plugin. More information about facets: Managing Facets

Pay close attention to how you set up the facets.

Each added facet increases the index size and slows down query times.
Use only a few facets that are useful for the end user, and be consistent across plugins.
If a facet is only needed for filtering or simple aggregations, set analyzed to false to reduce the size and performance impact.
Check whether similar facets are already used in existing plugins and follow the same naming convention. For example, an Author or Owner facet in a new plugin may correspond to Creator in others.
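These guidelines can be turned into a small review helper. The sketch below is only an illustration. The function review_facets, the PREFERRED_NAMES table, and the max_facets threshold of 10 are all hypothetical and not part of Squirro. Adjust them to the conventions of your existing plugins.

```python
from typing import Dict, List

# Hypothetical convention: facet names that should reuse an existing name.
PREFERRED_NAMES = {"author": "creator", "owner": "creator"}


def review_facets(facets: Dict[str, dict], max_facets: int = 10) -> List[str]:
    """Return review notes for a facets.json-style dict (illustration only)."""
    notes: List[str] = []
    if len(facets) > max_facets:
        notes.append(
            f"{len(facets)} facets defined; consider keeping at most {max_facets}"
        )
    for source_field, spec in facets.items():
        name = str(spec.get("name", ""))
        preferred = PREFERRED_NAMES.get(name.lower())
        if preferred:
            notes.append(
                f"{source_field}: facet '{name}' could reuse the name '{preferred}'"
            )
        if spec.get("analyzed") is True:
            notes.append(
                f"{source_field}: analyzed is true; set it to false if the facet "
                "is only used for filtering or simple aggregations"
            )
    return notes


facets = {"createdBy.user.displayName": {"name": "author", "analyzed": True}}
for note in review_facets(facets):
    print(note)
```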

pipeline_workflow.json

Code Block

language	json

{
    "steps": [
        {
            "config": {
                "policy": "replace"
            },
            "id": "deduplication",
            "name": "Deduplication",
            "type": "deduplication"
        },
        {
            "config": {
                "fetch_link_content": false
            },
            "id": "content-augmentation",
            "name": "Content Augmentation",
            "type": "content-augmentation"
        },
        {
            "config": {},
            "id": "content-conversion",
            "name": "Content Extraction",
            "type": "content-conversion"
        },
        {
            "id": "language-detection",
            "name": "Language Detection",
            "type": "language-detection"
        },
        {
            "id": "cleanup",
            "name": "Content Standardization",
            "type": "cleanup"
        },
        {
            "id": "index",
            "name": "Indexing",
            "type": "index"
        },
        {
            "id": "cache",
            "name": "Cache Cleaning",
            "type": "cache"
        }
    ]
}

...

The file lists all the required packages, with one package dependency per line. More information about data loader dependencies: Data loader plugin dependencies

...

e file:

Code Block

language	py

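# Fetch the injected configuration and set up the OAuth2 library with it.
config = get_injected("config")
auth_tools.configure_oauth2_lib(config)

# Read the OAuth2 client credentials from the "dataloader" configuration section.
client_id = config.get("dataloader", "onedrive_client_id", fallback=None)
client_secret = config.get("dataloader", "onedrive_client_secret", fallback=None)

# Warn early if either credential is missing.
if not client_id or not client_secret:
    log.warning("Client keys are missing in %s plugin", target_name)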

...

Code Block

language	py

import base64
import hashlib


def getJobId(self) -> str:
    """Generate a stable ID that changes with the main parameters."""
    # 20-byte BLAKE2b digest over the repr() of every identifying parameter.
    m = hashlib.blake2b(digest_size=20)
    for v in (
        __plugin_name__,
        __version__,
        self.arg_index_all,
        self.arg_file_size_limit,
        self.arg_batch_size_limit,
        self.arg_download_media_files,
        self.arg_access_id,
    ):
        m.update(repr(v).encode())
    # URL-safe Base64 with the "=" padding stripped.
    job_id = base64.urlsafe_b64encode(m.digest()).rstrip(b"=").decode()
    log.debug("Job ID: %r", job_id)
    return job_id

Since end users can configure multiple data sources with this plugin, we need to be able to distinguish them. The key reason is to keep the state and caching information of each instance separate. Therefore, this method defines a unique ID for the actual job. To make the ID unique, use as many custom parameters as possible. The common approach is to use all arguments that can be set by the user. For a one-click connector, passing the refresh token or access token (arg_access_id in the above code) is one of the best options for providing a unique argument to generate a stable ID.
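The effect of this approach can be seen in isolation with a standalone sketch. The function make_job_id and the sample parameter values are hypothetical. It only reproduces the hashing steps of the method above outside of a plugin class.

```python
import base64
import hashlib


def make_job_id(*params: object) -> str:
    """Hash the repr() of each parameter into a short, URL-safe ID."""
    m = hashlib.blake2b(digest_size=20)
    for value in params:
        m.update(repr(value).encode())
    return base64.urlsafe_b64encode(m.digest()).rstrip(b"=").decode()


# Made-up values: plugin name, version, two options, and an access ID.
id_a = make_job_id("onedrive", "0.0.1", True, 50, "token-user-a")
id_a_again = make_job_id("onedrive", "0.0.1", True, 50, "token-user-a")
id_b = make_job_id("onedrive", "0.0.1", True, 50, "token-user-b")

print(id_a == id_a_again)  # same parameters give the same ID
print(id_a == id_b)        # a different access ID gives a different ID
```

Identical parameters always produce the same ID, so state and cache entries are found again on the next run, while two sources that differ only in the access ID are kept apart.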

...
