# Confluence

## Overview

The Confluence Plugin for Omnata ingests Confluence data into Snowflake. It supports inbound syncs from Confluence Cloud across several data streams: pages, spaces, blog posts, comments, attachments, labels, users, and page content.

## Authentication

#### Atlassian Cloud, API Token

To set up authentication:

1. Visit <https://id.atlassian.com/manage-profile/security/api-tokens>
2. Create an API token
3. Use the token together with your email address (typically your Atlassian account email)

**Configuration Fields**

* **Confluence Domain Name**: `site-name.atlassian.net` or `confluence.my-custom-domain.com`
* **User Email Address**: The email address associated with your API token
* **API Token**: Your generated Atlassian API token (stored securely)
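
Atlassian Cloud API tokens are normally sent using HTTP Basic authentication, with the email address as the username and the token as the password. The sketch below shows how such a header could be built when testing credentials outside Omnata. The function name `basic_auth_header` and the sample values are illustrative assumptions, not part of the plugin.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build request headers for HTTP Basic auth (hypothetical helper)."""
    # Basic auth base64-encodes the string "username:password";
    # here the username is the email and the password is the API token.
    credentials = f"{email}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {encoded}",
        "Accept": "application/json",
    }


# Placeholder values for illustration only.
headers = basic_auth_header("jane@example.com", "my-api-token")
```

If a manual request with these headers is rejected, the email address and token most likely do not belong to the same account.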

## Inbound Syncs

The plugin supports the following streams for inbound syncs:

* [Pages](#pages)
* [Spaces](#spaces)
* [Blog Posts](#blog-posts)
* [Comments](#comments)
* [Attachments](#attachments)
* [Labels](#labels)
* [Users](#users)
* [Pages Content](#pages-content)

#### Pages

* **Sync Strategies**: Full Refresh, Incremental
* **Primary Key**: `id`
* **Cursor**: `lastModifiedAt` (for incremental sync)
* **Features**:
  * Confluence Query Language (CQL) filtering support
  * Full refresh and incremental syncs
  * Supports both Confluence API v1 and v2 formats
  * Extracts page body, version history, timestamps, and author information

#### Spaces

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs all spaces (global and personal)
  * Extracts space metadata, description, and homepage information

#### Blog Posts

* **Sync Strategies**: Full Refresh, Incremental
* **Primary Key**: `id`
* **Cursor**: `createdAt`

#### Comments

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs inline comments and footer comments from pages
  * Fetches nested comment replies
  * Captures comment metadata and author information

#### Attachments

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs file attachments from pages
  * Includes file metadata and download links

#### Labels

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs all labels in the Confluence instance
  * Includes label metadata and usage information

#### Users

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs user profiles from Confluence
  * Captures user details and account information

#### Pages Content

* **Sync Strategies**: Full Refresh
* **Primary Key**: `id`
* **Features**:
  * Syncs detailed page body content in storage format
  * Useful for full-text search and content analysis

## Configuration Parameters

#### CQL Clause for Pages

Filter pages using Confluence Query Language (optional):

**Parameter**: `cql_pages_clause`

**Examples** (each line is a separate clause):

```
space = MYSPACE
label = "documentation"
creator = john.doe@example.com
space = DEV AND label = "important"
```

**Notes**:

* Do not include `ORDER BY` clauses - Omnata controls ordering for pagination
* Leave empty to sync all pages (default behaviour)
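
To illustrate why `ORDER BY` must be left out, here is a minimal sketch of how a user-supplied clause might be combined with a type filter and a fixed ordering. The function `build_pages_cql`, the `type = page` filter, and the `lastmodified` ordering are assumptions made for illustration; the plugin's actual query construction is not documented here.

```python
import re
from typing import Optional

# Naive check: it would also match "order by" inside a quoted value.
_ORDER_BY = re.compile(r"\border\s+by\b", re.IGNORECASE)


def build_pages_cql(user_clause: Optional[str]) -> str:
    """Combine an optional user clause with a fixed filter and ordering."""
    parts = ["type = page"]
    clause = (user_clause or "").strip()
    if clause:
        if _ORDER_BY.search(clause):
            raise ValueError("cql_pages_clause must not contain ORDER BY")
        # Parenthesise so an OR in the user clause cannot bypass the type filter.
        parts.append(f"({clause})")
    # The ordering is appended last, which is why a user ORDER BY would clash.
    return " AND ".join(parts) + " ORDER BY lastmodified ASC"


print(build_pages_cql('space = DEV AND label = "important"'))
```

An empty or missing clause yields only the type filter and ordering, which corresponds to syncing all pages.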

## Performance Notes

#### Initial Sync

The initial sync may take significant time depending on the volume of data:

* **Pages & Comments**: Fetching comments adds overhead, because each page may need its own API calls to retrieve its comments
* **Attachments**: Fetching attachment data for large files may increase sync duration
* Consider increasing sync timeout settings for large instances

#### Incremental Syncs

Incremental syncs use modification timestamps and are more efficient:

* Fetches only pages modified since the last sync
* Uses the `lastModifiedAt` cursor field
* Duplicate records on time boundaries are filtered client-side
* Recommended for regular, frequent syncs
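
The boundary filtering described above can be sketched as follows. This is an illustrative approach, not the plugin's implementation: it assumes the sync state holds the last cursor timestamp plus the set of record ids already seen at exactly that timestamp, and the names `parse_ts` and `filter_new_records` are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_new_records(records, cursor, ids_at_cursor):
    """Return (fresh_records, new_cursor, new_ids_at_cursor)."""
    fresh = []
    new_cursor, new_ids = cursor, set(ids_at_cursor)
    for record in records:
        ts = parse_ts(record["lastModifiedAt"])
        if cursor is not None:
            if ts < cursor:
                continue  # older than the previous sync
            if ts == cursor and record["id"] in ids_at_cursor:
                continue  # re-fetched duplicate on the time boundary
        fresh.append(record)
        if new_cursor is None or ts > new_cursor:
            new_cursor, new_ids = ts, {record["id"]}
        elif ts == new_cursor:
            new_ids.add(record["id"])
    return fresh, new_cursor, new_ids


page = {"id": "1", "lastModifiedAt": "2024-01-01T10:00:00Z"}
fresh, cursor, seen = filter_new_records([page], None, set())
```

Querying with an inclusive lower bound and then dropping known ids avoids missing records that share the cursor's exact timestamp.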

#### API Rate Limiting

The plugin implements retry logic and rate limiting to respect Confluence API limits:

* Automatic retries with exponential backoff
* Respects API rate limit headers
* Syncs may take longer during periods of high load
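
As a rough illustration of retries with exponential backoff, the sketch below waits for a server-supplied delay when one is available and otherwise uses a jittered, capped exponential delay. The `RateLimited` exception, the function names, and the default values are assumptions for this example and do not describe the plugin's internals.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the API answers HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, e.g. from a Retry-After header


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    # Exponential growth with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(fetch, max_attempts=5, sleep=time.sleep):
    """Call `fetch` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(backoff_delay(attempt, err.retry_after))
```

Passing `sleep` as a parameter keeps the retry loop easy to test without real waiting.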

## Troubleshooting

#### Connection Issues

Verify the following:

* Confluence domain name is correct
* API token is valid and not expired
* User email address matches the API token owner
* User has permissions to access the Confluence instance

#### Missing Data

If expected data is not appearing:

* Check CQL clause syntax (if using filters)
* Verify user permissions for affected content
* Ensure the sync timeout is long enough for large datasets
* Review error logs for specific failure details
