Skip to content

Cognition integration provider #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 143 commits into from
Jul 4, 2025
Merged
Show file tree
Hide file tree
Changes from 120 commits
Commits
Show all changes
143 commits
Select commit Hold shift + click to select a range
551c995
build: third party integration first commit
andhreljaKern May 13, 2025
e115cac
chore: update enums
andhreljaKern May 14, 2025
4d9970a
perf: add integration acess
andhreljaKern May 15, 2025
80712fc
perf: rename to integration
andhreljaKern May 15, 2025
fbd5a85
perf: add last_extraction column to integration
andhreljaKern May 16, 2025
0dbad39
perf: update integration delta
andhreljaKern May 16, 2025
458e4dc
perf: update integration access to list types
andhreljaKern May 16, 2025
7d70cd4
perf: add integration_types to integration access
andhreljaKern May 16, 2025
bbf643e
perf: add integration_types to integration access
andhreljaKern May 16, 2025
cc57cb4
perf: rename last_extraction to extract_history
andhreljaKern May 16, 2025
2da4e15
fix: store enum.value instead of enum
andhreljaKern May 16, 2025
edad963
fix: integration.project_id nullable
andhreljaKern May 16, 2025
4e52cb8
fix: nulable column instead of foreignkey
andhreljaKern May 16, 2025
03842fd
fix: enum values
andhreljaKern May 16, 2025
0386f6c
perf: task cancellation
andhreljaKern May 16, 2025
90debd4
fix: keyword arguments
andhreljaKern May 19, 2025
d099974
perf: integration record
andhreljaKern May 19, 2025
6fd3656
perf: add tokenizer
andhreljaKern May 20, 2025
76fd2ff
perf: add update integration access
andhreljaKern May 20, 2025
23099d3
perf: update integration endpoints
andhreljaKern May 20, 2025
5d2d503
perf: add get endpoint
andhreljaKern May 20, 2025
a2aa966
Oidc field in the users table
lumburovskalina May 21, 2025
4cd321f
perf: add org_id to integration provider
andhreljaKern May 26, 2025
0b6a9fa
Merge branch 'dev' into cognition-integration-provider
andhreljaKern May 26, 2025
0467f29
perf: add org_id support to integration
andhreljaKern May 26, 2025
888a542
perf: add record delta criteria
andhreljaKern May 26, 2025
8af9e39
fix: task execution finish on failed integration
andhreljaKern May 26, 2025
99494f8
perf: add integration finished_at
andhreljaKern May 26, 2025
5989801
perf: add started_at
andhreljaKern May 26, 2025
bec5f20
fix: started_at - finished_at syntax error
andhreljaKern May 26, 2025
8898a3c
perf: add integration records
andhreljaKern May 27, 2025
4aeea83
Merge branch 'cognition-integration-provider' of github.com:code-kern…
andhreljaKern May 27, 2025
66e8a0c
perf: add integration tables
andhreljaKern May 27, 2025
2ec00db
perf: update integrations delta
andhreljaKern May 27, 2025
614d706
perf: add sharepoint integration
andhreljaKern May 27, 2025
d7c8d6b
perf: update integration objects
andhreljaKern May 29, 2025
359187e
perf: expand IntegrationSharepoint
andhreljaKern May 29, 2025
3f774d1
fix: integration.started_at
andhreljaKern May 29, 2025
03169de
perf: integration data types
andhreljaKern May 29, 2025
18990bb
perf: unique constraint names
andhreljaKern May 29, 2025
25b039a
Reset finished at for new integrations
lumburovskalina Jun 2, 2025
aa4416e
perf: update integration objects
andhreljaKern Jun 3, 2025
1093257
Merge branch 'cognition-integration-provider' of https://github.com/c…
andhreljaKern Jun 3, 2025
40cbf2f
perf: add integration delta deletion
andhreljaKern Jun 3, 2025
ac935a0
Merge branch 'dev' into cognition-integration-provider
andhreljaKern Jun 3, 2025
6b7bc0b
Merge branch 'dev' into cognition-integration-provider
andhreljaKern Jun 3, 2025
515e01a
basic models
LennartSchmidtKern Jun 3, 2025
6c6f4de
perf: last_synced_at integration column
andhreljaKern Jun 3, 2025
5e52d26
Merge branch 'cognition-integration-provider' of https://github.com/c…
andhreljaKern Jun 3, 2025
c18a6eb
perf: add is_synced column
andhreljaKern Jun 3, 2025
d5afe9b
chore: add typing
andhreljaKern Jun 3, 2025
dc1e5e8
perf: add sync columns
andhreljaKern Jun 3, 2025
5a35116
fix model
LennartSchmidtKern Jun 3, 2025
00c46d6
perf: add get_all integrations
andhreljaKern Jun 3, 2025
08e04a6
chore: add todo comment
andhreljaKern Jun 3, 2025
d0d5bd3
Merge remote-tracking branch 'origin/dev' into access-management
LennartSchmidtKern Jun 3, 2025
299a4aa
permission test
LennartSchmidtKern Jun 4, 2025
995612b
projects with access management
LennartSchmidtKern Jun 4, 2025
fad5cc4
perf: add sharepoint db bo
andhreljaKern Jun 4, 2025
264d0fe
deactivate mock up
LennartSchmidtKern Jun 4, 2025
f8db2cf
get by user id
LennartSchmidtKern Jun 4, 2025
2b86f5e
enable list payload
LennartSchmidtKern Jun 4, 2025
19768dd
perf: integration update
andhreljaKern Jun 4, 2025
7231952
perf: tech discussion feedback
andhreljaKern Jun 5, 2025
91fe108
merge integration
LennartSchmidtKern Jun 6, 2025
7e88d90
perf: get integrations updates
andhreljaKern Jun 9, 2025
57bfe5d
perf: integration updates
andhreljaKern Jun 10, 2025
1fd2127
Merge branch 'cognition-integration-provider' of https://github.com/c…
LennartSchmidtKern Jun 11, 2025
c67bae4
perf: introduce managers
andhreljaKern Jun 11, 2025
0373a58
chore: typing
andhreljaKern Jun 11, 2025
0bdebd5
perf: access + check for updates
andhreljaKern Jun 11, 2025
ffa35bf
perf: update integration
andhreljaKern Jun 13, 2025
342b221
perf: add delta url to sharepoint integration
andhreljaKern Jun 13, 2025
c9f4791
fix: move delta_url to cognitionintegration
andhreljaKern Jun 13, 2025
02ad8cc
perf: integration updates
andhreljaKern Jun 13, 2025
7c023ee
perf: add updated_by + delta_criteria
andhreljaKern Jun 16, 2025
9db6027
Merge remote-tracking branch 'origin/cognition-integration-provider' …
LennartSchmidtKern Jun 16, 2025
5969ab9
perf: add delta_criteria field
andhreljaKern Jun 17, 2025
a6f6ba2
perf: check for status improvement
andhreljaKern Jun 17, 2025
789e79e
Merge remote-tracking branch 'origin/cognition-integration-provider' …
LennartSchmidtKern Jun 18, 2025
32e689f
add meta_data group
LennartSchmidtKern Jun 20, 2025
4cd9462
format
LennartSchmidtKern Jun 20, 2025
3009ba7
Merge branch 'access-management' of https://github.com/code-kern-ai/r…
LennartSchmidtKern Jun 20, 2025
926fc4c
model
LennartSchmidtKern Jun 20, 2025
06b4fbd
Merge branch 'access-management' of github.com:code-kern-ai/refinery-…
LennartSchmidtKern Jun 20, 2025
5bbdf87
sync action
LennartSchmidtKern Jun 20, 2025
9f61c3c
rename action
LennartSchmidtKern Jun 20, 2025
b9ad6b6
perf: dynamic 'by' record grouping
andhreljaKern Jun 20, 2025
2e769dd
style: arguments newlines in function definitions
andhreljaKern Jun 24, 2025
419811b
new getters
LennartSchmidtKern Jun 24, 2025
21a3234
Merge remote-tracking branch 'origin/cognition-integration-provider' …
LennartSchmidtKern Jun 24, 2025
a20ea01
delete groups
LennartSchmidtKern Jun 24, 2025
93bfb51
Merge branch 'access-management' of github.com:code-kern-ai/refinery-…
LennartSchmidtKern Jun 24, 2025
09b63b5
fix: rm get project by name and org id
andhreljaKern Jun 24, 2025
0822467
perf: pr review comments
andhreljaKern Jun 25, 2025
24954fd
Merge branch 'cognition-integration-provider' of https://github.com/c…
andhreljaKern Jun 25, 2025
39b710f
perf: pr review comments
andhreljaKern Jun 25, 2025
5ffe2bf
fix: paginated query
andhreljaKern Jun 25, 2025
be01221
perf: update integrations model
andhreljaKern Jun 25, 2025
4807248
Merge branch 'cognition-integration-provider' of https://github.com/c…
andhreljaKern Jun 25, 2025
bfe2e28
chore: merge dev into cognition-integration-provider
andhreljaKern Jun 25, 2025
c389dc6
perf: update to_snake_case regex compilation
andhreljaKern Jun 25, 2025
4243fd5
Merge branch 'cognition-integration-provider' of github.com:code-kern…
andhreljaKern Jun 25, 2025
44f58c3
chore: resolve merge conflict
andhreljaKern Jun 25, 2025
2c35db6
Merge branch 'cognition-integration-provider' of github.com:code-kern…
andhreljaKern Jun 25, 2025
0be3fc5
access management
LennartSchmidtKern Jun 25, 2025
f1b40eb
perf: db model update
andhreljaKern Jun 25, 2025
744ee36
perf: pr comments
andhreljaKern Jun 25, 2025
c1ae061
perf: remove unnecessary checks
andhreljaKern Jun 25, 2025
88caad0
fix: nullable error message
andhreljaKern Jun 25, 2025
4fa69bc
perf: pr comments
andhreljaKern Jun 25, 2025
f44b56d
merge
LennartSchmidtKern Jun 26, 2025
74008a9
delete
LennartSchmidtKern Jun 26, 2025
3020f36
perf: move IntegrationMetadata enum to integration_objects.helper
andhreljaKern Jun 26, 2025
cdb143e
Merge branch 'access-management' of github.com:code-kern-ai/refinery-…
andhreljaKern Jun 26, 2025
a0420da
chore: resolve merge conflicts
andhreljaKern Jun 26, 2025
2fe5e2d
Oidc field in the users table (#170)
lumburovskalina Jun 26, 2025
9a7026f
fix: metadata helper function
andhreljaKern Jun 26, 2025
2a4899f
perf: update integration task type name
andhreljaKern Jun 26, 2025
d57104e
style: formatting
andhreljaKern Jun 26, 2025
db484cf
perf: pr review comments
andhreljaKern Jun 27, 2025
42b09fd
perf: add REFINERY_ATTRIBUTE_ACCESS constants
andhreljaKern Jun 27, 2025
e0d1fb4
fix: query builder for record_ids
andhreljaKern Jun 27, 2025
fafeccd
perf: monitor.set_integration_task_to_failed with state
andhreljaKern Jun 30, 2025
90ac0d5
Merge branch 'cognition-integration-provider' of github.com:code-kern…
andhreljaKern Jun 30, 2025
535e992
perf: add early exit for deleted integrations
andhreljaKern Jul 1, 2025
ab273ca
perf: add early exit for task execution
andhreljaKern Jul 1, 2025
cf5cdd8
Merge branch 'cognition-integration-provider' of https://github.com/c…
andhreljaKern Jul 1, 2025
209fa94
sharepoint active queue
LennartSchmidtKern Jul 1, 2025
5004048
unique by name and integration
LennartSchmidtKern Jul 1, 2025
1ae14b5
typing
LennartSchmidtKern Jul 1, 2025
4d81dff
improve sql
LennartSchmidtKern Jul 1, 2025
91cdd20
merge
LennartSchmidtKern Jul 2, 2025
36a48c7
model
LennartSchmidtKern Jul 2, 2025
5ef1483
fix group
LennartSchmidtKern Jul 2, 2025
c9ea2e2
Merge branch 'dev' into cognition-integration-provider
andhreljaKern Jul 2, 2025
e38a007
remove unique
LennartSchmidtKern Jul 3, 2025
71c9603
perf: add file_properties integration column
andhreljaKern Jul 3, 2025
602baf9
perf: update default state for set_integration_task_to_failed
andhreljaKern Jul 3, 2025
6d5ed14
chore: resolve conflict
andhreljaKern Jul 3, 2025
06ffea3
chore: resolve conflicts
andhreljaKern Jul 3, 2025
06f7509
Adds option filter for pid
JWittmeyer Jul 3, 2025
9c2f2a1
PR comments
JWittmeyer Jul 3, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions business_objects/embedding.py
Original file line number Diff line number Diff line change
Expand Up @@ -321,8 +321,11 @@ def __build_payload_selector(
if (
data_type != enums.DataTypes.TEXT.value
and data_type != enums.DataTypes.LLM_RESPONSE.value
and data_type != enums.DataTypes.PERMISSION.value
):
payload_selector += f"'{attr}', (r.\"data\"->>'{attr}')::{data_type}"
if data_type == enums.DataTypes.PERMISSION.value:
payload_selector += f"'{attr}', r.\"data\"->'{attr}'"
else:
payload_selector += f"'{attr}', r.\"data\"->>'{attr}'"
payload_selector = f"json_build_object({payload_selector}) payload"
Expand Down
11 changes: 11 additions & 0 deletions business_objects/monitor.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
markdown_file as markdown_file_db_bo,
file_extraction as file_extraction_db_bo,
file_transformation as file_transformation_db_bo,
integration as integration_db_bo,
)

FILE_CACHING_IN_PROGRESS_STATES = [
Expand Down Expand Up @@ -197,6 +198,16 @@ def set_parse_cognition_file_task_to_failed(
general.commit()


def set_integration_task_to_failed(
integration_id: str,
with_commit: bool = False,
) -> None:
integration = integration_db_bo.get_by_id(integration_id)
if integration:
integration.state = enums.CognitionMarkdownFileState.FAILED.value
general.flush_or_commit(with_commit)


def __select_running_information_source_payloads(
project_id: Optional[str] = None,
only_running: bool = False,
Expand Down
38 changes: 34 additions & 4 deletions business_objects/project.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,7 @@

from .. import enums
from ..session import session
from ..models import (
Project,
Record,
)
from ..models import Project, Record, Attribute
from ..util import prevent_sql_injection

QUEUE_PROJECT_NAME = "@@HIDDEN_QUEUE_PROJECT@@"
Expand Down Expand Up @@ -155,6 +152,39 @@ def get_all(organization_id: str) -> List[Project]:
)


def get_all_with_access_management(organization_id: str) -> List[Project]:
return (
session.query(Project)
.join(Attribute, Project.id == Attribute.project_id)
.filter(
Project.organization_id == organization_id,
Attribute.name.in_(["__ACCESS_GROUPS", "__ACCESS_USERS"]), #
Attribute.name.in_(["__ACCESS_GROUPS", "__ACCESS_USERS"]),
Attribute.user_created == False,
Attribute.data_type == enums.DataTypes.PERMISSION.value,
Attribute.state == enums.AttributeState.AUTOMATICALLY_CREATED.value,
)
.distinct()
.all()
)


def check_access_management_active(project_id: str) -> bool:
return (
session.query(Project)
.join(Attribute, Project.id == Attribute.project_id)
.filter(
Project.id == project_id,
Attribute.name.in_(["__ACCESS_GROUPS", "__ACCESS_USERS"]),
Attribute.user_created == False,
Attribute.data_type == enums.DataTypes.PERMISSION.value,
Attribute.state == enums.AttributeState.AUTOMATICALLY_CREATED.value,
)
.count()
> 0
)


def get_all_by_user_organization_id(organization_id: str) -> List[Project]:
projects = (
session.query(Project).filter(Project.organization_id == organization_id).all()
Expand Down
39 changes: 37 additions & 2 deletions business_objects/record.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
from __future__ import with_statement
from typing import List, Dict, Any, Optional, Tuple, Iterable
from sqlalchemy import cast, Text
from sqlalchemy import cast, Text, String
from sqlalchemy.orm.attributes import flag_modified
from sqlalchemy.sql.expression import bindparam
from sqlalchemy import update
Expand Down Expand Up @@ -609,7 +609,7 @@ def count_missing_tokenized_records(project_id: str) -> int:
query = f"""
SELECT COUNT(*)
FROM (
{get_records_without_tokenization(project_id, None, query_only = True)}
{get_records_without_tokenization(project_id, None, query_only=True)}
) record_query
"""
return general.execute_first(query)[0]
Expand Down Expand Up @@ -807,6 +807,25 @@ def delete_user_created_attribute(
general.flush_or_commit(with_commit)


def delete_access_management_attributes(
project_id: str, with_commit: bool = True
) -> None:
access_groups_attribute_item = attribute.get_by_name(project_id, "__ACCESS_GROUPS")
access_users_attribute_item = attribute.get_by_name(project_id, "__ACCESS_USERS")

if access_users_attribute_item and access_groups_attribute_item:
record_items = get_all(project_id=project_id)
for i, record_item in enumerate(record_items):
if record_item.data.get(access_groups_attribute_item.name):
del record_item.data[access_groups_attribute_item.name]
if record_item.data.get(access_users_attribute_item.name):
del record_item.data[access_users_attribute_item.name]
flag_modified(record_item, "data")
if (i + 1) % 1000 == 0:
general.flush_or_commit(with_commit)
general.flush_or_commit(with_commit)


def delete_duplicated_rats(with_commit: bool = False) -> None:
# no project so run for all to prevent expensive join with record table
query = """
Expand Down Expand Up @@ -925,3 +944,19 @@ def get_first_no_text_column(project_id: str, record_id: str) -> str:
WHERE r.project_id = '{project_id}' AND r.id = '{record_id}'
"""
return general.execute_first(query)[0]


def get_record_ids_by_running_ids(project_id: str, running_ids: List[int]) -> List[str]:
return [
row[0]
for row in (
session.query(cast(Record.id, String))
.filter(
Record.project_id == project_id,
Record.data[attribute.get_running_id_name(project_id)]
.as_integer()
.in_(running_ids),
)
.all()
)
]
123 changes: 123 additions & 0 deletions cognition_objects/group.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
from datetime import datetime
from typing import List, Optional
from ..business_objects import general
from ..session import session
from ..models import CognitionGroup


def get(group_id: str) -> CognitionGroup:
return session.query(CognitionGroup).filter(CognitionGroup.id == group_id).first()


def get_with_organization_id(organization_id: str, group_id: str) -> CognitionGroup:
return (
session.query(CognitionGroup)
.filter(
CognitionGroup.organization_id == organization_id,
CognitionGroup.id == group_id,
)
.first()
)


def get_all(organization_id: str) -> List[CognitionGroup]:
return (
session.query(CognitionGroup)
.filter(CognitionGroup.organization_id == organization_id)
.order_by(CognitionGroup.name.asc())
.all()
)


def get_all_by_integration_id(
organization_id: str, integration_id: str
) -> List[CognitionGroup]:
integration_id_json = CognitionGroup.meta_data.op("->>")("integration_id")

return (
session.query(CognitionGroup)
.filter(
CognitionGroup.organization_id == organization_id,
integration_id_json == integration_id,
)
.order_by(CognitionGroup.name.asc())
.all()
)


def get_all_by_integration_id_permission_grouped(
organization_id: str, integration_id: str
) -> List[CognitionGroup]:
integration_id_json = CognitionGroup.meta_data.op("->>")("integration_id")

integration_groups = (
session.query(CognitionGroup)
.filter(
CognitionGroup.organization_id == organization_id,
integration_id_json == integration_id,
)
.all()
)
integration_groups_by_permission = {}
for group in integration_groups:
permission_id = group.meta_data.get("permission_id")
integration_groups_by_permission[permission_id] = group
return integration_groups_by_permission


def get_by_name(organization_id: str, name: str):
return (
session.query(CognitionGroup)
.filter(
CognitionGroup.organization_id == organization_id,
CognitionGroup.name == name,
)
.first()
)


def create_group(
organization_id: str,
name: str,
description: str,
created_by: str,
created_at: Optional[datetime] = None,
with_commit: bool = True,
meta_data: Optional[dict] = None,
) -> CognitionGroup:
group = CognitionGroup(
organization_id=organization_id,
name=name,
description=description,
created_by=created_by,
created_at=created_at,
meta_data=meta_data,
)
general.add(group, with_commit)
return group


def update_group(
group_id: str,
name: Optional[str] = None,
description: Optional[str] = None,
with_commit: bool = True,
meta_data: Optional[dict] = None,
) -> CognitionGroup:
group = get(group_id)

if name is not None:
group.name = name
if description is not None:
group.description = description
if meta_data is not None:
group.meta_data = meta_data
general.flush_or_commit(with_commit)

return group


def delete(organization_id: str, group_id: str, with_commit: bool = True) -> None:
group = get_with_organization_id(organization_id, group_id)
if group:
general.delete(group, with_commit)
103 changes: 103 additions & 0 deletions cognition_objects/group_member.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
from datetime import datetime
from typing import Optional
from ..business_objects import general, user
from . import group
from ..session import session
from ..models import CognitionGroupMember


def get(group_id: str, id: str):
return (
session.query(CognitionGroupMember)
.filter(
CognitionGroupMember.group_id == group_id, CognitionGroupMember.id == id
)
.first()
)


def get_by_group_and_user(group_id: str, user_id: str) -> CognitionGroupMember:
return (
session.query(CognitionGroupMember)
.filter(
CognitionGroupMember.group_id == group_id,
CognitionGroupMember.user_id == user_id,
)
.first()
)


def get_by_user_id(user_id: str) -> list:
return (
session.query(CognitionGroupMember)
.filter(CognitionGroupMember.user_id == user_id)
.all()
)


def get_all_by_group(group_id: str) -> list:
return (
session.query(CognitionGroupMember)
.filter(CognitionGroupMember.group_id == group_id)
.all()
)


def get_all_by_group_count(group_id: str) -> int:
return (
session.query(CognitionGroupMember)
.filter(CognitionGroupMember.group_id == group_id)
.count()
)


def create(
group_id: str,
user_id: str,
created_at: Optional[datetime] = None,
with_commit: bool = True,
) -> CognitionGroupMember:
already_exist = get_by_group_and_user(group_id=group_id, user_id=user_id)
if already_exist:
return already_exist

group_item = group.get(group_id)
user_item = user.get(user_id)
if not group_item or not user_item:
raise Exception("Group or user not found")
if group_item.organization_id != user_item.organization_id:
raise Exception("User not in the same organization as the group")

group_member = CognitionGroupMember(
group_id=group_id,
user_id=user_id,
created_at=created_at,
)
general.add(group_member, with_commit)
return group_member


def delete_by_group_and_user_id(
group_id: str, user_id: str, with_commit: bool = True
) -> None:
group_member = get_by_group_and_user(group_id, user_id)
if group_member:
general.delete(group_member, with_commit)


def delete_by_user_id(user_id: str, with_commit: bool = True) -> None:
group_memberships = (
session.query(CognitionGroupMember)
.filter(CognitionGroupMember.user_id == user_id)
.all()
)
for membership in group_memberships:
general.delete(membership, with_commit=False)
general.flush_or_commit(with_commit)


def clear_by_group_id(group_id: str, with_commit: bool = True) -> None:
group_memberships = get_all_by_group(group_id)
for membership in group_memberships:
general.delete(membership, with_commit=False)
general.flush_or_commit(with_commit)
Loading