-
Notifications
You must be signed in to change notification settings - Fork 74
Airbyte Integration Test Suite #1037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Airbyte Integration Test Suite #1037
Conversation
WalkthroughA new comprehensive Airbyte integration test suite has been introduced, including a dedicated environment configuration file and detailed documentation. The Changes
Sequence Diagram(s)sequenceDiagram
participant Tester
participant AirbyteAPI
Tester->>AirbyteAPI: Create Workspace
AirbyteAPI-->>Tester: Workspace ID
loop For each Destination
Tester->>AirbyteAPI: Create Destination
AirbyteAPI-->>Tester: Destination ID
end
loop For each Source
Tester->>AirbyteAPI: Create Source
AirbyteAPI-->>Tester: Source ID
end
loop For each Source-Destination Pair
Tester->>AirbyteAPI: Get Source Schema Catalog
AirbyteAPI-->>Tester: Catalog
Tester->>AirbyteAPI: Create Connection (Full Refresh Overwrite)
AirbyteAPI-->>Tester: Connection ID
end
loop For each Connection
Tester->>AirbyteAPI: Run Sync
AirbyteAPI-->>Tester: Sync Job Status (polling until complete)
Tester->>AirbyteAPI: Fetch Sync History and Logs
AirbyteAPI-->>Tester: Sync History/Logs
end
Tester->>AirbyteAPI: Delete Connections, Sources, Destinations, Workspace
AirbyteAPI-->>Tester: Confirmation
Assessment against linked issues
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (7)
.env.airbyte_integration_test (2)
4-8: Consider leaving the test-suite disabled by default in committed templatesCommitting the template with
AIRBYTE_TEST_ENABLED=falseis good; however, once a developer flips the flag totrueand forgets to revert it, the CI pipeline may accidentally create real resources on every run.
A safer pattern is to omit the variable (or comment it out) and have the test‑suite default tofalsewhen the variable is missing, forcing an explicit opt‑in per run.
19-25: Mask credentials & add explicit ‘_EXAMPLE’ suffixesEven though the values are local‑only, it is best to ship templates with obviously fake placeholders (e.g.
PG_PASSWORD=changeme) or suffix each key with_EXAMPLE.
That prevents casual copy‑paste of real secrets into VC and makes secret‑scanning tools happier.README.md (2)
248-252: Clarify comma‑separated values must be lowercaseThe test‑suite performs a
str.lower()comparison when matching connector names.
Mentioning thatAIRBYTE_TEST_SOURCES/DESTINATIONSshould be lowercase avoids mysterious “Skipping unknown …” log lines.
269-273: Add warning about production data safetyThe note is helpful, but emphasising that the tests use destructive operations (
overwritesync mode, workspace deletion) will prevent accidental runs against customer instances.
Suggest adding “DO NOT run against production workspaces” in bold.ddpui/tests/integration_tests/test_airbyte_full_integration.py (3)
169-188: Workspace leak on early failureIf
create_workspacesucceeds butsetup_classlater raises (e.g. during destination creation) the workspace id is stored only on the class, yetteardown_classwill still run. Good.
However, if the process dies (Ctrl‑C) before teardown, the workspace remains.
Add atry/finallyaround resource creation or tag the workspace name withCI_BUILD_IDso that a nightly cleanup job can purge leftovers.
372-390: Polling loop can exceed global timeoutYou sleep
AIRBYTE_SYNC_WAIT_INTERVAL(default 10 s) for up toAIRBYTE_MAX_WAIT_CYCLES(default 30) → 5 minutes max, regardless ofAIRBYTE_TEST_TIMEOUT.
Either honourAIRBYTE_TEST_TIMEOUTor document the precedence.
429-430: Do not execute tests on importThe
__main__section runspytestrecursively, which is surprising behaviour if someone imports the module in another script.
Consider removing it; developers can still runpytest -k airbyte_full_integration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.env.airbyte_integration_test(1 hunks)README.md(1 hunks)ddpui/tests/integration_tests/test_airbyte_full_integration.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (2)
ddpui/ddpairbyte/airbyte_service.py (12)
create_workspace(156-168)delete_workspace(171-174)create_destination(588-611)create_source(331-366)get_source_definitions(199-221)get_destination_definitions(501-510)sync_connection(920-929)get_jobs_for_connection(950-976)get_job_info(932-938)get_logs_for_job(1009-1028)delete_destination(580-585)get_source_schema_catalog(459-498)ddpui/ddpairbyte/schema.py (1)
AirbyteConnectionCreate(59-66)
🪛 Ruff (0.8.2)
ddpui/tests/integration_tests/test_airbyte_full_integration.py
20-20: typing.List imported but unused
Remove unused import
(F401)
20-20: typing.Dict imported but unused
Remove unused import
(F401)
20-20: typing.Optional imported but unused
Remove unused import
(F401)
🔇 Additional comments (2)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (2)
141-150: Hard‑coded/healthendpoint may be wrong for Airbyte > 0.50Recent Airbyte versions expose
/health/(note the trailing slash) and return JSON with"available":true.
Consider checking both variants or parsing the JSON for backward compatibility.
282-321:get_source_definitionsmay return dict, code expects list
get_source_definitionsreturns a dict with key"sourceDefinitions".
You buildsource_def_mapcorrectly from it, but later when skipping you rely onsource_def_key in keywhich is lenient.
The bigger issue appears in test 05 (see below).
.env.airbyte_integration_test
Outdated
| AIRBYTE_TEST_S3_BUCKET=test-bucket | ||
| AIRBYTE_TEST_S3_PATH=test-path | ||
| AIRBYTE_TEST_S3_REGION=us-east-1 | ||
| AIRBYTE_TEST_S3_ACCESS_KEY= | ||
| AIRBYTE_TEST_S3_SECRET_KEY= No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing space may break some dotenv loaders
Line 48 ends with a space after the = which is preserved by some implementations and leads to authentication failures.
Remove trailing whitespace from all keys that expect secret values.
-AIRBYTE_TEST_S3_SECRET_KEY=
+AIRBYTE_TEST_S3_SECRET_KEY=|
incorporated coderabbit feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (2)
385-395: Correct Airbyte catalog payload – good job!The re‑written list‐comprehension now meets the API contract by embedding
streamandconfigkeys. This addresses the earlier review feedback.
480-481: Correct argument order forget_logs_for_job– fixedUsing
0as the attempt index matches the helper’s signature and resolves the earlier bug.
🧹 Nitpick comments (4)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (4)
54-59: Strip whitespace when parsing env‐driven connector lists
split(",")keeps surrounding whitespace, so values like"postgres, s3"will contain" s3"and silently skip the matching definition a few lines later.
A smallstrip()makes the test suite less brittle to human editing errors.-AIRBYTE_TEST_DESTINATIONS = os.getenv("AIRBYTE_TEST_DESTINATIONS", "postgres").split(",") -AIRBYTE_TEST_SOURCES = os.getenv("AIRBYTE_TEST_SOURCES", "file").split(",") +AIRBYTE_TEST_DESTINATIONS = [d.strip() for d in os.getenv("AIRBYTE_TEST_DESTINATIONS", "postgres").split(",") if d.strip()] +AIRBYTE_TEST_SOURCES = [s.strip() for s in os.getenv("AIRBYTE_TEST_SOURCES", "file").split(",") if s.strip()]
63-68:MAX_SYNC_WAIT_TIMEbecomes stale after you shrinkAIRBYTE_MAX_WAIT_CYCLESYou recalculate
AIRBYTE_MAX_WAIT_CYCLESto honour the global timeout but leave the previously computedMAX_SYNC_WAIT_TIMEunchanged.
Re‑compute it (or avoid the extra variable altogether) so the later while‑loop reflects the corrected limit.- AIRBYTE_MAX_WAIT_CYCLES = AIRBYTE_TEST_TIMEOUT // AIRBYTE_SYNC_WAIT_INTERVAL - logger.warning( - f"Adjusted max wait cycles to {AIRBYTE_MAX_WAIT_CYCLES} to respect timeout of {AIRBYTE_TEST_TIMEOUT}s" - ) + AIRBYTE_MAX_WAIT_CYCLES = AIRBYTE_TEST_TIMEOUT // AIRBYTE_SYNC_WAIT_INTERVAL + MAX_SYNC_WAIT_TIME = AIRBYTE_SYNC_WAIT_INTERVAL * AIRBYTE_MAX_WAIT_CYCLES + logger.warning( + "Adjusted max wait cycles to %s to respect timeout of %ss", + AIRBYTE_MAX_WAIT_CYCLES, + AIRBYTE_TEST_TIMEOUT, + )
208-231:try/finallycomment is misleading – and cleanup can be simplerThe outer
tryadvertises afinallythat does not exist, which is confusing to future maintainers.
You can both clarify intent and shrink the code by usingcontextlib.suppressfor the best‑effortdelete_workspacein the error path.- try: - # Use try/finally to ensure cleanup even if setup fails partially - try: - workspace_result = create_workspace(workspace_name) - cls.workspace_id = workspace_result["workspaceId"] - logger.info(f"Created workspace with ID: {cls.workspace_id}") - except HttpError as e: - logger.error(f"Failed to create workspace: {str(e)}") - pytest.skip(f"Could not create workspace: {str(e)}") - return - except Exception as e: - logger.error(f"Failed to create workspace: {str(e)}") - pytest.skip(f"Unexpected error: {str(e)}") - return - except Exception as e: - # This catch-all ensures we don't leak workspaces even on unexpected errors - logger.error(f"Error during setup: {str(e)}") - # Try to clean up if we did create a workspace - if cls.workspace_id: - try: - delete_workspace(cls.workspace_id) - except Exception: - pass - raise + try: + workspace_result = create_workspace(workspace_name) + cls.workspace_id = workspace_result["workspaceId"] + logger.info("Created workspace with ID: %s", cls.workspace_id) + except HttpError as e: + logger.error("Failed to create workspace: %s", e) + pytest.skip(f"Could not create workspace: {e}") + except Exception as e: + logger.error("Unexpected error creating workspace: %s", e) + with contextlib.suppress(Exception): + if cls.workspace_id: + delete_workspace(cls.workspace_id) + raiseRemember to
import contextlibat the top.🧰 Tools
🪛 Ruff (0.8.2)
227-230: Use
contextlib.suppress(Exception)instead oftry-except-passReplace with
contextlib.suppress(Exception)(SIM105)
282-287:dest_typecomparison could be case/space robustMinor, but normalising both sides (
lower().strip()) avoids surprises from environment variables such as" Postgres ".No diff provided – applies equally to the source loop below.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.env.airbyte_integration_test(1 hunks)README.md(1 hunks)ddpui/tests/integration_tests/test_airbyte_full_integration.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- README.md
- .env.airbyte_integration_test
🧰 Additional context used
🧬 Code Graph Analysis (1)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (3)
ddpui/ddpairbyte/airbyte_service.py (12)
create_workspace(156-168)delete_workspace(171-174)create_destination(588-611)create_source(331-366)get_source_definitions(199-221)get_destination_definitions(501-510)sync_connection(920-929)get_jobs_for_connection(950-976)get_job_info(932-938)get_logs_for_job(1009-1028)delete_destination(580-585)get_source_schema_catalog(459-498)ddpui/ddpairbyte/schema.py (1)
AirbyteConnectionCreate(59-66)ddpui/utils/custom_logger.py (3)
warning(49-53)info(25-29)error(31-35)
🪛 Ruff (0.8.2)
ddpui/tests/integration_tests/test_airbyte_full_integration.py
227-230: Use contextlib.suppress(Exception) instead of try-except-pass
Replace with contextlib.suppress(Exception)
(SIM105)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (2)
380-380: Move import statement to top of fileThis import statement should be moved to the top of the file with the other imports rather than inside the function body.
- from ddpui.ddpairbyte.airbyte_service import get_source_schema_catalogAdd this to the imports at the top of the file (around line 44):
+ get_source_schema_catalog,
416-454: Consider implementing exponential backoff for sync status pollingThe current implementation uses fixed interval polling which works but may not be optimal for varying job durations. Consider implementing exponential backoff to reduce API calls while improving responsiveness for short-running jobs.
- while (status not in ["succeeded", "failed", "cancelled"] and - wait_cycles < AIRBYTE_MAX_WAIT_CYCLES and - (time.time() - start_time) < AIRBYTE_TEST_TIMEOUT): - time.sleep(AIRBYTE_SYNC_WAIT_INTERVAL) + wait_interval = AIRBYTE_SYNC_WAIT_INTERVAL + while (status not in ["succeeded", "failed", "cancelled"] and + wait_cycles < AIRBYTE_MAX_WAIT_CYCLES and + (time.time() - start_time) < AIRBYTE_TEST_TIMEOUT): + time.sleep(wait_interval) + # Apply exponential backoff with a cap + wait_interval = min(wait_interval * 1.5, 60)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ddpui/tests/integration_tests/test_airbyte_full_integration.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (3)
ddpui/ddpairbyte/airbyte_service.py (12)
create_workspace(156-168)delete_workspace(171-174)create_destination(588-611)create_source(331-366)get_source_definitions(199-221)get_destination_definitions(501-510)sync_connection(920-929)get_jobs_for_connection(950-976)get_job_info(932-938)get_logs_for_job(1009-1028)delete_destination(580-585)get_source_schema_catalog(459-498)ddpui/ddpairbyte/schema.py (1)
AirbyteConnectionCreate(59-66)ddpui/utils/custom_logger.py (3)
warning(49-53)info(25-29)error(31-35)
🔇 Additional comments (6)
ddpui/tests/integration_tests/test_airbyte_full_integration.py (6)
154-184: Well-implemented server availability check with version compatibilityThe
check_airbyte_available()function intelligently handles both older and newer Airbyte API versions by trying endpoints with and without trailing slashes. This future-proofs the tests and makes them more robust.
199-226: Good error handling and resource management in setupThe
setup_classimplementation properly:
- Checks Airbyte availability before attempting setup
- Creates a unique workspace name with timestamp
- Implements proper error handling with graceful skipping
- Attempts cleanup when unexpected errors occur
This ensures tests don't fail with cryptic errors when prerequisites aren't met.
227-264: Comprehensive teardown with robust error handlingThe teardown process properly cleans up all resources in the correct order (connections, sources, destinations, workspace) and uses
contextlib.suppresswith detailed warning logs to continue cleanup even if individual deletions fail.
368-415: Well-structured connection creation with proper API payload formattingThe connection creation uses the proper Airbyte API structure for stream configuration, correctly setting up sync modes. This implementation resolves a previous issue where stream configurations were improperly structured.
54-73: Good environment variable handling with validationThe configuration section properly:
- Parses boolean flags with lowercasing
- Strips whitespace from lists for robustness
- Validates and adjusts wait cycles against timeout
- Provides clear logging when adjustments are made
This ensures configuration errors are caught early with helpful feedback.
455-484: Fixed get_logs_for_job parameter usageThe implementation correctly uses index 0 for the attempt number parameter to
get_logs_for_job(), fixing a previous issue where an attempt ID was incorrectly passed.
|
@Ishankoradia kindly review and let me know. |
|
@fatchat @Ishankoradia reminder! |
|
Hi @SarthakB11 this looks like a solid first attempt. Have you been able to run the test suite ? Could you share your findings. |
|
@Ishankoradia are we merging or closing |
|
@Ishankoradia hi, could you guide me in running the test suite? That would be helpful! |
Overview
Implementation of comprehensive Airbyte integration test suite (Issue #1014) that enables end-to-end testing of Airbyte operations within Dalgo, providing confidence during upgrades and maintenance.
Changes Made i have made
Files Added/Modified:
ddpui/tests/integration_tests/test_airbyte_full_integration.py(new).env.airbyte_integration_test(new)README.md(modified)Key Features
Testing Instructions
Local testing:
Production testing:
.envAIRBYTE_TEST_ENABLED=trueIndividual tests:
The test suite is ready for CI/CD integration or manual runs during Airbyte upgrades.
Closes #1014
Summary by CodeRabbit
New Features
Documentation