feat(docs): Adding models + APIs for context base V1 #15191
base: master
abedatahub left a comment
Looks good to land; a few minor comments.
/**
 * Information about the external source of this document.
 * Only populated for third-party documents ingested from external systems.
 * If null, the document is first-party (created directly in DataHub).
If this is the convention, then you can remove the sourceType field in DocumentSource.
/**
 * Returns true if the current user is able to create Knowledge Articles. This is true if the user
 * has the 'Create Entity' privilege for Knowledge Articles or 'Manage Knowledge Articles'
Are there plans for fine-grained (per-document view, create-child, edit) privileges?
  return true;
} catch (Exception e) {
  log.error(
No need to log here since the caller can log the exception which has all the needed debugging info.
implements DataFetcher<CompletableFuture<List<DocumentChange>>> {

private final TimelineService _timelineService;
private static final long DEFAULT_LOOKBACK_MILLIS = 30L * 24 * 60 * 60 * 1000; // 30 days
TimeUnit.DAYS.toMillis(30)
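The suggestion above could look roughly like the sketch below. It is an illustration only, not the PR's code: the class name `LookbackDefaults` and the `HAND_ROLLED_MILLIS` constant are hypothetical, and only `DEFAULT_LOOKBACK_MILLIS` appears in the diff.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical holder class; only DEFAULT_LOOKBACK_MILLIS comes from the diff under review.
final class LookbackDefaults {
  // The hand-written arithmetic from the diff, kept here only for comparison.
  static final long HAND_ROLLED_MILLIS = 30L * 24 * 60 * 60 * 1000;

  // The reviewer's suggestion: let TimeUnit perform the conversion.
  static final long DEFAULT_LOOKBACK_MILLIS = TimeUnit.DAYS.toMillis(30);

  public static void main(String[] args) {
    // Both forms yield the same number of milliseconds.
    System.out.println(DEFAULT_LOOKBACK_MILLIS == HAND_ROLLED_MILLIS); // prints "true"
  }
}
```

The `TimeUnit` form states the unit and the amount directly, so the trailing `// 30 days` comment is no longer needed to explain the constant.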
long endTime = endTimeMillis != null ? endTimeMillis : System.currentTimeMillis();
long startTime =
    startTimeMillis != null ? startTimeMillis : (endTime - DEFAULT_LOOKBACK_MILLIS);
int maxResults = limit != null ? limit : DEFAULT_LIMIT;
You should use the java.time APIs to do time math.
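One way to read this suggestion is sketched below, assuming the resolver keeps receiving nullable epoch-millisecond arguments. The `TimeWindow` class, its `of` method, and the explicit `now` parameter are hypothetical names introduced for illustration; the parameter names `startTimeMillis` and `endTimeMillis` mirror the diff excerpt.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of the PR.
final class TimeWindow {
  static final Duration DEFAULT_LOOKBACK = Duration.ofDays(30);

  final Instant start;
  final Instant end;

  private TimeWindow(Instant start, Instant end) {
    this.start = start;
    this.end = end;
  }

  // A missing end defaults to "now"; a missing start defaults to 30 days before the end.
  // Passing "now" in explicitly keeps the method deterministic under test.
  static TimeWindow of(Long startTimeMillis, Long endTimeMillis, Instant now) {
    Instant end = endTimeMillis != null ? Instant.ofEpochMilli(endTimeMillis) : now;
    Instant start =
        startTimeMillis != null
            ? Instant.ofEpochMilli(startTimeMillis)
            : end.minus(DEFAULT_LOOKBACK);
    return new TimeWindow(start, end);
  }

  public static void main(String[] args) {
    TimeWindow window = TimeWindow.of(null, null, Instant.now());
    System.out.println(window.start + " .. " + window.end);
  }
}
```

If downstream code still needs primitive longs, `window.start.toEpochMilli()` converts back at the boundary.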
// Batch ingest all proposals
entityClient.batchIngestProposals(opContext, mcps, false);

log.info("Updated contents for document {}", documentUrn);
This looks more like a debug log.
log.error(
    "Failed to clear entity references for Document with URN {}: {}",
    documentUrn,
    e.getMessage());
I think we want the stack trace?
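The difference the reviewer is pointing at can be demonstrated with a self-contained sketch. It uses `java.util.logging` as a stand-in so it runs without dependencies; the diff's `log.error(...)` calls with `{}` placeholders look like SLF4J, where the usual equivalent is passing the exception as the final argument instead of `e.getMessage()`. The class, handler, and logger names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Stand-in sketch using java.util.logging; not the logging API used in the PR.
final class StackTraceLoggingSketch {
  // Collects records in memory so the two logging styles can be compared.
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  static List<LogRecord> logBothWays(Exception e) {
    Logger log = Logger.getLogger("sketch.document.logging"); // hypothetical logger name
    log.setUseParentHandlers(false);
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);
    try {
      // Message only: the stack trace is lost.
      log.log(Level.SEVERE, "Failed to clear entity references: " + e.getMessage());
      // Throwable attached: the record carries the full exception.
      log.log(Level.SEVERE, "Failed to clear entity references", e);
    } finally {
      log.removeHandler(handler);
    }
    return handler.records;
  }

  public static void main(String[] args) {
    List<LogRecord> records = logBothWays(new IllegalStateException("boom"));
    System.out.println("first record has throwable: " + (records.get(0).getThrown() != null));
    System.out.println("second record has throwable: " + (records.get(1).getThrown() != null));
  }
}
```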
  }
} catch (Exception e) {
  // If we can't get parent info, assume no cycle for safety
  log.warn("Failed to check parent info for {}: {}", currentParent, e.getMessage());
Log the exception object
infoProposal.setAspect(GenericRecordUtils.serializeAspect(publishedInfo));
entityClient.ingestProposal(opContext, infoProposal, false);

log.info("Merged draft {} into published document {}", draftUrn, publishedUrn);
This and others look like they should be debug level, and we can turn up log levels for specific modules at runtime.
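The per-module idea can be illustrated with a small sketch. It uses `java.util.logging` purely as a stand-in, since the excerpts do not show which logging backend DataHub configures; the logger names are hypothetical, and `Level.FINE` plays the role of a debug level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in sketch: raising verbosity for one module's logger without touching others.
final class LogLevelSketch {
  // Hypothetical logger names. Static fields hold strong references so the
  // configured levels are not lost to garbage collection.
  static final Logger DOCUMENT_LOG = Logger.getLogger("sketch.document");
  static final Logger OTHER_LOG = Logger.getLogger("sketch.other");

  static {
    // Explicit baseline so the sketch does not depend on the JVM's logging config.
    DOCUMENT_LOG.setLevel(Level.INFO);
    OTHER_LOG.setLevel(Level.INFO);
  }

  // Turn on debug-level output for the document module only.
  static void enableDebugForDocuments() {
    DOCUMENT_LOG.setLevel(Level.FINE);
  }

  public static void main(String[] args) {
    enableDebugForDocuments();
    System.out.println("document debug enabled: " + DOCUMENT_LOG.isLoggable(Level.FINE));
    System.out.println("other debug enabled: " + OTHER_LOG.isLoggable(Level.FINE));
  }
}
```

With this kind of switch available, routine success messages can sit at debug level and still be surfaced for one module when someone is investigating an issue.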
Introducing Documents in DataHub (Context)
This PR introduces a new Document entity to DataHub, enabling users to create, manage, and organize first-party knowledge base content directly within the platform. Documents can be hierarchically organized, linked to data assets, and managed through a complete lifecycle including draft/publish workflows.
Core Data Models
Introduces comprehensive metadata models for the Document entity in DataHub:
Entity Definition
- document entity with key aspect documentKey and search capabilities
Core Aspects (PDL Models)
- DocumentKey - Unique identifier for documents
- DocumentInfo - Primary aspect containing:
  - draftOf field
- DocumentContents - Text content storage
- DocumentStatus & DocumentState - Publication state management
- DocumentSource - Tracking external sources for third-party integrations
- ParentDocument, RelatedAsset, RelatedDocument - Relationship models
- DraftOf - Draft-to-published document linking
GraphQL APIs
Comprehensive GraphQL API surface in knowledge.graphql:
Mutations
- createDocument - Create new documents with content, relationships, and hierarchy
- updateDocumentContents - Update document text and title
- updateDocumentRelatedEntities - Manage relationships to assets and other documents
- moveDocument - Relocate documents within the hierarchy
- deleteDocument - Remove documents and their references
- updateDocumentStatus - Toggle between PUBLISHED/UNPUBLISHED states
- mergeDraft - Merge draft content into published document with optional draft deletion
Queries
- document(urn) - Fetch document by URN with full metadata
- searchDocuments - Hybrid semantic search with rich filtering:
Special Features
- drafts field - Lists all draft versions of a published document
- changeHistory field - Chronological audit log of document modifications with support for: Content changes, Parent changes (moves), Relationship changes, State changes, etc.
Authorization & Privileges
New Platform Privilege
- MANAGE_DOCUMENTS - Platform-level privilege for managing all documents
Entity-Level Privileges
Documents support standard DataHub entity privileges:
- VIEW_ENTITY_PAGE / GET_ENTITY - View document
- EDIT_ENTITY_DOCS / EDIT_ENTITY - Edit document content
- CREATE_ENTITY - Create documents
- EDIT_ENTITY_OWNERS - Manage ownership
- EDIT_ENTITY_DOMAINS - Assign domains
- SHARE_ENTITY - Share documents
- EDIT_ENTITY_PROPERTIES - Edit structured properties
Authorization Logic
- canCreateDocument() - Requires CREATE_ENTITY for documents or MANAGE_DOCUMENTS
- canEditDocument() - Requires EDIT_ENTITY_DOCS, EDIT_ENTITY, or MANAGE_DOCUMENTS
- canGetDocument() - Requires VIEW_ENTITY_PAGE or MANAGE_DOCUMENTS
- canDeleteDocument() - Requires delete authorization or MANAGE_DOCUMENTS
Backend Services
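The rules listed under Authorization Logic above amount to "any one of the accepted privileges is enough". A minimal sketch of that reading follows; it is not the PR's implementation. The `DocumentAuthSketch` class, the `Privilege` enum, and the set-based signatures are hypothetical, and `DELETE` is a placeholder for whatever "delete authorization" maps to, which the description does not name.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the "any of these privileges" rules; not DataHub's actual API.
final class DocumentAuthSketch {
  enum Privilege {
    CREATE_ENTITY,
    EDIT_ENTITY_DOCS,
    EDIT_ENTITY,
    VIEW_ENTITY_PAGE,
    DELETE, // placeholder: the description only says "delete authorization"
    MANAGE_DOCUMENTS
  }

  // True if the granted set contains at least one accepted privilege.
  private static boolean hasAny(Set<Privilege> granted, Privilege... accepted) {
    for (Privilege p : accepted) {
      if (granted.contains(p)) {
        return true;
      }
    }
    return false;
  }

  static boolean canCreateDocument(Set<Privilege> granted) {
    return hasAny(granted, Privilege.CREATE_ENTITY, Privilege.MANAGE_DOCUMENTS);
  }

  static boolean canEditDocument(Set<Privilege> granted) {
    return hasAny(
        granted, Privilege.EDIT_ENTITY_DOCS, Privilege.EDIT_ENTITY, Privilege.MANAGE_DOCUMENTS);
  }

  static boolean canGetDocument(Set<Privilege> granted) {
    return hasAny(granted, Privilege.VIEW_ENTITY_PAGE, Privilege.MANAGE_DOCUMENTS);
  }

  static boolean canDeleteDocument(Set<Privilege> granted) {
    return hasAny(granted, Privilege.DELETE, Privilege.MANAGE_DOCUMENTS);
  }

  public static void main(String[] args) {
    Set<Privilege> viewer = EnumSet.of(Privilege.VIEW_ENTITY_PAGE);
    System.out.println("viewer can read: " + canGetDocument(viewer));
    System.out.println("viewer can edit: " + canEditDocument(viewer));
  }
}
```

Under this reading, MANAGE_DOCUMENTS acts as an override accepted by every check, which matches its description as a platform-level privilege for managing all documents.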
DocumentService
Complete service layer implementation in metadata-service/services:
Timeline Support
- DocumentInfoChangeEventGenerator - Generates change events for audit history
Factory Beans
- DocumentServiceFactory - Spring factory for service instantiation
Test Coverage
Smoke Tests
- document_test.py (410 lines) - End-to-end document lifecycle tests
- document_draft_test.py (326 lines) - Draft creation, merging, and workflows
- document_change_history_test.py (281 lines) - Timeline and change tracking
Unit Tests
- DocumentServiceTest.java (486 lines) - Service layer business logic
- DocumentMapperTest.java - Type mapping validation
- DocumentInfoChangeEventGeneratorTest.java - Timeline event generation
Key Features & Use Cases
This PR lays the foundation for DataHub to become a central knowledge hub, combining first-party documentation with data asset management in a unified platform.
Coming in a followup PR:
Status
Ready for review.