basic files API #9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,8 +99,8 @@ EX: | |
| "id": 997, | ||
| "acronym": "root", | ||
| "name": "root", | ||
| "color_hex_triplet": "FFFFFF", # <- if this is used? | ||
| "parent_structure_id": null, # | ||
| "color_hex_triplet": "FFFFFF", | ||
| "parent_structure_id": null, # shouldn't be necessary, might be removed | ||
| "children": [ | ||
| { "id": 997, "acronym": "root", "name": "root", ..., "children": []}, | ||
| ... | ||
|
|
@@ -115,17 +115,70 @@ GET /brain-regions/{id}: could do this, but not sure it's useful: the full | |
| Future work would include using ltree from postgresql to make doing lookups and such easier: https://www.postgresql.org/docs/current/ltree.html | ||
|
|
||
|
|
||
| To be looked at more: | ||
| ``` | ||
| files/ | ||
| experimental-data/_count | ||
| model-data/_count | ||
| ``` | ||
|
|
||
| # Authorization: | ||
| Current model is to have things be either public, or private to a lab/project. | ||
| As such, results returned will be gated by this, based on the logged in user. | ||
| The frontend will have to supply the current user's Bearer token, as well as the current lab and project. | ||
| The service will check that the user does indeed belong to this lab and project, and then filter the results to include only public ones, along with those in the lab and project. | ||
| These will have to be passed as headers in the request. | ||
| The service will check that the user does indeed belong to this lab and project, and then filter the results to include only public ones, along with those in the lab and project. | ||
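A minimal sketch of the gating described above, assuming the Bearer token has already been resolved into the set of lab/project pairs the user belongs to. `Record`, `RequestContext`, and `visible_records` are hypothetical names for illustration and are not part of the proposed service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for any stored entity."""
    name: str
    public: bool
    lab_id: Optional[str] = None
    project_id: Optional[str] = None


@dataclass(frozen=True)
class RequestContext:
    """Assumed result of resolving the Bearer token plus the lab/project headers."""
    user_memberships: frozenset  # (lab_id, project_id) pairs the user belongs to
    lab_id: str
    project_id: str


def visible_records(records, ctx):
    # Step 1: reject the request if the user is not in the claimed lab/project.
    if (ctx.lab_id, ctx.project_id) not in ctx.user_memberships:
        raise PermissionError("user does not belong to the requested lab/project")
    # Step 2: keep public records plus those private to this lab and project.
    return [
        r for r in records
        if r.public or (r.lab_id == ctx.lab_id and r.project_id == ctx.project_id)
    ]
```

In a real service the second step would more likely be a database filter than an in-memory list comprehension; the sketch only shows the rule being applied.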
|
|
||
|
|
||
| # Files: | ||
|
|
||
| For files, there are two main use cases: | ||
| 1) Downloading them via a web API | ||
| 2) Mounting them in some fashion on the cluster | ||
|
|
||
| The former is likely to be smaller files (such as morphologies, ephys traces, images and reports \[ie: pdf\], etc). | ||
| The latter covers things like circuit data, which can be large and may be a whole directory rather than individual files. | ||
|
|
||
| ## Semantics | ||
|
|
||
| * An entity (eg: morphology) can be associated with multiple files | ||
| * A file is associated with only one Entity; this also means that a `Folder` owns all of its contents, and no other Entity points to anything within the folder | ||
| * A file has a type associated with it; to a first approximation these are MIME types, however there are many vendor-specific ones that will have to be handled | ||
| * Once registered, a file is immutable; it cannot be changed or replaced | ||
|
Collaborator
Author
I think this might turn into the classic problem of people versioning things by attaching multiple files to an entity, and naming them things like
Contributor
We can always opt to enforce naming on our side, something like
Collaborator
Author
Perhaps, I'm not sure yet how much we should be limiting what files should be associated with an entity, or if we need to have a schema for them, too. I'm not sure we know all the use cases yet, other than "attach some files". That being said, I can already see the issue with immutability, and people doing "revision" management through file names, which would be a big problem, imo, since then it's impossible to know, without human effort, which file is "right".
Contributor
I think I am ok with having versions at the file level and not at the entity level.
Collaborator
Author
For immutability: how do we deal with deletion? What if people want to delete a large file? We have also considered the deprecation of entities, along with how that would impact files.
Contributor
I would probably keep the asset entry but remove the underlying file. A flag on the asset can say that it is removed. |
||
| * A file inherits its Authorization from the entity to which it is associated | ||
|
|
||
|
Contributor
I don't know where to put that comment. I understand an entity can have several assets, with the asset pointing to the entity. However, we don't put any information about the relationship between the asset and the entity.
Contributor
Another question. Say a morphology has an asset "neurolucida" file. Later, I create a "swc" that derives from the neurolucida file. If I put the swc as another asset of that same morphology, I have a problem: The current knowledge graph is not addressing any of these 2 issues.
Contributor
Maybe what that means is that the provenance model should consider assets and not entities?
|
||
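As a rough illustration of the semantics above (several files per entity, exactly one owning entity per file, no change after registration), here is a minimal in-memory sketch. `Asset` and `AssetRegistry` are hypothetical names, and keying files by path is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical file record; frozen so a registered file cannot be altered."""
    entity_id: int     # exactly one owning entity
    path: str
    content_type: str  # MIME-like type string
    size: int          # in bytes
    sha1: str


class AssetRegistry:
    def __init__(self):
        self._by_path = {}

    def register(self, asset):
        # Immutable once registered: a path cannot be re-registered or replaced,
        # which also keeps each file tied to a single entity.
        if asset.path in self._by_path:
            raise ValueError(f"{asset.path} is already registered")
        self._by_path[asset.path] = asset
        return asset

    def assets_for(self, entity_id):
        # An entity may be associated with multiple files.
        return [a for a in self._by_path.values() if a.entity_id == entity_id]
```

Deletion and versioning, which the review comments raise, are deliberately left out of this sketch.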
| ## API | ||
| * When getting an entity, the `?expand-files=True` query parameter can be used. | ||
|
Contributor
minor suggestion: using |
||
| This will add the following payload to the response: | ||
| ``` | ||
| {"filename.ext": { | ||
| "url": | ||
| "type": "aMimetype/like.string", | ||
| "size": 314159, # in bytes | ||
| "sha1": "c7102fb8700511782990db9ac149b28cb76c79f0", | ||
| }, | ||
| "filename.swc": { | ||
| "url": | ||
| "type": "application/swc", # or should it be application/octet-stream or `application/vnd.swc` | ||
| "size": 271828, # in bytes | ||
| "sha1": "571a7182818284590450db9ac718281828459045", | ||
| }, | ||
| "foldername": { | ||
| "url": "/some/path/on/cluster", | ||
| "type": "application/x-directory", | ||
| "size": 0, # Not applicable | ||
| "sha1": "0", # Not applicable | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| * The `url` supplied with the result can be used to download the file | ||
| * For folder types there is no way to list or download the contents | ||
|
Collaborator
Author
I suspect this will have to change; but for now it's the easiest to implement
Contributor
IMHO: the S3 URL should be private for security reasons, and the URL that we send in the response should be constructed by us as (
Collaborator
Author
yeah, I left the url unspecified as that's an implementation detail; it will likely be an endpoint that entitycore offers, or perhaps its nginx proxy; we will have to see |
||
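One way a client might use the `size` and `sha1` fields from the payload above is to check a download after fetching it from `url`. This is only a sketch: `verify_download` is a hypothetical helper, and it assumes `sha1` is the hex digest of the file contents.

```python
import hashlib


def verify_download(data: bytes, entry: dict) -> bool:
    """Check downloaded bytes against the `size` and `sha1` of one payload entry."""
    if entry["type"] == "application/x-directory":
        # Folder entries carry placeholder size/sha1 values and cannot be downloaded.
        raise ValueError("folder entries cannot be verified")
    if len(data) != entry["size"]:
        return False
    return hashlib.sha1(data).hexdigest() == entry["sha1"]
```

Checking the size first is a cheap way to reject truncated downloads before hashing.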
|
|
||
| ## Staging | ||
| * On the cluster, a folder is automatically staged, as its path should be accessible. | ||
| * For files; either the file is pulled down through the API, via the `url` property, or there will need to be a way to map | ||
|
|
||
| ## Configurations | ||
|
|
||
| The frontend currently stores `configuration` data (ex: `cell-composition`, `connectome-model`) in NEXUS. | ||
|
Collaborator
Author
This system will have to be updated for |
||
|
|
||
|
|
||
| # To be looked at more: | ||
| ``` | ||
| experimental-data/_count | ||
| model-data/_count | ||
| ``` | ||
If we support versioning of an entity, and we change only the metadata, then different versions of the same entity will point to the same file or directory (unless the files are copied, but it's not optimal for storage costs and management).
Yeah, that cries out for some extra layer of indirection to do deduplication, with all the complexity that comes with it.
The question then becomes: do we support versioning, how do we support it, and what are the semantics of it with respect to files? I think this is a can of worms, but it definitely should be part of this document.
For comparison or inspiration this is the dandi openapi for versioning datasets: https://api.dandiarchive.org/swagger/
For reference my short notes about the overall service are at https://github.com/openbraininstitute/prod-contribute-fix-data/issues/7
I suggest we don't support versioning of entities, but we do support versioning of assets: