basic files API #9
base: main
* The `url` supplied with the result can be used to download the file.
* For folder types there is no way to list or download the contents.
I suspect this will have to change, but for now it's the easiest to implement.
IMHO: the S3 URL should be private for security reasons, and the URL that we send in the response should be constructed by us, e.g. `GET https://{root_url}/distributions/{id}/download`.
Yeah, I left the URL unspecified since that's an implementation detail; it will likely be an endpoint that entitycore offers, or perhaps its nginx proxy. We will have to see.
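A minimal sketch of the suggestion in this thread, assuming the response carries a URL constructed by the service (the `{root_url}/distributions/{id}/download` pattern proposed above) and never the raw S3 location. `FileRecord`, `build_download_url` and `to_response` are hypothetical names; where the endpoint actually lives (entitycore or its nginx proxy) is still open.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical file record; `s3_key` stays internal and is never returned."""
    id: str
    s3_key: str


def build_download_url(root_url: str, file_id: str) -> str:
    # Percent-encode the id so it is always a single, safe path segment.
    return f"https://{root_url}/distributions/{quote(file_id, safe='')}/download"


def to_response(record: FileRecord, root_url: str) -> dict:
    # Only the service-constructed URL is exposed; the bucket key is not.
    return {"id": record.id, "url": build_download_url(root_url, record.id)}
```

Keeping the bucket key out of the response means the storage layout can change without breaking clients.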
* An entity (e.g. a morphology) can be associated with multiple files.
* A file is associated with only one Entity; this also means that a `Folder` owns all of its contents, and no other Entity points to anything within the folder.
* A file has a type associated with it; to a first approximation these are MIME types, but there are many vendor-specific ones that will have to be handled.
* Once registered, a file is immutable; it cannot be changed or replaced.
I think this might turn into the classic problem of people versioning things by attaching multiple files to an entity and naming them things like `morphology-foo-final-final-final-v2.swc`.
We could always opt to enforce naming on our side, something like `{base_name}-{version}`.
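A minimal sketch of server-enforced naming along the lines of `{base_name}-{version}`, assuming the service keeps only the extension of whatever the user uploaded and assigns the next free version itself. `versioned_name` and `next_version` are hypothetical helpers, not an agreed API.

```python
import os
import re


def versioned_name(base_name: str, version: int, filename: str) -> str:
    """Build `{base_name}-{version}{ext}`, ignoring the uploaded file's own name."""
    if version < 0:
        raise ValueError("version must be non-negative")
    _, ext = os.path.splitext(filename)  # keep only the extension, e.g. ".swc"
    return f"{base_name}-{version}{ext}"


def next_version(base_name: str, existing: list[str]) -> int:
    """Return the next version number given names already stored for this base."""
    pattern = re.compile(rf"^{re.escape(base_name)}-(\d+)(\.[^.]*)?$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return max(versions, default=-1) + 1
```

With this rule a user-chosen name such as `morphology-foo-final-final-final-v2.swc` is stored as `morphology-foo-2.swc`, so the version is explicit rather than encoded in free text.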
Perhaps. I'm not sure yet how much we should limit which files can be associated with an entity, or whether we need a schema for them, too. I'm not sure we know all the use cases yet, other than "attach some files".
That being said, I can already see the issue with immutability and people doing "revision" management through file names, which would be a big problem, imo, since then it's impossible to know, without human effort, which file is "right".
I think I am ok with having versions at the file level and not at the entity level.
* Once registered, a file is immutable; it cannot be changed or replaced.
For immutability: how do we deal with deletion? What if people want to delete a large file? We have also considered the deprecation of entities, and how that would impact files.
I would probably keep the asset entry but remove the underlying file. A flag on the asset can say that it is removed.
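A minimal sketch of the "keep the entry, drop the bytes" approach, assuming a boolean flag on the asset row and an object store that can delete by key. `Asset.is_removed`, `InMemoryStorage`, `remove_asset` and `read_asset` are hypothetical; the in-memory store only stands in for S3.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    id: str
    entity_id: str
    storage_key: str
    is_removed: bool = False  # hypothetical flag: metadata kept, bytes gone


class InMemoryStorage:
    """Stand-in for the object store; maps storage keys to bytes."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def delete(self, key: str) -> None:
        self.blobs.pop(key, None)


def remove_asset(asset: Asset, storage: InMemoryStorage) -> None:
    # Delete the underlying bytes but keep the asset row, so existing
    # references (e.g. provenance) still resolve to a known, removed asset.
    storage.delete(asset.storage_key)
    asset.is_removed = True


def read_asset(asset: Asset, storage: InMemoryStorage) -> bytes:
    if asset.is_removed:
        raise FileNotFoundError(f"asset {asset.id} was removed")
    return storage.blobs[asset.storage_key]
```

The row stays immutable in every respect except the flag, which only ever goes from `False` to `True`.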
## Configurations
The frontend currently stores `configuration` data (e.g. `cell-composition`, `connectome-model`) in NEXUS.
This system will have to be updated for entitycore.
I have discussed this before, but IMO a configuration should only be committed to entitycore if it's used, rather than every intermediate saved version, which is how it's being done at the moment, AFAIK.
However, this would mean that there is another service/endpoint that has most of the same functionality as entitycore, but that allows for mutability. I'm not sure that's a better solution :(
* A file inherits its Authorization from the entity to which it is associated.
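A minimal sketch of authorization inheritance, assuming an entity is readable either because it is public or because the user belongs to the entity's owning project. The `authorized_project`/`public` fields and the helper names are hypothetical; the point is only that a file carries no ACL of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    authorized_project: str  # hypothetical: project that owns the entity
    public: bool = False


@dataclass(frozen=True)
class File:
    id: str
    entity_id: str  # no ACL of its own; access is derived from the entity


def can_read_entity(user_projects: set[str], entity: Entity) -> bool:
    return entity.public or entity.authorized_project in user_projects


def can_read_file(user_projects: set[str], file: File, entities: dict[str, Entity]) -> bool:
    # A file inherits its authorization from the entity it is attached to.
    return can_read_entity(user_projects, entities[file.entity_id])
```

Because the check always goes through the owning entity, changing an entity's visibility changes access to all of its files at once.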
## API

* When getting an entity, the `?expand-files=True` query parameter can be used.
Minor suggestion: use `expand` as a list so that other properties can be added later, e.g.
`expand: ["files", "annotation", ...]`
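A minimal sketch of a list-valued `expand` parameter, assuming it is sent as a query string (either repeated, `expand=files&expand=annotation`, or comma-separated) instead of one boolean flag per property. The `EXPANDABLE` set and `parse_expand` are hypothetical; only `files` comes from the draft and `annotation` from the suggestion.

```python
from urllib.parse import parse_qs

# Hypothetical whitelist of properties that can be expanded.
EXPANDABLE = {"files", "annotation"}


def parse_expand(query: str) -> set[str]:
    """Parse `expand=files&expand=annotation` or `expand=files,annotation`."""
    values = parse_qs(query).get("expand", [])
    requested = {item.strip() for value in values for item in value.split(",") if item.strip()}
    unknown = requested - EXPANDABLE
    if unknown:
        # Reject unknown names instead of silently ignoring them.
        raise ValueError(f"cannot expand: {sorted(unknown)}")
    return requested
```

Adding a new expandable property then only means extending the whitelist, not introducing another `expand-*` parameter.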
## Semantics

* An entity (e.g. a morphology) can be associated with multiple files.
* A file is associated with only one Entity; this also means that a `Folder` owns all of its contents, and no other Entity points to anything within the folder.
If we support versioning of an entity, and we change only the metadata, then different versions of the same entity will point to the same file or directory (unless the files are copied, which is not optimal for storage costs and management).
Yeah, that cries out for some extra layer of indirection to do deduplication, with all the complexity that comes with it.
The question then becomes: do we support versioning, how do we support it, and what are its semantics with respect to files? I think this is a can of worms, but it definitely should be part of this document.
For comparison or inspiration this is the dandi openapi for versioning datasets: https://api.dandiarchive.org/swagger/
For reference my short notes about the overall service are at https://github.com/openbraininstitute/prod-contribute-fix-data/issues/7
I suggest we don't support versioning of entities, but we do support versioning of assets:
- entity properties can change
- assets are kept and point to the entity, with a version identifier
- assets have an "is_version" relationship between them
- we support only incremental versions: 0->1->2->3...
- provenance links can point to assets
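A minimal sketch of the asset-level versioning proposed above, assuming each new version is a separate immutable asset that records its predecessor. The field `is_version_of` is a hypothetical name for the "is_version" relationship, and `new_version` is a hypothetical helper that enforces the strictly incremental 0 -> 1 -> 2 numbering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssetVersion:
    id: str
    entity_id: str
    version: int
    is_version_of: Optional[str] = None  # id of the previous version, if any


def new_version(previous: Optional[AssetVersion], asset_id: str, entity_id: str) -> AssetVersion:
    """Create the next version; numbers only ever increase by one."""
    if previous is None:
        return AssetVersion(asset_id, entity_id, version=0)
    if previous.entity_id != entity_id:
        raise ValueError("a new version must belong to the same entity")
    return AssetVersion(asset_id, entity_id, version=previous.version + 1,
                        is_version_of=previous.id)
```

Since every version is its own asset id, provenance links can point at one exact version while the entity's properties remain freely editable.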
* A file has a type associated with it; to a first approximation these are MIME types, but there are many vendor-specific ones that will have to be handled.
* Once registered, a file is immutable; it cannot be changed or replaced.
* A file inherits its Authorization from the entity to which it is associated.
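A minimal sketch of the semantics listed in this section, assuming an in-memory registry: each file has exactly one owning entity and a MIME-style type string, an entity can own many files, and a registered file can be neither modified (frozen dataclass) nor replaced under the same id. `RegisteredFile` and `FileRegistry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredFile:
    id: str
    entity_id: str  # exactly one owning entity
    mime_type: str  # e.g. "image/png"; vendor-specific types need their own strings
    path: str


class FileRegistry:
    def __init__(self) -> None:
        self._files: dict[str, RegisteredFile] = {}

    def register(self, f: RegisteredFile) -> None:
        # Immutable once registered: an existing id can never be replaced.
        if f.id in self._files:
            raise ValueError(f"file {f.id} is already registered")
        self._files[f.id] = f

    def files_for(self, entity_id: str) -> list[RegisteredFile]:
        # An entity may own many files.
        return [f for f in self._files.values() if f.entity_id == entity_id]
```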
I don't know where to put this comment. I understand an entity can have several assets, each asset pointing to the entity. However, we don't record any information about the relationship between the asset and the entity.
Say I have a morphology and 3 assets which are images: one is an XY plane view, one is a YZ plane view, ...
With the current model, I have 3 assets, all with an image MIME type, pointing to the morphology. We are missing a property/metadata field that says "xy_plane" or something similar.
Another question. Say a morphology has a Neurolucida file as an asset. Later, I create an SWC file derived from the Neurolucida file. If I put the SWC as another asset of that same morphology, I have a problem:
Currently, we only consider derivation between entities, so I cannot put a derivation link between the Neurolucida file and the SWC. This means there is no way to know that the SWC file was generated by a particular activity using the Neurolucida file.
The current knowledge graph addresses neither of these two issues.
Maybe what this means is that the provenance model should consider assets rather than entities?
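A minimal sketch of what asset-level provenance could look like, assuming a derivation edge records the used asset, the generated asset and the activity that linked them. `Derivation` and `derived_from` are hypothetical; the example mirrors the Neurolucida to SWC case from the previous comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical provenance edge between assets rather than entities."""
    used_asset: str       # e.g. the Neurolucida file
    generated_asset: str  # e.g. the SWC converted from it
    activity: str         # the activity (e.g. a conversion) that produced it


def derived_from(asset_id: str, derivations: list[Derivation]) -> list[str]:
    """Walk back through derivation edges and return every source asset."""
    sources: list[str] = []
    frontier = [asset_id]
    while frontier:
        current = frontier.pop()
        for d in derivations:
            if d.generated_asset == current and d.used_asset not in sources:
                sources.append(d.used_asset)
                frontier.append(d.used_asset)
    return sources
```

With edges at this granularity, two assets of the same morphology can still be related to each other and to the activity that produced one from the other.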
> type of relationship between the asset and the entity

Yes, this is a limitation of the current KG: it usually requires iterating over all the distributions until the expected content type is found, and this strategy doesn't work if there are multiple files with the same content type.
Implementation detail: in the new tables there is a `meta` JSONB column that can be used to store arbitrary metadata.
It seems hard to define relationships that work for any type of entity, but if we can formalize the possible types of relations, we could also add a dedicated attribute/column.

> relationships between assets in the same entity

I see the assets just as the artifacts of the entity, and I wouldn't expect relations between the artifacts.
For example, to create a morphology I could generate the ASC file locally, convert it to SWC, validate both locally, and only then create the entity and upload the files together; in this case there isn't any specific relation between them.
In general, things can become complicated if multiple files are used to generate multiple files (in the same entity) and we want to track all the relations.
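A minimal sketch of using the `meta` column to disambiguate assets that share a content type, assuming a hypothetical `role` key (e.g. `"xy_plane"`) stored in that JSON metadata. `Asset` and `find_asset` are hypothetical; the dict simply stands in for the JSONB value.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Asset:
    id: str
    content_type: str
    meta: dict[str, Any] = field(default_factory=dict)  # stands in for the JSONB `meta` column


def find_asset(assets: list[Asset], content_type: str, **meta_filter: Any) -> Optional[Asset]:
    """Pick an asset by content type plus meta keys, e.g. role="xy_plane".

    Matching on content type alone is ambiguous when several assets share it.
    """
    for asset in assets:
        if asset.content_type == content_type and all(
            asset.meta.get(k) == v for k, v in meta_filter.items()
        ):
            return asset
    return None
```

If the set of roles turns out to be small and stable, the same key could later be promoted to the dedicated column mentioned above.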