-
Notifications
You must be signed in to change notification settings - Fork 25
Description
This is a great library and I'm already using it lots. I would like to evaluate positions of a protein's atoms with the full expression of symmetry in the complete unit cell or entire crystal. I can't seem to find it in the documentation for this library, nor in the code.
Expected Behavior
I would like to see documented examples of how to retrieve that atoms positions along with all the symmetries of those positions.
Current Behavior
I've written my own basic parser (below), but it only works for PDB files. Molecules in the PDB can often be dowloaded in either PDB format or MMCIF format, however some appear to only be downloadable as mmCIF, e.g. "6PEM". In these cases, downloadpdb throws an exception unless the MMCIF format is specified.
julia> BioStructures.downloadpdb("6PEM") # no .pdb file available!
[ Info: Downloading PDB: 6PEM
┌ Error: Download failed: curl: (22) The requested URL returned error: 404 Not Found
└ @ Base download.jl:43
ERROR: failed process: Process(`/usr/bin/curl -s -S -g -L -f -o /var/folders/ry/wg4yqb1j5lz3lcyzyls96k2r0000gn/T/jl_32sXTz http://files.rcsb.org/download/6PEM.pdb.gz`, ProcessExited(22)) [22]
julia> BioStructures.downloadpdb("6PEM", file_format=MMCIF) # works!
Possible Solution / Implementation
Perhaps two functions be added to the library. One function that can inform the user which filetypes a protein can be downloaded in, and another function that allows the user to read symmetries regardless of the file type. However there may be more natural solutions that integrate better with the BioStructures.Model (e.g. adding a field for parsed symmetries). I would be happy with anything that supported both formats and allowed me to ultimately get all the atoms in the unit cell or entire crystal.
By way of example of the second function, the code that I implemented as a workaround follows. It assumes that you have already downloaded a PDB file to a pdb_cache_directory, and it only works with .pdb format files.
function read_symmetries(protein_name)
# get the symmetries of the complex - rotation matrix and translation vector
# they are in the form:
# index rotation M translation
# REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
# REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
# REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000
#
full_protien_name = protein_name * ".pdb"
fn = joinpath(pdb_cache_dir, full_protien_name)
lines = readlines(fn)
remark = "REMARK 350 BIOMT"
lines_defining_symmetry = filter(x->startswith(x,remark), lines)
@assert (length(lines_defining_symmetry) % 3 == 0) "odd number of lines in symmetry section"
nsymmetries = round(Int64, length(lines_defining_symmetry)/3)
rotation_matries = Array{Float64,2}[]
translation_vectors = Array{Float64,1}[]
for i in 1:nsymmetries
idx = (i-1)*3
s1 = split(lines_defining_symmetry[idx+1])
s2 = split(lines_defining_symmetry[idx+2])
s3 = split(lines_defining_symmetry[idx+3])
rotation_matrix = [ parse(Float64, s1[5]) parse(Float64, s1[6]) parse(Float64, s1[7]);
parse(Float64, s2[5]) parse(Float64, s2[6]) parse(Float64, s2[7]);
parse(Float64, s3[5]) parse(Float64, s3[6]) parse(Float64, s3[7]) ]
translation_vector = [ parse(Float64, s1[8]), parse(Float64, s2[8]), parse(Float64, s3[8]) ]
push!(rotation_matries, rotation_matrix)
push!(translation_vectors, translation_vector)
end
rotation_matries, translation_vectors
end
Your Environment
- Package Version used: v0.6.0
- Julia Version used: v1.3.1
- Operating System and version (desktop or mobile): macOS Version 10.15.2