Skip to content

Request for symmetries #15

@abahm

Description

@abahm

This is a great library and I'm already using it lots. I would like to evaluate positions of a protein's atoms with the full expression of symmetry in the complete unit cell or entire crystal. I can't seem to find it in the documentation for this library, nor in the code.

Expected Behavior

I would like to see documented examples of how to retrieve that atoms positions along with all the symmetries of those positions.

Current Behavior

I've written my own basic parser (below), but it only works for PDB files. Molecules in the PDB can often be dowloaded in either PDB format or MMCIF format, however some appear to only be downloadable as mmCIF, e.g. "6PEM". In these cases, downloadpdb throws an exception unless the MMCIF format is specified.

julia> BioStructures.downloadpdb("6PEM")   # no .pdb file available!
[ Info: Downloading PDB: 6PEM
┌ Error: Download failed: curl: (22) The requested URL returned error: 404 Not Found
└ @ Base download.jl:43
ERROR: failed process: Process(`/usr/bin/curl -s -S -g -L -f -o /var/folders/ry/wg4yqb1j5lz3lcyzyls96k2r0000gn/T/jl_32sXTz http://files.rcsb.org/download/6PEM.pdb.gz`, ProcessExited(22)) [22]
julia> BioStructures.downloadpdb("6PEM", file_format=MMCIF)   # works!

Possible Solution / Implementation

Perhaps two functions be added to the library. One function that can inform the user which filetypes a protein can be downloaded in, and another function that allows the user to read symmetries regardless of the file type. However there may be more natural solutions that integrate better with the BioStructures.Model (e.g. adding a field for parsed symmetries). I would be happy with anything that supported both formats and allowed me to ultimately get all the atoms in the unit cell or entire crystal.

By way of example of the second function, the code that I implemented as a workaround follows. It assumes that you have already downloaded a PDB file to a pdb_cache_directory, and it only works with .pdb format files.

function read_symmetries(protein_name)
    # get the symmetries of the complex - rotation matrix and translation vector
    # they are in the form:
    #                     index         rotation M               translation
    # REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
    # REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
    # REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
    #
    full_protien_name = protein_name * ".pdb"
    fn = joinpath(pdb_cache_dir, full_protien_name)
    lines = readlines(fn)
    remark = "REMARK 350   BIOMT"
    lines_defining_symmetry = filter(x->startswith(x,remark), lines)
    @assert (length(lines_defining_symmetry) % 3 == 0) "odd number of lines in symmetry section"
    nsymmetries = round(Int64, length(lines_defining_symmetry)/3)
    rotation_matries = Array{Float64,2}[]
    translation_vectors = Array{Float64,1}[]
    for i in 1:nsymmetries
        idx = (i-1)*3
        s1 = split(lines_defining_symmetry[idx+1])
        s2 = split(lines_defining_symmetry[idx+2])
        s3 = split(lines_defining_symmetry[idx+3])
        rotation_matrix = [ parse(Float64, s1[5]) parse(Float64, s1[6]) parse(Float64, s1[7]);
                            parse(Float64, s2[5]) parse(Float64, s2[6]) parse(Float64, s2[7]);
                            parse(Float64, s3[5]) parse(Float64, s3[6]) parse(Float64, s3[7]) ]
        translation_vector = [ parse(Float64, s1[8]), parse(Float64, s2[8]), parse(Float64, s3[8]) ]
        push!(rotation_matries, rotation_matrix)
        push!(translation_vectors, translation_vector)
    end
    rotation_matries, translation_vectors
end

Your Environment

  • Package Version used: v0.6.0
  • Julia Version used: v1.3.1
  • Operating System and version (desktop or mobile): macOS Version 10.15.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions