Skip to content

Error loading *.pdb files generated by OpenBabel or AtomsBase.jl-compatible packages #75

@hellmrf

Description

@hellmrf

I'm getting an error while reading basically any file generated by OpenBabel.

ERROR: Two copies of the same atom have the same alternative location ID.

The same error is also thrown for some *.pdb files generated by Chemfiles.jl, PDBTools.jl or AtomsIO.jl (AtomsBase.jl-compatible).

Open Babel

Let's use obabel to generate some files:

obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.pdb --gen3d
obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.mmcif --gen3d
obabel -:"CC(=O)NC1=CC=C(C=C1)O" -O paracetamol.pdb --gen3d
obabel *.pdb -O multi.pdb

And now we have aspirin.pdb, aspirin.mmcif, paracetamol.pdb, and multi.pdb (or download.zip).

Chemfiles.jl

For example, with Chemfiles.jl, we can load them:

julia> using Chemfiles

julia> trajectory = Trajectory("multi.pdb")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000009294110, false), nothing)

julia> trajectory |> length |> Int
2

julia> frame = read(trajectory)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000002417e60, false))

julia> positions(frame)
3×21 Chemfiles.ChemfilesArray:
 1.071  2.533  2.977   3.273   4.679  5.417  6.803   7.457   6.728   5.335   4.64    3.521   5.085  0.494   0.787  0.852  4.928  7.37    8.536   7.252   3.312
 0.277  0.556  1.556  -0.507  -0.5    0.658  0.652  -0.51   -1.679  -1.69   -2.983  -3.163  -3.843  1.136  -0.601  0.128  1.576  1.563  -0.507  -2.585  -2.419
 1.264  1.454  2.0     0.924   1.053  0.783  0.939   1.343   1.57    1.409   1.648   0.915   2.389  1.62    1.849  0.206  0.47   0.761   1.48    1.873   0.319

julia> f0 = read_step(trajectory, 0)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009cde290, false))

julia> f1 = read_step(trajectory, 1)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009c23c10, false))

julia> positions(f0)
3×21 Chemfiles.ChemfilesArray:
 1.071  2.533  2.977   3.273   4.679  5.417  6.803   7.457   6.728   5.335   4.64    3.521   5.085  0.494   0.787  0.852  4.928  7.37    8.536   7.252   3.312
 0.277  0.556  1.556  -0.507  -0.5    0.658  0.652  -0.51   -1.679  -1.69   -2.983  -3.163  -3.843  1.136  -0.601  0.128  1.576  1.563  -0.507  -2.585  -2.419
 1.264  1.454  2.0     0.924   1.053  0.783  0.939   1.343   1.57    1.409   1.648   0.915   2.389  1.62    1.849  0.206  0.47   0.761   1.48    1.873   0.319

julia> positions(f1)
3×20 Chemfiles.ChemfilesArray:
  1.078  2.546  2.933   3.328   4.731   5.51    6.902   7.515   6.757   5.364   8.87   0.545   0.684   0.914   2.864  5.061  7.48    7.246   4.79    9.26
 -0.099  0.135  1.177  -0.932  -1.036  -0.175  -0.304  -1.3    -2.184  -2.059  -1.44   0.855  -0.775  -0.523  -1.669  0.606  0.383  -2.962  -2.754  -0.721
  0.096  0.344  0.857  -0.067   0.04    0.816   0.848   0.099  -0.656  -0.678   0.077  0.152   0.859  -0.899  -0.582  1.422  1.457  -1.236  -1.285   0.601

julia> positions(frame) == positions(f0)
true

PDBTools.jl

With PDBTools.jl (which is AtomsBase.jl/AtomsIO.jl-compatible), everything runs smooth:

julia> using PDBTools

julia> asp = read_pdb("aspirin.pdb")
   Vector{Atom{Nothing}} with 21 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8

      14    H     UNL     X        1        1    0.494    1.136    1.620  1.00  0.00     1       X        14
      15    H     UNL     X        1        1    0.787   -0.601    1.849  1.00  0.00     1       X        15
      16    H     UNL     X        1        1    0.852    0.128    0.206  1.00  0.00     1       X        16
      17    H     UNL     X        1        1    4.928    1.576    0.470  1.00  0.00     1       X        17
      18    H     UNL     X        1        1    7.370    1.563    0.761  1.00  0.00     1       X        18
      19    H     UNL     X        1        1    8.536   -0.507    1.480  1.00  0.00     1       X        19
      20    H     UNL     X        1        1    7.252   -2.585    1.873  1.00  0.00     1       X        20
      21    H     UNL     X        1        1    3.312   -2.419    0.319  1.00  0.00     1       X        21

julia> traj = read_pdb("multi.pdb")
   Vector{Atom{Nothing}} with 41 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8

      35    H     UNL     X        1        2    0.914   -0.523   -0.899  1.00  0.00     2       X        14
      36    H     UNL     X        1        2    2.864   -1.669   -0.582  1.00  0.00     2       X        15
      37    H     UNL     X        1        2    5.061    0.606    1.422  1.00  0.00     2       X        16
      38    H     UNL     X        1        2    7.480    0.383    1.457  1.00  0.00     2       X        17
      39    H     UNL     X        1        2    7.246   -2.962   -1.236  1.00  0.00     2       X        18
      40    H     UNL     X        1        2    4.790   -2.754   -1.285  1.00  0.00     2       X        19
      41    H     UNL     X        1        2    9.260   -0.721    0.601  1.00  0.00     2       X        20

julia> read_pdb("multi.pdb", "residue 1")
   Vector{Atom{Nothing}} with 21 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    C     UNL     X        1        1    1.071    0.277    1.264  1.00  0.00     1       X         1
       2    C     UNL     X        1        1    2.533    0.556    1.454  1.00  0.00     1       X         2
       3    O     UNL     X        1        1    2.977    1.556    2.000  1.00  0.00     1       X         3
       4    O     UNL     X        1        1    3.273   -0.507    0.924  1.00  0.00     1       X         4
       5    C     UNL     X        1        1    4.679   -0.500    1.053  1.00  0.00     1       X         5
       6    C     UNL     X        1        1    5.417    0.658    0.783  1.00  0.00     1       X         6
       7    C     UNL     X        1        1    6.803    0.652    0.939  1.00  0.00     1       X         7
       8    C     UNL     X        1        1    7.457   -0.510    1.343  1.00  0.00     1       X         8
       9    C     UNL     X        1        1    6.728   -1.679    1.570  1.00  0.00     1       X         9
      10    C     UNL     X        1        1    5.335   -1.690    1.409  1.00  0.00     1       X        10
      11    C     UNL     X        1        1    4.640   -2.983    1.648  1.00  0.00     1       X        11
      12    O     UNL     X        1        1    3.521   -3.163    0.915  1.00  0.00     1       X        12
      13    O     UNL     X        1        1    5.085   -3.843    2.389  1.00  0.00     1       X        13
      14    H     UNL     X        1        1    0.494    1.136    1.620  1.00  0.00     1       X        14
      15    H     UNL     X        1        1    0.787   -0.601    1.849  1.00  0.00     1       X        15
      16    H     UNL     X        1        1    0.852    0.128    0.206  1.00  0.00     1       X        16
      17    H     UNL     X        1        1    4.928    1.576    0.470  1.00  0.00     1       X        17
      18    H     UNL     X        1        1    7.370    1.563    0.761  1.00  0.00     1       X        18
      19    H     UNL     X        1        1    8.536   -0.507    1.480  1.00  0.00     1       X        19
      20    H     UNL     X        1        1    7.252   -2.585    1.873  1.00  0.00     1       X        20
      21    H     UNL     X        1        1    3.312   -2.419    0.319  1.00  0.00     1       X        21

julia> read_pdb("multi.pdb", "residue 2")
   Vector{Atom{Nothing}} with 20 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
      22    C     UNL     X        1        2    1.078   -0.099    0.096  1.00  0.00     2       X         1
      23    C     UNL     X        1        2    2.546    0.135    0.344  1.00  0.00     2       X         2
      24    O     UNL     X        1        2    2.933    1.177    0.857  1.00  0.00     2       X         3
      25    N     UNL     X        1        2    3.328   -0.932   -0.067  1.00  0.00     2       X         4
      26    C     UNL     X        1        2    4.731   -1.036    0.040  1.00  0.00     2       X         5
      27    C     UNL     X        1        2    5.510   -0.175    0.816  1.00  0.00     2       X         6
      28    C     UNL     X        1        2    6.902   -0.304    0.848  1.00  0.00     2       X         7
      29    C     UNL     X        1        2    7.515   -1.300    0.099  1.00  0.00     2       X         8
      30    C     UNL     X        1        2    6.757   -2.184   -0.656  1.00  0.00     2       X         9
      31    C     UNL     X        1        2    5.364   -2.059   -0.678  1.00  0.00     2       X        10
      32    O     UNL     X        1        2    8.870   -1.440    0.077  1.00  0.00     2       X        11
      33    H     UNL     X        1        2    0.545    0.855    0.152  1.00  0.00     2       X        12
      34    H     UNL     X        1        2    0.684   -0.775    0.859  1.00  0.00     2       X        13
      35    H     UNL     X        1        2    0.914   -0.523   -0.899  1.00  0.00     2       X        14
      36    H     UNL     X        1        2    2.864   -1.669   -0.582  1.00  0.00     2       X        15
      37    H     UNL     X        1        2    5.061    0.606    1.422  1.00  0.00     2       X        16
      38    H     UNL     X        1        2    7.480    0.383    1.457  1.00  0.00     2       X        17
      39    H     UNL     X        1        2    7.246   -2.962   -1.236  1.00  0.00     2       X        18
      40    H     UNL     X        1        2    4.790   -2.754   -1.285  1.00  0.00     2       X        19
      41    H     UNL     X        1        2    9.260   -0.721    0.601  1.00  0.00     2       X        20

BioStructures.jl

But with BioStructures.jl and PDBFormat, we get some errors:

julia> using BioStructures

julia> struc = read("aspirin.pdb", PDBFormat)
ERROR: Two copies of the same atom have the same alternative location ID. Existing atom:
Atom C with serial 1, coordinates [1.071, 0.277, 1.264]
New atom record to add:
AtomRecord C with serial 2, coordinates [2.533, 0.556, 1.454]
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] unsafe_addatomtomodel!(mo::Model, atom_rec::AtomRecord; remove_disorder::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1652
  [3] unsafe_addatomtomodel!
    @ ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1559 [inlined]
  [4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:66
  [5] read
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
  [6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
  [7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
  [8] open
    @ ./io.jl:407 [inlined]
  [9] #read#69
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [10] read(filepath::String, t::Type{PDBFormat})
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [11] top-level scope
    @ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.

(Same behavior for paracetamol.pdb or multi.pdb).

Same for MMCIFDict (now with aspirin.mmcif):

julia> struc = "aspirin.mmcif" |> MMCIFDict |> MolecularStructure
ERROR: KeyError: key "_atom_site.group_PDB" not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:477 [inlined]
 [2] getindex
   @ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:84 [inlined]
 [3]
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:332
 [4] MolecularStructure
   @ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:321 [inlined]
 [5] |>(x::MMCIFDict, f::Type{MolecularStructure})
   @ Base ./operators.jl:926
 [6] top-level scope
   @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Non-Open Babel

Maybe the problem is with Open Babel, right? So let's download Aspirin's SDF from PubChem and use Chemfiles.jl to convert it to PDB.

julia> using Downloads; Downloads.download("https://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/000008C400000001/SDF?response_type=save&response_basename=2244", "./2244.sdf")
"./2244.sdf"

julia> using Chemfiles

julia> traj = Trajectory("2244.sdf")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000002315a30, false), nothing)

julia> frame = read(traj)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x00000000030c9fd0, false))

julia> Trajectory("chfl.pdb", 'w') do traj
           write(traj, frame)
       end

julia> using BioStructures

julia> struc = read("chfl.pdb", PDBFormat)
ERROR: PDBParseError: could not read model serial number at line 1 of file:
MODEL    1
Stacktrace:
 [1] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:74
 [2] read
   @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
 [3] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
 [4] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
   @ Base ./io.jl:410
 [5] open
   @ ./io.jl:407 [inlined]
 [6] #read#69
   @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [7] read(filepath::String, t::Type{PDBFormat})
   @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [8] top-level scope
   @ REPL[9]:1

caused by: ArgumentError: input string is empty or only contains whitespace
Stacktrace:
  [1] tryparse_internal(::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
    @ Base ./parse.jl:115
  [2] #parse#619
    @ ./parse.jl:254 [inlined]
  [3] parse
    @ ./parse.jl:253 [inlined]
  [4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:72
  [5] read
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
  [6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
  [7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
  [8] open
    @ ./io.jl:407 [inlined]
  [9] #read#69
    @ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
 [10] read(filepath::String, t::Type{PDBFormat})
    @ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
 [11] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Finals

Also, this seem to be related to #48.

download.zip

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions