-
Notifications
You must be signed in to change notification settings - Fork 25
Description
I'm getting an error while reading basically any file generated by OpenBabel.
ERROR: Two copies of the same atom have the same alternative location ID.
The same error is also thrown for some *.pdb files generated by Chemfiles.jl, PDBTools.jl or AtomsIO.jl (AtomsBase.jl-compatible).
Open Babel
Let's use obabel to generate some files:
obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.pdb --gen3d
obabel -:"CC(=O)OC1=CC=CC=C1C(O)=O" -O aspirin.mmcif --gen3d
obabel -:"CC(=O)NC1=CC=C(C=C1)O" -O paracetamol.pdb --gen3d
obabel *.pdb -O multi.pdbAnd now we have aspirin.pdb, aspirin.mmcif, paracetamol.pdb, and multi.pdb (or download.zip).
Chemfiles.jl
For example, with Chemfiles.jl, we can load them:
julia> using Chemfiles
julia> trajectory = Trajectory("multi.pdb")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000009294110, false), nothing)
julia> trajectory |> length |> Int
2
julia> frame = read(trajectory)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000002417e60, false))
julia> positions(frame)
3×21 Chemfiles.ChemfilesArray:
1.071 2.533 2.977 3.273 4.679 5.417 6.803 7.457 6.728 5.335 4.64 3.521 5.085 0.494 0.787 0.852 4.928 7.37 8.536 7.252 3.312
0.277 0.556 1.556 -0.507 -0.5 0.658 0.652 -0.51 -1.679 -1.69 -2.983 -3.163 -3.843 1.136 -0.601 0.128 1.576 1.563 -0.507 -2.585 -2.419
1.264 1.454 2.0 0.924 1.053 0.783 0.939 1.343 1.57 1.409 1.648 0.915 2.389 1.62 1.849 0.206 0.47 0.761 1.48 1.873 0.319
julia> f0 = read_step(trajectory, 0)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009cde290, false))
julia> f1 = read_step(trajectory, 1)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x0000000009c23c10, false))
julia> positions(f0)
3×21 Chemfiles.ChemfilesArray:
1.071 2.533 2.977 3.273 4.679 5.417 6.803 7.457 6.728 5.335 4.64 3.521 5.085 0.494 0.787 0.852 4.928 7.37 8.536 7.252 3.312
0.277 0.556 1.556 -0.507 -0.5 0.658 0.652 -0.51 -1.679 -1.69 -2.983 -3.163 -3.843 1.136 -0.601 0.128 1.576 1.563 -0.507 -2.585 -2.419
1.264 1.454 2.0 0.924 1.053 0.783 0.939 1.343 1.57 1.409 1.648 0.915 2.389 1.62 1.849 0.206 0.47 0.761 1.48 1.873 0.319
julia> positions(f1)
3×20 Chemfiles.ChemfilesArray:
1.078 2.546 2.933 3.328 4.731 5.51 6.902 7.515 6.757 5.364 8.87 0.545 0.684 0.914 2.864 5.061 7.48 7.246 4.79 9.26
-0.099 0.135 1.177 -0.932 -1.036 -0.175 -0.304 -1.3 -2.184 -2.059 -1.44 0.855 -0.775 -0.523 -1.669 0.606 0.383 -2.962 -2.754 -0.721
0.096 0.344 0.857 -0.067 0.04 0.816 0.848 0.099 -0.656 -0.678 0.077 0.152 0.859 -0.899 -0.582 1.422 1.457 -1.236 -1.285 0.601
julia> positions(frame) == positions(f0)
truePDBTools.jl
With PDBTools.jl (which is AtomsBase.jl/AtomsIO.jl-compatible), everything runs smooth:
julia> using PDBTools
julia> asp = read_pdb("aspirin.pdb")
Vector{Atom{Nothing}} with 21 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 C UNL X 1 1 1.071 0.277 1.264 1.00 0.00 1 X 1
2 C UNL X 1 1 2.533 0.556 1.454 1.00 0.00 1 X 2
3 O UNL X 1 1 2.977 1.556 2.000 1.00 0.00 1 X 3
4 O UNL X 1 1 3.273 -0.507 0.924 1.00 0.00 1 X 4
5 C UNL X 1 1 4.679 -0.500 1.053 1.00 0.00 1 X 5
6 C UNL X 1 1 5.417 0.658 0.783 1.00 0.00 1 X 6
7 C UNL X 1 1 6.803 0.652 0.939 1.00 0.00 1 X 7
8 C UNL X 1 1 7.457 -0.510 1.343 1.00 0.00 1 X 8
⋮
14 H UNL X 1 1 0.494 1.136 1.620 1.00 0.00 1 X 14
15 H UNL X 1 1 0.787 -0.601 1.849 1.00 0.00 1 X 15
16 H UNL X 1 1 0.852 0.128 0.206 1.00 0.00 1 X 16
17 H UNL X 1 1 4.928 1.576 0.470 1.00 0.00 1 X 17
18 H UNL X 1 1 7.370 1.563 0.761 1.00 0.00 1 X 18
19 H UNL X 1 1 8.536 -0.507 1.480 1.00 0.00 1 X 19
20 H UNL X 1 1 7.252 -2.585 1.873 1.00 0.00 1 X 20
21 H UNL X 1 1 3.312 -2.419 0.319 1.00 0.00 1 X 21
julia> traj = read_pdb("multi.pdb")
Vector{Atom{Nothing}} with 41 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 C UNL X 1 1 1.071 0.277 1.264 1.00 0.00 1 X 1
2 C UNL X 1 1 2.533 0.556 1.454 1.00 0.00 1 X 2
3 O UNL X 1 1 2.977 1.556 2.000 1.00 0.00 1 X 3
4 O UNL X 1 1 3.273 -0.507 0.924 1.00 0.00 1 X 4
5 C UNL X 1 1 4.679 -0.500 1.053 1.00 0.00 1 X 5
6 C UNL X 1 1 5.417 0.658 0.783 1.00 0.00 1 X 6
7 C UNL X 1 1 6.803 0.652 0.939 1.00 0.00 1 X 7
8 C UNL X 1 1 7.457 -0.510 1.343 1.00 0.00 1 X 8
⋮
35 H UNL X 1 2 0.914 -0.523 -0.899 1.00 0.00 2 X 14
36 H UNL X 1 2 2.864 -1.669 -0.582 1.00 0.00 2 X 15
37 H UNL X 1 2 5.061 0.606 1.422 1.00 0.00 2 X 16
38 H UNL X 1 2 7.480 0.383 1.457 1.00 0.00 2 X 17
39 H UNL X 1 2 7.246 -2.962 -1.236 1.00 0.00 2 X 18
40 H UNL X 1 2 4.790 -2.754 -1.285 1.00 0.00 2 X 19
41 H UNL X 1 2 9.260 -0.721 0.601 1.00 0.00 2 X 20
julia> read_pdb("multi.pdb", "residue 1")
Vector{Atom{Nothing}} with 21 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 C UNL X 1 1 1.071 0.277 1.264 1.00 0.00 1 X 1
2 C UNL X 1 1 2.533 0.556 1.454 1.00 0.00 1 X 2
3 O UNL X 1 1 2.977 1.556 2.000 1.00 0.00 1 X 3
4 O UNL X 1 1 3.273 -0.507 0.924 1.00 0.00 1 X 4
5 C UNL X 1 1 4.679 -0.500 1.053 1.00 0.00 1 X 5
6 C UNL X 1 1 5.417 0.658 0.783 1.00 0.00 1 X 6
7 C UNL X 1 1 6.803 0.652 0.939 1.00 0.00 1 X 7
8 C UNL X 1 1 7.457 -0.510 1.343 1.00 0.00 1 X 8
9 C UNL X 1 1 6.728 -1.679 1.570 1.00 0.00 1 X 9
10 C UNL X 1 1 5.335 -1.690 1.409 1.00 0.00 1 X 10
11 C UNL X 1 1 4.640 -2.983 1.648 1.00 0.00 1 X 11
12 O UNL X 1 1 3.521 -3.163 0.915 1.00 0.00 1 X 12
13 O UNL X 1 1 5.085 -3.843 2.389 1.00 0.00 1 X 13
14 H UNL X 1 1 0.494 1.136 1.620 1.00 0.00 1 X 14
15 H UNL X 1 1 0.787 -0.601 1.849 1.00 0.00 1 X 15
16 H UNL X 1 1 0.852 0.128 0.206 1.00 0.00 1 X 16
17 H UNL X 1 1 4.928 1.576 0.470 1.00 0.00 1 X 17
18 H UNL X 1 1 7.370 1.563 0.761 1.00 0.00 1 X 18
19 H UNL X 1 1 8.536 -0.507 1.480 1.00 0.00 1 X 19
20 H UNL X 1 1 7.252 -2.585 1.873 1.00 0.00 1 X 20
21 H UNL X 1 1 3.312 -2.419 0.319 1.00 0.00 1 X 21
julia> read_pdb("multi.pdb", "residue 2")
Vector{Atom{Nothing}} with 20 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
22 C UNL X 1 2 1.078 -0.099 0.096 1.00 0.00 2 X 1
23 C UNL X 1 2 2.546 0.135 0.344 1.00 0.00 2 X 2
24 O UNL X 1 2 2.933 1.177 0.857 1.00 0.00 2 X 3
25 N UNL X 1 2 3.328 -0.932 -0.067 1.00 0.00 2 X 4
26 C UNL X 1 2 4.731 -1.036 0.040 1.00 0.00 2 X 5
27 C UNL X 1 2 5.510 -0.175 0.816 1.00 0.00 2 X 6
28 C UNL X 1 2 6.902 -0.304 0.848 1.00 0.00 2 X 7
29 C UNL X 1 2 7.515 -1.300 0.099 1.00 0.00 2 X 8
30 C UNL X 1 2 6.757 -2.184 -0.656 1.00 0.00 2 X 9
31 C UNL X 1 2 5.364 -2.059 -0.678 1.00 0.00 2 X 10
32 O UNL X 1 2 8.870 -1.440 0.077 1.00 0.00 2 X 11
33 H UNL X 1 2 0.545 0.855 0.152 1.00 0.00 2 X 12
34 H UNL X 1 2 0.684 -0.775 0.859 1.00 0.00 2 X 13
35 H UNL X 1 2 0.914 -0.523 -0.899 1.00 0.00 2 X 14
36 H UNL X 1 2 2.864 -1.669 -0.582 1.00 0.00 2 X 15
37 H UNL X 1 2 5.061 0.606 1.422 1.00 0.00 2 X 16
38 H UNL X 1 2 7.480 0.383 1.457 1.00 0.00 2 X 17
39 H UNL X 1 2 7.246 -2.962 -1.236 1.00 0.00 2 X 18
40 H UNL X 1 2 4.790 -2.754 -1.285 1.00 0.00 2 X 19
41 H UNL X 1 2 9.260 -0.721 0.601 1.00 0.00 2 X 20BioStructures.jl
But with BioStructures.jl and PDBFormat, we get some errors:
julia> using BioStructures
julia> struc = read("aspirin.pdb", PDBFormat)
ERROR: Two copies of the same atom have the same alternative location ID. Existing atom:
Atom C with serial 1, coordinates [1.071, 0.277, 1.264]
New atom record to add:
AtomRecord C with serial 2, coordinates [2.533, 0.556, 1.454]
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] unsafe_addatomtomodel!(mo::Model, atom_rec::AtomRecord; remove_disorder::Bool)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1652
[3] unsafe_addatomtomodel!
@ ~/.julia/packages/BioStructures/ZBun0/src/model.jl:1559 [inlined]
[4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:66
[5] read
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
[6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
[7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
@ Base ./io.jl:410
[8] open
@ ./io.jl:407 [inlined]
[9] #read#69
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
[10] read(filepath::String, t::Type{PDBFormat})
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
[11] top-level scope
@ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.(Same behavior for paracetamol.pdb or multi.pdb).
Same for MMCIFDict (now with aspirin.mmcif):
julia> struc = "aspirin.mmcif" |> MMCIFDict |> MolecularStructure
ERROR: KeyError: key "_atom_site.group_PDB" not found
Stacktrace:
[1] getindex
@ ./dict.jl:477 [inlined]
[2] getindex
@ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:84 [inlined]
[3]
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:332
[4] MolecularStructure
@ ~/.julia/packages/BioStructures/ZBun0/src/mmcif.jl:321 [inlined]
[5] |>(x::MMCIFDict, f::Type{MolecularStructure})
@ Base ./operators.jl:926
[6] top-level scope
@ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.Non-Open Babel
Maybe the problem is with Open Babel, right? So let's download Aspirin's SDF from PubChem and use Chemfiles.jl to convert it to PDB.
julia> using Downloads; Downloads.download("https://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/000008C400000001/SDF?response_type=save&response_basename=2244", "./2244.sdf")
"./2244.sdf"
julia> using Chemfiles
julia> traj = Trajectory("2244.sdf")
Trajectory(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_TRAJECTORY}(Ptr{Chemfiles.lib.CHFL_TRAJECTORY} @0x0000000002315a30, false), nothing)
julia> frame = read(traj)
Frame(Chemfiles.CxxPointer{Chemfiles.lib.CHFL_FRAME}(Ptr{Chemfiles.lib.CHFL_FRAME} @0x00000000030c9fd0, false))
julia> Trajectory("chfl.pdb", 'w') do traj
write(traj, frame)
end
julia> using BioStructures
julia> struc = read("chfl.pdb", PDBFormat)
ERROR: PDBParseError: could not read model serial number at line 1 of file:
MODEL 1
Stacktrace:
[1] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:74
[2] read
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
[3] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
[4] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
@ Base ./io.jl:410
[5] open
@ ./io.jl:407 [inlined]
[6] #read#69
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
[7] read(filepath::String, t::Type{PDBFormat})
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
[8] top-level scope
@ REPL[9]:1
caused by: ArgumentError: input string is empty or only contains whitespace
Stacktrace:
[1] tryparse_internal(::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
@ Base ./parse.jl:115
[2] #parse#619
@ ./parse.jl:254 [inlined]
[3] parse
@ ./parse.jl:253 [inlined]
[4] read(input::IOStream, ::Type{…}; structure_name::String, remove_disorder::Bool, read_std_atoms::Bool, read_het_atoms::Bool, run_dssp::Bool, run_stride::Bool)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:72
[5] read
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:47 [inlined]
[6] (::BioStructures.var"#70#71"{String, @Kwargs{}, DataType})(input::IOStream)
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:113
[7] open(f::BioStructures.var"#70#71"{String, @Kwargs{}, DataType}, args::String; kwargs::@Kwargs{})
@ Base ./io.jl:410
[8] open
@ ./io.jl:407 [inlined]
[9] #read#69
@ ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:112 [inlined]
[10] read(filepath::String, t::Type{PDBFormat})
@ BioStructures ~/.julia/packages/BioStructures/ZBun0/src/pdb.jl:108
[11] top-level scope
@ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.Finals
Also, this seem to be related to #48.