Pteros  2.0
Molecular modeling library for human beings!
Core functionality

Table of Contents

Basic classes

Atom

Pteros treats all particles as atoms. There is no destinction between real atoms and various dummy particles (virtual sites, shell particles, etc.). Each atom in Pteros has the following attributes:

Property Data type Description Comment
name string Atom name
resid integer Residue id Unique within single protein or nucleic acid chain. May start at any value defined in the structure file for each chain.
resindex integer Residue index Unique within the whole system. Chain boundaries are ignored. Starts at 0. Assigned automatically.
resname string Residue name
chain char Chain identifier Single character. If no chain information is present defaults to single space.
tag string Arbitrary textual tag Defaults to empty string
mass float Atomic mass In atomic units. If not given explicitly in the structure file guessed from the atom name. Defaults to 1.0.
charge float Atomic charge Electron charge units. Defaults to 0.0. Usually read from topology files.
beta float PDB B-factor Defaults to 0.
occupancy float PDB occupancy facor Defaults to 0.
type integer Numerical index of atom type Defaults to -1, only makes sense in MD topologies.
type_name string Name of atom type Defaults to empty string.
Note
Coordinates of atoms are stored separately in frames.

Atoms are represented by the objects of Atom class.

System

The system is a container for atoms, coordinate frames and associated force field parameters. The system may contain many coordinate frames which represent different states of the system (MD time steps or NMR structures for example). System is usually read from one or more structure, trajectory or topology files. Systems are represented by the System class.

Note
The system is not a representation of particular data file! Although attributes of atoms are similar to the fields of PDB files the system is much more general concept. The system could be constructed from a single file (such as PDB), from several files (such as PDB+XTC) or could be built from scratch programmatically by adding individual atoms and frames.

Frame

The frame is representation of the instanteneous state of the system. It contains coordinates of all atoms, time stamp and the periodic box (if the periodic boundary conditions are used). Represented by the Frame class.

Selection

Selection is a subset of atoms from particular system. Selection do not contain any data but just points to existing atoms in the system. Selection should be considered as a "handle" to certain group of atoms which allows to query their properties and to manipulate their attributes and coordinates in various ways.

Loading molecular system

Supported file formats

Currently the following file formats are supported:

  • Structure files
    • PDB
    • GRO
    • MOL2
  • Trajectories
    • XTC
    • TRR
    • TNG
      Note
      TNG files also contain structure information.
    • DCD
  • Topology
    • PTTOP
      Note
      Read only. Produced from Gromacs TPR files by the tpr2pteros.py script.

What is loaded from data files?

Molecular data formats contain different information about the system. In Pteros all information stored in the data files is classified into atoms, coordinates, trajectory and topology. Type of particular piece of information is determined by Mol_file_content enum value MFC_* where MFC is abbreviation for "Molecular File Content".

Atoms (MFC_ATOMS)
Information about atoms in the system (name, residue name, chain and other attributes) but without coordinates.
Coordinates (MFC_COORD)
Single set of coordinates of atoms (single frame) and the periodic box if periodicity is present.
Trajectory (MFC_TRAJ)
The number of coordinate frames each containing a set of coordinates, time stamp and the periodic box if periodicity is present.
Topology (MFC_TOP)
MD topology containing connectivity, atom types, precise masses, partial charges, non-bond Van-der-Waals parameters, etc.

Particular file format may contain different information. For example PDB files contain atoms and coordinates. XTC files contain the trajectory but no atoms. TNG files contain atoms and trajectory at the same time, etc. There are two ways of loading data files - simple and advanced.

Loading is performed by either constructor of the System class or by System::load() method. In both cases the same behavior and parameters are used.

Simple loading

When the data file is loaded in Pteros using simple mode the information being loaded depend on the file type and on the current content of the System where the data go. The most "logical" way of adding new data to the system is chosen but there is no fine control about what is read and how. Here is an example:

C++Python
// (1) Initial loading in constructor
System s("somefile.pdb");
// (2) Adding structure file
s.load("other-file.pdb");
// (3) Adding trajectory
s.load("trajectory.xtc",3,20);
// (4) Adding topology
s.load("topology.pttop");
1 # Initial loading in constructor
2 s = System('somefile.pdb')
3 
4 # (2) Adding structure file
5 s.load('other-file.pdb')
6 
7 # (3) Adding trajectory
8 s.load('trajectory.xtc',3,20)
9 
10 # (4) Adding topology
11 s.load('topology.pttop')
  1. In this example we first construct the System from the PDB file "somefile.pdb". File type is always determined by extension. Since on step (1) the system is empty all possible information is read from this file, which means that atoms and coordinates would be read.
  2. On step (2) we are adding another PDB file "other-file.pdb". Now the system already contains atoms and one coordinate frame, thus Pteros deduces that we just want to add another coordinate frame. Thus only coordinates are read from "other-file.pdb". If the number of atoms in this file is the same as in the System another frame is added. No check is performed if "other-file.pdb" actually contains the same set of atoms as the system - everything with matching number of atoms is accepted.
  3. On step (3) we are adding an XTC trajectory. The logic is the same as on step (2) - the number of new coordinate frames are added to the System. Additional arguments instruct Pteros to read frames from 3 to 20 inclusive.
  4. Finally on step (4) we are adding topology file. Since the System already contain atoms only additional topological information (like atom types, charges, connectivity, etc.) is read.

Since different file types contain different information the logic of adding information to the System becomes rather complicated in no-trivial situations. The following table summarises what is read from different files in different cases.

Initial loading (in constructor or to the empty System)

File type What is loaded? Comment
PDB, GRO, MOL2 Atoms and the single frame MOL2 does not contain periodic box!
PTTOP Atoms, the single frame, full topology PTTOP file in Pteros are not "pure topology" - they also contain the set of starting coordinates.
TNG Atoms and multiple frames
XTC, TRR, DCD Fails and raises exception In Pteros reading trajectory into an empty system is not allowed (in contrast to VMD for example).

Adding data to the System which is not empty (by System::load())

File type What is loaded? Comment
PDB, GRO, MOL2 Single frame New frame added. The only check is matching number of atoms.
PTTOP Topology Adds atom types and charges, updates masses, adds other topology information. No coordinates are read, no frames added.
TNG, XTC, TRR, DCD Multiple frames Adds number of new frames. The only check is matching number of atoms.

Advanced loading with file handlers

Warning
Advanced mode should only be used in specific cases and only if you are absolutely sure that you need fine control. The simple mode is the most convenient in most practical scenarios.
Advanced mode and file handlers are not available from Python.

In advanced mode the user specifies explicitly what to read and what to store in the system.

#include "pteros/core/mol_file.h"
...
auto pdb_handler = Mol_file::open("some-pdb-file.pdb",'r');
auto gro_handler = Mol_file::open("some-gro-file.gro",'r');
auto xtc_handler = Mol_file::open("some-xtc-file.xtc",'r');
System s;
s.load(pdb_handler, MFC_ATOMS); // (1)
s.load(gro_handler, MFC_COORD); // (2)
// Load frames until the end of trajectory one by one
bool ok;
do {
ok = s.load(xtc_handler, MFC_TRAJ); // (3)
} while(ok);

In this example we first create three file handlers for corresponding PDB, GRO and XTC files. File handlers are created by static Mol_file::open() method. Handlers could be created for reading (mode 'r') and for writing (mode 'w'). In our case we open them for reading. After that we create an empty system and load data from handlers.

  1. On step (1) we read only atoms (MFC_ATOMS) from PDB file. No coordinates are read.
  2. On step (2) we add coordinates (MFC_COORD) from GRO file.
  3. On step (3) we read frames from XTC file (MFC_TRAJ) one by one in the loop up to the end of trajectory. The load() method returns false on failure to read next frame, which allows to track the end of trajectory.
Warning
Behavior of MFC_TRAJ for file handlers is different from what is used in simple mode! Each call of System::load(handler) reads the single frame. This behavior is intentional and allows reading frames one by one.
Note
If one will use "MFC_ATOMS | MFC_TRAJ" or "MFC_COORD | MFC_TRAJ" on steps (1) and (2) then MFC_TRAJ will be ignored. This is again intentional to separate frame-by-frame trajectory reading from reading other information.
Warning
If MFC_ATOMS is specified the system is cleared before reading even if it already contained some atoms!

Loading with callback

Both simple and advanced forms of System::load() support optional user-defined callback function as last parameter. This function takes two argiments: the pointer to parent system amd the index of current frame. If the callback returns true the loading of trajctory will continue to the next frame. If it returns false the loading will stop. Such callbacks are useful for organizing simple frame by frame processing of trajectories.

Note
Trajectory_processor class and analysis plugins provide much more powerful way of trajectory processing. Callbacks are the most useful as "quick and dirty" way of doing this in simple scripts.
C++Python
bool callback(System* sys, int frame){
cout << "Called for frame " << frame << endl;
}
...
System sys('some-structure.pdb');
sys.load('some-traj.xtc',0,-1,0,&callback)
1 def callback(sys,frame):
2  print 'Called for frame ',frame
3 
4 ...
5 
6 sys = System('some-structure.pdb')
7 sys.load('some-traj.xtc', on_frame=callback)
Note
When using callback the frames are still accumulated by the system. If you want to store only one frame in the memory use this trick:
C++Python
bool callback(System* sys, int frame){
// Do the job in callback
...
// Delete current frame.
sys->frame_delete(frame,frame);
}
1 def callback(sys,frame):
2  # Do the job in callback
3  ...
4  # Delete current frame.
5  sys.frame_delete(frame,frame)
Please note that this is done by default by analysis plugins.

Using input filters

Sometimes it is not necessary to read all atoms from the data file. For example if the system contains protein in water quite often only the protein is of interest. However, water may constitute 75% of the whole number of atoms and storing them in the System is redundant. This is especially important if many frames have to be kept in memory. In this case filtering the input may save huge amount of memory.

Pteros has the mechanism of input filtering which helps in such situations. The filter is a special kind of Selection which is set to empty system before any loading occures. All subsequent calls to System::load() will be filtered and only selected atoms will be kept in the system.

Filter is set by System::set_filter() method. It takes the same arguments as ordinary selections with two exceptions: filters can't be coordinate-dependent (they will throw an error during loading) and can't be set by callbacs.

Note
Filters do not make loading faster. In fact it becomes slower (up to 2 times for very large frames) and consumes twise as much memory during loading. However, when loading finishes the system will contain only selected atoms. If many frames are stored overall memory consumption will be much smaller. Do not use filters if loading speed is critical!

Filter could only be set to empty system (set_filter() throws an error otherwise), thus the correct way of using them is the following:

C++Python
// Create empty system
System sys;
// Set filter
sys.set_filter("name CA");
// All calls to load will use filter now
sys.load("some_protein.pdb");
sys.load("trajectory.xtc");
1 # Create empty system
2 sys = System()
3 # Set filter
4 sys.set_filter('name CA')
5 # All calls to load will use filter now
6 sys.load('some_protein.pdb')
7 sys.load('trajectory.xtc')
Warning
Filters could not be used for topologies! load() throws an error if attempting to load topology with filter.

Selecting atoms

Selections are the most important objects in Pteros which allow manipulating groups of atoms in the System. It is important to understand that selections do not contain any data but merely point to the set of atoms in the System.

Ways of creating selections

In order to select atoms one need to create Selection class in one of the following ways:

  1. Direct construction of Selection object
    C++Python
    Selection sel(system, <arguments...>);
    1 sel = Selection(system, <arguments...>)
  2. Construction by System::select() method
    C++Python
    Selection sel = system.select(<arguments...>);
    // Or using type inference:
    auto sel = system.select(<arguments...>);
    1 sel = system.select(<arguments...>)
  3. Construction by System::operator() This is exactly the same as the previous one but less verbose.
    C++Python
    Selection sel = system(<arguments...>);
    // Or using type inference:
    auto sel = system(<arguments...>);
    1 sel = system(<arguments...>)

Arguments of selection methods

The arguments passed to selection could be the following (only method 2 is used for illustration but any method takes the same variants of arguments):

  1. Textual selections (using selection language)
    C++Python
    // Defaults to 1st frame in the System
    auto sel = system.select("resname ALA GLY and within 3.0 pbc of resid 23-34 45");
    // Explicitly points to frame #3
    auto sel = system.select("resname ALA GLY and within 3.0 pbc of resid 23-34 45", 3);
    1 # Defaults to 1st frame in the System
    2 sel = system.select('resname ALA GLY and within 3.0 pbc of resid 23-34 45')
    3 # Explicitly points to frame #3
    4 sel = system.select('resname ALA GLY and within 3.0 pbc of resid 23-34 45', 3)
  2. Pair of atom indexes Selects atoms with indexes in the given range (inclusive).
    C++Python
    int ind1 = 10;
    int ind2 = 20;
    sel = system.select(ind1, ind2);
    1 ind1 = 10
    2 ind2 = 20
    3 sel = system.select(ind1, ind2)
  3. Vector of atom indexes
    C++Python
    vector<int> ind = {1,2,3,56,67,100};
    sel = system.select(ind);
    1 ind = [1,2,3,56,67,100]
    2 sel = system.select(ind)
  4. Pair of iterators to integer vector
    Warning
    This is only available in C++.
    vector<int> ind = {5,10,34,1,4,15};
    Selection sel21(sys,ind.begin(),ind.end());
  5. Custom callback function This method allows implementing arbitrary complex logic of selecting atoms by delegating the work to user-provided callback function. The callback function has the following signature:
    C++Python
    void selection_callback(const System& sys, int fr, std::vector<int>& ind);
    1 def selection_callback(sys, fr, ind)
    First argument is the parent System, second - is the target frame, and the third is integer vector, which have to be filled with indexes of selected atoms. The following example shows selecting atoms with x>5. On practice this is easier to implement using textual selections but callback allows implementing arbitrarily complex logic.
    C++Python
    // Callback function
    void sel_func(const System& sys,int fr,std::vector<int>& ind){
    // Just for example we are selecting all atoms with x>5
    ind.clear();
    for(int i=0;i<sys.num_atoms();++i)
    if(sys.XYZ(i,fr)(0)>5.0) ind.push_back(i);
    }
    ...
    System s("struct.pdb");
    // Callback function is called to fill selection
    Selection sel(s, &sel_func);
    // The same but for specific frame #3
    Selection sel(s, &sel_func, 3);
    1 # Callback function
    2 def sel_func(sys,fr,ind):
    3  # Just for example we are selecting all atoms with x>5
    4  ind = []
    5  for i in xrange(sys.num_atoms()):
    6  if sys.getXYZ(i,fr)[0]>5.0:
    7  ind.append(i)
    8 
    9 ...
    10 
    11 s = System('struct.pdb')
    12 # Callback function is called to fill selection
    13 sel = s.select(sel_func)
    14 # The same but for specific frame #3
    15 sel = s.select(sel_func, 3)

Modifying selections

Existing selection objects could be modified in two ways: by changing parent System and by selecting another set of atoms. Parent selection is changed by Selection::set_system() method:

C++Python
System sys1("structure1.pdb");
System sys2("structure2.pdb");
// Create empty selection pointing to sys1
Selection sel(sys1);
// Make it point to sys2
sel.set_system(sys2);
1 sys1 = System('structure1.pdb')
2 sys2 = System('structure2.pdb')
3 # Create empty selection pointing to sys1
4 sel = Selection(sys1)
5 # Make it point to sys2
6 sel.set_system(sys2)
Warning
Selection::set_system() always clears content of selection and leaves it empty even if it contained some data!

In order to modify content of seelction (the set of selected atoms) the number of Selection::modify() methods is present. They accept the same arguments as corresponding constructors of Selection class. Each method also have the variant where new System is passed as the first argument:

  1. Textual selections
    C++Python
    // Changes selected atoms
    sel.modify("resname ALA GLY");
    // The same but also assigns to other system
    sel.modify(other_system, "resname ALA GLY");
    // The same but also changes the target frame to frame 3
    sel.modify(other_system, "resname ALA GLY", 3);
    1 # Changes selected atoms
    2 sel.modify('resname ALA GLY')
    3 # The same but also assigns to other system
    4 sel.modify(other_system, 'resname ALA GLY')
    5 # The same but also changes the target frame to frame 3
    6 sel.modify(other_system, 'resname ALA GLY', 3)
  2. Pair of atom indexes
    C++Python
    int ind1 = 10;
    int ind2 = 20;
    // Changes selected atoms
    sel.modify(ind1, ind2);
    // The same but also assigns to other system
    sel.modify(other_system, ind1, ind2);
    1 ind1 = 10
    2 ind2 = 20
    3 # Changes selected atoms
    4 sel.modify(ind1, ind2)
    5 # The same but also assigns to other system
    6 sel.modify(other_system, ind1, ind2)
  3. Vector of atom indexes
    C++Python
    vector<int> ind = {1,2,3,56,67,100};
    // Changes selected atoms
    sel.modify(ind);
    // The same but also assigns to other system
    sel.modify(other_system, ind);
    1 ind = [1,2,3,56,67,100]
    2 # Changes selected atoms
    3 sel.modify(ind)
    4 # The same but also assigns to other system
    5 sel.modify(other_system, ind)
  4. Pair of iterators to integer vector
    Warning
    This is only available in C++.
    vector<int> ind = {5,10,34,1,4,15};
    // Changes selected atoms
    sel.modify(ind.begin(),ind.end());
    // The same but also assigns to other system
    sel.modify(other_system, ind.begin(),ind.end());
  5. Custom callback function
    C++Python
    // Callback function
    void sel_func(const System& sys,int fr,std::vector<int>& ind){
    // Just for example we are selecting all atoms with x>5
    ind.clear();
    for(int i=0;i<sys.num_atoms();++i)
    if(sys.XYZ(i,fr)(0)>5.0) ind.push_back(i);
    }
    ...
    // Changes selected atoms
    sel.modify(sel_func);
    // The same but also assigns to other system
    sel.modify(other_system, sel_func);
    // The same but also changes the target frame to frame 3
    sel.modify(other_system, sel_func, 3);
    1 # Callback function
    2 def sel_func(sys,fr,ind):
    3  # Just for example we are selecting all atoms with x>5
    4  ind = []
    5  for i in xrange(sys.num_atoms()):
    6  if sys.getXYZ(i,fr)[0]>5.0:
    7  ind.append(i)
    8 
    9 ...
    10 
    11 # Changes selected atoms
    12 sel.modify(sel_func)
    13 # The same but also assigns to other system
    14 sel.modify(other_system, sel_func)
    15 # The same but also changes the target frame to frame 3
    16 sel.modify(other_system, sel_func, 3)

Selecting everything

C++Python
// Using System::select_all() (the most efficient method)
Selection all = system.select_all();
// Using operator "()" with no arguments (the same as above)
all = system();
// Using textual selection (less efficient)
all = Selection("all");
1 # Using System.select_all() (the most efficient method)
2 all = system.select_all()
3 
4 # Using operator "()" with no arguments (the same as above)
5 all = system()
6 
7 # Using textual selection (less efficient)
8 all = Selection('all')

Selection language

Selection language in Pteros is similar but not identical to one of VMD. Simple selections could be copy-pasted from VMD to Pteros and vice versa, while more complex selections would be different. Particularly Pteros provides much more advanced selections which include periodic boundary conditions, which are not possible in VMD.

The basic elements of selection language are the following:

Selecting everything

The keyword "all" selects all atoms in the system.

Keyword selections

Consist of the keyword, which represent property of the atom, followed by one or more values of this property. The values are implicitly combined by logical OR. Depending on the keyword the values are either strings, integers or unsigned integers.

For integers in addition to individual values the ranges are allowed in two forms: 1 to 10 and 1-10. Both forms are equivalent.

For strings regular expressions could be given in single ('') or double ("") quotes. The systnax of the regular expressions is the same as used in C++11.

Keyword Type of values Example
name string name CA CB "C.*"
resname string resname ALA GLU '.N.*'
resid int resid 1 2 3 4-10 20 to 40
resindex unsigned int resindex 1 2 3 5 to 100 200-211
chain single character chain A B C
tag string tag TAG1 TAG2
index unsigned int index 12 13 45 2-7

Numeric properties

Consist of a lone keyword, which represent single numeric property of the atom. Numeric properties could be either integer or floats. They could be combined to numerical expressions and compared by numeric comparisons.

Keyword Property
x X coordinate of the atom
y Y coordinate of the atom
z Z coordinate of the atom
beta B-factor of the atom
occupancy Value of the PDB occupancy field for the atom
index Index of the atom
resindex Residue index of the atom
resid Resid of the atom
distance The distance of atom from given point, line or plane. See description of distance selections for details.
Note
Words index, resindex and resid could be either keywords followed by multiple values or numeric properties depending on the context:
Selection string Interpretation
"resid 1 2 3 4-10" interpreted as keyword selection
"resid > 25" interpreted as numeric property in numerical expression.

Numeric expressions

Numerical expressions consist of numerical properties, integer of float point literals and common arithmetic operators "+", "-", "*", "/". Raising to any integer or fractional power is possible using either "^" or "**" operators. Unary minus could be used to change the sign of expression. Operator precedence could be changed by parentheses. Numerical expressions are only meaningful as operands of numeric comparisons. Single number or single numerical property is also valid numerical expression.

Numeric comparisons

Numerical expressions are compared to each other by the common operators:

Operator Meaning Example
">" Greater than x > 3
"<" Lower than y < 5.6
"=" Equal resid = 45
"<>" Not equal resid <> 14
">=" Greater or equal index >= 15
"<=" Lower or equal resindex <= 100

The comparisons could be chained to select the ranges:

Selection text Meaning
3 < x < 10 Open range
10 > x >= 3 Semi-closed range
100 <= resid <= 200 Closed range

Logical expressions

Common logical operators "or", "and" and "not" are supported and could be used to construct arbitrary logical expressions:

resid 1-10 or x>15
name CA CB and resname ALA
not (name OW and resname SOL and resid > 34)

Within selections

Within selections allow selecting atoms which are within given cutoff distance around other selection, called "central selection". The syntax of within selections in Pteros is the following:

within <distance> [pbc|nopbc|periodic|noperiodic] [self|noself] of <central selection>

Distance is in nm.

Optional keywords "pbc" or "periodic" mean that selection takes into account periodic boundary conditions. Keywords "nopbc" or "noperiodic" means that periodicity is switched off. By default periodicity is off.

The optional keyword "self" means that central selection itself is also included into resulting selection. If "noself" is specified the central selection is excluded from result. The default is "self".

Keywords "pbc|nopbc" and "self|noself" can go in any order.

Here are some examples:

Selection text What is selected
within 3.0 of protein All atoms within 3.0 nm from any protein atom including the protein itself. Not periodic.
within 3.0 pbc of protein Same as above but periodic.
within 3.0 pbc noself of protein Same as above but the protein itself is not included.

By residue selections

The "by residue <central selection>" operator is used to select whole residues, which are referenced by the central selection. This means that if at least one atom of particular residue is present in central selection then the whole residue would be selected.

Warning
The "by residue" operation has lower precedence than logical operators thus its operand have to be put into the parentheses if it contains logical operations: "by residue (name CA and resid 1 4 5)".

Distance selections

Distance selections are unique for Pteros and allow selecting atoms by their distance from given reference point, line or plane. Their syntax is the following:

dist|distance point|vector|plane [pbc|nopbc|periodic|noperiodic] x0 y0 z0 [x1 y1 z1]

The keyword "dist" and "distance" are synonyms. If "point" is specified than three float point numbers x0, y0, z0 are expected. For "vector" and "distance" six float point numbers x0, y0, z0, x1, y1, z1 are expected.

Optional keywords "pbc" or "periodic" mean that selection takes into account periodic boundary conditions. Keywords "nopbc" or "noperiodic" means that periodicity is switched off. By default periodicity is off.

In the case of "dist vector" (x0,y0,z0) give the point in space and (x1,y1,z1) specify direction of the vector originating from this point (direction is normalized automatically). The distance from the atoms to this vector is evaluated.

In the case of "dist plane" (x0,y0,z0) give the point in space and (x1,y1,z1) specify the normal vector of the plane originating in this point (the normal vector is normalized automatically). The distance from the atoms to this plane is evaluated.

Distance keyword operates on per-atom basis and is treated as numeric property of the atom:

Selection text What is selected
dist point 12.4 34.3 45.1 < 3.0 Atoms within 3.0 nm from the point "12.4 34.3 45.1"
dist point pbc 12.4 34.3 45.1 < 3.0 The same as above but periodic
dist vector 2.1 3.3 3.5 1 0 0 < 3.0 Atoms within the cylinder with radius 3 nm and the axis going from point "2.1 3.3 3.5" along X axis (the direction vector is "1 0 0")
3.0 < dist plane 2.1 3.3 3.5 1 1 1 < 5.0 Selects the slabs of atoms which are between 3 and 5 nm from the plane which goes through the point "2.1 3.3 3.5" and has the normal "1 1 1".

Text-based and coordinate-depensent selections

Selections created by means of selection string (text-based selections) are a bit special in Pteros. The set of selected atoms may depends on atomic coordinates (for example for selection x>15 or within 3.0 of y<34). These cases are recognized automatically and such coordinate-dependent selection are treated in special manner.

Coordinate-dependent selections have to be updated if atomic coordinates change. If the coordinates of atoms were changed manually without changing the current frame this have to be done by hand using Selection::apply() method. If selection is not coordinate-dependent it does nothing.

If the user calls Selection::set_frame() to change the frame selection points to then the coordinate-dependent selection calls apply() automatically to be consistent with new atomic coordinates.

It is also possible to re-parse selection text completely by calling Selection::update(). This is necessary when the number of atoms (not only their coordinate) change. For non-textual selections it does nothing.

In most cases the feature of auto-update is very handy and saves a lot of time allowing not to bother about consistency of the coordinate-dependent selections with the current frame. However in certain cases such behavior is not desirable. For example one selects all water molecules around the protein in starting configuration using "resname SOL and within 3.0 of protein" and want to see how they diffuse in the course of MD simulation. Each update of the coordinate frame made by Selection::set_frame() will select new water shell around protein, which is not what we want. In such case one can call Selection::flatten() method. It "flattens" selection to the plain set of indexes without any auto-update magic.

Warning
Flattening is not reversible! New text-based selection have to be created to turn auto-update on again.

Text-based status of selection is also lost immediately after any operation which modifies it externally (i.e. not by means of parsing selection string) such as using append(), remove() or invert(), using any logical operators with selections, etc. Current status of selection could be queries by Selection::text_based() and Selection::coord_dependent() methods.

Sub-selections

Sub-selections allow selecting atoms inside existing selection (narrowing or refining existing selection in other terms). Sub-selections could be very useful in the following situation. Suppose that we need to create separate selections for N,C,CA and CB atoms of particular protein residue. With "normal" selections the following code could be used:

Selection sel_N(sys,"protein and resid 1 and name N");
Selection sel_C(sys,"protein and resid 1 and name C");
Selection sel_CA(sys,"protein and resid 1 and name CA");
Selection sel_CB(sys,"protein and resid 1 and name CB");

The problem with this code is that we are looping over all atoms in the system four times, ones in each selection. This is very inefficient since we only need to find our residue with "protein and resid 1" (one loop over all atoms) and then we need to search inside this residue four times (looping over ~10 atoms only). This problem is not apparent for small systems but becomes very painful for the systems with millions of atoms. Subselections solve this problem:

Selection residue1(sys,"protein and resid 1");
auto sel_N = residue1.select("name N");
auto sel_C = residue1.select("name C");
auto sel_CA = residue1.select("name CA");
auto sel_CB = residue1.select("name CB");

Subselections inherit the system and frame from the parent. The search in sub-selections is performed over selected atoms of the parent only (the only exception from this rule are within selections which involve seacrh over all atoms by design).

Subselections can also accept the pair of local indexes or the vector of local indexes:

Selection residue1(sys,"protein and resid 1");
// Select range of atoms 1-4 inclusive in parent selection
auto sel_1_4 = residue1.select(1,4);
// Select atoms 1,4,7 in parent selection
vector<int> ind = {1,4,7};
auto sel_1_4_7 = residue1.select(ind);
Note
This behavior of sub-selections is different from normal selections. Sub-selections are constructed by local indexes of parent selection, while normal selections - by global indexes in the system. There are no constructors for sub-selections taking iterators or callback functions.

Once sub-selection is created it is not distinguishable from the gnormal one. Particularly it could be modifyed as any other selection, reassigned, etc. Sub-selections are only special by the way of their creation.

Note
Subselections in Python follow exactly the same syntax as in C++.

Combining selections

Selection objects could be combined by logical operations to create new selections:

  • Logical AND

    C++Python
    auto new_sel = sel1 & sel2;
    1 new_sel = sel1 and sel2

  • Logical OR

    C++Python
    auto new_sel = sel1 | sel2;
    1 new_sel = sel1 or sel2

  • Logical NOT

    C++Python
    auto new_sel = ~sel1;
    1 new_sel = not sel2

  • Removing one selection from the other
    C++Python
    auto new_sel = sel1 - sel2;
    1 new_sel = sel2 - sel1
    Operator "-" creates new selection, by removing all atoms of sel2 from sel1. Parent selections are not modified.
    Warning
    This operator is not commutative!

It is also possible to perform logical operation in place without creating new selections:

In-place operation Method
Appending one selection to the other Selection::append()
Removing atoms from one selection from the other Selection::remove()
Invert selection Selection::invert()
Note
append() and remove() methods also work with individual atom indexes.

Accessing properties of selected atoms

If selection contains N atoms then they could be accessed by relative indexes from 0 to N-1 inclusive. Relative indexes are NOT the same as atomic indexes. Atom with global index 150 may have relative index 0 in one selection and 23 in the other. Relative indexes are very convenient in accessing properties of selected atoms. For this Selection class contains the set of accessor methods.

Warning
Accessing atom properties could be rather confusing due to the differences in C++ and Python. These differences are dictated by very different architectures of C++ and Python languages.

Accessor methods in C++

Accessor methos in C++ have the same names as the properties of atoms with the first letter capitalized. For example sel.Index(i) returns global index of atom with relative index i, sel.Name(i) returns name of this atom, sel.Resname(i) returns its residue name, etc.

Accessors X(i), Y(i) and Z(i) give access to individual atomic coordinates of the current frame (the frame pointed by selection). The method XYZ(i) returns the vector of all three coordinates. These accessors also accept frame index as optional second parameter: Z(i,fr), XYZ(i,fr), etc.

There are also accessors for periodic box associated with current frame pointed by selection (Box()) and the time stamp of the pointed frame (Time()). These methods are not available for arbitrary frames.

There is also special accessor VDW(i) which returns the van der Waals radius of i-th atom.

In C++ all of them are accessors inlined l-value functions. This means that they do not add any overhead in accessing atom properties and could be assinged to:

Selection sel(system,"name CA");
// Read access
cout << sel.Name(0) << endl;
// Write access
sel.Name(0) = "CB";
// Accessing coordinates of the current frame
cout << sel.X(0) << endl;
sel.XYZ(0) = sel.XYZ(0) + sel.XYZ(0);
// Accessing coordinates of other frames
cout << sel.X(0,frame) << endl;
sel.XYZ(0,frame) = sel.XYZ(1,frame1) + sel.XYZ(2,frame2);

Accessor methods in Python

Due to language limitations in Python each accessor have to be prepended by "get" or "set" prefixes for read and write access respectively, which makes them rather clumsy and difficult to use:

1 sel = Selection(system, 'name CA')
2 # Read access
3 print sel.getName(0)
4 # Write access
5 sel.setName(0, "CB")
6 
7 # Accessing coordinates of the current frame
8 print sel.getX(0)
9 sel.setXYZ(0, sel.getXYZ(0) + sel.getXYZ(0) )
10 
11 # Accessing coordinates of other frames
12 print sel.getX(0,frame)
13 sel.setXYZ(0, frame, sel.getXYZ(1,frame1) + sel.getXYZ(2,frame2));

Much more covenient way of accessing atom properties in Python is using indexing operator of Selection object (see below).

Indexing operator of Selection object

Selections support indexing by "[]" operator. It returns special Atom_proxy object, which has the same set of atom property accessors as Selection itslef.

In C++ the only difference is that these accessors do not take relative atomic index since Atom_proxy stores it internally.

// Note that accessors do not take atom index as first argument!
Selection sel(system,"name CA");
// Read access
cout << sel[0].Name() << endl;
// Write access
sel[0].Name() = "CB";
// Accessing coordinates of the current frame
cout << sel[0].X() << endl;
sel[0].XYZ() = sel[1].XYZ() + sel[2].XYZ();
// Accessing coordinates of other frame
cout << sel[0].X(frame) << endl;
sel[0].XYZ(frame) = sel[1].XYZ(frame1) + sel[2].XYZ(frame2);

In Python Atom_proxy objects benefit from the Python properties, which makes the usage of indexing much easier than the direct usage of accessors:

1 # Recommended way in Python
2 
3 sel = Selection(system, 'name CA')
4 # Read access
5 print sel[0].name
6 # Write access
7 sel[0].name = "CB"
8 # Note that 'name' is a property, not a method!
9 # It is NOT capitalized and no () is needed!
10 
11 # This also works for coordinates of the CURRENT frame:
12 print sel[0].x, sel[0].xyz
13 sel[0].xyz = sel[1].xyz + sel[2].xyz
14 
15 # BUT if you want to access coordinates of OTHER frame
16 # you HAVE to use ordinary accessors:
17 print sel.getXYZ(0, other_frame)
18 sel.setXYZ(0, frame, sel.getXYZ(1,frame1) + sel.getXYZ(2,frame2));
Warning
Creation of proxy objects makes access by indexing slower than direct usage of accessor methods of Selection. The difference is neglegible in most situations but do not use it in performance-critical parts of the code.

Iterating over selected atoms

There are two ways to iterate over selected atoms - using relative selection indexes and using iterators.

The first variant uses either accessor methods or indexing operator to get atom properties:

C++Python
Selection sel(system, "name CA CB");
// Usage of accessor methods for each atom property
for(int i=0; i< sel.size(); ++i){
cout << sel.Name(i) << " " << sel.Resname(i) << endl;
}
// Usage of indexing operator (slower than property accessors and not recommended in C++!)
for(int i=0; i< sel.size(); ++i){
cout << sel[i].name() << " " << sel[i].resname() << endl;
}
1 sel = Selection(system, 'name CA CB')
2 
3 # Usage of accessor methods for each atom property
4 # (clumsy syntax in Python, not reccomended despite being a bit faster)
5 for i in xrange(sel.size()):
6  print sel.getName(i), sel.getResname(i)
7 
8 # Usage of indexing operator (still not the best way in Python, see below)
9 for i in xrange(sel.size()):
10  print sel[i].name, sel[i].resname

The second way is using built-in iterators of Selection objects:

C++Python
Selection sel(system, "name CA CB");
// Usage of explicit iterators
Selection::iterator it;
for(it=sel.begin(); it!=sel.end(); it++){
cout << it->name() << " " << it->resname() << endl;
}
// Usage of range-based for loop in C++11 and higher
for(auto& a: sel){
cout << a.name() << " " << a.resname() << endl;
}
1 sel = Selection(system, 'name CA CB')
2 
3 # Recommended way in Python
4 for a in sel:
5  print a.name, a.resname

Getting particular property of all selected atoms

There is a set of methods allowing to get particular property of all selected atoms at the same time. These methods have the names "get_<property>()" and "set_<property>()" where property is name, resname, resid, beta, x, y, z, xyz, etc. In C++ getters return either std::vector for strings and char properties or Eigen::Vector for integer and float properties. Setters accept the same vectors as corresponding getters return. It is also possible to call setterss with simgle value which will "fill" all selected atoms. In Python the lists are used for strings and chars and numpy arrays for ints and floats:

C++Python
auto resids = sel.get_resid();
resids[5] += 4;
sel.set_resid(resids);
// Set all atom names to CA
sel.set_name("CA");
1 resids = sel.get_resid()
2 resids[5] += 4
3 sel.set_resid(resids)
4 
5 # Set all atom names to CA
6 sel.set_name('CA')

There are also methods "get_unique_<property>()" which return the vector of unique properties in selection. For example in order to get the list of all residue ids in selection:

C++Python
sel = system.select_all();
auto resids = sel.get_resid();
// May return 1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,...
resids = sel.get_unique_resid();
// Returns 1,2,3,4,5,...
1 sel = system.select_all()
2 resids = sel.get_resid()
3 #May return 1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,...
4 resids = sel.get_unique_resid()
5 # Returns 1,2,3,4,5,...

Building molecular systems

Pteros has reach set of methods for building molecular systems. They are not oriented on particular type of molecules like proteins or nucleic acids but allow manipulating atoms and selections in general.

Adding and deleting atoms

Appending and removing systems and selections

Multiplying selections

Rearranging atoms

There is a common problem of rearranging the atoms in the system in particular order, which is especially important for preparing MD topologies. In Pteros this is easily acomplished with System::rearrange() method. It takes a vector of selections or the vector of selection strings and arranges the atoms in the order of these selections. The atoms, which do not belong to any of provided selections are placed at the end in their initial order.

Warning
Selections should not overlap (an exception is thrown if they are). Empty selections are not allowed.
Note
This method rearranges the atom indexes only (shuffles the order of atoms). Residue or chain ids are not changed. If you need to reorder atoms by residues look at System::sort_by_resindex() method.
Warning
This method affects all frames and could be slow for very large systems and and/or large number of frames.

Working with periodicity

Pteros is designed to work with arbitrary triclinic periodic boxes. There are several ways of accounting for periodicity:

  • Many methods, which are aware of periodicity, have an optional boolean parameter "periodic" that switches on periodic computations.
  • Periodicity could be taken into account explicitly by using Periodic_box class and its methods.

Upon reading the data files Pteros sets the periodic box accordingly in the System if it is present in the data file.

Getting, setting and modifying periodic box

Periodic box could be extracted for each frame by using System::Box() method. Here is how to check if the periodic box is present:

C++Python
System s("some_file.pdb");
if( s.Box(0).is_priodic() ) cout << "Box present for frame 0!" << endl;
1 s = System('some_file.pdb')
2 if s.getBox(0).is_periodic():
3  print 'Box present for frame 0!'

An important feature of the Periodic_box is that it stores some internal data (like inverse tranform matrices) and thus individual components of the box are immutable. In order to change the box you have to get it as a 3x3 matrix, edit this matrix and assign it back using Periodic_box::modify() method:

C++Python
System s("some_file.pdb");
// Get a box as matrix. Column 0 is vector a, column 1 is b, column 2 is c.
auto m = s.Box(0).get_matrix();
// Modify individual box element
m(0,1) *= 2;
// Set it back
s.Box(0).modify(m);
1 s = System('some_file.pdb')
2 # Get a box as matrix. Column 0 is vector a, column 1 is b, column 2 is c.
3 box = s.getBox(0).get_matrix()
4 # Modify individual matrix element
5 m[0][1] *= 2
6 # Set modified box object to the System
7 s.getBox(0).modify(m)

You can also get lengths of individual box vectors with Periodic_box::get_extent() or the lengths of all vectors with Periodic_box::get_extents(). It is also possible to extract an individual vector directly with Periodic_box::get_vector().

One common operation is scaling the peridic box without changing the angles between the vectors. This could be achieved with convenience function Periodic_box::scale_vectors():

C++Python
System s("some_file.pdb");
// Scale all box vectors by x2
vector<float> scale = {2,2,2};
s.Box(0).scale_vectors(scale);
1 s = System('some_file.pdb')
2 # Scale all box vectors by x2
3 s.getBox(0).scale_vectors([2,2,2])

It is also possible to get and set the box as a set of vector lengths (a,b,c) and the set of angles between the vectors (a^c, b^c, a^b). This is performed by the Periodic_box::to_vectors_angles() and Periodic_box::from_vectors_angles() methods:

C++Python
System s("some_file.pdb");
vector<float> vecs, angles;
// Get as vectors and angles
s.Box(0).to_vectors_angles(vecs,angles);
// Modify a^b angle
angles[2] *= 2;
// Set it back
s.Box(0).from_vectors_angles(vecs,angles);
1 System s('some_file.pdb')
2 # Get as vectors and angles
3 vecs,angles = s.getBox(0).to_vectors_angles()
4 # Modify a^b angle
5 angles[2] *= 2
6 # Set it back
7 s.getBox(0).from_vectors_angles(vecs,angles)
Note
The order of angles (a^c, b^c, a^b) is the same as in PDB CRYST1 record.

Wrapping and unwrapping

There are several ways to wrap atoms or points into the box:

Note
All these methods take an optional parameter "dims" which tells which dimentions to wrap.

In contrast to wrapping unwrapping is non-trivial operation which could be done in different ways. Pteros supports two unwrapping methods:

  • Simple unwrapping is performed by Selection::unwrap(). It uses the center of masses of selection as an "anchor" and sets all atoms of selection to their periodic images, which are closest to this anchor.
    Warning
    This method is very fast but only works if selection does not exceed 1/2 of the box size in any dimension! Otherwise it produces unpredictable results.
  • Unwrapping by bonds is performed by Selection::unwrap_bonds(). In this method the user supplies the maximal bond length and all atoms of selections which are within this distance are considered as connected by bonds. After that first atom in selection is set as an anchor (the anchor atom could be overriden by optional "leading_index" parameter) and all atoms connected to it are moved to their corresponding closest periodic images. The procedure is continued for all connected atoms.
    Warning
    This mehtod works for any size of selection but is much slower than simple unwrapping!

Periodic distances and closest images

In order to compute the periodic distances the following methods could be used:

Periodic_box::shortest_vector() computes the shortest vector between two arbitrary points. Periodic_box::get_closest_image() returns the periodic image of given point, which is closest to the anchor point.

Note
All these methods take an optional parameter "dims" which tells which dimentions to treat as periodic.

Fast distance search and spatial grids

Pteros uses fast grid search algorithms for finding distances between the atoms. There are several variants of the distance seacrh which return either pairs of atoms within given cut-off distance or the list of atoms within given distance from point in space or other set of atoms.

Note
Within selections also use distance search algorithms under the hood.

Different forms of distance search

Searching contacts within single selection

Use this function to search all contacts between the atoms within single selection.

Warning
In Python the signature of this function is different:
1 (pairs,distances) search_contacts(cut_off, sel, absolute_index = False, periodic = False, do_dist = False)
Instead of taking pairs and distance vectors as reference parameters they are returned as a tuple, which Pythonic way of handling multiple return values. If the last parameter "do_dist" is "True" then distances are filled with actual values, otherwise second member of return tuple is "None".
C++Python
// Vector for found pairs
vector<Vector2i> pairs;
// Cut-off distance in nm
float cut_off = 3.5;
// Non-periodic search returning local indexes
search_contacts(cut_off,sel,pairs);
// Periodic search returning global indexes
search_contacts(cut_off,sel,pairs,true,true);
// Same as above but also return distances for each pair
vector<float> distances;
search_contacts(cut_off,sel,pairs,true,true,&distances);
1 # Cut-off distance in nm
2 cut_off = 3.5
3 
4 # In Python search_contacts() *always*
5 # returns a tuple (pairs,distances)
6 # If no distances are requested the 'distances'
7 # would be equal to None!
8 
9 # Non-periodic search returning local indexes
10 # We only take first member of the tuple (the pairs)
11 pairs = search_contacts(cut_off,sel)[0]
12 
13 # Periodic search returning global indexes
14 # We only take first member of the tuple (the pairs)
15 pairs = search_contacts(cut_off,sel,True,True)[0]
16 
17 # Same as above but also return distances for each pair
18 # Passing True in last argument instruct to fill distances with
19 # values instead of returning it as None.
20 pairs,distances = search_contacts(cut_off,sel,True,True,True)

Searching contacts between two selections

Use this function to search the contacts between two different selections.

Warning
In Python the signature of this function is different:
1 (pairs,distances) search_contacts(cut_off, sel1, sel2, absolute_index = False, periodic = False, do_dist = False)
Instead of taking pairs and distance vectors as reference parameters they are returned as a tuple, which Pythonic way of handling multiple return values. If the last parameter "do_dist" is "True" then distances are filled with actual values, otherwise second member of return tuple is "None".
C++Python
// Vector for found pairs
vector<Vector2i> pairs;
// Cut-off distance in nm
float cut_off = 3.5;
// Non-periodic search returning local indexes
search_contacts(cut_off,sel1,sel2,pairs);
// Periodic search returning global indexes
search_contacts(cut_off,sel1,sel2,pairs,true,true);
// Same as above but also return distances for each pair
vector<float> distances;
search_contacts(cut_off,sel1,sel2,pairs,true,true,&distances);
1 # Cut-off distance in nm
2 cut_off = 3.5
3 
4 # In Python search_contacts() *always*
5 # returns a tuple (pairs,distances)
6 # If no distances are requested the 'distances'
7 # would be equal to None!
8 
9 # Non-periodic search returning local indexes
10 # We only take first member of the tuple (the pairs)
11 pairs = search_contacts(cut_off,sel1,sel2)[0]
12 
13 # Periodic search returning global indexes
14 # We only take first member of the tuple (the pairs)
15 pairs = search_contacts(cut_off,sel1,sel2,True,True)[0]
16 
17 # Same as above but also return distances for each pair
18 # Passing True in last argument instruct to fill distances with
19 # values instead of returning it as None.
20 pairs,distances = search_contacts(cut_off,sel1,sel2,True,True,True)

Searching around given selection

Use pteros::search_within() function to search all atoms from source selection which are within given distance from any atom of target selection.

Warning
In Python the signature of this function is different:
1 indexes = search_contacts(cut_off, source, target, include_self = True, periodic = False)
Indexes are returned instead of being passed by reference, which is Pythonic way of handling this situation.
C++Python
// Source selection is water
auto source_sel = system("resname SOL");
// Target selection is protein
auto target_sel = system("protein");
// Vector of found atom indexes
// Absolute indexes are returned!
vector<int> result;
float cut_off = 3.5; // in nm
// Find all atoms of source in 3.5 nm shell around target
// not including target itself accounting for periodicity
search_within(cut_off, source_sel, target_sel, result, false, true);
1 # Source selection is water
2 source_sel = system('resname SOL')
3 # Target selection is protein
4 target_sel = system('protein')
5 
6 cut_off = 3.5 # in nm
7 
8 # Find all atoms of source in 3.nm 5 shell around target
9 # not including target itself accounting for periodicity
10 # Absolute indexes are returned!
11 result = search_within(cut_off, source_sel, target_sel, False, True)

Repeated searching around multiple targets

There is a common situation when one needs to find, for example, water molecules around multiple protein residues or around head group of each lipid in the membrane. In this scenario the source selection (water) is always the same, while the target selection changes and multiple searches have to be done. Usage of pteros::search_within() function is not optimal in this case bacause it will redundantly distribute source atoms to the same grid multiple times. In this case the Distance_search_within provides much better performance. This class is initialized with source selection and then allows to search around different targets efficiently multple times. It also allows searching around fixed points in space.

C++Python
// Source selection for water
auto water = system("resname SOL");
// Multiple target selections for 20 protein residues
vector<Selection> residues;
for(int i=1; i<20; i++){
residues[i].modify(system,"protein and resid "+to_string(i));
}
// Vector of vectors for results
vector<vector<int>> results(20);
// Initialise for periodic search and to return local indexes
Distance_search_within seacrher(3.5, water, false, true);
// Search for each residue
for(int i=1; i<20; i++){
searcher.seacrh_within(residues[i], results[i]);
}
1 # Source selection for water
2 water = system('resname SOL')
3 # Multiple target selections for 20 protein residues
4 residues = []
5 for i in xrange(20):
6  sel = system("protein and resid %i" % (i+1))
7  residues.append(sel);
8 
9 # Storage for results
10 results = []
11 
12 # Initialise for periodic search and to return local indexes
13 searcher = Distance_search_within(3.5, water, False, True)
14 
15 # Search for each residue
16 for i in xrange(20):
17  res = searcher.seacrh_within(residues[i+1]);
18  results.append(res);

Custom spatial grids

Sometimes one does not need to do a distance searcg but want to distribute the atoms from given selection into spatial grid. This is useful for computing 3D density distributions, occupancy volumetric hystograms, velocity fields, etc. This task is perfromed by Grid class.

Note
Grid is used internally by all distance search functions, thus it is rather low-level class. Particularly grid cells contain raw pointers to atomic coordinates, which are treated in a complex way internally. In periodic grids they may point not to actual atoms but to temporary wrapped coordinates.
Warning
Grid is currently C++ only.
// Create 100x100x100 grid
int N = 100;
Grid g(N,N,N);
// Populate it from given selection
// in periodic manner
g.populate_periodic(sel);
// Print number of atoms in the grid cells
for(int i=0;i<N;++i){
for(int j=0;j<N;++j)
for(int k=0;k<N;++k)
cout << g.cell(i,j,k).size() << endl;

Grid cells are of type vector<Grid_element> and contain information about indexes and coordinates of atoms in each cell.

Evaluating non-bond energies

Pteros is able to compute the short-range non-bond energies if topology file is read for the system. Currently Pteros supports special .pttop format of topology files, which have to be generated from Gromacs .tpr run input files using "tpr2pteros.py" script.

Warning
Long-range electrostatics with PME or other lattice sum methods are not supported in Pteros.

Treatment of non-bonded interactions in Gromacs is, unfortunately, complex and rather confusing. Coulomb and van-der-Waals interactions have a type and a modifier. The type is the functional form of interaction and the modifier is a special function, which is used to avoid the discontinuity of derivatives at cut-off distance. In practice different combinations of types and modifiers in Gromacs .mdp files lead to non-trivial and confusing choises of actual interaction functions. Pteros is trying to match these complex rule as close as possible but it is not guaranteed that the results would be numerically identical.

Note
Pteros is known to produce correct results for MARTINI coarse grained force field and simple cut-off interactions.

Generating Pteros .pttop files

Call "tpr2pteros.py" script to convernt Gromacs .tpr binary file to Pteros .pttop topology. The script is doing simple caching - if .tpr file did not change since last call of the script it will do nothing. Currently Pteros uses the same options for short-range energies as set in parent Gromacs .tpr file.

Note
This requires working Gromacs installation.

Energy components

All energy evaluation methods return an object of type Energy_components which contain separate fields for different energy components:

Field Energy component
total Total energy
lj_sr Short-ranhe Lennard-Jones energy
lj_14 Lennard-Jones energy of 1-4 pairs
q_sr Short-ranhe Coulomb energy
q_14 Coulomb energy of 1-4 pairs

Energy_components objects could be added to each other with "+" and "+=" operators.

Computing non-bond energies

Before evaluating non-bond energies the user should always check if the System has correctly set force field parameters by calling System::force_field_ready(). If it returns true than it is safe to call energy functions.

Warning
The Pteros will crash if calling energy functions on system without force field (i.e. without any topology loaded). No check is made in energy functions themselves for efficiency reasons.

Methods of System class

The method System::non_bond_energy() has two forms which evaluate the energy between two atoms or in the list of given atom pairs:

C++Python
System s("some-structure.pdb");
s.load("topology.pttop"); // Required!
// Check if system is ready for energy evaluation
if(s.force_field_ready()){
// (1) ENERGY BETWEEN TWO ATOMS
// Energy between atoms 1 and 2 for frame 0
auto en12 = s.non_bond_energy(1,2,0);
// The same but accounting for periodicity
auto en12pbc = s.non_bond_energy(1,2,0,true);
// Print total energies
cout << en12.total << " " << en12pbc.total << endl;
// (2) ENERGY FOR THE LIST OF ATOM PAIRS
// The list of atom pairs
vector<Vector2i> lst = ...;
// Non-periodic variant for frame 0
auto en = s.non_bond_energy(lst,0);
// The same but accounting for periodicity
auto en_pbc = s.non_bond_energy(lst,0,true);
// Print total energies
cout << en.total << " " << en_pbc.total << endl;
}
1 s = System('some-structure.pdb')
2 s.load('topology.pttop') # Required!
3 
4 # Check if system is ready for energy evaluation
5 if s.force_field_ready():
6  # (1) ENERGY BETWEEN TWO ATOMS
7 
8  # Energy between atoms 1 and 2 for frame 0
9  en12 = s.non_bond_energy(1,2,0)
10  # The same but accounting for periodicity
11  en12pbc = s.non_bond_energy(1,2,0,True)
12  # Print total energies
13  print en12.total, en12pbc.total
14 
15  # (2) ENERGY FOR THE LIST OF ATOM PAIRS
16 
17  # The list of atom pairs
18  lst = [[1,2],[3,4],[5,6]]
19  # Non-periodic variant for frame 0
20  en = s.non_bond_energy(lst,0)
21  # The same but accounting for periodicity
22  en_pbc = s.non_bond_energy(lst,0,True)
23  # Print total energies
24  print en.total, en_pbc.total

Methods of Selection class

Selection class provides two higher-level methods of energy evaluation. The first one computes self-energy of selection within given distance cut-off. The second computes the energy between two selection of the same size within given cut-off for given frame.

Note
If cutoff is zero or negative both methods computes interaction between all atoms (very slow for large selections).
C++Python
Selection sel1(sys,"name CA");
Selection sel2(sys,"name CB");
// Check if it is safe to compute energies
if(sys.force_field_ready()){
// Energy of all atom pairs within 0.3 nm accounting for periodicity
auto self_en = sel1.non_bond_energy(0.3, true);
// Interaction between two selections within 0.3 nm for frame 3
// and accounting for periodicity
auto en12 = non_bond_energy(sel1, sel2, 0.3, 3, true);
}
1 sel1 = sys("name CA")
2 sel2 = sys("name CB")
3 
4 # Check if it is safe to compute energies
5 if sys.force_field_ready():
6  # Energy of all atom pairs within 0.3 nm accounting for periodicity
7  self_en = sel1.non_bond_energy(0.3, True)
8 
9  # Interaction between two selections within 0.3 nm for frame 3
10  # and accounting for periodicity
11  en12 = non_bond_energy(sel1, sel2, 0.3, 3, True)

Secondary structure of proteins

Pteros uses DSSP method for determining secondary structure of proteins. There three methods Selection::dssp(). They write detailed DSSP report to file or to stream if file name of stream object are provided as an argument. If called without arguments the DSSP string is returned with letter-code for secondary structure. The code is the same as in original DSSP program:

Structure code
alphahelix H
betabridge B
strand E
helix_3 G
helix_5 I
turn T
bend S
loop ' ' (space)
Note
System::dssp() always works with the whole system, not with selection. This is caused by the fact that the algorithm have to see all atoms of the protein backbone in all protein residues while user can easily provide selection where only some of them are present.

Solvent accessible surface area

Solvent accessible surface area (SASA) is computed in Pteros using using POWERSASA code developed in the Karlsruhe Institute of Technology and inclided into SIMONA package. This is fast algorithm of computing area of the molecules based on rather complex theory of power diagrams (paper). The method Selection::sasa() returns the solvent accessible area of atoms in selection. Optional argument could be used to set the probe radius and tu get additional data like total volume, area per atom and volume per atom.

Note
POWERSASA code is not open source. I contacted the authors of POWERSASA several times to ask for official permision to use their code in Pteros but all requestes were ignored. I concluded that nobody is concerned about the licensing of POWERSASA now. However, compilation of this code is disabled by default. See Enabling SASA code for details how to enable it.