Pteros  2.0
Molecular modeling library for human beings!

Getting started

Suppose that the Pteros library is compiled and all linking requirements are satisfied. In order to start using Pteros in you program you need single include statement:

#include "pteros/pteros.h"

This will include basic classes of the library. All classes and function of the Pteros library are defined inside "pteros" namespace. The following line will allow to omit repetitive "pteros::" prefix in your program:

using namespace pteros;
More advanced usage requires additional headers. For example: #include "pteros/core/mol_file.h"

The fundamental objects in Pteros are systems and selections. System is what its name suggests - the whole molecular system with atoms and their coordinates. The attributes and the coordinates of atoms are physically stored in the system. Typically the system is loaded from one or several files (such as PDB or GRO). The system can contain several sets of coordinates, called frames, which are typically loaded from trajectory files of MD simulations. The system can be created in two ways:

// 1)
System sys1; // Empty system
sys1.load("2lao.pdb"); // Load data into system
// 2)
System sys2("2lao.pdb"); // Read the structure file into the system immediately

The method load has reach set of additional options and could be called several times to add difference pieces of data from several structure and trajectory files. See Advanced loading with file handlers for details.

Atoms and frames are stored inside the system are usually not accessed directly. The system is only a container for them, while all manipulations are done by means of selections.


Selection is a subset of atoms in the System. Selection does not hold any data, but merely point to particular group of atoms. There are several ways of creating a selection - from textual description, from the sequence of indexes or using custom selection functions. Every selection is associated with one and only one system. You can't select atoms from several systems simultaneously.

Textual selections

In order to create a selection from the textual dexcription you must supply the parent system and selection string. The selection syntax in Pteros is very similar to one used in VMD.

Instead of giving long and boring formal description of the selection syntax, let's learn it by example:

// Create a system
System sys("some-protein.pdb");
// Select everything
Selection sel0(sys,"all");
// Select by atom name(s)
Selection sel1(sys,"name CA");
Selection sel2(sys,"name CA CB OA"); // Selects all CA, CB and OA atoms
// Select by residue number. You can use ranges in two forms with "-" and with "to".
Selection sel3(sys,"resid 3"); // single residue number 3
Selection sel4(sys,"resid 1-25"); // Residues from 1 to 25 inclusive
Selection sel5(sys,"resid 100 1-25 200 to 206"); // Residues from 1 to 25, from 200 to 206 and also residue 100
// Select by atom index (starts from zero). You can also use ranges.
Selection sel6(sys,"index 3");
Selection sel7(sys,"index 20 5 100-600 700 9 0 2");
// Select by residue name(s)
Selection sel8(sys,"resname ALA"); // Selects all ALA residues
Selection sel9(sys,"resname THR MET"); // Selects all THR and MET residues
// Select by chain(s)
Selection sel10(sys,"chain A B"); // Selects chains A and B
// Select by coordinates of atoms (arbitrary arithmetic expressions are supported)
Selection sel11(sys,"x<10 and y>z*4.5+x");
// This will select all atoms inside a cylinder aligned with Z axis with a radius of 2 Angstroms:
Selection sel12(sys,"x^2+y^2 < 2^2");
// Arbitraraly complex logic with AND, OR and NOT could be used
Selection sel13(sys,"((name CA and resid 10 to 100) or (name CB and resname ALA)) or x^2>y^2");
Selection sel14(sys,"not (name CA and resid 10-100)");
// Select all atoms, which are within given distance (in nm!) from another selection
Selection sel15(sys,"within 2.5 of (name CA and resname ALA)");
// The same as above, but takes care of periodic boundary conditions (try doing this in VMD. Good luck :) )
Selection sel15periodic(sys,"within 2.5 periodic of (name CA and resname ALA)");
// Instead of "periodic" keyword you can use shorter "pbc". "nopbc" and "noperiodic" could also be used but they are used by default and could be omitted.
// Within also takes "noself" keyword which excludes the central selection
// (those after "of") from result:
Selection sel15noself(sys,"water and within 0.7 pbc noself of protein"); // Select water around protein without protein itself
// Select whole residue if at least one atom from this residue is found in selection, enclosed into "()"
Selection sel16(sys,"by residue (name CA and resname ALA)");
// Select using regular expressions. Will select all atoms with names staring with C - CA, CB, C1H, etc.
Selection sel17(sys, "name 'C.*'");
// Select atoms within 0.5 nm of the point in space with coordinates {1.12, 4.56, 6.7}:
Selection sel18(sys,"distance point 1.12 4.56 6.7 < 0.5");
// Select atoms within 0.5 nm of the line, specified by two points {1.12, 4.56, 6.7} and {3.5 5.6 10.1}:
Selection sel19(sys,"distance vector 1.12 4.56 6.7 3.5 5.6 10.1 < 0.5");
// Select atoms within 0.5 nm of the plane, specified by the point {1.12, 4.56, 6.7} and the normal vector {0 1 1}
// accounting for periodic boundary conditions (pbc is handy equivalent to periodic):
Selection sel19(sys,"distance pbc plane 1.12 4.56 6.7 0 1 1 < 0.5");

Textual selections are "smart" in a way that selection text is analysed and optimized in numerous ways before evaluation. If selection depends on the coordinates of atoms it updates automatically if the coordinates change:

// Load structure
System s("some-protein.pdb");
// Load trajectory with multiple frames
// Create selection depending on coordinates
Selection sel(s,"x>10"); // By default points to frame 0
// Now points to frame 1. Coordinates of atoms change but selection
// is smart and updates automatically!

Non-textual selections

It is also possible to make a selection from the pair of indexes or the pair of iterators of some integer sequence:

int ind1 = 10;
int ind2 = 20;
Selection sel20(sys,ind1,ind2); // Selects all indexes from 10 to 20 inclusive
vector<int> ind = {5,10,34,1,4,15};
Selection sel21(sys,ind.begin(),ind.end()); // Select by iterators to integer container

Finally, if you want to implement really complex logic of selecting atoms you can use selection with callbacks:

// Create a callback function which includes complex selection logic
// It takes a system, a number of target frame and the vector of selected indexes
// which have to be filled. These indexes point to selected atoms
void sel_func(const System& sys,int fr,std::vector<int>& ind){
// Some complex logic which fills ind with values goes here
// Just for example we selecting all atoms with x>5
for(int i=0;i<sys.num_atoms();++i)
if(sys.XYZ(i,fr)(0)>5.0) ind.push_back(i);
System s("struct.pdb");
Selection sel(s, &sel_func); // Callback function is called to fill selection

Modifying selections

You can also create empty selections and populate them later using modify() methods:

Selection sel2; // Empty selection, not associated with the system
Selection sel3(sys); // Selection bounded to system sys, but selection text is not yet specified
// Now populate these selections
sel2.set_system(sys); // Set system first
sel2.modify("name CA");
sel3.modify("name CB"); // The system was set already
// You can reassign selection to another system if you want:
sel3.set_system(other_sys); // This clears selection...
sel3.modify("name CB"); // we need to create it again

Different modify() methods exist, which correspond to other types of selections - for the pair of indexes, for the pair of iterators, for callback function, etc.

Copying and assigning selections

Selections could be copyed and assigned, particularly it is possible to place them to STL containers:

vector<Selection> vec;
Selection sel(sys,"all");
// Make a vector of 10 empty selections
vector<Selection> vec10(10);
// Populate some of them
vec10[5].modify("resname ALA");
vec10[6].modify("not name O");

If assigning one selection to another, the deep copy of the selection (not just a reference!) is created.

Selection s1(sys);
Selection s2(sys);
s1 = s2; // s1 is a deep copy of s2. Modifications to s1 do not change s2.

The systems are also copyable, but with one important twist - associated selections are not copyed with the parent system.

Manipulating selections

We can do a lot of different things with selections. Let's start from obtaining residue names of all selected atoms as an STL vector:

vector<string> res_names = sel1.get_resname();

We can also obtain any property of particular selected atom with very simple syntax. For example let's print the chain and the resid of the first atom in selection:

cout << sel1.Chain(0) << endl;
cout << sel1.Resid(0) << endl;

Note that first atom in selection is not the first atom in the system! If you want to know the index of this atom in the system you should do

cout << sel1.Index(0) << endl; //May print "1328". The first atom in selection is in fact the atom 1328 in the system

In fact the code "sel1.Chain(0)" above is an equivalent of verbose expression

// Just an example, will not compile!
cout << sys.atoms[sel1.index[0]].chain;

This fragment will not compile because Selection::index is private, but in any case the shorthand function Selection::Chain() simplifies the things a lot. Such shorthand functions are inlined, thus in principle there is no performance loss. Other atom attributes could also be accessed by means of such functions with the name, which coincide with the attribute name, but with the capital first letter (Name(i), Chain(i), Index(i), X(i), etc.). The main attributes of the atoms are:

  • name - the name of atom in PDB file (such as "CA")
  • resid - the number of residue (an integer). Unique within each chain.
  • resindex - Unique index of the residue in the whole system, even is multiple chains are present.
  • resname - the name of the residue in 3-letters code (such as "ALA" or "GLY")
  • chain - the chain in PDB file (single character, such as "A")
  • tag - arbitrary textual tag (often called "segment" in CHARMM, NAMD or VMD)
  • mass - the mass of the atom in atomic units
  • charge - atomic charge (only assigned correctly if MD topology is loaded)
  • beta - the B-factor in PDB file
  • occupancy - the occupancy in PDB file

Now let's play with the coordinates of atoms. First of all let's load molecular dynamics trajectory into the system:

System s("some-protein.pdb");
Selection sel1("name CA");

The trajectory should contain the same number of atoms as the system. The XTC, TRR and DCD trajectory files are now supportd.

It is also possible to read only certain portion of trajectory, say between frames 10 and 100:

System s("some-protein.pdb");

Selections are always born pointing to the frame 0 (frame count starts from zero). Let's make selection point to the frame 3:


Now we can obtain the coordinate of particular atom i for the frame 3:

float x_coord = sel1.X(i);

or the coordinates of all atoms in selection for frame 3 as:

vector<float> all_x_coords = sel1.get_x();

It is also possible to get coordinates of any frame by supplying second parameter:

// Copy all coordinates of atom 10 for frame 3 to atom 20 for frame 5
sel1.XYZ(20,5) = sel1.XYZ(10,3);

Another way of getting the properties of atoms and coordinates in selection is using the indexing syntax:

cout << sel1[0].Resid() << " " << sel1[3].XYZ(10) << endl;

This code will output resid of the atom 0 and the coordinates of atom 3 for frame 10. Usually indexing syntax is less convenient and more verbose, however it has an big advantage of working in the iterator-based or range-based loops:

// Iterator-based loop:
for(Selection::iterator it=sel1.begin(); it!=sel1.end(); it++){
cout << it->Resid() << endl;
// Range-based C++11 for loop:
for(auto& at: sel1){
cout << at.Resid() << endl;

One can also duplicate frames, copy one frame to the other and delete frames. Note that this is done by the methods of System class:

System sys("some-protein.pdb");
// Duplicate frame 4 (copy becomes the last frame)
// Copy coordinates of this duplicated frame to frame 1
sys.frame_copy(sys.num_frames()-1, 1); // Note the usage of num_frames() to get the nuber of frames in the system

Geometry transformations

Pteros provides reach set of geometry transformation functions. Transformation applied to selection will immediately take effect on all selections, which overlap with given selection. Let's look at some examples:

// Translate selection by given vector
Vector3f dir(1.0, 3.4, -4.5);
// Rotate selection around axis X (axis 0) by some angle (in radians) relative to the center of masses
// Rotate selection around axis Y (axis 1) by some angle (in radians) relative to the given pivot point
Vector3f pivot(10.0, 20.0, 30.0);
// Rotate selection around given vector, by some angle and relative to given point
Vector3f axis(0.0, 4.0, -2.0);

RMS fitting and alignment

It is very easy to compute the RMSD between two selection of the same size (they can belong to different systems):

Selection sel1(sys1,"name CA");
Selection sel2(sys2,"name CB");
cout << "RMSD=" << rmsd(sel1,sel2) << endl;

We can also do this for arbitrary frames:

Selection sel1(sys1,"name CA");
Selection sel2(sys2,"name CB");
// RMSD between frame 0 of the first selection and frame 1 of the second one
cout << "RMSD=" << rmsd(sel1, 0, sel2, 1) << endl;

It is possible to compute RMSD for different frame of the same selection:

Selection sel1(sys1,"name CA");
// RMSD between frames 0 and 1
cout << "RMSD=" << sel1.rmsd(0,1) << endl;

In order to do RMSD fitting of two selections of the same size it enough to write:

Selection sel1(sys1,"name CA");
Selection sel2(sys2,"name CB");
// Fit selection sel1 to sel2

However, the most common situation is when you are fitting together, say, Ca atoms, but need to rotate the whole protein according to this fitting. This is accomplished by computing fitting transformation first and then applying it:

Selection sel1(sys,"name CA");
// Compute a fit transform for fitting frame 1 to frame 3
Affine3f trans = sel1.fit_transform(1,3);
// And apply it to the whole protein
// The perevious line shows how to create a temporary selection "on the fly".

Python bindings

Although Pteros is a C++ library, many molecular analysis tasks require writing simple "throw-away" scripts without edit-compile-run overhead of C++. Python bindings serve this purpose in Pteros. In addition to this end-user application Python bindings are also vital part of the Very-high-level facilities system.

Bindings are described in a dedicated documentation page.

High-level facilities

Although System and Selection classes already provide quite high-level tools for building custom analysis programs, Pteros contains even more advanced facilities for rapid implementation of complex analysis algorithms. When you build your custom analysis program, it is usually painful to implement the following things:

  • Read only specified range of frames from trajectory based on time or frame number.
  • Read the trajectory, stored by pieces in several files.
  • Read very large trajectory, which doen't fit into the memory frame by frame.
  • Implement parallel execution of several analysis tasks, to keep all processor cores busy.
  • Implement processing of the command line arguments, which set all options of trajectory processing and represent custom flags for your analysis.

It is necessary to emphasize an importance of parallel processing. MD trajectories are often huge (up to ~100Gb) and reading them from disk tipically takes many minutes, especially if the storage is not local. If you have 5 different anaysis tasks, which should be applied to the same trajectory it is very wasteful to run them sequntially and to read the whole trajectory five times. It is much more logical to read the trajectory only ones and execute all your tasks in parallel for each frame. By doing this you will also utilize the power of you modern multi-core processor effectively.

All these routine operations in Pteros are incapsulated into the Trajectory_processor class. The logic of using this class is the following. You supply it with the set of options (the trajectory to read, which frames to include into the analysis, etc). In addition you create a number of Consumer objects, which represent separated analysis tasks, and connect them to the Trajectory_processor. After that you run the processor. It launches all supplied tasks in separate parallel threads, read the trajectory frame by frame and passes the frames to each of the tasks for user-defined processing.

Let's write a simple example of computing average minimal and maximal coordinates in selection along the trajectory using the power of Trajectory_processor. First of all we need to subclass a Consumer class:

class Our_task: public Consumer {
// Constructor
Our_task(Trajectory_processor* pr, string sel_str): Consumer(pr){
// set selection text
sel_text = sel_str;
// Inherited methods from Consumer:
// Called immediately before first frame is processed
virtual void pre_process();
// Called immediately after last frame is processed
virtual void post_process(const Frame_info& info);
// Called each time new frame arrives.
// This frame is stored in system.traj[0]
virtual void process_frame(const Frame_info& info);
// Variables, which are specific for our analysis
Vector3f min_average, max_average;
string sel_text;
Selection sel;

All logic of our analysis should be implemented in three virtual methods: pre_process(), process_frame() and post_process(). The names are self-explanatory. Let's implement them:

virtual void pre_process(){
// Prepare our min and max variables for computing averages
virtual void process_frame(const Frame_info& info){
// Currently loaded frame is stored in internal system in frame 0.
// Our selection already points to it by default.
// Compute minimal and maximal values
Vector3f min,max;
sel.minmax(min,max); //Using minmax() method of selection class
// Add to averages
min_average += min;
max_average += max;
virtual void post_process(const Frame_info& info){
// Here we make use of Frame_info object to get the time range of processing
min_average /= (info.last_time-info.first_time);
max_average /= (info.last_time-info.first_time);
// Transpose is used to print a vector in one line instead of column of numbers
cout << "Averaged minimal value: " << min_average.transpose() << endl;
cout << "Averaged maximal value: " << max_average.transpose() << endl;

Now we can write a main program for our small analysis utility:

#include "pteros/analysis/trajectory_processor.h"
#include "path to Our_task definition"
using namespace std;
using namespace pteros;
int main(int argc, char* argv[]){
// Create container for command-line options
Options options;
// Parse command line
// Create Trajectory processor and pass options to it
Trajectory_processor proc(options);
// Create our analysis task and connect it to processor
Our_task task1(&proc, "name CA");
// Create another task
Our_task task2(&proc, "name CB");
// And another one
Our_task task3(&proc, "within 2.5 of resid 1 to 100");
// Run processor!

Now three tasks, operating on different selections will run in parallel while reading the trajectory. But wait, what trajectory we are going to read? This is specified at run time by the command line arguments:

./our_program -f some-protein.pdb traj-part1.xtc traj-part2.xtc -b 14fr -e 250ps

In our case we specify -f, -b and -e arguments, which are absorbed internally by Trajectory_processor. Trajectory_processor looks at the list of files cpecified after -f and finds a structure file (some-protein.pdb in our case). This file is loaded into the "system" variable of all our tasks. Then Trajectory_processor reads all trajectories one by one in order of appearance and calls our task for frame processing. Processing starts at frame 14 and ends when the time stamp in current frame becomes larger then 250 ps. All this complex logic is completely incapsulated by Trajectory_processor class, which saves you a lot of time.

As you noticed, we hard-coded selection texts in out code. Let's do our program more flexible. We will modify it to take multiple additional arguments like this:

1 ./our_program -f some-protein.pdb traj-part1.xtc traj-part2.xtc -b 14fr -e 250ps -sel "name CA" "name CB" "resid 1-15"

This is surprisingly simple:

int main(int argc, char* argv[]){
// Create container for command-line options
Options options;
// Parse command line
// Create Trajectory processor;
Trajectory_processor proc(options);
// Container for our tasks
vector<shared_ptr<Our_task> > tasks;
// Get all values from -sel key and cycle over them
for(auto& sel_text: options("sel").as_strings()){
// Create task
shared_ptr<Our_task> task(&proc, sel_text);
// Run processor!

In fact we only need three lines to process our additional options!

Trajectory_processor doesn't own the consumers, so we need to save them in a vector to ensure that they will not be destructed immediately.

Very-high-level facilities

Pteros provides even higher level facilities for developing custom trajectory analysis algorithms - the analysis plugins. The analysis plugin is a class with very simple interface derived from Consumer, which runs in parallel during reading the trajectory. In contrast to Consumer analysis plugins are loaded and executed by dedicated driver program, so you don't need to bother with initialization of Trajectory_processor, passing parameters and other 'housholding' code. The most exciting thing about analysis plugins is that they could be written either in C++ or pure Python using almost identical API and intergrated seamlessly.

Plugins written in C++ could also be compiled as the separate stand-alone programs, which doesn't require Python to run. Just add -DSTANDALONE_PLUGINS="ON" option when compiling your plugin.

C++ analysis plugins run at the same speed as manually written programs, which use Trajectory_processor and Consumer - there is no run-time overhead (except initial searching and loading of plugins, which is usually neglegible). The driver script only connects the consumers with Trajectory_processor from Python side and after that no Python code is evaluated at all.

Pure Python tasks are, of course, limited by the speed of Python interpreter but they are extremely easy to write and to modify. In general pure Python tasks, which mostly call compiled Pteros methods are also very fast.

In the /bin directory of your pteros installation you can find executable Python script, which is the driver for analysis plugins. All plugins (both C++ and Python) are stored in python/pteros_analysis_plugins directory. Any shared library (*.so) or python file (*.py), which appear there is treated as a plugin and could be loaded by the driver.

The driver script is called like this (splitted by several lines for clarity):

1 python \
2 -f \
3  structure-file.pdb \
4  some-trajectory.xtc \
5  -b 0fr -e 100fr \
6 -task center -selection "name CA" -mass_weighted true \
7 -task sample1 -val 1 \
8 -task sample2 -val 2 \
9 -task user_script -plugin_file -val 42

The driver loads specified structure file and reads provided trajectory frame by frame in the given range of frames. On each frame all specified tasks are called.

  • Compiled C++ tasks are executed in separate threads completely in parallel.
  • All Python tasks are executed sequencially in one dedicated thread.

Thus all Python tasks run in parallel with C++ tasks but sequentially in respect to each other (this is an unfortunate limitation of the Python multithreading model). In practice you'll unlikely run many time consuming Python tasks simultaneously, so this should not be a serious performance limitation. In any case if the execution speed becomes a problem it is better to implement the plugin in C++.

Passing the parameter -plugin_file to any task allows loading plugins from any non-standard location by the relative or absolute path.

You are free to run multiple instances of the same task with different parameters

1 -task center -selection "name CA" -mass_weighted true \
2 -task center -selection "name CB CC" -mass_weighted false \

all of them will be executed separately and will not interfere with each other.

Writing pure Python analysis plugin

Let's write simple pure Python analysis plugin, which computes center of masses of given selection. Put the following code into the file and place it into the directory python/pteros_analysis_plugins of your Pteros installation

1 from pteros import *
3 class Task:
4  def pre_process(self):
5  self.sel_text = self.options("selection").as_string()
6  self.use_mass = self.options("mass_weighted","False").as_bool()
7  self.sel = Selection(self.system, self.sel_text )
8  print "Working on selection '%s'" % self.sel.get_text()
10  def process_frame(self,info):
11  print "Frame ", info.absolute_frame, " time ", info.absolute_time
12  print "There are %i atoms in selection" % self.sel.size()
13  print "Selection center: ",
15  def post_process(self,info):
16  print "Finished!"

Any Python plugin should define a class Task with three methods: pre_process(), post_process() and process_frame(). The signatures of the methods are evident from the code. Such class gets "magic" variables system, label and options. The system variable is a reference to the underlying system object, while label is a textual label "\<TaksName\>_id\<N\>", where TaskName is the name of analysis plugin and N is the unique number of the task. Label is handy when you need to get a unique name of the output file, which will never clash with the output of other tasks, which are running in parallel.

The variable options contains the options, which corresponds to this particular task. We extract the options selection and mass_weighted using it.

The rest of the code is self-explanatory.

Writing compiled analysis plugin

Let us implement the same plugin in C++. It will look almost the same and has very small syntactic overhead:

class PLUGIN_NAME: public Compiled_plugin_base {
PLUGIN_NAME(Trajectory_processor* pr, Options_tree* opt): Compiled_plugin_base(pr,opt) {
void pre_process(){
string sel_text = options("selection").as_string();
use_mass = options("mass_weighted","false").as_bool();
cout << "Working on selection " << sel.get_text() << endl;
void process_frame(const Frame_info &info){
cout << "There are " << sel.size() << " atoms in selection" << endl;
cout << "Frame " << info.absolute_frame
<< " time " << info.absolute_time << endl;
cout << "Selection center: " << << endl;
void post_process(const Frame_info &info){
cout << "Finished" << endl;
Selection sel;
bool use_mass;

We inherit a class from Compiled_plugin_base and override the same methods pre_process(), post_process() and process_frame(). In C++ we need to put an explicit constructor, which will initialize the base class. The rest of the code is almost 1-to-1 translation of the Python example given above.

The crucial point is the macro CREATE_COMPILED_PLUGIN(PLUGIN_NAME), which does all the magic for us. Behind the scene it creates the code for compiled Python extension module. After compilation and linking we get file, which is loaded by the driver program at the same way as our pure Python plugin.

In the case of the stand-alone plugin this macro will create an independent pteros_center executable instead.

The macro PLUGIN_NAME comes from the CMake build script, but you can define your own name in the code (in this case you whould also make sure that the build system produces shared library with appropriate name!).

The simplest way to build and install your plugin is using the template CMake project located at the /template_plugin directory of the Pteros source tree.