Molecular Structure Manipulation Tools

Scripts for manipulation, comparison and conversion of molecular structures.



Syntax: '[<PATH>/caller] <script_name>.py <arguments>'
for a short instruction: call '<script_name>.py' without any arguments
File handling is based upon the Open Babel conversion tools. Therefore all file types available in openbabel can be accessed (try babel -H). Additionally there is support for Columbus/Newton-X format ("col") and Tinker format ("txyz2").

Superposition of different structures of one molecule using a quaternion fit [1].

Syntax: ' <ref_struc> <struc1> <struc2> ... -t<file_type>'
e.g. -txyz

Structures <struc1>, <struc2>, ... are superimposed onto <ref_struc>. <file_type> is a file type supported by openbabel (type 'babel -help' for a complete list).
Superposition is done through aligning the centers of mass for translation, and a quaternion fit for rotations. Only proper rotations are considered.
The numbering of the atoms is crucial. Renumbering is not implemented in this script. It could be done manually or through python programming with usage of

File format conversion utility, based on and similar to the babel program but with additional support for Columbus/Newton-X ("col") and Tinker ("txyz2") formats.

Syntax: ' <in_format> <in_file> <out_format> <out_file>'
e.g. tmol coord col geom (to convert between Turbomole and Columbus formats)

Insert molecules into a larger template file and optionally combine the molecular vibrations.

Syntax: ''

Requirements: Structure files for the template and the molecules to insert. Normal modes (if required) are given as molden format files.

Execution: Follow the explanations on the screen. The non-trivial task is to identify the corresponding atoms between the structures. For this it is suggested to open both molecules at the same time in a molecular structure viewing program to compare the indices. Note that the program asks for the indices of the atoms in the (larger) template file corresponding to indices in the molecule inserted (putting the indices in the opposite order will give a wrong result).

Output: Structure file with inserted structures.
combined_vib.mld: Molden file with combined normal modes.

Conversion utility from a Tinker archive to a set of point charges in Columbus format. The structure can be separated into QM and MM regions and the functionality for adding link atoms is supplied. The script can be used to extract several frames in order to perform a calculation in an averaged potential. The point charges are created in Columbus format as (Bohr). Additionally the geometry of the core region for the first frame is written to in standard xyz format (Angstrom).

Syntax: ' <tinker-arc> <prm-file>'

<tinker-arc> is the Tinker archive with the structures. Point charges for the atom types are parsed from <prm-file>.

The remaining input is parsed from a file with the following information. Only values with no specified default have to be entered.

at_listSpecification of the atoms which shall be converted to point charges (i.e. the MM region). Use to(<start>,<end>) to specify ranges.
framesSpecification of the dynamics frames to use for conversions.
link_atomemptyUse link_atom[<QM_atom>]=[<MM_atom>,<ratio>,<symbol>] to introduce a link atom with atomic symbol <symbol> into This link atom is placed on the connecting line between the QM and MM atoms at <ratio> times their distance away from the QM atom. Per default the point charge of the MM_atom is set to zero use zero_scatter for charge redistribution.
zero_scatteremptySpecify charges to redistribute. Point charges of atoms in <zero_i> are set to zero and their summed up charge is distributed over <scatter_i>. If an atom is specified in <zero_i> which is not in <at_list>, then its charge is added to the scatter charge. If an empty list [] is specified for <scatter_i>, then the charges are discarded.
It is specified as a nested list, i.e. zero_scatter=[ [<zero_0>,<scatter_0>], [<zero_1>,<scatter_1>], ...], where <zero_i> and <scatter_i> are each lists as well. Typical choices for <zero_i> are either just the MM link atom, or both the QM region and the MM link atom. A typical choice for <scatter_i> would consist of all the atoms bonded to the MM link atom.
new_chargeemptyManual modification of a charge. Use new_charge[<index>]=<charge> to modify the charge of the atom indexed with <index> in the output file. new_charge is applied before zero_scatter, in case they should be used in connection.
sepFalseIf this is set to True then the information is read in from separate input files.
scale1.Scaling factor of the charges. The default 1. means that charges are divided by the number of frames. A value different from 1. would only be needed in special cases (for example when a symmetrization is performed later).
lvprt0Print level for debug and verification purposes.

- Please use square brackets to enclose lists and to(<start>,<end>) for specifying ranges. Use "+=" when specifying zero_scatter

at_list = [1,3,5] + to(20,9230) + [9260,9262]
frames = to(1,50)

# link atom, where only the charge of the MM link atom is moved
link_atom[9263] = [9262, 0.68, 'H']
zero_scatter+=[[ [9262], [9251,9253,9258] ]]

# charge consistent link atom, considering the partial charge of the whole QM region as well as the MM link atom
link_atom[9231] = [9230, 0.69, 'H']
zero_scatter+=[[ to(9230,9244), [9219,9221,9226] ]]
at_list here contains the indices 1, 3, 5, 20-9230, 9260, 9262.

Formally the file is parsed as Python code (with an additional to(s,e)=range(s,e+1) function). Any other Python syntax may be used for constructing the at_list, frames, and zero_scatter lists and the link_atom and new_charge dictionaries.

Reduce a file (Bohr) of point charges in Columbus format by combining close lying charges. A dynamic threshold is employed with the goal to describe areas close to the region of interest very accurately and to preserve the large scale electrostatic moments of a significantly larger system. Two point charges i and j of the same sign are combined into their center of charge if the distance dij between them is

where ri and rj are their respective distances to the origin, r0 should approximately define the molecular cavity, and C is a global scaling constant.

Within the Columbus program package, it is also possible to restore gradients for all of the original point charges. It is therefore possible to compute direct QM/MM dynamics in this fashion.

Syntax: ' [<mode>]'

If <mode> is left out or mode=reduce, a reduction is performed. If mode=expand, a gradient computed in the reduced set of point charges is expanded to the whole set of input charges. This expansion is implemented within the Columbus program package.

The input is parsed from a file with the following information. Only values with no specified default have to be entered.

C0.0Scaling threshold. The lower this threshold is, the more point charges are retained. Suggested values: 0.01≤C≤0.2
r00.0Radius (Bohr) of the molecular cavity.
orig[0.,0.,0.]Specify the origin in Bohr radii. ri, rj and r0 are taken relative to this. Typically the origin should lie at the center of mass of the QM region.
scale[1.,1.,1.]Scaling factor to allow specifying an elliptical cavity. Internally all quantities are rescaled by this factor. If for example the system extends 12 Bohr in the x-direction, 6 Bohr in the y-direction, and 4 Bohr in the z direction. You may specify r0=4.; scale=[3.,1.5,1.]
formout'col'Output format. 'col' will produce in Columbus format, 'tmol' pointcharges in Turbomole format.

- Please use square brackets to enclose lists and to(<start>,<end>) for specifying ranges.


Linear interpolation between two cartesian molecular structures of the same molecule.

Syntax: ' <start_struc> <end_struc> <out_dir> <steps_nr> <file_type>'
e.g. interpolate 10 xyz

<end_stuc> is superimposed onto <start_struc>, then <steps_nr> intermediate structures are constructed and put into the directory <out_dir>.
Linear interpolation is a clearly defined way of finding intermediate structures. But well chosen internal coordinates are probably more physical, especially when rotations play an important role.

Computation of the distances (i.e. norm of the difference vector, RMSD) between a set of structures of one molecule.

Syntax: ' <struc1> <struc2> ... [-t<file_type>] [-mwp<mass_weight_power>] [-fit<fitting>]'
e.g. -txyz

A table with the distances between the specified structures is printed out.
<file_type> is a file type supported by openbabel (type 'babel -help' for a complete list).
The default for <mass_weight_power> is -mwp1 which leads to regular mass weighting (i.e. Å2amu for the squared norm and Åamu1/2 for the distance); -mwp0 means cartesian coordinates.
-fit1 which is the default specifies that structures are initially fitted onto the first structure, -fit0 leaves the structures unchanged.,

Print out values of internal coordinates.

Syntax: ' <coors> <file1> [<file2> ...]'
<coors> contains the types of internal coordinates (dist, bend, tors) considered and the atoms involved.

Example: dist 4 5 bend 1 2 3 tors 5 6 7 8 may be used to analyze a file containing multiple structures.

Renumbering of an xyz file.

Syntax: ' <in_file> <renumber_file> <out_file>'
e.g. renumber.txt

Renumber the xyz-file <in_file>. <renumber_file> contains the old numbers of the atoms in the new order (if optionally some of the atoms are left out, they are added at the end). The result is written into <out_file>.

Python subroutine libraries

The underlying python subroutine libraries can be used as a basis for more complex tasks. For more information, look at the documentation included in the python files (either in the source code or with the python 'help(...)' command). [1] Karney, C. F. F. Journal of Molecular Graphics & Modelling 2007, 25, 595.


Felix Plasser (email: felix.plasser "at"
University of Vienna, Institute for Theoretical Chemistry
Währingerstr. 17, 1090, Vienna, Austria