MolDB5R

a free software package for creating a web-based, searchable molecular structure/reaction database

Norbert Haider, University of Vienna, 2010-2013
norbert.haider@univie.ac.at

These instructions explain how to set up MolDB5R as a web-based, fully searchable molecular structure database. Structures and data can be added via a web interface or by import from an MDL SD file (at the server command-line). This import option is shown by an example, using the SD files which are freely available from the PubChem FTP site at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/. Analogously, reactions and data can be imported from an MDL RD file.

For some information about the original ideas, please visit http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb.html, additional detailed information can be found in this article: Haider, N., Functionality Pattern Matching as an Efficient Complementary Structure/Reaction Search Tool: an Open-Source Approach. Molecules, 15, 5079-5092 (2010). For citation in scientific publications, please use this reference rather than the project homepage.

The previous version, MolDB5 (without reaction support), is documented at
http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb5doc.html

MolDB5R is a collection of fully functional PHP scripts for running a structure/reaction database with search options for text, functional groups, and structure/substructure. Included are also some Perl scripts for database setup and data import from an SD or RD file.

Download: http://merian.pch.univie.ac.at/pch/download/chemistry/moldb/moldb5r.tar.gz


Content

Features
Technical background
Installation
Using MolDB5R
Administration of MolDB5R
License
Appendix A (permissions and privileges)
Appendix B (configuration file settings)
Appendix C (performance tuning)

Features


Technical background

The system uses MySQL as the database engine, whereas its "chemical intelligence" is provided by the checkmol/matchmol program (or its socket-based server version, cmmmsrv). All relevant data are stored in MySQL tables, except the bitmap graphics of the molecules. Structure/substructure searches are performed in a two-stage process, consisting of an efficient pre-selection and a full atom-by-atom comparison of the remaining candidates. For pre-selection, a combination of molecular statistics with dictionary-based fingerprints and hash-based fingerprints is used. Optionally, this operation can be performed with memory-based MySQL tables instead of disk-based tables. This will make the search even faster. In a test installation on a standard PC (AMD Athlon 1.6 GHz CPU, 1.5 GB of memory), substructure searches in a database of 500.000 compounds are typically complete within a few seconds.

Demo MolDB5R database


Installation

Hardware requirements: 

any x86 PC or Mac (the faster, the better)

Software requirements:

==> these packages are standard components of every Linux distribution (for Windows, the XAMPP package can be used, for Mac OS X, the MAMP package is a good choice)

moreover:
Note: this package was developed and tested on a Linux machine, but with a few adaptations it should run also on a  FreeBSD system. Under Mac OS X (10.6 aka Snow Leopard), MolDB5R runs without any modifications, but requires some non-trivial software installation: GhostScript (see above) and the Perl DBI and DBD::mysql modules. The Windows development and (some) testing was done on a virtualized Windows XP system with the XAMPP 1.7.0 package. Instead of the Perl component of XAMPP, ActivePerl 5.6.1 was used and the missing database driver was installed via the PPM utility (ppm install DBI; ppm install DBD::MySQL).

Where to start?

After downloading and unpacking the MolDB5R package in your working directory, rename one of the three sample configuration files (moldb5.conf.sample_linux, moldb5.conf.sample_macosx or moldb5.conf.sample_windows) into "moldb5.conf". Load this file into a text editor and adjust the settings. A detailed overview of all parameters is given in Appendix B.

Select a name for your database, define user IDs for a proxy user with write privileges and another proxy user with read-only privileges (the example file "moldb5.conf" already contains these entries which can be accepted or modified; please change at least the passwords!).

If you want to use bitmap graphics of your molecules (highly recommended), please specify a directory where these bitmaps should be stored and make sure you have write permission there. This should be the final location of the .png files (somewhere within your web server's document root, preferentiall as a subdirectory "bitmaps" of your MolDB5R web directory). If you don't want bitmaps, just set $bitmapdir to an empty string ("").

Initialize the MolDB5R database by running the Perl script "initdb.pl" (enter "perl initdb.pl", be sure to be in the same directory as the script). You will be prompted for the password of the MySQL administrator (typically, this is the "root" user). The script will create the appropriate MySQL database and two tables within this database (one for metadata and one for the fragment dictionary). If everything goes well, the script will produce this output:
setup of database $database OK    (where $database displays the name you have specified)
setup of fragment dictionary OK



Import of structures+data from an SD file (*.sdf):

As an example, we use Compounds_000000001_000025000.sdf (after unpacking the .gz file which you can download from the PubChem FTP site).

If you have wget installed, you can retrieve the PubChem file very conveniently by entering e.g.

wget ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/Compound_000000001_000025000.sdf.gz

Prerequisite: the checkmol/matchmol program must be installed (e.g., in /usr/local/bin, for more information, see
http://merian.pch.univie.ac.at/~nhaider/cheminf/cmmm.html).

These steps should be executed exactly in this order:
  1. Change into the util/ subdirectory and run the Perl script "sdfcheck.pl" with the name of the SD file to be imported as the command-line argument, e.g. "perl sdfcheck.pl Compound_000000001_000025000.sdf" (add a leading "../" or "../../" etc. to the SD filename if necessary). This will create a definition file "sdf2moldb.def". By default, only the first 10000 records are inspected which should be sufficient in order to collect the necessary information about the file structure. 

  2. Load the file "sdf2moldb.def" (see above) into a text editor and inspect it. Here, you can change the types of   MySQL columns or their dimensions. In addition, you MUST rename the most descriptive field name (preferentially the one holding the molecule's trivial name or its systematic name) into "mol_name". If you do not do this, you will have to manually adjust all the PHP scripts in order to retrieve the correct field. In our PubChem example, change the MySQL field name (i.e., the 2nd column, between the first and the second ":") "pubchem_iupac_name" into "mol_name". Many numeric fields are assumed to be floating-point numbers (MySQL type "DOUBLE"), but in fact are integer numbers, so their type should be changed into (e.g.) "INT(11)". In our PubChem example, only two fields should be kept as "DOUBLE": "pubchem_openeye_mw" and "pubchem_cactvs_xlogp". If you do not need one or more of the recognized SDF fields in your MySQL table, simply delete the corresponding lines from the definition file. Make sure that there is no field named "mol_id", because this is a reserved name of the MolDB5R system.

  3. Import the content of your SD file into your MySQL database by running the Perl script "sdf2moldb.pl" with the name of the SD file to be imported as the command-line argument, e.g. with "perl sdf2moldb.pl Compounds_000000001_000025000.sdf". Depending on the number of molecules in the SD file, this operation may take some time (on a standard business PC, approx. 1-5 h for 100000 records). It is recommended to do this operation in a background job, most conveniently in a "screen" session (screen is a utility program which is included in most Linux distributions).
    Note: you can create bitmap images of your structures either during the initial data import (set "$make_bitmaps = 1;" in the sdf2moldb.pl script) or later (set "$make_bitmaps = 0;"). In the latter case, data import will be significantly faster (about 3 times), bitmap files can then be created by running the updatebitmap.pl script in the util/ subdirectory.

  4. If you want to add new records, repeat step 3 with another SD file. By default, all data will be appended. ATTENTION: all SD files must have the same format (names and types of data fields) as the first one. For example, you can sequentially import all of the SD files from the PubChem site (see above). (And don't forget to let me know your performance figures for a database with >40 million structures....)  ;-) 
     
  5. If you want to create a new data collection, repeat the steps described above, but change the "db_id=1" line in the sdf2moldb.def file into "db_id=2" (or higher, if applicable).

Import of reactions+data from an RD file (*.rdf)

Reactions with associated data can be imported in a very similar manner as described for SDF import (see above). Please note that in the present version, only "flat" (non-hierarchical) RD files can be handled.

  1. Run the Perl script "rdfcheck.pl" in the util/ directory with the name of the RD file to be imported as the command-line argument.
     
  2. Edit the resulting definition file "rdf2moldb.def". Define a field "rxn_name" and make sure that there is no field "rxn_id". Make sure that the db_id is set to a number which is still available.
     
  3. Import the RD file by running the Perl script "rdf2moldb.pl" with the RD file name as the command-line argument. Bitmap and/or SVG depiction is provided in full analogy to SDF import.


Next, the web server part can be installed:

  1. Of course, you must know the location of your web server's document root directory. This might be something like "/var/www/htdocs" or "/xampp/htdocs", etc. (in a typical Apache installation, it is specified in the configuration file "httpd.conf" or in one of its include files). On many websites, a special subdirectory (e.g, "public_html") witin a user's home directory can also be a valid location.
     
  2. Within the document root, you should have already created a directory which will hold the MolDB5R PHP scripts, e.g. "moldb5r". Usually, the "bitmaps" directory will be there as a subdirectory (e.g., "moldb5r/bitmaps"). Create another subdirectory named "admin" (e.g., "moldb5r/admin").
     
  3. Copy all the files from the directory "wwwfiles" of this package to the MolDB5R web directory, copy also all of the scripts in "wwwfiles/admin" to the "admin" subdirectory of the web directory.
     
  4. The configuration file "moldb5conf.php" should contain exactly the same values for all parameters as the setup configuration file "moldb5.conf". In principle, "moldb5conf.php" is nothing else than an exact copy of "moldb5.conf" with just two extra lines (one at the beginning: "<?php ", and one at the end: "?>"). For testing, this file can be placed also in the MolDB5R web directory. In a publicly accessible production environment, however, this file should be placed outside the web server's document root, somewhere in the PHP include path (check your php.ini file and/or the PHP docs). Keeping the "moldb5conf.php" file within the document root is a security risk, as it contains plain-text passwords. Make sure the web server can read this file and revoke read permission for other users. If you are using bitmap graphics, make sure that the variable $bitmapURLdir points to the correct location, e.g. "/moldb5r/bitmaps" (no trailing slash!).
     
  5. Make sure you have the JME Java applet installed. The file "JME.jar" should be placed either in the same directory as the main MolDB5R PHP scripts scripts or in a publicly accessible directory specified in the $java_codebase variable in the configuration file. 


Note 1: it is assumed that PHP is not running in "safe mode". If safe_mode is set to On in the php.ini file, you have to make sure that  checkmol/matchmol is allowed to be executed, this can be done by specifying the corresponding directory in the variable safe_mode_exec_dir in php.ini (please check your PHP documentation).

Note 2: starting with version 5R.01, MolDB supports other character sets in addition to latin1 (which is still the default). Please set the $charset variable in the configuration file(s) to either "latin1", "latin2" or "utf8".


The subdirectory "util" of the working directory contains some Perl scripts:

The mk... scripts should be run whenever a new version of checkmol/matchmol (or cmmmsrv, respectively) is installed on your system.

The script cp2mem.pl will copy the persistent (disk-based) molstat and molcfp MySQL tables to memory-based MySQL tables, which are accessed faster than the disk-based tables. In order to make use of these tables, the moldb_meta table has to be modified (the "usemem" column must be set to 'T'), this can be done with the Perl script "setmem.pl" with an  appropriate command-line argument, e.g. "perl setmem.pl 1=T" (enabling memory-based tables for data collection 1). Do not forget to run cp2mem.pl after enabling memory-based tables and after each reboot of your machine, otherwise the memory-based tables would be empty. Generally, this option is recommended only for larger read-only data collections.

There are also utility scripts for diagnostic purposes (listdb.pl), (re-)creation of definition files from the MySQL table structure (dumpdef.pl) and data export into SD or RD files (moldb2sdf.pl, moldb2rdf.pl). All these scripts contain a short description of their functionality.



That's it for the installation part. You may wish to adjust the HTML layout of the index.php and moldb.css files and the PHP scripts, replacing "MolDB5R" by something else. Especially, the script "custom-inc.php" should be adopted to individual needs. If you write any useful extensions, I would be grateful if you donate these scripts to the project. And a little link to the checkmol/matchmol homepage would also be appreciated.
 

Using MolDB5R

Start page
On the start page (index.php), the user can select the data collection(s) from the list box and clicking the "Apply selection" button. Multiple selections (if enabled in the configuration file) can be made by pressing the <Ctrl> key while clicking at the list item. Do not forget to click the "Apply selection" button. All search options are available via the navigation bar at the top of the page. A link to the administration page is displayed at the bottom (it may be advisable to remove this link...).

Browse
All selected data collections are available for browsing on this page, using standard navigation elements. Compound structures are displayed graphically (using bitmap or SVG pictures, if available), together with compound or reaction names and the ID number (mol_id or rxn_id). The latter represents a hyperlink to a detailed view of this particular entry (provided by the script "details.php"). Please note that inline SVG requires a fairly recent web browser (e.g. any current Firefox, Chrome, Safari or [if you really want to use an MS product] Internet Explorer 9 or higher).

Text search
There is a very simple text search tool which by default only searches the "mol_name" (and/or "rxn_name") field. Search terms are not case-sensitive, as a wildcard character you can use the "%" or "*" symbol. By selecting the option "include other searchable fields", you can extend the search to all text fields which have been defined as "searchable" by the administrator (see below). Starting with MolDB5R.03, an advanced text search option is available which allows entering search terms for each individual searchable data field (these terms can be combined with logical AND or OR operators). By nature, the advanced text search feature can be used only when a single data collection is selected by the user.

Functional group search
This search option makes use of checkmol's capabilities to recognize approx. 200 different functional groups. Categorisation of compounds is performed already when they are entered into the database. The properties are stored in binary format in a special table (molfgb), and the search operation is extremely fast. Multiple selection of functional groups from the list box can be made by pressing the <Ctrl> key while clicking at the desired item.

As soon as a reaction data collection is selected, there are extended search options available: the user can define whether the selected functional group(s) should be present in any reactant or in any product or should be created or lost during the reaction.

Structure search
This is the central search facility, permitting exact search, substructure search, and similarity search. In the browser, Java and Javascript must be enabled in order to display the JME applet. The input structure must contain at least three atoms. Checkboxes are available for more specific search modes (strict atom/bond type comparison, geometry checks) if a search results in too many hits. Structure search works also for reaction data collections. As an alternative to drawing the query structure, it can be entered as text in a pop-up window (in MDL molfile format).

Reaction structure search
As a new feature of MolDB5R, reaction data collections can be searched for structures or substructures (no similarity search option). User input is drawn in the JME applet and can be either one structure (as reactant or as product) or multiple structures, e.g., one reactant and one product (use the "new" button in JME to start a new molecule). There is also basic support for atom mapping (provided the corresponding mappings are present in the data collection).
Please note that reaction structure search is considerably slower than "normal" structure search.

Administration of MolDB5R

Data collection handling
Apart from data import from SD or RD files, new entries can be added via the administration frontend in the "admin" subdirectory. The main page ("admin/index.php") is used for adding new data collections and editing/erasing  existing ones. Besides the (recommended) password protection of this directory, the "trustedIP" setting restricts access to these functioanlities (for a more detailed description of privileges, see below).
On this page, only one data collection can be selected at a time. All operations (edit data collection properties, edit data field definitions, erase selected data collection) will apply to this selected data collection. Whereas "edit data collection properties" and "erase selected data collection" are self-explaining, the option "edit data field definitions" requires some more explanation: here, you can add new fields to the records of the selected data collection, determine their (MySQL) type, the name which should display on the "details" output, and (optionally) a field name for SDF/RDF export. A newly created data collection has only two data fields: "mol_id" (or "rxn_id", respectively; an integer number) and "mol_name" (or "rxn_name") which cannot be changed. A data collection which has been imported from an SD or RD file contains (probably) some more fields. Whenever you add a new field, you can choose its type from the drop-down listbox or enter any valid MySQL column type by selecting "other..." from the list and entering the correct type in the input box. Caution: there is no syntax checking done by MolDB5R, wrong input will result in MySQL errors. If you need some special MySQL column types and options, you should use some more sophisticated administration software like phpMyAdmin.
Note: changing the type of a data collection (from SD to RD or from RD to SD) is only possible if the data collection is still empty.

Structure/data input
Using "Add/edit data records", you can enter new structures/reactions+data or edit existing ones. This is done by the script "editdata.php", again all actions apply to the selected data collection.

Structure/data export
Using "Export data records", you can download structures/reactions as well as their associated data (selectable for each field) in SDF or RDF format, respectively. A list of entry numbers or number ranges can be specified, otherwise all records of this data collection are exported.

Add new entry
Structures can be entered either by drawing, using the JME applet, or by pasting text in MDL molfile format into the text area. After submission, the structure is displayed and can again be edited if desired. For structure data collections, a duplicate check is performed by default (which can be disabled). After confirmation, the structure is added to the data collection and all relevant entries (molstat, functional group patterns, fingerprints) are made in the appropriate tables. The data entry form should be completed (at least, the name of the molecule should be entered) and saved. Several option can be selected from the "format" list for display of textual data:
In the user frontend (browse, search, details), if SVG images are used and your browser has SVG support, any newly entered structure or reaction will be immediately displayed in full quality. Without SVG support, newly added structures are always displayed using the JME applet in "depict" mode, until the bitmap graphics have been created by running the "updatebitmap.pl" admin script. This should be done automatically by a cron job.
Input of reactions is done analogously as described above, either by drawing in JME or by pasting into the text box in MDL rxnfile format. Atom maps should be created for as many atom pairs as possible.

Edit/erase entry
"Edit entry no." and "erase entry no." require a valid ID number (mol_id/rxn_id) as input. If you do not know this number, you can find it out by doing a search or just browse the data collection. Whenever an entry is erased, it is removed from all MySQL tables, but the original bitmap file (if it exists) will not be deleted. Instead, the "updatebitmap.pl" script will rename it into something like "00123456.png.to_be_deleted" on its next run. You can then use some OS shell command in order to actually delete these files (or modify the updatebitmap.pl script to do a real deletion).
When editing a reaction, an option is presented to "extract molecules". This allows to copy any reactant or product as a single structure to another (structure) data collection.

Copy entry
The specified entry number can be either duplicated within the same data collection (e.g., to create an entry for a closely related structure/reaction by subsequent minor modification) together with all textual/numeric data or it can be copied to a different data collection (in this case, only as a structure/reaction without any textual/numeric data, as the corresponding data fields may differ from collection to collection).

License

Copyright © 2010-2013 Norbert Haider, University of Vienna

This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details:  http://www.gnu.org/licenses/.
 

Appendix A

Permissions and privileges

MolDB5R does not implement its own user administration, but relies on two access conditions:
  1. The web server's standard mechanism to protect a directory. This is usually configured either in a .htaccess file in the directory in question or in the central configuration file (typically /etc/apache2/httpd.conf), using the <Directory> keyword. Usually, a combination of valid client domain and password protection (htpasswd) is specified there. Caution: .htaccess files are effective only if enabled in the httpd.conf file! Please refer to the Apache documentation. It is highly recommended to restrict access to the "admin" subdirectory by this mechanism.
     
  2. The IP address of the client machine. Up to 10 "trusted IP" addresses can be specified in the moldb5conf.php file: these addresses have all possible privileges within MolDB5R (provided the user meets the access condition defined in .htaccess, see above). These users have the status of an "administrator". In addition, "sub-administrators" can be specified on a per-data-collection basis, listing their client IP address in the moldb_meta table, field "trustedIP" (again, up to 10 IP addresses).
Who can do what?

Only the administrator(s) can
The administrator(s) and the sub-administrator(s) of a data collection can
Any user who can access the "admin" subdirectory can


Appendix B

Parameter settings in the moldb5.conf / moldb5conf.php files

The configuration file moldb5.conf in the working directory and the PHP script moldb5conf.php in the MolDB5R web directory are almost identical: the only difference are the standard PHP tags "<?php " and "?>" in the first and last line of moldb5conf.php.

$database      = "moldb5r";     # name of the MySQL database
$hostname      = "localhost";   # hostname of MySQL server, probably "localhost"
$clientname    = "localhost";   # name of MySQL client, usually "localhost"
$mysql_admin   = "root";        # MySQL administrator, usually "root"
$rw_user       = "mdb5-admin";  # proxy user with CREATE and INSERT privileges
$rw_password   = "top-secret";  # (make sure this script is unreadable to others)
$ro_user       = "mdb5-user";   # proxy user with SELECT privilege
$ro_password   = "secret";      # (better avoid dollar signs etc.)
$drop_db       = "y";           # erase entire database before re-creating it?


In this section, MySQL-specific settings have to be made: a databse name has to be specified. If $drop_db is set to "y", this database will be erased and recreated every time you run the initdb.pl script. The machine on which the MySQL server is running is specified by $hostname. Usually, the MySQL server and the web server are running on the same machine, in this case $hostname is set to "localhost" and $clientname is also set to "localhost". The name of the MySQL administrator user must be specified, ususally this is "root" (not to be confused with the system "root" user under Linux/Unix systems). The two proxy user accounts in MySQL will be automatically created when initdb.pl is run.

$tweakmolfiles = "y";           # "y" or "n" ("y" gives better performance)


If this parameter is set to "y", the imported molfiles will be replaced by (slightly) modified molfiles which contain some extra information about aromaticity. These "tweaked" molfiles are generated by checkmol and enable matchmol to do atom-by-atom comparisons (in a substructure search) more efficiently (see Appendix C). Tweaked molfiles are completely compatible with other programs which can import MDL molfiles.

$prefix        = "";            # this allows to have different MolDB5R instances
                                # in one MySQL database; each MolDB5R instance can
                                # contain multiple data collections

As explained in the comment, you can use this extra prefix to keep more than one MolDB5R installation in a single MySQL database or you can place MolDB5R together with other database applications (such as bulletin boards, wikis, blogs, etc.) in a single MySQL database. This makes sense only in environments where you are restricted to use only one MySQL database.

$sitename      = "MolDB5R demo"; # appears in title and headline
$charset       = "latin1";      # may be "latin1", "latin2" or "utf8"
$CHECKMOL      = "/usr/local/bin/checkmol"; 
$MATCHMOL      = "/usr/local/bin/matchmol";
$use_cmmmsrv   = "y";           # set to "y" if a cmmmsrv daemon is available
$cmmmsrv_addr  = "127.0.0.1";   # must be numeric
$cmmmsrv_port  = 55624;         # the default cmmmsrv port is 55624

The $sitename string will appear in the header of all MolDB5R pages. The $charset variable controls both HTML and MySQL handling of character sets. Currently, only latin1, latin2 and utf8 are supported (default: latin1). If other character sets are desired, please edit the function set_charset() in the functions.php script.
In $CHECKMOL and $MATCHMOL, the executables of checkmol and matchmol are specified. Under Linux, the programs are usually installed in the /usr/local/bin directory (most conveniently, matchmol is just a hard link to checkmol), under Windows it can be installed, for example, in C:\bin (but do not include the drive letter in these values). Setting $CHECKMOL to "checkmol" would be also OK if the program's directory is included in the system's search path (beware: this must apply also to the search path of the web server process).
Whereas all the backend Perl scripts rely on checkmol/matchmol, the PHP portion of the package can also use the cmmmsrv daemon instead, if installed on your system. This server process provides the same functionality as the command-line programs (checkmol/matchmol), but communicates with the web server via TCP sockets instead of shell calls. This gives somewhat better performance (see Appendix C). In principle, cmmmsrv can run on a different machine than the web server, in this case $cmmmsrv_addr must be set to the (numeric) IP address of this machine.


# the following variables control the 2D structure display

$enable_svg     = "y";   # first choice
$enable_bitmaps = "y";   # second choice
$enable_jme     = "y";   # structure editor; fallback for 2D display

The use of both SVG and bitmap graphics is recommended. For SVG, mol2ps/mol2svg version 0.4 or higher is necessary and your web browser must be fairly recent.

The following options are new, starting from MolDB5R version 5.20 and are required for generating graphics in SVG format

$MOL2SVG         = "/usr/local/bin/mol2svg";
$mol2svgopt     = "--rotate=auto3Donly --hydrogenonmethyl=off --color=/usr/local/etc/color.conf"; # options for mol2svg, e.g. "--showmolname=on"
#$mol2svgopt     = "--rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2svg, e.g. "--showmolname=on"
$mol2svgopt_rxn = "-R --rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2svg in reaction mode
$svg_scalingfactor = 1.0;           # 1.0  gives good results
$svg_scalingfactor_rxn = 0.75;      # 0.75 is a good compromise for reactions

The following section is needed for creating bitmap graphics with mol2ps and GhostScript:

$MOL2PS        = "/usr/local/bin/mol2ps";
$GHOSTSCRIPT   = "/usr/bin/gs";
#$bitmapdir = "";
$bitmapdir     = "/var/www/htdocs/moldb5r/bitmaps";  # this is the base directory
$bitmapURLdir  = "/moldb5r/bitmaps";
$digits        = 8;      # filenames will be 00000001.png, etc.
$subdirdigits  = 4;      # uses the first x digits of $digits (0 = no subdirectories)
#$mol2psopt     = "--rotate=auto3Donly --hydrogenonmethyl=off --color=/usr/local/etc/color.conf"; # options for mol2ps
$mol2psopt     = "--rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2ps, e.g. "--showmolname=on"
$mol2psopt_rxn = "-R --rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2ps in reaction mode
$scalingfactor = 0.22;          # 0.22 gives good results
$scalingfactor_rxn = 0.18;      # 0.18 is a good compromise for reactions


These options are relevant only if you want to use bitmap graphics for 2D depiction of your molecular structures, otherwise set $bitmapdir to an empty string ($bitmapdir = ""). $MOL2PS points to the mol2ps program, $GHOSTSCRIPT to the Ghostscript program (under Windows, this must be the console application "gswin32c.exe"). The directory specified in $bitmapdir must be writable to you, but not necessarily to the web server. Whenever new structures are entered via the web frontend, bitmap files are not immediately created, but queued for creation by an auxiliary script "updatebitmap.pl" which should be run in regular intervals by the user who has write permission to $bitmapdir. Most conveniently, this is done by a cron job, e.g. once a day.
Within the bitmap directory, subdirectories are created automatically by updatebitmap.pl (as well as sdf2moldb.pl) for each data collection, using the number ("db_id") of that data collection as the directory name. For larger data collection (>10000 structures), it is advisable to further divide the directory hierarchy by specifying a value of (e.g.) 4 as $subdirdigits and (e.g.) 8 as $digits for the PNG file names. This would give paths to the actual graphics files like /var/www/htdocs/moldb5r/bitmaps/1/0000/00000045.png (in an URL, this would be /moldb5r/bitmaps/1/0000/00000045.png). Thus, the maximum number of PNG files in one directory can be limited, otherwise you might run into problems with some shell commands.
For all available options of mol2ps, please refer to the mol2ps/mol2svg homepage or just enter "mol2ps" on a command prompt. Please note that for SVG images, mol2ps/mol2svg v0.4 or higher is required.


# further settings=============================================================

$multiselect   = "y";           # allow selection of multiple data collections
$default_db    = "1";           # default db_id
$trustedIP     = "127.0.0.1, 192.168.0.10";   # max. 10 IP addresses
$fpdict_mode   = 1;      # 1 = auto adjust, 2 = force 64 bit, 3 = force 32 bit
$scratchdir    = "/data/moldb/moldb-scratch";  # needed by cp2mem.pl,
                                               # must be writeable by mysql UID, too
$enablereactions  = "y";    # if not "y": no reaction support

$enable_download = "y";     # download option for hit structures/reactions
$download_limit  = 100;     # maximum number of hit structures/reactions to be downloaded (per search)

$java_codebase = "/classes"; # either leave empty or specify URL of the directory containing JME.jar


In this section, you can specify whether or not users can select only one data collection for searching/browsing or all available data collections. If there are more than one data collections, $default_db specifies the one which is automatically selected whenever a user visits your MolDB5R website (this variable can contain a comma-separated list of more than one collections, written in quotes such as "1,2,4").
In $trustedIP, you can specify up to 10 (numeric) IP addresses for extended administrative privileges. Besides protecting the "admin" subdirectory with a password (using the standard mechanisms of the Apache web server, e.g. .htaccess files), the $trustedIP directive gives extra protection. Only from a client machine listed in $trustedIP, you can change the settings of a data collection (e.g. from read-only to read/write) or erase an entire data collection. If you want to assign special administrative priveleges to just one data collection to somebody (e.g. for write access to a read-only data collection), use the analogous "trusted IP address" specification for this particular collection in the web administration frontend.

The parameter $fpdictmode is rather technical and determines the format in which dictionary-based fingerprints are stored in the molcfp table (either as unsigned 32-bit integers or as 64-bit integers). In the first case, a MySQL column will store information about presence/absence of 31 structural elements from the dictionary, in the latter case, up the 62 bits will be used for this purpose. A setting of "1" is recommended. Whenever the fragment dictionary is changed or extended (by editing the file fp01.sdf and re-loading it with initfpdef.pl), the molcfp tables have to be re-created by invoking the scripts mkmolcfp.pl and mkrxncfp.pl..

The $scratchdir variable holds the name of a directory for temporary files which are used by "cp2mem.pl" when it copies the content of disk-based tables (molstat and molcfp) to memory-based tables (see Appendix C). It is very important that not only the user who runs "cp2mem.pl" has write access to this directory, but also the MySQL server process (which usually runs as user "mysql").

The $enablereactions variable controls whether or not MolDB5R presents its support for reaction databases to the user. If set to "n", the "Reaction Search" menu item will not be displayed and only SD data collections are presented for selection. Nevertheless, the administration front-end will contain all the tools for reaction handling.

The $enable_download and $download_limit variables are used to enable/disable download of hit structures/reactions after a search (all search types). Only structures in SDF format or reactions in RDF format (maximum number per search is defined by $download_limit), but none of the associated textual/numeric data fields can be downloaded. For a full export of structures/reactions + data, use the "export data records" button on the administration page.

In the $java_codebase parameter, you can specify the location (URL) of the JME applet (JME.jar). If JME.jar is in the same directory as the other MolDB5R php-scripts, just set this parameter to an empty string (""). If MolDB5R is installed in a password-protected directory of your web server (using the .htaccess mechanism), it is advisable to place JME.jar somewhere else in a publicly accessible directory in order to avoid additional authentication requests by the user's Java runtime environment. For instance, if we assume your web server's document root is "/var/www/htdocs" and MolDB5R is installed in a password-protected directory "/var/www/htdocs/moldb5r" (the URL would look like http://www.somedomain.com/moldb5r"), you could place JME.jar in "/var/www/htdocs/classes" and thus set the $java_codebase parameter to "/classes".


The last part of the configuration file contains various definitions for table names and should not be changed, unless you have a very good reason to do so. Please note that some new definitions (for reaction tables) have been added in MolDB5R which were not present in MolDB5 and previous versions.


Appendix C

Performance tuning

There are three options which should be considered when maximum performance in substructre searches is desired:



NH, 2010-06-11; last update: 2013-06-19