MolDB6

a free software package for creating a web-based, searchable molecular structure/reaction database

Norbert Haider, University of Vienna, 2014-2021
norbert.haider@univie.ac.at

These instructions explain how to set up MolDB6 as a web-based, fully searchable molecular structure database. Structures and data can be added via a web interface or by import from an MDL SD file (at the server command-line). This import option is shown by an example (see below), using the SD files which are freely available from the PubChem FTP site at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/. Analogously, reactions and data can be imported from an MDL RD file.

For some information about the original ideas, please visit
https://homepage.univie.ac.at/norbert.haider/cheminf/moldb.html

The previous version, MolDB5R, is documented at
https://homepage.univie.ac.at/norbert.haider/cheminf/moldb5rdoc.html

MolDB6 is a collection of fully functional PHP scripts for running a structure/reaction database with search options for text, functional groups, and structure/substructure. Included are also some Perl scripts for database setup and data import/export from/to an SD or RD file.

Download: https://homepage.univie.ac.at/norbert.haider/download/chemistry/moldb/moldb6.tar.gz
Some screenshots: https://homepage.univie.ac.at/norbert.haider/cheminf/moldb6screenshots.html

Content

Features
Technical background
Installation
Using MolDB6
Administration of MolDB6
License
Appendix A (permissions and privileges)
Appendix B (configuration file settings)
Appendix C (performance tuning)

Features

New features in MolDB6 as compared to MolDB5R:

Technical background

The system uses MySQL as the database engine, whereas its "chemical intelligence" is provided by the checkmol/matchmol program (or its socket-based server version, cmmmsrv). All relevant data are stored in MySQL tables, except the bitmap graphics of the molecules. Structure/substructure searches are performed in a two-stage process, consisting of an efficient pre-selection and a full atom-by-atom comparison of the remaining candidates. For pre-selection, a combination of molecular statistics with dictionary-based fingerprints and hash-based fingerprints is used. Optionally, this operation can be performed with memory-based MySQL tables instead of disk-based tables. This will make the search even faster. For structure data collections, there is also a similarity search implemented with a user-selectable ratio of structural vs. functional similarity.

Demo MolDB6 database


Installation

Hardware requirements: 

any x86 PC or Mac (the faster, the better)

Software requirements:

==> these packages are standard components of every Linux distribution (for Windows, the XAMPP package can be used, for Mac OS X, the MAMP package is a good choice)

moreover:
Note: the MolDB6 package was developed and tested on a Linux machine, but (maybe with a few adaptations) it should run also on a  FreeBSD system. Under Mac OS X (tested on 10.6 aka Snow Leopard), MolDB6 runs without any modifications, but requires some non-trivial software installation: GhostScript (see above) and the Perl DBI and DBD::mysql modules. The Windows development and (some) testing was done on a virtualized Windows XP system with the XAMPP 1.7.0 package. Instead of the Perl component of XAMPP, ActivePerl 5.6.1 was used and the missing database driver was installed via the PPM utility (ppm install DBI; ppm install DBD::MySQL). Also with some Linux distributions, it may be necessary to add the Perl driver for MySQL, e.g. in CentOS 8, you would enter the following command: dnf install perl-DBD-MySQL

Where to start?

After downloading and unpacking the MolDB6 package in your working directory, rename or copy the sample configuration file into "moldb6.conf". Load this file into a text editor and adjust the settings. A detailed overview of all parameters is given in Appendix B.

Select a name for your database, define user IDs for a proxy user with write privileges and another proxy user with read-only privileges (the example file "moldb6.conf" already contains these entries which can be accepted or modified; please change at least the passwords!).

If you want to use bitmap graphics of your molecules (recommended only if you want to support clients with outdated browsers), please specify a directory where these bitmaps should be stored and make sure you have write permission there. This should be the final location of the .png files (somewhere within your web server's document root, preferentiall as a subdirectory "bitmaps" of your MolDB6 web directory). If you don't want bitmaps, just set $bitmapdir and $bitmapURLdir to an empty string ("") and set $enable_bitmaps = "n"; in the moldb6uiconf.php file. 

Initialize the MolDB6 database by running the Perl script "initdb.pl" (enter "perl initdb.pl", be sure to be in the same directory as the script). You will be prompted for the password of the MySQL administrator (typically, this is the "root" MySQL user, not to be confused with the "root" operating system user). The script will create the appropriate MySQL database and two tables within this database (one for metadata and one for the fragment dictionary). Moreover, the two proxy user accounts are created and their passwords set as specified in the configuration file. If everything goes well, the script will produce this output:
setup of database $database OK    (where $database displays the name you have specified)
setup of fragment dictionary OK
Please note that this Perl script does not make use of the MySQL database driver module, but just calls the "mysql" command-line program. Thus, it works also on systems where the appropriate driver modules are not installed. Depending on your version of the MySQL server, it might be necessary to adjust the contents of the $oldmysql variable in the header of the initdb.pl script.



Import of structures+data from an SD file (*.sdf):

As an example, we use Compounds_00000001_00025000.sdf (after unpacking the .gz file which you can download from the PubChem FTP site).

If you have wget installed, you can retrieve the PubChem file very conveniently by entering e.g.

wget https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/Compound_000000001_000500000.sdf.gz

Prerequisite: the checkmol/matchmol program and the mol2ps/mol2svg program must be installed (e.g., in /usr/local/bin, for more information, see
https://homepage.univie.ac.at/norbert.haider/cheminf/cmmm.html and https://homepage.univie.ac.at/norbert.haider/cheminf/mol2ps.html).

These steps should be executed exactly in this order:
  1. Change into the util/ subdirectory and run the Perl script "sdfcheck.pl" with the name of the SD file to be imported as the command-line argument, e.g. "perl sdfcheck.pl Compound_000000001_000500000.sdf" (add a leading "../" or "../../" etc. to the SD filename if necessary). This will create a definition file "sdf2moldb.def". By default, only the first 10000 records are inspected which should be sufficient in order to collect the necessary information about the file structure. 

  2. Load the file "sdf2moldb.def" (see above) into a text editor and inspect it. Here, you can change the types of MySQL columns or their dimensions. In addition, you MUST rename the most descriptive field name (preferentially the one holding the molecule's trivial name or its systematic name) into "mol_name". If you do not do this, you will have to manually adjust all the PHP scripts in order to retrieve the correct field. In our PubChem example, change the MySQL field name (i.e., the 2nd column, between the first and the second ":") "pubchem_iupac_name" into "mol_name". Many numeric fields are assumed to be floating-point numbers (MySQL type "DOUBLE"), but in fact are integer numbers, so their type should be changed into (e.g.) "INT(11)". In our PubChem example, only fields such as "pubchem_openeye_mw" and "pubchem_cactvs_xlogp" should be kept as "DOUBLE". If you do not need one or more of the recognized SDF fields in your MySQL table, simply delete the corresponding lines from the definition file. Make sure that there is no field named "mol_id", because this is a reserved name of the MolDB6 system. There should also be no field names starting with "auto_mol_", as they are reserved labels in MolDB6.

  3. Import the content of your SD file into your MySQL database by running the Perl script "sdf2moldb.pl" with the name of the SD file to be imported as the command-line argument, e.g. with "perl sdf2moldb.pl Compound_000000001_000500000.sdf". Depending on the number of molecules in the SD file, this operation may take some time (on a standard business PC, approx. 1 h for 10000 records). It is recommended to do this operation in a background job, most conveniently in a "screen" session (screen is a utility program which is included in most Linux distributions).
    Note: you can create bitmap images of your structures either during the initial data import (set "$make_bitmaps = 1;" in the sdf2moldb.pl script) or later (set "$make_bitmaps = 0;"). In the latter case, data import will be significantly faster, bitmap files can then be created (if needed at all) by running the updatebitmap.pl script in the util/ subdirectory.

  4. If you want to add new records, repeat step 3 with another SD file. By default, all data will be appended. ATTENTION: all SD files that are intended for the same data collection must have the same format (names and types of data fields) as the first one. For example, you can sequentially import all of the SD files from the PubChem site (see above). (And don't forget to let me know your performance figures for a database of >150 million structures....)  ;-) 
     
  5. If you want to create a new data collection, repeat the steps described above, but change the "db_id=1" line in the sdf2moldb.def file into "db_id=2" (or higher, if applicable).

Import of reactions+data from an RD file (*.rdf)

Reactions with associated data can be imported in a very similar manner as described for SDF import (see above). Please note that in the present version, only "flat" (non-hierarchical) RD files can be handled.

  1. Run the Perl script "rdfcheck.pl" in the util/ directory with the name of the RD file to be imported as the command-line argument.
     
  2. Edit the resulting definition file "rdf2moldb.def". Define a field "rxn_name" and make sure that there is no field "rxn_id". Make sure that the db_id is set to a number which is still available.
     
  3. Import the RD file by running the Perl script "rdf2moldb.pl" with the RD file name as the command-line argument. Bitmap and SVG depiction works in the same way as for structure data collections.


Next, the web server part can be installed:

  1. Of course, you must know the location of your web server's document root directory. This might be something like "/var/www/htdocs" or "/xampp/htdocs", etc. (in a typical Apache installation, it is specified in the configuration file "httpd.conf" or in other "*.conf" files that are consulted on start-up). On many websites, a special subdirectory (e.g, "public_html") witin a user's home directory can also be a valid location.
     
  2. Within the document root, you should have already created a directory which will hold the MolDB6 PHP scripts, e.g. "moldb6". Usually, the "bitmaps" directory will be there as a subdirectory (e.g., "moldb6/bitmaps"). Create another subdirectory named "admin" (e.g., "moldb6/admin").
     
  3. Copy all the files from the directory "wwwfiles_php5" or "wwwfiles_php7" (depending on your PHP version) of this package to the MolDB6 web directory, copy also all of the scripts in "wwwfiles_php5/admin" or "wwwfiles_php7/admin" to the "admin" subdirectory of the web directory.
     
  4. The main configuration file "moldb6conf.php" should contain exactly the same values for all parameters as the setup configuration file "moldb6.conf". In principle, "moldb6conf.php" is nothing else than an exact copy of "moldb6.conf" with just two extra lines (one at the beginning: "<?php ", and one at the end: "?>"). For testing, this file can be placed also in the MolDB6 web directory. In a publicly accessible production environment, however, this file should be placed outside the web server's document root, somewhere in the PHP include path (check your php.ini file and/or the PHP docs). Keeping the "moldb6conf.php" file within the document root is a security risk, as it contains plain-text passwords. Make sure the web server can read this file and revoke read permission for other users. If you are using bitmap graphics, make sure that the variable $bitmapURLdir points to the correct location, e.g. "/moldb6/bitmaps" (no trailing slash!).
     
  5. Make sure you have at least one of the supported structure editors installed (e.g., JSME). The location of the editor(s) must be specified in the second configuration file, moldb6uiconf.php (an example is provided in the wwwfiles directory). This file should also contain the location of the JSmol package, if used.


Note 1: it is assumed that PHP is not running in "safe mode". If safe_mode is set to "On" in the php.ini file, you have to make sure that  checkmol/matchmol and mol2ps/mol2svg are allowed to be executed, this can be done by specifying the corresponding directory in the variable safe_mode_exec_dir in php.ini (please check your PHP documentation).

Note 2: MolDB6 supports also other character sets in addition to latin1 (which is the default). Please set the $charset variable in the configuration file(s) to either "latin1", "latin2" or "utf8".


The subdirectory "util" of the working directory contains some further Perl scripts:

The mk... scripts should be run whenever a new version of checkmol/matchmol (or cmmmsrv, respectively) is installed on your system.

The script cp2mem.pl will copy the persistent (disk-based) molstat and molcfp MySQL tables to memory-based MySQL tables, which are accessed faster than the disk-based tables. In order to make use of these tables, the moldb_meta table has to be modified (the "usemem" column must be set to 'T'), this can be done with the Perl script "setmem.pl" with an  appropriate command-line argument, e.g. "perl setmem.pl 1=T" (enabling memory-based tables for data collection 1). Do not forget to run cp2mem.pl after enabling memory-based tables and after each reboot of your machine, otherwise the memory-based tables would be empty. Generally, this option is recommended only for larger read-only data collections.

There are also utility scripts for diagnostic purposes (listdb.pl), (re-)creation of definition files from the MySQL table structure (dumpdef.pl) and data export into SD or RD files (moldb2sdf.pl, moldb2rdf.pl). All these scripts contain a short description of their functionality.

Tools for the import of very large data sets from SD files:

Because the "normal" data import from an SD file using the "sdf2moldb.pl" script can be rather time-consuming, there is now (2016 version) an alternative method available: the import task is split into (a) the creation of one or more SQL files that contain all relevant information and simple placeholders for the unique mol_id numbers, and (b) loading of these pre-assembled SQL files into the database. The first sub-task is accomplished by the script "sdf2sql.pl", it is computationally "expensive" and can be run in parallel on several independent machines in order to generate all the desired SQL files. The latter then can be copied to the server machine and they are loaded very fast by the simple script "sql2moldb.pl", this script only takes care of replacing the preliminary mol_id placeholders with the correct ID numbers in ascending order. For more information, please consult the explanations in the script code.



That's it for the installation part. You may wish to adjust the HTML layout of the index.php (preferentially the custom-inc.php) and moldb.css files and the PHP scripts, replacing "MolDB6" by something else. If you write any useful extensions, I would be grateful if you donate these scripts to the project. And a little link to the checkmol/matchmol homepage would also be appreciated.
 

Using MolDB6

Start page
On the start page (index.php), the user can select the data collection(s) from the list box and clicking the "Apply selection" button. Multiple selections (if enabled in the configuration file) can be made by pressing the <Ctrl> key while clicking at the list item. All search options are available via the navigation bar at the top of the page. A link to "Preferences" (if enabled) and to the administration page is displayed at the right-hand side of the menu (it may be advisable to remove this link...).

Browse
All selected data collections are available for browsing on this page, using standard navigation elements. Compound structures are displayed graphically (using SVG or bitmap pictures, if available), together with compound or reaction names and the ID number (mol_id or rxn_id). The latter represents a hyperlink to a detailed view of this particular entry (provided by the script "details.php"). The kind of graphics used (SVG or bitmaps) will depend on the settings in moldb6uiconf.php and on the capabilities of the client's browser.

Text search
There is a simple text search tool which by default only searches the "mol_name" (and/or "rxn_name") field. Search terms are not case-sensitive, as a wildcard character you can use the "%" or "*" symbol. By selecting the option "include other searchable fields", you can extend the search to all text fields which have been defined as "searchable" by the administrator (see below). An advanced text search option is also available which allows entering search terms for each individual searchable data field (these terms can be combined with logical AND or OR operators). By nature, the advanced text search feature can be used only when a single data collection is selected by the user.

Functional group search
This search option makes use of checkmol's capabilities to recognize approx. 200 different functional groups. Categorisation of compounds is performed already when they are entered into the database. The properties are stored in binary format in a special table (molfgb), and the search operation is extremely fast. Multiple selection of functional groups from the list box can be made by pressing the <Ctrl> key while clicking at the desired items.

As soon as a reaction data collection is selected, there are extended search options available: the user can define whether the selected functional group(s) should be present in any reactant or in any product or should be created or lost during the reaction.

Combined structure and reaction search input
By clicking on the light-grey "no structure defined" image, a pop-up window opens. It contains the structure editor (according to the settings in the "Preferences" dialogue) and lets the user draw an input structure or reaction. By clicking on "Submit to search form", this structure (or reaction) is copied to the search form and the pop-up window closes. Depending on the type of search item (structure or reaction), the search form displays slightly different options (see below).

Structure search
This is the central search facility, permitting exact search, substructure search, and similarity search. The input structure must contain at least three atoms. Checkboxes are available for more specific search modes (strict atom/bond type comparison, geometry checks) if a search results in too many hits. Structure search works also for reaction data collections. As an alternative to drawing the query structure, it can be entered as text in a pop-up window (in MDL molfile format).

Reaction structure search
Reaction data collections can be searched for structures or substructures (no similarity search option). User input is drawn in the pop-up window as explained above and can be either one structure (as reactant or as product) or multiple structures, e.g., one reactant and one product (use the "new" button in JME/JSME to start a new molecule). There is also basic support for atom mapping (provided the corresponding mappings are present in the data collection).

Administration of MolDB6

Data collection handling
Apart from data import from SD or RD files, new entries can be added via the administration frontend in the "admin" subdirectory. The main page ("admin/index.php") is used for adding new data collections and editing/erasing  existing ones. Besides the (recommended) password protection of this directory, the "trustedIP" setting restricts access to these functioanlities (for a more detailed description of privileges, see below)..
On this page, only one data collection can be selected at a time. All operations (edit data collection properties, edit data field definitions, erase selected data collection) will apply to this selected data collection. Whereas "edit data collection properties" and "erase selected data collection" are self-explaining, the option "edit data field definitions" requires some more explanation: here, you can add new fields to the records of the selected data collection, determine their (MySQL) type, the name which should display on the "details" output, and (optionally) a field name for SDF/RDF export. A newly created data collection has only two data fields: "mol_id" (or "rxn_id", respectively; an integer number) and "mol_name" (or "rxn_name") which cannot be changed. Moreover, all new structure (not reaction) data collections by default contain the fields "auto_mol_formula" and "auto_mol_fw". A data collection which has been imported from an SD or RD file contains (probably) some more fields. Whenever you add a new field, you can choose its type from the drop-down listbox or enter any valid MySQL column type by selecting "other..." from the list and entering the correct type in the input box. Caution: there is no syntax checking done by MolDB6, wrong input will result in MySQL errors. If you need some special MySQL column types and options, you should use some more sophisticated administration software like phpMyAdmin.
Note: changing the type of a data collection (from SD to RD or from RD to SD) is only possible as long as the data collection is still empty.

Structure/data input
Using "Add/edit data records", you can enter new structures/reactions+data or edit existing ones. This is done by the script "editdata.php", again all actions apply to the selected data collection.

Add new entry
Structures can be entered either by drawing, using the JME or JSME applet, or by pasting text in MDL molfile format into the text area. After submission, the structure is displayed and can again be edited if desired. After confirmation, the structure is added to the data collection and all relevant entries (molstat, functional group patterns, fingerprints) are made in the appropriate tables. The data entry form should be completed (at least, the name of the molecule should be entered) and saved. Several options can be selected from the "format" list for display of textual data:
In the user frontend (browse, search, details), newly added structures are always displayed as SVG images or (for old browsers) using the JME/JSME applet in "depict" mode, until the bitmap graphics have been created by running the "updatebitmap.pl" command-line script. This should be done automatically by a cron job. In the admin frontend (add/edit structure), SVG or JME/JSME is used for structure display.
Input of reactions is done analogously as described above, either by drawing in JME/JSME or by pasting into the text box in MDL rxnfile format. Atom maps should be created for as many atom pairs as possible.

Edit/erase entry
"Edit entry no." and "erase entry no." require a valid ID number (mol_id/rxn_id) as input. If you do not know this number, you can find it out by doing a search or just browse the data collection. Whenever an entry is erased, it is removed from all MySQL tables, but the original bitmap file (if it exists) will not be deleted. Instead, the "updatebitmap.pl" script will rename it into something like "00123456.png.to_be_deleted" on its next run. You can then use some OS shell command in order to actually delete these files (or modify the updatebitmap.pl script to do a real deletion).
When editing a reaction, an option is presented to "extract molecules". This allows to copy any reactant or product as a single structure to another (structure) data collection.

Copy entry
The specified entry number can be either duplicated within the same data collection (e.g., to create an entry for a closely related structure/reaction by subsequent minor modification) together with all textual/numeric data or it can be copied to a different data collection (in this case, as a structure/reaction together with only those textual/numeric data that have an exactly matching column in the target data collection).

License

Copyright © 2014-2021 Norbert Haider, University of Vienna

This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details:  http://www.gnu.org/licenses/.
 

Appendix A

Permissions and privileges

MolDB6 does not implement its own user administration, but relies on two access conditions:
  1. The web server's standard mechanism to protect a directory. This is usually configured either in a .htaccess file in the directory in question or in the central configuration file (typically /etc/apache2/httpd.conf), using the <Directory> keyword. Usually, a combination of valid client domain and password protection (htpasswd) is specified there. Caution: .htaccess files are effective only if enabled in the httpd.conf file! Please refer to the Apache documentation. It is highly recommended to restrict access to the "admin" subdirectory by this mechanism.
     
  2. The IP address of the client machine. Up to 10 "trusted IP" addresses can be specified in the moldb6uiconf.php file: these addresses have all possible privileges within MolDB6 (provided the user meets the access condition defined in .htaccess, see above). These users have the status of an "administrator". In addition, "sub-administrators" can be specified on a per-data-collection basis, listing their client IP address in the moldb_meta table, field "trustedIP" (again, up to 10 IP addresses).
Who can do what?

Only the administrator(s) can
The administrator(s) and the sub-administrator(s) of a data collection can
Any user who can access the "admin" subdirectory can


Appendix B

Parameter settings in the configuration files

Please note that the settings for MolDB6 are kept in different files. All those settings that are relevant both for the Perl command-line scripts (for initialisasation and various administrative tasks) and for the PHP scripts of the web-based user interface are stored in two almost identical files, named moldb6.conf (in the working directory) and moldb6conf.php (in the MolDB6 web directory). All variables in these two files should have the same content; the only difference is that in moldb6conf.php there must be the "<?php" tag in the first line and the "?>" tag in the last line.

There are additional settings that are relevant only to the PHP scripts of the web frontend, they are kept in the file moldb6uiconf.php in the MolDB6 web directory.

Variables to be set in moldb6.conf / moldb6conf.php:

$database       = "moldb6";        # name of the MySQL database
$hostname       = "localhost";     # hostname of MySQL server, probably "localhost"
$clientname     = "localhost";     # name of MySQL client, usually "localhost"
$mysql_admin    = "root";          # MySQL administrator, usually "root"
$rw_user        = "mdb6-admin";    # proxy user with CREATE and INSERT privileges
$rw_password    = "ExAmPlE123";    # (make sure this script is unreadable to others)
$ro_user        = "mdb6-user";     # proxy user with SELECT privilege
$ro_password    = "eXaMpLe456"     # (better avoid dollar signs etc.)
$drop_db        = "y";             # erase entire database before re-creating it?

In this section, MySQL-specific settings have to be made: a databse name has to be specified. If $drop_db is set to "y", this database will be erased and recreated every time you run the initdb.pl script. The machine on which the MySQL server is running is specified by $hostname. Usually, the MySQL server and the web server are running on the same machine, in this case $hostname is set to "localhost" and $clientname is also set to "localhost". The name of the MySQL administrator user must be specified, ususally this is "root" (not to be confused with the system "root" user under Linux/Unix systems). The two proxy user accounts in MySQL will be automatically created when initdb.pl is run.

$prefix        = "";            # this allows to have different MolDB6 instances
                                # in one MySQL database; each MolDB6 instance can
                                # contain multiple data collections

As explained in the comment, you can use this extra prefix to keep more than one MolDB6 installation in a single MySQL database or you can place MolDB6 together with other database applications (such as bulletin boards, wikis, blogs, etc.) in a single MySQL database. This makes sense only in environments where you are restricted to use only one MySQL database.

$charset       = "utf8";         # may be "latin1", "latin2" or "utf8"
$CHECKMOL      = "/usr/local/bin/checkmol"; 
$MATCHMOL      = "/usr/local/bin/matchmol";

The $charset variable controls both HTML and MySQL handling of character sets. Currently, only latin1, latin2 and utf8 are supported. If other character sets are desired, please edit the function set_charset() in the functions.php script.
In $CHECKMOL and $MATCHMOL, the executables of checkmol and matchmol are specified. Under Linux, the programs are usually installed in the /usr/local/bin directory (most conveniently, matchmol is just a hard link to checkmol), under Windows it can be installed, for example, in C:\bin (but do not include the drive letter in these values). Setting $CHECKMOL to "checkmol" would be also OK if the program's directory is included in the system's search path (beware: this must apply also to the search path of the web server process).
On Unix-like machines (Linux, Mac OS X), the cmmmsrv server program can be used by the PHP scripts instead of the checkmol/matchmol command-line program. For details, see below in the section referring to moldb6uiconf.php.


$MOL2SVG        = "/usr/local/bin/mol2svg";
$mol2svgopt     = "--rotate=auto3Donly --hydrogenonmethyl=off --color=/usr/local/etc/color.conf"; # options for mol2svg, e.g."--showmolname=on"
#$mol2svgopt     = "--rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2svg, e.g. "--showmolname=on"
$mol2svgopt_rxn = "-R --rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2svg in reaction mode
$svg_scalingfactor = 1.0;         # 1.0  gives good results
$svg_scalingfactor_rxn = 0.80;    # 0.75 is a good compromise for reactions

These settings refer to the mol2ps/mol2svg program which is used to generate high-quality vector images for 2D depiction of the molecular structures. For a detailed description of possible parameters, please see the mol2ps/mol2svg homepage at https://homepage.univie.ac.at/norbert.haider/cheminf/mol2ps.html

$MOL2PS        = "/usr/local/bin/mol2ps";
$GHOSTSCRIPT   = "/usr/bin/gs";
#$bitmapdir     = "";
$bitmapdir     = "/var/www/htdocs/moldb6/bitmaps";  # this is the base directory
$bitmapURLdir  = "/moldb6/bitmaps";
#$mol2psopt     = "--rotate=auto3Donly --hydrogenonmethyl=off --color=/usr/local/etc/color.conf"; # options for mol2ps
$mol2psopt     = "--rotate=auto3Donly --hydrogenonmethyl=off"; # options for mol2ps, e.g. "--showmolname=on"
$scalingfactor = 0.22;          # 0.22 gives good results

These options are relevant only if you want to use bitmap graphics for 2D depiction of your molecular structures, otherwise set $bitmapdir to an empty string ($bitmapdir = ""). $MOL2PS points to the mol2ps program, $GHOSTSCRIPT to the GhostScript program (under Windows, this must be the console application "gswin32c.exe"). The directory specified in $bitmapdir must be writable to you, but not necessarily to the web server. Whenever new structures are entered via the web frontend, bitmap files are not immediately created, but queued for creation by an auxiliary script "updatebitmap.pl" which should be run in regular intervals by the user who has write permission to $bitmapdir. Most conveniently, this is done by a cron job, e.g. once a day.
Within the bitmap directory, subdirectories are created automatically by updatebitmap.pl (as well as sdf2moldb.pl) for each data collection, using the number ("db_id") of that data collection as the directory name. For larger data collection (>10000 structures), it is advisable to further divide the directory hierarchy by specifying a value of (e.g.) 4 as $subdirdigits and (e.g.) 8 as $digits for the PNG file names. This would give paths to the actual graphics files like /var/www/htdocs/moldb6/bitmaps/1/0000/00000045.png (in an URL, this would be /moldb6/bitmaps/1/0000/00000045.png). Thus, the maximum number of PNG files in one directory can be limited, otherwise you might run into problems with some shell commands. In previous versions of MolDB, the digit numbers were also set in the configuration file, but in the current version they are set per data collection in the metadata table via the web admin frontend.
For all available options of mol2ps, please refer to the mol2ps homepage or just enter "mol2ps" on a command prompt.

$enable_inchi  = "y";   # "y" or "n"
$INCHI         = "/usr/local/bin/inchi-1";  # available from the IUPAC website
$INCHI_OPT     = "-STDIO -Key";  # Linux: " -STDIO -Key", Windows: "/STDIO /Key /AuxNone"

These options are relevant only if you want to use the auto_mol_inchikey feature. For this purpose, you will need the inchi-1 program which is available for download from the IUPAC website.



# further settings=============================================================

$fpdict_mode   = 1;      # 1 = auto adjust, 2 = force 64 bit, 3 = force 32 bit
$scratchdir    = "/data/moldb/moldb-scratch";  # needed by cp2mem.pl,
                                               # must be writeable by mysql UID, too

The $scratchdir variable holds the name of a directory for temporary files which are used by "cp2mem.pl" when it copies the content of disk-based tables (molstat and molcfp) to memory-based tables (see Appendix C). It is very important that not only the user who runs "cp2mem.pl" has write access to this directory, but also the MySQL server process (which usually runs as user "mysql").

The last part of the configuration file contains various definitions for table names and should not be changed, unless you have a very good reason to do so.

Variables to be set in moldb6uiconf.php:

$sitename      = "MolDB6 demo"; # appears in title and headline
$cssfilename   = "moldb.css";   # the default CSS (cascading style sheets) file

The $sitename string will appear in the header of all MolDB6 pages. The $cssfilebname specifies the name of the stylesheet file which should be present in the MolDB6 web directory (default: "moldb.css"). If you want to experiment with your own stylesheets, you can use this setting to quickly change the CSS file.

$use_cmmmsrv   = "y";           # set to "y" if a cmmmsrv daemon is available
$cmmmsrv_addr  = "127.0.0.1";   # must be numeric
$cmmmsrv_port  = 55624;         # the default cmmmsrv port is 55624


Whereas all the backend Perl scripts rely on checkmol/matchmol, the PHP portion of the package can also use the cmmmsrv daemon instead, if installed on your system. This server process provides the same functionality as the command-line programs (checkmol/matchmol), but communicates with the web server via TCP sockets instead of shell calls. This gives somewhat better performance (see Appendix C). In principle, cmmmsrv can run on a different machine than the web server, in this case $cmmmsrv_addr must be set to the (numeric) IP address of this machine.

$multiselect   = "y";           # allow selection of multiple data collections
$default_db    = "1";           # default db_id  (if more than one, use quotes and commas, e.g. "1,3")
$trustedIP     = "127.0.0.1, 192.168.0.10,::1";  # max. 10 IP addresses
$enablereactions = "y";         # if not "y": no reaction support

In this section, you can specify whether or not users can select only one data collection for searching/browsing or all available data collections. If there are more than one data collections, $default_db specifies the one(s) which is/are automatically selected whenever a user visits your MolDB6 website.

In $trustedIP, you can specify up to 10 (numeric) IP addresses for extended administrative privileges. Besides protecting the "admin" subdirectory with a password (using the standard mechanisms of the Apache web server, e.g. .htaccess files), the $trustedIP directive gives extra protection. Only from a client machine listed in $trustedIP, you can change the settings of a data collection (e.g. from read-only to read/write) or erase an entire data collection. If you want to assign special administrative priveleges to just one data collection to somebody (e.g. for write access to a read-only data collection), use the analogous "trusted IP address" specification in the web administration frontend.

The $enablereactions variable controls whether or not MolDB6 presents its support for reaction databases to the user. If set to "n", the "Reaction Search" menu item will not be displayed and only SD data collections are presented for selection. Nevertheless, the administration front-end will contain all the tools for reaction handling.

$enable_download = "y";         # download option for hit structures/reactions
$download_limit  = 100;         # maximum number of hit structures/reactions to be downloaded (per search)

These two variables control if a user can download hitlists from text/functional group/structure searches in SD/RD file format and how many entries may be downloaded per SD/RD file.

$enable_adminlink = "y";        # "y" or "n"; show/hide "Administration" menu item

If set to "y", the menu contains an "Administration" link, pointing to the admin/index.php script.

$enable_prefs  = "y";           # "y" or "n"; enable choice of structure editor (JME, JSME, Ketcher, FlaME)
$default_editor = "jsme";       # may be "jme", "jsme", "ketcher" or "flame" (theoretically, also "text")

If $enable_prefs is set to "y", the user may choose his/her preferred structure editor or text input (if enabled). This choice (done in a pop-up window) makes sense only if there is more than one editor available. A default editor should be specified, in case the user does not select anything or has coolies disabled.

$enable_svg       = "y";        # first choice
$enable_bitmaps   = "y";        # second choice
$enable_jme       = "y";        # structure editor; fallback for 2D display
$enable_jsme      = "y";        # structure editor; fallback for 2D display (preferred)
$enable_ketcher   = "y";        # structure editor for input only
$enable_flame     = "y";        # structure editor for input only
$enable_textinput = "y";        # allow users to enter MDL molfiles/rxnfiles as text


These variables control the 2D structure display. The preferred option for 2D display is SVG which is supported on all current browsers. MolDB6 tries to figure out automatically the clients browser version and decides if it uses SVG for structure images. For older browsers, it is handy to have bitmap images instead. For this purpose, PNG files are created as explained above. As a further fallback, the JME applet or its JSME counterpart can be used to display molfiles/rxnfiles in "depict" mode. This option, however may be problematic when long hitlists are displayed, because too many instances of the applets are used which consumes a lot of client resources.
GGA Ketcher and FlaME can be used only for structure/reaction input, but not for depiction in hitlists. The $enable_textinput variable controls if users are allowed to enter queries in MDL molfile/rxnfile format in a text input box.

$java_codebase = "/classes";    # either leave empty or specify URL of the directory containing JME.jar
$jsme_path     = "../jsme";     # absolute or relative path to the JSME directory
$ketcher_path  = "../ketcher";  # absolute or relative path to the GGA Ketcher directory
$flame_swf     = "../flame/flamer.swf";  # The FlaME swf file with absolute or relative path

For all installed (and enabled) editors, their location must be specified. JME is an exception if the JME.jar file is installed in the MolDB6 directory: in this case, the $java_codebase variable should be set to an empty string.

If JSmol/JSV is used for structure or spectra display (using the two new HTML display formats introduced in the 2018 release of MolDB6), the location of the JSmol package must be specified as well as the dimensions of the pop-up window that shows the structure/spectrum.

$jsmoldir         = "../jsmol";
$jspopup_width    = 1024;
$jspopup_height   = 700;



Appendix C

Performance tuning

There are three options which should be considered when maximum performance in substructre searches is desired:



NH, 2014-07-30; last update: 2021-02-28