cmmmsrv (Linux version)

A socket-based server program providing checkmol/matchmol functionality

Norbert Haider, University of Vienna, 2007-2018
norbert.haider@univie.ac.at

For a detailed description of checkmol/matchmol, see
http://merian.pch.univie.ac.at/~nhaider/cheminf/cmmm.html

This program uses example code (mtserv) written by Sebastian Koppehel
(basti@bastisoft.de) and some code of the General Purpose Hash Function
Algorithms Library written by Arash Partow (http://www.partow.net).

Compile with fpc (Free Pascal, see http://www.freepascal.org), but
DO NOT use the -Sd or -S2 option (Delphi mode).

ATTENTION: the socket support in Free Pascal has undergone various changes
during the evolution of this compiler. The present cmmmsrv.pas source code
was tested only with fpc 1.0.11, fpc 2.4.0 and fpc 2.6.2 (with fpc 2.2.0,
the program compiled and could be started, but produced access violations
on input). The MacOS X version was tested only with fpc 2.4.0 under
Snow Leopard (MacOS X 10.6), without any compiler optimisations.

Example for compilation (with optimization) and installation:

  fpc cmmmsrv.pas -O3 -Op3

Start the server with the command "cmmmsrv", together with options such as

  -l         listen on the local interface (127.0.0.1) only
  -p=12345   the port to listen on
  -q         quiet mode

for example:

  ./cmmmsrv -l -p=44444 -q &

The default port is 55624.

Please note: cmmmsrv does not put itself into the background
automatically, so you have to start it with the "&" shell operator,
preferably under "nohup"; another option is to run it as a foreground
process in a "screen" session. Most conveniently, cmmmsrv is started by a
startup script, which usually resides in the /etc/init.d directory of your
Linux system. Depending on your distribution, different flavors of such
scripts can be used. Examples can be found in the "util" folder.

=========================================================================

Purpose:

cmmmsrv can be used as a backend, e.g.
for web-based molecular structure database applications, such as the
moldb4 package (for more information, see
http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb4.html). Normally,
this package uses a MySQL database for storage of structures and related
information and invokes checkmol/matchmol (command-line version), e.g. for
structure matching, in a "classical" shell call. In order to avoid such
repeated (and time-consuming) shell calls, cmmmsrv can be used as a
replacement. The frontend program (typically a PHP script or similar) must
be able to open a socket connection to cmmmsrv, write queries to this
socket and receive the answers from the same socket. The protocol is
described below. On some Linux distributions, socket support in PHP is not
installed automatically. In this case, the appropriate package
(e.g. 'php5-sockets') has to be added.

Operation:

Communication with cmmmsrv is performed entirely in plain text, so you
can open a connection (for testing purposes) with a telnet client, e.g.
with this command:

  telnet localhost 55624

(provided that cmmmsrv is running on your local machine and is listening
on the default port). When a connection is established, the server will
send a greeting message:

  #### checkmol/matchmol server v0.4a READY. To quit, enter "#### BYE"

and it expects commands (lines starting with "####") or input structures
(in MDL molfile format, each molecule terminated by "$$$$" on a separate
line). Available commands are:

#### checkmol:[options]
  Switches into checkmol mode, where [options] can be the following:
    e  textual output of found functional groups (in English); default
    d  textual output of found functional groups (in German)
    c  8-digit code output of found functional groups (see cmmm homepage)
    b  bitstring (in decimal format) representing the presence of each group
    s  the ASCII representation of the above bitstring (i.e.
       0s and 1s)
    p  lists the position of each functional group (atom number of key atom)
    x  write molecular statistics (only non-zero values, with names)
    X  write molecular statistics (all values, without descriptor names)
    l  list all available descriptors for molecular statistics
    M  accept metal atoms as ring members (for backward compatibility)
    m  write MDL molfile (with special encoding for aromatic atoms/bonds)
    r  force SSR (set of small rings) ring search mode (instead of SAR)
    a  count charges in molecular statistics
    h  hashed fingerprint mode with boolean output
    H  hashed fingerprint mode with decimal output

#### matchmol:[options]
  Switches into matchmol mode, where [options] can be the following:
    x  exact match
    s  strict comparison of atom and bond types (including ring check)
    r  force SSR (set of small rings) ring search mode
    m  write matching molecule as MDL molfile to standard output
       (default output: record number + ":T" for hit or ":F" for miss)
    M  accept metal atoms as ring members
    n  additional output of atom numbers for matching atom pairs
    N  like "n", but only for the first matching substructure found
    g  check geometry of double bonds (E/Z)
    G  check geometry of chiral centers (R/S)
    a  check charges strictly
    i  check isotopes strictly
    d  check radicals strictly
    f  fingerprint mode (1 haystack, multiple needles) with boolean output
    F  fingerprint mode (1 haystack, multiple needles) with decimal output

  On switching to matchmol mode, the buffer holding the "needle" structure
  is cleared, so that the first new structure which is submitted will be
  regarded as the "needle" (i.e., the query structure). All following
  structures will be regarded as "haystack" (i.e., candidate structures).
  For each of these structures, the server will answer "n:F" (false, no
  match) or "n:T" (true, match), where n is a consecutive number
  (molcount), starting with 1 for the first haystack structure.

#### mcreset
  Resets the molcount numbering to 1, but leaves the needle structure in
  place.

####
  Completes any pending operation (e.g. binary fingerprint generation) and
  echoes back a line containing only "####". This feature can be used for
  flow control, by terminating every sequence of input structures with
  "####" and waiting for the "####" echo to arrive at the end of regular
  output.

#### bye
  Closes the connection.

=========================================================================

Security considerations:

cmmmsrv can be run with the lowest possible privileges, as it does not
need any access to the file system. It is recommended to start cmmmsrv
with the "-l" command-line option, so that it will listen only on the
local interface (127.0.0.1). Usually this is sufficient if the frontend
program (typically, a Perl or PHP script) is running on the same machine
and opens the connection on "localhost" or "127.0.0.1". If the cmmmsrv
service is to be accessed by a different machine, the "-l" option must be
omitted and (most probably) a suitable firewall/packet filter rule must be
applied in order to open the cmmmsrv port (default: 55624). In this case,
it is recommended to restrict access to this port only to those machines
which really need it, rather than to the entire internet.
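
=========================================================================

Client example (sketch):

The plain-text protocol described under "Operation" can be exercised from
any language with socket support. The following is a minimal sketch in
Python, not part of the cmmmsrv distribution. It rests on several
assumptions: the function names (build_request, read_until_echo, run_query)
are hypothetical, lines are assumed to be terminated by a single "\n", the
greeting is assumed to be exactly one line, and the mode line is written as
"#### matchmol:s" following the "#### matchmol:[options]" pattern. It uses
the "####" echo for flow control as described for that command.

```python
import socket

DEFAULT_PORT = 55624  # default cmmmsrv port according to this document


def build_request(mode_line, molfiles):
    """Assemble one request: mode command, structures, "####" flow-control line.

    mode_line is e.g. "#### matchmol:s" (exact spelling is an assumption).
    molfiles is a list of strings in MDL molfile format.
    """
    lines = [mode_line]
    for mol in molfiles:
        lines.append(mol.rstrip("\n"))  # molfile text, trailing newlines dropped
        lines.append("$$$$")            # terminator on a separate line
    lines.append("####")                # ask the server to echo "####" when done
    return "\n".join(lines) + "\n"


def read_until_echo(stream):
    """Collect reply lines until a line containing only "####" arrives."""
    reply = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == "####":
            break
        reply.append(line)
    return reply


def run_query(mode_line, molfiles, host="127.0.0.1", port=DEFAULT_PORT):
    """Hypothetical helper: send one request and return the reply lines."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rw", encoding="latin-1", newline="\n") as stream:
            stream.readline()  # greeting (assumed to be a single line)
            stream.write(build_request(mode_line, molfiles))
            stream.flush()
            reply = read_until_echo(stream)
            stream.write("#### bye\n")  # close the connection politely
            stream.flush()
            return reply
```

In matchmol mode, the first entry of molfiles would act as the needle and
the remaining entries as haystack structures, so a call such as
run_query("#### matchmol:s", [needle, candidate1, candidate2]) would be
expected to return one "n:T" or "n:F" line per candidate. The call requires
a running cmmmsrv instance on the given host and port.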