Awesome
<!--[![Build Status](https://travis-ci.org/Crespo-Otero-group/fromage.svg?branch=master)](https://travis-ci.org/Crespo-Otero-group/fromage) [![Docs]( https://readthedocs.org/projects/fromage/badge/?version=latest&style=plastic)](https://fromage.readthedocs.io/en/latest/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/Crespo-Otero-group/fromage.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/Crespo-Otero-group/fromage/context:python)--> <p align="left"> <img height="150" src="doc/logo.png"> </p>fromage (FRamewOrk for Molecular AGgregate Excitations) is a Python framework designed to facilitate the study of molecular aggregates in the excited state. It contains utilities for geometry manipulation going from periodic to finite models, exciton analysis and ONIOM calculations. The current version is 1.0
fromage is developed by the Crespo-Otero group. We acknowledge the EPRSC for funding this project (EP/R029385/1) and Leverhulme Trust (RPG-2019-122).
The documentation can be found here.
To cite the use of the program, please use:
Rivera, M., Dommett, M., Sidat, A., Rahim, W., Crespo‐Otero, R. fromage: A library for the study of molecular crystal excited states at the aggregate scale. J Comput Chem 2020; 1– 14. https://doi.org/10.1002/jcc.26144
And if you are using one of the ONIOM implementations:
Rivera, M., Dommett, M., Crespo-Otero, R. ONIOM(QM:QM′) Electrostatic Embedding Schemes for Photochemistry in Molecular Crystals. J. Chem. Theory Comput. 2019; 15, 4, 2504-2516 https://doi.org/10.1021/acs.jctc.8b01180
1 Installation
1.1 Easy installation using pip
- Make sure that you have the following installed:
- Python 2.7 and above (Python 3 recommended)
- swig
- Modified version of Ewald (only necessary for Ewald embedding calculations)
- Clone this repository to wherever you want to install it:
cd /path/to/dir/
git clone https://github.com/Crespo-Otero-group/fromage.git
cd fromage/
- Install
sudo pip install .
N.B. this will install numpy
, scipy
and pytest
on your system.
- Set environment variables. In your
.bashrc
, write
export FRO_GAUSS=g16
export FRO_EWALD=Ewald
If you are using different binaries for Gaussian or Ewald, change accordingly.
Voilà!
1.2 Common pitfall
When installing, you might find the error:
Python.h: No such file or directory
In order to install Python packages which contain C or C++ on Linux, you need python-dev
or python3-dev
which provides the header Python.h
.
2 Preparing the calculation
fro_prep_run.py
requires:
- A
.xyz
file of a unit cell - A
config
- A file containing the high level embedding charges
- A file containting the low level embedding charges
- A set of
*.template
files formh
,ml
andrl
(mg
too for MECI search)
And optionally:
- A
.xyz
target shell file - A self consistent calculation template file
2.1 Unit cell file
This is the kind of file which is typically produced from a periodic DFT calculation. Avoid double counted atoms outside the unit cell which might be snuck in by visualization programs such as VESTA. For this kind of problem, the fro_uc_tools.py
utility can come in useful. Just make a file containing the unit cell vectors and run fro_uc_tools.py cell.xyz vectors -d
. If you deem that geometry optimisation of your cell is unnecessary (publish this result because I am interested) you will need to convert a .cif
file into a unit cell .xyz
file. For this I recommend Open Babel which would use the syntax:
babel -i -cif cell.cif -o -xyz cell.xyz --filluc
The --filluc
keyword is crucial otherwise you will end up with an asymmetric unit of the cell.
2.2 Configuration file
The config
file is a list of keywords followed by their values which the user should input. A set of reasonable default values is coded in parse_config_file
for every keyword except for the lattice vectors which can not be assumed.
If you use the target_shell
keyword you will need to supply the program with an additional target shell .xyz
file to specify the shell that you want in your real system.
2.3 Population analysis files
These files contain the starting (and in some cases also ending) values for the charges that you intend to use in your embedding. The high level charges will eventually be used in the embedding of the mh
calculation and can be fitted to the Ewald potentially directly or self consitently. They need to be extracted from a Gaussian output file or in the special case of direct Ewald embedding a CP2K >= 4.1 file will do as well. For the low level embedding, only Gaussian is available.
It is crucial that you match the population analysis from the low level charges to the low level of theory in order to properly cancel out the electrostatic intreactions from rl
. The choice of high level charges is more subtle but consistency would have you use the same level of theory as in mh
.
2.4 Primitive template files
These files are named mh.template
, ml.template
, rl.template
and optionally mg.template
. They serve as model template files with a blank name for the checkpoint file, blank calculation name, blank atomic positions and blank point charges. Prepare calculations populates all of these fields except for the atomic positions which will later be repeatedly updated by the optimisation procedure. It is important to include the following keywords:
charge
: allows for the use of point charge embedding (not actually used inrl
)symmetry=none
: conserves the input geometry throughout the calculation, making the position of the charges correctforce
: to calculate the energy gradients necessary for the optimisation
2.5 Target shell file
In certain cases, the cluster of molecules generated radially will be impractical due to the packing of the crystal. For example it may need to include a large number of distant molecules to also include a certain nearest neighbour molecule. In those cases a shell file can be manually edited in the user's favourite chemistry visualisation software to remove extra molecules. In that case a target shell file can be supplied which will be used to generate rl
and ml
. Be extra careful that the central molecule has the correct orientation with respect to your generated cluster. It is recommended to use the shell file from a large radius calculation and manually strip it down to only the nearest neighbour molecules.
2.6 Self consistent template file
This is the Gaussian template file used if the Ewald charge background is computed self consistently. It has the same form as the primitive template files. The level of theory here can be chosen to be in excited state for a fully excited crystal. If this is your intention, don't forget to use density=current
to make sure that your population analysis is in the excited state and not the ground state. Of course ground state self-constistent calculations are also possible.
2.7 Outputs of the preparation
Once you have finished running fro_prep_run.py
, you will end up with a few files:
prep.out
which gives you information about how your preparation went. If the last line gives you an ending time, that is good newsmol.init.xyz
will be the initial position of your molecule for the optimisationshell.xyz
is the cluster of molecules without the central onemh
,ml
,rl
andmg
directories containing.temp
files corresponding to all of the parallel calculationsewald
directory where the ewald calculation is run
3 Running the calculation
3.1 Last few steps before running
To run fromage, all you need is:
- A
fromage.in
file mol.init.xyz
shell.xyz
mh
,ml
andrl
directories containing.temp
files (mg
is only needed for MECI serach)
The fromage.in
file has a similar structure to the config
file but is much simpler and is not even necessary for geometry optimisation in Gaussian. If you want to change a program used in a specific level of theory from Gaussian ot something else, simply add high_level [program]
or low_level [program]
. For MECI search, add bool_ci 1
For Turbomole RI-CC2, run a define and then write in all of the point charges from mh.temp
under the block $point_charges
after scaling.
IMPORTANT: Turbomole uses Bohr units in its control file and as such the x, y and z columns should be scaled accordingly
For Molcas RASCF, prepare an input file in the directory called molcas.input
with geom.xyz as the coordinate. To add point charges, use in &GATEWAY
:
xfield
[number of charges] Angstrom
-9.237176 -2.137048 3.557237 0.432218
-10.014996 -1.455739 0.568597 -0.168284
-10.112382 -2.173251 1.384633 0.146427
.
.
.
You should be all set now. Run fro_run.py
to begin the calculation.
3.2 Outputs
The program only has three main outputs:
fromage.out
which gives updates on all of the individual energies being calculated, the total gradient norm and the energy gapgeom_mol.xyz
which keeps a record of the optimising geometrygeom_cluster.xyz
which combinesgeom_mol.xyz
andshell.xyz
for a better view of intermolecular interactions
If the minimisation ran smoothly, the last line of fromage.out
should be the ending time.
4 Additional utilities
A couple of useful utilities are included here for manipulation of molecular crystal clusters. They may come in handy when making sure that your calculation is doing what you want.
4.1 fro_pick_mol.py
This program selects molecules out of a cluster of molecules and writes them to another file. This is particulary useful when you are dealing with a large molecular cluster and want to extract something like a dimer.
The syntax is intuitive:
fro_pick_mol.py cluster_file.xyz 1 56 22
In this case we have selected a trimer of the molecules containing the atoms labeled 1, 56 and 22 in the input .xyz file. This program will get angry if you feed it more than one atom label of one same molecule. For additional options use fro_pick_mol.py --help
4.2 fro_assign_charges.py
This is more of a debugging tool for checking to see that you are using a sensible bond length in your definition of your molecules. It reads charges and positions from a Gaussian output file for one molecule and assigns those charges to a cluster of atoms made up of those same molecules in a .xyz
-like file.
It could come in handy as a standalone utility if you want to assign charges in a forcefiled calculation.
The syntax is:
fro_assign_charges.py population_analysis.log cluster_of_molecules.xyz
As usual, use assign_charges.py --help
for more options related to bond length, charge kind etc.
4.3 fro_uc_tools.py
This utility does operations on xyz files paired up with unit cell vector files. The vector file is of the form:
8.9638004303 0.0000000000 0.0000000000
0.0000000000 10.5200004578 0.0000000000
-3.8748910079 0.0000000000 10.7924653741
Options include extracting the nonequivalent monomers from the cell, generating a tessalating cell but with all complete molecules (therefore spilling out of the bounding box of the unit cell vectors), confining a cell to the bounding box and creating supercells.
5 Some parting words
If you find yourself with a bunch of error files during your optimisation, ask yourself where some calculations might have failed in the geometry optimisation. Maybe some SCF did not converge or your central molecule escaped the cluster to be with its one true love: infinitely attractive point charges. To combat this, try adding more molecules to your cluster.
If you are happily preparing a calculation, see no error, run fro_run.py
with sensible geometries and find that your quantum chemistry program is very upset about something, it might be that the point charges you are feeding it are unreasonable. Indeed when you start fiddling with large numbers of constrained point charges in Ewald, the system of linear equations which fits them becomes highly linearly dependent and you end up with point charges with values in the thousands. If this happens to you just try a smaller number of constrained point charges while still containing your central molecule.
Hacking this program for your own personal needs is a perfectly good idea and you may be able to get a lot from just importing the I/O modules and the Atom
and Mol
classes.
If all you want to do is integrate your favourite quantum chemistry package into fromage, all you need to do is a) add new io routines in read_file
and edit_file
b) make a new Calc
object modeled after one of the existing ones in the calc
module c) Add the class and corresponding keyword to the calc_types
at the top of the calc
module
The Ewald program is often the source of all of your problems when tinkering with the embedding methods, even as a regular user pushing the program to its limits. It uses a deprecated lapack function and needs to be modified very specifically to be used with fro_prep_run.py
.
6 Contributors
The core algorithms in fromage were developed by Miguel Rivera when he was a PhD student in the Crespo-Otero group. Michael Dommett (exciton analysis) and Amir Sidat (geometry analysis) have contributed to initial versions of fromage. Currently, Federico Hernandez, Amir Sidat, and Michael Ingham are actively contributing to the code. Some new features not yet included in this version include nonadiabatic dynamic simulations, optimisation of different regions, treatment of fully connected molecular crystals (such as MOFs and COFs), and different methods to address overpolarisation. If you are interested in these new features or would like to contribute to fromage, please contact Rachel Crespo-Otero at r.crespo-otero@ucl.ac.uk.
More detailed instructions can be found in the documentation. For any questions about usage, citing or contributing, please email our group at r.crespo-otero@ucl.ac.uk
-The Crespo-Otero group