The Notebook.

A bunch of possibly interesting things.

Predicting Melting Points.

One feature I have wanted to add to ChemSi is the ability to have state symbols - that is, instead of outputting

C3H8 + 5O2 -> 3CO2 + 4H2O

It would instead output

C3H8(g) + 5O2(g) -> 3CO2(g) + 4H2O(l)

In order to do this I needed some way to predict the melting and (eventually) boiling points of different compounds, given only a little information about them.

My initial solution was to simply use enthalpy and entropy changes with the Gibb's equation. If I could get the entropy change from solid to liquid, and the enthalpy change, then I could use




This solutions works - to a point. While for chemicals I have the entropy and enthalpy data for it works very nicely, I do not have the data for all states of all compounds - and it is unreasonable to get this. Additionally, this felt like a bit of a cop out. The point of ChemSi is to try and calculate (or at least, approximate) these values from easily accessible theoretical data, such as is done with the reaction prediction. If I wanted it to predict them excellently I could just provide it with some rules (ie, compound containing C, H, and O reacts with O2 to produce blah) or even worse, just give it a list of the reactions.

After some Googling, I came across the Lindemann criterion. Essentially it boils down to

T_{m} = \frac{2\pi mc^{2}a^{2}\theta^{2}_{D}k_{B}}{h^{2}}

Where the 'dodgy constants' are \theta_{D} (Debye temperature) and c. The rest are either pretty standard constants (the Boltzmann constant, for example) or pretty easy to get. The reason that the dodgy constants were a problem is that they require information on the geometry of the crystalline material - which was quite difficult to get access too.

Just a note on that crystalline material part - almost all of these prediction methods (bar the enthalpy and entropy change one) are limited to Ionic crystals, and are entirely useless for other molecules.

Following this, I searched around some more. I had the idea that perhaps the Lattice Enthalpy would have some relationship with the melting temperature - after all, surely a lattice with a more negative lattice enthalpy would be more strongly bonded so more energy would be needed to break it apart? I decided to test this hypothesis.

For a sample of ionic lattices, Na-Cs and F-I (Only using the group 1 metals and halogens may be a bit of a limitation, which led to me having to do some fiddling later on to fit the other groups. At this stage, however, I was looking for a relationship. Lithium is not included as bonds with Lithium tend to be more covalent in character.) I was able to produce the following graph. The data was taken from this website - I know it refers to Lattice Energy but from the explanation with it I believe they are referring to the Lattice Dissociation Enthalpy so it is still usable in this context.



As you can see, this produces a reasonable correlation - the pairwise correlation coefficient calculated by Octave is 0.9300.

I decided to go ahead and use this lattice enthalpy method to predict the melting points. Unfortunately, as you may have expected, I ran into the same issue as the Gibbs method - I can't really store enough lattice enthalpies! Luckily, there is an equation called the Born-Lande equation

 E = -\frac{N_{A}Mz^{+}z^{-}e^{2}}{4\pi \varepsilon_{0}r_{0}}(1-\frac{1}{n})

This does not produce exact Lattice enthalpies - it is more of a prediction. This method is used in predict_mp_alg2().

I also decided to try and see if I could modify the equation used to calculate the shell energy that is already used for reaction prediction to approximate lattice enthalpy. I won't go into detail about it here, but it is given as predict_mp_alg1() in ChemSi if you want to check it out.

Just to show these equations working, I decided to predict the melting points of three compounds using both the Lattice Enthalpies (Born-Lande) and my shell energies. The compounds I chose were NaCl, FeCl2, and CsI.

Compound Actual MP (K) Shell Energy MP (K) Born-Lande MP (K)
NaCl 1074 1088 1122
FeCl2 950 993 895
CsI 894 1084 819

As you can see, both give a reasonable (+- 50) approximation for NaCl with the shell energy giving the best prediction; this is mirrored with FeCl2 where shell energy is closest (but both are spread further than for NaCl), and finally in CsI the Born-Lande approximation is closest, with the shell energy prediction being too far out. This is likely to do with the shell energy calculation getting less accurate at higher atomic numbers.

Both algorithms have been left in ChemSi so the differences can be seen.



Recently I was reading the excellent Serengeti Rules by Sean B. Carroll, and then this happened.

A few months ago, I wrote the population simulator that I wrote about here, but this purely dealt with so called 'bottom up' regulation - the primarly limiting factor of the population was the amount of food, and an artificial carrying capacity. While reading the Serengeti Rules, it got me thinking about 'top down' regulation - ie, big fish eats little fish, so there are fewer little fish to eat seaweed and other fishy things. I felt that the artifical carrying capacity was just that - artificial - so I set about trying to write a new simulator, which attempts to create a very simple (really simple) model of an ecosystem (more a food web).

Essentially, the model has three numbers - the population of a producer, such as a plant, the population of a primary consumer, such as a rabbit, and the secondary consumer, such as a fox. Each 'cycle' each of these changes - depending on which model I use I can make the producer either increase linearly (which allows both consumers to increase rapidly), be constant (in which case I get a realistic carrying capacity), or change based on a wave pattern (which sort of simulates seasons). This causes changes in the producer. Then, using a ratio of the prey to producers and a random number generator, it is decided whether the prey either reproduce (producer -2, prey +1), stay alive (producer -1, prey +0), or die (producer +0, prey -1). A similar procedure is done on the predators (secondary consumer).

The output of this program being run under different situations is shown below.

No Pred, Const food.

Sim 1; No predators are present, and there is a constant amount of food. The prey population (green) increases and then is relatively stable.



Sim 2; Same as Sim 1, only predators are introduced. While in Sim 1 there was only bottom up regulation, in this one there is top down regulation too - the presence of the predators keeps the prey population at a lower level.


Sim 3; Limited amount of food, small population of prey. No predators. Prey consume all the food, and starve themselves to death.


Sim 4; Prey, predators and large amount of initial food. Small growth rate of food. Initial large amount of food supports a large population of prey, but eventually much food is used up so the prey population slumps. This means there is less pressure on the food so it returns to the original levels.


Sim 5; Initial high concentration of food, so prey peak. Food is regulated by seasons and so when there is lots of food available, more prey are alive so the predator population increases.

This program can be found on my github. It is probably the least taxing of all simulations I have written so far.

Inheritance/Population Simulation.

The population reaches a stable position which is just below the max capacity.

Over the past few weeks I have been considering trying to write a sort of population simulator - ie, you have a sample of say 100 individuals (very basic individuals) and you let it run. They have some genes (currently the code has two, but it could be expanded) and these genes affect how long they live, how sexually mature they become, basically how likely they are to pass on their genetics. It would therefore show a sort of selection, and because of this would also show a population maximum.

This weekend I decided to finally start writing this, and so far it is going well. It currently has a hard set population maximum (5,000, but it can be changed in the code), however I am planning on changing this so it has a set amount of food in the environment, and as individuals grow they take up more food depending on their gene code, and then when they die this food (- some which cannot be released) is given back into the environment. If the food runs out, some organisms may have an ability to survive without, but others won't - so they will die out but the survivors will have more food.

At the moment, it creates 100 individuals with two random genes - x and y - which determine their survival characteristics. Currently it is based around the following equations;

age_{max} = 2000|sin(\frac{x}{10}) \cdot cos(\frac{y}{10})|

age_{spread} = 1000|sin(\frac{y}{10}) \cdot cos(\frac{x}{10})|

This means that as the x and y values change, the maximum age and spread change - so certain configurations will be more stable than others. The probability of reproduction is modeled as a normal distribution using;

\frac{1}{spread\cdot \sqrt{2\pi}} e^{-\frac{(age - \frac{age_{max}}{2})^{2}}{2\cdot age_{spread}^2}}

This gives a probability of reproduction, so the survival is sort of down to chance. If an individual reproduces then their genes change as they are passed on (only slightly) so the most successful genes are likely to survive. At the moment (without the food construct) individuals die at their maximum age multiplied by a factor which is dependent on the number of individuals in the population;

age_{relmax} = age_{max} * (1-\frac{pop^3}{pop_{max}^3})

So, as the number of individuals increases, the age at which individuals dies gets smaller - so in the graphs we see a boom bust population change, as when it gets too big individuals die, and when it gets too low they live longer.

Hopefully this new project will continue to grow. For now, the source is on github, and an image of the program is shown below.


The population reaches a stable position which is just below the max capacity.

The population reaches a stable position which is just below the max capacity.

Changes to ChemSi

Recently I chose to rewrite ChemSi in its entirety to clean up the code base and add some new features. Now, instead of only having a text based front end which limits the usability (so you can't have cyclic reactions, etc) and simultaneously makes it harder to maintain (I had wanted to use electron energies instead of electro negativities) I chose to write it in an object orientated manner. This enables it to be used as a Python module, as opposed to a program.


import chemi

the module will be loaded, and commands can be used within it. For example;

import chemi

q = chemi.periodic_table["Na"].out(1)
print q["name"]
print q['shells']['3s0']
print q
s = chemi.Reaction(300)
q = chemi.Reaction(300)
s.reactants.append(chemi.Compound("NaBr", 0, 0, []))
s.reactants.append(chemi.Compound("NaBr", 0, 0, []))
s.reactants.append(chemi.Compound("Cl2", 0, 0, []))
print(chemi.output(s.return_reactants()) + " -> " + chemi.output(s.return_products()))
q.reactants = s.products
q.reactants[2] = chemi.Compound("F2", 0, 0, [])
print(chemi.output(q.return_reactants()) + " -> " + chemi.output(q.return_products()))

Will give the following output;

ben@Nitrate:~/Development/ChemSi$ python
{'energy': -1.5, 'number': 1}
{'molar': 23.0, 'electronegativity': 0.9, 'name': u'Sodium', 'shells': {'2p1': {'energy': -166.7, 'number': 2}, '2p0': {'energy': -166.7, 'number': 2}, '2p-1': {'energy': -166.7, 'number': 2}, '3s0': {'energy': -1.5, 'number': 1}, '1s0': {'energy': -1646.3, 'number': 2}, '2s0': {'energy': -275.5, 'number': 2}}, 'an': 11, 'small': u'Na'}
Cl2 + 2NaBr -> Br2 + 2NaCl
F2 + 2NaCl -> 2NaF + Cl2

Here I have done something which would've been impossible in a previous version of ChemSi. I have first of all used the new periodic table definition to return some information about sodium - the name, the 3s0 shell energy (kJmol^-1, approximated using the Rydberg equation) and then a general print out of it. I then go on to add some Sodium Bromide to reaction s, and some chlorine. I then get it to use my algorithm to predict the reaction products, before printing out the reaction;

Cl_2 + 2NaBr \rightarrow Br_2 + 2NaCl

Which it predicts correctly. I then move the products into reaction q, but replace the bromine with fluorine. I then predict them together, and I get;

F_2 + 2NaCl \rightarrow Cl_2 + 2NaF

These predictions are typically more accurate than the old model - for example;

FeBr_3 + Al \rightarrow AlBr_3 + Fe

In the previous ChemSi code this would've given AlBr and FeBr2 - which isn't accurate. So, in some respects it is getting better. On the other hand, because the algorithm I am using to fill the shell energies has faults with transition metals and elements with a 4s or higher orbital (it fills them going up in n,l,m ie 1s, 2s, 2p, 3s, 3p, 3d, 4s as opposed to the actual energies) it means that the above demonstration with NaI will not work accurately - the Chlorine is calculated to be higher and so is not substituted. Regardless, the new prediction engine seems to work better - fixes will be released in the coming weeks to fix these issues.

The code is already up on GitHub and along side it you will find a new class definitions file which explains each of the new classes.

ChemKit Developments


Recently I have been continuing development of ChemKit. It is now able to calculate Gibbs free energy and predict the temperatures at which different changes and reactions will occur, while also being able to calculate the atomic orbital energy levels.

In order to calculate the Gibb's free energy I required the entropy and enthalpy changes of the reaction. While it may be possible to calculate the enthalpy change by storing the relatively simple bond enthalpies (or using the atomic orbitals and working out the bond energies) it proved harder to do so for the entropy changes so I took the lazier option to just store the compounds in data tables. This allows the program to calculate the entropy and enthalpy changes, and hence predict the temperature at which a reaction will go using the typical Gibb's energy equation;

\Delta G = \Delta H - T\Delta S

This proved relatively useful. As shown below it allows for relatively accurate prediction of reactions;

> set temp 300
> gibbs H2O(l) -> H2O(g)
Entropy Change of Reaction: 118.78Jmol-1K-1
Enthalpy Change of Reaction: 44.01kJmol-1
Gibbs Free Energy at 300.0K: 8.376kJmol-1
Will reaction go?: No
Temperature: 370.516922041K
ln K: -3.35980746089
> set temp 400
> gibbs H2O(l) -> H2O(g)
Entropy Change of Reaction: 118.78Jmol-1K-1
Enthalpy Change of Reaction: 44.01kJmol-1
Gibbs Free Energy at 400.0K: -3.502kJmol-1
Will reaction go?: Yes
Temperature: 370.516922041K
ln K: 1.05354993983

As you can see, it says the reaction (more a state change in this case, I chose the state change because it is a well known one) will not occur at 300 kelvin, but will at 400 kelvin. As we know, water boils at 373.15 kelvin (100\circ C), so this seems likely. It further predicts that the temperature at which it will finally go is 370.5K - this is slightly below the temperature normally considered to be the boiling point however given the use of average data it is relatively close.

After writing this I decided to calculate the atomic orbital energies. As currently ChemKit uses electronegativites (which are based on the lowest occupied atomic orbital... kind of) it already sort of uses the energies to predict reaction products. Adding a calculator to work them out, however, makes the change more visible. For this I used the following equation;

E = -R_{y}\frac{Z^{2}}{n^2}

Where R_{y} is the Rydberg Constant for eV (13.6eV essentially the energy of a ground state electron in hydrogen), Z is the nuclear charge and $n$ is the principal quantum number. By adjusting the nuclear charge to take into account the electron shielding this produces some relatively accurate numbers;

> element Na
=== Sodium (Na) ===
Atomic Number: 11
Atomic Mass: 22.98977
Electronegativity: 0.9
1s0 (-1646.29eV): 2
2s0 (-275.515eV): 2
2p-1 (-166.67eV): 2
2p0 (-166.67eV): 2
2p1 (-166.67eV): 2
3s0 (-1.512eV): 1

> element F
=== Fluorine (F) ===
Atomic Number: 9
Atomic Mass: 18.9984
Electronegativity: 4.0
1s0 (-1102.062eV): 2
2s0 (-166.67eV): 2
2p-1 (-85.036eV): 2
2p0 (-85.036eV): 2
2p1 (-85.036eV): 1

> element H
=== Hydrogen (H) ===
Atomic Number: 1
Atomic Mass: 1.00794
Electronegativity: 2.1
1s0 (-13.606eV): 1


As you can see from this, the highest energy orbital in fluorine is -85.0eV, in sodium it is -1.5eV and in hydrogen it is -13.6eV. This means that as electrons will tend to have a higher probability in an area of lower energy the position in which an electron will have a higher probability in NaH is closer to the Hydrogen, while in HF it is closer to the fluorine - the electron wants to be in a lower energy state.

I am going to continue developing ChemKit as my primary project from now on (it can still be found on my github), and I will hopefully come out with the aforementioned Lorentz Transformation post within the week.

The Chemistry Project - now ChemSi.

ben@Eddie:~/Development/SI/ChemSi$ python
 Welcome to ChemKit (copyright 2015).
 > resultant KI + NaCl
 KI + NaCl -> KCl + NaI
 > set verbose
 Verbose mode on
 > resultant KI + NaCl
 KI + NaCl -> KCl + NaI
 KI: 166.003g/mol
 NaCl: 58.443g/mol
 KCl: 74.551g/mol; 33.21576375817387%
 NaI: 149.894g/mol; 66.78423624182614%
 > mass C8H18O

Recently I have been working on ChemSi, which is the chemical program I mentioned in my last post. I have since written quite a large amount of code for it to utilise an algorithm I came up with last night at about 1:00AM. It is not the smoothest algorithm, and it does get it wrong about 40% of the time. Regardless, it is relatively good for what is needed here.

In an effort for transparency I will explain how the algorithm works. It is based off of displacement reactions - so for addition and other forms it has some issues. It gets the lowest electronegativity (aka the most nucleophilic element) of the elements present in a structure. It works out how many valencies this has, and finds the highest electronegativity. It repeats this until all valencies are filled - but it does mean the original structures are destroyed (so covalent bonds are treated the same as ionic) - meaning a OH group may be split up, often with disastrous results. A better method might be to do a bunch of algorithms and rank their products on how few lone elements they have (for example with ChemSi entering 'C2H4 + H2O' will get you 'CH3OH + C + H2', when it should all be as a single ethanol molecule). It could then work out the most likely product.
Overall it feels like I am making good progress on it. I am hoping to have a fully working product by next friday (which can do percentage yields). Over the summer holidays I may attempt to add a GUI (sort of like IrYdium VLab, only open source, up to date, and with every reactant).

Talking about IrYdium, I have found that many programs which do these sorts of chemical reaction predictions (ie Chemist on Android/iOS, IrYdium) have a very limited range of chemicals and you feel as though they are preprogrammed results. IrYdium for instance seems to only have chemicals for neutralizations and working out how different things affect the rate. If ChemSi does get to the stage of having a GUI I can assure you it will allow for any chemical mixture, at any temperature (but I cannot say it will always be right ;)).