cgsmiles.sample module¶
- class cgsmiles.sample.MoleculeSampler(fragment_dict, polymer_reactivities, fragment_reactivities={}, terminal_bonds=[], fragment_masses=None, all_atom=True, seed=None)[source]¶
Bases:
objectGiven a fragment string in CGsmiles format and probabilities for residues to occur, return a random molecule with target molecular weight.
First, this class has to be initiated using the class construction method from_string, which makes sure to read and resolve the fragment graphs provided in the CGsmiles string.
Once the MoleculeSampler is initiated you can call the sampler method in order to generate a new random polymer molecule from the fragment string that was provided.
In addition to just randomly creating a molecule reactivities and termination probabilities can be provided to steer the outcome. We use the term and variable name reactivity whenever referring to a probability that a certain bonding descriptor is used to highlight the fact that the probabilistic behaviour is driven by the bonding descriptors.
The sampler features a two level connectivity determination. First using the polymer_reactivities an open bonding descriptor from the growing polymer molecule is selected according to the probabilities provided. Subsequently, a fragment is randomly chosen that has a matching complementary bonding descriptor.
If in addition the fragment_reactivities are provided the algorithm will select a fragment that features a bonding descriptor according to the conditional probabilities provided. The fragment_reactivities keyword accepts a two level dict. The first key is the bonding descriptor found on the polymer molecule (i.e. the one that will be used to extend the polymer). The second level contains a dict specifying a probability for the bonding descriptors found in the fragments (i.e. their conditional probability given the polymer descriptors).
Basic Examples¶
Random-copolymer of PMMA and PS with equal probabilities.
>>> cgsmiles_str = "{#PMMA=[>]C(C)C[<]C(=O)OC,#PS=[>]CC[<]c1cccc1}" >>> polymer_reactivities = {'>':0.5, '<':0.5} >>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str, polymer_reactivities=polymer_reactivities) >>> polymer_graph = sampler.sample(target_weight=10000)
One can also make a blocky copolymer by providing the conditional reactivities. To distinguish the descriptors on PMMA vs PS we can label them with a alphanumeric string.
>>> cgsmiles_str = "{#PMMA=[$A]C(C)C[$B]C(=O)OC,#PS=[$C]CC[$D]c1cccc1}" >>> polymer_reactivites = {'$A':0.5, '$B':0, '$C': 0.5, '$D': 0.0} >>> fragment_reactivities = {'$A': {'$A': 0., '$C': 0., '$B': 0.7, '$D': 0.3}, '$B': {'$A': 0.7, '$C': 0.3, '$B': 0.0, '$D': 0.0 }, '$C': {'$A': 0., '$C': 0., '$B': 0.3, '$D': 0.7}, '$D': {'$A': 0.3, '$C': 0.7, '$B': 0.0, '$D': 0.0 }} >>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str, polymer_reactiviyies=polymer_reactivities, fragment_reactivities=fragment_reactivities) >>> polymer_graph = sampler.sample(target_weight=10000)
One can also make a blocky copolymer but steer the probability of having head-to-head tail-to-tail vs head-to-tail addition. This is done by labeling the bonding descriptors A-D as before but this time allowing head-head and tail-tail additions (i.e. (A,A) & (A,C)) but with lower reactivities.
>>> cgsmiles_str = "{#PMMA=[$A]C(C)C[$B]C(=O)OC,#PS=[$C]CC[$D]c1cccc1}" >>> polymer_reactivities = {'$A':0.25, '$B':0.25, '$C': 0.25, '$D': 0.25} >>> fragment_reactivities = {'$A': {'$A': 0.1, '$C': 0.1, '$B': 0.4, '$D': 0.4}, '$B': {'$A': 0.4, '$C': 0.4, '$B': 0.1, '$D': 0.1}, '$C': {'$A': 0.1, '$C': 0.1, '$B': 0.4, '$D': 0.4}, '$D': {'$A': 0.4, '$C': 0.4, '$B': 0.1, '$D': 0.1 }} >>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str, polymer_reactivities=polymer_reactivities, fragment_reactivities=fragment_reactivities) >>> polymer_graph = sampler.sample(target_weight=10000)
Advanced API Examples¶
Of course the sampler is not limited to linear co-polymers. Also randomly branched molecules such as bottle-brush copolymers can be generated. To steer the branching ratio and branch extension as well as terminal end groups the branch_term_probs, terminal_fragments, and terminal_reactivities can be provided.
For example, To generate a bottle brush polymer that has PMA in the backbone and PEG as side-chain terminated with an OH group the following CGsmiles string in combination with the above mentioned probabilities can be provided.
Note that in this case we declare ‘$A’ and ‘$B’ to be terminal bonding descriptors. That means if they are selected all other bonding descriptors are automatically removed from the previous residue. So if PEG connects to OH ‘>A’ is removed from PEG to avoid a trivalent PEG. Likewise, if the terminal bonding selector is not chosen (i.e. if PEG connects to PEG) it is removed from the source residue. Finally, we need to specify that ‘$A’ has 0 chance of connecting to itself.
>>> cgsmiles_str="{#PMA=[>]CC[<]C(=O)OC[>A],#PEG=[<A]COC[>A][$A],#OH=[$B]O}" >>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str, terminal_bonds=['$A', '$B], polymer_reactivities={'<': 0.1, '>': 0.1, '>A': 0.8, '<A':0.8, '$A':0.3, '$B':0.0}, fragment_reactivities={'$A': {'$A':0, '$B': 1.0},}, all_atom=True) >>> molecule = sampler.sample(target_weight=5008)
This combination results into a dense brush where very PMA molecule has a PEG branch whose length is determined by a 1 in 3 probability to terminate.
In contrast one can also generate sparse branched molecules. For instance, Dextran at low molecular weights is mainly a linear alpha 1-6 polyglucose polymer. Occasionally, a 1-4 branch extends from it which has lengths of 2-4. At the Martini coarse-grained level Dextran is described by three particles A, B, and C. The alpha 1-6 bonds are between A and C whereas the 1-4 bonds are between A and B.
>>> cgsmiles_str="{#GLC=[$A][#A]1[#B][$B][#C]1[$C]}" >>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str, fragment_masses={'GLC': 165}, polymer_reactivities={'$A': 0.8, '$C': 0.1, '$B': 0.1}, fragment_reactivities={'$A': {'$A': 0.0, '$C': 1.0, '$B': 0.0}, '$B': {'$A': 1.0, '$C': 0.0, '$B': 0.0}, '$C': {'$A':1.0, '$C':0.0, '$B':0.0}}, all_atom=False) >>> molecule = sampler.sample(target_weight=5008)
- param fragment_dict:
a dict of fragment graphs at one resolution. Each graph must have the same attributes as returned by the cgsmiles.read_fragments function.
- type fragment_dict:
dict[str, networkx.Graph]
- param polymer_reactivities:
a dict that lists all bonding descriptors in the fragments and assign a probability to them (i.e. that they will react).
- type polymer_reactivities:
dict[str, float]
- param fragment_reactivities:
probability of given a certain bonding descriptor what should be the next bonding descriptor. For example: {’&A’: {’&A’: 0.2, ‘&B’: 0.8}, ‘&B’: {’&B’: 0.2, ‘&A’: 0.8}}
- param terminal_bonds:
a list of bonding descriptors for termini
- type terminal_bonds:
list[str]
- param fragment_masses:
masses of the molecule fragments; if all_atom is True these can be left out and are automatically computed from the element masses
- type fragment_masses:
dict[str, float]
- param all_atom:
if the fragments are all-atom resolution
- type all_atom:
bool
- param seed:
set random seed for all processes; default is None
- type seed:
int
- add_fragment(molecule, open_bonds, fragments, polymer_reactivities, fragment_reactivities)[source]¶
Pick an open bonding descriptor according to polymer_reactivities and then pick a fragment that has the complementary bonding descriptor. For the second step conditional probabilities can be given using fragment_reactivites.
- Parameters:
molecule (networkx.Graph) – the molecule to extend
open_bonds (dict[list[collections.abc.Hashable]]) – a dict of bonding active descriptors with list of nodes in molecule as value
fragments (dict[list[str]]) – a dict of fragment names indexed by their bonding descriptors
polymer_reactivities – the porbabilities that a bonding connector in the polymer forms a bond with the next fragment
fragment_reactivities – conditional probabilties that given a bonding connector in the polymer a specific bonding connector in the fragments will be selected
- Returns:
networkx.Graph – the grown molecule
str – the fragment name of the added fragment
- classmethod from_fragment_string(cgsmiles_str, **kwargs)[source]¶
Initiate a MoleculeSampler instance from a cgsmiles string, describing the fragments fragments making up the polymer at one resolution.
- Parameters:
cgsmiles_str (str)
**kwargs – same as MoleculeSampler.__init__
- Return type: