cgsmiles.sample module

class cgsmiles.sample.MoleculeSampler(fragment_dict, polymer_reactivities, fragment_reactivities={}, terminal_bonds=[], fragment_masses=None, all_atom=True, seed=None)[source]

Bases: object

Given a fragment string in CGsmiles format and probabilities for residues to occur, return a random molecule with target molecular weight.

First, this class has to be initiated using the class construction method from_string, which makes sure to read and resolve the fragment graphs provided in the CGsmiles string.

Once the MoleculeSampler is initiated you can call the sampler method in order to generate a new random polymer molecule from the fragment string that was provided.

In addition to just randomly creating a molecule reactivities and termination probabilities can be provided to steer the outcome. We use the term and variable name reactivity whenever referring to a probability that a certain bonding descriptor is used to highlight the fact that the probabilistic behaviour is driven by the bonding descriptors.

The sampler features a two level connectivity determination. First using the polymer_reactivities an open bonding descriptor from the growing polymer molecule is selected according to the probabilities provided. Subsequently, a fragment is randomly chosen that has a matching complementary bonding descriptor.

If in addition the fragment_reactivities are provided the algorithm will select a fragment that features a bonding descriptor according to the conditional probabilities provided. The fragment_reactivities keyword accepts a two level dict. The first key is the bonding descriptor found on the polymer molecule (i.e. the one that will be used to extend the polymer). The second level contains a dict specifying a probability for the bonding descriptors found in the fragments (i.e. their conditional probability given the polymer descriptors).

Basic Examples

Random-copolymer of PMMA and PS with equal probabilities.

>>> cgsmiles_str = "{#PMMA=[>]C(C)C[<]C(=O)OC,#PS=[>]CC[<]c1cccc1}"
>>> polymer_reactivities = {'>':0.5, '<':0.5}
>>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str,
                                                   polymer_reactivities=polymer_reactivities)
>>> polymer_graph = sampler.sample(target_weight=10000)

One can also make a blocky copolymer by providing the conditional reactivities. To distinguish the descriptors on PMMA vs PS we can label them with a alphanumeric string.

>>> cgsmiles_str = "{#PMMA=[$A]C(C)C[$B]C(=O)OC,#PS=[$C]CC[$D]c1cccc1}"
>>> polymer_reactivites = {'$A':0.5, '$B':0, '$C': 0.5, '$D': 0.0}
>>> fragment_reactivities = {'$A': {'$A': 0.,   '$C': 0.,   '$B': 0.7, '$D': 0.3},
                             '$B': {'$A': 0.7, '$C': 0.3, '$B': 0.0, '$D': 0.0 },
                             '$C': {'$A': 0.,   '$C': 0.,   '$B': 0.3, '$D': 0.7},
                             '$D': {'$A': 0.3, '$C': 0.7, '$B': 0.0, '$D': 0.0 }}
>>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str,
                                                   polymer_reactiviyies=polymer_reactivities,
                                                   fragment_reactivities=fragment_reactivities)
>>> polymer_graph = sampler.sample(target_weight=10000)

One can also make a blocky copolymer but steer the probability of having head-to-head tail-to-tail vs head-to-tail addition. This is done by labeling the bonding descriptors A-D as before but this time allowing head-head and tail-tail additions (i.e. (A,A) & (A,C)) but with lower reactivities.

>>> cgsmiles_str = "{#PMMA=[$A]C(C)C[$B]C(=O)OC,#PS=[$C]CC[$D]c1cccc1}"
>>> polymer_reactivities = {'$A':0.25, '$B':0.25, '$C': 0.25, '$D': 0.25}
>>> fragment_reactivities = {'$A': {'$A': 0.1, '$C': 0.1, '$B': 0.4, '$D': 0.4},
                            '$B': {'$A': 0.4, '$C': 0.4, '$B': 0.1, '$D': 0.1},
                            '$C': {'$A': 0.1, '$C': 0.1, '$B': 0.4, '$D': 0.4},
                            '$D': {'$A': 0.4, '$C': 0.4, '$B': 0.1, '$D': 0.1 }}
>>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str,
                                                   polymer_reactivities=polymer_reactivities,
                                                   fragment_reactivities=fragment_reactivities)
>>> polymer_graph = sampler.sample(target_weight=10000)

Advanced API Examples

Of course the sampler is not limited to linear co-polymers. Also randomly branched molecules such as bottle-brush copolymers can be generated. To steer the branching ratio and branch extension as well as terminal end groups the branch_term_probs, terminal_fragments, and terminal_reactivities can be provided.

For example, To generate a bottle brush polymer that has PMA in the backbone and PEG as side-chain terminated with an OH group the following CGsmiles string in combination with the above mentioned probabilities can be provided.

Note that in this case we declare ‘$A’ and ‘$B’ to be terminal bonding descriptors. That means if they are selected all other bonding descriptors are automatically removed from the previous residue. So if PEG connects to OH ‘>A’ is removed from PEG to avoid a trivalent PEG. Likewise, if the terminal bonding selector is not chosen (i.e. if PEG connects to PEG) it is removed from the source residue. Finally, we need to specify that ‘$A’ has 0 chance of connecting to itself.

>>> cgsmiles_str="{#PMA=[>]CC[<]C(=O)OC[>A],#PEG=[<A]COC[>A][$A],#OH=[$B]O}"
>>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str,
                                                   terminal_bonds=['$A', '$B],
                                                   polymer_reactivities={'<': 0.1, '>': 0.1,
                                                                         '>A': 0.8, '<A':0.8,
                                                                         '$A':0.3, '$B':0.0},
                                                   fragment_reactivities={'$A': {'$A':0, '$B': 1.0},},
                                                   all_atom=True)
>>> molecule = sampler.sample(target_weight=5008)

This combination results into a dense brush where very PMA molecule has a PEG branch whose length is determined by a 1 in 3 probability to terminate.

In contrast one can also generate sparse branched molecules. For instance, Dextran at low molecular weights is mainly a linear alpha 1-6 polyglucose polymer. Occasionally, a 1-4 branch extends from it which has lengths of 2-4. At the Martini coarse-grained level Dextran is described by three particles A, B, and C. The alpha 1-6 bonds are between A and C whereas the 1-4 bonds are between A and B.

>>> cgsmiles_str="{#GLC=[$A][#A]1[#B][$B][#C]1[$C]}"
>>> sampler = MoleculeSampler.from_fragment_string(cgsmiles_str,
                                                   fragment_masses={'GLC': 165},
                                                   polymer_reactivities={'$A': 0.8, '$C': 0.1, '$B': 0.1},
                                                   fragment_reactivities={'$A': {'$A': 0.0, '$C': 1.0, '$B': 0.0},
                                                                          '$B': {'$A': 1.0, '$C': 0.0, '$B': 0.0},
                                                                          '$C': {'$A':1.0, '$C':0.0, '$B':0.0}},
                                                   all_atom=False)
>>> molecule = sampler.sample(target_weight=5008)
param fragment_dict:

a dict of fragment graphs at one resolution. Each graph must have the same attributes as returned by the cgsmiles.read_fragments function.

type fragment_dict:

dict[str, networkx.Graph]

param polymer_reactivities:

a dict that lists all bonding descriptors in the fragments and assign a probability to them (i.e. that they will react).

type polymer_reactivities:

dict[str, float]

param fragment_reactivities:

probability of given a certain bonding descriptor what should be the next bonding descriptor. For example: {’&A’: {’&A’: 0.2, ‘&B’: 0.8}, ‘&B’: {’&B’: 0.2, ‘&A’: 0.8}}

param terminal_bonds:

a list of bonding descriptors for termini

type terminal_bonds:

list[str]

param fragment_masses:

masses of the molecule fragments; if all_atom is True these can be left out and are automatically computed from the element masses

type fragment_masses:

dict[str, float]

param all_atom:

if the fragments are all-atom resolution

type all_atom:

bool

param seed:

set random seed for all processes; default is None

type seed:

int

add_fragment(molecule, open_bonds, fragments, polymer_reactivities, fragment_reactivities)[source]

Pick an open bonding descriptor according to polymer_reactivities and then pick a fragment that has the complementary bonding descriptor. For the second step conditional probabilities can be given using fragment_reactivites.

Parameters:
  • molecule (networkx.Graph) – the molecule to extend

  • open_bonds (dict[list[collections.abc.Hashable]]) – a dict of bonding active descriptors with list of nodes in molecule as value

  • fragments (dict[list[str]]) – a dict of fragment names indexed by their bonding descriptors

  • polymer_reactivities – the porbabilities that a bonding connector in the polymer forms a bond with the next fragment

  • fragment_reactivities – conditional probabilties that given a bonding connector in the polymer a specific bonding connector in the fragments will be selected

Returns:

  • networkx.Graph – the grown molecule

  • str – the fragment name of the added fragment

classmethod from_fragment_string(cgsmiles_str, **kwargs)[source]

Initiate a MoleculeSampler instance from a cgsmiles string, describing the fragments fragments making up the polymer at one resolution.

Parameters:
  • cgsmiles_str (str)

  • **kwargs – same as MoleculeSampler.__init__

Return type:

MoleculeSampler

sample(target_weight, start_fragment=None)[source]

From a list of cgsmiles fragment graphs generate a new random molecule by stitching them together until a target_weight is reached.

Parameters:
  • target_weight (int) – the weight of the polymer to generate the weight is the minimum weight and can differ by one additional fragment plus terminal

  • start_fragment (str) – the fragment name to start with

Returns:

the graph of the molecule

Return type:

networkx.Graph