cgsmiles.resolve module

class cgsmiles.resolve.MoleculeResolver(molecule_graph, fragment_dicts, last_all_atom=True, legacy=True)

Bases: object

Resolve the molecule(s) described by a CGsmiles string and return a networkx.Graph of the molecule.

First, create an instance of this class using one of its three class constructors. When reading a complete CGsmiles string, always use the first one. The other constructors are meant for cases where the fragments or the lowest resolution molecule are defined by graphs that come from elsewhere.

self.from_string: use when the fragments and the lowest resolution are described in one CGsmiles string.
self.from_graph: use when the fragments are described by CGsmiles strings but the lowest resolution is given as an nx.Graph.
self.from_fragment_dicts: use when the fragments are given as nx.Graphs and the lowest resolution is provided as a CGsmiles string.

Once the MoleculeResolver is created, call resolve_iter to loop over the different levels of resolution. resolve_iter always yields the previous, lower resolution graph together with the current, higher resolution graph. For example, if the CGsmiles string describes a monomer sequence of a regular polymer, the lower resolution graph is the graph of this monomer sequence and the higher resolution graph is the full molecule.

Basic Examples

Block copolymer of PE and PEO, with the first resolution being the sequence of blocks, followed by the monomer graph and then the full molecule.

>>> from cgsmiles.resolve import MoleculeResolver
>>> cgsmiles_str = "{[#B1][#B2][#B1]}.{#B1=[#PEO]|4,#B2=[#PE]|2}.{#PEO=[>]COC[<],#PE=[>]CC[<]}"
>>> resolver = MoleculeResolver.from_string(cgsmiles_str)
>>> for low_res, high_res in resolver.resolve_iter():
...     print(low_res.nodes(data='fragname'))
...     print(high_res.nodes(data='atomname'))

To access only the final resolution level, call resolve_all:

>>> monomer_graph, full_molecule = resolver.resolve_all()
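The pairing of lower and higher resolution can be pictured with a small, library-free sketch. It is only an illustration under simplifying assumptions: `resolve_levels` is a hypothetical helper, fragments are plain lists of names, and connectivity between fragments is ignored. It does not reproduce how MoleculeResolver builds its graphs.

```python
def resolve_levels(sequence, fragment_levels):
    """Yield (low_res, high_res) pairs, one pair per resolution level."""
    low_res = list(sequence)
    for fragments in fragment_levels:
        # Replace every low resolution name by the parts of its fragment.
        high_res = [part for name in low_res for part in fragments[name]]
        yield low_res, high_res
        # The current high resolution is the low resolution of the next level.
        low_res = high_res


blocks = ["B1", "B2", "B1"]
levels = [
    {"B1": ["PEO"] * 4, "B2": ["PE"] * 2},       # blocks -> monomers
    {"PEO": ["C", "O", "C"], "PE": ["C", "C"]},  # monomers -> heavy atoms
]
pairs = list(resolve_levels(blocks, levels))
for low_res, high_res in pairs:
    print(len(low_res), "->", len(high_res))
# prints "3 -> 10" and then "10 -> 28"
```

The last pair corresponds to what resolve_all hands back in the example above: the monomer level and the fully expanded level.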

Advanced API Examples

Alternatively, the block level graph may come from somewhere else, defined as an nx.Graph. In that case:

>>> import networkx as nx
>>> # the string only defines the fragments
>>> cgsmiles_str = "{#B1=[#PEO]|4,#B2=[#PE]|2}.{#PEO=[>]COC[<],#PE=[>]CC[<]}"
>>> block_graph = nx.Graph()
>>> # three blocks in a row; every node needs a fragname
>>> block_graph.add_edges_from([(0, 1), (1, 2)])
>>> nx.set_node_attributes(block_graph, {0: "B1", 1: "B2", 2: "B1"}, 'fragname')
>>> resolver = MoleculeResolver.from_graph(cgsmiles_str, block_graph)

Finally, the fragments may come from elsewhere, for example from a library, so that only the lowest resolution graph is defined as a CGsmiles string. In this case the from_fragment_dicts method can be used. Please note that the fragment graphs need to have the same attributes as a graph returned by the cgsmiles.read_fragments function.

>>> fragment_dicts = []
>>> for frag_string in ["{#B1=[#PEO]|4,#B2=[#PE]|2}", "{#PEO=[>]COC[<],#PE=[>]CC[<]}"]:
...     frag_dict = read_fragments(frag_string)
...     fragment_dicts.append(frag_dict)
>>> cgsmiles_str = "{[#B1][#B2][#B1]}"
>>> resolver = MoleculeResolver.from_fragment_dicts(cgsmiles_str, fragment_dicts)

Subclassing

More advanced workflows can be implemented by subclassing MoleculeResolver and adding new constructors that perform more complex preparation steps, for example.

Parameters:

molecule_graph (networkx.Graph) – a lower resolution molecule graph to be resolved to higher resolution molecule graphs. Each node must have a fragname attribute with a matching entry in the next dict of the fragment_dicts list.
fragment_dicts (list[dict[str, networkx.Graph]]) – a list with one dict of fragment graphs per resolution. Each graph must have the same attributes as returned by the cgsmiles.read_fragments function.
last_all_atom (bool) – whether the last resolution is at the all-atom level. If True, the code uses pysmiles to parse the fragments and returns the all-atom molecule. Default: True
legacy (bool) – which syntax convention to use for matching the bonding descriptors. Legacy syntax adheres to the BigSmiles convention. Default syntax adheres to the CGsmiles convention, where every ‘$’ bonding descriptor matches every other ‘$’ and every ‘<’ matches every ‘>’. With the BigSmiles convention an alphanumeric string may be provided that distinguishes these connectors. For example, ‘$A’ would not match ‘$B’. However, such use cases should be rare, and the CGsmiles convention facilitates the use of bonding descriptors in the Sampler, where the labels are used to assign different probabilities.

edges_from_bonding_descrpt(all_atom=True)

Makes edges according to the bonding descriptors stored in the node attributes of the meta_molecule residue graph.

If a bonding descriptor is consumed, it is removed from the list. The meta_graph edge, however, gets an attribute with the bonding descriptors that formed the edge.

Later, unconsumed descriptors are discarded and, in the case of an atomistic molecule, the valence is filled in with hydrogen atoms.

Parameters:

all_atom (bool) – if the high resolution level graph has all-atom resolution. Default: True
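The bookkeeping described above (remove the consumed descriptors, remember them on the new edge) can be sketched without any graph library. This is an assumption-laden illustration: `form_edge` is a hypothetical helper, nodes and edges are plain dicts, and the attribute name 'bonding' is borrowed from the bond_attribute default documented for match_bonding_descriptors.

```python
def form_edge(nodes, edges, left, right, left_descr, right_descr):
    """Connect two nodes and consume the descriptors that were used."""
    # A consumed descriptor is removed from the node's descriptor list ...
    nodes[left]["bonding"].remove(left_descr)
    nodes[right]["bonding"].remove(right_descr)
    # ... while the new edge remembers which descriptors formed it.
    edges[(left, right)] = {"bonding": (left_descr, right_descr)}


nodes = {
    0: {"element": "C", "bonding": [">"]},
    1: {"element": "C", "bonding": ["<", "$"]},
}
edges = {}
form_edge(nodes, edges, 0, 1, ">", "<")
```

In this sketch the leftover '$' on node 1 is an unconsumed descriptor, the kind that would be discarded in the later clean-up step.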

classmethod from_fragment_dicts(cgsmiles_str, fragment_dicts, last_all_atom=True, legacy=True)

Initialize a MoleculeResolver instance from a CGsmiles string describing one molecule and from fragment_dicts containing the fragments for each resolution.

Parameters:

cgsmiles_str (str)
fragment_dicts (list[dict[str, networkx.Graph]]) – a dict of fragment graphs per resolution. Each graph must have the same attributes as returned by the cgsmiles.read_fragments function.
last_all_atom (bool) – if the last resolution is all-atom and is read using pysmiles
legacy (bool) – which syntax convention to use for matching the bonding descriptors. Legacy syntax adheres to the BigSmiles convention. Default syntax adheres to CGsmiles convention. A more detailed explanation can be found in the MoleculeResolver.__init__ method.

Return type:

MoleculeResolver

classmethod from_graph(cgsmiles_str, meta_graph, last_all_atom=True, legacy=True)

Initialize a MoleculeResolver instance from a CGsmiles string and a meta_graph that describes the lowest resolution.

Parameters:

cgsmiles_str (str)
meta_graph (networkx.Graph) – a graph describing the lowest resolution. All nodes must have the fragname attribute set.
last_all_atom (bool) – if the last resolution is all-atom and is read using pysmiles
legacy (bool) – which syntax convention to use for matching the bonding descriptors. Legacy syntax adheres to the BigSmiles convention. Default syntax adheres to CGsmiles convention. A more detailed explanation can be found in the MoleculeResolver.__init__ method.

Return type:

MoleculeResolver

classmethod from_string(cgsmiles_str, last_all_atom=True, legacy=True)

Initialize a MoleculeResolver instance from a CGsmiles string.

Parameters:

cgsmiles_str (str)
last_all_atom (bool) – if the last resolution is all-atom and is read using pysmiles
legacy (bool) – which syntax convention to use for matching the bonding descriptors. Legacy syntax adheres to the BigSmiles convention. Default syntax adheres to CGsmiles convention. A more detailed explanation can be found in the MoleculeResolver.__init__ method.

Return type:

MoleculeResolver

static read_fragment_strings(fragment_strings, last_all_atom=True)

Read a list of CGsmiles fragment_strings and return a list of dicts with the fragment graphs. If last_all_atom is True then pysmiles is used to read the last fragment string provided in the list.

Parameters:

fragment_strings (list[str]) – list of CGsmiles fragment strings
last_all_atom (bool) – if the last string in the list is an all atom string and should be read using pysmiles.

Returns:

a list of the fragment dicts composed of the fragment name and a nx.Graph describing the fragment

Return type:

list[dict[str, networkx.Graph]]

resolve()

Resolve a CGsmiles string once and return the next resolution.

resolve_all()

Resolve all layers and return the final molecule as well as the previous resolution graph.

resolve_disconnected_molecule(fragment_dict)

Given a connected graph of nodes with associated fragment graphs, generate a disconnected graph of the fragments and annotate each fragment graph to the node in the higher resolution graph.

Parameters:

fragment_dict (dict[str, networkx.Graph]) – a dict of fragment graphs
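One way to picture the disconnected intermediate is the following sketch. It assumes a simplified data model that is not the library's: `expand_fragments` is a hypothetical helper, the low resolution graph is reduced to a dict of node id to fragment name, and each fragment is a tuple of atom names and internal edges.

```python
def expand_fragments(low_res_nodes, fragment_dict):
    """Copy one fragment per low resolution node into a single node table."""
    nodes, edges = {}, []
    for fragid, fragname in low_res_nodes.items():
        atomnames, frag_edges = fragment_dict[fragname]
        offset = len(nodes)  # fresh node ids for this copy of the fragment
        for idx, atomname in enumerate(atomnames):
            nodes[offset + idx] = {
                "atomname": atomname,
                "fragname": fragname,
                "fragid": fragid,  # which low resolution node it came from
            }
        # Only edges inside a fragment are copied; fragments stay unconnected.
        edges.extend((offset + a, offset + b) for a, b in frag_edges)
    return nodes, edges


low_res_nodes = {0: "PEO", 1: "PE"}
fragment_dict = {
    "PEO": (["C1", "O1", "C2"], [(0, 1), (1, 2)]),
    "PE": (["C1", "C2"], [(0, 1)]),
}
nodes, edges = expand_fragments(low_res_nodes, fragment_dict)
```

Here no edge joins node 2 (last atom of PEO) to node 3 (first atom of PE). Connecting the fragments is the job of the bonding descriptors in a later step.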

resolve_iter()

Iterator returning all resolutions in order.

squash_atoms()

Applies the squash operator by removing the duplicate node, adding all edges from that node to the remaining one, and annotating the remaining node with the fragid of the removed node.
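The three steps of the squash operation can be sketched on a plain adjacency map. This is a minimal illustration, not the library's implementation: `squash` is a hypothetical helper, and storing fragids as lists is an assumption made for the example.

```python
def squash(adjacency, fragids, keep, remove):
    """Merge node `remove` into node `keep` in an undirected adjacency map."""
    for neighbour in adjacency.pop(remove):
        adjacency[neighbour].discard(remove)
        if neighbour != keep:
            # Re-attach the neighbour of the removed node to the kept node.
            adjacency[neighbour].add(keep)
            adjacency[keep].add(neighbour)
    # The kept node now carries the fragids of both nodes.
    fragids[keep] = fragids[keep] + fragids.pop(remove)


# Two fragments, 0-1 and 2-3, where nodes 1 and 2 describe the same atom.
adjacency = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
fragids = {0: [0], 1: [0], 2: [1], 3: [1]}
squash(adjacency, fragids, keep=1, remove=2)
```

After the call, node 2 is gone, node 1 is bonded to both 0 and 3, and node 1 is listed as a member of fragments 0 and 1.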

cgsmiles.resolve.compatible(left, right, left_data={}, right_data={}, legacy=True)

Check bonding descriptor compatibility according to the CGsmiles syntax conventions. With legacy, the BigSmiles convention can be used.

The left_data and right_data dicts are only used when checking compatibility for uses of the squash operator ([!]).

Parameters:

left (str)
right (str)
left_data (dict)
right_data (dict)
legacy (bool)

Return type:

bool
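The two conventions can be contrasted with a short sketch based only on the rules stated in the MoleculeResolver parameter description. `descriptors_match` is a hypothetical stand-in, not the body of compatible; it assumes a descriptor is a one-character kind followed by an optional label, and it ignores the squash operator and the data dicts.

```python
def descriptors_match(left, right, legacy=True):
    """Return True if two bonding descriptors may form a bond."""
    complementary = {"$": "$", "<": ">", ">": "<"}
    left_kind, left_label = left[0], left[1:]
    right_kind, right_label = right[0], right[1:]
    # '$' pairs with '$', while '<' pairs with '>' and vice versa.
    if complementary.get(left_kind) != right_kind:
        return False
    if legacy:
        # BigSmiles convention: the labels must agree as well.
        return left_label == right_label
    # CGsmiles convention: labels do not restrict the match.
    return True


print(descriptors_match("<", ">"))                  # True
print(descriptors_match("$A", "$B", legacy=True))   # False
print(descriptors_match("$A", "$B", legacy=False))  # True
```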

cgsmiles.resolve.match_bonding_descriptors(source, target, bond_attribute='bonding', legacy=True)

Given a source and a target graph that have bonding descriptors stored as node attributes, find a pair of matching descriptors and return the respective nodes together with the bonding descriptors. If no matching pair is found, a LookupError is raised.

Parameters:

source (networkx.Graph)
target (networkx.Graph)
bond_attribute (collections.abc.Hashable) – the attribute under which the bonding descriptors are stored.
legacy (bool) – which syntax convention to use when matching bonding descriptors (legacy=BigSmiles)

Returns:

the nodes as well as bonding descriptors

Return type:

((collections.abc.Hashable, collections.abc.Hashable), (str, str))

Raises:

LookupError – if no match is found
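The shape of the return value and the error path can be illustrated with a library-free sketch. `find_match` is a hypothetical helper, graphs are reduced to dicts of node id to attribute dict, and the matching rule is a simplified one without labels or the legacy switch.

```python
from itertools import product


def find_match(source, target, bond_attribute="bonding"):
    """Return ((source_node, target_node), (source_descr, target_descr))."""
    allowed = {("$", "$"), ("<", ">"), (">", "<")}
    for (s_node, s_data), (t_node, t_data) in product(source.items(), target.items()):
        s_descrs = s_data.get(bond_attribute, [])
        t_descrs = t_data.get(bond_attribute, [])
        for s_descr, t_descr in product(s_descrs, t_descrs):
            # Compare only the kind of each descriptor (its first character).
            if (s_descr[0], t_descr[0]) in allowed:
                return (s_node, t_node), (s_descr, t_descr)
    raise LookupError("no matching bonding descriptors found")


source = {0: {"bonding": []}, 1: {"bonding": [">"]}}
target = {2: {"bonding": ["<"]}, 3: {"bonding": []}}
match = find_match(source, target)

# Two '>' descriptors cannot pair up, so the search fails.
try:
    find_match(source, {4: {"bonding": [">"]}})
    failed = False
except LookupError:
    failed = True
```

Callers that probe several candidate fragment pairs would typically catch the LookupError and move on to the next pair.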