Reading & Resolving
A CGsmiles string can contain a base graph (see Syntax Rules) and
multiple enumerations of fragment graphs, each corresponding to a
different resolution. The base graph can be read using the
read_cgsmiles function, while the fragments can be read using
the read_fragments function. However, most users will find it
more convenient to read the entire string directly and resolve
the different resolutions. This is done using the MoleculeResolver
class.
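To make the layered layout concrete, the following stand-alone sketch splits a CGsmiles string into its brace-delimited blocks using only the standard library. It is not the cgsmiles parser: split_layers and fragment_names are hypothetical helpers written for illustration, and the sketch assumes that braces are never nested and that commas only separate fragment definitions. For real work, use read_cgsmiles, read_fragments or MoleculeResolver.

```python
import re


def split_layers(cgsmiles_str):
    """Return the text inside every {...} block, in order of appearance.

    Hypothetical helper: assumes braces are never nested.
    """
    return re.findall(r"\{([^{}]*)\}", cgsmiles_str)


def fragment_names(block):
    """Return the fragment names defined in one fragment block.

    Hypothetical helper: assumes definitions look like '#NAME=...'
    and are separated by commas.
    """
    names = []
    for definition in block.split(","):
        # Split on the first '=' only, since '=' can also be a SMILES bond.
        name, _, _ = definition.partition("=")
        names.append(name.strip().lstrip("#"))
    return names


# The three-resolution string used later in this section.
cgsmiles_str = (
    "{[#hphilic][#hdphob]|3[#hphilic]}."
    "{#hphilic=[<][#PEO][>]|3,#hdphob=[<][#PMA][>]([#BUT])}."
    "{#PEO=[<][#SN3r][>],#PMA=[<][#TC3][>][#SN4a][$],#BUT=[$][#SC3][$]}."
    "{#SN3r=[<]COC[>],#TC3=[<]CC[>][$1],#SN4a=[$1]C(=O)OC[$2],#SC3=[$2]CCC}"
)

# The first block is the base graph; every further block is one resolution.
base, *fragment_blocks = split_layers(cgsmiles_str)
print("base graph:", base)
for level, block in enumerate(fragment_blocks, start=1):
    print(f"resolution {level}:", fragment_names(block))
```

The first block is the base graph and each following block defines the fragments of one finer resolution, which is why this string resolves in three steps.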
First, we import MoleculeResolver and instantiate it using
from_string or one of the other constructor methods. To indicate
that the last resolution is at the atomic level, we pass the
last_all_atom=True argument.
from cgsmiles import MoleculeResolver
cgsmiles_string = '{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}'
resolver = MoleculeResolver.from_string(cgsmiles_string,
                                        last_all_atom=True)
Next, we can resolve the atomic resolution from the CG graph by
calling the .resolve method once.
cg_graph, aa_graph = resolver.resolve()
For multiple resolutions, we can call the .resolve method
multiple times. Each call returns a new pair of graphs: one at a
coarse level and one at the next finer level. Alternatively,
resolve_iter can be used to loop over all resolutions. Let's
take the molecule in Figure 3 of the main paper:
from cgsmiles import MoleculeResolver
# CGsmiles string with 3 resolutions
cgsmiles_str = (
    "{[#hphilic][#hdphob]|3[#hphilic]}."
    "{#hphilic=[<][#PEO][>]|3,#hdphob=[<][#PMA][>]([#BUT])}."
    "{#PEO=[<][#SN3r][>],#PMA=[<][#TC3][>][#SN4a][$],#BUT=[$][#SC3][$]}."
    "{#SN3r=[<]COC[>],#TC3=[<]CC[>][$1],#SN4a=[$1]C(=O)OC[$2],#SC3=[$2]CCC}"
)
# Generate the MoleculeResolver
resolver = MoleculeResolver.from_string(cgsmiles_str, last_all_atom=True)
# Loop over all resolutions; each iteration yields one
# (coarse, finer) pair of graphs
for coarse_graph, finer_graph in resolver.resolve_iter():
    print(coarse_graph.nodes(data='fragname'))
    print(finer_graph.nodes(data='atomname'))
Alternatively, if only the final pair of graphs is needed, we can get
it directly by calling .resolve_all().
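The way the resolutions are paired can be pictured without the library. The sketch below is only an illustration under stated assumptions: the level labels are hypothetical stand-ins for the graphs that the resolver returns, iter_pairs is a hypothetical helper that mimics the coarse/finer pairing described above, and the last pair is assumed to correspond to what .resolve_all() returns.

```python
# Hypothetical labels for the four levels encoded in the example string:
# the base graph plus three fragment layers. The real resolver works with
# graph objects, not strings.
levels = ["base graph", "resolution 1", "resolution 2", "all-atom"]


def iter_pairs(levels):
    """Yield (coarse, finer) for every pair of adjacent levels."""
    for coarse, finer in zip(levels, levels[1:]):
        yield coarse, finer


pairs = list(iter_pairs(levels))
for coarse, finer in pairs:
    print(f"{coarse} -> {finer}")

# Assumption: this is the pair that corresponds to calling .resolve_all().
final_pair = pairs[-1]
print("final pair:", final_pair)
```

Four levels give three (coarse, finer) pairs, so looping over the example string with resolve_iter is expected to run three times.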