cgsmiles.read_cgsmiles module

cgsmiles.read_cgsmiles.read_cgsmiles(pattern)

Generate a networkx.Graph from a pattern string according to the CGsmiles line notation.

The syntax scheme consists of two curly braces enclosing the graph sequence. The sequence can contain any enumeration of nodes, written as if they were SMILES atoms, except that the atom name is given by:

[# + fragname + ]

This input format can handle branching as well as cycles, following the OpenSMILES syntax. For example, branches are indicated by enclosing them in ( … ). Rings are indicated by placing a digit after the closing bracket of each of the two nodes to be connected. Note that, in agreement with OpenSMILES, at most two digits can be used.
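To make the branch and ring rules concrete, here is a minimal sketch of how a plain CGsmiles string (no expansion operators) could be turned into a node list and an edge list. The function `parse_plain_cgsmiles` is hypothetical and is not the parser used inside cgsmiles; it assumes node labels of the form `[#name]` and, for brevity, accepts only single-digit ring bonds.

```python
import re

# One token at a time: a node "[#name]", "(", ")", or a single ring-bond digit.
TOKEN = re.compile(r"\[#([^\]]+)\]|(\()|(\))|(\d)")


def parse_plain_cgsmiles(pattern):
    """Hypothetical helper: return (node_names, edges) for a CGsmiles string
    without expansion operators. Edges are pairs of node indices."""
    body = pattern.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("pattern must be enclosed in curly braces")
    body = body[1:-1]

    nodes, edges = [], []
    prev = None        # node that the next node will bond to
    anchors = []       # anchor nodes of the currently open branches
    open_rings = {}    # ring digit -> node waiting for its partner
    pos = 0
    while pos < len(body):
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected character {body[pos]!r}")
        name, open_branch, close_branch, digit = match.groups()
        if name is not None:
            nodes.append(name)
            current = len(nodes) - 1
            if prev is not None:
                edges.append((prev, current))
            prev = current
        elif open_branch:
            anchors.append(prev)          # remember where the branch starts
        elif close_branch:
            prev = anchors.pop()          # continue from the branch's anchor
        elif digit in open_rings:
            edges.append((open_rings.pop(digit), prev))  # close the ring
        else:
            open_rings[digit] = prev      # open a ring bond on this node
        pos = match.end()
    return nodes, edges


print(parse_plain_cgsmiles("{[#PMA]([#PEO][#PEO])[#PMA]}"))
# (['PMA', 'PEO', 'PEO', 'PMA'], [(0, 1), (1, 2), (0, 3)])
```

A closing parenthesis returns the bonding point to the branch's anchor node, which is why the second `PMA` bonds to the first `PMA` rather than to the last `PEO`. Likewise, `{[#A]1[#B][#C]1}` yields a three-membered ring.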

The general pattern of a CGsmiles string looks like this:

‘{’ + [#fragname_1][#fragname_2]… + ‘}’
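As a small illustration of this general pattern, the hypothetical helper below (not part of cgsmiles) assembles a linear CGsmiles string from a list of fragment names. It covers plain enumeration only, without branches or rings.

```python
def build_cgsmiles(fragnames):
    """Hypothetical helper: '{' + [#fragname_1][#fragname_2]... + '}'."""
    return "{" + "".join(f"[#{name}]" for name in fragnames) + "}"


print(build_cgsmiles(["PEO", "PEO", "PMA"]))
# {[#PEO][#PEO][#PMA]}
```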

In addition to plain enumeration, we allow some special operators that simplify the description of large but regular graphs, such as those needed to describe polymer molecules.

The expansion operator consists of | followed by an integer that specifies how many times the given residue should be added within a sequence. For example, a pentamer of polyethylene oxide can be written as:

{[#PEO][#PEO][#PEO][#PEO][#PEO]}

or using the expansion operator as:

{[#PEO]|5}
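The two forms are interchangeable. A minimal sketch of the rewrite from the compact form to the enumerated one, assuming the operator directly follows a single node: `expand_node_repeats` is a hypothetical string-level helper, not a cgsmiles function, and it deliberately leaves branch repeats untouched.

```python
import re

# A single node immediately followed by the expansion operator, e.g. "[#PEO]|5".
NODE_REPEAT = re.compile(r"(\[#[^\]]+\])\|(\d+)")


def expand_node_repeats(pattern):
    """Hypothetical helper: rewrite every "[#name]|n" as n copies of "[#name]"."""
    return NODE_REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


print(expand_node_repeats("{[#PEO]|5}"))
# {[#PEO][#PEO][#PEO][#PEO][#PEO]}
```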

The expansion operator also applies to branches, with the following convention: the complete branch, including its first anchoring node, is repeated. For example, a PMA-g-PEG polymer containing 9 residues can be written as:

{[#PMA]([#PEO][#PEO])|3}

This is equivalent to the following CGsmiles string:

{[#PMA]([#PEO][#PEO])[#PMA]([#PEO][#PEO])[#PMA]([#PEO][#PEO])}
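The branch convention can also be expressed as a string rewrite. The sketch below uses hypothetical helpers (`expand_repeats`, `_unit_start`), not the cgsmiles implementation. It assumes that each `|n` directly follows either a single node or an anchoring node with exactly one branch `( … )`, and repeats that unit `n` times, matching the equivalence above.

```python
import re

# The expansion operator: a "|" followed by the repeat count.
REPEAT = re.compile(r"\|(\d+)")


def _unit_start(text, end):
    """Return where the unit ending at text[end - 1] starts.

    The unit is either a single node "[#...]" or, if it ends with ")",
    the anchoring node plus the branch that directly follows it.
    """
    if text[end - 1] == ")":
        depth = 0
        for i in range(end - 1, -1, -1):
            if text[i] == ")":
                depth += 1
            elif text[i] == "(":
                depth -= 1
                if depth == 0:
                    # The anchoring node ends right before this "(".
                    return text.rindex("[", 0, i)
        raise ValueError("unbalanced parentheses")
    # Otherwise the unit is the node whose "]" is at text[end - 1].
    return text.rindex("[", 0, end)


def expand_repeats(pattern):
    """Hypothetical helper: expand each "|n" by repeating the preceding unit."""
    out = ""
    pos = 0
    for match in REPEAT.finditer(pattern):
        out += pattern[pos:match.start()]
        start = _unit_start(out, len(out))
        out = out[:start] + out[start:] * int(match.group(1))
        pos = match.end()
    return out + pattern[pos:]


print(expand_repeats("{[#PMA]([#PEO][#PEO])|3}"))
# {[#PMA]([#PEO][#PEO])[#PMA]([#PEO][#PEO])[#PMA]([#PEO][#PEO])}
```

Because operators are expanded left to right, a repeat inside a branch is resolved before the branch itself is repeated, so `{[#PMA]([#PEO]|2)|3}` expands to the same string as the example above.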

Parameters:

pattern (str) – a string describing a graph according to CGsmiles syntax

Return type:

networkx.Graph