File Parsing Tools

Extract Data from CML

group_decomposition.utils.all_data_from_cml(data)

Gets the atomic symbols, xyz coordinates, bonds, and charge of a molecule from a .cml file.

Parameters:

data – lines of a .cml file

Returns:

dictionary with relevant data from cml file. Keys included are ‘geom’, ‘atom_types’, ‘bonds’, ‘labels’, ‘charge’, ‘multiplicity’, ‘smiles’

Note

Not used in group_decomposition.fragfunctions.identify_connected_fragments. This function is used in an AiiDA workflow employing this package.

This is designed to parse files specifically from the Retrievium database https://retrievium.ca

The SMILES extracted is the one labelled with the tag retrievium:inputSMILES

The geometry extracted is from the third atomArray block in the .cml file

Example Usage:
>>> from group_decomposition import utils
>>> utils.all_data_from_cml(data)  # data: lines of a .cml file

Extract Data from CML

group_decomposition.utils.data_from_cml(cml_file, bonds=False)

Gets the atomic symbols, xyz coordinates, bonds, and charge of a molecule from a .cml file.

Parameters:

cml_file – .cml filename

Returns:

list with relevant data from cml file. Elements in order are: molecular geometry, atom types, list of bonds, list of elements, charge

Note

This is designed to parse files specifically from the Retrievium database https://retrievium.ca

The SMILES extracted is the one labelled with the tag retrievium:inputSMILES

The geometry extracted is from the third atomArray block in the .cml file

Example Usage:
>>> from group_decomposition import utils
>>> utils.data_from_cml(cml_file)
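
The "list of bonds" in the return value comes from the bond records of the CML file. The block below is a minimal sketch of how such records can be read with the standard library. It is not the package's implementation: the sample text is a hypothetical, heavily trimmed file, the helper names local_name and bonds_from_cml_text are made up for illustration, and the tuple layout is an assumption that may differ from what data_from_cml actually returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CML text (water); real files contain much more.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="O"/>
      <atom id="a2" elementType="H"/>
      <atom id="a3" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/>
      <bond atomRefs2="a1 a3" order="1"/>
    </bondArray>
  </molecule>
</cml>"""


def local_name(tag):
    """Drop an XML namespace prefix: '{http://...}bond' -> 'bond'."""
    return tag.rsplit("}", 1)[-1]


def bonds_from_cml_text(cml_text):
    """Return (atom id, atom id, order) tuples for every bond element.

    Illustrative helper only; the output layout is an assumption.
    """
    root = ET.fromstring(cml_text)
    bonds = []
    for element in root.iter():
        if local_name(element.tag) == "bond":
            first, second = element.get("atomRefs2").split()
            bonds.append((first, second, element.get("order")))
    return bonds


print(bonds_from_cml_text(SAMPLE_CML))
```

The namespace is stripped by hand because ElementTree reports tags in the form {namespace}name whenever the file declares a default namespace.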

Extract Atom Types from CML

group_decomposition.utils.get_cml_atom_types(cml_file)

Extract atom types from cml file

Parameters:

cml_file – cml file name

Returns:

Data frame column whose elements are tuples describing atom types as defined by Retrievium, in the form (atom number, type, valence)

Extract SMILES from CML

group_decomposition.utils.smiles_from_cml(cml_file, smile_tag='retrievium:inputSMILES')

Finds the Retrievium SMILES in a cml file with a given label

Parameters:
  • cml_file – cml file name

  • smile_tag – the label of the SMILES in the cml file. Defaults to the input SMILES tag, retrievium:inputSMILES

Returns:

inputSMILES

Return type:

string of the input SMILES code tagged in the file as retrievium

Note

Must be used on .cml files from the Retrievium database https://retrievium.ca
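
As a rough illustration of looking up a value by its label, the sketch below searches a CML document for any element that carries the tag as an attribute value and returns the first non-empty text found inside it. This is a guess at the general idea, not the package's code: the sample markup (a property element with a dictRef attribute) is an assumed layout, not taken from a real Retrievium file, and find_tagged_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the property/scalar layout is an assumption.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <property dictRef="retrievium:inputSMILES">
      <scalar dataType="xsd:string">CCO</scalar>
    </property>
  </molecule>
</cml>"""


def find_tagged_text(cml_text, tag="retrievium:inputSMILES"):
    """Return the text stored under the element labelled with `tag`.

    Matches any attribute whose value equals `tag`, so it does not depend
    on the attribute being called dictRef. Returns None if nothing matches.
    """
    root = ET.fromstring(cml_text)
    for element in root.iter():
        if tag in element.attrib.values():
            # element.iter() visits the element itself, then its descendants
            for node in element.iter():
                if node.text and node.text.strip():
                    return node.text.strip()
    return None


print(find_tagged_text(SAMPLE_CML))
```

Passing a different tag mirrors the role of the smile_tag parameter: the same search can pick out another labelled SMILES if the file contains one.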

Get XYZ from CML

group_decomposition.utils.xyz_from_cml(cml_file)

Extract xyz coordinates from cml file

Parameters:

cml_file – cml file name

Returns:

list of length-3 lists, one [x, y, z] entry per atom, containing a molecule’s xyz coordinates

Note

Must be used on .cml files from the Retrievium database https://retrievium.ca
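
To make the return shape concrete, here is a minimal sketch of pulling coordinates out of CML with the standard library. It assumes, as the notes for the functions above state, that the geometry of interest is in the third atomArray block, and that each atom stores its position in the x3, y3, and z3 attributes used by CML. The sample text and the helper names local_name and xyz_from_cml_text are hypothetical, and this is not the package's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CML text with three atomArray blocks.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="O" x3="1.0" y3="1.0" z3="1.0"/>
    </atomArray>
  </molecule>
  <molecule id="m2">
    <atomArray>
      <atom id="a1" elementType="O" x3="2.0" y3="2.0" z3="2.0"/>
    </atomArray>
  </molecule>
  <molecule id="m3">
    <atomArray>
      <atom id="a1" elementType="O" x3="0.0" y3="0.0" z3="0.117"/>
      <atom id="a2" elementType="H" x3="0.0" y3="0.757" z3="-0.469"/>
      <atom id="a3" elementType="H" x3="0.0" y3="-0.757" z3="-0.469"/>
    </atomArray>
  </molecule>
</cml>"""


def local_name(tag):
    """Drop an XML namespace prefix: '{http://...}atom' -> 'atom'."""
    return tag.rsplit("}", 1)[-1]


def xyz_from_cml_text(cml_text, block_index=2):
    """Return [[x, y, z], ...] from the atomArray at block_index (0-based)."""
    root = ET.fromstring(cml_text)
    # root.iter() walks the tree in document order, so index 2 is the third block
    blocks = [el for el in root.iter() if local_name(el.tag) == "atomArray"]
    atoms = [el for el in blocks[block_index] if local_name(el.tag) == "atom"]
    return [[float(a.get("x3")), float(a.get("y3")), float(a.get("z3"))]
            for a in atoms]


coords = xyz_from_cml_text(SAMPLE_CML)
print(len(coords), coords[0])
```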

Change XYZ Coordinates from List to String

group_decomposition.utils.xyz_list_to_str(xyz_list)

Convert a 2D array to a string: [[a,b,c],[d,e,f]] -> a, b, c d, e, f
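
A minimal sketch of this kind of conversion is shown below. The helper name rows_to_str is hypothetical, and the separator placed between rows is an assumption: the documented example does not make clear whether rows are split by a space or a line break, so it is exposed here as a parameter.

```python
def rows_to_str(rows, row_sep="\n"):
    """Join each row's values with ', ' and the rows with row_sep.

    Illustrative only; the row separator used by the package is assumed.
    """
    return row_sep.join(", ".join(str(value) for value in row) for row in rows)


print(rows_to_str([[0.0, 0.0, 0.117], [0.0, 0.757, -0.469]]))
```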

Convert 1D List To String

group_decomposition.utils.list_to_str(lst)

Convert a list to a string: [a,b,c] -> a b c
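
The documented behaviour matches a plain space-separated join. The following is a sketch under that reading, using the hypothetical name join_with_spaces rather than the package's own function.

```python
def join_with_spaces(items):
    """Convert each item to str and join with single spaces."""
    return " ".join(str(item) for item in items)


print(join_with_spaces(["a", "b", "c"]))
```

Converting each item with str means that numeric lists, such as a row of coordinates, can be passed in without preparing them first.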