pidibble.pdbparse module

class pidibble.pdbparse.PDBParser(input_format: str = 'PDB', overwrite: bool = False, source_db: str = None, source_id: str = None, filepath: str | Path = None, mappers: dict[str, Callable] = None, comment_chars: list[str] = ['#'], pdb_format_file: str = 'pdb_format.yaml', mmcif_format_file: str = 'mmcif_format.yaml', **kwargs)

Bases: object

A class for parsing PDB and mmCIF files and extracting structured data. This class handles fetching the files, reading them, and parsing their contents into structured records based on predefined formats.

parsed

A dictionary containing parsed records, where keys are record types and values are pdbrecord.PDBRecord instances or lists of instances. This dictionary is populated after parsing the PDB or mmCIF file.

Type: PDBRecordDict

mappers

A dictionary of mappers for parsing different data types, including custom formats and delimiters.

Type: dict

pdb_lines

A list of lines read from the PDB file. Empty if no input file is provided.

Type: list

cif_data

A dictionary containing the parsed mmCIF data. Empty if no input file is provided.

Type: dict

fetch()

Fetch the PDB file based on the provided PDB code or AlphaFold ID. This method checks whether a PDB code or AlphaFold ID was provided, constructs the appropriate file path, and attempts to download the file from the PDB or the AlphaFold API.

Returns: True if the file was successfully fetched, False otherwise.
Return type: bool

parse()

Parse the PDB or mmCIF file and generate a dictionary of pdbrecord.PDBRecord instances. This method first fetches the file for the provided PDB code or AlphaFold ID, then reads it and parses its contents into structured records. If the input format is mmCIF, it uses mmcif_parse.MMCIF_Parser to parse the mmCIF data. If the input format is PDB, it uses the pdbrecord.PDBRecord class to parse the PDB lines.

Returns: self – The PDBParser instance containing the parsed records.
Return type: PDBParser
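The control flow described for parse() can be sketched with a stand-in class. Everything below is hypothetical: SketchParser is not part of pidibble, the 'mmCIF' format string is an assumed value, and the stub methods only record the order in which they are called. The sketch illustrates the fetch, read, parse sequence and why returning the parser itself permits chaining.

```python
class SketchParser:
    """Hypothetical stand-in mimicking the call order described for parse()."""

    def __init__(self, input_format="PDB"):
        self.input_format = input_format  # 'mmCIF' is an assumed alternative value
        self.calls = []                   # records which steps ran, in order

    def fetch(self):
        self.calls.append("fetch")
        return True  # pretend the download succeeded

    def read(self):
        self.calls.append("read")

    def parse_base(self):
        # Dispatch on the input format, as the description of parse() suggests.
        if self.input_format == "mmCIF":
            self.calls.append("parse_mmCIF")
        else:
            self.calls.append("parse_PDB")

    def parse(self):
        if self.fetch():
            self.read()
            self.parse_base()
        return self  # returning the parser itself allows chaining


parser = SketchParser(input_format="PDB").parse()
print(parser.calls)
```

Because parse() returns the parser, construction and parsing can be written as one expression, and the parsed data is then read from attributes of the returned object.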

parse_PDB()

Parse the PDB lines and generate a dictionary of pdbrecord.PDBRecord instances. This method iterates through the PDB lines, identifies the record type from the record name at the start of each line (columns 1-6 in the PDB format), and creates a new pdbrecord.PDBRecord instance for each record. It handles different record types, including continuation records and grouped records.
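As a rough illustration of keying lines by record type, the sketch below groups PDB-format lines by the record name that the PDB format places in columns 1-6. The function group_by_record_name is a hypothetical helper, not part of pidibble, and the sample lines are abbreviated rather than full-width PDB records. It does not attempt the continuation or grouped-record handling that parse_PDB() performs.

```python
from collections import defaultdict


def group_by_record_name(lines):
    """Group PDB-format lines by the record name in columns 1-6 (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        name = line[:6].strip()  # columns 1-6 hold the record name
        if name:                 # skip blank lines
            groups[name].append(line)
    return dict(groups)


# Abbreviated sample lines, for illustration only.
sample = [
    "HEADER    HYDROLASE",
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "END",
]
grouped = group_by_record_name(sample)
print({name: len(recs) for name, recs in grouped.items()})
```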

parse_base()

Parse the base records from the PDB or mmCIF file. This method initializes the parsing process based on the input format. If the input format is mmCIF, it uses the mmcif_parse.MMCIF_Parser to parse the mmCIF data. If the input format is PDB, it uses the pdbrecord.PDBRecord class to parse the PDB lines.

parse_embedded_records()

Parse embedded records within the parsed records. This method iterates through the parsed records and checks if any record has embedded records. If an embedded record is found, it calls the pdbrecord.PDBRecord.parse_embedded() method to parse the embedded records. It updates the PDBParser.parsed dictionary with the new parsed records.

parse_mmCIF()

Parse the mmCIF data and generate a dictionary of pdbrecord.PDBRecord instances. This method uses the mmcif_parse.MMCIF_Parser to parse the mmCIF data and store the parsed records in PDBParser.parsed.

parse_tables()

Parse tables within the parsed records. This method iterates through the parsed records and checks if any record has table formats. If a table format is found, it calls the pdbrecord.PDBRecord.parse_tables() method to parse the tables. It updates the PDBParser.parsed dictionary with the new parsed records.

parse_tokens()

Parse tokens within the parsed records. This method iterates through the parsed records and checks if any record has token formats. If a token format is found, it calls the pdbrecord.PDBRecord.parse_tokens() method to parse the tokens. It updates the PDBParser.parsed dictionary with the new parsed records.

post_process()

Post-process the parsed records to handle embedded records, tokens, and tables. This method checks if the input format is mmCIF and processes the records accordingly. If the input format is PDB, it processes the records to handle embedded records, tokens, and tables.

read()

Read the PDB or mmCIF file based on the input format. This method checks the input format and calls the appropriate read method.

read_PDB()

Read the PDB file and store its lines in PDBParser.pdb_lines. This method opens the PDB file, reads its contents, and splits it into lines. If the last line is empty, it removes it from the list of lines.
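The line-splitting step described for read_PDB() can be sketched as follows. This is a minimal illustration, not the pidibble implementation: split_pdb_text is a hypothetical helper that works on text already in memory instead of opening a file, and it assumes newline-separated input.

```python
def split_pdb_text(text):
    """Split file contents into lines and drop a trailing empty line (hypothetical helper)."""
    lines = text.split("\n")
    # A file ending in a newline yields a final empty string; remove it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


lines = split_pdb_text("HEADER    HYDROLASE\nEND\n")
print(lines)
```

Dropping the trailing empty entry means later steps do not need to special-case a blank final line.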

read_mmCIF()

Read the mmCIF file and store its data in PDBParser.cif_data. This method uses the mmcif.io.IoAdapterCore.IoAdapterCore to read the mmCIF file and store the data in PDBParser.cif_data.

pidibble.pdbparse.get_symm_ops(rec: PDBRecord)

Extract the symmetry operations from a PDB record and return them as a transformation matrix and a translation vector.

Parameters:

rec (pdbrecord.PDBRecord) – The PDBRecord instance containing the symmetry operations.

Returns:

M (numpy.ndarray) – The 3x3 transformation matrix.
T (numpy.ndarray) – The 3x1 translation vector.
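A matrix and vector of the shapes listed above would typically be applied to coordinates as x' = M x + T. The sketch below shows that step with NumPy. The helper apply_symm_op and the example values are assumptions for illustration and are not part of pidibble; in practice M and T would come from get_symm_ops().

```python
import numpy as np


def apply_symm_op(coords, M, T):
    """Apply x' = M x + T to each row of an (N, 3) coordinate array (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    # reshape(3) accepts a translation given as shape (3,) or (3, 1)
    return coords @ np.asarray(M).T + np.asarray(T).reshape(3)


# Example values only: a 180-degree rotation about z, then a shift of 10 along x.
M = np.array([[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([[10.0], [0.0], [0.0]])
coords = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])

moved = apply_symm_op(coords, M, T)
print(moved)
```

Multiplying by the transpose of M lets the whole coordinate array be transformed in one vectorized operation instead of looping over atoms.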