Functions

Primary Functions

crema.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates to a collection of peptide-spectrum matches.

Parameters:

psms (PsmDataset or list of PsmDataset objects): The collection or collections of PSMs.
score_column (str, optional): The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
desc (bool, optional): True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
eval_fdr (float, optional): The false discovery rate threshold used to choose the best score_column and desc. This should range from 0 to 1.
method ({“tdc”}, optional): The method for crema to use when calculating the confidence estimates.
pep_fdr_type ({“psm-only”, “peptide-only”, “psm-peptide”}, optional): The method for crema to use when calculating peptide-level confidence estimates.
prot_fdr_type ({“best”, “combine”}, optional): The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
threshold (float or “q-value”, optional): The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Returns:

Confidence object or List of Confidence objects: The confidence estimates for each PsmDataset.
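
As a rough illustration of what method='tdc' and the automatic choice of desc involve, the following pure-Python sketch estimates q-values by target-decoy competition and picks the score direction that accepts more targets at eval_fdr. It is not crema's implementation: the function names (tdc_qvalues, count_accepted, pick_direction), the +1 correction in the FDR estimate, and the handling of ties are all assumptions made for the example.

```python
from typing import List, Sequence


def tdc_qvalues(scores: Sequence[float], is_target: Sequence[bool],
                desc: bool = True) -> List[float]:
    """Estimate q-values from target/decoy labels (illustrative only)."""
    # Rank PSMs from best to worst according to the score direction.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)
    fdr = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        # Assumed estimator: (decoys + 1) / targets, capped at 1.
        fdr.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # The q-value is the smallest FDR at which a PSM is still accepted,
    # so take a running minimum from the worst rank upward.
    qvals = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        qvals[order[rank]] = running_min
    return qvals


def count_accepted(scores, is_target, desc, eval_fdr=0.01):
    """Count target PSMs whose q-value is at or below eval_fdr."""
    qvals = tdc_qvalues(scores, is_target, desc)
    return sum(1 for q, t in zip(qvals, is_target) if t and q <= eval_fdr)


def pick_direction(scores, is_target, eval_fdr=0.01):
    """Return True when 'higher is better' accepts at least as many targets."""
    higher = count_accepted(scores, is_target, True, eval_fdr)
    lower = count_accepted(scores, is_target, False, eval_fdr)
    return higher >= lower


# Made-up scores and labels, purely for demonstration.
example_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
example_labels = [True, True, True, True, False, True, False, False, True, False]
example_desc = pick_direction(example_scores, example_labels, eval_fdr=0.3)
```

With crema itself, the same decision is requested by leaving desc=None, as in crema.assign_confidence(psms, score_column=..., desc=None, eval_fdr=0.01).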

Parsers

crema.read_tide(txt_files, pairing_file_name=None, decoy_prefix='decoy_', copy_data=True)

Read peptide-spectrum matches (PSMs) from Tide tab-delimited files.

Parameters:

txt_files (str, pandas.DataFrame, or tuple of str): One or more collections of PSMs in the Tide tab-delimited format.
pairing_file_name (str, optional): A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.
decoy_prefix (str, optional): The prefix used to indicate a decoy protein in the protein column. Default value is ‘decoy_’.
copy_data (bool, optional): If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:

PsmDataset: A PsmDataset object containing the parsed PSMs.
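
The pairing file described above is a two-column tab-delimited table with the headers ‘target’ and ‘decoy’. The sketch below writes such a file with the standard csv module. The helper name write_pairing_file, the file name pairing.txt, and the peptide sequences are all made up for illustration; in practice the file would normally come from tide-index.

```python
import csv
import os
import tempfile


def write_pairing_file(pairs, path):
    """Write (target, decoy) peptide pairs as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # The two required column labels.
        writer.writerow(["target", "decoy"])
        writer.writerows(pairs)
    return path


# Hypothetical target peptides with shuffled decoys.
pairs = [("LESLIEK", "LEILSEK"), ("PEPTIDER", "PDITPEER")]
pairing_path = write_pairing_file(
    pairs, os.path.join(tempfile.mkdtemp(), "pairing.txt")
)

# Read the file back to confirm the layout.
with open(pairing_path, newline="") as handle:
    rows = list(csv.reader(handle, delimiter="\t"))
```

The resulting path is what would be passed as pairing_file_name.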

crema.read_msamanda(txt_files, pairing_file_name=None, decoy_prefix='REV_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSAmanda tab-delimited files.

Parameters:

txt_files (str, pandas.DataFrame, or tuple of str): One or more collections of PSMs in the MSAmanda tab-delimited format.
pairing_file_name (str, optional): A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
decoy_prefix (str, optional): The prefix used to indicate a decoy protein in the protein column. Default value is ‘REV_’.
copy_data (bool, optional): If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:

PsmDataset: A PsmDataset object containing the parsed PSMs.
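
The trade-off behind copy_data can be illustrated without pandas. In the sketch below a plain dictionary of lists stands in for a DataFrame: a deep copy is unaffected by later edits, while a shared reference sees them. This is only an analogy for the behavior described above, not crema's code, and the variable names are hypothetical.

```python
import copy

# A stand-in for a table of PSMs (made-up values).
table = {"score": [3.2, 1.1], "peptide": ["LESLIEK", "PEPTIDER"]}

safe = copy.deepcopy(table)  # analogous to copy_data=True
shared = table               # analogous to copy_data=False

# Modifying the shared object also changes the caller's original data...
shared["score"][0] = -1.0

# ...while the deep copy keeps the values it was created with.
original_changed = table["score"][0] == -1.0
copy_unchanged = safe["score"][0] == 3.2
```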

crema.read_msfragger(txt_files, pairing_file_name=None, decoy_prefix='rev_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSFragger tab-delimited files.

Parameters:

txt_files (str, pandas.DataFrame, or tuple of str): One or more collections of PSMs in the MSFragger tab-delimited format.
pairing_file_name (str, optional): A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
decoy_prefix (str, optional): The prefix used to indicate a decoy protein in the protein column. Default value is ‘rev_’.
copy_data (bool, optional): If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:

PsmDataset: A PsmDataset object containing the parsed PSMs.

crema.read_msgf(txt_files, pairing_file_name=None, decoy_prefix='XXX_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSGF+ tab-delimited files.

Parameters:

txt_files (str, pandas.DataFrame, or tuple of str): One or more collections of PSMs in the MSGF+ tab-delimited format.
pairing_file_name (str, optional): A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
decoy_prefix (str, optional): The prefix used to indicate a decoy protein in the protein column. Default value is ‘XXX_’.
copy_data (bool, optional): If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:

PsmDataset: A PsmDataset object containing the parsed PSMs.
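
The four search-engine parsers above differ mainly in their default decoy_prefix. The sketch below collects those defaults and shows one plausible way a prefix can be used to label a PSM as a decoy. The rule used here (a PSM is a decoy only if every listed protein carries the prefix), the comma delimiter, and the helper name is_decoy are assumptions for illustration and may not match what crema does internally.

```python
# Default prefixes taken from the function signatures above.
DEFAULT_DECOY_PREFIXES = {
    "read_tide": "decoy_",
    "read_msamanda": "REV_",
    "read_msfragger": "rev_",
    "read_msgf": "XXX_",
}


def is_decoy(protein_field, decoy_prefix, protein_delim=","):
    """Return True if every protein in the field starts with the prefix.

    The 'all proteins' rule and the default delimiter are assumptions.
    """
    proteins = [p.strip() for p in protein_field.split(protein_delim) if p.strip()]
    return bool(proteins) and all(p.startswith(decoy_prefix) for p in proteins)


# Hypothetical protein column values.
decoy_hit = is_decoy("REV_sp|P12345", DEFAULT_DECOY_PREFIXES["read_msamanda"])
mixed_hit = is_decoy("rev_A,B", DEFAULT_DECOY_PREFIXES["read_msfragger"])
```

Note that the comparison is case-sensitive, so ‘REV_’ and ‘rev_’ are not interchangeable; passing the prefix that matches the search database matters.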

crema.read_pepxml(pepxml_files, decoy_prefix)

Read peptide-spectrum matches (PSMs) from pepXML files.

Parameters:

pepxml_files (str or tuple of str): One or more collections of PSMs in the pepXML format.
decoy_prefix (str): The prefix used to indicate a decoy protein in the description lines of the FASTA file.

Returns:

PsmDataset: A PsmDataset object containing the PSMs from the pepXML file.

crema.read_mztab(mztab_files)

Read peptide-spectrum matches (PSMs) from mzTab files.

Parameters:

mztab_files (str or tuple of str): One or more collections of PSMs in the mzTab format.

Returns:

PsmDataset: A PsmDataset object containing the PSMs from the mzTab file.

crema.read_txt(txt_files, target_column, spectrum_columns, score_columns, peptide_column, protein_column, protein_delim, sep='\t', pairing_file_name=None, copy_data=True)

Read peptide-spectrum matches (PSMs) from delimited text files.

Parameters:

txt_files (str, pandas.DataFrame, or tuple of str): One or more collections of PSMs in a tabular text format.
target_column (str): The column that indicates whether a PSM is a target or a decoy.
spectrum_columns (str or tuple of str): One or more columns that together define a unique mass spectrum.
score_columns (str or tuple of str): One or more columns that indicate scores by which crema can rank PSMs.
peptide_column (str): The column that defines a unique peptide. Modifications should be indicated either in square brackets [] or parentheses (). The exact modification format within these delimiters does not matter, so long as it is consistent.
protein_column (str): The column that defines a unique protein.
protein_delim (str): The delimiter that separates protein IDs within the protein column.
sep (str, optional): The column delimiter used in the text files.
pairing_file_name (str, optional): A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.
copy_data (bool, optional): If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:

PsmDataset: A PsmDataset object containing the parsed PSMs.
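
To make the column conventions above more concrete, the sketch below shows how a row of a generic tabular file might be interpreted: modifications in [] or () are stripped to recover the bare sequence, the protein field is split on protein_delim, and the spectrum columns are combined into one key. The helper names, column names, and example values are hypothetical, and the regular expression assumes modifications are not nested; crema's own parsing may differ.

```python
import re

# Matches one modification written as [...] or (...), assuming no nesting.
_MOD_PATTERN = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_modifications(peptide):
    """Remove bracketed or parenthesized modifications from a peptide."""
    return _MOD_PATTERN.sub("", peptide)


def split_proteins(field, protein_delim):
    """Split a protein column value into individual protein IDs."""
    return [p for p in field.split(protein_delim) if p]


def spectrum_key(row, spectrum_columns):
    """Combine one or more columns into a key identifying a spectrum."""
    if isinstance(spectrum_columns, str):
        spectrum_columns = (spectrum_columns,)
    return tuple(row[col] for col in spectrum_columns)


# A made-up row from a delimited text file.
row = {
    "file": "run1.mzML",
    "scan": "17",
    "peptide": "PEPT[+79.97]IDEM(ox)K",
    "proteins": "P1;P2",
}
bare_sequence = strip_modifications(row["peptide"])
protein_ids = split_proteins(row["proteins"], ";")
key = spectrum_key(row, ("file", "scan"))
```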

Writers

crema.to_txt(conf, output_dir=None, file_root=None, sep='\t', decoys=False, precision=6)

Save confidence estimates to delimited text files.

Write the confidence estimates for each of the available levels (i.e. PSMs, peptides, proteins) to separate flat text files using the specified delimiter. If more than one collection of confidence estimates is provided, they will be combined, yielding a single file for each level specified by either dataset.

Parameters:

conf (Confidence object or tuple of Confidence objects): One or more Confidence objects.
output_dir (str or None, optional): The directory in which to save the files. None will use the current working directory.
file_root (str or None, optional): An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
sep (str, optional): The delimiter to use.
decoys (bool, optional): Whether to also save the confidence estimates for decoys.
precision (int, optional): The precision to use when writing floating-point values.

Returns:

list of str: The paths to the saved files.
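
The naming scheme described for file_root and output_dir can be sketched as follows. The suffix “crema.{level}.txt” and the fallback to the current working directory come from the parameter descriptions above; joining file_root to the suffix with a dot, the level strings used in the example, and the helper name output_path are assumptions.

```python
import os


def output_path(level, output_dir=None, file_root=None):
    """Build the path of one confidence-estimate file (illustrative only)."""
    name = f"crema.{level}.txt"
    if file_root is not None:
        # Assumption: the prefix and the fixed suffix are joined with a dot.
        name = f"{file_root}.{name}"
    # None falls back to the current working directory.
    return os.path.join(output_dir or os.getcwd(), name)


# Hypothetical level names, directory, and prefix.
paths = [output_path(level, "results", "run1")
         for level in ("psms", "peptides", "proteins")]
```

One file per level is produced, which matches the list of paths that to_txt returns.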