Functions
Primary Functions
- crema.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)
Assign confidence estimates to a collection of peptide-spectrum matches.
- Parameters:
- psms : PsmDataset or list of PsmDataset objects
The collection or collections of PSMs.
- score_column : str, optional
The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
- desc : bool, optional
True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
- eval_fdr : float, optional
The false discovery rate threshold used to evaluate which score_column and desc to choose. This should range from 0 to 1.
- method : {“tdc”}, optional
The method for crema to use when calculating the confidence estimates.
- pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional
The method for crema to use when calculating peptide-level confidence estimates.
- prot_fdr_type : {“best”, “combine”}, optional
The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
- threshold : float or “q-value”, optional
The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.
- Returns:
- Confidence object or List of Confidence objects
The confidence estimates for each PsmDataset.
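To make the 'tdc' method more concrete, the following is a minimal sketch of how target-decoy competition q-values are commonly estimated. It is an illustration under stated assumptions, not crema's actual implementation: the function name `tdc_qvalues` and the conservative `(decoys + 1) / targets` estimate are choices made for this example.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values by target-decoy competition (illustrative only)."""
    # Rank PSMs from best to worst; `desc` plays the same role as in the docs.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)

    fdrs = []
    n_targets = n_decoys = 0
    for i in order:
        if is_target[i]:
            n_targets += 1
        else:
            n_decoys += 1
        # Assumed estimator: (decoys + 1) / targets, capped at 1.
        fdrs.append(min(1.0, (n_decoys + 1) / max(n_targets, 1)))

    # A q-value is the smallest FDR at which the PSM is still accepted,
    # so take a running minimum from the worst-ranked PSM upwards.
    qvals = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals
```

PSMs whose q-value is at or below the chosen threshold would then be the accepted discoveries.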
Parsers
- crema.read_tide(txt_files, pairing_file_name=None, decoy_prefix='decoy_', copy_data=True)
Read peptide-spectrum matches (PSMs) from Tide tab-delimited files.
- Parameters:
- txt_files : str, pandas.DataFrame or tuple of str
One or more collections of PSMs in the Tide tab-delimited format.
- pairing_file_name : str, optional
A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.
- decoy_prefix : str, optional
The prefix used to indicate a decoy protein in the protein column. Default value is ‘decoy_’.
- copy_data : bool, optional
If true, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.
- Returns:
- PsmDataset
A PsmDataset object containing the parsed PSMs.
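The pairing file described above is a tab-delimited table with a ‘target’ and a ‘decoy’ column. The sketch below writes and re-reads such a file using only the standard library. The helper names, the file path, and the peptide sequences are hypothetical; whether tide-index produces exactly this layout is not confirmed here.

```python
import csv


def write_pairing_file(path, pairs):
    """Write (target, decoy) peptide pairs as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["target", "decoy"])  # column labels named in the docs
        writer.writerows(pairs)


def read_pairing_file(path):
    """Return a dict mapping each target sequence to its paired decoy."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target"]: row["decoy"] for row in reader}
```

The resulting path is what would be passed as pairing_file_name.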
- crema.read_msamanda(txt_files, pairing_file_name=None, decoy_prefix='REV_', copy_data=True)
Read peptide-spectrum matches (PSMs) from MSAmanda tab-delimited files.
- Parameters:
- txt_files : str, pandas.DataFrame or tuple of str
One or more collections of PSMs in the MSAmanda tab-delimited format.
- pairing_file_name : str, optional
A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
- decoy_prefix : str, optional
The prefix used to indicate a decoy protein in the protein column. Default value is ‘REV_’.
- copy_data : bool, optional
If true, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.
- Returns:
- PsmDataset
A PsmDataset object containing the parsed PSMs.
- crema.read_msfragger(txt_files, pairing_file_name=None, decoy_prefix='rev_', copy_data=True)
Read peptide-spectrum matches (PSMs) from MSFragger tab-delimited files.
- Parameters:
- txt_files : str, pandas.DataFrame or tuple of str
One or more collections of PSMs in the MSFragger tab-delimited format.
- pairing_file_name : str, optional
A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
- decoy_prefix : str, optional
The prefix used to indicate a decoy protein in the protein column. Default value is ‘rev_’.
- copy_data : bool, optional
If true, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.
- Returns:
- PsmDataset
A PsmDataset object containing the parsed PSMs.
- crema.read_msgf(txt_files, pairing_file_name=None, decoy_prefix='XXX_', copy_data=True)
Read peptide-spectrum matches (PSMs) from MSGF+ tab-delimited files.
- Parameters:
- txt_files : str, pandas.DataFrame or tuple of str
One or more collections of PSMs in the MSGF+ tab-delimited format.
- pairing_file_name : str, optional
A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.
- decoy_prefix : str, optional
The prefix used to indicate a decoy protein in the protein column. Default value is ‘XXX_’.
- copy_data : bool, optional
If true, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.
- Returns:
- PsmDataset
A PsmDataset object containing the parsed PSMs.
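The four search-engine parsers above differ mainly in their default decoy_prefix. The sketch below collects those documented defaults and shows a simple prefix test on a protein identifier. The helper `is_decoy_protein` and the example identifiers are hypothetical, and crema's own matching (for example, how it treats PSMs with several proteins) may differ from this plain, case-sensitive check.

```python
# Default decoy prefixes, taken from the function signatures above.
DEFAULT_DECOY_PREFIXES = {
    "read_tide": "decoy_",
    "read_msamanda": "REV_",
    "read_msfragger": "rev_",
    "read_msgf": "XXX_",
}


def is_decoy_protein(protein_id, decoy_prefix):
    """Return True if the protein ID starts with the decoy prefix.

    Assumption for illustration: a plain, case-sensitive prefix match.
    """
    return protein_id.startswith(decoy_prefix)
```

If a database was built with a different prefix than the parser's default, the prefix actually used would need to be passed as decoy_prefix.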
- crema.read_pepxml(pepxml_files, decoy_prefix)
Read peptide-spectrum matches (PSMs) from pepXML files.
- Parameters:
- pepxml_files : str or tuple of str
One or more collections of PSMs in the pepXML format.
- decoy_prefix : str
The prefix used to indicate a decoy protein in the description lines of the FASTA file.
- Returns:
- PsmDataset
A PsmDataset object containing the PSMs from the pepXML file.
- crema.read_mztab(mztab_files)
Read peptide-spectrum matches (PSMs) from mzTab files.
- Parameters:
- mztab_files : str or tuple of str
One or more collections of PSMs in the mzTab format.
- Returns:
- PsmDataset
A PsmDataset object containing the PSMs from the mzTab file.
- crema.read_txt(txt_files, target_column, spectrum_columns, score_columns, peptide_column, protein_column, protein_delim, sep='\t', pairing_file_name=None, copy_data=True)
Read peptide-spectrum matches (PSMs) from delimited text files.
- Parameters:
- txt_files : str, pandas.DataFrame, or tuple of str
One or more collections of PSMs in a tabular text format.
- target_column : str
The column that indicates whether a PSM is a target or a decoy.
- spectrum_columns : str or tuple of str
One or more columns that together define a unique mass spectrum.
- score_columns : str or tuple of str
One or more columns that indicate scores by which crema can rank PSMs.
- peptide_column : str
The column that defines a unique peptide. Modifications should be indicated either in square brackets [] or parentheses (). The exact modification format within these entities does not matter, so long as it is consistent.
- protein_column : str
The column that defines a unique protein.
- protein_delim : str
The delimiter to separate protein IDs.
- sep : str, optional
The delimiter to use.
- pairing_file_name : str, optional
A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.
- copy_data : bool, optional
If true, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.
- Returns:
- PsmDataset
A PsmDataset object containing the parsed PSMs.
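Because read_txt has many required arguments, it can help to keep them together in one place. The sketch below maps the documented parameters to a set of hypothetical column names for an imaginary search-engine export, and adds a small helper that illustrates the bracket and parenthesis convention for modifications. The column names, the ";" protein delimiter, and the helper `strip_modifications` are assumptions for this example and not part of crema.

```python
import re

# Hypothetical column names; adjust them to match the actual file.
READ_TXT_KWARGS = dict(
    target_column="is_target",
    spectrum_columns=("file", "scan"),
    score_columns=("score",),
    peptide_column="peptide",
    protein_column="proteins",
    protein_delim=";",
    sep="\t",
)


def strip_modifications(peptide):
    """Remove [...] and (...) annotations, leaving the bare sequence."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)


def load_psms(txt_files):
    """Parse PSMs with crema.read_txt (requires crema to be installed)."""
    import crema  # imported lazily so the helpers above work without crema

    return crema.read_txt(txt_files, **READ_TXT_KWARGS)
```

Under these assumptions, `strip_modifications("PEPT[+79.97]IDEK")` yields `"PEPTIDEK"`, which shows why the text inside the brackets does not need to follow any particular format.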
Writers
- crema.to_txt(conf, output_dir=None, file_root=None, sep='\t', decoys=False, precision=6)
Save confidence estimates to delimited text files.
Write the confidence estimates for each of the available levels (i.e. PSMs, peptides, proteins) to separate flat text files using the specified delimiter. If more than one collection of confidence estimates is provided, they will be combined, yielding a single file for each level specified by either dataset.
- Parameters:
- conf : Confidence object or tuple of Confidence objects
One or more Confidence objects.
- output_dir : str or None, optional
The directory in which to save the files. None will use the current working directory.
- file_root : str or None, optional
An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt” where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
- sep : str, optional
The delimiter to use.
- decoys : bool, optional
Whether to also save the confidence estimates for decoys.
- precision : int, optional
Precision for float values.
- Returns:
- list of str
The paths to the saved files.
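The parser, confidence, and writer functions documented on this page are meant to be chained. The sketch below wires them together for Tide results, using only the calls and keyword arguments listed above. The wrapper name `run_crema` is hypothetical, and the function is defined but not executed here because it needs crema and real input files.

```python
def run_crema(tide_files, output_dir=None, file_root=None):
    """Read Tide PSMs, assign confidence, and write the results to text files.

    Returns the list of output paths reported by crema.to_txt.
    """
    import crema  # imported lazily; requires crema to be installed

    # Parse the search results into a PsmDataset.
    psms = crema.read_tide(tide_files)

    # Estimate confidence with the documented defaults made explicit.
    conf = crema.assign_confidence(psms, eval_fdr=0.01, threshold=0.01)

    # Write one file per confidence level and return their paths.
    return crema.to_txt(conf, output_dir=output_dir, file_root=file_root)
```

Swapping `crema.read_tide` for another parser from the Parsers section should be the only change needed for other search engines, assuming their output matches the format that parser expects.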