Confidence
The Confidence class is used to define a collection of peptide-spectrum matches with calculated false discovery rates (FDR) and q-values.
- class crema.confidence.MixmaxConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)
Assign confidence estimates using mix-max competition.
Estimates q-values using the mix-max competition method. This method requires separate target and decoy database searches that use a calibrated score function.
Additional details can be found in this manuscript: U. Keich, A. Kertesz-Farkas, and W. S. Noble. Improved false discovery rate estimation procedure for shotgun proteomics. Journal of Proteome Research, 14(8):3148-3161, 2015.
- Parameters:
- psms : a PsmDataset object
A collection of PSMs.
- score_column : str, optional
The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
- desc : bool, optional
True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
- eval_fdr : float, optional
The false discovery rate threshold used to evaluate the best score_column and desc to choose. This should range from 0 to 1.
- pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional
The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.
- prot_fdr_type : {“best”, “combine”}, optional
The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
- threshold : float or “q-value”, optional
The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, then the “accept” column is replaced with “crema q-value”.
- Attributes:
- data : pandas.DataFrame
The collection of PSMs as a pandas.DataFrame.
- dataset : crema.PsmDataset
The underlying PsmDataset.
- levels : list of str
The available levels of confidence estimates.
- confidence_estimates : dict
A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.
- decoy_confidence_estimates : dict
A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.
Methods
- to_txt([output_dir, file_root, sep, decoys]): Save confidence estimates to delimited text files.
- to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)
Save confidence estimates to delimited text files.
- Parameters:
- output_dir : str or None, optional
The directory in which to save the files. None will use the current working directory.
- file_root : str or None, optional
An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
- sep : str, optional
The delimiter to use.
- decoys : bool, optional
Whether to also save the confidence estimates for the decoys.
- Returns:
- list of str
The paths to the saved files.
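The naming rule described for to_txt can be illustrated with a small stand-alone sketch. This is not crema's implementation: save_estimates is a hypothetical helper, the example tables and their "scan" and "sequence" columns are made up, and joining file_root to the suffix with a period is an assumption.

```python
import os
import tempfile

import pandas as pd


def save_estimates(estimates, output_dir=None, file_root=None, sep="\t"):
    """Write one delimited file per level and return the saved paths.

    Hypothetical helper that mimics the documented naming scheme.
    """
    output_dir = os.getcwd() if output_dir is None else output_dir
    paths = []
    for level, table in estimates.items():
        name = f"crema.{level}.txt"
        if file_root is not None:
            # Assumption: the prefix is joined to the suffix with a period.
            name = f"{file_root}.{name}"
        path = os.path.join(output_dir, name)
        table.to_csv(path, sep=sep, index=False)
        paths.append(path)
    return paths


# Made-up confidence estimates at two levels.
estimates = {
    "psms": pd.DataFrame({"scan": [1, 2], "crema q-value": [0.004, 0.2]}),
    "peptides": pd.DataFrame({"sequence": ["PEPTIDEK"], "crema q-value": [0.004]}),
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_estimates(estimates, output_dir=tmp, file_root="run1")
    saved_names = [os.path.basename(p) for p in saved]

print(saved_names)
```

As in the documented method, the return value is the list of written paths, one per level.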
- property data
The collection of PSMs as a pandas.DataFrame.
- property dataset
The underlying PsmDataset.
- property levels
The available levels of confidence estimates.
- class crema.confidence.TdcConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)
Assign confidence estimates using target decoy competition.
Estimates q-values using the target decoy competition method. For a set of target and decoy PSMs meeting a specified score threshold, the false discovery rate (FDR) is estimated as:
\[FDR = \frac{Decoys + 1}{Targets}\]
More formally, let the scores of target and decoy PSMs be indicated as \(f_1, f_2, ..., f_{m_f}\) and \(d_1, d_2, ..., d_{m_d}\), respectively. For a score threshold \(t\), the false discovery rate is estimated as:
\[E\{FDR(t)\} = \frac{|\{d_i > t; i=1, ..., m_d\}| + 1}{|\{f_i > t; i=1, ..., m_f\}|}\]
The reported q-value for each PSM is the minimum FDR at which that PSM would be accepted.
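The estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not crema's code: tdc_qvalues is a hypothetical name, the threshold for each PSM is taken to be its own score (so the PSM itself is counted), the estimated FDR is capped at 1, and tied scores are treated as a single threshold.

```python
import numpy as np


def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values with the (Decoys + 1) / Targets rule.

    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    if scores.size == 0:
        return np.empty(0)

    # Rank PSMs from best to worst score.
    order = np.argsort(-scores if desc else scores, kind="stable")
    ranked_scores = scores[order]
    ranked_targets = is_target[order]

    # Running counts of targets and decoys down to each rank.
    n_targets = np.cumsum(ranked_targets)
    n_decoys = np.cumsum(~ranked_targets)

    # Tied scores share one threshold: use the counts at the end of each tie group.
    is_group_end = np.append(ranked_scores[1:] != ranked_scores[:-1], True)
    group_end = np.flatnonzero(is_group_end)
    group_of = np.cumsum(np.append(False, is_group_end[:-1]))
    n_targets = n_targets[group_end][group_of]
    n_decoys = n_decoys[group_end][group_of]

    # FDR estimate at each rank, capped at 1 and guarded against zero targets.
    fdr = np.minimum((n_decoys + 1) / np.maximum(n_targets, 1), 1.0)

    # q-value: the smallest FDR at this rank or at any worse rank.
    ranked_q = np.minimum.accumulate(fdr[::-1])[::-1]

    # Return the q-values in the original PSM order.
    qvalues = np.empty_like(ranked_q)
    qvalues[order] = ranked_q
    return qvalues


example_q = tdc_qvalues(
    [10, 9, 8, 7, 6, 5],
    [True, True, False, True, True, False],
)
print(example_q)
```

The reverse cumulative minimum is what turns the per-threshold FDR estimates into q-values: a PSM is assigned the lowest FDR obtainable at any threshold that still accepts it.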
- Parameters:
- psms : a PsmDataset object
A collection of PSMs.
- score_column : str, optional
The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
- desc : bool, optional
True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
- eval_fdr : float, optional
The false discovery rate threshold used to evaluate the best score_column and desc to choose. This should range from 0 to 1.
- pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional
The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.
- prot_fdr_type : {“best”, “combine”}, optional
The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
- threshold : float or “q-value”, optional
The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, then the “accept” column is replaced with “crema q-value”.
- Attributes:
- data : pandas.DataFrame
The collection of PSMs as a pandas.DataFrame.
- dataset : crema.PsmDataset
The underlying PsmDataset.
- levels : list of str
The available levels of confidence estimates.
- confidence_estimates : dict
A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.
- decoy_confidence_estimates : dict
A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.
Methods
- to_txt([output_dir, file_root, sep, decoys]): Save confidence estimates to delimited text files.
- to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)
Save confidence estimates to delimited text files.
- Parameters:
- output_dir : str or None, optional
The directory in which to save the files. None will use the current working directory.
- file_root : str or None, optional
An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
- sep : str, optional
The delimiter to use.
- decoys : bool, optional
Whether to also save the confidence estimates for the decoys.
- Returns:
- list of str
The paths to the saved files.
- property data
The collection of PSMs as a pandas.DataFrame.
- property dataset
The underlying PsmDataset.
- property levels
The available levels of confidence estimates.
- crema.confidence.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)
Assign confidence estimates to a collection of peptide-spectrum matches.
- Parameters:
- psms : PsmDataset or list of PsmDataset objects
The collections of PSMs.
- score_column : str, optional
The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
- desc : bool, optional
True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
- eval_fdr : float, optional
The false discovery rate threshold used to evaluate the best score_column and desc to choose. This should range from 0 to 1.
- method : {“tdc”}, optional
The method for crema to use when calculating the confidence estimates.
- pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional
The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.
- prot_fdr_type : {“best”, “combine”}, optional
The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
- threshold : float or “q-value”, optional
The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, then the “accept” column is replaced with “crema q-value”.
- Returns:
- Confidence object or List of Confidence objects
The confidence estimates for each PsmDataset.
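The behaviour described for desc=None, trying both score directions and keeping the one that accepts more PSMs at eval_fdr, can be sketched as follows. This is an illustration under stated assumptions rather than crema's implementation: pick_direction and _qvalues are hypothetical names, the q-values use a compact target-decoy estimate that assumes distinct scores, only target PSMs are counted as accepted, and the example scores are made up.

```python
import numpy as np


def _qvalues(scores, is_target, desc):
    """Compact target-decoy q-values; assumes all scores are distinct."""
    order = np.argsort(-scores if desc else scores, kind="stable")
    targets = is_target[order]
    fdr = (np.cumsum(~targets) + 1) / np.maximum(np.cumsum(targets), 1)
    ranked_q = np.minimum.accumulate(np.minimum(fdr, 1.0)[::-1])[::-1]
    qvalues = np.empty_like(ranked_q)
    qvalues[order] = ranked_q
    return qvalues


def pick_direction(scores, is_target, eval_fdr=0.01):
    """Return (desc, n_accepted) for the direction that accepts more targets."""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    counts = {}
    for desc in (True, False):
        q = _qvalues(scores, is_target, desc)
        counts[desc] = int(np.sum(is_target & (q <= eval_fdr)))
    # Ties between directions fall back to desc=True (an arbitrary choice here).
    best = max(counts, key=counts.get)
    return best, counts[best]


# Made-up scores where lower is better: targets score 1..10, decoys 11 and 12.
scores = np.arange(1, 13)
is_target = np.array([True] * 10 + [False] * 2)
direction, n_accepted = pick_direction(scores, is_target, eval_fdr=0.15)
print(direction, n_accepted)
```

The same counting idea extends to choosing score_column: evaluate each candidate column in both directions and keep the combination with the highest count.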