Confidence

The Confidence class is used to define a collection of peptide-spectrum matches with calculated false discovery rates (FDR) and q-values.

class crema.confidence.MixmaxConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates using mix-max competition.

Estimates q-values using the mix-max competition method. This method requires separate target and decoy database searches scored with a calibrated score function.

# TODO Describe how mixmax works here

Additional details can be found in this manuscript: U. Keich, A. Kertesz-Farkas, and W. S. Noble. Improved false discovery rate estimation procedure for shotgun proteomics. Journal of Proteome Research, 14(8):3148-3161, 2015.

Parameters:

psms (a PsmDataset object): A collection of PSMs.
score_column (str, optional): The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
desc (bool, optional): True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
eval_fdr (float, optional): The false discovery rate threshold used to choose the best score_column and desc. It should be between 0 and 1.
pep_fdr_type ({“psm-only”, “peptide-only”, “psm-peptide”}, optional): The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.
prot_fdr_type ({“best”, “combine”}, optional): The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
threshold (float or “q-value”, optional): The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Attributes:

data (pandas.DataFrame): The collection of PSMs as a pandas.DataFrame.
dataset (crema.PsmDataset): The underlying PsmDataset.
levels (list of str): The available levels of confidence estimates.
confidence_estimates (dict): A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.
decoy_confidence_estimates (dict): A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.

Methods

to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:

output_dir (str or None, optional): The directory in which to save the files. None will use the current working directory.
file_root (str or None, optional): An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
sep (str, optional): The delimiter to use.
decoys (bool, optional): Whether to also save the confidence estimates for the decoys.

Returns:

list of str: The paths to the saved files.
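
As a rough illustration of the naming rule described for file_root and output_dir, the sketch below builds the paths that a call such as to_txt might return. It is not crema's code: the helper name confidence_paths, the period joining the prefix to “crema”, and the level names used in the example are all assumptions.

```python
import os


def confidence_paths(levels, output_dir=None, file_root=None):
    """Build one output path per confidence level (hypothetical helper)."""
    # Assumption: an optional prefix is joined to the fixed "crema" stem
    # with a period, giving "<file_root>.crema.<level>.txt".
    base = "crema" if file_root is None else f"{file_root}.crema"
    # None means the current working directory, as documented above.
    directory = os.getcwd() if output_dir is None else output_dir
    return [os.path.join(directory, f"{base}.{level}.txt") for level in levels]


# Level names here are illustrative; use the values in the levels attribute.
paths = confidence_paths(["psms", "peptides"], output_dir="out", file_root="run1")
```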

property data: The collection of PSMs as a pandas.DataFrame.

property dataset: The underlying PsmDataset

property levels: The available levels of confidence estimates

class crema.confidence.TdcConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates using target decoy competition.

Estimates q-values using the target decoy competition method. For a set of target and decoy PSMs meeting a specified score threshold, the false discovery rate (FDR) is estimated as:

\[FDR = \frac{Decoys + 1}{Targets}\]

More formally, let the scores of target and decoy PSMs be indicated as \(f_1, f_2, ..., f_{m_f}\) and \(d_1, d_2, ..., d_{m_d}\), respectively. For a score threshold \(t\), the false discovery rate is estimated as:

\[E\{FDR(t)\} = \frac{|\{d_i > t; i=1, ..., m_d\}| + 1}{|\{f_i > t; i=1, ..., m_f\}|}\]

The reported q-value for each PSM is the minimum FDR at which that PSM would be accepted.
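
The estimate and the q-value definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not crema's implementation. The function name tdc_qvalues is hypothetical, PSMs scoring at or better than the threshold are counted (the threshold is taken at each observed score), tied scores share one threshold, and the estimate is capped at 1; all of these are assumptions.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Return one q-value per PSM using the (Decoys + 1) / Targets estimate."""
    # Rank PSM indices from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)

    fdr = [1.0] * len(scores)
    targets = decoys = 0
    pos = 0
    while pos < len(order):
        # Group tied scores so that they share a single threshold.
        end = pos
        while end < len(order) and scores[order[end]] == scores[order[pos]]:
            end += 1
        for i in order[pos:end]:
            if is_target[i]:
                targets += 1
            else:
                decoys += 1
        # Assumption: cap the estimate at 1 and use 1 when no targets pass.
        estimate = min(1.0, (decoys + 1) / targets) if targets else 1.0
        for i in order[pos:end]:
            fdr[i] = estimate
        pos = end

    # The q-value is the minimum FDR over this threshold and all more
    # permissive ones, so sweep from the worst score back to the best.
    qvalues = [1.0] * len(scores)
    running_min = 1.0
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


example = tdc_qvalues(
    [10, 9, 8, 7, 6, 5],
    [True, True, False, True, True, False],
)
print(example)  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.75]
```

The backward sweep is what makes the q-values monotone: a PSM is never assigned a worse q-value than a lower-ranked PSM.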

Parameters:

psms (a PsmDataset object): A collection of PSMs.
score_column (str, optional): The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
desc (bool, optional): True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
eval_fdr (float, optional): The false discovery rate threshold used to choose the best score_column and desc. It should be between 0 and 1.
pep_fdr_type ({“psm-only”, “peptide-only”, “psm-peptide”}, optional): The method for crema to use when calculating peptide-level confidence estimates.
prot_fdr_type ({“best”, “combine”}, optional): The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
threshold (float or “q-value”, optional): The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Attributes:

data (pandas.DataFrame): The collection of PSMs as a pandas.DataFrame.
dataset (crema.PsmDataset): The underlying PsmDataset.
levels (list of str): The available levels of confidence estimates.
confidence_estimates (dict): A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.
decoy_confidence_estimates (dict): A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.

Methods

to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:

output_dir (str or None, optional): The directory in which to save the files. None will use the current working directory.
file_root (str or None, optional): An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
sep (str, optional): The delimiter to use.
decoys (bool, optional): Whether to also save the confidence estimates for the decoys.

Returns:

list of str: The paths to the saved files.

property data: The collection of PSMs as a pandas.DataFrame.

property dataset: The underlying PsmDataset

property levels: The available levels of confidence estimates

crema.confidence.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates to a collection of peptide-spectrum matches.

Parameters:

psms (PsmDataset or list of PsmDataset objects): The collection or collections of PSMs.
score_column (str, optional): The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.
desc (bool, optional): True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.
eval_fdr (float, optional): The false discovery rate threshold used to choose the best score_column and desc. It should be between 0 and 1.
method ({“tdc”}, optional): The method for crema to use when calculating the confidence estimates.
pep_fdr_type ({“psm-only”, “peptide-only”, “psm-peptide”}, optional): The method for crema to use when calculating peptide-level confidence estimates.
prot_fdr_type ({“best”, “combine”}, optional): The method for crema to use when calculating protein-level confidence estimates. Default is “best”.
threshold (float or “q-value”, optional): The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Returns:

Confidence object or list of Confidence objects: The confidence estimates for each PsmDataset.
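
The two modes of the threshold parameter can be pictured with a small stand-alone sketch. It does not call crema: the helper annotate_discoveries is hypothetical, rows are plain dictionaries rather than pandas.DataFrame rows, and treating “accept” as a boolean that is True when the q-value is at or below the threshold is an assumption.

```python
def annotate_discoveries(qvalues, threshold=0.01):
    """Mimic the documented effect of threshold on the reported column."""
    if threshold == "q-value":
        # Report the q-value itself instead of an accept/reject flag.
        return [{"crema q-value": q} for q in qvalues]
    # Assumption: a discovery is accepted when its q-value does not
    # exceed the numeric FDR threshold.
    return [{"accept": q <= threshold} for q in qvalues]


flags = annotate_discoveries([0.005, 0.02])
raw = annotate_discoveries([0.005, 0.02], threshold="q-value")
```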