Confidence

The Confidence class is used to define a collection of peptide-spectrum matches with calculated false discovery rates (FDR) and q-values.

class crema.confidence.MixmaxConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates using mix-max competition.

Estimates q-values using the mix-max competition method. This method requires separate target and decoy database searches scored with a calibrated score function.

# TODO Describe how mixmax works here

Additional details can be found in the following manuscript: U. Keich, A. Kertesz-Farkas, and W. S. Noble. Improved false discovery rate estimation procedure for shotgun proteomics. Journal of Proteome Research, 14(8):3148-3161, 2015.

Parameters:
psms : a PsmDataset object

A collection of PSMs.

score_column : str, optional

The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.

desc : bool, optional

True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.

eval_fdr : float, optional

The false discovery rate threshold used to choose the best score_column and desc. This should be between 0 and 1.

pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional

The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.

prot_fdr_type : {“best”, “combine”}, optional

The method for crema to use when calculating protein-level confidence estimates. Default is “best”.

threshold : float or “q-value”, optional

The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Attributes:
data : pandas.DataFrame

The collection of PSMs as a pandas.DataFrame.

dataset : crema.PsmDataset

The underlying PsmDataset.

levels : list of str

The available levels of confidence estimates.

confidence_estimates : Dict

A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.

decoy_confidence_estimates : Dict

A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.

Methods

to_txt([output_dir, file_root, sep, decoys])

Save confidence estimates to delimited text files.

to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:
output_dir : str or None, optional

The directory in which to save the files. None will use the current working directory.

file_root : str or None, optional

An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).

sep : str, optional

The delimiter to use.

decoys : bool, optional

Whether to save the decoy confidence estimates as well.

Returns:
list of str

The paths to the saved files.
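The naming rule described for file_root can be illustrated with a small sketch. This is not crema's code: build_paths is a hypothetical helper, the level names passed to it are placeholders, and joining the prefix to the suffix with a dot is an assumption. Only the “crema.{level}.txt” suffix and the handling of None come from the description above.

```python
import os


def build_paths(levels, output_dir=None, file_root=None):
    """Build one output path per confidence level (hypothetical helper)."""
    paths = []
    for level in levels:
        # The suffix is fixed by the documented convention.
        name = f"crema.{level}.txt"
        if file_root is not None:
            # Assumption: the optional prefix is joined with a dot.
            name = f"{file_root}.{name}"
        if output_dir is not None:
            # None means the current working directory, so no directory
            # component is added in that case.
            name = os.path.join(output_dir, name)
        paths.append(name)
    return paths


if __name__ == "__main__":
    for path in build_paths(["psms", "peptides", "proteins"], file_root="run1"):
        print(path)
```

The real method returns the paths it actually wrote, so its return value should be preferred over reconstructing file names by hand.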

property data

The collection of PSMs as a pandas.DataFrame.

property dataset

The underlying PsmDataset

property levels

The available levels of confidence estimates

class crema.confidence.TdcConfidence(psms, score_column=None, desc=None, eval_fdr=0.01, pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates using target decoy competition.

Estimates q-values using the target decoy competition method. For a set of target and decoy PSMs meeting a specified score threshold, the false discovery rate (FDR) is estimated as:

\[FDR = \frac{Decoys + 1}{Targets}\]

More formally, let the scores of target and decoy PSMs be indicated as \(f_1, f_2, ..., f_{m_f}\) and \(d_1, d_2, ..., d_{m_d}\), respectively. For a score threshold \(t\), the false discovery rate is estimated as:

\[E\{FDR(t)\} = \frac{|\{d_i > t; i=1, ..., m_d\}| + 1} {|\{f_i > t; i=1, ..., m_f\}|}\]

The reported q-value for each PSM is the minimum FDR at which that PSM would be accepted.
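The estimate and the q-value definition above can be sketched in plain Python. This is an illustration under stated assumptions, not crema's implementation: tdc_qvalues is a hypothetical name, each PSM's own score is used as the threshold with that PSM counted as passing, tied scores share one threshold, and the estimate is capped at 1.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values by target decoy competition (illustrative sketch)."""
    # Rank PSM indices from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)

    fdr = [1.0] * len(scores)
    targets = decoys = 0
    pos = 0
    while pos < len(order):
        # Group tied scores so they are accepted at the same threshold.
        end = pos
        while end < len(order) and scores[order[end]] == scores[order[pos]]:
            end += 1
        for i in order[pos:end]:
            if is_target[i]:
                targets += 1
            else:
                decoys += 1
        # FDR = (Decoys + 1) / Targets; capping at 1 is an assumption.
        estimate = min(1.0, (decoys + 1) / targets) if targets else 1.0
        for i in order[pos:end]:
            fdr[i] = estimate
        pos = end

    # q-value: the minimum FDR at this threshold or any more permissive one,
    # computed as a running minimum from the worst score to the best.
    qvalues = [1.0] * len(scores)
    running_min = 1.0
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    scores = [10, 9, 8, 7, 6, 5]
    labels = [True, True, False, True, True, False]
    print(tdc_qvalues(scores, labels))
```

The running minimum is what makes q-values monotone in the score: a PSM can never receive a worse q-value than a lower-ranked PSM.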

Parameters:
psms : a PsmDataset object

A collection of PSMs.

score_column : str, optional

The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.

desc : bool, optional

True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.

eval_fdr : float, optional

The false discovery rate threshold used to choose the best score_column and desc. This should be between 0 and 1.

pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional

The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.

prot_fdr_type : {“best”, “combine”}, optional

The method for crema to use when calculating protein-level confidence estimates. Default is “best”.

threshold : float or “q-value”, optional

The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.
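The automatic choice described for score_column and desc (pick whichever yields the most PSMs at eval_fdr) can be sketched as follows. This is a rough illustration, not crema's selection code: count_accepted and choose_score are hypothetical helpers, the FDR is estimated with the (Decoys + 1) / Targets formula given earlier for this class, and tied scores are not grouped.

```python
def count_accepted(scores, is_target, desc, eval_fdr):
    """Count target PSMs accepted at eval_fdr using (decoys + 1) / targets."""
    ranked = sorted(zip(scores, is_target), key=lambda p: p[0], reverse=desc)
    targets = decoys = best = 0
    for _, target in ranked:
        if target:
            targets += 1
        else:
            decoys += 1
        # Every target ranked at or above a point whose estimated FDR is
        # within eval_fdr is accepted, so keep the latest such target count.
        if targets and (decoys + 1) / targets <= eval_fdr:
            best = targets
    return best


def choose_score(columns, is_target, eval_fdr=0.01):
    """Return the (column name, desc) pair that accepts the most targets."""
    candidates = [(name, desc) for name in columns for desc in (True, False)]
    return max(
        candidates,
        key=lambda c: count_accepted(columns[c[0]], is_target, c[1], eval_fdr),
    )


if __name__ == "__main__":
    labels = [True, True, True, True, False, False]
    columns = {
        "good": [9, 8, 7, 6, 2, 1],   # separates targets from decoys
        "noise": [1, 5, 2, 6, 3, 4],  # uninformative
    }
    print(choose_score(columns, labels, eval_fdr=0.25))
```

With very few PSMs the +1 in the numerator dominates, which is why the toy example uses a loose eval_fdr of 0.25 rather than the default of 0.01.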

Attributes:
data : pandas.DataFrame

The collection of PSMs as a pandas.DataFrame.

dataset : crema.PsmDataset

The underlying PsmDataset.

levels : list of str

The available levels of confidence estimates.

confidence_estimates : Dict

A dictionary containing the confidence estimates at each level, each as a pandas.DataFrame.

decoy_confidence_estimates : Dict

A dictionary containing the confidence estimates for the decoy hits at each level, each as a pandas.DataFrame.

Methods

to_txt([output_dir, file_root, sep, decoys])

Save confidence estimates to delimited text files.

to_txt(output_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:
output_dir : str or None, optional

The directory in which to save the files. None will use the current working directory.

file_root : str or None, optional

An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).

sep : str, optional

The delimiter to use.

decoys : bool, optional

Whether to save the decoy confidence estimates as well.

Returns:
list of str

The paths to the saved files.

property data

The collection of PSMs as a pandas.DataFrame.

property dataset

The underlying PsmDataset

property levels

The available levels of confidence estimates

crema.confidence.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates to a collection of peptide-spectrum matches.

Parameters:
psms : PsmDataset or list of PsmDataset objects

The collections of PSMs.

score_column : str, optional

The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.

desc : bool, optional

True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.

eval_fdr : float, optional

The false discovery rate threshold used to choose the best score_column and desc. This should be between 0 and 1.

method : {“tdc”}, optional

The method for crema to use when calculating the confidence estimates. Default is “tdc”.

pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional

The method for crema to use when calculating peptide-level confidence estimates. Default is “psm-peptide”.

prot_fdr_type : {“best”, “combine”}, optional

The method for crema to use when calculating protein-level confidence estimates. Default is “best”.

threshold : float or “q-value”, optional

The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, the “accept” column is replaced with “crema q-value”.

Returns:
Confidence object or List of Confidence objects

The confidence estimates for each PsmDataset.