Functions

Primary Functions

crema.assign_confidence(psms, score_column=None, desc=None, eval_fdr=0.01, method='tdc', pep_fdr_type='psm-peptide', prot_fdr_type='best', threshold=0.01)

Assign confidence estimates to a collection of peptide-spectrum matches.

Parameters:
psms : PsmDataset or list of PsmDataset objects

The collections of PSMs

score_column : str, optional

The score by which to rank the PSMs for confidence estimation. If None, the score that yields the most PSMs at the specified false discovery rate threshold (eval_fdr) will be used.

desc : bool, optional

True if higher scores are better, False if lower scores are better. If None, crema will try both and use the choice that yields the most PSMs at the specified false discovery rate threshold (eval_fdr). If score_column is None, this parameter is ignored.

eval_fdr : float, optional

The false discovery rate threshold used to choose the best score_column and desc. This should range from 0 to 1.

method : {“tdc”}, optional

The method for crema to use when calculating the confidence estimates.

pep_fdr_type : {“psm-only”, “peptide-only”, “psm-peptide”}, optional

The method for crema to use when calculating peptide-level confidence estimates.

prot_fdr_type : {“best”, “combine”}, optional

The method for crema to use when calculating protein-level confidence estimates. Default is “best”.

threshold : float or “q-value”, optional

The FDR threshold for accepting discoveries. Default is 0.01. If “q-value” is chosen, then the “accept” column is replaced with “crema q-value”.

Returns:
Confidence object or list of Confidence objects

The confidence estimates for each PsmDataset.
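
As an illustration of what target-decoy competition ("tdc") and the desc=None behavior involve, the following is a minimal sketch in plain Python. It is not crema's implementation: the function names (tdc_qvalues, count_accepted, choose_direction), the toy scores, and the (decoys + 1) / targets FDR estimate are all assumptions chosen for illustration.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values from a ranked list of targets and decoys (sketch)."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        # Assumed FDR estimate at this rank: (decoys + 1) / targets, capped at 1.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # A q-value is the smallest FDR at which a PSM is still accepted, so take
    # the running minimum while walking from the worst rank to the best.
    qvals = [0.0] * len(scores)
    running = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running = min(running, fdrs[rank])
        qvals[order[rank]] = running
    return qvals


def count_accepted(scores, is_target, desc, eval_fdr=0.01):
    """Count target PSMs whose q-value is at or below eval_fdr."""
    qvals = tdc_qvalues(scores, is_target, desc)
    return sum(1 for q, t in zip(qvals, is_target) if t and q <= eval_fdr)


def choose_direction(scores, is_target, eval_fdr=0.01):
    """Mimic desc=None: try both directions, keep the one accepting more targets."""
    n_desc = count_accepted(scores, is_target, True, eval_fdr)
    n_asc = count_accepted(scores, is_target, False, eval_fdr)
    return n_desc >= n_asc


# Hypothetical toy data: higher scores are better here.
toy_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
toy_targets = [True, True, True, True, False, True, False, False]
toy_desc = choose_direction(toy_scores, toy_targets, eval_fdr=0.25)
```

With this toy data, ranking in descending order accepts four targets at eval_fdr=0.25 and ranking in ascending order accepts none, so the sketch selects desc=True.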

Parsers

crema.read_tide(txt_files, pairing_file_name=None, decoy_prefix='decoy_', copy_data=True)

Read peptide-spectrum matches (PSMs) from Tide tab-delimited files.

Parameters:
txt_files : str, pandas.DataFrame or tuple of str

One or more collections of PSMs in the Tide tab-delimited format.

pairing_file_name : str, optional

A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.

decoy_prefix : str, optional

The prefix used to indicate a decoy protein in the protein column. Default value is ‘decoy_’.

copy_data : bool, optional

If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:
PsmDataset

A PsmDataset object containing the parsed PSMs.
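
The layout of the pairing file described under pairing_file_name can be pictured with the short sketch below. It only uses the standard library; the helper names and the peptide sequences are hypothetical, and the file is written to an in-memory buffer rather than to disk.

```python
import csv
import io


def write_pairing_file(handle, pairs):
    """Write (target, decoy) peptide pairs as a tab-delimited table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # The documentation requires columns labeled 'target' and 'decoy'.
    writer.writerow(["target", "decoy"])
    writer.writerows(pairs)


def read_pairing_file(handle):
    """Return a dict mapping each target sequence to its paired decoy."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target"]: row["decoy"] for row in reader}


# Hypothetical target/decoy peptide pairs.
pairs = [("PEPTIDEK", "EDITPEPK"), ("SAMPLER", "ELPMASR")]

buffer = io.StringIO()
write_pairing_file(buffer, pairs)
buffer.seek(0)
pairing = read_pairing_file(buffer)
```

A file with this layout, saved to disk, is what the pairing_file_name argument is documented to expect.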

crema.read_msamanda(txt_files, pairing_file_name=None, decoy_prefix='REV_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSAmanda tab-delimited files.

Parameters:
txt_files : str, pandas.DataFrame or tuple of str

One or more collections of PSMs in the MSAmanda tab-delimited format.

pairing_file_name : str, optional

A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.

decoy_prefix : str, optional

The prefix used to indicate a decoy protein in the protein column. Default value is ‘REV_’.

copy_data : bool, optional

If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:
PsmDataset

A PsmDataset object containing the parsed PSMs.

crema.read_msfragger(txt_files, pairing_file_name=None, decoy_prefix='rev_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSFragger tab-delimited files.

Parameters:
txt_files : str, pandas.DataFrame or tuple of str

One or more collections of PSMs in the MSFragger tab-delimited format.

pairing_file_name : str, optional

A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.

decoy_prefix : str, optional

The prefix used to indicate a decoy protein in the protein column. Default value is ‘rev_’.

copy_data : bool, optional

If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:
PsmDataset

A PsmDataset object containing the parsed PSMs.

crema.read_msgf(txt_files, pairing_file_name=None, decoy_prefix='XXX_', copy_data=True)

Read peptide-spectrum matches (PSMs) from MSGF+ tab-delimited files.

Parameters:
txt_files : str, pandas.DataFrame or tuple of str

One or more collections of PSMs in the MSGF+ tab-delimited format.

pairing_file_name : str, optional

A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences.

decoy_prefix : str, optional

The prefix used to indicate a decoy protein in the protein column. Default value is ‘XXX_’.

copy_data : bool, optional

If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:
PsmDataset

A PsmDataset object containing the parsed PSMs.
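
The four search-engine parsers above share the same arguments and differ in their default decoy_prefix. The sketch below collects those documented defaults and shows one way a prefix could separate target from decoy entries in a protein column. The is_decoy helper, the comma delimiter, and the rule that every listed protein must carry the prefix are assumptions for illustration, not crema's documented behavior.

```python
# Default decoy prefixes, taken from the function signatures above.
DEFAULT_DECOY_PREFIX = {
    "read_tide": "decoy_",
    "read_msamanda": "REV_",
    "read_msfragger": "rev_",
    "read_msgf": "XXX_",
}


def is_decoy(protein_field, decoy_prefix, protein_delim=","):
    """Return True if every protein in the field starts with decoy_prefix.

    Assumption: a PSM counts as a decoy only when all of its proteins are
    decoys. The comparison is case-sensitive, so 'rev_' does not match 'REV_'.
    """
    proteins = [p.strip() for p in protein_field.split(protein_delim) if p.strip()]
    return bool(proteins) and all(p.startswith(decoy_prefix) for p in proteins)


# Hypothetical protein column values.
examples = {
    "XXX_sp|P12345": is_decoy("XXX_sp|P12345", DEFAULT_DECOY_PREFIX["read_msgf"]),
    "sp|P12345,XXX_sp|P67890": is_decoy(
        "sp|P12345,XXX_sp|P67890", DEFAULT_DECOY_PREFIX["read_msgf"]
    ),
}
```

If a search was run with a different decoy label, passing that label as decoy_prefix is the documented way to override the default.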

crema.read_pepxml(pepxml_files, decoy_prefix)

Read peptide-spectrum matches (PSMs) from pepXML files.

Parameters:
pepxml_files : str or tuple of str

One or more collections of PSMs in the pepXML format.

decoy_prefix : str

The prefix used to indicate a decoy protein in the description lines of the FASTA file.

Returns:
PsmDataset

A PsmDataset object containing the PSMs from the pepXML files.

crema.read_mztab(mztab_files)

Read peptide-spectrum matches (PSMs) from mzTab files.

Parameters:
mztab_files : str or tuple of str

One or more collections of PSMs in the mzTab format.

Returns:
PsmDataset

A PsmDataset object containing the PSMs from the mzTab file.

crema.read_txt(txt_files, target_column, spectrum_columns, score_columns, peptide_column, protein_column, protein_delim, sep='\t', pairing_file_name=None, copy_data=True)

Read peptide-spectrum matches (PSMs) from delimited text files.

Parameters:
txt_files : str, pandas.DataFrame, or tuple of str

One or more collections of PSMs in a tabular text format.

target_column : str

The column that indicates whether a PSM is a target or a decoy.

spectrum_columns : str or tuple of str

One or more columns that together define a unique mass spectrum.

score_columns : str or tuple of str

One or more columns that indicate scores by which crema can rank PSMs.

peptide_column : str

The column that defines a unique peptide. Modifications should be indicated either in square brackets [] or parentheses (). The exact modification format within these entities does not matter, so long as it is consistent.

protein_column : str

The column that defines a unique protein.

protein_delim : str

The delimiter that separates protein IDs within protein_column.

sep : str, optional

The column delimiter used in the text files.

pairing_file_name : str, optional

A tab-delimited file that explicitly pairs target and decoy peptide sequences. Requires one column labeled ‘target’ that contains target sequences and a second column labeled ‘decoy’ that contains decoy sequences. This file can be generated by setting --peptide-list=T in tide-index.

copy_data : bool, optional

If True, a deep copy of the data is created. This uses more memory, but is safer because it prevents accidental modification of the underlying data. This argument only has an effect when txt_files is a pandas.DataFrame.

Returns:
PsmDataset

A PsmDataset object containing the parsed PSMs.
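
To make the column arguments of read_txt more concrete, the sketch below builds a small tab-delimited table and shows how the columns could map onto the parameters. It uses only the standard library; the column names, the True/False encoding of the target column, the table contents, and the strip_modifications helper are hypothetical, and the commented crema call at the end is untested.

```python
import csv
import io
import re

# Modifications are written in square brackets or parentheses, as the
# peptide_column description requires.
MOD_PATTERN = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_modifications(peptide):
    """Remove bracketed or parenthesized modifications from a peptide string."""
    return MOD_PATTERN.sub("", peptide)


# Hypothetical PSM table with one row per peptide-spectrum match.
TABLE = (
    "file\tscan\tis_target\tscore\tpeptide\tproteins\n"
    "run1.mzML\t101\tTrue\t3.2\tPEPM[15.99]TIDEK\tsp|P1;sp|P2\n"
    "run1.mzML\t101\tFalse\t1.1\tKEDITM(ox)PEP\tdecoy_sp|P1\n"
    "run1.mzML\t102\tTrue\t2.7\tSAMPLER\tsp|P3\n"
)

rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))

# spectrum_columns: the columns that together identify a unique spectrum.
spectrum_columns = ("file", "scan")
spectra = {tuple(row[col] for col in spectrum_columns) for row in rows}

# protein_delim: the separator between protein IDs in protein_column.
first_proteins = rows[0]["proteins"].split(";")

bare_peptides = [strip_modifications(row["peptide"]) for row in rows]

# With crema installed and the table saved as "psms.tsv", the documented
# signature suggests a call along these lines (not executed here):
#
# psms = crema.read_txt(
#     "psms.tsv",
#     target_column="is_target",
#     spectrum_columns=("file", "scan"),
#     score_columns="score",
#     peptide_column="peptide",
#     protein_column="proteins",
#     protein_delim=";",
# )
```

Here the three rows come from two distinct spectra, because the first two rows share the same file and scan values.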

Writers

crema.to_txt(conf, output_dir=None, file_root=None, sep='\t', decoys=False, precision=6)

Save confidence estimates to delimited text files.

Write the confidence estimates for each of the available levels (i.e. PSMs, peptides, proteins) to separate flat text files using the specified delimiter. If more than one collection of confidence estimates is provided, they will be combined, yielding a single file for each level present in any of the collections.

Parameters:
conf : Confidence object or tuple of Confidence objects

One or more Confidence objects.

output_dir : str or None, optional

The directory in which to save the files. None will use the current working directory.

file_root : str or None, optional

An optional prefix for the confidence estimate files. The suffix will always be “crema.{level}.txt” where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).

sep : str, optional

The delimiter to use.

decoys : bool, optional

Whether to also save confidence estimates for decoys.

precision : int, optional

Precision for float values.

Returns:
list of str

The paths to the saved files.
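
The file naming described for output_dir and file_root can be sketched as follows. This is an illustration of the documented “crema.{level}.txt” suffix only: the output_paths helper, the level names, and the dot used to join file_root to the suffix are assumptions, and no files are written.

```python
from pathlib import Path


def output_paths(levels, output_dir=None, file_root=None):
    """Build one output path per confidence level (naming sketch only)."""
    # None falls back to the current working directory, as documented.
    directory = Path(output_dir) if output_dir is not None else Path.cwd()
    paths = []
    for level in levels:
        # Documented suffix: "crema.{level}.txt".
        name = f"crema.{level}.txt"
        if file_root is not None:
            # Assumption: the optional prefix is joined with a dot.
            name = f"{file_root}.{name}"
        paths.append(str(directory / name))
    return paths


# Hypothetical level names, output directory, and file root.
example_paths = output_paths(
    ["psms", "peptides", "proteins"], output_dir="results", file_root="exp1"
)
```

Like to_txt, the sketch returns a list of str paths, one per level.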