Getting Started

Crema produces confidence estimates for peptide detection in mass spectrometry proteomics experiments. It takes as input one or more files containing peptide-spectrum matches (PSMs), executes the desired estimation method, and produces as output confidence estimates at the PSM, peptide and protein levels.

Introduction

One of the fundamental tasks in mass spectrometry proteomics is detecting peptides on the basis of the observed mass spectra. Many tools exist to assign peptides to spectra, but unfortunately this matching is never 100% accurate, meaning that there is always uncertainty about whether a given peptide detection is correct or a false positive. We want to be able to quantify this uncertainty so that we can be confident in our conclusions and ensure that expensive downstream validation experiments use relevant and accurate data.

Crema is a Python package that implements various methods to estimate false discovery rates (FDRs) in mass spectrometry proteomics experiments. Crema focuses on methods that rely on the concept of “target-decoy competition.” The sole purpose of crema is to do decoy-based FDR estimation, and to do it well. As a result, crema is lightweight and flexible. It has minimal dependencies and supports a wide range of input and output formats. On top of that, it is extremely simple to use.
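To make the idea of target-decoy competition concrete, here is a minimal sketch of how q-values can be estimated from a ranked list of target and decoy PSMs. It is a simplified illustration and not crema's own implementation: the function name tdc_qvalues is hypothetical, higher scores are assumed to be better, ties are not treated specially, and the input is assumed to already contain only the winning target or decoy match for each spectrum.

```python
def tdc_qvalues(scores, is_decoy):
    """Estimate q-values by target-decoy competition (higher score = better)."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among the targets accepted at this score threshold.
        # The +1 is a commonly used conservative correction.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # A q-value is the smallest estimated FDR at which a PSM is still
    # accepted, so take a running minimum from the worst score upward.
    qvals = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


# Made-up scores for illustration only: four targets and one decoy.
example_qvals = tdc_qvalues([10, 9, 8, 7, 6], [False, False, True, False, False])
```

With so few PSMs the estimates are coarse; in a real experiment, thousands of PSMs let the decoy count track the number of false target matches much more closely.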

Ready to try crema for your analyses? See below for details on how to install and use crema.

Installation

Before you can install and use crema, you’ll need to have Python 3.6+ installed. If you think it may be installed, you can check with:

$ python3 --version

If you need to install Python, we recommend using the Anaconda Python distribution. This distribution comes with most of the crema dependencies installed and provides the conda package manager.

Crema also depends on several Python packages. We recommend using pip to install crema; any missing dependencies will be installed automatically:

$ pip3 install crema-ms
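If you would like to confirm the setup from within Python, a small check along these lines may help. It is only a sketch: the helper name crema_environment_ok is hypothetical, and it assumes the package installed by pip is importable under the name crema, as in the usage examples in this guide.

```python
import importlib.util
import sys


def crema_environment_ok(min_version=(3, 6)):
    """Return (python_ok, crema_installed) for the current interpreter."""
    # Compare the running interpreter against the minimum supported version.
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level module cannot be located.
    crema_installed = importlib.util.find_spec("crema") is not None
    return python_ok, crema_installed


python_ok, crema_installed = crema_environment_ok()
print("Python 3.6+:", python_ok)
print("crema importable:", crema_installed)
```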

Basic Usage

Use crema from the Command Line

If your input files are in one of crema’s supported file formats, such as mzTab or Tide tab-delimited, then simple crema analyses can be performed straight from the command line.

Suppose your mzTab file is located at “data/psms.mztab”. Simply run the following command:

$ crema data/psms.mztab

Alternatively, if your Tide files are located in “data/target_psms.txt” and “data/decoy_psms.txt”, then you would run the following command:

$ crema data/target_psms.txt data/decoy_psms.txt

That’s it. The software will run the target-decoy competition FDR estimation method using information from your files to calculate confidence estimates for the given data.

Your results will be saved in your working directory as .txt files named “crema.psms.txt”, “crema.peptides.txt”, and “crema.proteins.txt”. These files will contain an additional column (“crema q-value”) that is appended to several columns (specifically those that identify the PSM, peptide sequence, and score) parsed from the input file.

For a full list of parameters, see the Command Line Interface.
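Once the result files exist, a common next step is to keep only the rows below a chosen q-value threshold. The sketch below uses only the Python standard library. It assumes the output is tab-delimited and that the q-value column is named “crema q-value” as described above; the helper name rows_below_qvalue and the other column names in the toy table are made up for illustration.

```python
import csv
import io


def rows_below_qvalue(handle, threshold=0.01, column="crema q-value", sep="\t"):
    """Yield rows (as dicts) whose q-value is at or below the threshold."""
    reader = csv.DictReader(handle, delimiter=sep)
    for row in reader:
        if float(row[column]) <= threshold:
            yield row


# Tiny made-up table standing in for a results file. Real files contain
# whichever identifying columns crema parsed from your input.
example = io.StringIO(
    "scan\tsequence\tscore\tcrema q-value\n"
    "1\tPEPTIDEK\t12.5\t0.002\n"
    "2\tSAMPLER\t3.1\t0.25\n"
)
accepted = list(rows_below_qvalue(example))
print(len(accepted), "row(s) accepted at 1% FDR")
```

To apply this to a real file, open it with open("crema.psms.txt", newline="") and pass the file handle in place of the in-memory example.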

Use crema as a Python Package

Here is a simple demonstration of how to use crema as an API:

>>> import crema
>>> input_files = ["data/target_psms.txt", "data/decoy_psms.txt"]
>>> pairing_file = "pairing_file.txt"
>>> psms = crema.read_tide(input_files, pairing_file_name=pairing_file)
>>> results = psms.assign_confidence(score_column="combined p-value", pep_fdr_type="psm-peptide")
>>> results.to_txt(output_dir="example_output_dir", file_root=None, sep="\t", decoys=False)

Let’s break this down and see what’s really happening.

First, start up the Python interpreter:

$ python3

Next, import crema as a package:

>>> import crema

Call the read_tide() method and pass in the desired input files. The files “data/target_psms.txt” and “data/decoy_psms.txt” contain PSMs and are in the required Tide file format. In addition, pairing_file_name is an optional argument that points to a file explicitly pairing target and decoy peptides. The read_tide() method returns a dataset object that we will save as “psms” in this example:

>>> input_files = ["data/target_psms.txt", "data/decoy_psms.txt"]
>>> pairing_file = "pairing_file.txt"
>>> psms = crema.read_tide(input_files, pairing_file_name=pairing_file)

Note that you can replace read_tide() with other methods such as read_txt() and read_msgf(). Also note that while the target and decoy PSMs are in separate files in this example, they can be combined and passed as a single file.

Execute the desired FDR estimation method by calling the assign_confidence method on the dataset object that we created above. This operation will return a confidence object that we will save as “results”:

>>> results = psms.assign_confidence(score_column="combined p-value", pep_fdr_type="psm-peptide")

Note that the parameters passed here are optional and are only specified here for demonstration. Further details can be found in the documentation for the dataset class.

The pep_fdr_type argument selects the method used to estimate peptide-level FDR. It supports three options: psm-only, peptide-only, and psm-peptide. A pairing file is required for the peptide-only and psm-peptide options. In addition, peptide-only requires separate target and decoy database searches; if it is used with a concatenated target-decoy search, it becomes equivalent to psm-peptide.
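The rules in the previous paragraph can be summarized in a short pre-flight check that you might run in your own scripts before calling crema. This is a sketch of the documented rules only, not crema's internal validation; check_pep_fdr_type and the two constants are hypothetical names.

```python
# Option names as listed in the text above.
VALID_PEP_FDR_TYPES = ("psm-only", "peptide-only", "psm-peptide")
# Options that the text says require a pairing file.
NEEDS_PAIRING_FILE = ("peptide-only", "psm-peptide")


def check_pep_fdr_type(pep_fdr_type, pairing_file=None):
    """Raise ValueError if the option breaks the rules stated in the text."""
    if pep_fdr_type not in VALID_PEP_FDR_TYPES:
        raise ValueError(
            f"pep_fdr_type must be one of {VALID_PEP_FDR_TYPES}, "
            f"got {pep_fdr_type!r}"
        )
    if pep_fdr_type in NEEDS_PAIRING_FILE and pairing_file is None:
        raise ValueError(f"{pep_fdr_type!r} requires a pairing file")
    return pep_fdr_type


# psm-only needs no pairing file; psm-peptide does.
check_pep_fdr_type("psm-only")
check_pep_fdr_type("psm-peptide", pairing_file="pairing_file.txt")
```

The check does not cover the separate-versus-concatenated search distinction, since that depends on how the database search was run rather than on the arguments alone.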

Confidence objects provide a to_txt() method that writes your results to text files. Your results will be saved in your working directory (unless otherwise specified) as text files named “crema.psms.txt”, “crema.peptides.txt”, and “crema.proteins.txt”. These files will contain an additional column (“crema q-value”) that is appended to several columns (specifically those that identify the PSM, peptide sequence, and score) parsed from the input file.

>>> results.to_txt(output_dir="example_output_dir", file_root=None, sep="\t", decoys=False)

Note that the parameters passed here are optional and are only specified here for demonstration. Further details can be found in the documentation for the confidence class.

That’s all there is to it. You have successfully used crema as an API to calculate confidence estimates for your data.

Supported Database Search Engines

Crema currently supports output generated by Tide, MSGF+, MSAmanda, Comet, and MSFragger.

In addition, crema accepts input files from any search engine, provided they are in one of the following formats: mzTab, pepXML, or generic tab-delimited text.