Parse elements of a taxonomy vector

These are provided as both example and default functions for parsing a character vector of taxonomic rank information for a single taxon. As default functions, parse_taxonomy_greengenes and parse_taxonomy_silva_128 are intended for data that adhere to the naming conventions used by greengenes (http://greengenes.lbl.gov/cgi-bin/nph-index.cgi) and SILVA, respectively, while parse_taxonomy_default is intended for cases where the convention is unknown. To work, these functions (and any similar custom function you may want to create and use) must take as input a single character vector of taxonomic ranks for a single OTU, and return a named character vector that has been modified appropriately (according to known naming conventions, desired length limits, etc.). The length (number of elements) of the output named vector does not need to equal that of the input, which is useful when the source data files have extra, meaningless elements that should be removed, like the ubiquitous "Root" element often found in greengenes/QIIME taxonomy labels.
In the case of parse_taxonomy_default, no naming convention is assumed, so dummy rank names are added to the vector. More usefully, if your taxonomy data is based on greengenes, the parse_taxonomy_greengenes function clips the first 3 characters that identify the rank and uses them to name the corresponding element according to the appropriate taxonomic rank name used by greengenes (e.g. "p__" at the beginning of an element means that element is the name of the phylum to which this OTU belongs). If your taxonomy data is based on SILVA, the parse_taxonomy_silva_128 function clips the first 5 characters that identify the rank and uses them to name the corresponding element according to the appropriate taxonomic rank name used by SILVA (e.g. "D_1__" at the beginning of an element means that element is the name of the phylum to which this OTU belongs). Alternatively, you can create your own function to parse this data.
Most importantly, the expectations for these functions described above make them compatible with data import, specifically the import_biom function, and this flexible structure will soon be implemented for all phyloseq import functions that deal with taxonomy (e.g. import_qiime).
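As a rough illustration of the SILVA prefix rule described above, here is a minimal, hypothetical sketch in base R; it is not the packaged parse_taxonomy_silva_128 implementation. It assumes that levels D_0 through D_6 map to Kingdom, Phylum, Class, Order, Family, Genus and Species (an assumption; check the rank names against your reference files), and it swaps the fixed 5-character clip for a regular expression so that a two-digit level such as "D_10__" would also be stripped correctly. The name parse_silva_sketch is invented for this example.

```r
# Hypothetical sketch of SILVA-128-style parsing; NOT the packaged implementation.
# ASSUMED rank names for levels D_0 .. D_6; verify against your reference data.
silva_ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

parse_silva_sketch <- function(char.vec) {
  # Keep only elements carrying a "D_<level>__" prefix (drops e.g. "Root").
  char.vec <- char.vec[grepl("^D_[0-9]+__", char.vec)]
  # Extract the numeric level encoded in each prefix.
  level <- as.integer(sub("^D_([0-9]+)__.*$", "\\1", char.vec))
  # Strip the prefix (regex instead of a fixed 5-character clip).
  taxa <- sub("^D_[0-9]+__", "", char.vec)
  # Name each element by its rank; levels beyond the table get a dummy name.
  names(taxa) <- ifelse(level + 1L <= length(silva_ranks),
                        silva_ranks[level + 1L],
                        paste0("Rank", level + 1L))
  taxa
}

# Named vector: Kingdom = "Bacteria", Phylum = "Firmicutes", Class = "Bacilli"
parse_silva_sketch(c("Root", "D_0__Bacteria", "D_1__Firmicutes", "D_2__Bacilli"))
```

Dropping unprefixed elements such as "Root" mirrors the point above that the output may be shorter than the input.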

parse_taxonomy_silva_128(char.vec)

Arguments

char.vec	(Required). A single character vector of taxonomic ranks for a single OTU, unprocessed (ugly).

Value

A character vector in which each element is a different taxonomic rank of the same OTU, and each element name is the name of the rank level. For example, an element might be "Firmicutes" and named "phylum". These parsed, named versions of the taxonomic vector should reflect embedded information, naming conventions, desired length limits, etc.; or, in the case of parse_taxonomy_default, the elements are not modified at all and are simply given dummy rank names.
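To make the return-value contract concrete, the following hypothetical base-R sketch defines a parser that, like the behavior described for parse_taxonomy_default, leaves the values unmodified and only attaches dummy rank names, plus a small checker for the "named character vector" requirement. Both function names are invented for illustration, and the "Rank1", "Rank2", ... naming scheme is an assumption rather than a documented guarantee.

```r
# Hypothetical dummy-rank parser: values are left unchanged and each element
# receives a placeholder rank name ("Rank1", "Rank2", ...; an ASSUMED scheme).
parse_dummy_sketch <- function(char.vec) {
  # sprintf returns character(0) for empty input, so an empty vector stays empty.
  names(char.vec) <- sprintf("Rank%d", seq_along(char.vec))
  char.vec
}

# Hypothetical checker for the return-value contract:
# a character vector in which every element carries a non-empty name.
is_parsed_taxonomy <- function(x) {
  is.character(x) &&
    !is.null(names(x)) &&
    !anyNA(names(x)) &&
    all(nzchar(names(x)))
}

out <- parse_dummy_sketch(c("Root", "k__Bacteria", "p__Firmicutes"))
print(out)
is_parsed_taxonomy(out)           # TRUE
is_parsed_taxonomy("Firmicutes")  # FALSE: no names attached
```

A custom parse function can be run through a check like this before it is used during import.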

Details

This function is currently under review in a well-supported pull request to phyloseq: https://github.com/joey711/phyloseq/pull/854. If you use this function, please comment on the GitHub PR to encourage merging this feature.

Examples

# NOT RUN {
library(phyloseq)  # provides parse_taxonomy_default/_greengenes/_qiime
taxvec1 = c("Root", "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales",
            "f__Staphylococcaceae")
parse_taxonomy_default(taxvec1)
parse_taxonomy_greengenes(taxvec1)
taxvec2 = c("Root;k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae")
parse_taxonomy_qiime(taxvec2)
# Same lineage in SILVA style: order (D_3) and family (D_4) kept distinct
taxvec3 = c("D_0__Bacteria", "D_1__Firmicutes", "D_2__Bacilli", "D_3__Bacillales",
            "D_4__Staphylococcaceae")
parse_taxonomy_silva_128(taxvec3)
# }
