Kielce, Poland 2007

gp2fasta

gp2fasta converts gp files from NCBI GenPept or GenBank format to FASTA. Its main purpose is to create FASTA files with short but still informative headers for each sequence.

For example:
>Strpur-115729834-h
PNQILMQFRLDDNGSSYYKELASIIYGASPEFELAIFTVCFKENPNALSTFTMAGGITQKVQTWDYNGGYIGSAYFSV

stands for:
gi: 115729834
organism: Strongylocentrotus purpuratus
additional: h (hypothetical protein)
sequence: PNQILMQFRLDDNGSSYYKELASIIYGASPEFELAIFTVCFKENPNALSTFTMAGGITQKVQTWDYNGGYIGSAYFSV
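The header layout above can be read back with a few lines of Python. This is only a sketch and not part of gp2fasta: the function name split_header is hypothetical, and it assumes the three-field layout of the example (organism code, id, additional flags) with "-" as the separator.

```python
# Hypothetical helper, not part of gp2fasta: split a short header into fields.
def split_header(header, separator="-"):
    """Return (organism_code, identifier, flags) for a header such as
    '>Strpur-115729834-h'. Assumes the field order shown in the example."""
    fields = header.lstrip(">").split(separator)
    organism_code, identifier = fields[0], fields[1]
    # The flags field is optional; use an empty string when it is absent.
    flags = fields[2] if len(fields) > 2 else ""
    return organism_code, identifier, flags


print(split_header(">Strpur-115729834-h"))
# ('Strpur', '115729834', 'h')
```

Note that the organism code alone ("Strpur") cannot be expanded back to the full species name without a lookup table.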

Options:
- for id: GI or LOCUS;
- for organism: e.g. Mus musculus, M.musculus or Musmus;
- detailed definition;
- additional information:
    P -> PREDICTED
    s -> similar
    h -> hypothetical protein
    u -> unnamed protein product
    n -> novel
    p -> putative
    o -> open reading frame

The fields selected by these options are joined in the header by a separator (in this case "-").
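The options above could be combined roughly as in the following Python sketch. It is not the gp2fasta source: the function names, the style names ("full", "initial", "short") and the keyword matching rules are assumptions made for illustration, and the definition text used in the example call is illustrative only. The flag letters themselves are taken from the list above.

```python
# Flag letters from the option list; the keyword matching is an assumption.
FLAGS = [
    ("PREDICTED", "P"),
    ("similar", "s"),
    ("hypothetical protein", "h"),
    ("unnamed protein product", "u"),
    ("novel", "n"),
    ("putative", "p"),
    ("open reading frame", "o"),
]


def abbreviate_organism(name, style="short"):
    """Format 'Mus musculus' as 'Mus musculus', 'M.musculus' or 'Musmus'."""
    genus, species = name.split()[:2]
    if style == "full":
        return f"{genus} {species}"
    if style == "initial":
        return f"{genus[0]}.{species}"
    # Default: first three letters of the genus plus first three of the species.
    return genus[:3] + species[:3]


def flags_for(definition):
    """Collect one flag letter per keyword found in the definition line."""
    lowered = definition.lower()
    found = []
    for keyword, flag in FLAGS:
        # Upper-case keywords (PREDICTED) are matched case-sensitively.
        haystack = definition if keyword.isupper() else lowered
        if keyword in haystack:
            found.append(flag)
    return "".join(found)


def make_header(identifier, organism, definition, separator="-", style="short"):
    """Join organism code, id and optional flags into a short FASTA header."""
    parts = [abbreviate_organism(organism, style), str(identifier)]
    flags = flags_for(definition)
    if flags:
        parts.append(flags)
    return ">" + separator.join(parts)


print(make_header("115729834", "Strongylocentrotus purpuratus",
                  "hypothetical protein"))
# >Strpur-115729834-h
```

Keeping the flag table as an ordered list means that several flags in one definition (for example "PREDICTED: similar to ...") always come out in the same order.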

Example file
Example file2
Mycoplasmoides genitalium G37 (proteome, 603 proteins)
hsp23 (Heat Shock Protein 23, 228 proteins)

Free Qt4 version of gp2fasta with GUI

My other projects:

MetaDisorder - Prediction of Intrinsically Unstructured Proteins (protein disorder) from amino acid sequence only
GeneSilico fold recognition server - development and maintenance (over 100 bioinformatics tools integrated, 3000 registered users)
CompaRNA - continuous benchmarking of RNA structure prediction methods
GDFuzz3D - protein contact map to 3D structure retrieval service
Protein isoelectric point calculator - isoelectric point and molecular weight from protein sequence
Shannon entropy calculator - a real example of how to calculate and interpret information entropy
RNA metaserver - Meta-tool for prediction of RNA secondary structure