Sequence Similarity Networks

Description

Sequence Similarity Networks (SSNs) are built by a pipeline that combines several programs (BLAST, CD-HIT, etc.) to generate XGMML files and graphs for creating networks in Cytoscape.  Each part of the pipeline, and each step associated with it, is documented below.

Pipeline

Step 1 - generatedata.pl

Creates a Similarity Data Folder (SDF) that contains all BLAST results and annotations needed by subsequent pipeline programs.  It also creates graphs to allow easy analysis of the data.  In practice, this program generates several bash scripts and submits them to a cluster.  The expected queuing system is TORQUE.
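
As a rough illustration of the "generate bash scripts and submit them" pattern, the sketch below renders a TORQUE job script and builds the matching qsub command without running it.  It is written in Python for brevity rather than in the pipeline's Perl.  The job name, resource request, script file name, job id, and the placeholder command are all hypothetical; the scripts that generatedata.pl actually writes are not described on this page.

```python
import shlex


def render_job_script(job_name, command, nodes=1, ppn=1):
    """Return the text of a bash script carrying TORQUE (#PBS) directives."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # resource request
        "#PBS -j oe",                        # merge stderr into stdout
        'cd "$PBS_O_WORKDIR"',               # start in the submission directory
        command,                             # the actual work (placeholder here)
    ]
    return "\n".join(lines) + "\n"


def qsub_command(script_path, after_ok=None):
    """Build the qsub argument list; after_ok is an optional job id to wait on."""
    args = ["qsub"]
    if after_ok:
        # Only start once the named job has finished successfully.
        args += ["-W", f"depend=afterok:{after_ok}"]
    args.append(script_path)
    return args


# Hypothetical example: one job that would run after job 12345 succeeds.
script = render_job_script("ssn_blast", "echo 'hypothetical BLAST step goes here'")
print(script)
print(shlex.join(qsub_command("blast.sh", after_ok="12345")))
```

Chaining jobs with a dependency is one plausible way to make later steps wait for earlier ones on a cluster; whether the pipeline does this is an assumption.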

Step 2 - filterandgraph.pl (optional)

Rarely used.  Creates graphs based on different possible filterings without creating XGMML files.  It is not well documented because of upcoming major code changes.

Step 3 - analyzedata.pl

Filters data based on a combination of e-value, bit score, or percent identity and the lengths of sequences.  It then makes graphs from this new filtered folder and creates XGMML files.
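
The filtering idea can be pictured as follows.  This is a minimal Python sketch, not the analyzedata.pl implementation: the record fields, the sample values, the thresholds, and the rule that both sequences must fall inside the length window are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise BLAST result (hypothetical field names)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    percent_id: float
    query_len: int
    subject_len: int


def keep_hit(hit, max_evalue=None, min_bitscore=None, min_percent_id=None,
             min_len=0, max_len=None):
    """Return True if the hit passes every threshold that was supplied."""
    if max_evalue is not None and hit.evalue > max_evalue:
        return False
    if min_bitscore is not None and hit.bitscore < min_bitscore:
        return False
    if min_percent_id is not None and hit.percent_id < min_percent_id:
        return False
    # Assumed rule: both sequences must lie inside the length window.
    for length in (hit.query_len, hit.subject_len):
        if length < min_len or (max_len is not None and length > max_len):
            return False
    return True


# Made-up hits, not real data.
hits = [
    Hit("A", "B", 1e-50, 200.0, 80.0, 300, 310),
    Hit("A", "C", 1e-5, 45.0, 30.0, 300, 120),
    Hit("B", "D", 1e-40, 150.0, 65.0, 310, 900),
]
kept = [h for h in hits if keep_hit(h, max_evalue=1e-20, min_len=200, max_len=500)]
print([(h.query, h.subject) for h in kept])
```

Leaving a threshold as None disables that criterion, which mirrors the "combination of" wording above: any subset of the scores can be used together with the length limits.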

Step 4 - recreate-from-xgmml.pl (optional)

Creates an SDF using an existing SDF and an XGMML file.  Used to do further processing of an XGMML file that has been manually altered.
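
One part of such a recreation step is finding out which nodes and edges survived the manual editing.  The Python sketch below reads them from an XGMML document using only the standard library.  The sample network is made up, and the idea that the surviving ids would then be used to select entries from the existing SDF is an assumption, not a description of recreate-from-xgmml.pl.

```python
import xml.etree.ElementTree as ET

# Namespace used by XGMML documents.
XGMML_NS = "http://www.cs.rpi.edu/XGMML"

# Tiny made-up network standing in for a manually edited file.
SAMPLE = """<graph label="example" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="seq1" label="seq1"/>
  <node id="seq2" label="seq2"/>
  <edge id="seq1,seq2" label="seq1,seq2" source="seq1" target="seq2"/>
</graph>
"""


def node_ids(xgmml_text):
    """Return the id attribute of every node element, in document order."""
    root = ET.fromstring(xgmml_text)
    return [node.get("id") for node in root.iter(f"{{{XGMML_NS}}}node")]


def edge_pairs(xgmml_text):
    """Return (source, target) for every edge element, in document order."""
    root = ET.fromstring(xgmml_text)
    return [(edge.get("source"), edge.get("target"))
            for edge in root.iter(f"{{{XGMML_NS}}}edge")]


print(node_ids(SAMPLE))
print(edge_pairs(SAMPLE))
```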

Perl Programs

Most of the work of creating SSNs is done by Perl programs.  These programs fall into three different areas.

  1. SDF generation scripts (triggered by generatedata.pl)
  2. SDF analyzing scripts (triggered by analyzedata.pl)
  3. SDF recreation script (triggered by recreate-from-xgmml.pl)

Database

Database creation is still under extremely heavy development.  More documentation on how to create a database will come later.


Downloads

Downloads of databases and code releases will be available here soon.