# Readme file for dataset and resultset formats

Datasets
--------
Protein groups analyzed
1) Yeast KEGG Pathway sets
--------------------------
Each pathway record begins with a header line containing the pathway identifier, the pathway name, and the number of proteins. The header is followed by the list of UniProt IDs of the proteins in the pathway. Each record ends with ####.
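
The record layout above can be read with a small parser. This is a minimal sketch, not the code used to produce the files: it assumes the terminator `####` sits alone on its own line, and it returns the header line unsplit because this README does not state how the identifier, name, and count are separated. The function name `iter_records` and the sample lines are illustrative placeholders.

```python
from typing import Iterable, Iterator, List, Tuple

RECORD_END = "####"  # record terminator named in this README


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, member_lines) for every record in the input."""
    header = None
    members: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == RECORD_END:
            if header is not None:
                yield header, members
            header, members = None, []
        elif header is None:
            header = line  # first non-blank line of a record
        else:
            members.append(line)
    if header is not None:
        # Lenient choice: also emit a final record that lacks its terminator.
        yield header, members


# Placeholder content, not real data from the files.
sample = ["SET_ID set name 2", "UNIPROT_ID_1", "UNIPROT_ID_2", "####"]
for header, members in iter_records(sample):
    print(header, members)
```

Because the other protein-group files described in this README use the same header, member list, and `####` layout, the same sketch should apply to them as well.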

2) Yeast Protein complex sets
-----------------------------
Each protein complex record begins with a header line containing the protein complex identifier, the complex name, and the number of proteins. The header is followed by a list of SGD IDs and the corresponding mapped UniProt IDs for the proteins in that complex. Each record ends with ####.
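
The member lines of a complex record can be turned into an SGD-to-UniProt lookup. The sketch below assumes each member line holds an SGD ID followed by its UniProt ID, separated by whitespace. That separator is a guess, since this README does not specify it. The function name `sgd_to_uniprot` and the sample IDs are placeholders.

```python
from typing import Dict, Iterable


def sgd_to_uniprot(member_lines: Iterable[str]) -> Dict[str, str]:
    """Map SGD ID -> UniProt ID from the member lines of one complex record.

    Assumes 'SGD_ID<whitespace>UNIPROT_ID' per line. Lines with fewer than
    two tokens (for example, no mapped UniProt ID) are skipped.
    """
    mapping: Dict[str, str] = {}
    for line in member_lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue
        mapping[tokens[0]] = tokens[1]
    return mapping


# Placeholder member lines, not real identifiers.
print(sgd_to_uniprot(["SGD_ID_1 UNIPROT_ID_1", "SGD_ID_2\tUNIPROT_ID_2"]))
```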

3) Yeast GOcc (Gene Ontology cellular component based) sets
-----------------------------------------------------------
Each GOcc set record begins with a header line containing the GO CC term on which the set is based, the name, and the number of proteins. The header is followed by the list of UniProt IDs of the proteins in the GOcc set. Each record ends with ####.

4) Yeast Random protein sets
----------------------------
Each random set record begins with a header line containing the random set identifier, the name, and the number of proteins. The header is followed by the list of UniProt IDs of the proteins in the random set. Each record ends with ####.

Protein-Protein Interactions (PPIs)
-----------------------------------
Each line in the yeast and human interaction files contains one pair of interacting proteins. Yeast protein pairs are given as SGD IDs and human protein pairs as Entrez Gene IDs. We provide the actual interactions, the true negative interactions used for the ROC curve, and background random interactions.
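
An interaction file of this kind can be loaded into a set of protein pairs. This is a sketch under two assumptions that the README does not confirm: the two IDs on a line are separated by whitespace, and interactions are undirected, so `A B` and `B A` count as the same pair. The function name `read_pairs` and the sample IDs are placeholders.

```python
from typing import FrozenSet, Iterable, Set


def read_pairs(lines: Iterable[str]) -> Set[FrozenSet[str]]:
    """Return the set of unordered protein pairs found in the input lines."""
    pairs: Set[FrozenSet[str]] = set()
    for raw in lines:
        tokens = raw.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        pairs.add(frozenset(tokens[:2]))
    return pairs


# Placeholder lines, not real identifiers.
positives = read_pairs(["ID_A ID_B", "ID_B ID_A", "ID_C ID_D"])
negatives = read_pairs(["ID_A ID_D"])
print(len(positives), positives.isdisjoint(negatives))
```

Storing pairs as `frozenset` objects makes it easy to check, for example, that the true negative set does not overlap the actual interactions.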


Resultsets
----------
Coherence computation results obtained with various techniques on the above datasets
1) CAS_coherence
----------------
The file contains the set identifier, set name, number of proteins in the set, CAS_coherence score, and p-value, with fields separated by '|'.

2) PAS_coherence
----------------
The file contains the set identifier, set name, number of proteins in the set, PAS_coherence score, and p-value, with fields separated by '|'.
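
The CAS_coherence and PAS_coherence files share the same five-field layout, so one line parser can serve both. The sketch below assumes one set per line, no header row, and a set name that does not itself contain '|'. The names `CoherenceRow` and `parse_coherence_line` and the sample line are illustrative.

```python
from typing import NamedTuple


class CoherenceRow(NamedTuple):
    set_id: str
    set_name: str
    n_proteins: int
    score: float
    p_value: float


def parse_coherence_line(line: str) -> CoherenceRow:
    """Parse one '|'-separated line of a CAS_coherence or PAS_coherence file."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 '|'-separated fields, got {len(fields)}")
    set_id, set_name, n_proteins, score, p_value = fields
    return CoherenceRow(set_id, set_name, int(n_proteins), float(score), float(p_value))


# Placeholder line, not a real result.
print(parse_coherence_line("SET_ID|set name|12|0.5|0.01"))
```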

3) funsim_coherence
-------------------
The file contains the set identifier, set name, number of proteins in the set, funsim_coherence score, BP_coherence score, MF_coherence score, CC_coherence score, and the p-value for the funsim_coherence score, with fields separated by '|'.
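
With eight fields per line, named columns are easier to work with than positions. The sketch below uses Python's standard `csv.DictReader` with `delimiter="|"`. The column names in `FUNSIM_COLUMNS` are labels chosen here for illustration, and the code assumes the file has no header row and no quoted fields.

```python
import csv
import io

# Illustrative column labels in the order the fields are described above.
FUNSIM_COLUMNS = [
    "set_id", "set_name", "n_proteins",
    "funsim", "bp", "mf", "cc", "funsim_p_value",
]


def read_funsim(handle):
    """Read a funsim_coherence file into a list of dicts with typed values."""
    reader = csv.DictReader(
        handle, fieldnames=FUNSIM_COLUMNS, delimiter="|", quoting=csv.QUOTE_NONE
    )
    rows = []
    for raw in reader:
        # DictReader stores surplus fields under the key None and fills
        # missing fields with None, so both cases are detected here.
        if None in raw or any(raw[c] is None for c in FUNSIM_COLUMNS):
            raise ValueError(
                f"line {reader.line_num}: expected {len(FUNSIM_COLUMNS)} fields"
            )
        row = {
            "set_id": raw["set_id"].strip(),
            "set_name": raw["set_name"].strip(),
            "n_proteins": int(raw["n_proteins"]),
        }
        for col in FUNSIM_COLUMNS[3:]:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


# Placeholder line, not a real result.
print(read_funsim(io.StringIO("SET_ID|set name|4|0.5|0.25|0.75|1.0|0.01\n")))
```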

4) Chagoyen_coherence
---------------------
The file contains the set identifier, set name, number of proteins in the set, number of annotated proteins considered, Chagoyen_coherence score, and p-value, with fields separated by '|'.

5) Pandey_coherence
-------------------
The file contains the set identifier, set name, number of proteins in the set, number of annotated proteins considered, Pandey_coherence score, and p-value, with fields separated by '|'.
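
The Chagoyen_coherence and Pandey_coherence files add a fourth field, the number of annotated proteins considered. The sketch below parses that six-field layout and exposes the annotated share of each set as a convenience. That derived value is an addition made here for illustration, not something stored in the files. It assumes one set per line and no header row. The names `AnnotatedCoherenceRow` and `parse_annotated_line` are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedCoherenceRow:
    set_id: str
    set_name: str
    n_proteins: int
    n_annotated: int
    score: float
    p_value: float

    @property
    def annotated_fraction(self) -> float:
        """Share of the set's proteins that were annotated and considered."""
        return self.n_annotated / self.n_proteins if self.n_proteins else 0.0


def parse_annotated_line(line: str) -> AnnotatedCoherenceRow:
    """Parse one '|'-separated line of a Chagoyen or Pandey result file."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 '|'-separated fields, got {len(fields)}")
    set_id, set_name, n_proteins, n_annotated, score, p_value = fields
    return AnnotatedCoherenceRow(
        set_id, set_name, int(n_proteins), int(n_annotated),
        float(score), float(p_value),
    )


# Placeholder line, not a real result.
row = parse_annotated_line("SET_ID|set name|10|8|0.5|0.01")
print(row.n_annotated, row.annotated_fraction)
```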

Interacting pair similarity results using various techniques
Each file is named after the technique used to compute pairwise protein similarity. It contains pairs of proteins and the corresponding similarity scores.
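
A pairwise similarity file can be loaded into a lookup table keyed by protein pair. This sketch assumes three whitespace-separated columns per line (protein, protein, score) and a symmetric score. Neither detail is stated in this README. The names `read_similarities` and `similarity` and the sample lines are placeholders.

```python
from typing import Dict, FrozenSet, Iterable, Optional


def read_similarities(lines: Iterable[str]) -> Dict[FrozenSet[str], float]:
    """Map each unordered protein pair to its similarity score."""
    scores: Dict[FrozenSet[str], float] = {}
    for raw in lines:
        tokens = raw.split()
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        scores[frozenset(tokens[:2])] = float(tokens[2])
    return scores


def similarity(scores: Dict[FrozenSet[str], float], a: str, b: str) -> Optional[float]:
    """Look up the score for a pair in either order; None if absent."""
    return scores.get(frozenset((a, b)))


# Placeholder lines, not real scores.
table = read_similarities(["ID_A ID_B 0.5", "ID_C ID_D 0.25"])
print(similarity(table, "ID_B", "ID_A"))
```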