A multitask transfer learning framework for the prediction of virus-human protein-protein interactions

model.png

The code was tested on Python 3.7+. All required packages are listed in requirements.txt. All data used in our experiments are included in the "data" folder. The UniRep folder contains the code and the pre-trained UniRep model parameters.

  1. To generate the UniRep embeddings from protein sequences, run:

python getUniRep_emb.py --data_path SEQ_FILE_PATH

where SEQ_FILE_PATH is the path to the sequence file, which must be in CSV format: the first column contains the protein ID and the second column contains its sequence.
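As an illustration of the two-column layout described above, the following minimal sketch writes a sequence file with Python's standard `csv` module. The file name `example_sequences.csv`, the protein IDs, and the toy sequences are all hypothetical. No header row is written, which is an assumption: this README does not say whether getUniRep_emb.py expects one.

```python
import csv


def write_sequence_csv(path, proteins):
    """Write (protein_id, sequence) pairs as a two-column CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for protein_id, sequence in proteins:
            # Column 1: protein ID, column 2: amino-acid sequence.
            writer.writerow([protein_id, sequence])


# Hypothetical IDs and toy sequences, for illustration only.
example_proteins = [
    ("PROT_A", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("PROT_B", "MSDNGPQNQRNAPRITFGGPSD"),
]
write_sequence_csv("example_sequences.csv", example_proteins)
```

The resulting file could then be passed as SEQ_FILE_PATH in the command above.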

  2. To evaluate the MTT model, run:

python mtt.py --data_dir dataset/XXX

where XXX is one of [h1n1, ebola, denovo, barman, bacteria, deepviral, sar]. For [barman, bacteria, deepviral, sar], the paths to the positive and negative training and testing files must also be changed. Other configurable parameters include:

  • fixval: True if a fixed validation set is used, False otherwise (in that case the system randomly holds out 10% of the training set for validation). The default value is False.
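The random hold-out used when fixval is False can be pictured with the sketch below. It is not the code from mtt.py; the function name `split_train_validation` and its `seed` parameter are assumptions made for illustration, and only the 10% fraction comes from the description above.

```python
import random


def split_train_validation(samples, val_fraction=0.1, seed=None):
    """Randomly hold out a fraction of the training samples for validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    # Number of samples moved to the validation set (10% by default).
    n_val = int(round(len(samples) * val_fraction))
    val_indices = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_indices]
    val = [s for i, s in enumerate(samples) if i in val_indices]
    return train, val


# Toy usage: 100 placeholder training samples.
train_set, val_set = split_train_validation(list(range(100)), seed=0)
print(len(train_set), len(val_set))
```

Fixing the seed makes the split reproducible across runs, which matters when comparing models on the same validation samples.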
  3. To evaluate the MTT model on the Novel H1N1 and Novel Ebola datasets, run:

python mtt_novel.py --data_dir dataset/XXX

where XXX is either h1n1 or ebola.
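To run the evaluations for several datasets in one go, a small driver along the following lines may be convenient. This is a hedged sketch: `build_commands` and `run_all` are hypothetical helpers, not part of this repository. Only the script names, the `--data_dir` flag, and the dataset names are taken from the commands above. For barman, bacteria, deepviral, and sar, the training and testing paths inside mtt.py still have to be adjusted by hand first.

```python
import subprocess
import sys

# Dataset names as listed in this README.
MTT_DATASETS = ["h1n1", "ebola", "denovo", "barman", "bacteria", "deepviral", "sar"]
NOVEL_DATASETS = ["h1n1", "ebola"]


def build_commands(python=sys.executable):
    """Build one command per dataset for mtt.py and mtt_novel.py."""
    commands = []
    for name in MTT_DATASETS:
        commands.append([python, "mtt.py", "--data_dir", f"dataset/{name}"])
    for name in NOVEL_DATASETS:
        commands.append([python, "mtt_novel.py", "--data_dir", f"dataset/{name}"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing evaluation.
            subprocess.run(command, check=True)
```

Calling `run_all(build_commands())` only prints the commands; pass `dry_run=False` from the repository root to actually launch them.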

  4. To evaluate the other models (Doc2vec, Denovo, Generalized, STT), follow the same procedure as in steps 2 and 3, respectively. The XXX_novel.py files are always used to evaluate the corresponding models on the two novel testing datasets.

  5. To generate the line plots with error bars, run:

python utils/lineplot_with_errorbar.py

More details about the UniRep model can be found at: https://github.com/churchlab/UniRep

All Deepviral prediction scores were retrieved from: https://zenodo.org/record/4429824#.YPV1LSWxWV4.

All code and the pre-trained model for the Doc2vec method were taken from http://zzdlab.com/InterSPPI/