A multitask transfer learning framework for the prediction of virus-human protein-protein interactions
The code was tested on Python 3.7+. All required packages are listed in requirements.txt.
All data used in our experiments are included in the "data" folder. The UniRep
folder contains the code and the pre-trained UniRep model parameters.
- To generate UniRep embeddings from protein sequences, run:
python getUniRep_emb.py --data_path SEQ_FILE_PATH
where SEQ_FILE_PATH is the path to a sequence file in CSV format: the first column contains the protein ID and the second column contains its sequence.
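The input format above can be produced and checked with Python's standard csv module. This is a minimal sketch, not part of the repository: the helper names `format_sequence_csv` and `parse_sequence_csv` are hypothetical, and the assumption that getUniRep_emb.py expects no header row is not confirmed by this README.

```python
import csv
import io


def format_sequence_csv(records):
    """Serialise (protein_id, sequence) pairs as two-column CSV text.

    Hypothetical helper. Assumption: no header row is written; check
    getUniRep_emb.py to see whether it expects or skips a header.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for protein_id, sequence in records:
        # Strip stray whitespace and use upper-case one-letter residue codes.
        writer.writerow([protein_id, sequence.strip().upper()])
    return buffer.getvalue()


def parse_sequence_csv(text):
    """Parse CSV text back into pairs, rejecting rows that are not two columns."""
    records = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        records.append((row[0], row[1]))
    return records
```

For example, writing `format_sequence_csv(records)` to a file such as seqs.csv and running `python getUniRep_emb.py --data_path seqs.csv` would follow the layout described above; round-tripping through `parse_sequence_csv` first is a cheap way to catch malformed rows before embedding.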
- To evaluate the MTT model, run:
python mtt.py --data_dir dataset/XXX
where XXX is one of h1n1, ebola, denovo, barman, bacteria, deepviral, or sar. For barman, bacteria, deepviral, and sar, the paths to the positive and negative training and testing files must also be changed. Other configurable parameters include:
- fixval: True to use a fixed validation set; False otherwise, in which case 10% of the training set is randomly held out for validation. The default value is False.
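The fixval behaviour can be pictured as follows. This is an illustrative sketch only: `split_train_val` is a hypothetical helper, and the actual splitting code in mtt.py may differ in rounding, seeding, or whether it stratifies by label.

```python
import random


def split_train_val(samples, val_fraction=0.1, fixval=False, fixed_val=None, seed=None):
    """Return (train, val) following the fixval description above.

    Hypothetical helper, not the code in mtt.py.
    - fixval=True: keep all training samples and use the supplied fixed set.
    - fixval=False: randomly hold out val_fraction of the training samples.
    """
    if fixval:
        if fixed_val is None:
            raise ValueError("fixval=True requires a fixed validation set")
        return list(samples), list(fixed_val)

    # Shuffle indices rather than samples so the input list is left untouched.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(samples) * val_fraction))
    val_idx = set(indices[:n_val])

    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val
```

Passing a fixed `seed` makes the random hold-out reproducible across runs, which matters when comparing models on the same split.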
- To evaluate the MTT model on the Novel H1N1 and Novel Ebola datasets, run:
python mtt_novel.py --data_dir dataset/XXX
where XXX is either h1n1 or ebola.
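The two MTT evaluation commands above can be batched with a small driver. This is a sketch under stated assumptions: it only uses the script names, the `--data_dir` flag and the dataset names given in this README, runs the scripts with the current interpreter (`sys.executable`) rather than a bare `python`, and defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# Dataset names as listed in this README.
STANDARD_DATASETS = ["h1n1", "ebola", "denovo", "barman", "bacteria", "deepviral", "sar"]
NOVEL_DATASETS = ["h1n1", "ebola"]


def build_command(script, dataset, root="dataset"):
    """Build the command line for one evaluation run."""
    return [sys.executable, script, "--data_dir", f"{root}/{dataset}"]


def run_all(dry_run=True):
    """Print (and optionally execute) every MTT evaluation run.

    Note: for barman, bacteria, deepviral and sar the positive/negative
    training and testing paths must be edited before a real run.
    """
    commands = [build_command("mtt.py", d) for d in STANDARD_DATASETS]
    commands += [build_command("mtt_novel.py", d) for d in NOVEL_DATASETS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all()` lists the nine runs without executing them; `run_all(dry_run=False)` runs them sequentially.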
- To evaluate the other models (Doc2vec, Denovo, Generalized, STT), follow the same procedure as the two MTT evaluation steps above (standard datasets and novel datasets), using the corresponding model scripts. The XXX_novel.py files are always used to evaluate the corresponding models on the two novel test datasets.
- To generate the line plots with error bars, run:
python utils/lineplot_with_errorbar.py
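What utils/lineplot_with_errorbar.py aggregates is not documented here. A common choice, assumed in this sketch, is the mean and sample standard deviation of a metric over repeated runs at each x-position; the helper `summarize_runs` is hypothetical, and its outputs could then be passed as `y` and `yerr` to matplotlib's `errorbar`.

```python
import statistics


def summarize_runs(runs):
    """Collapse repeated runs into per-point means and standard deviations.

    Hypothetical helper; assumes error bars show mean +/- sample std.
    runs: list of equal-length lists, one list of metric values per run.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    n_points = len(runs[0])
    if any(len(r) != n_points for r in runs):
        raise ValueError("all runs must have the same number of points")

    means, stds = [], []
    # zip(*runs) groups the values of every run at the same x-position.
    for values in zip(*runs):
        means.append(statistics.mean(values))
        stds.append(statistics.stdev(values))  # sample (n - 1) standard deviation
    return means, stds
```

Using the sample standard deviation (n - 1 denominator) is the usual convention when the runs are a small sample of possible random seeds or splits.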
More details about the UniRep model can be found at: https://github.com/churchlab/UniRep
All DeepViral prediction scores were retrieved from: https://zenodo.org/record/4429824#.YPV1LSWxWV4.
All code and the pre-trained model for the Doc2vec method were taken from http://zzdlab.com/InterSPPI/