Ngan Thi Dong
multitask_transfer

A multitask transfer learning framework for the prediction of virus-human protein-protein interactions

The code was tested on Python 3.7+. All required packages are listed in requirements.txt.
All data used in our experiments are included in the "data" folder. The UniRep folder contains the code and the pre-trained UniRep model parameters.

To generate the UniRep embeddings from protein sequences, run:

python getUniRep_emb.py --data_path SEQ_FILE_PATH
where SEQ_FILE_PATH is the path to the sequence file in CSV format: the first column contains the protein ID and the second column contains its sequence.
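
As an illustration of the expected input, the following is a minimal sketch that writes and re-reads a two-column sequence file. It assumes the file has no header row (the description above does not say either way); the helper names, protein IDs, and sequences are hypothetical placeholders, not part of this repository.

```python
import csv
import os
import tempfile


def write_sequence_csv(path, records):
    """Write (protein_id, sequence) pairs as a two-column CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for protein_id, sequence in records:
            writer.writerow([protein_id, sequence])


def read_sequence_csv(path):
    """Read the file back as a list of (protein_id, sequence) tuples."""
    with open(path, newline="") as handle:
        return [(row[0], row[1]) for row in csv.reader(handle) if row]


# Hypothetical toy records: first column is the protein ID, second its sequence.
example_records = [("PROT_A", "MKTAYIAKQR"), ("PROT_B", "MVLSPADKTN")]

# Written to the system temp directory so the sketch has no side effects
# on the repository; any writable path works.
example_path = os.path.join(tempfile.gettempdir(), "example_sequences.csv")
write_sequence_csv(example_path, example_records)
```

The resulting path could then be passed as SEQ_FILE_PATH. If getUniRep_emb.py turns out to expect a header row, add one before the records.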

To evaluate the MTT model, run:

python mtt.py --data_dir dataset/XXX
where XXX is one of h1n1, ebola, denovo, barman, bacteria, deepviral, or sar. For barman, bacteria, deepviral, and sar, the paths to the positive and negative training and testing data must also be changed. Other configurable parameters include:


fixval: True to use a fixed validation set, False otherwise (the system then randomly holds out 10% of the training set for validation). The default value is False.
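
The random hold-out used when fixval is False can be pictured with the sketch below. It is only an illustration of a 10% random split written with the standard library; the function name, the seed parameter, and the rounding rule are assumptions, and the actual logic in mtt.py may differ.

```python
import random


def split_train_validation(samples, val_fraction=0.1, seed=None):
    """Randomly hold out a fraction of the samples for validation.

    Returns (train, validation); both keep the original sample order.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(round(len(samples) * val_fraction))
    val_indices = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_indices]
    validation = [s for i, s in enumerate(samples) if i in val_indices]
    return train, validation


# Hypothetical usage: 100 placeholder training pairs, 10% held out.
pairs = [("virus_%d" % i, "human_%d" % i) for i in range(100)]
train_pairs, val_pairs = split_train_validation(pairs, val_fraction=0.1, seed=0)
```

Passing a fixed seed makes the split reproducible across runs, which helps when comparing models on the same validation set.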


To evaluate the MTT model on the Novel H1N1 and Novel Ebola datasets, run:

python mtt_novel.py --data_dir dataset/XXX
where XXX is either h1n1 or ebola.
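
To run several evaluations in a row, the two commands above can be assembled programmatically. The sketch below only builds the argument lists from the script names, the --data_dir flag, and the dataset names given in this README; the helper build_command is hypothetical, and it assumes the interpreter is available as "python".

```python
# Dataset names as listed in this README.
MTT_DATASETS = ["h1n1", "ebola", "denovo", "barman", "bacteria", "deepviral", "sar"]
NOVEL_DATASETS = ["h1n1", "ebola"]


def build_command(dataset, novel=False):
    """Return the argument list for one MTT evaluation run."""
    allowed = NOVEL_DATASETS if novel else MTT_DATASETS
    if dataset not in allowed:
        raise ValueError("unsupported dataset: %r" % dataset)
    script = "mtt_novel.py" if novel else "mtt.py"
    return ["python", script, "--data_dir", "dataset/" + dataset]


# One command per novel dataset; nothing is executed here.
novel_commands = [build_command(name, novel=True) for name in NOVEL_DATASETS]
```

Each list can be handed to subprocess.run(command, check=True) from the repository root to launch the corresponding evaluation.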


To evaluate the other models (Doc2vec, Denovo, Generalized, STT), follow the same procedure as for the MTT model in steps 2 and 3 above. The XXX_novel.py files are always used to evaluate the corresponding models on the two novel testing datasets.


To generate the line plots with error bars, run:


python utils/lineplot_with_errorbar.py
More details about the UniRep model can be found at: https://github.com/churchlab/UniRep
All Deepviral prediction scores were retrieved from: https://zenodo.org/record/4429824#.YPV1LSWxWV4
All code and the pre-trained model for the Doc2vec method were taken from: http://zzdlab.com/InterSPPI/