Simplifying miRNA-disease association prediction

To replicate our results, use the following random seeds: [123, 456, 789, 101, 112]

The code was tested on Python 3.7+. All required packages are listed in requirements.txt.

1. To **generate data for evaluation**, run:

`python genFoldsData.py --data_dir data/XXX --save_dir data/XXX/folds --randseed RRR`

where XXX is the dataset (either hmdd2 or hmdd3) and RRR is the random seed.
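
To cover every dataset and seed listed above, the command lines can be generated rather than typed by hand. The following is a minimal sketch that only builds and prints the commands; the helper name `fold_commands` is hypothetical and not part of this repository. Whether consecutive runs with different seeds overwrite each other in the same `folds` directory is not stated here, so check the output location before running them back to back.

```python
# Seeds and dataset names as listed in this README.
SEEDS = [123, 456, 789, 101, 112]
DATASETS = ["hmdd2", "hmdd3"]


def fold_commands(datasets=DATASETS, seeds=SEEDS):
    """Return one genFoldsData.py command line per (dataset, seed) pair."""
    commands = []
    for dataset in datasets:
        for seed in seeds:
            args = [
                "python", "genFoldsData.py",
                "--data_dir", f"data/{dataset}",
                "--save_dir", f"data/{dataset}/folds",
                "--randseed", str(seed),
            ]
            commands.append(" ".join(args))
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being executed.
    for command in fold_commands():
        print(command)
```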

2. To evaluate the **nimgcn** model, run:

`python eval_nimgcn.py --data_dir data/XXX --fold_dir data/XXX/folds --save_dir data/XXX/results`

Other configurable parameters include:
- _sim_type_: should be one of ['functional1', 'functional2', 'gip', 'seq']. The default is 'functional2'
- _faulty_: should be either True or False. True means the faulty calculated similarities are used
- _save_score_: should be either True or False, corresponding to whether to save the predicted scores or not
- _method_: should be one of ['nimgcn', 'nimgcn1', 'nimgcn2', 'nimgcn3']
- _randseed_: the random seed used to generate the train/test split
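
The optional parameters above can be combined with the base command. The sketch below builds such a command line and rejects values outside the documented choices. It assumes each parameter is passed as `--name value` (for example `--sim_type gip`), which this README does not state explicitly, so confirm against the argument parser in `eval_nimgcn.py`. The helper name `nimgcn_command` is hypothetical.

```python
# Allowed values as documented in the parameter list above.
SIM_TYPES = ["functional1", "functional2", "gip", "seq"]
METHODS = ["nimgcn", "nimgcn1", "nimgcn2", "nimgcn3"]


def nimgcn_command(dataset, sim_type=None, method=None, randseed=None):
    """Build an eval_nimgcn.py command line; options left as None are omitted."""
    args = [
        "python", "eval_nimgcn.py",
        "--data_dir", f"data/{dataset}",
        "--fold_dir", f"data/{dataset}/folds",
        "--save_dir", f"data/{dataset}/results",
    ]
    if sim_type is not None:
        if sim_type not in SIM_TYPES:
            raise ValueError(f"sim_type must be one of {SIM_TYPES}")
        # Assumption: options are passed in the form "--name value".
        args += ["--sim_type", sim_type]
    if method is not None:
        if method not in METHODS:
            raise ValueError(f"method must be one of {METHODS}")
        args += ["--method", method]
    if randseed is not None:
        args += ["--randseed", str(randseed)]
    return " ".join(args)


if __name__ == "__main__":
    print(nimgcn_command("hmdd2", sim_type="gip", randseed=123))
```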

3. To evaluate the **dbmda** model, run:

`python eval_dbmda.py --data_dir data/XXX --fold_dir data/XXX/folds --save_dir data/XXX/results`

Other configurable parameters include:
- _sim_type_: should be one of ['functional1', 'functional2', 'gip', 'seq']. The default is 'functional2'
- _faulty_: should be either True or False. True means the faulty calculated similarities are used
- _save_score_: should be either True or False, corresponding to whether to save the predicted scores or not
- _use_autoencoder_: whether to use the autoencoder or not
- _use_seq_sim_: whether to use the sequence similarity or not
- _randseed_: the random seed used to generate the train/test split

4. For **EPMDA**, computing the features takes a long time, so we provide all precomputed features in the epmda/data folder.
Run `eval_epmda_balance.py` with the corresponding arguments to evaluate EPMDA with the balanced setup,
and `eval_epmda_original.py` with the corresponding arguments for the original evaluation setup.

For feature calculation, please refer to the `*.py` files in the epmda folder.

5. To run all the models on a **new** dataset, the following files are needed:
- m-d.csv: stores the association matrix, where rows are miRNAs and columns are diseases
- miRNA-disease.txt: stores the miRNA-disease association list
- miRNA_seq.csv: the miRNA sequence similarity matrix
- disease_sim.csv: the disease semantic similarity
- disease_sim2.csv: the disease semantic + phenotype similarity
- disease_not_found_list.txt: list of IDs of diseases that are not found in MeSH
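
Before running the models on a new dataset, it can help to confirm that all of the files above are present. The following is a minimal sketch, assuming the files sit directly inside the dataset directory (for example `data/XXX`); the helper name `missing_files` and the example path `data/my_dataset` are hypothetical and not part of this repository.

```python
from pathlib import Path

# File names as listed above.
REQUIRED_FILES = [
    "m-d.csv",
    "miRNA-disease.txt",
    "miRNA_seq.csv",
    "disease_sim.csv",
    "disease_sim2.csv",
    "disease_not_found_list.txt",
]


def missing_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the new dataset is stored in data/my_dataset.
    missing = missing_files("data/my_dataset")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```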

To calculate the disease semantic similarity from the MeSH ontology, the disease GIP similarity, or the miRNA sequence/functional/GIP kernel similarity, please use the code provided in the data/preparation folder.
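
For orientation, the GIP (Gaussian interaction profile) kernel similarity is commonly defined as `exp(-gamma * ||p_i - p_j||^2)`, where `p_i` is the binary association profile of entity `i` and `gamma` is 1 divided by the mean squared norm of all profiles. The sketch below implements this textbook definition in plain Python. It is not the code from data/preparation, which may use a different bandwidth or implementation, and the function name `gip_kernel` is hypothetical.

```python
import math


def gip_kernel(profiles):
    """Gaussian interaction profile kernel for a list of equal-length 0/1 rows."""
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("the association matrix contains no associations")
    # Bandwidth normalised by the mean squared profile norm (gamma' = 1).
    gamma = 1.0 / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * sq_dist)
    return kernel


if __name__ == "__main__":
    # Toy association matrix: rows are miRNAs, columns are diseases.
    assoc = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
    mirna_gip = gip_kernel(assoc)
    # Disease profiles are the columns, so transpose first.
    disease_gip = gip_kernel([list(col) for col in zip(*assoc)])
    print(mirna_gip)
    print(disease_gip)
```

Rows of the association matrix give the miRNA similarity; transposing the matrix first gives the disease similarity.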



If you use the code in your work, please cite the following paper:
Dong, Thi Ngan, and Megha Khosla. "A consistent evaluation of miRNA-disease association prediction models." bioRxiv (2020).