Project author: brochier

Project description:
Python package for the paper "New datasets and a benchmark of document network embedding methods for scientific expert finding" (https://arxiv.org/pdf/2004.03621.pdf)
Language: Python
Repository: git://github.com/brochier/expert_finding.git
Created: 2020-04-13T20:10:43Z
Project community: https://github.com/brochier/expert_finding

Python package for the paper "New datasets and a benchmark of document network embedding methods for scientific expert finding" (BIR20@ECIR20)

Install:

You need Python 3.7 and pip installed on your computer. Optionally, create a new environment with conda:

    conda create --name expert_finding python=3.7 pip
    conda activate expert_finding

Then run:

    pip install git+https://github.com/brochier/expert_finding

Or:

    git clone https://github.com/brochier/expert_finding
    cd expert_finding
    pip install -r requirements.txt
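
To confirm that the installation worked, a quick check along the following lines may help. This is a minimal sketch that uses only the standard library; the helper name `is_installed` is hypothetical and not part of the package, while `expert_finding` is the import name used in the example script. It only reports whether the package can be found from the current environment and working directory.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the named top-level package can be found by the import system."""
    # Hypothetical helper for illustration; not part of expert_finding.
    return importlib.util.find_spec(package_name) is not None


# The README asks for Python 3.7, so report the running interpreter version.
print("Python version:", ".".join(str(v) for v in sys.version_info[:3]))
print("expert_finding importable:", is_installed("expert_finding"))
```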

Example script

    """
    An example script to use expert_finding as a package. It shows how to load a dataset, create a model and run an evaluation.
    """
    import expert_finding.io
    import expert_finding.evaluation
    import expert_finding.models.random_model
    import expert_finding.models.panoptic_model
    import expert_finding.models.propagation_model
    import expert_finding.models.voting_model
    import numpy as np

    # Print the list of available datasets
    dataset_names = expert_finding.io.get_list_of_dataset_names()
    print("Names of the datasets available:")
    for dn in dataset_names:
        print(dn)
    print()

    # Load one dataset
    A_da, A_dd, T, L_d, L_d_mask, L_a, L_a_mask, tags = expert_finding.io.load_dataset("stats.stackexchange.com")
    """
    A_da : adjacency matrix of the document-candidate network (scipy.sparse.csr_matrix)
    A_dd : adjacency matrix of the document-document network (scipy.sparse.csr_matrix)
    T : raw textual content of the documents (numpy.array)
    L_d : labels associated with the documents (corresponding to T[L_d_mask]) (numpy.array)
    L_d_mask : mask to select the labeled documents (numpy.array)
    L_a : labels associated with the candidates (corresponding to A_da[:,L_a_mask]) (numpy.array)
    L_a_mask : mask to select the labeled candidates (numpy.array)
    tags : names of the labels of expertise (numpy.array)
    """

    # You can load one of the provided models
    #model = expert_finding.models.panoptic_model.Model()
    #model = expert_finding.models.voting_model.Model()
    #model = expert_finding.models.propagation_model.Model()

    # Or you can create your own model: it needs a fit and a predict method
    class Model:
        def __init__(self):
            self.num_candidates = 0

        def fit(self, A_da, A_dd, T):
            # One column of A_da per candidate
            self.num_candidates = A_da.shape[1]

        def predict(self, d, mask = None):
            # Return one random score per candidate (per masked candidate if a mask is given)
            if mask is not None:
                self.num_candidates = len(mask)
            return np.random.rand(self.num_candidates)

    model = Model()

    # Run an evaluation
    eval_batches, merged_eval = expert_finding.evaluation.run(model, A_da, A_dd, T, L_d, L_d_mask, L_a, L_a_mask, tags)

    # This last function actually chains 3 sub-functions, which can also be called one by one:
    # 1) Run all available queries and compute the metrics for each of them
    eval_batches = expert_finding.evaluation.run_all_evaluations(model, A_da, A_dd, T, L_d, L_d_mask, L_a, L_a_mask)
    # 2) Merge the evaluations by averaging over the metrics
    merged_eval = expert_finding.evaluation.merge_evaluations(eval_batches, tags)
    # 3) Plot the evaluation. If path is not None, the plot is not shown but saved as an image on disk.
    expert_finding.evaluation.plot_evaluation(merged_eval, path=None)
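
The random model in the example script suggests that a custom model only needs a `fit(A_da, A_dd, T)` method and a `predict(d, mask)` method that returns one score per candidate. As a rough illustration of that interface, here is a minimal sketch of a deterministic baseline that scores each candidate by the number of documents linked to them. The class name `DocumentCountModel` and the toy data are hypothetical, and the sketch assumes that `mask` is an array of candidate indices, which the README does not state explicitly.

```python
import numpy as np


class DocumentCountModel:
    """Hypothetical baseline: score each candidate by how many documents link to them."""

    def __init__(self):
        self.scores = np.zeros(0)

    def fit(self, A_da, A_dd, T):
        # Column sums of the document-candidate matrix give one count per candidate.
        # np.asarray(...).ravel() handles both dense arrays and scipy.sparse matrices.
        self.scores = np.asarray(A_da.sum(axis=0), dtype=float).ravel()

    def predict(self, d, mask=None):
        # This baseline is query-independent, so the query document d is ignored.
        # Assumption: mask holds the indices of the candidates to score.
        if mask is not None:
            return self.scores[mask]
        return self.scores


# Toy data (not from any real dataset): 4 documents (rows), 3 candidates (columns).
toy_A_da = np.array([[1, 0, 0],
                     [1, 1, 0],
                     [0, 1, 0],
                     [1, 0, 1]])

toy_model = DocumentCountModel()
toy_model.fit(toy_A_da, None, None)
toy_scores = toy_model.predict(0)
toy_masked = toy_model.predict(0, mask=np.array([0, 2]))
print(toy_scores)  # one score per candidate
print(toy_masked)  # scores restricted to candidates 0 and 2
```

If the interface assumption holds, an instance of such a class could be passed to `expert_finding.evaluation.run` in place of the random model. A query-independent baseline like this is mainly useful as a sanity check against models that actually use the query document.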

Citing

If you use this code, please consider citing the paper:

    @inproceedings{brochier20new,
      title={New Datasets and a Benchmark of Document Network Embedding Methods for Scientific Expert Finding},
      author={Brochier, Robin and Gourru, Antoine and Guille, Adrien and Velcin, Julien},
      booktitle={Bibliometric-enhanced Information Retrieval: 10th International BIR Workshop at ECIR},
      year={2020}
    }