DoSSIER: Marie Skłodowska-Curie PhD in Professional Search
I completed my Marie Skłodowska-Curie PhD as an Early Stage Researcher (ESR) in DoSSIER, focusing on professional search in the health and legal domains.
My work combined information retrieval methods with interaction-aware and multidimensional relevance modelling to improve how systems estimate what is useful for real professional users.
During the programme, I collaborated with international research and industry partners and worked with colleagues and students from diverse backgrounds.
I presented results at conferences and during research visits, and delivered peer-reviewed publications and open-access artifacts, including system prototypes.
DoSSIER was funded by the EU Horizon 2020 programme under Marie Skłodowska-Curie grant agreement No 860721.
Official project outputs are available on the DoSSIER results page (CORDIS).
Selected Sub-projects within DoSSIER
Clinical trial matching prototype (CRUISE) Lead
Prototype that matches patients to clinical trials using a custom information retrieval pipeline, informed by multidimensional relevance estimation and user-centred design choices.
Impact: Demonstrates an applied patient-to-trial matching workflow that builds on findings from my PhD research on multidimensional ranking systems and user-aspect considerations.
Decision theoretic relevance model (DtMRF) Lead
Formal framework for multidimensional relevance estimation that complements similarity-based retrieval. It models both positive and negative relevance factors and aggregates them with a decision-theoretic method, producing an overall relevance estimate with low computational overhead.
Impact: Improves P@10 by 5 to 28 percent over standard ad hoc retrieval on three clinical trial retrieval benchmarks, with statistically significant gains, while keeping rankings interpretable and efficient.
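For readers outside IR, P@10 is the fraction of relevant documents among the top 10 results for a query. The sketch below computes it together with a relative gain over a baseline. It assumes the reported 5 to 28 percent figures are relative improvements; the function names and the toy runs are hypothetical and are not taken from the DtMRF code or the benchmarks.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    Follows the common convention of dividing by k even if fewer than k
    documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline`, in percent (assumed reading of the figures)."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (system - baseline) / baseline


# Toy example with hypothetical trial IDs (not benchmark data).
relevant = {"t1", "t2", "t3", "t4", "t5"}
baseline_run = ["t1", "x1", "t2", "x2", "t3", "x3", "t4", "x4", "x5", "x6"]
system_run = ["t1", "t2", "x1", "t3", "t4", "x2", "t5", "x3", "x4", "x5"]
p_base = precision_at_k(baseline_run, relevant)  # 4 relevant in top 10 -> 0.4
p_sys = precision_at_k(system_run, relevant)     # 5 relevant in top 10 -> 0.5
print(f"P@10 {p_base:.2f} -> {p_sys:.2f}, relative gain {relative_improvement(p_base, p_sys):.1f}%")
```

In practice, P@10 is averaged over all queries of a benchmark before any comparison, and significance is assessed with a paired test over the per-query scores.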
Additional DoSSIER sub-projects are documented through open-access artifacts linked to my PhD publications, including reproducibility code, research prototypes, and accompanying papers from 2020 to 2023, listed on my Google Scholar profile.
Selected Artifacts, Systems & Publications
A systematic review of multidimensional relevance estimation in information retrieval Lead
Systematic review of 72 studies on how relevance is modelled as multidimensional and dynamic, influenced by user, task, and domain factors.
Organises relevance aspects, aggregation approaches, and benchmark resources, and highlights how large language models may shape future work via relevance labelling.
Takeaway: Relevance modelling benefits from clearer factor definitions and more unified estimation methods across domains.
Investigating the impact of query representation on medical information retrieval Lead
Study on how patient information extracted from clinical notes affects clinical trial allocation and medical literature retrieval.
Compares rule-based and transformer-based extraction, disambiguation, negation handling, and concept expansion to form query representations for retrieval.
Takeaway: Strong rule-based pipelines remain a solid and practical choice for medical retrieval, and careful information selection can matter more than added complexity.
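Negation handling is one of the rule-based steps compared above. As a rough illustration of what such a rule looks like, the sketch below uses a NegEx-style heuristic: a trigger phrase opens a negation scope of a few tokens that ends early at a terminator such as "but". It is a swapped-in textbook technique, not the pipeline evaluated in the study; the trigger and terminator lists, the window size, and the function names are all hypothetical.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical, heavily reduced trigger and terminator lists (assumption);
# NegEx-style systems rely on much larger curated lexicons.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for", "no evidence of")
SCOPE_TERMINATORS = {"but", "however", "although"}


def _tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def _find(tokens: List[str], phrase: List[str]) -> Iterator[int]:
    """Yield every start index at which `phrase` occurs in `tokens`."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            yield i


def tag_negated_concepts(sentence: str, concepts: List[str], window: int = 5) -> List[Tuple[str, bool]]:
    """Return (concept, is_negated) for each concept that occurs in the sentence."""
    tokens = _tokenize(sentence)
    negated_positions = set()
    for trigger in NEGATION_TRIGGERS:
        trigger_tokens = trigger.split()
        for start in _find(tokens, trigger_tokens):
            scope_start = start + len(trigger_tokens)
            # The scope covers up to `window` tokens and stops early at a terminator.
            for pos in range(scope_start, min(scope_start + window, len(tokens))):
                if tokens[pos] in SCOPE_TERMINATORS:
                    break
                negated_positions.add(pos)
    results = []
    for concept in concepts:
        # Only the first occurrence of each concept is checked (simplification).
        first = next(_find(tokens, _tokenize(concept)), None)
        if first is not None:
            results.append((concept, first in negated_positions))
    return results


def build_query(sentence: str, concepts: List[str]) -> str:
    """One possible policy: keep only affirmed concepts as query terms."""
    return " ".join(c for c, negated in tag_negated_concepts(sentence, concepts) if not negated)


note = "Patient denies chest pain but reports type 2 diabetes."
print(build_query(note, ["chest pain", "type 2 diabetes"]))
```

Dropping negated concepts is only one policy. A pipeline could instead keep them as exclusion signals, which matters for trial eligibility criteria.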
LeiBi@COLIEE 2022: aggregating tuned lexical models with a cluster-driven BERT-based model for case law retrieval Co-lead
Submission to the COLIEE 2022 case law retrieval task: query reformulation for long legal cases, first-stage lexical retrieval, neural reranking, and score aggregation.
Explores statistical extraction methods and embedding-based strategies, plus a cluster-driven reranking approach and linear score aggregation.
Takeaway: Information extraction and compression before retrieval is essential for long legal documents, and strong statistical approaches provide robust gains.
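Linear score aggregation, the last pipeline step mentioned above, can be illustrated with a minimal sketch. It normalises a lexical run and a neural reranker run per query with min-max scaling and then ranks documents by a weighted sum. The normalisation scheme, the weight `alpha`, the zero score for documents missing from one run, and all names are assumptions and do not reproduce the configuration of the submitted runs.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores for a single query to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(lexical: Dict[str, float], neural: Dict[str, float], alpha: float = 0.5) -> List[str]:
    """Rank documents by alpha * lexical + (1 - alpha) * neural after min-max normalisation.

    A document missing from one run contributes 0 for that component (assumption).
    """
    lex, neu = min_max(lexical), min_max(neural)
    fused = {doc: alpha * lex.get(doc, 0.0) + (1 - alpha) * neu.get(doc, 0.0)
             for doc in set(lex) | set(neu)}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical per-query scores for candidate cases c1..c4 (not COLIEE data).
bm25_scores = {"c1": 12.0, "c2": 9.0, "c3": 3.0}
reranker_scores = {"c1": 0.2, "c2": 0.9, "c4": 0.5}
print(linear_fusion(bm25_scores, reranker_scores, alpha=0.5))
```

With `alpha=0.5` the reranker moves c2 above c1. Raising `alpha` toward 1 gives more weight to the lexical run and restores c1 to the top. This is why the weight is normally tuned on training queries.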
DtMRF: Decision-theoretic multidimensional relevance estimation Lead
Introduces the Decision-theoretic Multidimensional Relevance Framework (DtMRF), a formal method for estimating relevance using both positive and negative relevance factors. The framework avoids the computational complexity of data-driven approaches while providing interpretable rankings.
Takeaway: Multidimensional relevance estimation can yield measurable retrieval gains with limited overhead, while keeping the ranking process transparent and controllable.
A broader set of DoSSIER publications and open-access artifacts is available via my Google Scholar profile, my GitHub repositories, and the official DoSSIER results page (CORDIS).
Acknowledgement
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860721.