Identify more patients, including undiagnosed ones, for drug discovery, clinical trials and commercial marketing

Interested in applying our novel unsupervised AI algorithms to your patient electronic health record (EHR) data?


Pangaea provides machine-learning-based software to customers in the biopharmaceutical and healthcare industries for faster identification of patient cohorts based on phenotypes (clinical characteristics and symptoms) from electronic health records (EHRs) and unstructured doctors’ notes.

This is critical for detecting patients at risk of diseases, finding genes linked to a phenotype in the context of drug or biomarker discovery, and recruiting patients for clinical trials and real-world evidence (RWE) studies.


Find new patients across diseases in an EHR dataset

At least 50 times faster, and more accurate, than NLP, indexing, semantic search and keyword matching

Scalable to 5,000 diseases

Higher return on investment for customers who pay to acquire EHR data


Recent results show that our machine-learning algorithms can automatically label EHRs based on medical notes, making it easier, quicker and much more accurate to find patients with specific phenotypes.


We applied our algorithms to a dataset of 52,722 EHRs and were able to label them with Human Phenotype Ontology (HPO) terms in 40 minutes with more than 90% accuracy. This is at least 50 times faster and 30% more accurate than current natural language processing (NLP) tools and keyword search approaches. We also identified HPO terms in 5,000 EHRs that keyword searches had not detected.



Our technology is based on the founders’ work over the last 20 years in industry and at Imperial College London. Pangaea is advised by renowned scientific and technical experts from the biopharmaceutical domain and Stanford University. The company is supported by teams in London, San Francisco and China.

Scientific advisors

Professor Mark Musen

Stanford University

  • Head of Biomedical Informatics

  • Founded semantic technologies such as BioPortal and Protégé, which receive 1 million hits per day

  • Chairs WHO committee for releasing ICD-11 and NIH's BD2K

Dr. Paul-Michael Agapow

  • Health Informatics Director at AstraZeneca

Dr. Rick Sax

  • Senior VP, Design & Delivery Innovation, IQVIA

  • Clinical Vice President, AstraZeneca

  • Led the team at Merck Research Labs for successful development and registration of the antiplatelet agent, Aggrastat®

Executive leadership

Dr. Vibhor Gupta

Director and Founder

  • Over 20 years' background in bioinformatics and molecular biology

  • Formerly SVP at Seven Bridges Genomics (out of Harvard University) and VP at Quantum Secure (San Jose)

Professor Yi-Ke Guo

Chief of Technology and Science

  • Founded Inforsense (acquired in 2010)

  • Founded and heads Imperial College London's Data Science Institute

  • Built data management technologies through £130M in funding over 30 years

  • Fellow of the Royal Academy of Engineering

Dr. Carlos Pittol

Chief Operating Officer

  • Over 20 years of corporate finance and VC experience in helping create and fund new and successful global biotechnology companies

  • Fellow of the Royal Society of Chemistry


We are always looking for talented people to join our team. See our current job opportunities below or contact us at

Cambridge Innovations Summit

30th June 2020


22nd - 24th June 2020


28th - 29th April 2020

Upcoming events



We are looking to collaborate with scientists and clinicians from life sciences who are interested in applying our algorithms to the EHR datasets they are building or accessing through partners. Get in touch at or send us a message using the contact form below.