Antske Fokkens – Professor in Computational Linguistic Methods


Our InDeep NWA project has started, with (among others) Jelle Zuidema, Afra Alishahi, Arianna Bisazza and Iris Hendrickx!

A recent version of my CV can be found here

My Research

My main interest lies in methodological aspects of research in Computational Linguistics. I am driven by the question of how computational models of language work: what patterns and systems are found in natural language? How can they be modeled computationally? Which computational methods are suitable for modeling or analyzing which phenomena?

I am currently working on two main topics. Together with my PhD student, Pia Sommerauer, I am exploring semantic models to gain a better understanding of (word) embeddings. What is the impact of the chosen algorithm, of the distribution of the data, or of random settings in the neural network used? This interest grew out of my main research topic: investigating the (subtle) ways in which people express perspectives. Here, I design and implement tools that extract patterns of how specific groups of people, events or concepts are described in large amounts of text. For instance, do media systematically talk differently about events when the actors have a certain ethnic background? What is said about health and weight, and how has that changed over time? The basic system extracts transparent patterns, using labels that historians, social scientists and other interested researchers can read directly. In the final stage of this research, I plan to combine this with observations about the aspects of meaning captured by embeddings.

Over the last five years, I have mainly focused on methodological aspects of the application of NLP to digital humanities. This work is mainly carried out as part of the BiographyNet project, in which a historian, a computer scientist and I work together to see how NLP and Semantic Web technology can enhance historical research on the Biography Portal of the Netherlands. My research addresses how we can identify information in text that is useful for historians, and how we can make sure that historians can assess the reliability of the output of tools whose inner workings they do not know.
The Network Institute projects Time will tell a different story and Political Discourse in the News also addressed the question of how NLP can be used in historical research and communication science, respectively.
As part of investigating methodological issues, I have also worked on the system architecture of NewsReader, and I am coordinating the Enlighten Your Research project Can we Handle the News, in which we push the limits of large-scale processing and investigate what would be needed to process all the news that is published every day.

My PhD thesis proposed a new methodology for developing linguistic precision grammars. Its main idea, storing alternative analyses in a metagrammar so that they can be compared at different stages of the development process, can be applied within any grammatical theory. I focused in particular on grammars developed within the DELPH-IN consortium, which develops open-source HPSG-based grammars. The method is also closely related to the LinGO Grammar Matrix.

I am currently working as a researcher at the Computational Lexicology and Terminology Lab and as a visiting researcher at the Web and Media group at VU University Amsterdam. I am also part of the Network Institute.

Recently started projects:

  • CLARIAH project HHuCap: mining careers in text
  • College voor de Rechten van de Mens: Identifying age discrimination in job advertisements
  • Academy Assistant project: A linguistic and behavioral assessment of a possible generic optimistic bias in individuals

Past Events

  • The 3rd edition, BD2019, took place in Varna, Bulgaria, on 5-6 September.

Recent invited talks

  • MediaFutures Annual Meeting Keynote Speaker 30 September 2021, Bergen, Norway
  • Data for History Invited Speaker (moved from Berlin to Online)
  • CLARIN2020 Keynote (moved from Madrid to Online)
  • DFG Young Researchers FUTURE Workshop, 12-13 September 2019, Siegburg, Bonn, Germany.
  • Keynote speaker at the Historical Network Research Conference
  • 12 March 2018. Muslims in the News. Joint lecture with Abdessamad Bouabid. At: Over Beeldvorming gesproken (Speaking of public image): how can we prevent discrimination against Muslims? The Hague.

Events I (co-)organized

Other Events

  • 23 April 2017: Radio interview on Reporter Radio together with Piek Vossen about fake news. You can listen to the interview (in Dutch) here (from 23:39 onwards).

Older Events

  • 7-8 December 2017. An introduction to distributional semantics. Invited Speaker Reading like a human workshop. Amsterdam
  • 12 October 2017. A closer look at distant reading. At: National eScience Symposium. Science in a Digital World.
  • 11 September 2017. Possibilities and risks of using distributional semantics for identifying concept drift. Keynote Speaker, Drift-a-LOD workshop, SEMANTiCS Conference, Amsterdam.
  • 8 June 2017. Panel member “Unhinging the National Framework: Diaries and the Digital”. IABA Conference, King's College, London.
  • 16 May 2017: Presenting at the CLS Speaker Series, University of Amsterdam
  • 10-11 March 2017: Invited talk on investigating biographical data, Suwon, Korea.
  • 6 February 2017: Presentation on Linked Data and Linguistic Research (LD4LR).
  • 30 November 2016: Invited talk at the Tilburg Center for Cognition and Communication
  • 22 November 2016: DESIDERIA workshop on concept drift
  • 16 November 2016: Keynote presentation at the SWE-CLARIN Workshop
  • 26 September 2016: Guest speaker at CRETA-Werkstatt in Stuttgart, Germany
  • 23 September 2016: Guest speaker at LAP launch in Oslo, Norway

2 thoughts on “Antske Fokkens – Professor in Computational Linguistic Methods”

  1. Hi Antske,

    My name is Armin and I’m a freelance data scientist.

    I’m currently working on a commercial NER project and came across the NewsReader MEANTIME corpus and CLIN26. The links on the CLIN website are dead, for example:

    Entity Recognition and Disambiguation in CoNLL format: corpus-entities (

    Do you know of other sources for the CoNLL files? Also, would it be possible to use the data commercially?

    Thanks a lot

    • Hi Armin,

      The data should be available in the github repository.

      Did you check there? Otherwise, please contact my colleague Marten Postma (his contact details can be found by searching for him together with ‘Vrije Universiteit’).

      Best regards,
