Workshop for Natural Language Processing Open Source Software (NLP-OSS)

With great scientific breakthroughs come solid engineering and open communities. The Natural Language Processing (NLP) community has benefited greatly from an open culture of sharing knowledge, data, and software. The primary objective of this workshop is to further the sharing of insights on the engineering and community aspects of creating, developing, and maintaining NLP open source software (OSS), aspects we seldom discuss in scientific publications. Our secondary goal is to promote synergies between different open source projects and to encourage cross-software collaborations and comparisons.

We use Natural Language Processing OSS as an umbrella term that covers not only traditional syntactic, semantic, phonetic, and pragmatic applications but also task-specific applications (e.g., machine translation, information retrieval, question-answering systems), low-level string processing that carries valid linguistic information (e.g., Unicode creation for new languages, language-based character set definitions), and machine learning/artificial intelligence frameworks with functionality focused on text applications.

There are many workshops focusing on the creation and curation of open language resources and annotations (e.g., BUCC, GWN, LAW, LOD, WAC). Moreover, the flagship LREC conference is dedicated to linguistic resources. However, the engineering aspects of NLP OSS are overlooked and under-discussed within the community. There are open source conferences and venues (such as FOSDEM, OSCON, and Open Source Summit) where discussions range from operating system kernels to air traffic control hardware, but NLP-related presentations are underrepresented. In the Machine Learning (ML) field, the Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS) is a forum for the discussion and dissemination of ML OSS topics. We envision the Workshop for NLP-OSS becoming a similar venue for NLP OSS discussions.

To the best of our knowledge, this is the first workshop proposal in recent years that focuses more on the building aspects of NLP and less on scientific novelty or state-of-the-art development. A decade ago, the SETQA-NLP (Software Engineering, Testing, and Quality Assurance for Natural Language Processing) workshop raised awareness of the need for good software engineering practices in NLP. In the earlier days of NLP, linguistic software was often monolithic, and the learning curve to install, use, and extend the tools was steep and frustrating. More often than not, NLP OSS developers and users interact in siloed communities within the ecosystems of their respective projects. Beyond the engineering aspects of NLP software, the open source movement has brought a community aspect that we often overlook when building impactful NLP technologies.

One example of valuable OSS knowledge comes from spaCy developer Montani (2017), who shared her thoughts on the challenges of maintaining commercial NLP OSS, such as handling open issues on the issue tracker, model release and packaging strategies, and monetizing NLP OSS for sustainability.

Řehůřek (2017) offered another insightful discussion on bridging the gap between academia and industry by creating open source software and student incubation programs. Řehůřek discussed the need to look beyond the publish-or-perish culture to avoid the brittle “mummy effect” in state-of-the-art (SOTA) research code and techniques.

We hope that the NLP-OSS workshop becomes an intellectual forum for collating open source knowledge beyond scientific contributions, announcing new software and features, and promoting open source culture and best practices that extend beyond the conferences.


Sponsorship helps keep NLP-OSS sustainable and accessible to the widest possible audience. The NLP-OSS workshop is organized by volunteers from both academia and industry. Sponsorship goes toward covering the cost of invited speakers.

If you or your company or institution would like to sponsor NLP-OSS, please send us an email at

Gold Level Sponsors

Silver Level Sponsors

Call for Papers

We invite full papers (8 pages) or short papers (4 pages) on topics related to NLP-OSS broadly categorized into (i) software development, (ii) scientific contribution and (iii) NLP-OSS case studies.

  • Software Development
    • Designing and developing NLP-OSS
    • Licensing issues in NLP-OSS
    • Backwards compatibility and stale code in NLP-OSS
    • Growing an NLP-OSS community
    • Maintaining and motivating an NLP-OSS community
    • Best practices for NLP-OSS documentation and testing
    • Contributing to NLP-OSS without coding
    • Incentivizing OSS contributions in NLP
    • Commercialization and Intellectual Property of NLP-OSS
    • Defining and managing NLP-OSS project scope
    • Issues in API design for NLP
    • NLP-OSS software interoperability
    • Analysis of the NLP-OSS community
  • Scientific Contribution
    • Surveying OSS for specific NLP task(s)
    • Demonstrations and tutorials of NLP-OSS
    • New NLP-OSS introductions
    • Small but useful NLP-OSS
    • NLP components in ML OSS
    • Citations and references for NLP-OSS
    • OSS vs experiment replicability
    • Gaps between existing NLP-OSS
    • Task-generic vs task-specific software
  • Case Studies
    • Case studies of how a specific bug is fixed or feature is added
    • Writing wrappers for other NLP-OSS
    • Writing open-source APIs for open data
    • Teaching NLP with OSS
    • NLP-OSS in industry

Submission Information

Authors are invited to submit a

  • Full paper up to 8 pages of content
  • Short paper up to 4 pages of content

All papers may use unlimited (but reasonable) additional pages for references. Final camera-ready versions will be allowed one additional page of content to address reviewers’ comments.

Submissions should be formatted according to the ACL 2018 templates. We strongly recommend preparing your manuscript using LaTeX:

Submissions should be uploaded to the Softconf conference management system at

Invited Speakers

Important Dates

The NLP-OSS workshop will be co-located with the ACL 2018 conference.

  • Paper Submission: 8th April (Sun), 23:59 American Samoa Time (extended from 25th March)
  • Notification of Acceptance: 6th May (Sun) (extended from 29th April)
  • Camera-Ready Version: 20th May (Sun) (extended from 13th May)
  • Workshop: 19th/20th July (Thu/Fri)

Program Committee
  • Martin Andrews, Red Cat Labs
  • Francis Bond, Nanyang Technological University
  • Jason Baldridge, Google
  • Steven Bethard, University of Arizona
  • Fred Blain, University of Sheffield
  • James Bradbury, Salesforce Research
  • Denny Britz, Prediction Machines
  • Marine Carpuat, University of Maryland
  • Kyunghyun Cho, New York University
  • Grzegorz Chrupała, Tilburg University
  • Hal Daumé III, University of Maryland
  • Jon Dehdari, Think Big Analytics
  • Christian Federmann, Microsoft Research
  • Mary Ellen Foster, University of Glasgow
  • Michael Wayne Goodman, University of Washington
  • Arwen Twinkle Griffioen, Zendesk Inc.
  • Joel Grus, Allen Institute for Artificial Intelligence
  • Chris Hokamp, Aylien Inc.
  • Matthew Honnibal, Explosion AI
  • Sung Kim, Hong Kong University of Science and Technology
  • Philipp Koehn, Johns Hopkins University
  • Taku Kudo, Google
  • Christopher Manning, Stanford University
  • Diana Maynard, University of Sheffield
  • Tomas Mikolov, Facebook AI Research (FAIR)
  • Ines Montani, Explosion AI
  • Andreas Müller, Columbia University
  • Graham Neubig, Carnegie Mellon University
  • Vlad Niculae, Cornell CIS
  • Joel Nothman, University of Sydney
  • Matt Post, Johns Hopkins University
  • David Przybilla, Idio
  • Amandalynne Paullada, University of Washington
  • Delip Rao, Joostware AI Research Corp
  • Radim Řehůřek, RaRe Technologies
  • Elijah Rippeth, MITRE Corporation
  • Abigail See, Stanford University
  • Carolina Scarton, University of Sheffield
  • Rico Sennrich, University of Edinburgh
  • Dan Simonson, Georgetown University
  • Vered Shwartz, Bar-Ilan University
  • Ian Soboroff, NIST
  • Pontus Stenetorp, University College London
  • Rachael Tatman, Kaggle
  • Tommaso Teofili, Adobe
  • Emiel van Miltenburg, Vrije Universiteit Amsterdam
  • Maarten van Gompel, Radboud University
  • Gaël Varoquaux, INRIA
  • KhengHui Yeo, Institute for Infocomm Research
  • Marcos Zampieri, University of Wolverhampton