3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS)
6 Dec 2023 @ EMNLP 2023 in Singapore
With great scientific breakthroughs come solid engineering and open communities. The Natural Language Processing (NLP) community has benefited greatly from the open culture of sharing knowledge, data, and software. The primary objective of this workshop is to further the sharing of insights on the engineering and community aspects of creating, developing, and maintaining NLP open source software (OSS), which we seldom talk about in scientific publications. Our secondary goal is to promote synergies between different open source projects and encourage cross-software collaborations and comparisons.
There are many workshops focusing on the creation and curation of open language resources and annotations (e.g. BUCC, GWN, LAW, LOD, WAC). Moreover, we have the flagship LREC conference dedicated to linguistic resources. However, the engineering aspects of NLP-OSS are overlooked and under-discussed within the community. There are open source conferences and venues (such as FOSDEM, OSCON, and Open Source Summit) where discussions range from operating system kernels to air traffic control hardware, but NLP-related presentations are underrepresented. In the Machine Learning (ML) field, the Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS) is a forum for discussing and disseminating ML OSS topics. We envision the Workshop for NLP-OSS becoming a similar avenue for NLP-OSS discussions.
Recently, the successful BigScience workshop series has examined and promoted open science in NLP. While important and complementary, the goals of BigScience are distinct from those of NLP-OSS, which focuses more on the community of practice in open source software supporting NLP and language technologies. We expect many BigScience participants to take part in NLP-OSS, as many of them were PC members in past editions of NLP-OSS. Another grassroots community movement, EleutherAI, started with researchers attempting to replicate commercial language models and has since grown into an active, decentralized community of volunteer researchers, engineers, and developers focused on AI alignment, scaling, and open source AI research.
With the rise of open source startups like Hugging Face, the democratization of NLP gives researchers and the general public easy access to language models once available only to a handful of industrial research labs. This growing availability of NLP tools creates new synergies through cloud integrations, e.g. Hugging Face x AWS SageMaker, that allow engineers and researchers to train and deploy live applications with minimal infrastructure setup. Standing on the shoulders of giants, the scikit-learn and Hugging Face ecosystems are now interoperable through the skops framework. In this third NLP-OSS workshop, we want to highlight these emerging communities and synergies and promote future collaborations among like-minded open source NLP researchers. We hope that the NLP-OSS workshop can also be hosted at an *ACL conference and become the intellectual forum for collating this type of knowledge, announcing new software and features, and promoting open source culture and OSS best practices.
Call for Papers
We invite full papers (8 pages) or short papers (4 pages) on topics related to NLP-OSS, broadly categorized into (i) software development, (ii) scientific contribution, and (iii) NLP-OSS case studies.
- Software Development
  - Designing and developing NLP-OSS
  - Licensing issues in NLP-OSS
  - Backward compatibility and stale code in NLP-OSS
  - Growing, maintaining and motivating an NLP-OSS community
  - Best practices for NLP-OSS documentation and testing
  - Contributing to NLP-OSS without coding
  - Incentivizing OSS contributions in NLP
  - Commercialization and Intellectual Property of NLP-OSS
  - Defining and managing NLP-OSS project scope
  - Issues in API design for NLP
  - NLP-OSS software interoperability
  - Analysis of the NLP-OSS community
- Scientific Contribution
  - Surveying OSS for specific NLP task(s)
  - Demonstrations, introductions and/or tutorials of NLP-OSS
  - Small but useful NLP-OSS
  - NLP components in ML OSS
  - Citations and references for NLP-OSS
  - OSS and experiment replicability
  - Gaps between existing NLP-OSS
  - Task-generic vs task-specific software
- Case Studies
  - Case studies of how a specific bug is fixed or feature is added
  - Writing wrappers for other NLP-OSS
  - Writing open-source APIs for open data
  - Teaching NLP with OSS
  - NLP-OSS in industry
Submission Information
Authors are invited to submit either a:
- Full paper of up to 8 pages of content, or
- Short paper of up to 4 pages of content
Submissions may also be non-archival and presented at the NLP-OSS workshop, but we still require a submission of at least 4 pages so that reviewers have enough information to make an acceptance/rejection decision. The non-archival option is helpful for authors who want to publish, or have already published, the work elsewhere and would like to present and discuss pertinent NLP-OSS work with the workshop PC and attendees.
All papers may include an unlimited (but sensible) number of pages for references. Final camera-ready versions will be allowed an additional page of content to address reviewers’ comments.
Due to the nature of open source software, we find it a bit tricky to “anonymize” “open source”. For this reason, we do not require submissions to be anonymous. However, if you prefer your paper to be anonymized, please mask any identifying phrases with REDACTED.
Submissions should be formatted according to the EMNLP 2023 LaTeX or MS Word templates at https://2023.emnlp.org/calls/style-and-formatting/
Submissions should be uploaded to the OpenReview conference management system at https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS
Important Dates
The 3rd NLP-OSS workshop will be co-located with the EMNLP 2023 conference.
- Paper Submission: 09 August 2023
- Paper Reviews Start: 25 August 2023
- Paper Reviews Due: 01 October 2023
- Notification of Acceptance: 10 October 2023
- Camera-Ready Version: 25 October 2023
- Workshop: 6-10 Dec 2023
Invited Speakers
Jordan Meyer, Spawning AI
Bio
Jordan Meyer is the co-founder of Spawning AI. You may have used their haveibeentrained.com tool to check whether you appear in popular training datasets and to opt out of (or opt in to) future training, or you may have seen Holly+, the first project to experiment with consensual interactions around an artist's AI model.
Organizers
- Geeticka Chauhan, Massachusetts Institute of Technology
- Dmitrijs Milajevs, Grayscale AI
- Elijah Rippeth, University of Maryland
- Jeremy Gwinnup, Air Force Research Laboratory
- Liling Tan, Amazon
Programme Committee
- Aakanksha Naik, Allen Institute for Artificial Intelligence
- Aitor Soroa, University of the Basque Country
- Alexander Rush, Cornell / Hugging Face
- Aline Paes, Universidade Federal Fluminense
- Amittai Axelrod, Apple AI
- Anish Mohan, Nvidia
- Arun Balajiee Lekshmi Narayanan, University of Pittsburgh
- Atnafu Lambebo Tonja, Instituto Politécnico Nacional
- Atul Kr. Ojha, University of Galway
- Cassandra Jacobs, University at Buffalo
- Christoph Teichmann, Bloomberg LP
- Daniel Braun, University of Twente
- Dave Howcroft, Edinburgh Napier University
- Diana Maynard, University of Sheffield
- Fabio Kepler, Unbabel
- Flammie A Pirinen, UiT The Arctic University of Norway
- Francis Bond, Palacký University Olomouc
- Gérard Dupont, Mavenoid
- Guillaume Becquin, Bloomberg
- Ignatius Ezeani, Lancaster University
- Jana Götze, University of Potsdam
- Jack Morris, Cornell University
- Jörg Tiedemann, University of Helsinki
- Karin Sim, Language Weaver
- Kevin Cohen, University of Colorado
- Lane Schwartz, University of Alaska Fairbanks
- Leo Boytsov, Amazon
- Lucy Park, Upstage
- Maarten van Gompel, Radboud University
- Maheshwar Ghankot, Indira Gandhi National Open University
- Mallika Singh, Harvard Medical School
- Marcel Bollmann, Linköping University
- Marco Cognetta, Tokyo Institute of Technology, Google
- Marzieh Fadaee, Zeta Alpha Vector
- Matt Post, Microsoft
- Micah Shlain, Allen Institute for Artificial Intelligence
- Michael Wayne Goodman, LivePerson Inc.
- Mohd Sanad Zaki Rizvi, University of Edinburgh
- Nelson F. Liu, Stanford University
- Nitin Madnani, Educational Testing Service
- Ogundepo Odunayo, University of Waterloo
- Pasquale Lisena, EURECOM
- Philipp Koehn, Johns Hopkins University
- Phu Mon Htut, AWS AI Labs
- Raeid Saqur, Princeton University
- Raphael Tang, Comcast Applied AI
- Sagnik Ray Choudhury, University of Michigan
- Shilpa Suresh, Harvard Medical School
- Shubhanshu Mishra, Shubhanshu.com
- Sina Ahmadi, George Mason University
- Steve DeNeefe, RWS Language Weaver
- Steven Bethard, University of Arizona
- Taha Zerrouki, Bouira University, Algeria
- Tenzin Bhotia, Delhi Technological University
- Thomas Kober, Zalando SE
- Tomas Mikolov, Czech Institute of Informatics
- Tommaso Teofili, Roma Tre University
- Vlad Niculae, University of Amsterdam
- Won Ik Cho, Seoul National University
- Zaid Alyafeai, King Fahd University of Petroleum and Minerals
- Ziv Litmanovitz, University of Haifa
Previous Workshops
Second Workshop for Natural Language Processing Open Source Software (NLP-OSS 2020)
[Proceedings]
First Workshop for Natural Language Processing Open Source Software (NLP-OSS 2018)
[Proceedings]