Vidas Daudaravičius, VTeX
Speech at the conference "Lithuanian Journals Meeting the Needs of Digital Scholarly Communication", 25 October 2016, Vilnius, Lithuania
Electronic Publishing of Scientific Journals: Three Trends
Over the past decade, scientific publications have moved into digital form. Today most scientific publications are published digitally. The aims are to cut costs, to shorten the time needed to prepare an article for publication, and to increase dissemination on the Internet. This transformation has changed the very notion of a scientific article and of a journal. Scientific publishers therefore face new challenges, partly similar to those the mass media faced during their own transformation over the past decade:
– Scientific publishing has become accessible to everyone: large scientific publishing companies and small scientific communities alike can publish efficiently.
– Mega-journals have emerged that bring together various scientific topics and journals.
– Journal publishing is no longer restricted by the extent (page count) that mattered in print.
– The Internet has become a huge library and the main source of knowledge.
– Journals have themselves become data for research.
– Publishing a scientific journal now involves articles, data sets, digital tools, and information linking and referencing.
Three main trends in scientific publishing underpin the publishing principles of journals:
• Preparing an article for publication means more than checking that the printed form in PDF meets quality requirements. It also means preparing a full-text XML version of the article for semantic publishing. Besides generating the necessary formats (ePub, HTML5, etc.), publishers now need machine learning, natural language processing and corpus methods to develop new automated services. These include key-phrase extraction from articles and journals, database compilation, knowledge gathering, topic classification of articles and journals, bibliography tagging, terminology annotation and reviewer selection.
• Authors are involved in every stage of publishing production: writing, making corrections, responding to editors' remarks, proofreading and managing supplementary material. Our success story in brief: developing and refining SkyLaTeX, which authors have received and rated favourably.
• HTML5 is a universal, widely used markup standard that meets all the requirements of semantic publishing. It suits both widely recognized publishing workflows: XML-first and LaTeX-first. VTeX pays special attention to the LaTeX-first production process. This process is particularly relevant to scientists whose articles contain many mathematical expressions, e.g. mathematicians, physicists and ICT specialists.
Vidas Daudaravičius
Research project manager at VTeX
Since joining VTeX in 2011, Vidas Daudaravičius has been the principal researcher and developer in several R&D projects, including some supported by EU structural funds.
His expertise in Natural Language Processing and Machine Learning enabled him to introduce the fruitful practice of data reuse within the company. It also let him build research infrastructure for developing new services for publishing companies and academic writers. He created and published the LEDAT corpus (Language Editing Dataset of Academic Texts). He was also one of the organizers of a Shared Task on the automated evaluation of scientific writing and of its related workshop at the NAACL HLT 2015 conference.
Vidas received his PhD from Vytautas Magnus University, where he worked at the Centre of Computational Linguistics for more than ten years. There he took part in successful projects to create a Lithuanian morphological tagger, to compile a corpus of contemporary Lithuanian usage and to develop the first online English–Lithuanian machine translation system.