Search results

The German reference corpus DeReKo : a primordial sample for linguistic research

Author: Kupietz, Marc

Published: 2014

Publisher: Institut für Deutsche Sprache, Bibliothek, Mannheim

Source:	Union catalogues
Contributors:	Belica, Cyril (author); Keibel, Holger (author); Witt, Andreas (author)
Language:	English
Media type:	Book (monograph)
Format:	Online
Other identifiers:	urn:nbn:de:bsz:mh39-28379
DDC classification:	Language (400)
Keywords:	German; Text corpus; Corpus (linguistics)
Further keywords:	Deutsches Referenzkorpus (DeReKo); Institut für Deutsche Sprache (Mannheim)
Extent:	Online resource
Notes:	In: Proceedings of the 7th International Conference on Language Resources and Evaluation : Workshops & Tutorials May 17-18, May 22-23, Main Conference May 19-21, Valletta. - Paris : ELRA, 2010, pp. 1848-1854, ISBN 2-9517408-6-7

Avoiding Data Graveyards : from Heterogeneous Data Collected in Multiple Research Projects to Sustainable Linguistic Resources

Authors: Schmidt, Thomas ; Chiarcos, Christian ; Lehmberg, Timm ; Rehm, Georg ; Witt, Andreas ; Hinrichs, Erhard

Published: 2014

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/2268 https://ids-pub.bsz-bw.de/files/2268/Schmidt%20etc_Avoiding%20Data%20Graveyards_2006.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-22687

This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Research data; Linguistics; Standardisation; Long-term archiving
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Sustainability of Linguistic Resources

Authors: Dipper, Stefanie ; Hinrichs, Erhard ; Schmidt, Thomas ; Wagner, Andreas ; Witt, Andreas

Published: 2014

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/2271 https://ids-pub.bsz-bw.de/files/2271/Schmidt%20etc_Sustainability_2006.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-22718

This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. This initiative is a cooperation between three linguistic collaborative research centres in Germany, which comprise more than 40 individual research projects altogether. These projects are involved in creating manifold language resources, especially corpora, tailored to their particular needs. The aim of the project described here is to ensure an effective and sustainable access of these data by third-party researchers beyond the termination of these projects. This goal involves a number of measures, such as the definition of a common data format to completely capture the heterogeneous information encoded in the individual corpora, the development of user-friendly and sustainably usable tools for processing (e.g. querying) the data, and the specification of common inventories of metadata and terminology. Moreover, the project aims at formulating general rules of best practice for creating, accessing, and archiving linguistic resources.

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Research data; Linguistics; Computational linguistics; Long-term archiving
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

The German reference corpus DeReKo : a primordial sample for linguistic research

Authors: Kupietz, Marc ; Belica, Cyril ; Keibel, Holger ; Witt, Andreas

Published: 2014

Publisher: Paris : ELRA

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/2837 https://ids-pub.bsz-bw.de/files/2837/Kupietz_Belica_Keibel_Witt_The%20German%20Reference_Corpus.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-28379

This paper describes DeReKo (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS) in Mannheim, and the rationale behind its development. We discuss its design, its legal background, how to access it, available metadata, linguistic annotation layers, underlying standards, ongoing developments, and aspects of using the archive for empirical linguistic research. The focus of the paper is on the advantages of DeReKo’s design as a primordial sample from which virtual corpora can be drawn for the specific purposes of individual studies. Both concepts, primordial sample and virtual corpus, are explained and illustrated in detail. Furthermore, we describe in more detail how DeReKo deals with the fact that all its texts are subject to third parties’ intellectual property rights, and how it deals with the issue of replicability, which is particularly challenging given DeReKo’s dynamic growth and the possibility to construct from it an open number of virtual corpora.

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	German; Text corpus; Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Access control by query rewriting: the case of KorAP

Authors: Bański, Piotr ; Diewald, Nils ; Hanl, Michael ; Kupietz, Marc ; Witt, Andreas

Published: 2014

Publisher: Reykjavik : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3136 https://ids-pub.bsz-bw.de/files/3136/Banski_Diewald_Hanl_Kupietz_Witt_Access%20control_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-31366

We present an approach to an aspect of managing complex access scenarios to large and heterogeneous corpora that involves handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside of the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way in which it handles multiple query languages, by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn constitutes a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Proceedings of the LREC 2014 workshop Challenges in the Management of Large Corpora (CMLC2)

Authors: Kupietz, Marc ; Biber, Hanno ; Lüngen, Harald ; Bański, Piotr ; Breiteneder, Evelyn ; Mörth, Karlheinz ; Witt, Andreas ; Takhsha, Jani

Published: 2014

Publisher: Reykjavik : ELRA

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3164 https://ids-pub.bsz-bw.de/files/3164/Challenges%20in%20the%20Management%20of%20Large%20Corpora_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-31646

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Corpus; Text corpus
License:	creativecommons.org/licenses/by-nc/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

KorAP: the new corpus analysis platform at IDS Mannheim

Authors: Bański, Piotr ; Bingel, Joachim ; Diewald, Nils ; Frick, Elena ; Hanl, Michael ; Kupietz, Marc ; Pęzik, Piotr ; Schnober, Carsten ; Witt, Andreas

Published: 2014

Publisher: Poznań : Uniwersytet im. Adama Mickiewicza w Poznaniu

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3261 https://ids-pub.bsz-bw.de/files/3261/Banski_KorAP_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-32617

The KorAP project (“Korpusanalyseplattform der nächsten Generation”, “Corpus-analysis platform of the next generation”), carried out at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany, has as its goal the development of a modern, state-of-the-art corpus-analysis platform, capable of handling very large corpora and opening the perspectives for innovative linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse extremely large amounts of primary data and annotations, while at the same time allowing an undistorted view of the primary un-annotated text, and thus fully satisfying expectations associated with a scientific tool. The project started in July 2011 and is funded till June 2014. The demo presentation in December will be the first version following a preliminary feature freeze, and will open the alpha testing phase of the project.

Source:	BASE, German studies subject subset
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess
