Gigaword
Revision as of 11:40, 29 January 2023
How to search the Gigaword corpus with tgrep2 at the University of Frankfurt
What is the Gigaword corpus?
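The corpus is distributed by the Linguistic Data Consortium (LDC); see its catalog entry LDC2011T07: https://catalog.ldc.upenn.edu/LDC2011T07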
What is tgrep2?
tgrep2 is a program for searching corpora of parse trees stored in the bracketed format of the Penn Treebank. Here is an example of a sentence in this format:
(ROOT
  (S
    (NP
      (NP (DT The) (NN duo))
      (, ,)
      (RRC
        (ADVP (RB together))
        (PP (IN with)
          (NP (CD 11) (NNS accomplices))))
      (, ,))
    (VP (VBD raised)
      (NP
        (QP (JJR more) (IN than) (CD 4) (CD billion))
        (NN yuan))
      (PP (IN from)
        (NP (NNP April) (CD 2004)))
      (PP (TO to)
        (NP (NNP July) (CD 2008))))
    (. .)))
// Gert
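To make the bracketed format more concrete, here is a minimal Python sketch that reads the example tree above and finds every NP node that has a QP as a direct child. This is roughly the kind of structural relation that a tgrep2 pattern such as `NP < QP` (immediate dominance) describes. The sketch is not part of tgrep2 and does not use it; the function names `parse_tree`, `leaves`, and `find_parent_of` are hypothetical helpers introduced only for illustration.

```python
# Illustrative only: a tiny reader for Penn Treebank bracketed trees.
# All names below are assumptions for this sketch, not tgrep2 functionality.

SENTENCE = (
    "(ROOT (S (NP (NP (DT The) (NN duo)) (, ,) (RRC (ADVP (RB together)) "
    "(PP (IN with) (NP (CD 11) (NNS accomplices)))) (, ,)) (VP (VBD raised) "
    "(NP (QP (JJR more) (IN than) (CD 4) (CD billion)) (NN yuan)) "
    "(PP (IN from) (NP (NNP April) (CD 2004))) "
    "(PP (TO to) (NP (NNP July) (CD 2008)))) (. .)))"
)


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # skip the closing bracket
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words under a node, in sentence order."""
    words = []
    for child in node[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def find_parent_of(node, parent_label, child_label):
    """Yield every node labelled parent_label with a direct child child_label."""
    label, children = node
    subtrees = [c for c in children if isinstance(c, tuple)]
    if label == parent_label and any(c[0] == child_label for c in subtrees):
        yield node
    for c in subtrees:
        yield from find_parent_of(c, parent_label, child_label)


tree = parse_tree(SENTENCE)
matches = [" ".join(leaves(n)) for n in find_parent_of(tree, "NP", "QP")]
print(matches)
```

For the example sentence, the only NP that directly contains a QP is the object of "raised", so the sketch prints the words of that phrase. On the real corpus, tgrep2 itself should be used, since it works on a precompiled corpus file and supports a much richer pattern language.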