Gigaword

From English Grammar
Latest revision as of 11:43, 29 January 2023

How to search the Gigaword corpus with tgrep2 at the University of Frankfurt

What is the Gigaword corpus?

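English Gigaword (Fifth Edition) is a large archive of English newswire text distributed by the Linguistic Data Consortium (LDC). Its catalog entry is: https://catalog.ldc.upenn.edu/LDC2011T07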

What is tgrep2?

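tgrep2 is a program for searching corpora annotated in the Penn Treebank format, in which each sentence is stored as a bracketed phrase-structure tree. Unlike grep, which matches strings, tgrep2 matches patterns over the tree structure. Here is an example of a sentence in this format: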

(ROOT
  (S
    (NP
      (NP (DT The) (NN duo))
      (, ,)
      (RRC
        (ADVP (RB together))
        (PP (IN with)
          (NP (CD 11) (NNS accomplices))))
      (, ,))
    (VP (VBD raised)
      (NP
        (QP (JJR more) (IN than) (CD 4) (CD billion))
        (NN yuan))
      (PP (IN from)
        (NP (NNP April) (CD 2004)))
      (PP (TO to)
        (NP (NNP July) (CD 2008))))
    (. .)))
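The bracketed format is regular enough to inspect with a few lines of code. The following Python sketch is not part of tgrep2, and the function names are hypothetical. It checks that the brackets in the example sentence are balanced and reads off the word/tag pairs, assuming that every pre-terminal has the shape (TAG word).

```python
import re

# The example sentence above, in Penn Treebank bracketed format.
TREE = ("(ROOT (S (NP (NP (DT The) (NN duo)) (, ,) (RRC (ADVP (RB together)) "
        "(PP (IN with) (NP (CD 11) (NNS accomplices)))) (, ,)) (VP (VBD raised) "
        "(NP (QP (JJR more) (IN than) (CD 4) (CD billion)) (NN yuan)) "
        "(PP (IN from) (NP (NNP April) (CD 2004))) (PP (TO to) (NP (NNP July) (CD 2008)))) "
        "(. .)))")

# Assumption: a pre-terminal is a bracket holding exactly a tag and a word, e.g. "(DT The)".
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def is_balanced(tree: str) -> bool:
    """Every '(' must be closed, and no ')' may appear before its '('."""
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def tagged_words(tree: str) -> list[tuple[str, str]]:
    """Return (word, tag) pairs in sentence order."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(tree)]


pairs = tagged_words(TREE)
print(is_balanced(TREE))              # True
print(" ".join(w for w, _ in pairs))  # the sentence as plain text
print(pairs[:3])                      # [('The', 'DT'), ('duo', 'NN'), (',', ',')]
```

Note that punctuation is annotated like a word: the comma is both the tag and the word in (, ,).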

The structure of a tgrep2 command

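tgrep2 [options] "search-pattern" -c corpus-file

The search pattern is written in tgrep2's tree-pattern language. Put it in quotes so that the shell does not interpret characters such as < and > as redirections. The -c option names the corpus file to be searched.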
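To illustrate what a search pattern means, here is a toy Python sketch. It is not tgrep2 and does not reproduce its output. It implements only two basic tgrep2 relations on the example sentence: A < B (a node labelled A immediately dominates a node labelled B) and A << B (A dominates B at any depth). The function names are hypothetical, and only plain category labels are supported; tgrep2's real pattern language offers many more operators.

```python
import re

# The example sentence from above, in Penn Treebank bracketed format.
TREE = ("(ROOT (S (NP (NP (DT The) (NN duo)) (, ,) (RRC (ADVP (RB together)) "
        "(PP (IN with) (NP (CD 11) (NNS accomplices)))) (, ,)) (VP (VBD raised) "
        "(NP (QP (JJR more) (IN than) (CD 4) (CD billion)) (NN yuan)) "
        "(PP (IN from) (NP (NNP April) (CD 2004))) (PP (TO to) (NP (NNP July) (CD 2008)))) "
        "(. .)))")


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples; words stay plain strings."""
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", text))

    def node():
        label = next(tokens)  # the token right after "(" is the category label
        children = []
        for tok in tokens:    # shares the iterator with any recursive calls
            if tok == ")":
                return (label, children)
            children.append(node() if tok == "(" else tok)
        raise ValueError("unbalanced brackets")

    if next(tokens) != "(":
        raise ValueError("a tree must start with '('")
    return node()


def subtrees(tree):
    """Yield every constituent (but not the bare words) in pre-order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def words(tree):
    """The words covered by a constituent, in order."""
    return [w for c in tree[1] for w in (words(c) if isinstance(c, tuple) else [c])]


def immediately_dominates(a, b, tree):
    """Nodes labelled a with a child labelled b (cf. the tgrep2 pattern 'a < b')."""
    return [t for t in subtrees(tree)
            if t[0] == a and any(isinstance(c, tuple) and c[0] == b for c in t[1])]


def dominates(a, b, tree):
    """Nodes labelled a with any descendant labelled b (cf. 'a << b')."""
    return [t for t in subtrees(tree)
            if t[0] == a and any(s[0] == b for s in list(subtrees(t))[1:])]


tree = parse(TREE)
for match in dominates("NP", "PP", tree):
    print(" ".join(words(match)))  # The duo , together with 11 accomplices ,
print(immediately_dominates("NP", "PP", tree))  # [] : the PP sits inside the RRC
```

The contrast shows why the choice of relation matters: in this sentence, NP << PP finds the subject NP, whereas NP < PP finds nothing, because the PP is embedded in the reduced relative clause (RRC) rather than being a direct child of the NP.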