How to use the cc.mallet.extract.HierarchicalTokenizationFilter constructor

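The following JUnit test constructs a HierarchicalTokenizationFilter in two ways: with no arguments, and with an ignore pattern (java.util.regex.Pattern). It passes each filter to DocumentExtraction and checks the nested XML returned by toXmlString().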

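import java.util.regex.Pattern;
import junit.framework.TestCase;
import cc.mallet.extract.DocumentExtraction;
import cc.mallet.extract.HierarchicalTokenizationFilter;
import cc.mallet.extract.StringTokenization;
import cc.mallet.types.Label;
import cc.mallet.types.LabelAlphabet;
import cc.mallet.types.LabelSequence;
import cc.mallet.util.CharSequenceLexer;

// JUnit 3 style test; the enclosing class name is illustrative.
public class TestHierarchicalTokenizationFilter extends TestCase
{
 public void testNestedXMLTokenizationFilter ()
 {
  LabelAlphabet dict = new LabelAlphabet ();
  String document = "the quick brown fox leapt over the lazy dog";
  StringTokenization toks = new StringTokenization (document, new CharSequenceLexer ());
  // '|' separates nesting levels: "ANIMAL|ADJ|MAMMAL" becomes <ANIMAL><ADJ><MAMMAL>
  Label O = dict.lookupLabel ("O");
  Label ANML = dict.lookupLabel ("ANIMAL");
  Label ANML_MAMM = dict.lookupLabel ("ANIMAL|MAMMAL");
  Label VB = dict.lookupLabel ("VERB");
  Label ANML_JJ = dict.lookupLabel ("ANIMAL|ADJ");
  Label ANML_JJ_MAMM = dict.lookupLabel ("ANIMAL|ADJ|MAMMAL");
  // One label per token; tokens labeled "O" (passed below as the background label) stay untagged
  LabelSequence tags = new LabelSequence (new Label[] { O, ANML, ANML, ANML_MAMM, VB, O, ANML, ANML_JJ, ANML_JJ_MAMM });
  // No-argument constructor: every label component becomes a nested XML element
  DocumentExtraction extr = new DocumentExtraction ("Test", dict, toks, tags, null, "O", new HierarchicalTokenizationFilter ());
  String actualXml = extr.toXmlString();
  String expectedXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" +
      "<doc>the <ANIMAL>quick brown <MAMMAL>fox </MAMMAL></ANIMAL><VERB>leapt </VERB>over <ANIMAL>the <ADJ>lazy <MAMMAL>dog</MAMMAL></ADJ></ANIMAL></doc>\r\n";
  assertEquals (expectedXml, actualXml);
  // Pattern constructor: label components matching "AD.*" (here ADJ) are left out of the XML
  extr = new DocumentExtraction ("Test", dict, toks, tags, null, "O", new HierarchicalTokenizationFilter (Pattern.compile ("AD.*")));
  actualXml = extr.toXmlString();
  expectedXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" +
      "<doc>the <ANIMAL>quick brown <MAMMAL>fox </MAMMAL></ANIMAL><VERB>leapt </VERB>over <ANIMAL>the lazy <MAMMAL>dog</MAMMAL></ANIMAL></doc>\r\n";
  assertEquals (expectedXml, actualXml);
 }
}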
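To make the nesting rule in the expected strings easier to see without MALLET on the classpath, here is a stdlib-only Java sketch that reproduces the <doc> bodies the test expects (without the XML declaration). It is an illustration under stated assumptions, not MALLET's implementation: the names HierarchicalXmlSketch and toXml are hypothetical, tokens are assumed to be whitespace-delimited with their trailing whitespace kept inside the enclosing element, and each label is assumed to be split on '|' into nested elements, with components matching the optional ignore pattern dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stdlib-only sketch; NOT MALLET's HierarchicalTokenizationFilter.
// Mimics the test's output: "ANIMAL|ADJ|MAMMAL" is split on '|' into nested tags,
// and label components matching an optional ignore pattern are dropped.
final class HierarchicalXmlSketch {

    // Assumption: a token is a run of non-whitespace plus its trailing whitespace,
    // so "fox " keeps its space inside the enclosing tags.
    private static final Pattern TOKEN = Pattern.compile("\\S+\\s*");

    static String toXml(String document, String[] labels, String background, Pattern ignore) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(document);
        while (m.find()) {
            tokens.add(m.group());
        }
        if (tokens.size() != labels.length) {
            throw new IllegalArgumentException("one label per token required");
        }

        StringBuilder out = new StringBuilder("<doc>");
        List<String> open = new ArrayList<>(); // currently open tags, outermost first
        for (int i = 0; i < tokens.size(); i++) {
            // The background label yields an empty path (token stays untagged).
            List<String> path = new ArrayList<>();
            if (!labels[i].equals(background)) {
                for (String part : labels[i].split("\\|")) {
                    if (ignore == null || !ignore.matcher(part).matches()) {
                        path.add(part);
                    }
                }
            }
            // Length of the prefix shared by the open tags and this token's path.
            int common = 0;
            while (common < open.size() && common < path.size()
                    && open.get(common).equals(path.get(common))) {
                common++;
            }
            // Close tags no longer on the path, innermost first.
            for (int j = open.size() - 1; j >= common; j--) {
                out.append("</").append(open.remove(j)).append('>');
            }
            // Open the remaining tags of the path, outermost first.
            for (int j = common; j < path.size(); j++) {
                out.append('<').append(path.get(j)).append('>');
                open.add(path.get(j));
            }
            out.append(escape(tokens.get(i)));
        }
        // Close whatever is still open at the end of the document.
        for (int j = open.size() - 1; j >= 0; j--) {
            out.append("</").append(open.remove(j)).append('>');
        }
        return out.append("</doc>").toString();
    }

    // Minimal XML text escaping for token content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox leapt over the lazy dog";
        String[] labels = {"O", "ANIMAL", "ANIMAL", "ANIMAL|MAMMAL", "VERB", "O",
                           "ANIMAL", "ANIMAL|ADJ", "ANIMAL|ADJ|MAMMAL"};
        System.out.println(toXml(doc, labels, "O", null));
        System.out.println(toXml(doc, labels, "O", Pattern.compile("AD.*")));
    }
}
```

Running main prints the same two <doc> lines that appear in the test's expected strings. The sketch keeps a stack of open elements and, for each token, closes only the elements that differ from that token's label path, which is why "quick brown fox" shares one <ANIMAL> element. Because it has no notion of entity boundaries, two adjacent but distinct entities with the same label would be merged into one element; treat that as a limitation of this sketch, not a statement about MALLET.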