edu.stanford.nlp.process.WordShapeClassifier.wordShapeChris4Short Java code examples

/**
 * This one picks up on Dan2 ideas, but seeks to make fewer distinctions
 * mid-sequence by sorting for long words, while maintaining extra
 * distinctions for short words by always recording the class of the
 * first and last two characters of the word.
 * Compared to chris2, on which it is based,
 * it uses more Unicode classes, and so collapses things like
 * punctuation more, and might work better with real Unicode.
 *
 * @param s The String to find the word shape of
 * @param omitIfInBoundary If true, character classes present in the
 *                         first or last two (i.e., BOUNDARY_SIZE) letters
 *                         of the word are not also registered
 *                         as classes that appear in the middle of the word.
 * @param knownLCWords If non-null and non-empty, tag with a "k" suffix words
 *                    that are in this list when lowercased (representing
 *                    that the word is "known" as a lowercase word).
 * @return A word shape for the word.
 */
private static String wordShapeChris4(String s, boolean omitIfInBoundary, Collection<String> knownLCWords) {
  int len = s.length();
  if (len <= BOUNDARY_SIZE * 2) {
    return wordShapeChris4Short(s, len, knownLCWords);
  } else {
    return wordShapeChris4Long(s, omitIfInBoundary, len, knownLCWords);
  }
}
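
The dispatch described in the Javadoc above can be illustrated with a minimal, self-contained sketch. This is not the CoreNLP implementation: the class name WordShapeSketch, the method names, and the four-way character classifier (X, x, d, -) are hypothetical simplifications, and BOUNDARY_SIZE is assumed to be 2 based on the "first or last two" wording. The real chris4 shaper uses more Unicode classes and may order or filter its output differently.

```java
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch of the short/long word-shape idea; not the library code.
class WordShapeSketch {
  // Assumed value, taken from "first or last two (i.e., BOUNDARY_SIZE)" in the Javadoc.
  static final int BOUNDARY_SIZE = 2;

  // Simplified stand-in for the library's richer Unicode equivalence classes.
  static char charClass(char c) {
    if (Character.isDigit(c)) return 'd';
    if (Character.isUpperCase(c)) return 'X';
    if (Character.isLowerCase(c)) return 'x';
    return '-';
  }

  static String shape(String s, boolean omitIfInBoundary, Collection<String> knownLCWords) {
    int len = s.length();
    StringBuilder sb = new StringBuilder();
    if (len <= BOUNDARY_SIZE * 2) {
      // Short word: record the class of every character, in order.
      for (int i = 0; i < len; i++) {
        sb.append(charClass(s.charAt(i)));
      }
    } else {
      // Long word: keep boundary classes in order, collapse the middle into a sorted set.
      StringBuilder start = new StringBuilder();
      StringBuilder end = new StringBuilder();
      TreeSet<Character> middle = new TreeSet<>();
      for (int i = 0; i < len; i++) {
        char m = charClass(s.charAt(i));
        if (i < BOUNDARY_SIZE) {
          start.append(m);
        } else if (i >= len - BOUNDARY_SIZE) {
          end.append(m);
        } else {
          middle.add(m);
        }
      }
      if (omitIfInBoundary) {
        // Drop middle classes that already appear in the first or last characters.
        String boundary = start.toString() + end;
        middle.removeIf(ch -> boundary.indexOf(ch) >= 0);
      }
      sb.append(start);
      for (char ch : middle) {
        sb.append(ch);
      }
      sb.append(end);
    }
    // Tag words that are "known" in lowercase form with a "k" suffix.
    if (knownLCWords != null && !knownLCWords.isEmpty()
        && knownLCWords.contains(s.toLowerCase())) {
      sb.append('k');
    }
    return sb.toString();
  }
}
```

Under these assumptions, a call such as WordShapeSketch.shape("Ab1", false, null) takes the short-word branch because the length (3) is at most BOUNDARY_SIZE * 2, while a 12-character token such as "McDonald2024" takes the long-word branch, where the middle characters contribute only a sorted set of classes.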

Popular methods of WordShapeClassifier

chris4equivalenceClass
containsGreekLetter
Somewhat ad-hoc list of only greek letters that bio people use, partly to avoid false positives on s
dontUseLC
Returns true if the specified word shaper doesn't use known lower case words, even if a list of them
lookupShaper
Look up a shaper by a short String name.
wordShape
Specify the string and the int identifying which word shaper to use and this returns the result of u
wordShapeChris1
This one equivalence classes all strings into one of 24 semantically informed classes, somewhat simi
wordShapeChris2
This one picks up on Dan2 ideas, but seeks to make less distinctions mid sequence by sorting for lon
wordShapeChris2Long
wordShapeChris2Short
wordShapeChris4
This one picks up on Dan2 ideas, but seeks to make less distinctions mid sequence by sorting for lon
wordShapeChris4Long
wordShapeDan1
A fairly basic 5-way classifier, that notes digits, and upper and lower case, mixed, and non-alphanu

How to use the wordShapeChris4Short method in edu.stanford.nlp.process.WordShapeClassifier

Best Java code snippets using edu.stanford.nlp.process.WordShapeClassifier.wordShapeChris4Short (Showing top 5 results out of 315)