NodeFilter filter1 = new AndFilter(
        new TagNameFilter("IMG"),
        new HasParentFilter(new HasAttributeFilter("id", "featured_story_1"), true));
NodeList list = parser.parse(filter1);
for (int i = 0; i < list.size(); i++) {
    Node node = list.elementAt(i);
    ImageTag image = (ImageTag) node;
    System.out.println(image.getImageURL());
}

public static void main(String[] args) throws Exception {
    Parser parser = new Parser("file:test.html");
    CssSelectorNodeFilter cssFilter = new CssSelectorNodeFilter("td[class=\"xx\"]");
    NodeList nodes = parser.parse(cssFilter);
    String[][] resultSet = new String[nodes.size()][2];
    for (int i = 0; i < nodes.size(); i++) {
        Node n = nodes.elementAt(i);
        System.out.println(n); // DEBUG remove me!
        resultSet[i][0] = n.toPlainTextString().trim();
        resultSet[i][1] = null;
        Node c = n.getFirstChild();
        while (c != null) {
            if (c instanceof LinkTag) {
                resultSet[i][1] = ((LinkTag) c).getLink();
                break;
            }
            c = c.getNextSibling();
        }
        System.out.println(i + " text :" + resultSet[i][0]); // DEBUG remove me!
        System.out.println(i + " link :" + resultSet[i][1]); // DEBUG remove me!
    }
}

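NodeList nl = parser.parse(null); // you can also filter here
// Pass true to search recursively; the one-argument overload only checks top-level nodes.
NodeList divs = nl.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("DIV"), new HasAttributeFilter("id", "OBJ123")),
        true);
if (divs.size() > 0) {
    Tag div = (Tag) divs.elementAt(0); // elementAt returns Node, so a cast is needed
    // toPlainTextString() returns the text inside the div;
    // getText() would return the tag's own contents (name and attributes).
    String text = div.toPlainTextString();
}
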
// make some sort of constants for all the positions
// (Java has no usable const keyword; use final instead)
final int OPEN_PRICE = 0;
final int HIGH_PRICE = 1;
final int LOW_PRICE = 2;
// ....
NodeList nl = parser.parse(null); // you can also filter here
// Pass true to search recursively; the one-argument overload only checks top-level nodes.
NodeList values = nl.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("TD"), new HasAttributeFilter("class", "t1")),
        true);
if (values.size() > OPEN_PRICE) {
    Tag openPrice = (Tag) values.elementAt(OPEN_PRICE); // elementAt returns Node
    String openPriceValue = openPrice.toPlainTextString(); // this is the text of the cell
}

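NodeList nl = parser.parse(null); // you can also filter here
// Pass true to search recursively; the one-argument overload only checks top-level nodes.
NodeList divs = nl.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("DIV"), new HasAttributeFilter("class", "txt")),
        true);
if (divs.size() > 0) {
    Tag div = (Tag) divs.elementAt(0); // elementAt returns Node, so a cast is needed
    // toPlainTextString() returns the text inside the div;
    // getText() would return the tag's own contents (name and attributes).
    String text = div.toPlainTextString();
}
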
public static List<String> getLinksOnPage(final String url) {
    final List<String> result = new LinkedList<String>();
    try {
        // The Parser constructor also throws ParserException, so it belongs inside the try.
        final Parser htmlParser = new Parser(url);
        final NodeList tagNodeList = htmlParser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));
        for (int j = 0; j < tagNodeList.size(); j++) {
            final LinkTag loopLink = (LinkTag) tagNodeList.elementAt(j);
            final String loopLinkStr = loopLink.getLink();
            result.add(loopLinkStr);
        }
    } catch (ParserException e) {
        e.printStackTrace(); // TODO handle error
    }
    return result;
}

parser.setResource("http://www.youtube.com");
NodeList list = parser.parse(filter);
Node node = list.elementAt(0);

parser.setResource(url);
NodeList list = parser.parse(filter);
Node node = list.elementAt(0);
if (node instanceof BodyTag) {
    BodyTag tag = (BodyTag) node;

for (int i = 0; i < list1.size(); i++) {
    // A local variable declaration needs a block; it cannot be a bare for-body.
    Node n = list1.elementAt(i);
    if (n.getText().contains("PPFT"))