public static Fields copyComparators( Fields toFields, Fields... fromFields )
{
  for( Fields fromField : fromFields )
  {
    for( Comparable field : fromField )
    {
      Comparator comparator = fromField.getComparator( field );

      if( comparator != null )
        toFields.setComparator( field, comparator );
    }
  }

  return toFields;
}
private Fields createFields()
{
  Fields fields = new Fields();
  RelNode child = getChild();
  RelDataType inputRowType = child.getRowType();

  for( RexNode exp : fieldExps )
  {
    int index = ( (RexInputRef) exp ).getIndex();
    RelDataTypeField typeField = inputRowType.getFieldList().get( index );
    String name = typeField.getName();

    fields = fields.append( new Fields( name ) );
  }

  for( RelFieldCollation fieldCollation : collation.getFieldCollations() )
  {
    String name = inputRowType.getFieldList().get( fieldCollation.getFieldIndex() ).getName();
    boolean isDescending = fieldCollation.getDirection() == RelFieldCollation.Direction.Descending;
    boolean isNullsFirst = fieldCollation.nullDirection == RelFieldCollation.NullDirection.FIRST;
    Comparator<Comparable> comparator = Functions.<Comparable>nullsComparator( isNullsFirst, isDescending );

    if( comparator != null )
      fields.setComparator( name, comparator );
  }

  return fields;
}
} // closes the enclosing class (declaration not shown)
public Branch visitChild( Stack stack )
{
  Branch lhsBranch = ( (CascadingRelNode) left ).visitChild( stack );
  Branch rhsBranch = ( (CascadingRelNode) right ).visitChild( stack );

  Pipe leftPipe = new Pipe( "lhs", lhsBranch.current );
  leftPipe = stack.addDebug( this, leftPipe, "lhs" );

  Pipe rightPipe = new Pipe( "rhs", rhsBranch.current );
  rightPipe = stack.addDebug( this, rightPipe, "rhs" );

  Fields lhsGroup = createTypedFieldsSelectorFor( getCluster(), leftKeys, left.getRowType(), true );
  Fields rhsGroup = createTypedFieldsSelectorFor( getCluster(), rightKeys, right.getRowType(), true );

  NullNotEquivalentComparator comparator = new NullNotEquivalentComparator();

  for( int i = 0; i < lhsGroup.size(); i++ )
    lhsGroup.setComparator( i, comparator );

  Joiner joiner = getJoiner();

  Fields declaredFields = RelUtil.createTypedFieldsFor( this, false );

  // need to parse lhs rhs fields from condition
  String name = stack.getNameFor( CoGroup.class, leftPipe, rightPipe );

  Pipe coGroup = new CoGroup( name, leftPipe, lhsGroup, rightPipe, rhsGroup, declaredFields, joiner );
  coGroup = stack.addDebug( this, coGroup );

  return new Branch( coGroup, lhsBranch, rhsBranch );
}
groupFields.setComparator( 0, comparator );
@Test
public void testCompare()
{
  Fields fields = new Fields( "a" );
  fields.setComparator( "a", comparator );

  Tuple aTuple = new Tuple( "a" );
  Tuple bTuple = new Tuple( "b" );

  assertTrue( "not less than: aTuple < bTuple", fields.compare( aTuple, bTuple ) < 0 );
  assertTrue( "not less than: bTuple < aTuple", fields.compare( bTuple, aTuple ) > 0 );

  aTuple.add( "b" );

  assertTrue( "not greater than: aTuple > bTuple", fields.compare( aTuple, bTuple ) > 0 );

  aTuple = new Tuple( bTuple, "a" );

  assertTrue( "not greater than: aTuple > bTuple", fields.compare( aTuple, bTuple ) > 0 );
}
@Test
public void testGroupByInsensitive() throws Exception
{
  getPlatform().copyFromLocal( inputFileLower );
  getPlatform().copyFromLocal( inputFileUpper );

  Tap sourceLower = getPlatform().getDelimitedFile( new Fields( "num", "char" ), " ", inputFileLower );
  Tap sourceUpper = getPlatform().getDelimitedFile( new Fields( "num", "char" ), " ", inputFileUpper );

  Map sources = new HashMap();

  sources.put( "lower", sourceLower );
  sources.put( "upper", sourceUpper );

  Tap sink = getPlatform().getTextFile( new Fields( "line" ), getOutputPath( "insensitivegrouping" + NONDETERMINISTIC ), SinkMode.REPLACE );

  Pipe pipeLower = new Pipe( "lower" );
  Pipe pipeUpper = new Pipe( "upper" );

  Pipe merge = new Merge( pipeLower, pipeUpper );

  Fields charFields = new Fields( "char" );
  charFields.setComparator( "char", new LowerComparator() );

  Pipe splice = new GroupBy( "groupby", merge, charFields );

  splice = new Every( splice, new Fields( "char" ), new Count() );

  Flow flow = getPlatform().getFlowConnector().connect( sources, sink, splice );

  flow.complete();

  // we can't guarantee if the grouping key will be upper or lower
  validateLength( flow, 5, 1, Pattern.compile( "^\\w+\\s2$" ) );
}
} // closes the enclosing class (declaration not shown)
num.setComparator( "num", new AllComparator() );
@Test
public void testFirstBy() throws IOException
{
  getPlatform().copyFromLocal( inputFileCross );

  Tap source = getPlatform().getDelimitedFile( new Fields( "num", "lower", "upper" ), " ", inputFileCross );
  Tap sink = getPlatform().getDelimitedFile( new Fields( "num", "lower", "upper" ), "\t",
    new Class[]{Integer.TYPE, String.class, String.class}, getOutputPath( "firstnfields" ), SinkMode.REPLACE );

  Pipe pipe = new Pipe( "first" );

  Fields charFields = new Fields( "lower", "upper" );
  charFields.setComparator( "lower", Collections.reverseOrder() );

  pipe = new FirstBy( pipe, new Fields( "num" ), charFields, 2 );

  Flow flow = getPlatform().getFlowConnector().connect( source, sink, pipe );

  flow.complete();

  Tuple[] results = new Tuple[]{
    new Tuple( 1, "c", "A" ),
    new Tuple( 2, "d", "B" ),
    new Tuple( 3, "c", "C" ),
    new Tuple( 4, "d", "B" ),
    new Tuple( 5, "e", "A" )
  };

  TupleEntryIterator iterator = flow.openSink();
  int count = 0;

  while( iterator.hasNext() )
    assertEquals( results[ count++ ], iterator.next().getTuple() );

  assertTrue( !iterator.hasNext() );

  iterator.close();
}
groupApache.setComparator( "octet", getPlatform().getLongComparator( reverseSort ) );
groupIP.setComparator( "rawoctet", getPlatform().getLongComparator( reverseSort ) );
private void runComprehensiveCase( Boolean[] testCase, boolean useCollectionsComparator ) throws IOException
{
  getPlatform().copyFromLocal( inputFileCrossNulls );

  String test = Util.join( testCase, "_", true ) + "_" + useCollectionsComparator;
  String path = "comprehensive/" + test;

  Tap source = getPlatform().getTextFile( new Fields( "line" ), inputFileCrossNulls );
  Tap sink = getPlatform().getDelimitedFile( new Fields( "num", "lower", "upper" ).applyTypes( Long.class, String.class, String.class ),
    " ", getOutputPath( path ), SinkMode.REPLACE );

  sink.getScheme().setNumSinkParts( 1 );

  Pipe pipe = new Pipe( "comprehensivesort" );

  pipe = new Each( pipe, new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), "\\s" ) );
  pipe = new Each( pipe, new Fields( "num" ), new Identity( Long.class ), Fields.REPLACE );

  Fields groupFields = new Fields( "num" );

  if( testCase[ 0 ] )
    groupFields.setComparator( "num", useCollectionsComparator ? new NullSafeReverseComparator() : getPlatform().getLongComparator( true ) );

  Fields sortFields = null;

  if( testCase[ 1 ] != null )
  {
    sortFields = new Fields( "upper" );

    if( testCase[ 1 ] )
      sortFields.setComparator( "upper", useCollectionsComparator ? new NullSafeReverseComparator() : getPlatform().getStringComparator( true ) );
  }

  pipe = new GroupBy( pipe, groupFields, sortFields, testCase[ 2 ] );

  Map<Object, Object> properties = getProperties();

  if( getPlatform().isMapReduce() && getPlatform().getNumMapTasks( properties ) != null )
    getPlatform().setNumMapTasks( properties, 13 );

  Flow flow = getPlatform().getFlowConnector().connect( source, sink, pipe );

  flow.complete();

  validateCase( test, testCase, sink );
}
@Test
public void testSimpleGroupOnBytes() throws Exception
{
  getPlatform().copyFromLocal( inputFileApache );

  Tap source = new Hfs( new TextLine( new Fields( "offset", "line" ) ), inputFileApache );

  Pipe pipe = new Pipe( "test" );

  pipe = new Each( pipe, new Fields( "line" ), new RegexParser( new Fields( "ip" ), "^[^ ]*" ), new Fields( "ip" ) );
  pipe = new Each( pipe, new InsertRawBytes( new Fields( "bytes" ), "inserted text as bytes", true, true ), Fields.ALL );

  Fields bytes = new Fields( "bytes" );
  bytes.setComparator( "bytes", new BytesComparator() );

  pipe = new GroupBy( pipe, bytes );
  pipe = new Every( pipe, new Count(), new Fields( "bytes", "count" ) );

  Tap sink = new Hfs( new SequenceFile( Fields.ALL ), getOutputPath( "grouponbytes" ), SinkMode.REPLACE );

  Map<Object, Object> properties = getProperties();

  TupleSerializationProps.addSerialization( properties, BytesSerialization.class.getName() );

  Flow flow = getPlatform().getFlowConnector( properties ).connect( source, sink, pipe );

  flow.complete();

  validateLength( flow, 10 ); // 10 unique counts
}
num.setComparator( "num", Collections.reverseOrder() );
@Test
public void testEquals()
{
  assertEquals( new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "b" ) ),
    new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "b" ) ) );

  Fields fields = new Fields( "a", "b" );
  fields.setComparator( "b", new StringComparator() );

  assertEquals( new TupleEntry( fields, new Tuple( "a", "b" ) ),
    new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "B" ) ) );

  // Note: assertNotSame checks reference identity only, so the two assertions
  // below pass for any pair of distinct instances regardless of equals().
  assertNotSame( new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "B" ) ),
    new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "b" ) ) );
  assertNotSame( new TupleEntry( new Fields( "a", "B" ), new Tuple( "a", "b" ) ),
    new TupleEntry( new Fields( "a", "b" ), new Tuple( "a", "b" ) ) );
}