scalding-base/src/main/scala/com/twitter/scalding/mathematics/Matrix2.scala (7 lines):
- line 88: // TODO: complete the rest of the API to match the old Matrix API (many methods are effectively on the TypedPipe)
- line 95: * monoids, such as sketches like HyperLogLog, BloomFilters or CountMinSketch. TODO This is a special kind of
- line 109: .group // TODO we could be lazy with this group and combine with a sum
- line 240: // TODO:
- line 488: // TODO: optimize / combine with Sums: https://github.com/tomtau/scalding/issues/14#issuecomment-22971582
- line 543: // TODO: optimize sums of scalars like sums of matrices:
- line 597: // TODO: FunctionMatrix[R,C,V](fn: (R,C) => V) and a Literal scalar is just: FunctionMatrix[Unit, Unit, V]({ (_, _) => v })

scalding-core/src/main/scala/com/twitter/scalding/mathematics/Matrix.scala (6 lines):
- line 338: // TODO continually evaluate if this is needed to avoid OOM
- line 515: // TODO: Optimize this later and be lazy on groups and joins.
- line 644: // TODO optimize the number of reducers
- line 660: // TODO optimize the number of reducers
- line 677: // TODO optimize the number of reducers
- line 958: // TODO this should be tunable:

scalding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/CascadingBackend.scala (6 lines):
- line 142: // TODO we could probably optimize this further by just composing
- line 313: // TODO we can optimize a flatmapped input directly and skip some tupleconverters
- line 360: // TODO: a better optimization is to not materialize this
- line 400: // TODO: with diamonds in the graph, this might not be correct
- line 655: // TODO: this indirection may not be needed anymore, we could directly track config changes
- line 783: * TODO: most of the complexity of this method should be rewritten as an optimization rule that works on the

scalding-core/src/main/scala/com/twitter/scalding/FileSource.scala (5 lines):
- line 247: * TODO: consider writing a more in-depth version of this method in [[TimePathedSource]] that looks for
- line 248: * TODO: missing days / hours etc.
- line 263: // TODO support strict in Local
- line 308: * TODO this only does something for HDFS now. Maybe we should do the same for LocalMode
- line 433: // TODO Cascading doesn't support local mode yet

scalding-core/src/main/scala/com/twitter/scalding/mathematics/MatrixProduct.scala (5 lines):
- line 76: * TODO: Multiplication is the expensive stuff. We need to optimize the methods below: This object holds the
- line 309: // TODO: remove in 0.9.0, only here just for compatibility.
- line 381: // TODO: we should use the size hints to set the number of reducers:
- line 415: // TODO: we should use the size hints to set the number of reducers:
- line 451: // TODO: we should use the size hints to set the number of reducers:

scalding-base/src/main/scala/com/twitter/scalding/typed/WritePartitioner.scala (4 lines):
- line 373: // TODO: it is a bit unclear if a trap is allowed on the back of a reduce?
- line 385: // TODO: hashJoins may not be allowed in a reduce step in cascading,
- line 393: // TODO: hashJoins may not be allowed in a reduce step in cascading,
- line 405: // TODO: hashJoins may not be allowed in a reduce step in cascading,

scalding-spark/src/main/scala/com/twitter/scalding/spark_backend/SparkBackend.scala (4 lines):
- line 172: // TODO set a default in a better place
- line 187: // TODO we could optionally print out the descriptions
- line 350: // TODO handle descriptions in some way
- line 353: // TODO handle the number of reducers, maybe by setting the number of partitions

scalding-core/src/main/scala/com/twitter/scalding/Job.scala (3 lines):
- line 343: // TODO: Why the two ways to do stats? Answer: jank-den.
- line 362: // TODO design a better way to test stats.
- line 425: * TODO: once we have a mechanism to access FlowProcess from user functions, we can use this

build.sbt (3 lines):
- line 392: // TODO: split into scalding-protobuf
- line 400: // TODO: split this out into scalding-thrift
- line 403: // TODO: split this out into a scalding-scrooge

scalding-core/src/main/scala/com/twitter/scalding/TupleUnpacker.scala (2 lines):
- line 98: // TODO: filter by isAccessible, which somehow seems to fail
- line 105: // TODO: filter by isAccessible, which somehow seems to fail

scalding-base/src/main/scala/com/twitter/scalding/typed/TypedPipe.scala (2 lines):
- line 773: // TODO: literals like this defeat caching in the planner
- line 777: // TODO: literals like this defeat caching in the planner

scalding-core/src/main/scala/com/twitter/scalding/TuplePacker.scala (2 lines):
- line 51: * Packs a tuple into any object with set methods, e.g. thrift or proto objects. TODO: verify that protobuf
- line 90: // TODO: filter by isAccessible, which somehow seems to fail

scalding-core/src/main/scala/com/twitter/scalding/source/MaxFailuresCheck.scala (2 lines):
- line 22: // TODO: this should actually increment and read a Hadoop counter
- line 32: // TODO: use proper logging

scalding-base/src/main/scala/com/twitter/scalding/typed/memory_backend/MemoryPlanner.scala (2 lines):
- line 22: // TODO: counters not yet supported, but can be with a concurrent hashmap
- line 123: // TODO we could optionally print out the descriptions

scalding-hadoop-test/src/main/scala/com/twitter/scalding/platform/LocalCluster.scala (2 lines):
- line 121: // TODO I desperately want there to be a better way to do this. I'd love to be able to run ./sbt assembly and depend
- line 184: // TODO is there a way to know if we need to wait on anything to shut down, etc?
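The Matrix2 TODO at line 597 sketches a function-backed matrix, with a literal scalar as the degenerate `FunctionMatrix[Unit, Unit, V]`. A minimal illustration of that idea follows; the names and signatures here are assumptions for demonstration, not the actual Matrix2 API:

```scala
// Hypothetical sketch of the FunctionMatrix idea from the Matrix2 TODO:
// a matrix defined entirely by a function (R, C) => V. Nothing is
// materialized until an index is applied, which is why a literal scalar
// can be represented as a function ignoring its (Unit, Unit) index.
case class FunctionMatrix[R, C, V](fn: (R, C) => V) {
  def apply(r: R, c: C): V = fn(r, c)

  // Pointwise transform stays lazy: it just composes functions.
  def map[V2](g: V => V2): FunctionMatrix[R, C, V2] =
    FunctionMatrix[R, C, V2]((r, c) => g(fn(r, c)))
}

// A literal scalar, per the TODO: FunctionMatrix[Unit, Unit, V]({ (_, _) => v })
def scalar[V](v: V): FunctionMatrix[Unit, Unit, V] =
  FunctionMatrix[Unit, Unit, V]((_, _) => v)

val identity = FunctionMatrix[Int, Int, Int]((r, c) => if (r == c) 1 else 0)
val doubled  = identity.map(_ * 2)
println(doubled(3, 3))      // 2
println(scalar(42)((), ())) // 42
```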
scalding-json/src/main/scala/com/twitter/scalding/JsonLine.scala (2 lines):
- line 33: * TODO: it would be nice to have a way to add read/write transformations to pipes that doesn't require
- line 84: * TODO: at the next binary incompatible version remove the AbstractFunction2/scala.Serializable jank which

scalding-commons/src/main/scala/com/twitter/scalding/commons/scheme/CombinedSequenceFileScheme.scala (2 lines):
- line 8: // TODO Cascading doesn't support local mode yet
- line 15: // TODO Cascading doesn't support local mode yet

scalding-dagon/src/main/scala/com/twitter/scalding/dagon/Dag.scala (2 lines):
- line 452: // TODO: this computation is really expensive, 60% of CPU in a recent benchmark
- line 575: * Return the number of nodes that depend on the given Id, TODO we might want to cache these. We need to

scalding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/CascadingExtensions.scala (2 lines):
- line 221: // TODO we should do smarter logging here
- line 229: ) /* FIXME: should we start printing deprecation warnings ? It's okay to set manually c.f.*.class though */

scalding-base/src/main/scala/com/twitter/scalding/typed/OptimizationRules.scala (2 lines):
- line 317: * TODO: this could be more precise by combining more complex mapping operations into one large flatMap
- line 1105: ) => // TODO it is not clear this is safe in cascading 3, since oncomplete is an each

scalding-core/src/main/scala/com/twitter/scalding/FieldConversions.scala (2 lines):
- line 67: // TODO get the comparator also
- line 199: // TODO We could provide a reasonable conversion here by designing a rich type hierarchy such as

scalding-commons/src/main/scala/com/twitter/scalding/commons/source/TsvWithHeader.scala (2 lines):
- line 53: // TODO: move this method to make it a util function.
- line 79: // TODO: move this method to make it a util function.
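The OptimizationRules TODO at line 317 concerns fusing chains of mapping operations into one large flatMap. An illustrative sketch of that kind of fusion on plain functions (the real rules rewrite the TypedPipe expression graph, not function values):

```scala
// Sketch of map/filter fusion into a single flatMap pass (illustrative only).
// Each stage is represented as A => List[B]; fusing stages composes them
// into one function, so the data is traversed once instead of once per stage.
def mapStage[A, B](f: A => B): A => List[B]       = a => List(f(a))
def filterStage[A](p: A => Boolean): A => List[A] = a => if (p(a)) List(a) else Nil

// Fuse two stages: the result is itself a single flatMap-shaped stage.
def fuse[A, B, C](s1: A => List[B], s2: B => List[C]): A => List[C] =
  a => s1(a).flatMap(s2)

val fused = fuse(mapStage[Int, Int](_ * 2), filterStage[Int](_ > 4))
println(List(1, 2, 3).flatMap(fused)) // List(6): 1*2 and 2*2 fail the filter
```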
scalding-base/src/main/scala/com/twitter/scalding/typed/memory_backend/Op.scala (1 line):
- line 177: // TODO this is not by any means optimal.

scalding-date/src/main/scala/com/twitter/scalding/Duration.scala (1 line):
- line 28: // TODO: remove this in 0.9.0

scalding-spark/src/main/scala/com/twitter/scalding/spark_backend/Op.scala (1 line):
- line 222: // TODO: spark has something to send replicated data to nodes

scalding-core/src/main/scala/com/twitter/scalding/JobTest.scala (1 line):
- line 74: // TODO: Switch the following maps and sets from Source to String keys

scalding-hadoop-test/src/main/scala/com/twitter/scalding/platform/HadoopSharedPlatformTest.scala (1 line):
- line 36: // TODO is there a way to buffer such that we see test results AFTER afterEach? Otherwise the results

scalding-commons/src/main/scala/com/twitter/scalding/commons/source/VersionedKeyValSource.scala (1 line):
- line 47: // TODO: have two apply methods here for binary compatibility purpose. Need to clean it up in next release.

scalding-core/src/main/scala/com/twitter/scalding/Tracing.scala (1 line):
- line 34: // TODO: remove this once we no longer want backwards compatibility

scalding-base/src/main/scala/com/twitter/scalding/typed/Grouped.scala (1 line):
- line 133: // TODO: implement blockJoin

scalding-core/src/main/scala/com/twitter/scalding/WritableSequenceFile.scala (1 line):
- line 34: // TODO Cascading doesn't support local mode yet

scalding-parquet/src/main/java/com/twitter/scalding/parquet/tuple/ParquetTupleScheme.java (1 line):
- line 39: * Currently, only primitive types are supported. TODO: allow nested fields in the Parquet schema to be

scalding-core/src/main/scala/com/twitter/scalding/JoinAlgorithms.scala (1 line):
- line 410: // TODO: try replacing this with a Count-Min sketch.

scalding-hadoop-test/src/main/scala/com/twitter/scalding/platform/Scalatest.scala (1 line):
- line 40: // TODO is there a way to buffer such that we see test results AFTER afterEach? Otherwise the results

scalding-core/src/main/scala/com/twitter/scalding/Tool.scala (1 line):
- line 75: // TODO use proper logging

scalding-hraven/src/main/scala/com/twitter/scalding/hraven/estimation/HRavenHistoryService.scala (1 line):
- line 98: * TODO: query hRaven for successful jobs (first need to add ability to filter results in hRaven REST API)

scalding-core/src/main/scala/com/twitter/scalding/GroupBuilder.scala (1 line):
- line 191: // TODO this may be fixed in cascading later

scalding-serialization/src/main/scala/com/twitter/scalding/serialization/macros/impl/ordered_serialization/providers/StringOrderedBuf.scala (1 line):
- line 87: // TODO: investigate faster ways to encode UTF-8, if

scalding-core/src/main/scala/com/twitter/scalding/CascadingMode.scala (1 line):
- line 122: // TODO unlike newFlowConnector, this does not look at the Job.config

scalding-core/src/main/scala/com/twitter/scalding/source/CheckedInversion.scala (1 line):
- line 24: * stopping the job TODO: probably belongs in Bijection

scalding-dagon/src/main/scala/com/twitter/scalding/dagon/Expr.scala (1 line):
- line 34: * TODO: see the approach here: https://gist.github.com/pchiusano/1369239 Which seems to show a way to do

maple/src/main/java/com/twitter/maple/hbase/HBaseTap.java (1 line):
- line 222: // TODO: for now we don't do anything just to be safe

scalding-core/src/main/scala/com/twitter/scalding/RichFlowDef.scala (1 line):
- line 129: // TODO: make sure we handle checkpoints correctly

scalding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/HashJoiner.scala (1 line):
- line 55: // TODO: it might still be good to count how many there are and materialize

scalding-base/src/main/scala/com/twitter/scalding/typed/MultiJoinFunction.scala (1 line):
- line 95: // TODO: it might make sense to cache this in memory as an IndexedSeq and not

scripts/scald.rb (1 line):
- line 108: #TODO: Add error checking on CONFIG["default_mode"]?
scalding-core/src/main/scala/com/twitter/scalding/estimation/memory/SmoothedHistoryMemoryEstimator.scala (1 line):
- line 63: // TODO handle gc

scalding-serialization/src/main/scala/com/twitter/scalding/serialization/macros/impl/ordered_serialization/providers/TraversablesOrderedBuf.scala (1 line):
- line 121: // TODO it would be nice to capture one instance of this rather

scalding-base/src/main/scala/com/twitter/scalding/Execution.scala (1 line):
- line 845: ) // TODO do I need to do something here to make this cancellable?

scalding-core/src/main/scala/com/twitter/scalding/CoGroupBuilder.scala (1 line):
- line 37: // TODO: move the automatic renaming of fields here

scalding-args/src/main/scala/com/twitter/scalding/Args.scala (1 line):
- line 157: // TODO: if there are spaces in the keys or values, this will not round-trip

scalding-base/src/main/scala/com/twitter/scalding/typed/memory_backend/MemoryWriter.scala (1 line):
- line 62: * TODO If we have a typed pipe rooted twice, it is not clear it has fanout. If it does not we will not

scalding-beam/src/main/scala/com/twitter/scalding/beam_backend/BeamJoiner.scala (1 line):
- line 131: // TODO: it might make sense to cache this in memory as an IndexedSeq and not

scalding-date/src/main/scala/com/twitter/scalding/DateRange.scala (1 line):
- line 71: * TODO: This should be Range[RichDate, Duration] for an appropriate notion of Range
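The Args.scala TODO at line 157 notes that space-separated serialization does not round-trip when keys or values themselves contain spaces. A minimal sketch of the failure mode, using a simplified stand-in parser rather than Scalding's actual Args implementation:

```scala
// Simplified stand-in for Args-style serialization (NOT Scalding's code):
// join pairs as "--key value" separated by spaces, then re-parse by
// splitting on spaces. A key containing a space cannot survive the trip.
def render(m: Map[String, String]): String =
  m.map { case (k, v) => s"--$k $v" }.mkString(" ")

def parse(s: String): Map[String, String] = {
  def loop(ts: List[String], acc: Map[String, String]): Map[String, String] =
    ts match {
      case key :: rest if key.startsWith("--") =>
        // All tokens up to the next "--key" are treated as this key's value.
        val (vals, more) = rest.span(t => !t.startsWith("--"))
        loop(more, acc + (key.drop(2) -> vals.mkString(" ")))
      case _ :: rest => loop(rest, acc)
      case Nil       => acc
    }
  loop(s.split(" ").toList, Map.empty)
}

val original = Map("my key" -> "v")
println(render(original))        // --my key v
println(parse(render(original))) // Map(my -> key v): the space in the key was lost
```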