State of Cassandra, 2011

jbellis
  • 1. State of Cassandra 2011
      • Jonathan Ellis
      • Apache Chair
      • CTO, DataStax
  • 2. Job Trends from Indeed.com
  • 3. Customers and Verticals
      • Financial
      • Social Media
      • Advertising
      • Entertainment
      • Energy
      • E-tail
      • Health care
      • Government
  • 4. Why?
  • 5. (slide without text)
  • 6. Why Cassandra?
  • 7. Better technology
      • Multi-master, multi-DC
      • Linearly scalable
      • Larger-than-memory datasets
      • Best-in-class performance (not just writes!)
      • Fully durable
      • Integrated caching
      • Tunable consistency
  • 8. Tunable Consistency
      • Write levels: ANY, ONE, LOCAL_QUORUM, QUORUM, ALL
      • Read levels: ONE, LOCAL_QUORUM, QUORUM, ALL
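One way to see how these levels trade off: a read is guaranteed to see the latest acknowledged write when the replicas contacted for the read and for the write must overlap, that is, when R + W > N for replication factor N. The Python sketch below is an illustration only, not Cassandra code. The function names are hypothetical, and it assumes a single data center, so LOCAL_QUORUM is left out.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements a level waits for (single-DC simplification)."""
    if level == "ANY":
        return 0  # write-only level: a stored hint is enough, no replica is guaranteed
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    if read_level == "ANY":
        raise ValueError("ANY is a write-only level")
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads overlap (2 + 2 > 3), while ONE with ONE does not (1 + 1 ≤ 3).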
  • 9. Generalizes Easily to Multi-DC
  • 10. 0.7
      • CREATE COLUMN FAMILY
      • Expiring columns (TTL)
      • Secondary (column) indexes
      • Efficient streaming
  • 11. 0.8
      • CQL
      • Counters
      • Automatic memtable tuning
      • New bulk load interface
  • 12. A performance retrospective
  • 13. October 8, 2011: Road to 1.0
  • 14. Theme: polish
      • Repair
      • Compaction
      • Optimize reads for update-heavy workloads
      • CQL 1.1
  • 15. Repair
      • Consistency is checked per-ColumnFamily but data is transferred per-Keyspace
      • Merkle tree requests are sent en masse, but may not start executing at the same time
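For readers unfamiliar with the Merkle trees mentioned on this slide, the sketch below shows the general idea: two replicas each hash their data into a tree, then compare trees from the root down, so only the ranges whose hashes differ need to be transferred. This is a generic Python illustration, not Cassandra's implementation. The function names, the use of SHA-256, and the requirement that the number of ranges be a power of two are all assumptions made for brevity.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(ranges: list[bytes]) -> list[list[bytes]]:
    """Return hash levels from the leaves (index 0) up to the root (last)."""
    if not ranges or len(ranges) & (len(ranges) - 1):
        raise ValueError("this sketch needs a power-of-two number of ranges")
    level = [_h(r) for r in ranges]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_ranges(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root, visiting only subtrees whose hashes differ."""
    if len(a) != len(b):
        raise ValueError("trees must have the same depth")
    out = []
    stack = [(len(a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        lvl, idx = stack.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue  # identical subtree, nothing to repair here
        if lvl == 0:
            out.append(idx)
        else:
            stack.extend([(lvl - 1, 2 * idx), (lvl - 1, 2 * idx + 1)])
    return sorted(out)
```

If two replicas hold four ranges and differ only in the third, `differing_ranges` reports index 2 alone, so only that range would need to be streamed.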
  • 16. Compression
      • Rows-per-block or blocks-per-row
  • 17. Read Performance: Compaction
  • 18. Level-based Compaction
      • SSTables are non-overlapping within a level
      • Bounds the number of SSTables that can contain a given row
      • Level sizes: L0: newly flushed; L1: 100 MB; L2: 1000 MB
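To make the bound concrete: because SSTables within a level do not overlap, a row can live in at most one SSTable per level from L1 upward, plus any number of freshly flushed L0 tables. The Python sketch below illustrates this under stated assumptions. The tenfold growth per level is inferred from the 100 MB and 1000 MB figures on the slide, and the function names and the (first_key, last_key) representation of an SSTable are hypothetical.

```python
def level_capacity_mb(level: int, l1_mb: int = 100, fanout: int = 10) -> int:
    """Target size of a level, assuming each level is `fanout` times the last."""
    if level < 1:
        raise ValueError("L0 holds newly flushed SSTables and has no size target here")
    return l1_mb * fanout ** (level - 1)


def candidate_sstables(levels, key):
    """Return (level, (first_key, last_key)) for every SSTable that may hold `key`.

    levels[0] is L0, whose tables may overlap; tables in levels[1:] are
    assumed non-overlapping, so each contributes at most one candidate.
    """
    hits = []
    for n, tables in enumerate(levels):
        for first, last in tables:
            if first <= key <= last:
                hits.append((n, (first, last)))
    return hits
```

For example, with two overlapping L0 tables and one matching table in each of L1 and L2, a lookup touches four SSTables no matter how many tables the lower levels hold.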
  • 19. Read performance: maxtimestamp
      • Sort sstables by maximum (client-provided) timestamp
      • Only merge sstables until we have the columns requested
      • Allows pre-merging highly fragmented rows without waiting for compaction
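The early-exit idea on this slide can be sketched as follows: visit sstables from newest to oldest maximum timestamp, and stop once every requested column already has a version newer than anything the remaining sstables could contain. This Python sketch is a simplified illustration, not Cassandra's code. The `SSTable` class, its fields, and `read_columns` are hypothetical names, and each sstable is assumed to hold at least one column.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    columns: dict  # column name -> (value, client-provided timestamp)

    @property
    def max_timestamp(self) -> int:
        return max(ts for _, ts in self.columns.values())


def read_columns(sstables, wanted):
    """Return ({column: value}, number of sstables actually read)."""
    result = {}            # column name -> (value, timestamp) of newest version seen
    remaining = set(wanted)
    scanned = 0
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        # A column is settled once its known version is newer than
        # every timestamp this (and any older) sstable can hold.
        remaining = {name for name in remaining
                     if name not in result or result[name][1] <= table.max_timestamp}
        if not remaining:
            break
        scanned += 1
        for name in remaining:
            hit = table.columns.get(name)
            if hit is not None and (name not in result or hit[1] > result[name][1]):
                result[name] = hit
    return {name: value for name, (value, _) in result.items()}, scanned
```

With three sstables whose newest versions of the requested columns sit in the two most recent tables, the oldest table is never read.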
  • 20. CQL
      cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
             KEY | birth_date |         full_name | state |
      bsanderson |       1975 | Brandon Sanderson |    UT |
  • 21. CQL 1.1
      • ALTER
      • Counter support
      • TTL support
      • Compound columns
      • Prepared statements
  • 22. Post-1.0
      • Ease of use
      • Ease of use
      • Ease of use
  • 23. Post-1.0 features
      • “Native” CQL transport
      • Triggers
      • Entity groups
      • Smarter range queries
  • 24. Brisk
      • Analytics for your realtime data without ETL
      • Widens scope of Cassandra’s applicability
      • Also: Solandra
  • 25. Questions?