
Aerospike Database 6.1: Moving forward on query and data distribution

Secondary index improvements and enhanced throughput for XDR

Aerospike is proud to announce that Aerospike Database 6.1 is now Generally Available (GA). It provides additional secondary index features and enhanced Cross-Datacenter Replication (XDR) throughput for rehydration and recovery use cases. This release builds on our 6.0 and 5.X releases, where secondary indexes were re-architected and the XDR subsystem was improved to enable fine-grained control and active-active datacenter configurations.

This release is a big step forward in support of more complex queries for real-time analytics as well as further establishing our industry-leading global data distribution capabilities.

Secondary indexes on nested elements of documents

Database 6.1 brings secondary index support to nested elements within a Map Collection Data Type (CDT), traditionally used to store JSON documents. This enhances Aerospike’s query capability when using a document modeling approach. Combined with the new cardinality statistics (see below), it makes Aerospike SQL Powered by Starburst even more powerful.

We significantly enhanced our query capabilities with the release of Aerospike Database 6, rearchitecting secondary indexes in alignment with the design of our primary index. Database 6.1 removes the limitation that only the top-level elements of a JSON document contained in a Map CDT could be indexed. This allows developers to accelerate queries for documents where the predicate is matched against elements nested at arbitrary depths.
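To make the idea concrete, here is a minimal Python sketch of what indexing a nested element means. It is a conceptual illustration only, not Aerospike's implementation or client API: the record layout, the bin name `doc`, the path, and the helpers `get_nested` and `build_index` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each holds a JSON-like document in a map bin named "doc".
records = {
    "k1": {"doc": {"location": {"city": "Austin", "state": "TX"}, "shape": "disk"}},
    "k2": {"doc": {"location": {"city": "Reno", "state": "NV"}, "shape": "light"}},
    "k3": {"doc": {"location": {"city": "Dallas", "state": "TX"}, "shape": "disk"}},
}


def get_nested(value, path):
    """Walk a sequence of map keys; return None if any step is missing."""
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def build_index(records, bin_name, path):
    """Map each value found at `path` inside `bin_name` to the record keys holding it."""
    index = defaultdict(set)
    for rec_key, bins in records.items():
        value = get_nested(bins.get(bin_name), path)
        if value is not None:
            index[value].add(rec_key)
    return index


# Index the nested element doc -> location -> state, then look up one value.
state_idx = build_index(records, "doc", ["location", "state"])
matches = sorted(state_idx["TX"])
print(matches)
```

A query whose predicate targets the nested `state` element can then consult the index instead of scanning every document.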

Warm start support for secondary indexes

Database 6.1 also significantly reduces the operational impact associated with using secondary indexes. In Aerospike Database Enterprise Edition (EE), secondary indexes will now be stored in shared memory, similar to the primary index. This means that secondary index use in EE will no longer slow down warm restarts. (This is an EE feature and is not available to users of Community Edition.)

This new secondary index functionality allows for wider application of secondary indexes at scale. When a cluster has multiple secondary indexes, restarting nodes during a rolling upgrade previously meant rebuilding all of them. With the secondary indexes placed in shared memory, they can be restored from that memory rather than rebuilt, which significantly shortens the time to restart a node when a warm start is performed. However, on a complete node shutdown and restart, all secondary indexes must still be rebuilt through a data scan.

Support for indexing a whole namespace

Previously, a secondary index call without a set name would apply only to the “set of things not in a set,” that is, records that belong to no set. With the release of Database 6.1, when querying on a set, a secondary index on that set is preferred if it exists and matches. If not, a whole-namespace secondary index is used (should such an index exist and match). This aligns secondary index behavior with the primary index.
(Please read the special upgrade instructions for Database 6.1 if you have secondary indexes that do not include a set name.)
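The selection order described above can be sketched in a few lines of Python. This is one reading of the stated behavior, not server code: the `IndexDef` type, the `choose_index` helper, and the index names are hypothetical, and a `set_name` of `None` is assumed here to mean a whole-namespace index.

```python
from typing import NamedTuple, Optional


class IndexDef(NamedTuple):
    name: str
    namespace: str
    set_name: Optional[str]  # None is assumed to mean "covers the whole namespace"
    bin_name: str


def choose_index(indexes, namespace, set_name, bin_name):
    """Prefer an index on the queried set; otherwise fall back to a namespace-wide one."""
    candidates = [
        idx for idx in indexes
        if idx.namespace == namespace and idx.bin_name == bin_name
    ]
    # First choice: an index defined on the set being queried.
    for idx in candidates:
        if set_name is not None and idx.set_name == set_name:
            return idx
    # Fallback: an index defined on the whole namespace.
    for idx in candidates:
        if idx.set_name is None:
            return idx
    return None


indexes = [
    IndexDef("ns_occurred_idx", "sandbox", None, "occurred"),
    IndexDef("ufo_occurred_idx", "sandbox", "ufodata", "occurred"),
]
on_set = choose_index(indexes, "sandbox", "ufodata", "occurred")
other_set = choose_index(indexes, "sandbox", "sightings", "occurred")
no_match = choose_index(indexes, "sandbox", "ufodata", "shape")
print(on_set, other_set, no_match)
```

A query on `ufodata` picks the set-level index, a query on a set with no index of its own falls back to the namespace-wide one, and a query on an unindexed bin finds nothing.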

Support for index cardinality information through info call

We constantly review our architecture for improvements, and in Database 6.1 we have added cardinality information for our indexes. It is available through an info command, which allows our Aerospike Presto/Trino Connector, Aerospike SQL Powered by Starburst, and our Spark Connector to pass the cardinality of each applicable secondary index to the planning and optimization modules of Presto and Spark SQL in situations where multiple secondary indexes are available.

To see how you might take advantage of the cardinality statistics when writing a query using the Aerospike APIs, consider the following commands, which create a numeric secondary index and then retrieve its statistics:

  • asadm -e 'enable;manage sindex create numeric occurred_idx ns sandbox set ufodata bin occurred'
  • asadm -e 'show statistics sindex for sandbox occurred_idx'
  • Similarly, asadm -e "asinfo -v 'sindex-stat:ns=sandbox; indexname=occurred_idx' "

The statistics include two cardinality values:

  • entries_per_bval – the ratio of entries to unique bin values for a given secondary index on the node
  • entries_per_rec – the ratio of entries to unique records for a given secondary index. Note that this will always be 1 if it is not a list or map index.

Values are integers (rounded to the nearest integer) and are calculated using HyperLogLog estimates for the unique bvals and recs, respectively. A background process generates the statistics. A value of zero (0) means the statistic has not yet been generated. The process runs at startup, every hour thereafter, and whenever a secondary index is created and populated.

The primary motivation for the statistic is to help choose which secondary index to use, taking the lower entries_per_bval ratio as an indication of the stronger filter within the query.
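As a rough illustration of how the two ratios behave and how they could guide index choice, here is a small Python sketch. It uses exact counts rather than the HyperLogLog estimates the server uses, and the sample data and the helpers `index_stats` and `pick_stronger_filter` are hypothetical.

```python
def index_stats(entries):
    """Compute both ratios for one index.

    entries: list of (bin_value, record_key) pairs held by the index.
    """
    if not entries:
        # Mirror the convention that 0 means "not generated yet".
        return {"entries_per_bval": 0, "entries_per_rec": 0}
    unique_bvals = {bval for bval, _ in entries}
    unique_recs = {rec for _, rec in entries}
    return {
        "entries_per_bval": round(len(entries) / len(unique_bvals)),
        "entries_per_rec": round(len(entries) / len(unique_recs)),
    }


def pick_stronger_filter(stats_by_index):
    """Return the index name with the lowest non-zero entries_per_bval, or None."""
    usable = {
        name: stats for name, stats in stats_by_index.items()
        if stats["entries_per_bval"] > 0
    }
    if not usable:
        return None
    return min(usable, key=lambda name: usable[name]["entries_per_bval"])


# Hypothetical data: six records indexed on two different bins.
shape_entries = [("disk", "k1"), ("disk", "k2"), ("disk", "k3"),
                 ("light", "k4"), ("light", "k5"), ("light", "k6")]
occurred_entries = [(20200101 + i, f"k{i + 1}") for i in range(6)]

stats = {
    "shape_idx": index_stats(shape_entries),
    "occurred_idx": index_stats(occurred_entries),
}
best = pick_stronger_filter(stats)
print(stats, best)
```

In this toy data each `occurred` value matches one entry while each `shape` value matches three, so the `occurred` index is the stronger filter.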

By providing information on the cardinality of indexes, Aerospike connectors, as well as developers, can optimize query performance in complex queries or analytics.

Secondary index names limited to 64 characters

Starting with Database 6.1, a secondary index name cannot exceed 64 characters in length. An index with a longer name will fail to be created by the 6.1 nodes after an upgrade and restart, so you will need to recreate it with a shorter name. (Please read the special upgrade instructions for Database 6.1 if you are using secondary indexes.)
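Before upgrading, it may help to check existing index names against the limit. The following Python sketch assumes you have already collected the names into a list by whatever means you prefer; the helper `names_too_long` and the sample names are hypothetical, and length is counted in characters as stated above.

```python
MAX_SINDEX_NAME_LEN = 64  # limit stated for Database 6.1


def names_too_long(index_names, limit=MAX_SINDEX_NAME_LEN):
    """Return the names that exceed the limit and would need to be recreated."""
    return [name for name in index_names if len(name) > limit]


# Hypothetical index names: one short, one exactly at the limit, one over it.
existing = ["occurred_idx", "a" * 64, "b" * 65]
offenders = names_too_long(existing)
print(offenders)
```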

Enhanced throughput for XDR

Aerospike Database 6.1 improves XDR throughput when XDR enters recovery mode to catch up after network disruptions or when a rewind of a namespace is triggered from a specified last-update-time (LUT). This is particularly beneficial when hydrating new clusters or transferring data between clusters when there is considerable write activity.

XDR uses multiple threads per node to service the queue of changes and ship them to the destination cluster. When network disruptions or bursts of activity happen, XDR can fall behind and switch to recovery mode in order to catch up. Note that this is the same mode used when you rewind from a LUT or rehydrate one cluster from another.

In all of these cases, read lock contention can become an issue. In Database 6.1, we have optimized the recovery/rehydration code path by reducing the number of threads and thereby reducing the lock contention. This improves performance by up to an order of magnitude for typical deployment topologies.

This is a significant improvement when the Aerospike Database is used as a global data distribution point and clusters are rehydrated from a LUT.

Summary

We think you’ll agree that Aerospike Database 6.1 demonstrates our continued focus on delivering the highest performance at the lowest cost and the ability to scale from gigabytes to petabytes and from thousands of transactions to millions of transactions per second. Aerospike Database 6.1 provides significant new functionality in support of queries across large data sets with the lowest latency and highest concurrent throughput of any non-relational database. And with the new support for cardinality statistics, Aerospike SQL Powered by Starburst delivers even more efficient SQL-based reporting and analytic functionality to leverage your real-time data. The increase in XDR throughput in 6.1 delivers even more power for our customers’ data distribution needs, providing an unprecedented ability to move and replicate ever-larger amounts of data at high speeds.

For more information:

See Release notes

Try in our Code Sandbox (6.1 features coming very soon)

The post Aerospike Database 6.1: Moving forward on query and data distribution appeared first on SD Times.



from SD Times https://ift.tt/dC1RLHy
