The Linux Page

Elassandra and the "discover" feature which can't be overridden!

Sling — an elastic and a piece of wood

The Problem with Mapping

Today I spent about 3 hours looking into why I could not set up a certain field as text. The newer versions of Elassandra (Elasticsearch, really) make use of keyword and text instead of string for text-like data. Only, by default the system infers those as list<text> and not just text. I'm not exactly sure why it does that, though.

Anyway, after many questions to Google, I found out that you have to tell the search system that your field is a singleton or it is going to view it as a container of type list. To do that, you just have to add cql_collection: "singleton" to the field definition. That being said, it took me forever to find the problem, in part because the "discover: ..." feature actually ignores overrides, even though combining the two is shown in many examples all over the Internet and in the Elasticsearch documentation I've seen.

My How To

Say I have a simple table like this:

CREATE TABLE my_table (
    uuid_field uuid PRIMARY KEY,
    text_field text
);

To make it work with Elassandra, we need to create a custom index:

CREATE CUSTOM INDEX elastic_my_table_idx
                 ON my_search.my_table ()
              USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';

For a reason I can't explain at this time, the custom indices that are created by express-cassandra do not get removed when I do a DELETE with the Elasticsearch REST API. However, for a table I created myself, the DELETE removes that index each time and I have to re-create it manually as shown above.

Next is the command to set up the `text_field`:

curl -XPUT \
     -H 'Content-Type: application/json' \
     'http://localhost:9200/my_search/?pretty' \
     -d '{ "mappings": { "my_table": { "properties": { "text_field": { "type": "keyword" } } } } }'

Unfortunately, this returns an error, which I still don't understand:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "i_o_exception",
        "reason" : "Existing column [text_field] type [text] mismatch with inferred type [list<text>]"
      }
    ],
    "type" : "i_o_exception",
    "reason" : "Existing column [text_field] type [text] mismatch with inferred type [list<text>]",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Existing column [text_field] type [text] mismatch with inferred type [list<text>]"
    }
  },
  "status" : 500
}

Why infer list<text> when the table's text_field clearly uses text as its type?

cqlsh:my_search> DESCRIBE my_table;
    
CREATE TABLE my_search.my_table (
    uuid_field uuid PRIMARY KEY,
    text_field text
) WITH bloom_filter_fp_chance = 0.01
      AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
      AND comment = ''
      AND compaction = {
            'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
            'max_threshold': '32',
            'min_threshold': '4' }
      AND compression = {
            'chunk_length_in_kb': '64',
            'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
      AND crc_check_chance = 1.0
      AND dclocal_read_repair_chance = 0.1
      AND default_time_to_live = 0
      AND gc_grace_seconds = 864000
      AND max_index_interval = 2048
      AND memtable_flush_period_in_ms = 0
      AND min_index_interval = 128
      AND read_repair_chance = 0.0
      AND speculative_retry = '99PERCENTILE';

Further, if I ask for a default mapping using the `discover` feature like so:

curl -XPUT \
     -H 'Content-Type: application/json' \
     'http://localhost:9200/my_search/_mapping/my_table?pretty' \
     -d '{"discover":".*"}'

It _works_, except that the column is assigned the default `keyword` type instead of text. Notice that in this case the system does not make the column a container: it uses a plain text and not a list<text>.

Here is the curl to check out the mapping:

curl -XGET \
     -H 'Content-Type: application/json' \
     'http://localhost:9200/my_search/_mapping/my_table?pretty'

And we see that the "type" field is set to "keyword", which is correct in this case since that's the expected default.

{
  "my_search" : {
    "mappings" : {
      "my_table" : {
        "properties" : {
          "text_field" : {
            "type" : "keyword",
            "cql_collection" : "singleton"
          },
          "uuid_field" : {
            "type" : "keyword",
            "cql_collection" : "singleton",
            "cql_partition_key" : true,
            "cql_primary_key_order" : 0
          }
        }
      }
    }
  }
}
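
To compare what discover generated with what was expected, the response can be flattened into one line per column. The following is only a minimal sketch in JavaScript (Node.js): summarizeMapping is a hypothetical helper, not part of Elassandra or express-cassandra, and it assumes the response has exactly the shape shown above.

```javascript
// Sample response, copied from the GET output above.
const mappingResponse = {
  my_search: {
    mappings: {
      my_table: {
        properties: {
          text_field: { type: 'keyword', cql_collection: 'singleton' },
          uuid_field: {
            type: 'keyword',
            cql_collection: 'singleton',
            cql_partition_key: true,
            cql_primary_key_order: 0
          }
        }
      }
    }
  }
};

// Hypothetical helper: returns one "column: type (collection)" line per field.
function summarizeMapping(response, keyspace, table) {
  const properties = response[keyspace].mappings[table].properties;
  return Object.keys(properties).map((name) => {
    const field = properties[name];
    // Label fields without an explicit cql_collection so they stand out.
    const collection = field.cql_collection || 'no cql_collection';
    return `${name}: ${field.type} (${collection})`;
  });
}

console.log(summarizeMapping(mappingResponse, 'my_search', 'my_table').join('\n'));
```

A field that shows up with "no cql_collection" is one to look at more closely before overriding its type.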

So, to fix my problem, instead of mixing discover with explicit types for one or two fields, I now define each field by hand. No choice.

Here is an example where I have a value that I want to be searchable as a text field (search any part of that text instead of matching the entire field as a keyword). This is the definition I used along with the discover option:

es_index_mapping: {
    discover: '.*',
    properties: {
        text_field: {
            type: 'text',
            index: true
        }
    }
}

This one is also missing the cql_collection: "singleton" definition.
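
That kind of omission is easy to catch before the mapping is ever sent. Here is a small sketch, assuming the mapping is a plain JavaScript object like the one above; fieldsMissingCqlCollection is a hypothetical helper written for this illustration and does not exist in express-cassandra or Elassandra.

```javascript
// The mapping from above: discover plus one override, without cql_collection.
const brokenMapping = {
  discover: '.*',
  properties: {
    text_field: { type: 'text', index: true }
  }
};

// Hypothetical helper: names of the properties that lack cql_collection.
function fieldsMissingCqlCollection(mapping) {
  const properties = mapping.properties || {};
  return Object.keys(properties).filter(
    (name) => !('cql_collection' in properties[name])
  );
}

// Lists the fields that may end up inferred as list<...>.
console.log(fieldsMissingCqlCollection(brokenMapping));
```

An empty array means every hand-written field states its collection type explicitly.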

Side Note: Yes! The index parameter is expected to be true or false. In the old days you would set it to "analyzed" instead.

The correct way is to remove the discover field and define each table field by hand:

es_index_mapping: {
    properties: {
        text_field: {
            type: 'text',
            cql_collection: 'singleton',
            index: true
        },
        uuid_field: {
            type: 'keyword',
            cql_collection: 'singleton',
            cql_partition_key: true,
            cql_primary_key_order: 0
        }
    }
}

Now I still have to test to make sure it works, but I'm pretty confident that this is the final solution to my problems. Nice field definitions written by hand.

WARNING: Some of my examples use plain JavaScript objects because I use them with express-cassandra and not directly with curl. Remember that the body sent with curl must be valid JSON, which means double quotes around every key and every string value (true, false and numbers stay unquoted). The spaces and newlines are not required, though.
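
One way to avoid quoting mistakes is to let JavaScript produce the JSON. The sketch below assumes Node.js and assumes that the properties of the hand-written mapping can be wrapped in the same "mappings" / table name structure used by the curl example earlier; toCurlBody is a hypothetical helper name, while JSON.stringify is the standard JavaScript function.

```javascript
// The hand-written mapping, as a plain JavaScript object.
const esIndexMapping = {
  properties: {
    text_field: { type: 'text', cql_collection: 'singleton', index: true },
    uuid_field: {
      type: 'keyword',
      cql_collection: 'singleton',
      cql_partition_key: true,
      cql_primary_key_order: 0
    }
  }
};

// Hypothetical helper: wraps the properties the way the curl example does,
// { "mappings": { "<table>": { "properties": ... } } }, and returns JSON text.
function toCurlBody(mapping, table) {
  const body = { mappings: { [table]: { properties: mapping.properties } } };
  return JSON.stringify(body);
}

// Paste the printed string after -d '...' in the curl command.
console.log(toCurlBody(esIndexMapping, 'my_table'));
```

JSON.stringify adds the double quotes around keys and strings, and leaves booleans and numbers unquoted.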