k4200’s notes and thoughts

Programmer side of k4200

Solr + DIH + MySQL Spatial data types

Goal

I'm now trying to implement a text + spatial search system. I have a (currently) small data set that consists of name, address, location (latlng) etc. stored in MySQL.

As you may know, MySQL itself has spatial features, so at first I was looking around groonga storage engine, which provides a full text search feature to MySQL, but realized it would be too much of a hassle
to install it on our working system on Amazon EC2.

Then, I did a bit more research and found that Solr, which is a popular full text search system, has also spatial search features, and decided to use it.

I'm pretty new to Solr, so if you find anything incorrect, etc., please let me know.

Description

Environment
  • Solr 3.4
  • MySQL 5.1 (I don't think MySQL version matters)
DIH (Data Import Handler)

As I wrote above, I have a MySQL table that has column of geometry type, and some others of normal data types like integer and varchar. Solr has a module called DataImportHandler (DIH), which is used to import data in RDBMS to Solr. You can find general information about it on the wiki.

Steps

The steps to import a table into Solr would be like the following:

  • Define a shema (schema.xml)
    • Data types
    • Fields
  • Define the mapping from the DB table to the Solr schema (db-data-config.xml)

It took me a few hours to get this up and running, but actually it turned out that most of the necessary information was found here.

Schema

To simplify the things, let's assume I have a table with id, name, and location. The schema would look like as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="mydata" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <!-- basic field types go on... -->

    <!-- For location data. This definition is taken from the example of Solr 3.3 -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="latlng" type="location" indexed="true" stored="true" required="true" />

    <!-- Dynamic fields. This section is also taken from the example. -->
    <dynamicField name="*_i"  type="sint"    indexed="true"  stored="true"/>
    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
    <!-- more dynamic fields here -->

    <!-- ########### THIS IS THE TRICK ############## -->
    <dynamicField name="*_coordinate" type="double" indexed="true" stored="false" multiValued="false" />
  </fields>

  <!-- some more basic settings here -->
</schema>

The point here is the dynamic field definition as written on the wiki page with an exception of multiValued being false. When I set it to true as in the document and issued a search query, I got the following error.

SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: latlng_0_coordinate
        at org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:183)
        at org.apache.solr.schema.DoubleField.getValueSource(DoubleField.java:49)
        at org.apache.solr.schema.LatLonType.createSpatialQuery(LatLonType.java:236)
        at org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:84)
        at org.apache.solr.search.QParser.getQuery(QParser.java:142)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:94)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

db-data-config.xml

For DIH, I started from the example that comes with the Solr package and the final xml is shown below.

<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="user" password="pass" />
      <document>
        <entity name="mydata" query="select id, name CAST(CONCAT(X(latlng), ',', Y(latlng) ) AS CHAR) as latlng from sometable">
            <field column="ID" name="id" />
            <field column="NAME" name="name" />
            <field column="LATLNG" name="lntlng" />
        </entity>
    </document>
</dataConfig>

I'm not sure column names must be capital or not, but just followed the convention in the example file.

Anyway, there are two points to mention about the latlng field.

Convert geometry to string of "nn.nnnnn,mmm.mmmmm"

You need to convert a geometry (or any other spatial data type) field to char when you import it to Solr. If you simply wrote the following,

select id, name, latlng from sometable

you would get the error below:

org.apache.solr.common.SolrException: org.apache.lucene.spatial.tier.InvalidGeoException: incompatible dimension (2) and values ([B@1b56848).  Only 0 values specified
        at org.apache.solr.schema.LatLonType.createFields(LatLonType.java:69)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:198)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Cast is necessary?

The other thing is to cast the result of CONCAT function to CHAR explicitly. I'm not sure it still exists, but found a post about a bug in MySQL regarding CONCAT function,

Import and search

Now, it's ready to import. Hit the following URL as usual and that's it.
http://localhost:8983/solr/db/dataimport?command=full-import

To search a spatial index, you can use filters. See this page on the wiki for more details.

Conclusion

It's relatively easy to implement a system that requires full text and spatial search features at the same time using Solr. Now, I need to do performance test and some such for production use.