Solr + DIH + MySQL Spatial data types
Goal
I'm now trying to implement a text + spatial search system. I have a (currently) small data set that consists of name, address, location (latlng) etc. stored in MySQL.
As you may know, MySQL itself has spatial features, so at first I was looking around groonga storage engine, which provides a full text search feature to MySQL, but realized it would be too much of a hassle
to install it on our working system on Amazon EC2.
Then, I did a bit more research and found that Solr, which is a popular full text search system, has also spatial search features, and decided to use it.
I'm pretty new to Solr, so if you find anything incorrect, etc., please let me know.
Description
DIH (Data Import Handler)
As I wrote above, I have a MySQL table that has column of geometry type, and some others of normal data types like integer and varchar. Solr has a module called DataImportHandler (DIH), which is used to import data in RDBMS to Solr. You can find general information about it on the wiki.
Steps
The steps to import a table into Solr would be like the following:
- Define a shema (schema.xml)
- Data types
- Fields
- Define the mapping from the DB table to the Solr schema (db-data-config.xml)
It took me a few hours to get this up and running, but actually it turned out that most of the necessary information was found here.
Schema
To simplify the things, let's assume I have a table with id, name, and location. The schema would look like as follows:
<?xml version="1.0" encoding="UTF-8" ?> <schema name="mydata" version="1.1"> <types> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/> <!-- basic field types go on... --> <!-- For location data. This definition is taken from the example of Solr 3.3 --> <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> </types> <fields> <field name="id" type="string" indexed="true" stored="true" required="true" /> <field name="name" type="text" indexed="true" stored="true"/> <field name="latlng" type="location" indexed="true" stored="true" required="true" /> <!-- Dynamic fields. This section is also taken from the example. --> <dynamicField name="*_i" type="sint" indexed="true" stored="true"/> <dynamicField name="*_s" type="string" indexed="true" stored="true"/> <!-- more dynamic fields here --> <!-- ########### THIS IS THE TRICK ############## --> <dynamicField name="*_coordinate" type="double" indexed="true" stored="false" multiValued="false" /> </fields> <!-- some more basic settings here --> </schema>
The point here is the dynamic field definition as written on the wiki page with an exception of multiValued being false. When I set it to true as in the document and issued a search query, I got the following error.
SEVERE: org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: latlng_0_coordinate at org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:183) at org.apache.solr.schema.DoubleField.getValueSource(DoubleField.java:49) at org.apache.solr.schema.LatLonType.createSpatialQuery(LatLonType.java:236) at org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:84) at org.apache.solr.search.QParser.getQuery(QParser.java:142) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:94) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
db-data-config.xml
For DIH, I started from the example that comes with the Solr package and the final xml is shown below.
<dataConfig> <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="user" password="pass" /> <document> <entity name="mydata" query="select id, name CAST(CONCAT(X(latlng), ',', Y(latlng) ) AS CHAR) as latlng from sometable"> <field column="ID" name="id" /> <field column="NAME" name="name" /> <field column="LATLNG" name="lntlng" /> </entity> </document> </dataConfig>
I'm not sure column names must be capital or not, but just followed the convention in the example file.
Anyway, there are two points to mention about the latlng field.
Convert geometry to string of "nn.nnnnn,mmm.mmmmm"
You need to convert a geometry (or any other spatial data type) field to char when you import it to Solr. If you simply wrote the following,
select id, name, latlng from sometable
you would get the error below:
org.apache.solr.common.SolrException: org.apache.lucene.spatial.tier.InvalidGeoException: incompatible dimension (2) and values ([B@1b56848). Only 0 values specified at org.apache.solr.schema.LatLonType.createFields(LatLonType.java:69) at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:198) at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115) at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73) at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Import and search
Now, it's ready to import. Hit the following URL as usual and that's it.
http://localhost:8983/solr/db/dataimport?command=full-import
To search a spatial index, you can use filters. See this page on the wiki for more details.
Conclusion
It's relatively easy to implement a system that requires full text and spatial search features at the same time using Solr. Now, I need to do performance test and some such for production use.