Graphing around with Solr

Notes on querying graphs in Solr

Published August 30, 2022

Setup

Editing

Set up your Quarto environment and install the following dependencies into it:

jupyter 
graphviz 
tabulate 
jupyter-cache
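If these are installed with pip into the Python environment Quarto uses, a quick check for missing packages might look like the sketch below. This assumes pip distribution names; conda or system installs work too. Note that the graphviz command-line tools are a separate install from the graphviz Python package and are not checked here.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the dependency list above (assumed to be pip-installed).
REQUIRED = ["jupyter", "graphviz", "tabulate", "jupyter-cache"]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing: " + ", ".join(missing) if missing else "All dependencies installed")
```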

Check out this project and navigate to its root folder. Start the Docker instance for Solr as described below, then run quarto preview to render the pages and open them in a browser. On save, edited pages are re-rendered and the browser is refreshed.

Solr Instance

Building these documents requires a running Solr instance, which is assumed to be available on port 18983. A minimal Solr instance can be deployed with Docker using this docker-compose.yml:

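version: '3'
services:
  solr:
    image: solr
    ports:
      - "18983:8983"          # expose Solr's port 8983 on host port 18983
    volumes:
      - "solrdata:/var/solr"  # persist Solr data in a named volume
    command: solr -f -cloud   # run in the foreground in SolrCloud mode
volumes:
  solrdata: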

Start it with docker-compose up -d.
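Solr can take several seconds to start after the container comes up. A minimal sketch for waiting until it responds is shown here. It assumes the port mapping above and uses Solr's /admin/info/system endpoint as a liveness probe; any lightweight admin endpoint would serve the same purpose, and the helper names are illustrative only.

```python
import json
import time
import urllib.request

# Host port from the docker-compose.yml mapping "18983:8983".
SOLR_BASE = "http://localhost:18983/solr"


def system_info_url(base=SOLR_BASE):
    """URL of Solr's system info endpoint, used here as a liveness probe."""
    return f"{base}/admin/info/system?wt=json"


def solr_is_up(base=SOLR_BASE, timeout=5.0):
    """Return True if Solr answers the system info request with status 0."""
    try:
        with urllib.request.urlopen(system_info_url(base), timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection refused, timeouts and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return False
    return body.get("responseHeader", {}).get("status") == 0


def wait_for_solr(base=SOLR_BASE, attempts=30, delay=2.0):
    """Poll Solr until it responds or the attempts run out."""
    for _ in range(attempts):
        if solr_is_up(base):
            return True
        time.sleep(delay)
    return False
```

Calling wait_for_solr() once before quarto preview gives the container time to finish starting; with the defaults above it gives up after roughly a minute.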

With the Solr instance running, we can create the test collection and populate it with the example graph documents.

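import example_graph  # this project's example graph documents
import graphutzing    # this project's Solr helper module

solr = graphutzing.SolrConnection()
# Start from a clean slate: drop and recreate the test collection
print("Deleting collection...")
solr.deleteCollection()
print("Creating collection...")
solr.createCollection()
# A 400 "already exists" response here means the field is already
# present in the schema, so it can be ignored
print("Adding _nest_parent_ field to collection...")
solr.addField("_nest_parent_")
print("Populating index with test documents...")
ndocs = 0
for doc in example_graph.docs:
    ndocs += solr.addDocument(doc)  # accumulate the count returned for each document
print(f"Added {ndocs} documents")

Output: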
Deleting collection...
Creating collection...
{
  "responseHeader":{
    "status":0,
    "QTime":836},
  "success":{
    "172.18.0.2:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":368},
      "core":"reltest_shard1_replica_n1"}},
  "warning":"Using _default configset. Data driven schema functionality is enabled by default, which is NOT RECOMMENDED for production use. To turn it off: curl http://{host:port}/solr/reltest/config -d '{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'"}

Adding _nest_parent_ field to collection...
{
  "responseHeader":{
    "status":400,
    "QTime":107},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
      "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
    "details":[{
        "errorMessages":["Field '_nest_parent_' already exists.\n"],
        "add-field":{
          "name":"_nest_parent_",
          "type":"string",
          "indexed":"true",
          "stored":"true"}}],
    "msg":"error processing commands, errors: [{errorMessages=[Field '_nest_parent_' already exists.\n], add-field={name=_nest_parent_, type=string, indexed=true, stored=true}}], ",
    "code":400}}

Populating index with test documents...
Added 14 documents
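The graphutzing helpers are not shown on this page, so the following is not their implementation. As a rough, hypothetical equivalent, the same steps can be expressed as plain requests against Solr's Collections, Schema, Config and Update APIs. The collection name reltest comes from the core name in the output above; the single shard, the helper names, and the assumption that documents are plain JSON-serializable dicts are illustrative choices.

```python
import json
import urllib.request

# Hypothetical helpers, not the graphutzing API.
SOLR_BASE = "http://localhost:18983/solr"  # port from docker-compose.yml
COLLECTION = "reltest"  # matches the core name reltest_shard1_replica_n1 above


def _json_post(url, payload):
    """Build a POST request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_collection_request(name=COLLECTION, base=SOLR_BASE):
    return urllib.request.Request(
        f"{base}/admin/collections?action=DELETE&name={name}", method="GET")


def create_collection_request(name=COLLECTION, base=SOLR_BASE, num_shards=1):
    # No collection.configName is given, so Solr uses the _default configset
    # (hence the data-driven schema warning in the output above).
    return urllib.request.Request(
        f"{base}/admin/collections?action=CREATE&name={name}&numShards={num_shards}",
        method="GET")


def disable_auto_fields_request(name=COLLECTION, base=SOLR_BASE):
    # The change suggested by that warning, as a Config API call.
    return _json_post(f"{base}/{name}/config",
                      {"set-user-property": {"update.autoCreateFields": "false"}})


def add_field_request(field, name=COLLECTION, base=SOLR_BASE):
    # Same add-field command that is echoed in the 400 response above.
    return _json_post(f"{base}/{name}/schema", {
        "add-field": {"name": field, "type": "string", "indexed": True, "stored": True}})


def add_documents_request(docs, name=COLLECTION, base=SOLR_BASE):
    # A JSON array of documents; commit=true makes them searchable immediately.
    return _json_post(f"{base}/{name}/update?commit=true", docs)


def send(request):
    """Execute a request and return Solr's decoded JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses such as the 400 above.
    """
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, send(add_field_request("_nest_parent_")) would issue the schema change made by solr.addField above, assuming Solr is running.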

Publishing to GitHub Pages

This site is published to GitHub Pages using the quarto publish gh-pages command. For example:

# First, commit and push edits
# Then publish
quarto publish gh-pages