Apache Tika on Platform.sh
In this tutorial we will set up Search API Solr and Apache Tika on Platform.sh.
tl;dr: Working example: platform-tika
Drupal 8 + Solr
Install Drupal 8 on Platform.sh, then get the search modules. The full documentation for setting up Solr with Drupal 8 can be found here: Using Solr with Drupal 8.x. I won't replicate that excellent documentation here, but the quick and dirty of it is that you need to install and configure the Search API and Search API Solr modules:
```shell
composer require drupal/search_api
composer require drupal/search_api_solr
```
Search API Attachments
The additional piece that you need for Tika is the Search API Attachments module:
```shell
composer require drupal/search_api_attachments
```
Search API Attachments lets you point at the Tika jar file to index your PDF documents. Before we can point at the jar file, we have to download and install it on the Platform.sh project instance.
Getting the Tika jar on Platform.sh
Platform.sh offers two hooks where you can manipulate your app at two stages of the deployment: build and deploy. The difference is that build runs while the file system is still writable, and deploy runs after the container has started and the file system is frozen as read-only. You can read the full docs on hooks here: Platform Hooks.
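Because the deploy hook runs on a read-only file system, it cannot fetch the jar; at most it can check that the build step produced it. The following is a minimal sketch of such a check, assuming the jar ends up at /app/srv/bin/tika-app-1.16.jar as in the build hook used in this tutorial. The function name verify_tika and the TIKA_JAR and JAVA variables are hypothetical, not part of Platform.sh.

```shell
#!/usr/bin/env bash
# Hypothetical deploy-time check. At deploy time the file system is
# read-only, so this only verifies what the build hook produced.
# Assumed defaults: TIKA_JAR=/app/srv/bin/tika-app-1.16.jar, JAVA=java.
verify_tika() {
  local jar="${1:-${TIKA_JAR:-/app/srv/bin/tika-app-1.16.jar}}"
  local java_bin="${JAVA:-java}"

  # -s: the file exists and is not empty
  if [ ! -s "$jar" ]; then
    echo "Tika jar missing or empty: $jar" >&2
    return 1
  fi

  # The jar is useless without a Java runtime to execute it.
  if ! command -v "$java_bin" >/dev/null 2>&1; then
    echo "Java executable not found: $java_bin" >&2
    return 1
  fi

  echo "Tika jar found: $jar"
}
```

Calling verify_tika from a deploy hook would put a missing jar or missing Java runtime in the deploy log, rather than leaving you to discover it later as an indexing run that extracts nothing.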
We will use the build hook to bring in the Tika jar file while we can still write to the file system. Open your .platform.app.yaml file and either add a new build hook or add to it if you already have one:
```yaml
# The hooks executed at various points in the lifecycle of the application.
hooks:
  build: |
    mkdir -p /app/srv/bin
    cd /app/srv/bin && curl -OL http://download.nextag.com/apache/tika/tika-app-1.16.jar
```
This creates the directory /app/srv/bin and downloads the Tika executable jar, tika-app-1.16.jar, into it. Here is the full file for reference: .platform.app.yaml.
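Without the -f flag, curl exits successfully even when the server answers with an HTTP error, so a dead mirror can leave an HTML error page saved as tika-app-1.16.jar. A slightly more defensive sketch, assuming bash and curl are available in the build container, fails on HTTP errors and checks that the download really is a jar (a zip archive, whose first two bytes are PK). The helper name fetch_tika is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical build-hook helper: download a jar into a directory and
# reject anything that is not a zip archive. Prints the saved path.
fetch_tika() {
  local url="$1" dest_dir="$2"
  local dest
  dest="$dest_dir/$(basename "$url")"

  mkdir -p "$dest_dir" || return 1

  # -f: fail on HTTP errors instead of saving the error page
  # -sS: no progress bar, but still report errors
  # -L: follow redirects
  if ! curl -fsSL -o "$dest" "$url"; then
    echo "download failed: $url" >&2
    rm -f "$dest"
    return 1
  fi

  # A .jar is a zip file, and zip files begin with the bytes "PK".
  if [ "$(head -c 2 "$dest")" != "PK" ]; then
    echo "not a jar file: $dest" >&2
    rm -f "$dest"
    return 1
  fi

  echo "$dest"
}
```

In the build hook this would be called as fetch_tika http://download.nextag.com/apache/tika/tika-app-1.16.jar /app/srv/bin, with the URL swapped out if that mirror no longer serves the file.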
Configure Search API Attachments
Now that the tika-app-1.16.jar file is in place, we are ready to configure the search_api_attachments module. Visit /admin/config/search/search_api_attachments in your browser and set the extraction method along with the paths to the Java executable and the Tika jar.
These paths correspond to the paths you entered in the .platform.app.yaml file for the build hook.
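Before indexing, it can help to run Tika by hand with the same paths you entered in the form. This minimal sketch assumes the Java executable is java on the PATH and the jar is at /app/srv/bin/tika-app-1.16.jar; tika_extract, JAVA and TIKA_JAR are hypothetical names. The tika-app --text option prints the plain text extracted from a document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run tika-app on one file and print its text.
# JAVA and TIKA_JAR mirror the "Java executable" and Tika path settings;
# the defaults below are the paths assumed in this tutorial.
tika_extract() {
  if [ "$#" -ne 1 ]; then
    echo "usage: tika_extract FILE" >&2
    return 2
  fi

  local java_bin="${JAVA:-java}"
  local jar="${TIKA_JAR:-/app/srv/bin/tika-app-1.16.jar}"

  # --text: output the document's plain-text content
  "$java_bin" -jar "$jar" --text "$1"
}
```

Running tika_extract example.pdf (a placeholder file name) from a shell on the environment should print the document's text. If that works but indexing still finds nothing, the module configuration is a more likely culprit than Tika itself.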
Adding Tika to Lando
You can add Tika to Lando in a similar fashion. Open your .lando.yml file and add the following extras step to install Java and the Tika jar:
```yaml
services:
  appserver:
    extras:
      # Apache Tika
      - apt-get update -y
      - apt-get install -y openjdk-7-jre-headless
      - apt-get install -y openjdk-7-jdk
      - mkdir -p /app/srv/bin && cd /app/srv/bin
      - cd /app/srv/bin && curl -OL http://download.nextag.com/apache/tika/tika-app-1.16.jar
      - apt-get remove openjdk-7-jdk -y
```
Here is the full file for reference: .lando.yml.
Voila! Now you have all the power of Tika to index and search your docs, and a local dev stack to match and test on! Happy searching 🔍🕵🔎.