# Apache Tika on Platform.sh
Apache Tika is a java library that can extract metadata from documents such as PDF and create a searchable index for Solr.
In this tutorial we will set up Drupal 8
, Apache Solr
, Search API Solr
, and Apache Tika
on Platform.sh.
tl;dr: Working example: platform-tika
# Drupal 8 + Solr
Install Drupal 8 on Platform.sh. Getting the search modules: the full documentation for setting up Solr and Drupal 8 can be found here: Using Solr with Drupal 8.x. I won't replicate that excellent documentation here but the quick and dirty of it is you need to install and configure search_api
and search_api_solr
:
composer require drupal/search_api
composer require drupal/search_api_solr
# Search API Attachments
The additional piece that you need for tika
is the search_api_attachments
module.
composer require drupal/search_api_attachments
Search API Attachments lets you point at the tika
jar file to index your PDF documents. Before we can point at the jar file we have to grab and install it on Platform.sh project instance.
# Getting the Tika jar on Platform.sh
Platform offers two hooks
where you can manipulate your app at two stages of the deploy build
and deploy
. The difference is that build
is run while the file system is still writable and deploy
runs after the container is started and the file system is frozen as read only. You can read the full docs on hooks here: Platform Hooks.
We will use the build
hook to bring in the Tika jar file while we can still write to the file system. Open your .platform.app.yaml
file and either add a new build
hook or add to it if you already have one:
# The hooks executed at various points in the lifecycle of the application.
hooks:
build: |
mkdir -p /app/srv/bin
cd /app/srv/bin && curl -OL http://download.nextag.com/apache/tika/tika-app-1.16.jar
This creates the directory /srv/bin
and downloads the tika jar executable tika-app-1.16.jar
into it. Here is the full file for reference: .platform.app.yaml.
# Configure Search API Attachments
Now that we have the tika-app-1.16.jar
file in place we are ready to configure the search_api_attachments
module. Visit /admin/config/search/search_api_attachments
in your browser and add the method, java executable, and tika paths configuration:
These paths correspond to the paths you entered in the .platform.app.yaml
file for the build
step.
# Adding Tika to Lando
You can add tika
to Lando in a similar fashion. Open up your .lando.yml
file and add the following extras
step to Install tika
:
services:
appserver:
extras:
# Apache Tika
- apt-get update -y
- apt-get install -y openjdk-7-jre-headless
- apt-get install -y openjdk-7-jdk
- mkdir -p /app/srv/bin && cd /app/srv/bin
- cd /app/srv/bin && curl -OL http://download.nextag.com/apache/tika/tika-app-1.16.jar
- apt-get remove openjdk-7-jdk -y
Here is the full file for reference: .lando.yml.
# Conclusion
Voila! Now you have all the power of tika
to index and search your docs and a local dev stack to match and test on! Happy searching 🔍🕵🔎.