Using ZincSearch for PDF Indexing and Searching

TLDR Simonas asks whether ZincSearch can index and search PDF files. Prabhat suggests using an external program to read the PDF contents and then ingesting the data through the ZincSearch API.

Simonas
Fri, 17 Mar 2023 12:00:22 UTC

Hey community, I just recently found ZincSearch. I have a task to create a search engine for various PDF files/research papers. Maybe you have a similar use case? Is ZincSearch capable of indexing and searching PDF files?

Prabhat
Fri, 17 Mar 2023 12:03:31 UTC

ZincSearch does not have native functionality to index PDF files. You will need an external program that can read the PDF file contents and then use the ZincSearch API to ingest them, which will allow you to use the search thereafter.
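
As a rough sketch of the "external program" step, the snippet below pulls text out of a PDF page by page and shapes it into JSON-ready records. It assumes the third-party pypdf package, which is not mentioned in this thread; any PDF text extractor would do. The field names (file, page, content) and the helper names are hypothetical.

```python
import re
from pathlib import Path


def clean_text(raw):
    """Collapse runs of whitespace left over from PDF text extraction."""
    return re.sub(r"\s+", " ", raw or "").strip()


def make_record(file_name, page_number, raw_text):
    """Build one JSON-serialisable document; field names are illustrative."""
    return {"file": file_name, "page": page_number, "content": clean_text(raw_text)}


def pdf_to_records(pdf_path):
    """Return one record per page of the PDF that has extractable text."""
    # Assumption: pypdf is installed (pip install pypdf); another PDF text
    # extractor could be swapped in here.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    records = []
    for number, page in enumerate(reader.pages, start=1):
        record = make_record(Path(pdf_path).name, number, page.extract_text())
        if record["content"]:  # skip pages with no extractable text (e.g. scans)
            records.append(record)
    return records
```

Splitting by page keeps each document small and lets a search hit point back to a page number; scanned PDFs without a text layer would need OCR first, which this sketch does not cover.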

Simonas
Fri, 17 Mar 2023 12:05:18 UTC

OK, thanks, I can do this. I have some solutions in mind for reading the papers.

Prabhat
Fri, 17 Mar 2023 12:06:46 UTC

Awesome. Then read the files and simply call the _bulk API to ingest the data -
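
A minimal sketch of that ingestion call, using only the Python standard library. It assumes the _bulk endpoint lives at /api/_bulk, takes newline-delimited JSON (an action line followed by a document line), and uses HTTP Basic auth; check these against the documentation for your ZincSearch version. The function names and the index name used below are placeholders.

```python
import base64
import json
import urllib.request


def build_bulk_body(index_name, records):
    """Serialise records as NDJSON: an action line, then the document line."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def bulk_ingest(base_url, user, password, index_name, records):
    """POST the records to the _bulk endpoint and return the decoded reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/_bulk",  # assumed path; verify for your version
        data=build_bulk_body(index_name, records).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like bulk_ingest("http://localhost:4080", "admin", "password", "papers", records), where the URL, credentials, and index name are all placeholders for your own setup.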

Prabhat
Fri, 17 Mar 2023 12:07:08 UTC

or

Simonas
Fri, 17 Mar 2023 12:27:54 UTC

Will do a PoC and let you know. Currently just doing research on technologies.