Environment
- Elasticsearch 7.1.1
- ingest-attachment 7.1.1
- CentOS 7
This post documents one way to index an uploaded attachment into Elasticsearch using Java.
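Prerequisites
Document processing relies on the Elasticsearch ingest-attachment plugin. For the setup steps, see the following articles:
- Installing Elasticsearch 7 on CentOS: a detailed walkthrough
- Two ways to install the ingest-attachment plugin for Elasticsearch 7
- Processing documents with the Elasticsearch Ingest Attachment Processor plugin
- Processing Word/PDF files with the Elasticsearch Ingest Attachment Processor plugin
Implementation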
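import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.apache.commons.io.IOUtils;      // assumed: Apache Commons IO
import com.alibaba.fastjson.JSONObject;    // assumed: fastjson
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentType;

// "client" is a RestHighLevelClient field of the enclosing class.
public String indexAttachmentToElasticSearch(String fileFullPath) {
    String result = "error";
    // try-with-resources closes the stream even when reading fails
    try (InputStream inputStream = new FileInputStream(new File(fileFullPath))) {
        byte[] fileByteStream = IOUtils.toByteArray(inputStream);
        // The attachment processor expects the file content as a Base64 string
        String base64String = Base64.getEncoder().encodeToString(fileByteStream);
        Map<String, Object> attachmentMap = new HashMap<>();
        attachmentMap.put("data", base64String);
        attachmentMap.put("fileName", "四个空格-https://www.4spaces.org");
        String jsonString = JSONObject.toJSONString(attachmentMap);
        IndexRequest request = new IndexRequest("data_archives_attachment");
        request.id(UUID.randomUUID().toString());
        // Send the document through the ingest pipeline before it is indexed
        request.setPipeline("single_attachment");
        request.source(jsonString, XContentType.JSON);
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        result = response.status().toString();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (ElasticsearchException e) {
        // Print the detailed message instead of discarding it
        System.err.println(e.getDetailedMessage());
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}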
In the code above, single_attachment is the name of the ingest pipeline and data_archives_attachment is the name of the index.
The implementation relies on the Elasticsearch ingest-attachment plugin to process a Base64 string. The file is converted to a Base64 string as follows:
    inputStream = new FileInputStream(new File(fileFullPath));
    byte[] fileByteStream = IOUtils.toByteArray(inputStream);
    // encodeToString already returns a String, so no extra charset conversion is needed
    String base64String = Base64.getEncoder().encodeToString(fileByteStream);
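The same conversion can also be sketched with the JDK alone, without the IOUtils helper. This is a minimal illustration rather than the post's own code: the class name Base64FileEncoder and the temporary sample file are hypothetical, and the whole file is read into memory, so it is only suitable for attachments of modest size.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class Base64FileEncoder {

    /** Reads the whole file into memory and returns its content as a Base64 string. */
    public static String encodeFile(Path file) throws IOException {
        byte[] fileBytes = Files.readAllBytes(file);
        // encodeToString already returns a String, so no charset round-trip is needed
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration
        Path sample = Files.createTempFile("attachment-demo", ".txt");
        Files.write(sample, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encodeFile(sample)); // prints aGVsbG8=
        Files.delete(sample);
    }
}
```

The returned string is what would be placed in the "data" field of the document before it is sent through the pipeline.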
Full code example: https://github.com/aitlp/elastic
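With this approach, the document stored in the database has the structure {"attachment":{"content":"xxx"}}.
How can it be changed to {"content":"xxx"} so that the structure stays consistent?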
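One possible way to get a top-level content field is to extend the ingest pipeline itself: after the attachment processor runs, a rename processor moves attachment.content to content, and a remove processor drops the rest of the attachment object. The sketch below only builds the pipeline definition as a JSON string. It assumes the pipeline reads the Base64 string from the "data" field, as in the indexing code above; the class name FlattenAttachmentPipeline is hypothetical, and the definition has not been verified against the author's setup.

```java
public class FlattenAttachmentPipeline {

    /** Builds a JSON body that could be sent to PUT _ingest/pipeline/single_attachment. */
    public static String pipelineJson() {
        return String.join("\n",
            "{",
            "  \"description\": \"Extract attachment text and move it to a top-level field\",",
            "  \"processors\": [",
            // Assumption: the Base64 payload lives in the \"data\" field
            "    { \"attachment\": { \"field\": \"data\" } },",
            // Move the extracted text out of the nested attachment object
            "    { \"rename\": { \"field\": \"attachment.content\", \"target_field\": \"content\", \"ignore_missing\": true } },",
            // Drop the remaining extracted metadata (content type, language, ...)
            "    { \"remove\": { \"field\": \"attachment\" } }",
            "  ]",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(pipelineJson());
    }
}
```

Note that removing the attachment object also discards the other metadata the processor extracts, so this only fits cases where the text content alone is needed.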