Grouping组查询怎么使用-快上网网站建设公司

Grouping组查询怎么使用

这篇文章主要介绍“Grouping组查询怎么使用”，在日常操作中，相信很多人在Grouping组查询怎么使用问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”Grouping组查询怎么使用”的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

专注于为中小企业提供网站设计制作、成都网站制作服务,电脑端+手机端+微信端的三站合一,更高效的管理,为中小企业越城免费做网站提供优质的服务。我们立足成都，凝聚了一批互联网行业人才，有力地推动了近千家企业的稳健成长，帮助中小企业通过网站建设实现规模扩充和转变。

lucene的联想词是在org.apache.lucene.lucene-grouping包下边，提供了组查询功能的支持。

Grouping介绍

Grouping主要为用户提供了处理不同lucene的中含有相同filed的不同doc的分组统计。



    org.apache.lucene
	lucene-grouping
    5.5.2

Grouping深入

1. Grouping参数

groupField：要分组的字段（如果groupField在doc中不存在，会返回一个null的分组）
groupSort：分组的排序规则，排序字段决定了分组内容展示的先后顺序
topNGroup：分组展示的数量，只计算0到topNGroup条记录
groupOffset：从第几个TopGroup开始计算 例：groupOffset为3的话，会展示从3到topNGroup对应的记录，此数值我们可以用于分页查询
withinGroupSort：每组内怎么排序
maxDocsPerGroup：每组处理多少个doc
withinGroupOffset：每组显示的doc初始位置

2.Grouping实现

grouping实现需要两步：

利用TermFirstPassGroupingCollector来收集top groups
用TermSecondPassGroupingCollector处理每个group对应的doc

3. Grouping查询

对要搜索的信息创建Query查询对象，Lucene会根据Query查询对象生成最终的查询语法，类似关系数据库Sql语法一样Lucene也有自己的查询语法，比如："word:北京长安中西医结合医院"表示查询Field的word为"北京长安中西医结合医院"的文档信息。

可通过两种方法创建查询对象：

使用Lucene提供Query子类

Query是一个抽象类，lucene提供了很多查询对象，比如TermQuery项精确查询，NumericRangeQuery数字范围查询等。

Query query = new TermQuery(new Term("word", "北京市鼓楼中医院"));

//参数1：要查询的域 参数2：最小值 参数3：最大值  参数4：是否包含最小值  参数5：是否包含最大值
Query query =NumericRangeQuery.newLongRange("size", 0, 1000, 	true, true);


扩展：
BooleanQuery query = new BooleanQuery();

Query query1 =NumericRangeQuery.newLongRange("size", 0, 1000, true, true);
Query query2 = new TermQuery(new Term("word","北京市鼓楼中医院"));

//添加到BooleanQuery对象中
query.add(query1,Occur.MUST);
query.add(query2,Occur.MUST);

备注：
Occur.MUST:必须满足此条件
Occur.SHOULD:应该满足，但是不满足也可以
Occur.MUST_NOT:必须不满足

使用QueryParse解析查询表达式

QueryParse会将用户输入的查询表达式解析成Query对象实例。

QueryParser queryParser =new QueryParser("word", new StandardAnalyzer());
Query query = queryParser.parse("北京市鼓楼中医院");

Grouping用例

1. Controller层

package com.spring.master.lucene.group.controller;

import com.spring.master.lucene.group.service.GroupService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

/**
 * @author Huan Lee
 * @version 1.0
 * @date 2020-09-14 14:19
 * @describtion 业精于勤，荒于嬉；行成于思，毁于随。
 */
@RestController
@RequestMapping(value = "/group")
public class GroupController {

    @Resource
    private GroupService groupService;

    /**
     * 创建索引文档
     * @return
     */
    @GetMapping(value = "/createIndex")
    public String createIndex() {
        return groupService.createIndex();
    }


    /**
     * 搜索分组
     * lucene 5
     * @return
     */
    @GetMapping(value = "/searchGroup5")
    public String searchGroup5(String keyword) {
        groupService.searchGroup5(keyword);
        return "true";
    }

    /**
     * 搜索分组
     * lucene 7
     * @return
     */
    @GetMapping(value = "/searchGroup7")
    public String searchGroup7(String keyword) {
        groupService.searchGroup7(keyword);
        return "true";
    }
}


访问地址：
localhost:2000/spring-master/group/createIndex
localhost:2000/spring-master/group/searchGroup5?keyword=北京长安中西医结合医院
localhost:2000/spring-master/group/searchGroup7?keyword=北京长安中西医结合医院

2. Service层

package com.spring.master.lucene.group.impl;

import com.spring.master.global.Identities;
import com.spring.master.lucene.group.constant.SystemConstants;
import com.spring.master.lucene.group.service.GroupService;
import com.spring.master.lucene.suggest.constant.SuggestConstants;
import com.spring.master.lucene.suggest.util.FileUtils;
import com.spring.master.lucene.suggest.vo.DictionaryVO;
import lombok.extern.slf4j.Slf4j;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.xml.builders.MatchAllDocsQueryBuilder;
import org.apache.lucene.search.*;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.SearchGroup;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.grouping.term.TermAllGroupsCollector;
import org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector;
import org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
import org.nlpcn.commons.lang.util.MD5;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

/**
 * @author Huan Lee
 * @version 1.0
 * @date 2020-09-14 18:45
 * @describtion 业精于勤，荒于嬉；行成于思，毁于随。
 */
@Service
@Slf4j
public class GroupServiceImpl implements GroupService {

    /**
     * 指定在哪个索引上进行分组
     */
    static String groupField = "sourceType";

    /**
     * 标准分词器
     */
    private static Analyzer analyzer = new StandardAnalyzer();


    @Override
    public String createIndex() {

        try {
            Directory directory = FSDirectory.open(Paths.get(SystemConstants.indexDir));
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
            indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

            IndexWriter writer = new IndexWriter(directory, indexWriterConfig);

            // 读DictionaryVO数据
            List diseases = FileUtils.readCsv(SuggestConstants.disease);
            List doctors = FileUtils.readCsv(SuggestConstants.doctor);
            List facultys = FileUtils.readCsv(SuggestConstants.faculty);
            List hospitals = FileUtils.readCsv(SuggestConstants.hospital);
            List drugcatalogues = FileUtils.readCsv(SuggestConstants.drugcatalogue);

            diseases.forEach(disease -> {
                Document doc = new Document();
                // 进行分组的域上建立的必须是SortedDocValuesField类型
                doc.add(new SortedDocValuesField(groupField, new BytesRef("Disease")));
                doc.add(new StringField("id", Identities.uuid(), Field.Store.YES));
                doc.add(new StringField("sourceType", "Disease", Field.Store.YES));
                doc.add(new TextField("word", disease.getWord(), Field.Store.YES));
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    log.error(e.getMessage());
                }
            });

            hospitals.forEach(hospital -> {
                Document doc = new Document();
                // 进行分组的域上建立的必须是SortedDocValuesField类型
                doc.add(new SortedDocValuesField(groupField, new BytesRef("Hospital")));
                doc.add(new StringField("id", Identities.uuid(), Field.Store.YES));
                doc.add(new StringField("sourceType", "Hospital", Field.Store.YES));
                doc.add(new TextField("word", hospital.getWord(), Field.Store.YES));
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    log.error(e.getMessage());
                }
            });

            facultys.forEach(faculty -> {
                Document doc = new Document();
                // 进行分组的域上建立的必须是SortedDocValuesField类型
                doc.add(new SortedDocValuesField(groupField, new BytesRef("Faculty")));
                doc.add(new StringField("id", Identities.uuid(), Field.Store.YES));
                doc.add(new StringField("sourceType", "Faculty", Field.Store.YES));
                doc.add(new TextField("word", faculty.getWord(), Field.Store.YES));
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    log.error(e.getMessage());
                }
            });

            drugcatalogues.forEach(drugcatalogue -> {
                Document doc = new Document();
                // 进行分组的域上建立的必须是SortedDocValuesField类型
                doc.add(new SortedDocValuesField(groupField, new BytesRef("Drugcatalogue")));
                doc.add(new StringField("id", Identities.uuid(), Field.Store.YES));
                doc.add(new StringField("sourceType", "Drugcatalogue", Field.Store.YES));
                doc.add(new TextField("word", drugcatalogue.getWord(), Field.Store.YES));
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    log.error(e.getMessage());
                }
            });

            doctors.forEach(doctor -> {
                Document doc = new Document();
                // 进行分组的域上建立的必须是SortedDocValuesField类型
                doc.add(new SortedDocValuesField(groupField, new BytesRef("Doctor")));
                doc.add(new StringField("id", Identities.uuid(), Field.Store.YES));
                doc.add(new StringField("sourceType", "Doctor", Field.Store.YES));
                doc.add(new TextField("word", doctor.getWord(), Field.Store.YES));
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    log.error(e.getMessage());
                }
            });

            // 6 -- no sourceType field
            Document doc = new Document();
            doc.add(new StringField("sourceType", "Faculty", Field.Store.YES));
            doc.add(new TextField("word", "高压氧室", Field.Store.YES));
            doc.add(new StringField("id", "0", Field.Store.YES));
            writer.addDocument(doc);

            writer.commit();
            writer.close();
        } catch (Exception e) {
            log.error(e.getMessage());
        }
        return "true";
    }

    @Override
    public void searchGroup5(String keyword) {

        try {

            Directory directory = FSDirectory.open(Paths.get(SystemConstants.indexDir));

            IndexReader reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);
            // term查询
//            Query query = new TermQuery(new Term("word", keyword));

            QueryParser queryParser =new QueryParser("word", new StandardAnalyzer());
            Query query = queryParser.parse(keyword);


            // 每个分组内部的排序规则
            Sort groupSort = Sort.RELEVANCE;

            // 前N条中分组
            int topNGroups = 10;

            // 分组起始偏移量
            int groupOffset = 0;

            // 是否填充SearchGroup的sortValues
            boolean fillFields = true;

            // groupSort用于对组进行排序，docSort用于对组内记录进行排序，多数情况下两者是相同的，但也可不同
            Sort docSort = groupSort;

            // 用于组内分页，起始偏移量
            int docOffset = 0;

            // 每组返回多少条结果
            int docsPerGroup = 5;

            // 是否需要计算总的分组数量
            boolean requiredTotalGroupCount = true;

            // 是否需要缓存评分
            boolean cacheScores = true;

            // 第一次查询缓存容量的大小：设置为16M
            double maxCacheRAMMB = 16.0;

            // 支持lucene5.x版本
            TermFirstPassGroupingCollector c1 = new TermFirstPassGroupingCollector("sourceType", groupSort, groupOffset + topNGroups);

            /** 将TermFirstPassGroupingCollector包装成CachingCollector，为第一次查询加缓存，避免重复评分
             *  CachingCollector就是用来为结果收集器添加缓存功能的
             */
            CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);

            // 开始第一次分组统计
            searcher.search(query, cachedCollector);

            // 第一次查询返回的结果集TopGroups中只有分组域值以及每组总的评分，至于每个分组里有几条，分别哪些索引文档，则需要进行第二次查询获取
            Collection> topGroups = c1.getTopGroups(groupOffset, fillFields);

            if (topGroups == null) {
                System.out.println("No groups matched ");
                return;
            }

            // 是否获取每个分组内部每个索引的评分
            boolean getScores = true;

            // 是否计算最大评分
            boolean getMaxScores = true;

            // 如果需要对Lucene的score进行修正，则需要重载TermSecondPassGroupingCollector
            TermSecondPassGroupingCollector c2 = new TermSecondPassGroupingCollector("sourceType", topGroups, groupSort, docSort, docOffset + docsPerGroup, getScores, getMaxScores, fillFields);

            // 如果需要计算总的分组数量，则需要把TermSecondPassGroupingCollector包装成TermAllGroupsCollector
            // TermAllGroupsCollector就是用来收集总分组数量的
            TermAllGroupsCollector allGroupsCollector = null;

            Collector secondPassCollector = null;

            //若需要统计总的分组数量
            if (requiredTotalGroupCount) {
                allGroupsCollector = new TermAllGroupsCollector("sourceType");
                secondPassCollector = MultiCollector.wrap(c2, allGroupsCollector);
            } else {
                secondPassCollector = c2;
            }

            // 如果第一次查询已经加了缓存，则直接从缓存中取
            if (cachedCollector.isCached()) {
                // 第二次查询直接从缓存中取
                cachedCollector.replay(secondPassCollector);
            } else {
                // 开始第二次分组查询
                searcher.search(query, secondPassCollector);
            }

            // 所有组的数量
            int totalGroupCount = 0;

            // 所有满足条件的记录数
            int totalHitCount = 0;

            // 所有组内的满足条件的记录数(通常该值与totalHitCount是一致的)
            int totalGroupedHitCount = -1;
            if (requiredTotalGroupCount) {
                totalGroupCount = allGroupsCollector.getGroupCount();
            }

            TopGroups groupsResult = c2.getTopGroups(docOffset);
            // 这里打印的3项信息就是第一次查询的统计结果
            totalHitCount = groupsResult.totalHitCount;
            totalGroupedHitCount = groupsResult.totalGroupedHitCount;

            // 打印总的分组数量
            log.info("groupCount: {}", totalGroupCount);
            log.info("groupsResult.totalHitCount: {}", totalHitCount);
            log.info("groupsResult.totalGroupedHitCount: {}", totalGroupedHitCount);
            log.info("************************************");

            int groupIdx = 0;

            // 下面打印的是第二次查询的统计结果，如果你仅仅只需要第一次查询的统计结果信息，不需要每个分组内部的详细信息，则不需要进行第二次查询，请知晓
            // 迭代组
            for (GroupDocs groupDocs : groupsResult.groups) {
                groupIdx ++;
                String groupVL = groupDocs.groupValue == null ? "分组域的域值为空" : new String(groupDocs.groupValue.bytes);

                // 分组域的域值，groupIdx表示组的索引即第几组
                log.info("group[{}].groupFieldValue: {}", groupIdx, groupVL);
                // 当前分组内命中的总记录数
                log.info("group[{}].totalHits: {}", groupIdx, groupDocs.totalHits);

                int docIdx = 0;
                // 迭代组内的记录
                for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
                    docIdx++;
                    // 打印分组内部每条记录的索引文档ID及其评分
                    log.info("group[" + groupIdx + "][" + docIdx + "]{docID:Score}:" + scoreDoc.doc + "/" + scoreDoc.score);
                    // 根据docID可以获取到整个Document对象，通过doc.get(fieldName)可以获取某个存储域的域值
                    // 注意searcher.doc根据docID返回的document对象中不包含docValuesField域的域值，只包含非docValuesField域的域值，请知晓
                    Document doc = searcher.doc(scoreDoc.doc);
                    log.info("group[" + groupIdx + "][" + docIdx + "]{docID:author}:" + doc.get("id") + ":" + doc.get("word"));
                }
                log.info("*****************************************");
            }
        } catch (Exception e) {
            log.error(e.getMessage());
        }
    }

    @Override
    public void searchGroup7(String keyword) {

        try {
            Directory directory = FSDirectory.open(Paths.get(SystemConstants.indexDir));
            IndexReader reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);

            // 标准分词
            Analyzer analyzer = new StandardAnalyzer();

            // 指定要进行分组的索引
            GroupingSearch groupingSearch = new GroupingSearch(groupField);
            // 指定分组排序规则
            groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
            // 是否填充SearchGroup的sortValues
            groupingSearch.setFillSortFields(true);
            groupingSearch.setCachingInMB(4.0, true);
            groupingSearch.setAllGroups(true);
//            groupingSearch.setAllGroupHeads(true);
            // 限制分组个数
            groupingSearch.setGroupDocsLimit(10);

            QueryParser parser = new QueryParser("word", analyzer);
            Query query = parser.parse(keyword);

            TopGroups result = groupingSearch.search(searcher, query, 0, 1000);

            // 总命中数
            log.info("总命中数: {}", result.totalHitCount);
            //
            log.info("分组数:{}", result.groups.length);

            // 按照分组打印查询结果
            for (GroupDocs groupDocs : result.groups){
                if (groupDocs != null) {
                    if (groupDocs.groupValue != null) {
                        log.info("分组:{}", groupDocs.groupValue.utf8ToString());
                    }else{
                        // 由于建立索引时有一条数据没有在分组索引上建立SortedDocValued索引，因此这个分组的groupValue为null
                        log.info("分组:{}", "unknow");
                    }
                    log.info("组内数据条数:{}", groupDocs.totalHits);

                    for(ScoreDoc scoreDoc : groupDocs.scoreDocs){
                        log.info("sourceType:{}", searcher.doc(scoreDoc.doc).get("sourceType"));
                        log.info("word:{}", searcher.doc(scoreDoc.doc).get("word"));
                        log.info("*****************************");
                    }
                    System.out.println("=====================================");
                }
            }
        } catch (Exception e) {
            log.error(e.getMessage());
        }
    }
}

3. Util

package com.spring.master.lucene.util;

import com.spring.master.lucene.suggest.vo.DictionaryVO;
import lombok.extern.slf4j.Slf4j;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

/**
 * @author Huan Lee
 * @version 1.0
 * @date 2020-09-11 09:57
 * @describtion 业精于勤，荒于嬉；行成于思，毁于随。
 */
@Slf4j
public class FileUtils {

    /**
     * 读取词典csv文件
     * @param fileNamePath
     * @return
     */
    public static List readCsv(String fileNamePath) {

        List dictionarys = new ArrayList<>();
        try {
            // 换成你的文件名
            BufferedReader reader = new BufferedReader(new FileReader(fileNamePath));
            String line;
            while ((line = reader.readLine()) != null) {
                // CSV格式文件为逗号分隔符文件，这里根据逗号切分
                String[] item = line.split(",");
                dictionarys.add(new DictionaryVO(item[0], item[1], Long.parseLong(item[2]), Long.parseLong(item[3])));
            }
        } catch (Exception e) {
            e.printStackTrace();
            log.error(e.getMessage());
        }
        return dictionarys;
    }
}

4. Constant

package com.spring.master.lucene.group.constant;

/**
 * @author Huan Lee
 * @version 1.0
 * @date 2020-09-14 14:27
 * @describtion 业精于勤，荒于嬉；行成于思，毁于随。
 */
public class SystemConstants {

    /**
     * 索引目录
     */
    public static final String indexDir = "/Users/lihuan/Documents/projects/git/me/dictionary";
}

到此，关于“Grouping组查询怎么使用”的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注创新互联网站，小编会继续努力为大家带来更多实用的文章！

网站标题：Grouping组查询怎么使用
网站路径：http://cdkjz.cn/article/gocsdd.html

多年建站经验

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

咨询相关问题或预约面谈，可以通过以下方式与我们联系

网站建设

网站推广

案例

方案

电商网站开发

微信小程序

我们

联系

精准传达 • 有效沟通

查看其它板块

Grouping组查询怎么使用

Grouping介绍

Grouping深入

1. Grouping参数

2.Grouping实现

3. Grouping查询

Grouping用例

1. Controller层

2. Service层

3. Util

4. Constant

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接

网络推广

Network promotion

网站方案

Solution

电商网站开发

E-commerce & System

我们

About Us

联系

Contact Us

精准传达 • 有效沟通

查看其它板块

Grouping组查询怎么使用

Grouping介绍

Grouping深入

1. Grouping参数

2.Grouping实现

3. Grouping查询

Grouping用例

1. Controller层

2. Service层

3. Util

4. Constant

相关资讯

建立自己网站的方法有哪些

360安全路由器设置网址，360路由器怎么设置

excel设置按钮，excel如何添加按钮

易名中国域名门户网站,域名管理须注册商有易名中国

cad设置图纸，autocad里面的图纸怎么设置啊

如何创立个人网站，个人网站怎么建立

织梦模板使用php,如何修改默认模板?两种方法（2）

什么是交互性

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线 成都：13518219792 座机：028-86922220

友情链接 交换友情链接

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接