Solr DataImportHandler is not indexing all defined data

Problem description

I am using Solr 5.3.

I am trying to load a Wikipedia page-article dump into Solr using the "DataImportHandler", but when I query I get back only the id and title fields.

Below is my data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/mnt/TEST/enwiki-20150602-pages-articles1.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="revision" xpath="/mediawiki/page/revision/id" />
      <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
      <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
  </document>
</dataConfig>

I have also added the following entries to schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="false"/>
<field name="revision" type="int" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true"/>
<field name="userId" type="int" indexed="true" stored="true"/>
<field name="text" type="text_en" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true"/>
<field name="titleText" type="text_en" indexed="true" stored="true"/>

I copied schema.xml from "example/example-DIH/solr/solr/conf/schema.xml" and removed all field entries, with a few exceptions as mentioned in the comments.

After importing the data I try to fetch all fields, but I get only "id" and "title".

I also tried to run documentImport in debug mode to get some information about the indexing, but whenever I select debug mode it imports only 2 documents, and I am not sure why. Because of this I cannot debug the indexing process.

Please guide me further.

EDIT: I am now sure that the other fields are not being indexed, because when I specify df=user or df=text I get the message below.

"msg": "undefined field user",

I am querying like this: localhost:8983/solr/wiki/select?q=%3A&fl=id%2Ctitle%2Ctext%2Crevision&wt=json&indent=true&debugQuery=true

Recommended answer

The configuration shown in the question works only with the classic schema. In my solrconfig, however, the managed schema was enabled by default, which is why I was not getting text. With the managed schema I do not need to define "schema.xml"; instead I should define the fields in data-config.xml as shown below.

<field column="id" xpath="/mediawiki/page/id" />
<field column="title_s" xpath="/mediawiki/page/title" />
<field column="revision" xpath="/mediawiki/page/revision/id" />
<field column="user_s" xpath="/mediawiki/page/revision/contributor/username" />
<field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
<field column="text_s" xpath="/mediawiki/page/revision/text" />
<field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
<field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
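
Since the answer notes that the original setup works with the classic schema, a possible alternative is to keep the hand-edited schema.xml and switch the core to the classic schema factory. The fragment below is only a sketch under assumptions: `ClassicIndexSchemaFactory` is the factory class that ships with Solr 5.x, but where the element goes in your solrconfig.xml, and whether a `ManagedIndexSchemaFactory` element is already declared there, depends on the config set the core was created from.

```xml
<!-- solrconfig.xml (fragment): read field definitions from conf/schema.xml -->
<!-- Assumption: this replaces any existing ManagedIndexSchemaFactory element,
     since a core can declare only one schemaFactory. -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

After changing the factory, the core has to be reloaded and the import run again before fields such as user or text can be queried.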
