Building relationships in Neo4j via Py2neo is very slow

The database has 5 different types of nodes. The largest type has ~290k nodes, the smallest only ~3k. Each node type has an id field, and all of them are indexed. I am using py2neo to build relationships, but it is very slow (~2 relationships inserted per second).

I read a relationship CSV with pandas and iterate over the rows, creating one relationship per row, each wrapped in its own transaction. I also tried batching 10k creation statements into a single transaction, but that did not seem to improve the speed much.

Below is the code:

    import pandas as pd

    # graph, datatype and fields are defined elsewhere (not shown in the question)
    df = pd.read_csv(r"C:\relationship.csv", dtype=datatype, skipinitialspace=True, usecols=fields)
    df.fillna('', inplace=True)

    def f(node_1, rel_type, node_2):
        try:
            # One transaction per relationship
            tx = graph.begin()
            tx.evaluate(
                'MATCH (a {node_id:$label1}),(b {node_id:$label2}) '
                'MERGE (a)-[r:' + rel_type + ']->(b)',
                parameters={'label1': node_1, 'label2': node_2})
            tx.commit()
        except Exception as e:
            print(str(e))

    for index, row in df.iterrows():
        if index % 1000000 == 0:
            print(index)
        try:
            f(row["node_1"], row["rel_type"], row["node_2"])
        except Exception:
            # index is not a string, so convert it before concatenating
            print("error index: " + str(index))

Can someone help me figure out what I am doing wrong here? Thanks!

Best answer

You state that there are "5 different types of nodes" (which I interpret to mean 5 node labels, in neo4j terminology). And, furthermore, you state that their id properties are already indexed.

But your f() function is not generating a Cypher query that uses the labels at all, and neither does it use the id property. In order to take advantage of your indexes, your Cypher query has to specify the node label and the id value.

Since there is currently no efficient way to parameterize the label when performing a MATCH, the following version of the f() function generates a Cypher query that has hardcoded labels (as well as a hardcoded relationship type):

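    def f(label_1, id_1, rel_type, label_2, id_2):
        try:
            tx = graph.begin()
            # Labels and the relationship type are spliced into the query text;
            # only the id values are passed as parameters.
            # Note: the question's code matches on node_id, so use whichever
            # property name is actually indexed.
            tx.evaluate(
                'MATCH' +
                '(a:' + label_1 + '{id:$id1}),' +
                '(b:' + label_2 + '{id:$id2}) ' +
                'MERGE (a)-[r:' + rel_type + ']->(b)',
                parameters={'id1': id_1, 'id2': id_2})
            tx.commit()
        except Exception as e:
            print(str(e))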

The code that calls f() will also have to be changed to pass in both the label names and the id values for a and b. Hopefully, your df rows will contain that data (or enough info for you to derive that data).
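
As a rough illustration of what that calling code might look like, here is a minimal sketch that builds the query text and parameters for each row without touching the database. The row keys label_1, id_1, rel_type, label_2 and id_2, the helper names, and the identifier check are all assumptions made for this example, not something taken from the question's CSV. The property name id follows the answer above.

```python
import re

# Labels and relationship types are spliced into the query text rather than
# passed as parameters, so check them first. The allowed pattern is an
# assumption of this sketch, not a Neo4j rule.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge_query(label_1, rel_type, label_2):
    """Return a MERGE query with hardcoded labels and relationship type."""
    for name in (label_1, rel_type, label_2):
        if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    return (
        "MATCH (a:%s {id: $id1}), (b:%s {id: $id2}) "
        "MERGE (a)-[r:%s]->(b)" % (label_1, label_2, rel_type)
    )


def relationship_calls(rows):
    """Yield one (query, parameters) pair per row.

    Each row is a dict with the hypothetical keys label_1, id_1, rel_type,
    label_2 and id_2.
    """
    for row in rows:
        query = build_merge_query(row["label_1"], row["rel_type"], row["label_2"])
        yield query, {"id1": row["id_1"], "id2": row["id_2"]}


# Made-up sample rows standing in for the CSV data.
rows = [
    {"label_1": "Person", "id_1": 1, "rel_type": "KNOWS",
     "label_2": "Person", "id_2": 2},
    {"label_1": "Person", "id_1": 2, "rel_type": "WORKS_AT",
     "label_2": "Company", "id_2": 7},
]
calls = list(relationship_calls(rows))
for query, params in calls:
    print(query, params)
```

Each pair could then be handed to tx.evaluate(query, parameters=params) in the same way f() does. With pandas, df.to_dict("records") would produce a list of row dicts in this shape, provided the DataFrame has those columns.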
