Regex to help split up list into two-tuples

Updated: 2024-10-25 20:30:56

Given a list of actors, each followed by their character name in brackets, with entries separated by either a semicolon (;) or a comma (,):

Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; Alfie Bass [Harry]

How would I parse this into a list of two-tuples in the form [(actor, character), ...]?

--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'), ('Denholm Elliott', 'Mr. Smith; abortionist')]

I originally had:

actors = [item.strip().rstrip(']') for item in re.split('\[|,|;', data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]

But this doesn't quite work, as it also splits up items within brackets.
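
To make the failure concrete, here is a minimal sketch that runs the same split on a shortened version of the input. The variable names `s`, `parts` and `pairs` are only illustrative, and `zip` is used for the pairing so the odd number of pieces does not raise an IndexError:

```python
import re

# Shortened sample containing the problematic entry with ';' inside brackets.
s = ("Eleanor Bron [Woman Doctor], "
     "Denholm Elliott [Mr. Smith; abortionist]; Alfie Bass [Harry]")

# Same splitting logic as in the question: split on '[', ',' and ';'.
parts = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', s)]
print(parts)

# Pair neighbouring items; zip stops early instead of indexing past the end.
pairs = list(zip(parts[::2], parts[1::2]))
print(pairs)
```

Because the semicolon inside `[Mr. Smith; abortionist]` is also treated as a separator, the role is cut in two and every pair after it is shifted by one position.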

Best answer

You can go with something like:

>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'), ('Julia Foster', 'Gilda'), ('Jane Asher', 'Annie'), ('Shirley Ann Field', 'Carla'), ('Vivien Merchant', 'Lily'), ('Eleanor Bron', 'Woman Doctor'), ('Denholm Elliott', 'Mr. Smith; abortionist'), ('Alfie Bass', 'Harry')]

One can also simplify some things with .*?:

re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
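
For reference, a self-contained sketch of the same idea follows. It is a variation, not the answer's exact pattern: the role group uses `[^\]]*` (anything up to the closing bracket) and the trailing `[,;\s$]*` is dropped, since the next match only starts at the next word character anyway. Note also that inside a character class `$` is a literal dollar sign, not an end-of-string anchor. The names `PAIR_RE` and `parse_cast` are hypothetical helpers introduced for illustration.

```python
import re

# Group 1: actor name, starting at a word character and expanding lazily.
# Group 2: everything inside the brackets, so ';' and ',' in a role survive.
PAIR_RE = re.compile(r'(\w.*?)\s*\[([^\]]*)\]')


def parse_cast(text):
    """Return a list of (actor, character) tuples found in text."""
    return [(actor, role.strip()) for actor, role in PAIR_RE.findall(text)]


s = ("Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; "
     "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; "
     "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; "
     "Alfie Bass [Harry]")

cast = parse_cast(s)
for actor, role in cast:
    print(f"{actor}: {role}")
```

This assumes every entry has exactly one bracketed role and that roles never contain a `]`; nested brackets would need a different approach.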

Published: 2023-08-04 11:38:00
Source: https://www.elefans.com/category/jswz/34/1414847.html