Regex to help split up list into two-tuples

Given a list of actors, with their character names in brackets, separated by either a semicolon (;) or a comma (,):

Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; Alfie Bass [Harry]

How would I parse this into a list of two-tuples of the form [(actor, character), ...]?

--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'), ('Denholm Elliott', 'Mr. Smith; abortionist')]

I originally had:

actors = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]

But this doesn't quite work, as it also splits up items within brackets.
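
To make the failure concrete, here is a small sketch that runs the same split on a shortened sample (the variable names `sample` and `parts` are only for illustration). The semicolon inside the brackets is treated as a separator, so the character name is cut in two and the pieces no longer pair up:

```python
import re

# Shortened sample containing the problematic entry with ';' inside brackets.
sample = ("Eleanor Bron [Woman Doctor], "
          "Denholm Elliott [Mr. Smith; abortionist]; Alfie Bass [Harry]")

# Same split as in the question: break on '[', ',' or ';' anywhere.
parts = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', sample)]

# 'Mr. Smith' and 'abortionist' come out as separate items, which leaves
# an odd number of pieces and shifts every later actor/character pair.
print(parts)
print(len(parts))
```

With an odd number of pieces, the pairing step `actors[i + 1]` would also run past the end of the list.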

Best answer

You can go with something like:

>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'), ('Julia Foster', 'Gilda'), ('Jane Asher', 'Annie'), ('Shirley Ann Field', 'Carla'), ('Vivien Merchant', 'Lily'), ('Eleanor Bron', 'Woman Doctor'), ('Denholm Elliott', 'Mr. Smith; abortionist'), ('Alfie Bass', 'Harry')]

One can also simplify some things with .*?:

re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
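
For completeness, here is a self-contained sketch of the simplified approach, run on the string from the question. The names `PAIR` and `parse_cast` are made up for this example. It drops the trailing `[,;\s$]*` on the assumption that it is not needed: `re.findall` skips over the separators between matches anyway, and `$` inside a character class matches a literal dollar sign rather than the end of the string.

```python
import re

s = ("Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; "
     "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; "
     "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; "
     "Alfie Bass [Harry]")

# Group 1: actor name, starting at a word character and expanding lazily
#          until optional whitespace followed by '['.
# Group 2: everything up to the first closing ']', including any ';' or ','.
PAIR = re.compile(r'(\w.*?)\s*\[(.*?)\]')

def parse_cast(text):
    """Return a list of (actor, character) tuples found in text."""
    return PAIR.findall(text)

pairs = parse_cast(s)
print(pairs)
```

This assumes character names never contain a `]` and that every actor name starts with a word character; nested brackets would need a different approach.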
