I have a string in Python that I want to split in a very particular way. I want to split it into a list containing each separate word, except when a group of words is bordered by a particular character. For example, the following strings would be split like this.
'Jimmy threw his ball through the window.' becomes
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
However, with a border character I'd want
'Jimmy |threw his ball| through the window.' to become
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
As an additional requirement, a - that may appear just outside a grouped phrase should end up inside it after splitting, i.e.,
'Jimmy |threw his| ball -|through the| window.' would become
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
I cannot find a simple, Pythonic way to do this without a lot of complicated for loops and if statements. Is there a simple way to handle something like this?
Recommended answer
This isn't something with an out-of-the-box solution, but here's a fairly Pythonic function that should handle pretty much anything you throw at it.
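import re

def extract_groups(s):
    # A group is an optional '-' followed by |...|. The capturing parentheses
    # make re.split() keep each matched group as its own list element.
    separator = re.compile(r"(-?\|[\w ]+\|)")
    components = separator.split(s)
    groups = []
    for component in components:
        component = component.strip()
        if not component:
            continue
        elif separator.fullmatch(component):
            # Only a captured separator matches in full, so ordinary text that
            # merely starts with '-' is not mistaken for a group.
            groups.append(component.replace('|', ''))
        else:
            # Ordinary text: one item per word; split() also ignores runs of spaces.
            groups.extend(component.split())
    return groups

Using your examples: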
>>> extract_groups('Jimmy threw his ball through the window.')
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his ball| through the window.')
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his| ball -|through the| window.')
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
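As an alternative sketch (not part of the original answer), the same result can be produced by matching tokens directly instead of splitting on separators. The names extract_groups_findall, extract_groups_shlex and TOKEN are hypothetical. The first function scans the string once with re.finditer, where each match is either an optional '-' plus a |bordered phrase| or a run of non-whitespace characters. The second uses the standard-library shlex module in POSIX mode with '|' configured as the only quote character. Both assume groups never contain a literal '|'.

```python
import re
import shlex

# One token is either an optional '-' plus a |bordered phrase|, or a run of
# non-whitespace characters. Group 1 captures the '-', group 2 the phrase.
TOKEN = re.compile(r"(-?)\|([^|]*)\||\S+")


def extract_groups_findall(s):
    """Hypothetical single-pass tokenizer: one regex match per output item."""
    tokens = []
    for m in TOKEN.finditer(s):
        if m.group(2) is not None:
            # Bordered phrase: keep the optional '-' and drop the '|' borders.
            tokens.append(m.group(1) + m.group(2))
        else:
            # Ordinary word: keep it exactly as written.
            tokens.append(m.group(0))
    return tokens


def extract_groups_shlex(s):
    """Hypothetical variant: '|' is the only quote character of a shlex lexer."""
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True  # split on whitespace only, not on punctuation
    lex.quotes = "|"             # '|' delimits a phrase; ' and " are ordinary
    lex.commenters = ""          # do not treat '#' as the start of a comment
    lex.escape = ""              # do not treat backslash as an escape character
    return list(lex)


if __name__ == "__main__":
    for text in [
        "Jimmy threw his ball through the window.",
        "Jimmy |threw his ball| through the window.",
        "Jimmy |threw his| ball -|through the| window.",
    ]:
        print(extract_groups_findall(text))
        print(extract_groups_shlex(text))
```

The two differ on malformed input. The regex version treats a stray '|' as part of an ordinary word, while shlex raises ValueError ("No closing quotation") for an unbalanced '|' and glues text that touches a group on either side into one token (for example 'x|a b|y' becomes 'xa by').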