Use Scanner to tokenize a file

Updated: 2024-06-14 17:01:31

I need to tokenize a text file where tokens are defined by "[a-zA-Z]+". The following works:

    Pattern WORD = Pattern.compile("[a-zA-Z]+");
    File f = new File(...);
    FileInputStream inputStream = new FileInputStream(f);
    Scanner scanner = new Scanner(inputStream);
    String word = null;
    while ((word = scanner.findWithinHorizon(WORD, (int) f.length())) != null) {
        // process the word
    }

The problem is that findWithinHorizon takes an int as the horizon, while the file length is a long, so the cast is not safe for large files.

What is a sensible way to tokenize a large file using a Scanner?

Best answer

Use a delimiter that is the negation of the matching pattern:

    Scanner s = new Scanner(f).useDelimiter("[^a-zA-Z]+");
    while (s.hasNext()) {
        String token = s.next();
        // do something with "token"
    }
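
For anyone who wants to try the delimiter approach end to end, here is a minimal, self-contained sketch. The class name WordTokenizer, the tokenize helper, and the sample string are made up for illustration and are not part of the answer above. The helper collects tokens into a list only to keep the demo short; for a genuinely large file you would process each token inside the loop instead. It also assumes the file can be decoded as UTF-8, which is the default for Files.newBufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class WordTokenizer {
    // Delimiter: one or more characters that are NOT ASCII letters,
    // i.e. the negation of the token pattern [a-zA-Z]+.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Returns every maximal run of ASCII letters found in the source.
    // The list is for demonstration; with a large file, handle each
    // token inside the loop rather than storing all of them.
    static List<String> tokenize(Readable source) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(source).useDelimiter(NON_LETTERS)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Tokenize the file given on the command line (assumed UTF-8).
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
                System.out.println(tokenize(reader));
            }
        } else {
            // No file given: fall back to a small sample string.
            System.out.println(tokenize(new StringReader("Hello, world! 42 foo_bar")));
            // prints [Hello, world, foo, bar]
        }
    }
}
```

Scanner skips any leading delimiter before returning a token, so input that starts or ends with non-letters does not produce empty tokens. This approach never needs a horizon, so the int versus long mismatch from the question does not arise. Separately, the Scanner documentation states that a horizon of 0 makes findWithinHorizon search without bound, although in that mode it may buffer all of the input.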

Published: 2023-04-20 16:20:00
Link: https://www.elefans.com/category/dzcp/1c119d54a48ceb07217bfc21ac1c008f.html