Use Scanner to tokenize a file

Updated: 2024-06-14 17:01:31

I need to tokenize a text file where tokens are defined by "[a-zA-Z]+". The following works:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Scanner;
    import java.util.regex.Pattern;

    Pattern WORD = Pattern.compile("[a-zA-Z]+");
    File f = new File(...);
    FileInputStream inputStream = new FileInputStream(f);
    Scanner scanner = new Scanner(inputStream);
    String word = null;
    // The horizon argument is an int, so the long file length has to be cast.
    while ((word = scanner.findWithinHorizon(WORD, (int) f.length())) != null) {
        // process the word
    }

The problem is that findWithinHorizon requires an int as the horizon, while the file length is of type long.

What is a sensible way to tokenize a large file using a Scanner?
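
To make the concern concrete, here is a minimal sketch of what the (int) cast does to a length above Integer.MAX_VALUE, and how findWithinHorizon reacts to the negative horizon that results. The class name HorizonCastDemo, its helper methods, and the 3,000,000,000-byte length are hypothetical and chosen only for illustration. No real file is opened.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class HorizonCastDemo {
    // Same narrowing cast as in the question: only the low 32 bits survive.
    static int castLength(long fileLength) {
        return (int) fileLength;
    }

    // Returns true if findWithinHorizon rejects the given horizon.
    static boolean horizonRejected(int horizon) {
        try (Scanner scanner = new Scanner("some words")) {
            scanner.findWithinHorizon(Pattern.compile("[a-zA-Z]+"), horizon);
            return false;
        } catch (IllegalArgumentException e) {
            // Documented behaviour of findWithinHorizon for a negative horizon.
            return true;
        }
    }

    public static void main(String[] args) {
        long hypotheticalLength = 3_000_000_000L; // assumed size, larger than Integer.MAX_VALUE
        int horizon = castLength(hypotheticalLength);
        System.out.println(horizon);                  // -1294967296
        System.out.println(horizonRejected(horizon)); // true
    }
}
```

Lengths that happen to wrap to a positive value would not throw. They would instead limit the search to a horizon shorter than the file, which is arguably harder to notice.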

Best answer

Use a delimiter that is the negation of the matching pattern:

    // Everything that is not a letter acts as a delimiter, so each token matches [a-zA-Z]+.
    Scanner s = new Scanner(f).useDelimiter("[^a-zA-Z]+");
    while (s.hasNext()) {
        String token = s.next();
        // do something with "token"
    }
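
The same idea as a self-contained sketch that can be run without a file: the delimiter is the negation of the token pattern, and each token is handed to a callback as soon as it is read, so nothing depends on the input length. The class name WordTokenizer, the method forEachWord, and the sample text are assumptions made for this example. For a real file, a Scanner built from the File (as in the answer) or from a Reader would take the place of the StringReader.

```java
import java.io.StringReader;
import java.util.Scanner;
import java.util.function.Consumer;

class WordTokenizer {
    // Passes every maximal run of ASCII letters in the source to the action.
    static void forEachWord(Readable source, Consumer<String> action) {
        // Any run of non-letters is treated as a delimiter between tokens.
        try (Scanner scanner = new Scanner(source).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                action.accept(scanner.next());
            }
        }
    }

    public static void main(String[] args) {
        // Sample input standing in for the contents of a text file.
        forEachWord(new StringReader("  hello, world! 123 foo_bar"), System.out::println);
        // prints hello, world, foo, bar on separate lines
    }
}
```

Because the delimiter pattern ends in +, leading, trailing, and consecutive non-letter characters do not produce empty tokens.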

Published: 2023-04-20 16:20:00

Article link: https://www.elefans.com/category/dzcp/1c119d54a48ceb07217bfc21ac1c008f.html