I am used to the C-style getchar(), but there seems to be nothing comparable in Java. I am building a lexical analyzer, and I need to read the input character by character.
I know I can use Scanner to scan in a token or line and then walk through it char by char, but that seems unwieldy for strings spanning multiple lines. Is there a way to just get the next character from the input buffer in Java, or should I just plug away with the Scanner class?
The input is a file, not the keyboard.
Recommended answer
Use Reader.read(). A return value of -1 means end of stream; else, cast to char.
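A minimal sketch of that contract, assuming the input is already available as a Reader (a StringReader stands in for a file here; the class name ReadDemo and the method readAll are illustrative only). It also shows why strings spanning several lines are not a problem: line breaks come back from read() as ordinary '\n' characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadDemo {
    // Reads every char until read() returns -1 (end of stream).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int r;
        while ((r = reader.read()) != -1) {
            sb.append((char) r); // safe cast: r is in 0..65535 here
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A string literal that spans two lines: the newline is just
        // another char in the stream, so no line handling is needed.
        String input = "\"first line\nsecond line\"";
        String result = readAll(new StringReader(input));
        System.out.println(result.equals(input)); // true
        System.out.println(result.indexOf('\n')); // 11
    }
}
```

In a real lexer you would act on each char as it is read instead of collecting them; the StringBuilder is only there to make the behaviour visible.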
This code reads character data from a list of file arguments:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.Charset;

    public class CharacterHandler {
      // Java 7 source level (uses try-with-resources)
      public static void main(String[] args) throws IOException {
        // replace this with a known encoding if possible
        Charset encoding = Charset.defaultCharset();
        for (String filename : args) {
          File file = new File(filename);
          handleFile(file, encoding);
        }
      }

      private static void handleFile(File file, Charset encoding)
          throws IOException {
        // all three resources are closed automatically, in reverse order
        try (InputStream in = new FileInputStream(file);
             Reader reader = new InputStreamReader(in, encoding);
             // buffer for efficiency
             Reader buffer = new BufferedReader(reader)) {
          handleCharacters(buffer);
        }
      }

      private static void handleCharacters(Reader reader)
          throws IOException {
        int r;
        // read() returns -1 at end of stream, otherwise a char value
        while ((r = reader.read()) != -1) {
          char ch = (char) r;
          System.out.println("Do something with " + ch);
        }
      }
    }
The drawback of the code above is that it uses the system's default character set, so the same file can be decoded differently on different machines. Wherever possible, prefer a known encoding (ideally a Unicode encoding, if you have a choice). See the Charset class for more. (If you feel masochistic, you can read this guide to character encoding.)
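As a sketch of the known-encoding advice, assuming the files are UTF-8 (an assumption; substitute whatever encoding your input actually uses), Java 7's Files.newBufferedReader takes an explicit Charset and returns an already-buffered reader. The class Utf8CharacterHandler and its countChars method are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8CharacterHandler {
    // Counts the chars in a file decoded as UTF-8 (assumed encoding).
    static long countChars(Path path) throws IOException {
        long count = 0;
        // Files.newBufferedReader already buffers; no extra wrapper needed
        try (BufferedReader reader =
                 Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        for (String filename : args) {
            Path path = Paths.get(filename);
            System.out.println(filename + ": " + countChars(path) + " chars");
        }
    }
}
```

One difference worth knowing: the reader from Files.newBufferedReader throws an IOException (a MalformedInputException) when it meets bytes that are invalid in the chosen charset, whereas InputStreamReader silently substitutes replacement characters. For a lexer, failing loudly on bad input is often the safer choice.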
(One thing you might want to look out for is supplementary Unicode characters: those that require two char values (a surrogate pair) to store. See the Character class for more details; this is an edge case that probably won't apply to homework.)
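If supplementary characters do matter, one way to handle them is to keep one char of lookahead and combine a high surrogate with the low surrogate that follows it, using Character.isHighSurrogate, Character.isLowSurrogate and Character.toCodePoint. This is a sketch rather than a complete solution: the class name CodePointReader is hypothetical, and unpaired surrogates are simply passed through as-is.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CodePointReader {
    // Reads a Reader as Unicode code points, joining surrogate pairs.
    // Simplification: an unpaired surrogate is returned unchanged.
    static List<Integer> readCodePoints(Reader reader) throws IOException {
        List<Integer> codePoints = new ArrayList<>();
        int r = reader.read();
        while (r != -1) {
            int next = reader.read(); // one char of lookahead
            char ch = (char) r;
            if (Character.isHighSurrogate(ch) && next != -1
                    && Character.isLowSurrogate((char) next)) {
                codePoints.add(Character.toCodePoint(ch, (char) next));
                next = reader.read(); // the low surrogate was consumed
            } else {
                codePoints.add(r);
            }
            r = next;
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        // "a", U+1F600 (stored as two chars), "b": 4 chars, 3 code points
        String input = "a\uD83D\uDE00b";
        List<Integer> cps = readCodePoints(new StringReader(input));
        System.out.println(input.length() + " chars, " + cps.size() + " code points");
    }
}
```

The list is only for demonstration; a lexer would more likely process each code point as soon as it is assembled.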