StreamTokenizer in Java

Introduction

Parsing text streams is a common task in many Java applications. The `StreamTokenizer` class in Java provides a flexible and efficient way to break down a stream of characters into tokens. In this blog post, we will explore the features and functionality of `StreamTokenizer` through 10 different code examples.

Example 1: Basic Usage

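import java.io.*;

public class Example1 {
    public static void main(String[] args) throws IOException {
        String input = "Hello World 123.45 true";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                // Numeric tokens are stored in nval; sval is null for them
                System.out.println("Token: " + tokenizer.nval);
            } else {
                System.out.println("Token: " + tokenizer.sval);
            }
        }
    }
}

Explanation: This example demonstrates the basic usage of `StreamTokenizer` to tokenize a string. Each call to `nextToken()` returns the token type, which is also stored in `ttype`. Word tokens are available in `sval` and numeric tokens in `nval`, so the loop checks the type before printing. It runs until `TT_EOF` signals the end of the stream. Note that `true` comes back as a plain word, not as a boolean.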
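To see how the token types relate to `ttype`, `sval`, and `nval`, here is a minimal sketch that labels every token it reads. The class name `TokenTypesSketch` and the helper `describe` are hypothetical names chosen for illustration; only the `StreamTokenizer` calls come from the JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenTypesSketch {
    // Hypothetical helper: returns one labelled description per token.
    static List<String> describe(String input) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD " + tokenizer.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER " + tokenizer.nval);
                    break;
                case '"':
                case '\'':
                    // For quoted strings, ttype is the quote character itself
                    out.add("QUOTED " + tokenizer.sval);
                    break;
                default:
                    // Any other character is its own token; ttype holds the character
                    out.add("CHAR " + (char) tokenizer.ttype);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String line : describe("Hello World 123.45 true")) {
            System.out.println(line);
        }
    }
}
```

With the default syntax table, the sample input should yield two words, one number, and one more word.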

Example 2: Customizing Token Characters


import java.io.*;

public class Example2 {
    public static void main(String[] args) throws IOException {
        String input = "Name:John, Age:25, Gender:Male";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.wordChars(':', ':');
        tokenizer.wordChars(',', ',');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Token: " + tokenizer.sval);
        }
    }
}

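Explanation: This example customizes the word characters so that tokens can include the colon and the comma. Because both are now part of a word, the input splits only on whitespace and each key-value pair stays together as one token: `Name:John,`, `Age:25,`, and `Gender:Male`.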
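If the goal is to separate keys from values instead, one option is to leave ':' and ',' as ordinary characters so they arrive as single-character tokens. The sketch below takes that route; `KeyValueSketch` and `parse` are hypothetical names, and the pairing logic assumes well-formed `key:value` input with no error handling.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueSketch {
    // Hypothetical helper: pairs up alternating key and value tokens.
    static Map<String, String> parse(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        Map<String, String> pairs = new LinkedHashMap<>();
        String key = null;
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == ':' || t.ttype == ',') {
                continue; // separators arrive as single-character tokens
            }
            // Numbers arrive in nval (a double); word tokens carry their text in sval
            String text = (t.ttype == StreamTokenizer.TT_NUMBER)
                    ? String.valueOf(t.nval)
                    : t.sval;
            if (key == null) {
                key = text;
            } else {
                pairs.put(key, text);
                key = null;
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // prints {Name=John, Age=25.0, Gender=Male}
        System.out.println(parse("Name:John, Age:25, Gender:Male"));
    }
}
```

The age shows up as `25.0` because the tokenizer parses it as a `double`.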

Example 3: Handling Numbers


import java.io.*;

public class Example3 {
    public static void main(String[] args) throws IOException {
        String input = "42 3.14 -7.5";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                System.out.println("Number: " + tokenizer.nval);
            }
        }
    }
}

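Explanation: This example focuses on handling numeric values. Number parsing is enabled by default, and a numeric token is reported with `ttype` equal to `TT_NUMBER` and its value in `nval`. Since `nval` is a `double`, the integer 42 prints as `42.0`; a leading minus sign is treated as part of the number.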
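Sometimes numeric-looking text should be kept exactly as written, for example an identifier such as `007` or a dotted version such as `1.2.3`, which the default number parsing would not preserve as a single token. A minimal sketch of one way to do that is to clear the numeric attribute from the digits, '.', and '-', and then register digits and '.' as word characters. `VerbatimNumbersSketch` and `words` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class VerbatimNumbersSketch {
    // Hypothetical helper: returns word tokens with number parsing switched off.
    static List<String> words(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        // Remove the numeric attribute from digits, '.' and '-' ...
        t.ordinaryChars('0', '9');
        t.ordinaryChar('.');
        t.ordinaryChar('-');
        // ... then let digits and '.' form word tokens instead
        t.wordChars('0', '9');
        t.wordChars('.', '.');

        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == StreamTokenizer.TT_WORD) {
                out.add(t.sval); // '-' stays ordinary and is skipped here
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(words("007 1.2.3")); // prints [007, 1.2.3]
    }
}
```

The trade-off is that the caller now has to convert text to numbers itself, for instance with `Double.parseDouble`.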

Example 4: Handling Quoted Strings


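import java.io.*;

public class Example4 {
    public static void main(String[] args) throws IOException {
        String input = "\"Java Programming\" 'is fun'";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.quoteChar('"');
        tokenizer.quoteChar('\'');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            // For a quoted string, ttype is the quote character and sval holds the body
            if (tokenizer.ttype == '"' || tokenizer.ttype == '\'') {
                System.out.println("String: " + tokenizer.sval);
            }
        }
    }
}

Explanation: This example handles quoted strings enclosed in either double or single quotes and prints each one. When the tokenizer reads a quoted string, `ttype` is set to the quote character rather than `TT_WORD`, and `sval` contains the text between the quotes. Both quote characters are already enabled by default, so the `quoteChar` calls only make the configuration explicit.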
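Quoted strings also go through escape processing: sequences such as `\t` and `\n` are converted to single characters, and a backslash-escaped quote does not end the string. The sketch below illustrates this under the default syntax table; `QuotedEscapesSketch` and `quoted` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class QuotedEscapesSketch {
    // Hypothetical helper: returns the bodies of all quoted strings in the input.
    static List<String> quoted(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == '"' || t.ttype == '\'') {
                out.add(t.sval); // escapes are already decoded at this point
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The text seen by the tokenizer is: "Java\tProgramming" 'it\'s fun' plain
        String input = "\"Java\\tProgramming\" 'it\\'s fun' plain";
        for (String s : quoted(input)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The first string comes back with a real tab character in place of `\t`, and the second as `it's fun`. The unquoted word `plain` is ignored by this helper.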

Example 5: Recognizing Words and Whitespace


import java.io.*;

public class Example5 {
    public static void main(String[] args) throws IOException {
        String input = "Java is amazing";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.wordChars('a', 'z');
        tokenizer.whitespaceChars(' ', ' ');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Token: " + tokenizer.sval);
        }
    }
}

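Explanation: This example sets the word and whitespace characters explicitly. Both calls restate the defaults, since a newly constructed `StreamTokenizer` already treats 'a' through 'z' (and 'A' through 'Z') as word characters and the space as whitespace. The input splits into `Java`, `is`, and `amazing`.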
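`whitespaceChars` becomes more useful with characters that are not whitespace by default. As a sketch, the hypothetical helper `split` below (in a hypothetical class `SeparatorSketch`) treats ',' and ';' as whitespace so they act as separators and never show up as tokens.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SeparatorSketch {
    // Hypothetical helper: splits on spaces, commas, and semicolons.
    static List<String> split(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        t.whitespaceChars(',', ',');
        t.whitespaceChars(';', ';');

        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == StreamTokenizer.TT_WORD) {
                out.add(t.sval); // only word tokens are collected in this sketch
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(split("red,green;blue")); // prints [red, green, blue]
    }
}
```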

Example 6: Handling Comments


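import java.io.*;

public class Example6 {
    public static void main(String[] args) throws IOException {
        String input = "/* This is a comment */ 42 // Another comment";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.slashStarComments(true);
        tokenizer.slashSlashComments(true);

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                // 42 is a numeric token, so its value is in nval
                System.out.println("Token: " + tokenizer.nval);
            } else {
                System.out.println("Token: " + tokenizer.sval);
            }
        }
    }
}

Explanation: This example enables recognition of both block (`/* ... */`) and line (`// ...`) comments, which the tokenizer then skips. The only token left in the input is the number 42, which is printed from `nval` as `42.0`.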
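Comment styles other than the C and C++ ones can be handled with `commentChar`, which makes a single character start a comment that runs to the end of the line. The sketch below uses '#' for a small configuration-like input; `HashCommentSketch`, `tokens`, and the sample text are assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class HashCommentSketch {
    // Hypothetical helper: returns words and numbers, skipping '#' comments.
    static List<String> tokens(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        t.commentChar('#'); // '#' now starts a comment that ends at the line break

        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == StreamTokenizer.TT_WORD) {
                out.add(t.sval);
            } else if (t.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(t.nval));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String input = "port 8080 # default port\nhost localhost # trailing note";
        System.out.println(tokens(input)); // prints [port, 8080.0, host, localhost]
    }
}
```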

Example 7: Handling EOF


import java.io.*;

public class Example7 {
    public static void main(String[] args) throws IOException {
        String input = "Java is great";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Token: " + tokenizer.sval);
        }
    }
}

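Explanation: This example shows how to detect the end of the stream (EOF) during tokenization. Once the reader is exhausted, `nextToken()` returns `StreamTokenizer.TT_EOF`, and comparing against that constant is what ends the loop.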
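Line ends can be detected in a similar way. By default they are skipped like any other whitespace, but after `eolIsSignificant(true)` the tokenizer reports each one as `TT_EOL`. The following sketch counts tokens per line; `LineCountSketch` and `tokensPerLine` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineCountSketch {
    // Hypothetical helper: returns the number of tokens found on each line.
    static List<Integer> tokensPerLine(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        t.eolIsSignificant(true); // report line ends as TT_EOL instead of skipping them

        List<Integer> counts = new ArrayList<>();
        int current = 0;
        int type;
        while ((type = t.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_EOL) {
                counts.add(current);
                current = 0;
            } else {
                current++;
            }
        }
        if (current > 0) {
            counts.add(current); // last line without a trailing line break
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokensPerLine("a b\nc\n\nd e f")); // prints [2, 1, 0, 3]
    }
}
```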

Example 8: Ignoring Whitespace


import java.io.*;

public class Example8 {
    public static void main(String[] args) throws IOException {
        String input = " Token1 Token2 ";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.whitespaceChars(' ', ' ');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Token: " + tokenizer.sval);
        }
    }
}

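Explanation: This example shows that whitespace around and between tokens is skipped. The space character is already whitespace by default, so the `whitespaceChars` call only makes that explicit. The output is `Token1` and `Token2`, with no empty tokens for the leading or trailing spaces.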

Example 9: Handling Ordinary Characters


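import java.io.*;

public class Example9 {
    public static void main(String[] args) throws IOException {
        String input = "Java@Programming";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        tokenizer.ordinaryChar('@');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                System.out.println("Token: " + tokenizer.sval);
            } else {
                // An ordinary character is its own token; ttype holds the character
                System.out.println("Token: " + (char) tokenizer.ttype);
            }
        }
    }
}

Explanation: This example treats '@' as an ordinary character. An ordinary character is never part of a word: it is returned as a single-character token with `ttype` set to the character and `sval` set to null. The input therefore splits into three tokens: `Java`, `@`, and `Programming`.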
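`ordinaryChar` matters most for characters that have a special meaning by default. In a newly constructed tokenizer, '-' is treated as part of a number and '/' starts a comment, which gets in the way when tokenizing arithmetic. The sketch below makes both ordinary so that operators come back as their own tokens; `ExpressionTokensSketch` and `tokens` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ExpressionTokensSketch {
    // Hypothetical helper: splits an arithmetic expression into numbers and operators.
    static List<String> tokens(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        t.ordinaryChar('-'); // otherwise "2-1" is read as 2 followed by -1
        t.ordinaryChar('/'); // otherwise '/' starts a comment by default

        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            if (t.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(t.nval));
            } else if (t.ttype == StreamTokenizer.TT_WORD) {
                out.add(t.sval);
            } else {
                out.add(String.valueOf((char) t.ttype)); // operator or parenthesis
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints [3.0, +, 4.0, *, (, 2.0, -, 1.0, ), /, 5.0]
        System.out.println(tokens("3+4*(2-1)/5"));
    }
}
```

One consequence of this choice is that negative literals are no longer recognized; a parser built on top would have to handle unary minus itself.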

Example 10: Resetting Tokenization


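import java.io.*;

public class Example10 {
    public static void main(String[] args) throws IOException {
        String input = "Token1 Token2 Token3";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Token: " + tokenizer.sval);
        }

        // resetSyntax() does not rewind the reader, so wrap a fresh reader to tokenize again
        tokenizer = new StreamTokenizer(new StringReader(input));

        // Clear the syntax table (every character becomes ordinary), then rebuild it
        tokenizer.resetSyntax();
        tokenizer.wordChars('a', 'z');
        tokenizer.wordChars('A', 'Z');
        tokenizer.wordChars('0', '9');
        tokenizer.whitespaceChars(0, ' ');

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            System.out.println("Reused Token: " + tokenizer.sval);
        }
    }
}

Explanation: `resetSyntax()` resets the tokenizer's syntax table so that every character is ordinary. It does not rewind the underlying reader, and a tokenizer that has reached `TT_EOF` cannot be restarted. To tokenize the input a second time, this example creates a new `StreamTokenizer` over a fresh reader, clears its syntax table, and then defines its own word and whitespace characters.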
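To see what `resetSyntax()` does on its own, the sketch below calls it and nothing else. With the syntax table cleared, every character, including the space, is ordinary and comes back as a separate token in `ttype`. `ResetSyntaxSketch` and `rawTokens` are hypothetical names, and the sketch assumes plain ASCII input.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ResetSyntaxSketch {
    // Hypothetical helper: tokenizes with a cleared syntax table.
    static List<String> rawTokens(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        t.resetSyntax(); // no word, number, whitespace, quote, or comment characters remain

        List<String> out = new ArrayList<>();
        while (t.nextToken() != StreamTokenizer.TT_EOF) {
            out.add(String.valueOf((char) t.ttype)); // one token per character
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rawTokens("a1 b").size()); // prints 4
    }
}
```

This is why `resetSyntax()` is normally followed by calls such as `wordChars` and `whitespaceChars` that build up a custom syntax from scratch.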

Conclusion

In this guide, we covered the main features of the `StreamTokenizer` class in Java through 10 code examples: token types, word, whitespace, and ordinary characters, numbers, quoted strings, comments, end-of-stream handling, and syntax resets. Experiment with these examples and adapt them to your own parsing tasks.
