Streams
Processes use streams for all of their I/O operations. A stream is a sequence of bytes. The bytes can represent any kind of data, for example, text, images, video, audio. In introductory programming courses, streams are associated with files. A program reads or writes a stream of data from a file on a storage device. But we will see that streams are much more versatile. We will show how programs can read or write streams of data from other programs. In other words, we will see that streams can be used to implement Inter-process Communication.
- https://en.wikipedia.org/wiki/Stream_(computing)
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/package-summary.html
Here are references to several online book chapters that review using streams, mostly for file I/O.
- Chapter 11: Files and Streams (PDF) from Java Java Java
- Chapter 2, Input and Output from Core Java, Volume II--Advanced Features, 11th Edition
- Section 11.1, I/O Streams (PDF) (Source code)
- Chapter 7, Input/Output from Java Language Features
- Chapter 11, Saving and Loading Information (code) from Carleton University
- Chapter 6, Streams and File/Device I/O (code) from Carleton University
- I/O Streams from the Java Tutorials
NOTE: The Java language now has two very different kinds of objects that
are called "streams". There are the traditional I/O streams that we introduce
in this document. In addition, starting in Java 8, Java defined a Stream
interface that is an implementation of the Stream abstract data type, an idea
that comes from functional programming languages. The new Stream interface is
not for doing I/O. It provides a modern way to process data structures from
the Java Collections Framework.
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/stream/package-summary.html
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/package-summary.html
Here are the basic "stream" types in Java. You can see that the
java.util.stream.Stream interface is nothing like the java.io.InputStream
or java.io.OutputStream classes.
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/InputStream.html
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/OutputStream.html
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/stream/Stream.html
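The difference is easiest to see side by side. Here is a minimal sketch (the class name TwoKindsOfStreams and both method names are made up for this illustration). The first method pulls raw bytes out of a java.io.InputStream until read() returns -1. The second method builds a java.util.stream.Stream pipeline over a List and never touches any I/O.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class TwoKindsOfStreams {

    // java.io: pull raw bytes, one at a time, until read() returns -1.
    public static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // java.util.stream: a pipeline over the elements of a collection; no I/O involved.
    public static List<String> upperCaseLongWords(List<String> words) {
        return words.stream()
                    .filter(w -> w.length() > 3)
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBytes(in));                                      // 5
        System.out.println(upperCaseLongWords(List.of("pipe", "io", "stream"))); // [PIPE, STREAM]
    }
}
```

The I/O stream is consumed byte by byte and knows nothing about the meaning of the data. The Stream pipeline works on typed elements that are already in memory.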
The Stream abstract data type is becoming an important part of modern programming languages. It plays a big part in modern Java.
- https://en.wikipedia.org/wiki/Stream_(abstract_data_type)
- https://dev.java/learn/api/streams/
- https://www.baeldung.com/java-8-streams
- https://link.springer.com/content/pdf/10.1007/978-1-4842-7080-6_8
- https://link.springer.com/content/pdf/10.1007/978-1-4842-7135-3_6
Standard I/O Streams
When a process is created by the operating system, the process is always supplied with three open streams. These three streams are called the "standard streams". They are
- standard input (stdin)
- standard output (stdout)
- standard error (stderr)
We can visualize a process as an object with three "connections" where data (bytes) can either flow into the process or flow out from the process.
                process
          +-----------------+
          |                 |
  >------->>stdin    stdout>>-------->
          |                 |
          |          stderr>>-------->
          |                 |
          +-----------------+
A console application will usually have its stdin stream connected to the computer's keyboard and its stdout and stderr streams connected to the console window.
                  process
            +-----------------+
            |                 |
keyboard --->>stdin    stdout>>------+---> console window
            |                 |      |
            |          stderr>>------+
            |                 |
            +-----------------+
It is important to realize that the above picture is independent of the programming language used to write the program which is running in the process. Every process looks like this. It is up to each programming language to allow programs, written in that language, to make use of this setup provided by the operating system.
Every operating system has its own way of giving a process access to the internal data structures the operating system uses to keep track of what each standard stream is "connected" to.
The Linux operating system gives every process three file descriptors,
#define STDIN_FILENO  0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
Linux provides the read() and write() system calls to let a process
read from and write to these file descriptors.
The Windows operating system gives every process three handles. We
retrieve the handles using the GetStdHandle() function with one of
these input parameters.
STD_INPUT_HANDLE, STD_OUTPUT_HANDLE, STD_ERROR_HANDLE
Windows provides the ReadFile() and WriteFile() API functions to let
a process read from and write to these handles.
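Java exposes these operating system level connections through the java.io.FileDescriptor class, whose static fields FileDescriptor.in, FileDescriptor.out, and FileDescriptor.err stand for the three standard streams. The following is a small sketch (the class name RawStdout and its method are hypothetical) that writes bytes to standard output through FileDescriptor.out instead of through System.out.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class RawStdout {

    // Writes text to the process's standard output through its FileDescriptor
    // and returns the number of bytes written.
    public static int writeToStdout(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Deliberately not closed: closing this stream would close standard output itself.
        FileOutputStream out = new FileOutputStream(FileDescriptor.out);
        out.write(bytes);
        out.flush();
        return bytes.length;
    }

    public static void main(String[] args) throws IOException {
        writeToStdout("written through FileDescriptor.out\n");
    }
}
```

In ordinary programs we would use System.out. This sketch only shows that System.out is a wrapper around a connection that the operating system supplied when the process was created.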
Every programming language must have a way of representing the three standard streams and every language must provide a way to read from the standard input stream and a way to write to the standard output and standard error streams.
For example, here is how the three standard I/O streams are represented by some common programming languages.
Java uses Stream objects.
java.io.InputStream System.in
java.io.PrintStream System.out
java.io.PrintStream System.err
These are static fields in the java.lang.System class.
Standard C uses pointers to FILE objects.
FILE* stdin;
FILE* stdout;
FILE* stderr;
These are defined in the stdio.h header file.
Python uses text File objects.
sys.stdin
sys.stdout
sys.stderr
These are in the sys module.
C++ uses stream objects.
istream std::cin;
ostream std::cout;
ostream std::cerr;
These are defined in the <iostream> header.
.NET uses TextReader and TextWriter objects.
System.IO.TextReader Console.In
System.IO.TextWriter Console.Out
System.IO.TextWriter Console.Error
These are static properties of the System.Console class.
The C language provides functions like getchar(), scanf(), and fscanf()
to read from stdin and it provides printf() and fprintf() to write to
stdout and stderr. On a Windows computer, the C language's printf()
function will be implemented using Windows' WriteFile() function with
the STD_OUTPUT_HANDLE handle. On a Linux computer, the C language's
printf() function will be implemented using Linux's write() system call
with the STDOUT_FILENO file descriptor.
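Here is the Java counterpart as a minimal sketch (the class name StandardStreamsDemo and the method process() are made up for this example). It reads lines from standard input, copies the non-blank lines to standard output, and reports blank lines on standard error. The method takes the three streams as parameters so that it can also be run against streams other than System.in, System.out, and System.err.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class StandardStreamsDemo {

    // Copies each non-blank line to 'out' and reports each blank line on 'err'.
    public static void process(InputStream in, PrintStream out, PrintStream err)
            throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {   // null means end-of-file
            lineNumber++;
            if (line.isBlank()) {
                err.println("warning: line " + lineNumber + " is blank");
            } else {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        process(System.in, System.out, System.err);
    }
}
```

When this program runs in a console window, both kinds of output appear in the same window, because stdout and stderr are both connected to the console by default.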
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/System.html#field-summary
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/FileDescriptor.html
- https://man7.org/linux/man-pages/man3/stdio.3.html
- https://en.cppreference.com/w/c/header/stdio.html
- https://cplusplus.com/reference/cstdio/
- https://docs.python.org/3/library/sys.html#sys.stdin
- https://en.cppreference.com/w/cpp/header/iostream.html
- https://cplusplus.com/reference/iostream/
- https://learn.microsoft.com/en-us/dotnet/api/system.console
I/O Redirection
Every process is created by the operating system at the request of some other process, the parent process. When the parent process asks the operating system to create a child process, the parent must tell the operating system how to "connect" the child's three standard streams. The parent telling the operating system how to connect the child's three standard streams is usually referred to as I/O redirection.
At a shell command prompt, if we type a command like this,
> foo > result.txt
then the shell program (cmd.exe on Windows, or bash on Linux) is the parent
process. The above command tells the shell process to ask the operating system
to create a child process from the foo program. But in addition to asking the
operating system to create the child process, the shell process also instructs
the operating system to redirect the child process's standard output to the
file result.txt. So when the foo process runs, it looks like this.
                foo process
            +-----------------+
            |                 |
keyboard --->>stdin    stdout>>----> result.txt
            |                 |
            |          stderr>>----> console window
            |                 |
            +-----------------+
Stdin and stderr have their default connections, and stdout is redirected
to the file result.txt.
If we type a command like this,
> foo > result.txt < data.txt
then the shell process will ask the operating system to create a child process
from the foo program and also ask the operating system to redirect the child
process's standard output to the file result.txt and redirect the child
process's standard input to the file data.txt. So when the foo process runs,
it looks like this.
                foo process
            +-----------------+
            |                 |
data.txt --->>stdin    stdout>>----> result.txt
            |                 |
            |          stderr>>----> console window
            |                 |
            +-----------------+
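A Java program can play the role of the parent process by using the ProcessBuilder class. The following is a sketch under a few assumptions: the class name RedirectDemo and the method runRedirected() are hypothetical, and main() uses the java launcher itself (with its --version option, Java 9 or later) as a stand-in for the foo program, since that is a program we know is installed.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class RedirectDemo {

    // Starts 'command' with stdin read from 'input' (or inherited when 'input'
    // is null) and with stdout written to 'output'. The child's stderr keeps
    // the parent's connection. Returns the child's exit code.
    public static int runRedirected(List<String> command, File input, File output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        if (input != null) {
            pb.redirectInput(input);                            // like  < data.txt
        } else {
            pb.redirectInput(ProcessBuilder.Redirect.INHERIT);  // default connection
        }
        pb.redirectOutput(output);                              // like  > result.txt
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process child = pb.start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Use the java launcher that is running this program as the child program.
        String java = ProcessHandle.current().info().command().orElse("java");
        int exit = runRedirected(List.of(java, "--version"), null, new File("result.txt"));
        System.out.println("child exit code: " + exit);
    }
}
```

Notice that the child program is not aware of the redirection. It just writes to its standard output, and the bytes end up in the file that the parent chose.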
- https://en.wikipedia.org/wiki/Redirection_(computing)
- https://www.linfo.org/redirection.html
- https://man7.org/linux/man-pages/man1/bash.1.html#REDIRECTION
- https://ss64.com/nt/syntax-redirection.html
Shared streams
When two processes share a stream, it is usually the case that one of the two processes is idle while the other process uses the shared stream (the idle process will often be waiting for the other process to terminate). If two processes are simultaneously using a shared stream, the results can be confusing and unpredictable.
If two processes simultaneously use an output stream, then their outputs will be, more or less, randomly intermingled in the stream's final destination. This can lead to unusable results.
If two processes simultaneously use an input stream, as in the following
picture, then it is not the case that every input byte flows into each
process. Each input byte can only be consumed by one of the two processes.
Which process gets a particular byte of input depends on the ordering of
when each process calls its read() function on the input stream. This is
almost never a desirable situation. Processes almost never simultaneously
use a shared input stream. Shared input streams are very common, but the
two processes almost always have a way to synchronize their use of the
stream so that they are never reading from it simultaneously. The most
common way for two processes to share an input stream is for the parent
process to wait for the child process to terminate. Then the parent
process can resume reading from the input stream.
                 parent
            +--------------+
            |              |
      +----->>stdin stdout>>------->
      |     |              |
      |     |       stderr>>--->
      |     |              |
      |     |              |
      |     +--------------+
------+
      |
      |             child
      |        +--------------+
      |        |              |
      +-------->>stdin stdout>>------>
               |              |
               |       stderr>>--->
               |              |
               |              |
               +--------------+
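In Java, a parent process can share all three of its standard streams with a child by calling ProcessBuilder.inheritIO(), and it can stay out of the child's way by calling waitFor(). Here is a minimal sketch of that pattern (the class name SharedStreamsDemo and the method runAndWait() are hypothetical, and the java launcher with --version again stands in for the child program).

```java
import java.io.IOException;
import java.util.List;

public class SharedStreamsDemo {

    // Runs 'command' as a child that shares the parent's stdin, stdout, and
    // stderr, and keeps the parent idle until the child terminates.
    public static int runAndWait(List<String> command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .inheritIO()    // the child uses the same three streams as the parent
                .start();
        // The parent does not touch the shared streams while it waits here.
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String java = ProcessHandle.current().info().command().orElse("java");
        System.out.println("parent: starting child");
        int exit = runAndWait(List.of(java, "--version"));
        System.out.println("parent: child finished with exit code " + exit);
    }
}
```

Because the parent only prints before it starts the child and after waitFor() returns, the two processes never write to the shared console at the same time.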
Pipes
If we type a command like this,
> foo < data.txt | bar > result.txt
the shell process will ask the operating system to create two child
processes, one from the foo program and the other from the bar program.
In addition, the shell process will ask the operating system to create a
pipe object and have the stdout of the foo process redirected to the
input of the pipe, and have the stdin of the bar process redirected to
the output of the pipe. Finally, the shell process will ask the operating
system to redirect the bar process's standard output to the file result.txt
and redirect the foo process's standard input to the file data.txt. So
while this command is executing, it looks like this.
             foo process                  bar process
          +---------------+            +---------------+
          |               |    pipe    |               |
data.txt-->>stdin  stdout>>--========-->>stdin  stdout>>------> result.txt
          |               |            |               |
          |        stderr>>---+        |        stderr>>----+-> console window
          |               |   |        |               |    |
          +---------------+   |        +---------------+    |
                              |                             |
                              +-----------------------------+
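Since Java 9, a Java parent process can build this same picture with the static method ProcessBuilder.startPipeline(). The following is a sketch, not a definitive implementation. The class name PipelineDemo and the method runPipeline() are hypothetical, and main() assumes that the compiled Double and Reverse filter programs and the file Readme.txt are in the current directory.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;
import java.util.List;

public class PipelineDemo {

    // Runs  foo < input | bar > output  and returns bar's exit code.
    public static int runPipeline(List<String> foo, List<String> bar,
                                  File input, File output)
            throws IOException, InterruptedException {
        ProcessBuilder first = new ProcessBuilder(foo)
                .redirectInput(input)               // foo's stdin comes from a file
                .redirectError(Redirect.INHERIT);   // foo's stderr goes to the console
        ProcessBuilder last = new ProcessBuilder(bar)
                .redirectOutput(output)             // bar's stdout goes to a file
                .redirectError(Redirect.INHERIT);   // bar's stderr goes to the console
        // startPipeline() connects foo's stdout to bar's stdin with a pipe
        // and starts both processes.
        List<Process> processes = ProcessBuilder.startPipeline(List.of(first, last));
        int exit = 0;
        for (Process p : processes) {
            exit = p.waitFor();                     // the last value is bar's exit code
        }
        return exit;
    }

    public static void main(String[] args) throws Exception {
        // Assumed to exist in the current directory: Double.class, Reverse.class, Readme.txt.
        int exit = runPipeline(List.of("java", "Double"), List.of("java", "Reverse"),
                               new File("Readme.txt"), new File("result.txt"));
        System.out.println("pipeline exit code: " + exit);
    }
}
```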
In the above command, the two programs, foo and bar, are running
simultaneously (in parallel) with each other. The pipe object acts as a
"buffer" between the two processes. Whenever the foo process writes
something to its output, that something gets put in the pipe "buffer". Then
when the bar process wants to read some input data, it reads whatever is
currently in the pipe "buffer". If the foo process writes data faster than
the bar process reads data, then data accumulates in the pipe. When foo
terminates, it may be that data still remains in the pipe, in which case bar
will continue to run until it has emptied the pipe. On the other hand, if the
bar process reads data out of the pipe much faster than foo writes data
into the pipe, then the bar process will often find the pipe empty when
bar wants to read some data. In that case, bar "blocks" and waits until
some data shows up in the pipe. When the foo process writes its last bit
of data to the pipe and then foo terminates, the operating system will let
the bar process know that it has reached the "end-of-file" after the bar
process reads the last bit of data from the pipe.
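The buffering, blocking, and end-of-file behavior described above can be observed inside a single Java program. The sketch below is only an analogy: it uses two threads and the java.io.PipedOutputStream and java.io.PipedInputStream classes instead of two processes and an operating system pipe. The class name PipeBufferDemo, the method transfer(), and the 16 byte buffer size are choices made for this illustration, and the message is assumed to be plain ASCII text.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PipeBufferDemo {

    // A writer thread plays the role of foo and the calling thread plays the
    // role of bar. Returns everything the reader received before end-of-stream.
    public static String transfer(String message) throws IOException, InterruptedException {
        PipedOutputStream pipeIn = new PipedOutputStream();           // foo writes here
        PipedInputStream pipeOut = new PipedInputStream(pipeIn, 16);  // bar reads here

        Thread foo = new Thread(() -> {
            // Closing the write end is what signals end-of-stream to the reader.
            try (pipeIn) {
                // write() blocks whenever the 16 byte buffer is full.
                pipeIn.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        foo.start();

        StringBuilder received = new StringBuilder();
        int b;
        while ((b = pipeOut.read()) != -1) {   // read() blocks whenever the buffer is empty
            received.append((char) b);         // assumes ASCII data
        }
        foo.join();
        pipeOut.close();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("this message is longer than the pipe's buffer"));
    }
}
```

The message is longer than the buffer, so the writer has to wait for the reader several times. The reader still receives every byte, in order, and then sees end-of-stream once the writer closes its end.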
Here is another way to think about the above pipeline command. The shell
process could run the two programs, foo and bar, sequentially, one
after the other. In other words, the shell process could interpret this
command,
> foo < data.txt | bar > result.txt
as the following three commands.
> foo < data.txt > temp
> bar < temp > result.txt
> del temp
These three commands would have a picture that looks like this.
                foo process
            +-----------------+
            |                 |
data.txt --->>stdin    stdout>>----> temp
            |                 |
            |          stderr>>----> console window
            |                 |
            +-----------------+
                bar process
            +-----------------+
            |                 |
    temp --->>stdin    stdout>>----> result.txt
            |                 |
            |          stderr>>----> console window
            |                 |
            +-----------------+
First the foo process is executed with its output stored in a temporary
file called temp. Then the bar process is run with its input coming
from the temp file. Then the temp file gets deleted.
Notice that this sequential interpretation of the pipeline command might be considerably slower than the parallel interpretation. And since the sequential interpretation needs to store all the intermediate data in a temp file, the sequential interpretation may require far more storage space than the parallel interpretation.
Here is a more detailed picture of a Java process, its three standard
streams, and their buffers. The "user space" buffers belong to Java
classes and are used by Java methods. For example, the Scanner class,
and all its methods, have a user space input buffer. The PrintWriter
class, and its print(), println(), printf() methods, have a user
space output buffer. (Note: C and C++ processes do not have a user
space buffer for stderr.)
                                       Java process
                         +---------------------------------------+
                         |                                       |
          kernel space   | user space                user space  |      kernel space
            +------+     |  +------+                  +------+   |        +------+
keyboard -->|      |----->>-|      |->stdin   stdout->|      |-->>---+--->|      |---> console window
            +------+     |  +------+                  +------+   |   |    +------+
             buffer      |   buffer                    buffer    |   |     buffer
                         |                                       |   |
                         |                           user space  |   |
                         |                            +------+   |   |
                         |                    stderr->|      |-->>---+
                         |                            +------+   |
                         |                             buffer    |
                         +---------------------------------------+
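The effect of a user space output buffer is easy to observe. In the following sketch (the class name UserSpaceBufferDemo and its method are hypothetical), a ByteArrayOutputStream stands in for the destination of stdout so that we can count how many bytes have actually left the PrintWriter. A PrintWriter created this way does not flush automatically.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class UserSpaceBufferDemo {

    // Returns { bytes that reached the destination before flush(),
    //           bytes that reached the destination after flush() }.
    public static int[] bytesBeforeAndAfterFlush(String text) {
        ByteArrayOutputStream destination = new ByteArrayOutputStream(); // stands in for stdout
        PrintWriter writer = new PrintWriter(destination);               // buffered, no auto-flush
        writer.print(text);
        int before = destination.size();   // short text is still in the PrintWriter's buffer
        writer.flush();
        int after = destination.size();    // the buffered text has now been handed on
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = bytesBeforeAndAfterFlush("hello");
        System.out.println(sizes[0] + " bytes before flush, " + sizes[1] + " bytes after flush");
    }
}
```

For a short string, no bytes should reach the destination until flush() is called. This is why a program that forgets to flush or close its output writer can appear to produce no output at all.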
Here is a sketch of two processes connected with a pipe and some of the associated buffers.
              foo process                                  bar process
           +---------------+    user space              +---------------+
           |               |     +------+               |               |
data.txt -->>stdin  stdout>>-----|      |--+        +--->>stdin  stdout>>----> result.txt
           |               |     +------+  |        |   |               |
           |               |      buffer   |        |   |               |
           |               |               |        |   |               |
           |        stderr>>--+   +--------+        |   |        stderr>>---+-> console window
           |               |  |   |                 |   |               |   |
           +---------------+  |   | kernel space    |   +---------------+   |
                              |   |  +------+       |                       |
                              |   +--| pipe |--+    |                       |
                              |      +------+  |    |                       |
                              |       buffer   |    |                       |
                              |                |    |                       |
                              |      +---------+    |                       |
                              |      |              |                       |
                              |      | user space   |                       |
                              |      |  +------+    |                       |
                              |      +--|      |----+                       |
                              |         +------+                            |
                              |          buffer                             |
                              |                                             |
                              +---------------------------------------------+
Filters and Pipelines
A filter is a program that reads data from its stdin, does some kind of
operation on the data, and then writes that converted data to its stdout.
In the filter_programs folder there are Java programs that can act as filter
programs. They are all very short programs that do simple manipulations of the
input characters. Look at the source code. Compile and then run them using
command-lines like the following.
> java Reverse < Readme.txt > result.txt
> java Double < Readme.txt | java Reverse
> java Double | java ToUpperCase | java Reverse
> java ShiftN 2 | java ToUpperCase | java Reverse
> java Twiddle < Readme.txt | java ToUpperCase | java Double | java RemoveVowels > result2.txt
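To give an idea of how small a filter program can be, here is a minimal sketch of one. It is not one of the programs from the filter_programs folder. The class name UpperCaseFilter is made up for this example, and it only converts the ASCII letters a through z.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class UpperCaseFilter {

    // Reads bytes until end-of-stream, converts ASCII lower case letters to
    // upper case, and writes every byte to the output stream.
    public static void filter(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {    // -1 means end-of-file
            if (b >= 'a' && b <= 'z') {
                b = b - 'a' + 'A';
            }
            out.write(b);
        }
        out.flush();                       // push any buffered output to its destination
    }

    public static void main(String[] args) throws IOException {
        filter(System.in, System.out);     // stdin in, stdout out: that is all a filter does
    }
}
```

After compiling, this class can be used in a command-line just like the other filters, for example,
> java UpperCaseFilter < Readme.txt | java Reverse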
Then run a couple of the programs by themselves, without any I/O redirection or pipes, to see how they manipulate input data (from the keyboard) to produce output data (in the console window).
> java ToUpperCase
> java Double
> java Reverse
> java MakeOneLine
Notice that you need to tap the Enter key to send input from the keyboard
to the program. Sometimes you see immediate output. Sometimes there is no
output until the input is terminated (end-of-file). You denote the end of
your input to the program by typing Control-z on Windows or Control-d
on Linux. Do not use Control-C. That terminates the program (instead
of terminating just the program's input) and causes the program's output
to be lost.
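Whether a filter produces output immediately or only at end-of-file depends on what the filter computes. The following sketch shows a filter that cannot print anything until its input ends, because the last input line must be printed first. The class name LastLineFirst and the method reverseLines() are hypothetical, and this is not a claim about how any program in the filter_programs folder is written.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

public class LastLineFirst {

    // Prints the input lines in reverse order. Nothing can be printed until
    // readLine() returns null, which is how end-of-file shows up here.
    public static void reverseLines(InputStream in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        Deque<String> stack = new ArrayDeque<>();
        String line;
        while ((line = reader.readLine()) != null) {
            stack.push(line);              // remember every line; print none of them yet
        }
        while (!stack.isEmpty()) {
            out.println(stack.pop());      // the most recently read line comes out first
        }
    }

    public static void main(String[] args) throws IOException {
        reverseLines(System.in, System.out);
    }
}
```

If we run this program from the keyboard, we see no output until we type Control-z (Windows) or Control-d (Linux). If we end it with Control-C instead, the stored lines are never printed.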
- https://en.wikipedia.org/wiki/Pipeline_(Unix)
- https://en.wikipedia.org/wiki/Pipeline_(software)
- https://en.wikipedia.org/wiki/Filter_(software)
Command-line Syntax
We have seen that command-lines can be made up of, among other things, program names, command-line arguments, file names, I/O redirection operators, and pipes. In this section we will look at the syntax of building complex command-lines that combine all of these elements along with a few new elements.
CMD syntax.
- https://ss64.com/nt/syntax-redirection.html
- https://ss64.com/nt/syntax-conditional.html
- https://ss64.com/nt/syntax-esc.html
- https://ss64.com/nt/syntax.html
- https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/cmd
- https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/command-line-syntax-key
Bash syntax.