Working with Python File I/O
When it comes to storing, reading, or communicating data, working with the files of an operating system is both necessary and easy with Python. Unlike other languages, where file input and output require complex reader and writer objects, Python simplifies the process: we only need commands to open, read or write, and close the file. This topic explains how Python can interface with files on the operating system. Files are opened with the built-in open() function, whose main parameters are:
- filename — the path to our file or, if the file is in the working directory, the filename of our file
- access_mode — a string value that determines how the file is opened
- buffering — an optional integer that sets the buffering policy (for example, 1 selects line buffering in text mode)
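The parameters listed above belong to the built-in open() function. As a minimal sketch of how they fit together, the snippet below passes all three positionally. The file name example.txt and the temporary directory are hypothetical, chosen only so the snippet runs anywhere; note that Python 3 names the second parameter mode rather than access_mode.

```python
import os
import tempfile

# Hypothetical location: a throwaway directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# filename, access mode ("w" = write), buffering (1 = line buffered in text mode)
f = open(path, "w", 1)
f.write("first line\n")
f.close()  # closing flushes any buffered data to disk

f = open(path, "r")  # "r" is the default mode and could be omitted
content = f.read()
f.close()

print(content)
```

Calling close() by hand works, but it is easy to forget when an error occurs partway through.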
i) File Modes
There are different modes we can open a file with, specified by the mode parameter. These include:
‘r’ — reading mode. The default. It only allows us to read the file, not to modify it. The file must exist when using this mode.
‘w’ — writing mode. It creates a new file if one does not exist; otherwise it erases the existing contents and allows us to write to the file.
‘a’ — append mode. It writes data to the end of the file without erasing the existing contents, and creates the file if it does not exist.
‘rb’ — reading mode in binary. This is similar to r except that the data is read as bytes rather than decoded text. Text mode, not binary, is the default.
‘r+’ — reading and writing mode. This allows us to read from and write to the same file without reopening it in r and then w. The file must exist and is not truncated.
‘rb+’ — reading and writing mode in binary. The same as r+ except that the data is in binary.
‘wb’ — writing mode in binary. The same as w except that the data is in binary.
‘w+’ — writing and reading mode. Like r+ it allows both reading and writing, but if the file does not exist a new one is created; otherwise the existing contents are erased.
‘wb+’ — writing and reading mode in binary. The same as w+ except that the data is in binary.
‘ab’ — appending in binary mode. Similar to a except that the data is in binary.
‘a+’ — appending and reading mode. Similar to w+ in that it creates a new file if the file does not exist; otherwise the existing contents are kept and writes go to the end of the file.
‘ab+’ — appending and reading mode in binary. The same as a+ except that the data is in binary.
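To make the difference between the writing and appending modes concrete, here is a small sketch that writes, appends, and then rewrites the same file. The file name modes.txt and the temporary directory are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:   # creates the file (or empties an existing one)
    f.write("one\n")

with open(path, "a") as f:   # keeps the existing data and writes at the end
    f.write("two\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:   # opening with "w" again discards the old contents
    f.write("three\n")

with open(path) as f:        # the mode defaults to "r"
    after_rewrite = f.read()

print(repr(after_append))
print(repr(after_rewrite))
```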
Python 3 (from version 3.3) added a new mode for exclusive creation so that we do not accidentally truncate or overwrite an existing file.
‘x’ — exclusive creation mode. It opens a new file for writing and raises FileExistsError if the file already exists.
‘xb’ — exclusive creation mode in binary. The same as x except that the data is in binary.
‘x+’ — exclusive creation with reading and writing. Similar to w+ in that it creates a new file if the file does not exist; otherwise it raises FileExistsError.
‘xb+’ — exclusive creation with reading and writing in binary. The same as x+ except that the data is in binary.
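The following sketch illustrates exclusive creation: the first open succeeds because the file is new, and the second raises FileExistsError instead of overwriting it. The file name new_file.txt and the variable second_attempt are hypothetical.

```python
import os
import tempfile

# Hypothetical path that does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "new_file.txt")

with open(path, "x") as f:
    f.write("created once\n")

try:
    # The file now exists, so "x" refuses to open it.
    with open(path, "x") as f:
        f.write("this is never written\n")
    second_attempt = "succeeded"
except FileExistsError:
    second_attempt = "refused"

with open(path) as f:
    contents = f.read()

print(second_attempt)
```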
ii) Reading a file line by line:
The simplest way to iterate over a file line-by-line:
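A minimal sketch of this approach, assuming a small hypothetical file myfile.txt created in a temporary directory so the example is self-contained:

```python
import os
import tempfile

# Create a small sample file to read back (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

lines = []
with open(path, "r") as f:
    for line in f:  # the file object yields one line per iteration
        lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)
```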
readline() allows for more granular control over line-by-line iteration. The example below is equivalent to the one above:
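As a sketch under the same assumptions (a hypothetical myfile.txt in a temporary directory), the loop below calls readline() explicitly and stops when it returns an empty string, which signals the end of the file:

```python
import os
import tempfile

# Create a small sample file to read back (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

lines_read = []
with open(path, "r") as f:
    while True:
        line = f.readline()
        if not line:  # an empty string means end of file
            break
        lines_read.append(line.rstrip("\n"))

print(lines_read)
```

A blank line inside the file is returned as "\n", not as an empty string, so it does not end the loop early.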
Using the for loop iterator and readline() together is considered bad practice.
To iterate over all files, including those in subdirectories, use os.walk:
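A minimal sketch of such a traversal follows. It builds a tiny directory tree in a temporary location first; the names root_dir, top.txt, sub and nested.txt are assumptions for illustration only.

```python
import os
import tempfile

# Build a small hypothetical tree: root_dir/top.txt and root_dir/sub/nested.txt
root_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(root_dir, "sub"))
for rel in ("top.txt", os.path.join("sub", "nested.txt")):
    with open(os.path.join(root_dir, rel), "w") as f:
        f.write("data\n")

found = []
# os.walk yields (directory path, subdirectory names, file names) for each directory
for folder, subfolders, files in os.walk(root_dir):
    for name in files:
        full_path = os.path.join(folder, name)
        found.append(os.path.relpath(full_path, root_dir))

print(sorted(found))
```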
Here root_dir, the starting directory passed to os.walk, can be "." to start from the current directory, or any other path.
iii) Getting the full contents of a file
The preferred way to do file I/O is with the with keyword, which ensures the file handle is closed once reading or writing has completed.
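For instance, the following sketch reads a whole file into a single string with read(). The file name myfile.txt and its contents are hypothetical.

```python
import os
import tempfile

# Create a sample file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

with open(path, "r") as f:
    content = f.read()  # the entire file as one string

# Once the with block has ended, the file is already closed.
print(f.closed)
print(repr(content))
```

For very large files, iterating line by line avoids holding the whole file in memory at once.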
iv) Writing to a file
If we open myfile.txt after writing to it, we will see its contents exactly as written. Python doesn’t automatically add line breaks, so we need to add them manually:
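A short sketch of this behavior, assuming a hypothetical myfile.txt in a temporary directory: consecutive write() calls run together unless we write "\n" ourselves.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as f:
    f.write("Line 1")
    f.write("Line 2")    # continues directly after "Line 1"; no newline is added
    f.write("\n")        # an explicit line break
    f.write("Line 3\n")

with open(path) as f:
    content = f.read()

print(repr(content))
```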
If we want to specify an encoding, we simply add the encoding parameter to the open function:
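As a sketch, the snippet below writes and reads a string containing a non-ASCII character using UTF-8. The file name and the sample text are assumptions; using the same encoding for writing and reading is what makes the round trip reliable.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

text_out = "caf\u00e9\n"  # "cafe" with an acute accent on the final e

with open(path, "w", encoding="utf-8") as f:
    f.write(text_out)

with open(path, "r", encoding="utf-8") as f:
    text_in = f.read()

print(text_in == text_out)
```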
It is also possible to use print to write to a file. The mechanics differ between Python 2, where print is a statement, and Python 3, where it is a function, but the concept is the same: we take the output that would have gone to the screen and send it to a file instead.
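The sketch below shows the Python 3 form, which passes the open file through the file argument of print(). In Python 2 the equivalent statement is written as print >>f, "text". The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as f:
    print("Hello, file!", file=f)          # print adds the newline for us
    print("a", "b", "c", sep=",", file=f)  # sep and end work as they do on screen

with open(path) as f:
    content = f.read()

print(repr(content))
```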
v) Check whether a file or path exists
Employ the EAFP (easier to ask forgiveness than permission) coding style: try to open the file and handle the exception if that fails.
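One way this could look is sketched below. The helper read_if_exists and the file names are hypothetical; the point is that we attempt the open and catch FileNotFoundError, with no separate existence check beforehand.

```python
import os
import tempfile

def read_if_exists(path):
    """Return the file's text, or None if the file does not exist (hypothetical helper)."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None

folder = tempfile.mkdtemp()
existing = os.path.join(folder, "present.txt")
missing = os.path.join(folder, "absent.txt")

with open(existing, "w") as f:
    f.write("hello\n")

result_existing = read_if_exists(existing)
result_missing = read_if_exists(missing)

print(repr(result_existing), result_missing)
```

Trying the operation directly also avoids the race where a file is removed between an existence check and the open call.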
vi) Checking if a file is empty
Comparing the file's size to zero, for example with os.stat(path).st_size == 0, returns the boolean value False when the file at that path is not empty.
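One way to perform such a check is sketched below, using os.stat to read the file size in bytes. The two sample files are hypothetical and exist only to show both outcomes.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
empty_path = os.path.join(folder, "empty.txt")
full_path = os.path.join(folder, "full.txt")

open(empty_path, "w").close()   # create a zero-byte file
with open(full_path, "w") as f:
    f.write("some data\n")

# st_size is the file size in bytes, so 0 means the file is empty.
empty_is_empty = os.stat(empty_path).st_size == 0
full_is_empty = os.stat(full_path).st_size == 0

print(empty_is_empty, full_is_empty)
```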
vii) Read a file between a range of lines
Suppose we want to iterate over only some specific lines of a file. We can make use of itertools for that.
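A minimal sketch using itertools.islice follows. The 40-line sample file and the bounds 12 and 30 are assumptions chosen for illustration.

```python
import itertools
import os
import tempfile

# Create a sample file with 40 numbered lines (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    for n in range(1, 41):
        f.write(f"line {n}\n")

with open(path, "r") as f:
    # Start at index 12 (the 13th line) and stop before index 30.
    selected = [line.rstrip("\n") for line in itertools.islice(f, 12, 30)]

print(selected[0], "to", selected[-1])
```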
- For example, itertools.islice(f, 12, 30) reads lines 13 through 30, because Python indexing starts from 0: line number 1 has index 0, and the stop index 30 is excluded. We can also read some extra lines by calling the built-in next() function on the file object.
- When we use the file object as an iterable, we don’t call the readline() method, as the two techniques of traversing a file should not be mixed.
viii) Copy a directory tree
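A directory tree can be copied with shutil.copytree from the standard library. The sketch below is a minimal example under assumed names (source_tree, copied_tree, note.txt); it relies on the destination directory not existing before the call.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "source_tree")
dst = os.path.join(base, "copied_tree")  # must not exist before the copy

# Build a small hypothetical tree to copy.
os.makedirs(os.path.join(src, "sub"))
with open(os.path.join(src, "sub", "note.txt"), "w") as f:
    f.write("hello\n")

shutil.copytree(src, dst)  # recursively copies directories and files

with open(os.path.join(dst, "sub", "note.txt")) as f:
    copied_text = f.read()

print(repr(copied_text))
```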
ix) Copying contents of one file to a different file
- Using the shutil module:
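As a sketch of the shutil approach, shutil.copyfile copies the contents of one file to another path. The file names input.txt and output.txt are hypothetical.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "input.txt")
target_path = os.path.join(folder, "output.txt")

with open(source_path, "w") as f:
    f.write("copy me\n")

# Copies the file contents; an existing target would be overwritten.
shutil.copyfile(source_path, target_path)

with open(target_path) as f:
    copied = f.read()

print(repr(copied))
```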
x) Context Managers
While Python’s context managers are widely used, few understand the purpose behind their use. These statements, commonly used with reading and writing files, assist the application in conserving system memory and improve resource management by ensuring specific resources are only in use for certain processes. This topic explains and demonstrates the use of Python’s context managers.
What is a Context Manager?
A context manager is an object that is notified when a context (a block of code) starts and ends. We commonly use one with the with statement. It takes care of the notifying. For example, file objects are context managers. When a context ends, the file object is closed automatically.
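The automatic closing described above can be observed through the closed attribute of a file object. This is a small sketch with a hypothetical file name.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as f:
    f.write("hello\n")
    closed_inside = f.closed  # still open inside the block

closed_after = f.closed       # closed as soon as the block ends

print(closed_inside, closed_after)
```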
Anything that ends execution of the block causes the context manager’s __exit__ method to be called. This includes exceptions, which is useful when an error causes us to exit prematurely from an open file or connection. Exiting a script without properly closing files or connections is a bad idea that may cause data loss or other problems. By using a context manager, we can ensure that the necessary cleanup always takes place. This feature was added in Python 2.5.
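To illustrate, here is a sketch of a hand-written context manager that records when it is entered and exited. The class Tracker is hypothetical; what matters is that __exit__ runs even though the block raises an exception.

```python
class Tracker:
    """Hypothetical context manager that logs its own lifecycle."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # a false value lets the exception propagate

tracker = Tracker()
try:
    with tracker:
        tracker.events.append("body")
        raise ValueError("something went wrong")
except ValueError:
    tracker.events.append("handled")

print(tracker.events)
```

Returning a true value from __exit__ would suppress the exception instead, which is rarely what we want for cleanup code.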