I am trying to access the first few lines of text files using the File API in JavaScript.
In order to do so, I slice an arbitrary number of bytes from the beginning of the file and hand the blob over to the FileReader.
For large files this takes very long, even though my understanding is that only the first few bytes of the file need to be accessed.
Is there some implementation in the background that requires the whole file to be accessed before it can be sliced?
Does it depend on the browser's implementation of the File API?
So far I have tested in both Chrome and Edge (Chromium).
Profiling in Chrome with the Performance dev tools shows a lot of idle time before reader.onloadend fires, and no increase in RAM usage. However, this might be because the File API is implemented in the browser itself and does not show up in the JavaScript performance statistics.
My implementation of the FileReader looks something like this:
const reader = new FileReader();
reader.onloadend = (evt) => {
  if (evt.target.readyState == FileReader.DONE) {
    console.log(evt.target.result.toString());
  }
};

// Slice the first 10240 bytes of the file
var blob = files.item(0).slice(0, 1024 * 10);
// Start reading the sliced blob
reader.readAsBinaryString(blob);
This works fine, but as described it performs quite underwhelmingly for large files. I tried it with 10 KB, 100 MB, and 6 GB files. The time until the first 10 KB are logged seems to correlate directly with the file size.
Any suggestions on how to improve performance for reading the beginning of a file?
Edit: Using Response and DOM streams as suggested by @BenjaminGruenbaum sadly does not improve the read performance.
var dest = new WritableStream({
  write(str) {
    console.log(str);
  },
});

var blob = files.item(0).slice(0, 1024 * 10);

(blob.stream ? blob.stream() : new Response(blob).body)
  // Decode the binary-encoded response to string
  .pipeThrough(new TextDecoderStream())
  .pipeTo(dest)
  .then(() => {
    console.log('done');
  });
There are a few things you can try to improve the performance of reading the beginning of a file with the FileReader API:
Use readAsArrayBuffer instead of readAsBinaryString: The readAsArrayBuffer method reads the file as raw binary data and returns an ArrayBuffer, which can be faster than reading it as a binary string.
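As a rough sketch, the promise-based Blob.arrayBuffer() delivers the same raw bytes that reader.readAsArrayBuffer(blob) would hand to onloadend; a locally constructed Blob stands in for the File here (File inherits slice() from Blob):

```javascript
// A Blob stands in for the File; File inherits slice() from Blob.
const blob = new Blob(["the quick brown fox jumps over the lazy dog"]);

// Read only the first 16 bytes as raw binary data...
blob.slice(0, 16).arrayBuffer().then((buffer) => {
  // ...and decode just those bytes to text, instead of a binary string.
  const text = new TextDecoder().decode(buffer);
  console.log(text); // "the quick brown "
});
```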
Use the progress event of the FileReader: You can use the progress event to read the file in chunks and display a progress bar to the user. This can improve the overall user experience, as the user will not have to wait as long for the file to be read.
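FileReader's progress event only fires in the browser; as a sketch of the same chunked-read-with-progress idea, here is an equivalent loop over Blob.stream() that reports bytes read after every chunk (readWithProgress and its callback are made-up names for illustration):

```javascript
// Sketch: read a blob in chunks, reporting progress after each chunk.
// In a real page, onProgress could update a <progress> element.
async function readWithProgress(blob, onProgress) {
  const reader = blob.stream().getReader();
  const chunks = [];
  let loaded = 0;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.byteLength;
    onProgress(loaded, blob.size);
  }
  return chunks;
}

readWithProgress(new Blob(["0123456789"]), (loaded, total) => {
  console.log(`read ${loaded} of ${total} bytes`);
}).then(() => console.log("done"));
```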
Use the Streams API: Instead of using the FileReader, you can use the Streams API to read the file in chunks. The Streams API allows you to read a file as a stream of data, which can be faster than reading the whole file at once. You can use the pipeTo method to pipe the data from the stream to a writable stream, such as a DOM stream or a Response stream.
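The pipeTo version of this is already shown in the edit above; as an alternative sketch, the stream's reader can also be consumed by hand, which makes the chunking explicit (a plain Blob stands in for the sliced file):

```javascript
// Sketch: consume the sliced blob's stream chunk by chunk and decode it.
const blob = new Blob(["first bytes of a much larger payload"]);
const reader = blob.slice(0, 11).stream().getReader();
const decoder = new TextDecoder();

(async () => {
  let text = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true lets multi-byte characters span chunk boundaries
    text += decoder.decode(value, { stream: true });
  }
  console.log(text); // "first bytes"
})();
```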
Use a web worker: If you are reading large files and the main thread is getting blocked, you can use a web worker to read the file in the background and send the data back to the main thread as it is read. This can improve the performance of the file reading, as it will not block the main thread.
Use a server-side solution: If you are reading very large files and none of the above solutions work, you may want to consider using a server-side solution to read the file and send the data to the client. This can be more efficient, as the server has more resources and can read the file faster than the client.
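One common server-assisted approach is an HTTP Range request, which asks the server for only the first bytes of the file instead of the whole thing; the URL below is a placeholder, and the server must support Range requests (answering 206 Partial Content) for this to work:

```javascript
// Sketch: request only the first 10 KiB of a remote file via HTTP Range.
// "https://example.com/big.log" is a hypothetical endpoint.
const headRequest = new Request("https://example.com/big.log", {
  headers: { Range: "bytes=0-10239" }, // bytes 0..10239 = first 10 KiB
});

console.log(headRequest.headers.get("range")); // "bytes=0-10239"
// In the browser: fetch(headRequest).then((res) => res.text()).then(console.log);
```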