In-Memory Decompression of Gzipped Web Response Body Data in Elixir

I was writing a web crawler with Elixir and Crawly (a web crawling framework written in Elixir). The website I was trying to crawl served its sitemap gzipped, so each entry in the main sitemap returned a file like “sitemap-01.xml.gz” that needed to be gunzipped to get at the XML file I was after.

Initially, I thought HTTPoison would handle the decompression. But it turns out HTTPoison is just a wrapper around hackney, and hackney doesn’t support this. So I did a few Google searches and, being tired, didn’t use the most effective keywords and ended up with File.stream! — which obviously didn’t work, because File.stream! needs a file path. That should have been a red flag, but I proceeded down the rabbit hole anyway.

Then I thought it might work if I wrote the response to a file and decompressed it with File.stream!, but just thinking about it gave me the chills: there would be a lot of files being written, decompressed, and read back. So that wasn’t a solution I was even going to write and try out.

After a whole lot more searching and asking around, I finally found the answer (huge thanks to Mafinar bhai): Erlang’s built-in “zlib” module. Using it, I could easily get the data I wanted, as in the following code block:

response.body
|> :zlib.gunzip()
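To see the whole idea in isolation, here is a minimal round-trip sketch. The `xml` binary is illustrative — in the crawler, the compressed binary would be `response.body` coming back from HTTPoison:

```elixir
# Illustrative sitemap content; stands in for the real sitemap-01.xml.gz payload.
xml = "<urlset><url><loc>https://example.com/</loc></url></urlset>"

# Compress it the way it would arrive over the wire as a .xml.gz file...
compressed = :zlib.gzip(xml)

# ...then decompress it entirely in memory — no temp files involved.
decompressed = :zlib.gunzip(compressed)

IO.puts(decompressed)
```

Both `:zlib.gzip/1` and `:zlib.gunzip/1` operate on in-memory binaries, which is exactly what makes the write-to-disk detour above unnecessary.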

Now you might be asking why I didn’t just use an HTTP client with built-in compression support, like Tesla. Because I got HTTPoison for free with Crawly, and I didn’t want to explore swapping it for Tesla or Mint under a deadline. Yes, deadlines are the worst!
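One last note: not every response body is gzipped, and calling `:zlib.gunzip/1` on plain data raises an error. A defensive variant (my own sketch — `maybe_gunzip/1` is a hypothetical helper, not part of Crawly or HTTPoison) can pattern-match on the gzip magic bytes `0x1F 0x8B` (per RFC 1952) before decompressing:

```elixir
defmodule Gunzip do
  # Gzip streams start with the magic bytes 0x1F 0x8B (RFC 1952).
  # Decompress only when the binary actually looks gzipped.
  def maybe_gunzip(<<0x1F, 0x8B, _rest::binary>> = body), do: :zlib.gunzip(body)

  # Anything else passes through untouched.
  def maybe_gunzip(body), do: body
end
```

With something like this, a parse callback can pipe every `response.body` through `Gunzip.maybe_gunzip/1` and only the gzipped sitemap files get decompressed.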