Using Rust for Simple Web Requests with Async
Intro
I was thinking of writing a script to fetch Reddit threads and pick some content to put in my motd. Since it’s a really small project, I thought Python would be the best fit. Reddit serves JSON directly without authentication, so any subreddit can be fetched easily with the requests library.
But why not do it in Rust? There seems to be a good alternative to requests in Rust called reqwest. Since everyone is currently on the hype of making everything async, the library supports two modes: async and blocking. I don’t think my tiny script will benefit from asynchronicity, but why not try both anyway?
Implementation
Before making it specific, I created an early test version that collects up to 250 top posts from /r/all over 5 calls (50 per call) and just prints their titles. Treating the library as a black box, the code is not complicated: creating a reqwest client and sending a GET request is quite simple. I used serde_json to parse the response afterwards.
Here’s the blocking version:
// [dependencies]
// reqwest = {version = "0.10.6", features = ["json", "blocking"] }
// serde_json = "1.0.53"
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // Name of the last post seen; used as the `after` pagination cursor.
    let mut last_id = String::new();
    for _ in 0..5 {
        let ret = client
            .get(&format!(
                "https://www.reddit.com/r/all/top.json?limit=50&after={}",
                &last_id
            ))
            .send()?;
        last_id.clear();
        let body = ret.text()?;
        let v: Value = serde_json::from_str(&body)?;
        // `dist` is the number of posts in this page.
        if let Some(n) = v["data"]["dist"].as_u64() {
            for i in 0..n {
                println!("{}", v["data"]["children"][i as usize]["data"]["title"]);
            }
            // Guard against an empty page, where `n - 1` would underflow.
            if n > 0 {
                let new_after = &v["data"]["children"][(n as usize) - 1]["data"]["name"];
                last_id.push_str(new_after.as_str().unwrap_or_default());
            }
        }
    }
    Ok(())
}
Using Rust 1.44, the blocking version took 2m 58s to compile with release optimisations, pulling in 105 packages. The executable is 6.2 MB (3.2 MB after strip). This was on an Intel i7-6700HQ (8) @ 2.591GHz.
The async version is almost the same. I just use .await twice, once after sending the request and once when retrieving the body, so as I said earlier it does not really benefit from asynchronicity.
// [dependencies]
// reqwest = {version = "0.10.6", features = ["json"] }
// tokio = {version = "0.2.21", features = ["macros"]}
// serde_json = "1.0.53"
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    // Name of the last post seen; used as the `after` pagination cursor.
    let mut last_id = String::new();
    for _ in 0..5 {
        let ret = client
            .get(&format!(
                "https://www.reddit.com/r/all/top.json?limit=50&after={}",
                &last_id
            ))
            .send()
            .await?;
        last_id.clear();
        let body = ret.text().await?;
        let v: Value = serde_json::from_str(&body)?;
        // `dist` is the number of posts in this page.
        if let Some(n) = v["data"]["dist"].as_u64() {
            for i in 0..n {
                println!("{}", v["data"]["children"][i as usize]["data"]["title"]);
            }
            // Guard against an empty page, where `n - 1` would underflow.
            if n > 0 {
                let new_after = &v["data"]["children"][(n as usize) - 1]["data"]["name"];
                last_id.push_str(new_after.as_str().unwrap_or_default());
            }
        }
    }
    Ok(())
}
It compiled in 2m 51s with 103 packages. The executable is 5.9 MB (3 MB after strip).
Both versions suggest that the high-level reqwest library is quite heavy on compile time, and the executables are somewhat big for a project like this.
Here is the same code in Python for comparison:
import requests
import json

get_str = "https://www.reddit.com/r/all/top.json?limit=50&after={}"
last_id = ""
for _ in range(5):
    # An explicit user agent is needed; without one Reddit replied with 429.
    ret = requests.get(get_str.format(last_id), headers={'User-agent': 'TEST'})
    v = json.loads(ret.text)
    n = int(v["data"]["dist"])
    for i in range(n):
        print(v["data"]["children"][i]["data"]["title"])
    if n == 0:
        break  # empty page: nothing left to paginate
    last_id = v["data"]["children"][n-1]["data"]["name"]
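The parsing step is the same in all three versions, so it can be pulled into a small pure function and tried out without touching the network. What follows is only a sketch: the name parse_listing and the hand-written sample pages are made up for illustration, and the only assumption about Reddit's format is the data / dist / children / name layout that the scripts above already rely on.

```python
import json


def parse_listing(body):
    """Return (titles, next_cursor) for one page of a listing response.

    next_cursor is the "name" of the last post, or None for an empty page.
    """
    v = json.loads(body)
    children = v["data"]["children"]
    titles = [child["data"]["title"] for child in children]
    next_cursor = children[-1]["data"]["name"] if children else None
    return titles, next_cursor


# Hand-written samples in the shape the scripts expect (not real Reddit data).
sample_page = json.dumps({
    "data": {
        "dist": 2,
        "children": [
            {"data": {"title": "First post", "name": "t3_aaa"}},
            {"data": {"title": "Second post", "name": "t3_bbb"}},
        ],
    }
})
empty_page = json.dumps({"data": {"dist": 0, "children": []}})

titles, cursor = parse_listing(sample_page)
print(titles, cursor)
print(parse_listing(empty_page))
```

In the request loop, a None cursor would be the signal to stop, rather than indexing children[n-1] on an empty page.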
If the Python version is run without setting a user agent, you get 429 Too Many Requests all the time. I assume this is because so many other people use the requests library that its default user agent header is simply blocked.
We can actually find out the default user agents of both Rust reqwest and Python requests by sending a simple request to:
https://httpbin.org/user-agent
Apparently Rust reqwest doesn’t send any user-agent information if none is set, so httpbin just returns null. Python requests, however, sets the user agent to “python-requests/X.XX.X” by default, which websites can block if they detect misuse.
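As a rough sketch of that check, here is a version that only uses Python's standard library. Note the swap: it goes through urllib rather than requests, so the default it reports is urllib's own user agent, not the python-requests one. The helper names build_request, parse_user_agent and fetch_user_agent are invented here, and the response is assumed to be a JSON object with a single "user-agent" key.

```python
import json
import urllib.request

HTTPBIN_URL = "https://httpbin.org/user-agent"


def build_request(user_agent=None):
    """Build a GET request, optionally with an explicit User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent is not None else {}
    return urllib.request.Request(HTTPBIN_URL, headers=headers)


def parse_user_agent(body):
    """Pull the echoed user agent out of the response (None if it was null)."""
    return json.loads(body).get("user-agent")


def fetch_user_agent(user_agent=None):
    """Send the request and return the user agent the server saw."""
    with urllib.request.urlopen(build_request(user_agent), timeout=10) as resp:
        return parse_user_agent(resp.read().decode("utf-8"))
```

Calling fetch_user_agent() should show the library default and fetch_user_agent("TEST") the override; both calls need network access.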
In case you are wondering, the user-agent field in reqwest can be set using the ClientBuilder like this:
let client = reqwest::Client::builder()
    .user_agent("TEST")
    .build()?;
Final Words
Including the Python version, they all perform the same. Maybe the Python version allocates a bit more memory for the whole process, but I don’t think that matters much.
Considering the complexity of writing the code in Rust, I think it’s fine to stick with the Python equivalent in this case. I guess that if the JSON were very large, the benefits of a high-performance, low-level language like Rust would start to show, but that is not the case here.
On the other hand, writing the Rust versions was not too challenging either, since the verbose compiler helps you at every stage. One can still consider Rust for this purpose if they want minimal resource usage. It is also a nice toy project for playing with async, I suppose.