Please upgrade past Pleroma 2.7.0 (or at least patch it)
Hey there. Are you one of these "Fediverse" enthusiasts? Are you hard core enough to run an instance of some of this stuff? Do you run Pleroma? Is it version 2.7.0? If so, you probably should do something about that, like upgrading to 2.7.1 or something.
Based on my own investigations into really bad behavior in my web server logs, there's something that got into 2.7.0 that causes dumb things to happen. It goes like this: first, it shows up and does a HEAD. Then it comes back and does a GET, but it sends complete nonsense in the headers. Apache hates it, and it gets a 400.
What do I mean by nonsense? I mean sending things like "etag" *in the request*. Guess what, that's a server-side header. Or, sending "content-type" and "content-length" *in the request*. Again, those are server-side headers unless you're sending a body, and why the hell would you do that on a GET?
I mean, seriously, I had real problems trying to understand this behavior. Who sends that kind of stuff in a request, right? And why?
This is the kind of stuff I was seeing on the inbound side:
    raw_header { name: "user-agent" value: "Pleroma 2.7.0-1-g7a73c34d; < guilty party removed >" }
    raw_header { name: "date" value: "Thu, 05 Dec 2024 23:52:38 GMT" }
    raw_header { name: "server" value: "Apache" }
    raw_header { name: "last-modified" value: "Tue, 30 Apr 2024 04:03:30 GMT" }
    raw_header { name: "etag" value: "\"26f7-6174873ecba70\"" }
    raw_header { name: "accept-ranges" value: "bytes" }
    raw_header { name: "content-length" value: "9975" }
    raw_header { name: "content-type" value: "text/html" }
    raw_header { name: "Host" value: "rachelbythebay.com" }
Sending date and server? What what what?
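Spotting this sort of thing mechanically is not hard. Here is a minimal sketch in Elixir, purely for illustration: the module name, the function, and the list of flagged header names are all invented for this example and are not taken from any real server or from Pleroma. It treats "date" as allowed, since clients are technically permitted to send it even though few do.

```elixir
defmodule HeaderCheck do
  # Hypothetical helper, not part of any real project.
  # Headers that only make sense on a response, plus the two that describe
  # a body (which a GET normally does not carry). This list is shorthand
  # for the dump above, not an exhaustive registry. "date" is deliberately
  # left out because a client may legitimately send it.
  @flagged ~w(server etag last-modified accept-ranges content-type content-length)

  # Takes a list of {name, value} tuples and returns the lowercased names
  # that look out of place on a bodiless request, in their original order.
  def suspicious(headers) do
    headers
    |> Enum.map(fn {name, _value} -> String.downcase(name) end)
    |> Enum.filter(fn name -> name in @flagged end)
  end
end

request = [
  {"user-agent", "Pleroma 2.7.0-1-g7a73c34d; < guilty party removed >"},
  {"date", "Thu, 05 Dec 2024 23:52:38 GMT"},
  {"server", "Apache"},
  {"last-modified", "Tue, 30 Apr 2024 04:03:30 GMT"},
  {"etag", "\"26f7-6174873ecba70\""},
  {"accept-ranges", "bytes"},
  {"content-length", "9975"},
  {"content-type", "text/html"},
  {"Host", "rachelbythebay.com"}
]

IO.inspect(HeaderCheck.suspicious(request))
```

Fed the request above, it flags six of the nine headers: everything except user-agent, date, and Host.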
Last night, I finally got irked enough to go digging around in their git repo, and I think I found a smoking gun. I don't know Elixir *at all*, so this is probably wrong on multiple levels, but something goofy seems to have changed with a commit in July, resulting in this:
    def rich_media_get(url) do
      headers = [{"user-agent", Pleroma.Application.user_agent() <> "; Bot"}]

      with {_, {:ok, %Tesla.Env{status: 200, headers: headers}}} <-
             {:head, Pleroma.HTTP.head(url, headers, http_options())},
           {_, :ok} <- {:content_type, check_content_type(headers)},
           {_, :ok} <- {:content_length, check_content_length(headers)},
           {_, {:ok, %Tesla.Env{status: 200, body: body}}} <-
             {:get, Pleroma.HTTP.get(url, headers, http_options())} do
        {:ok, body}
Now, based on my addled sense of comprehension for this stuff, this is just a guess, but it sure looks like it's populating "headers" with a user-agent, then fires that off as a HEAD. Then it takes the *incoming* headers, adds them to that, then turns the whole mess around and sends it as a GET.
Assuming I'm right, that would explain the really bizarre behavior.
There was another commit about a month later and the code changed quite a bit, including a telling change to NOT send "headers" back out the door on the second request:
    defp head_first(url) do
      with {_, {:ok, %Tesla.Env{status: 200, headers: headers}}} <-
             {:head, Pleroma.HTTP.head(url, req_headers(), http_options())},
           {_, :ok} <- {:content_type, check_content_type(headers)},
           {_, :ok} <- {:content_length, check_content_length(headers)},
           {_, {:ok, %Tesla.Env{status: 200, body: body}}} <-
             {:get, Pleroma.HTTP.get(url, req_headers(), http_options())} do
        {:ok, body}
      end
    end
Now both requests call a function (req_headers) which itself just supplies the user-agent as seen before.
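The difference between the two shapes can be reproduced without touching the network. What follows is a toy sketch, not Pleroma's code: fake_head and fake_get are hypothetical stand-ins that return canned data, and the module name is made up. It leans on one thing about Elixir that is worth knowing here: a variable named in the pattern on the left of `<-` in a `with` gets rebound to whatever matched, so a name that held request headers before the HEAD can end up holding response headers after it.

```elixir
defmodule RebindDemo do
  # Hypothetical stand-ins for the HTTP calls. fake_head/2 returns canned
  # response headers; fake_get/2 just reports which headers it was handed.
  def fake_head(_url, _req_headers) do
    {:ok, %{status: 200, headers: [{"server", "Apache"}, {"content-type", "text/html"}]}}
  end

  def fake_get(_url, req_headers), do: {:ok, %{status: 200, body: req_headers}}

  defp req_headers, do: [{"user-agent", "demo; Bot"}]

  # Shaped like the July code: the pattern rebinds `headers` to the
  # response headers, and that same name is then handed to the GET.
  def leaky(url) do
    headers = req_headers()

    with {:ok, %{status: 200, headers: headers}} <- fake_head(url, headers),
         {:ok, %{status: 200, body: sent}} <- fake_get(url, headers) do
      sent
    end
  end

  # Shaped like the later code: each request asks for fresh request headers.
  def fixed(url) do
    with {:ok, %{status: 200, headers: _headers}} <- fake_head(url, req_headers()),
         {:ok, %{status: 200, body: sent}} <- fake_get(url, req_headers()) do
      sent
    end
  end
end

IO.inspect(RebindDemo.leaky("https://example.com/"))
IO.inspect(RebindDemo.fixed("https://example.com/"))
```

In this toy version, leaky hands the canned "server" and "content-type" response headers to the fake GET, while fixed hands it only the user-agent.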
What's frustrating is that the commit for this doesn't explain that it's fixing an inability to fetch previews of links or anything of the sort, and so the changelog for 2.7.1 doesn't say it either. This means users of the thing would have no idea if they should upgrade past 2.7.0.
Well, I'm changing that. This is your notification to upgrade past that. Please stop regurgitating headers at me. I know my servers are named after birds, but they really don't want to be fed that way.
...
One small side note for the devs: having version numbers and even git commit hashes made it possible to bracket this thing. Without those in the user-agent, I would have been stuck trying to figure it out based on the dates the behavior began, and that's never fun. The pipeline from "git commit" to actual users causing mayhem can be rather long.
So, whoever did that, thanks for that.