Ticket #1581 (closed defect: fixed)

Opened 3 years ago

Last modified 16 months ago

rTorrent dies with "rtorrent: PollKQueue::modify() error: Bad file descriptor0"

Reported by: chasba Owned by: rakshasa
Priority: highest Component: rtorrent
Version: Severity: major
Keywords: Cc:

Description

Hi. I'm running FreeBSD 7.0-RELEASE-p7 with rTorrent 0.8.4/0.12.4 installed from ports, cURL 7.18.0 with c-ares.

rTorrent keeps dying on me with exit message: rtorrent: PollKQueue::modify() error: Bad file descriptor0]

Removing all of the content in my session folder seems to get it running again, but rehashing 100+ GB of data is beginning to annoy me a bit.

What could be the cause of these crashes, and what can I do to get things running smoothly?

Attachments

fix_kqueue.diff Download (0.7 KB) - added by rakshasa 3 years ago.
Possible fix for the problem, try this.

Change History

  Changed 3 years ago by anonymous

At what point does it die? When starting up or later?

The most likely culprit is libcurl; it's been known to try to set event masks for file descriptors that are closed. It might help to upgrade to 7.18.2. But even then, according to the kevent docs, the kevent syscall shouldn't return an error in that case but simply place an error event in the queue. So I don't know how it can fail like that. Do you regularly have over 1000 open sockets? That could mean the pending queue fills up and it can't place the error there.

You could try starting rtorrent with a low max_open_sockets setting (say, 500) and then change the setting after a minute or so (you can use a schedule) when things have stabilized.

  Changed 3 years ago by chasba

It occurs if I load several torrents, quit rTorrent with CTRL+q and restart with 'rtorrent' from the command line.

I am hoping to see an update to the FreeBSD cURL port soon; I also suspect this problem could be related to cURL/libcurl. The server is mainly a torrent server, not running anything else of importance. I can't see how it could have >1000 sockets open the way I use it.

As a test, I have now reinstalled rTorrent so it uses select polling, not kqueue. For now, it seems to start up and run correctly.

I will try setting max_open_sockets to ~500 as you mentioned once I've done some more testing with select polling.

Thank you for your input!

  Changed 3 years ago by anonymous

Another workaround might be to stop all downloads before closing rtorrent, and open them one-by-one after restarting. There really should be an option for a staggered restart that doesn't try to do everything at once, opening dozens of downloads and tracker announces in the same instant.

  Changed 3 years ago by cjkenna

I have this same problem with rTorrent 0.8.4. cURL is 7.19.4 on FreeBSD 7.1-RELEASE-p4. After downgrading to 0.8.2, the problem is gone. However, when I start rtorrent up, many of the initial requests to the tracker error with the message, "Server returned nothing, no errors, no data." Perhaps this can help diagnose the problem.

  Changed 3 years ago by anonymous

same problem here: FreeBSD 7.1-RC2 with rtorrent 0.8.2_x 0.8.4_x | KQueue

  Changed 3 years ago by anonymous

Does it work if you compile without kqueue? Or apply  http://ovh.ttdpatch.net/~jdrexler/rt/env_poll.diff and turn it off that way.

  Changed 3 years ago by anonymous

Ok, I compiled it without kqueue (make in /usr/ports/net-p2p/rtorrent/, version 0.8.2) and now it uses select, and it works perfectly! What are the compromises of not using kqueue? Thank you, and I'll update here if something goes wrong!

ps: rtorrent port seems to be using a package called KQueue(2), I don't know what that (2) means

  Changed 3 years ago by anonymous

It's not a package, it's a manual page for the kqueue syscall. Using kqueue allows rtorrent to use less CPU with many connected peers. Using select is slower, especially with hundreds or even thousands of peers. As long as your computer is fast enough, there will still be full functionality.
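To illustrate why select gets slower as the number of peers grows, here is a minimal sketch of select-based polling using only POSIX calls. The function name readable_with_select is made up for this example and is not rtorrent or libtorrent code. Every call rebuilds the descriptor set and then walks the whole list again to find out which descriptors are ready, so the work scales with the number of watched sockets rather than with the number of active ones.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#include <algorithm>
#include <vector>

// Hypothetical helper for illustration: returns the subset of `fds` that is
// readable right now, without blocking.
std::vector<int> readable_with_select(const std::vector<int>& fds) {
  fd_set read_set;
  FD_ZERO(&read_set);

  // Pass 1: the whole set is rebuilt on every call.
  int max_fd = -1;
  for (int fd : fds) {
    if (fd < 0 || fd >= FD_SETSIZE)
      continue;  // select() cannot watch descriptors outside this range
    FD_SET(fd, &read_set);
    max_fd = std::max(max_fd, fd);
  }

  std::vector<int> ready;
  if (max_fd < 0)
    return ready;

  timeval timeout = {0, 0};  // zero timeout: poll and return immediately
  if (select(max_fd + 1, &read_set, nullptr, nullptr, &timeout) <= 0)
    return ready;

  // Pass 2: every watched descriptor is checked, ready or not.
  for (int fd : fds)
    if (fd >= 0 && fd < FD_SETSIZE && FD_ISSET(fd, &read_set))
      ready.push_back(fd);

  return ready;
}
```

With kqueue the kernel hands back only the descriptors that have something to report, so both passes over the full list go away.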

  Changed 3 years ago by anonymous

any idea when this issue will get fixed ? kqueue works fine on other software, i'm sure some knowledgeable person can fix this.

  Changed 3 years ago by anonymous

to comment on my line above, i use all the latest version of almost everything, even freebsd 8 current. rtorrent works fine in epoll but it uses from 50 to 80% of my cpu when connected to many clients (my max socket is 2500). i tried with kqueue and the cpu load is a lot lower, but after a while (could be hours) it crashes.

I had to use the scheduler to increase the connection slowly on start.

by the way, i love rtorrent, i used to use mldonkey for my torrents but rtorrent downloads a lot faster with all the udp, dht and encryption features

  Changed 3 years ago by anonymous

one more thing, on freebsd an include (<assert.h>) is missing to be able to compile

#ifdef USE_KQUEUE
#include <assert.h>
#endif

Changed 3 years ago by rakshasa

Possible fix for the problem, try this.

  Changed 3 years ago by rakshasa

Removed a flush of events that wasn't supposed to be there, resulting in messed up internal state when more than 1024 socket state changes were made between two calls to poll.

  Changed 3 years ago by anonymous

I recompiled and am trying the change. I will let you know.

Before that, rtorrent crashed when it had 400 sockets open, and i'm not sure why; previously it crashed at ~1000 (i guess 1024).

By the way, is there a way to get the logs into a file ? just the critical ones, like when it crashes because of kqueue you get a message right before it dies ?

Also one thing, when it crashes, it would be great if the app could exit clean and save all the sessions, this way restarting would not require a rehash of all files.

  Changed 3 years ago by anonymous

When it crashes, you do NOT want to save state because it's very likely going to be corrupted. So you're quite likely to be sending bad chunks after restarting if you trust the state written in the death throes of a crash.

During a crash, the app by the very definition of a crash cannot "exit clean". It's like asking a soldier with a gaping head wound to not make a mess on the carpet.

  Changed 3 years ago by anonymous

so i've been running the client with your last change, but it crashed again last night (signal 6 abort). It had about 750 connections before it died.

  Changed 3 years ago by anonymous

i finally got it to reach 1024 connections & it crashed right away once past that number.

Do you have any other idea of what could be done to make it work past that limit ?

I now have a script to monitor rtorrent and relaunch it when it dies :-(

follow-up: ↓ 22   Changed 3 years ago by josef

rakshasa, I don't think the flush_events() call was the problem here.

The code as it's written cannot correctly handle kqueue errors. Normally they're placed as events in the waiting event list returned by the kevent call, but when the event list is full, or, which is the real problem here, when kevent is called with NO waiting event list at all (i.e. the "kevent(m_fd, m_changes, m_changedEvents, NULL, 0, NULL)" call), then the call itself returns an error.

So the proper fix would be to have a waiting event list argument even in that case. I added the flush_events call to do that. If flush_events() can't be called here (though I don't understand why), then I don't know if there's an easy fix for the problem. Especially when it could ever get to the point that the waiting event list actually fills up, i.e. when over 1024 FDs are active between two rtorrent ticks. I believe the waiting event list is only processed by rtorrent once per tick; that would have to be changed to properly support >1024 (active) connections. Or maybe simply change the waiting event list to a std::vector that can be grown as needed, whenever the kevent() call returns an error that indicates that it wanted to place an error event in the waiting event list but it was full.
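The growable event list idea can be sketched without touching kqueue at all. The code below is a model, not libtorrent code: WaitFn and poll_growing are hypothetical names, and WaitFn merely stands in for a call shaped like kevent() that fills a caller-supplied buffer and returns how many entries it wrote. The growth trigger is also a simplification. It doubles the vector whenever a call fills it completely, instead of reacting to a specific error return as described above.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical wait function: writes up to `capacity` events into `out` and
// returns the number written.
using WaitFn = std::function<std::size_t(int* out, std::size_t capacity)>;

// Run one wait call into `events` and return how many entries are valid.
// If the buffer came back completely full, more events were probably
// pending, so the buffer is doubled for the next call. The first `n`
// entries stay intact across the resize.
std::size_t poll_growing(std::vector<int>& events, const WaitFn& wait) {
  if (events.empty())
    events.resize(16);  // arbitrary starting size for this sketch

  const std::size_t n = wait(events.data(), events.size());

  if (n == events.size())
    events.resize(events.size() * 2);

  return n;
}
```

The caller would process events[0] through events[n - 1] after each call. Whether libtorrent's tick loop could be adapted this way is not something this sketch settles.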

At least the simplest kind of this problem might still be fixed by

void
PollKQueue::modify(Event* event, unsigned short op, short mask) {
  // Flush the changed filters to the kernel if the buffer is full.
  if (m_changedEvents == m_maxEvents)
    flush_events();

  struct kevent* itr = m_changes + (m_changedEvents++);
...

But I have no way of testing that properly, or if that is indeed the problem in this case, rather than actually filling up the waiting event list (which I would never be able to test... 1024 constantly active connections? Hah, my router would just curl up and die).
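For readers who want to see the flush-when-full behaviour in isolation, here is a small model of it. ChangeBuffer and Change are hypothetical names for this sketch and do not appear in libtorrent. The flush callback stands in for whatever submits the batch to the kernel, and the capacity is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one queued filter change.
struct Change {
  int fd;
  short filter;
};

// Model of a fixed-capacity change list that is flushed before it overflows.
class ChangeBuffer {
public:
  using Flush = std::function<void(const std::vector<Change>&)>;

  ChangeBuffer(std::size_t capacity, Flush flush)
    : m_capacity(capacity), m_flush(std::move(flush)) {
    m_changes.reserve(capacity);
  }

  // Queue one change. If the list is already full, submit the pending batch
  // first so the new entry never lands past the end of the table.
  void modify(int fd, short filter) {
    if (m_changes.size() == m_capacity)
      flush();
    m_changes.push_back(Change{fd, filter});
  }

  // Submit everything queued so far and start a fresh batch.
  void flush() {
    if (m_changes.empty())
      return;
    m_flush(m_changes);
    m_changes.clear();
  }

  std::size_t pending() const { return m_changes.size(); }

private:
  std::size_t m_capacity;
  Flush m_flush;
  std::vector<Change> m_changes;
};
```

With a capacity of 4, queuing ten changes triggers two automatic flushes of four entries each and leaves two entries pending for the next explicit flush.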

In any case kevent() should NEVER be called without a waiting event list because then the call will fail if any of the FDs have an error condition (such as a connection timing out between rtorrent adding the FD to the changed list and calling kevent(), for instance). Or it would need to figure out which of the FDs caused the error condition and remove it from the kevent argument before trying the call again. That would cost at least O(N log N) syscalls though.

A workaround for now could be to simply increase the fixed table size a lot, i.e. change the 1024 in PollKQueue::create to, say, 16384 or make it at least twice the size of the maxOpenSockets argument or something. The couple of kilobytes that would be wasted by this don't really matter at all.
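The two sizing options mentioned above can be combined in a tiny helper. This is only a sketch: kqueue_table_size is a hypothetical function, not something in libtorrent, and taking the larger of the two values is an assumption of this example rather than part of the suggestion itself.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: pick the event table size as the larger of a fixed
// floor (16384) and twice the configured socket limit.
std::size_t kqueue_table_size(std::size_t max_open_sockets) {
  const std::size_t floor_size = 16384;
  return std::max(floor_size, 2 * max_open_sockets);
}
```

For a limit of 2500 sockets this still returns 16384; the doubling only takes over once the limit exceeds 8192.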

  Changed 3 years ago by chasba

Glad(?) to see that there is still a discussion regarding this problem. For the record, yes, select based polling works flawlessly here on my FreeBSD 7.2-p1 system.

Tempting fate, I just tried a kqueue install, and it threw the same 'rtorrent: PollKQueue::modify() error: Bad file descriptor9]' error.

Still, thank you guys for looking into it! Thank Jebus for select based polling.

  Changed 3 years ago by rakshasa

This is why I hate debugging/coding without being able to test stuff...

  Changed 3 years ago by astadtler

Updating to the new rtorrent-devel (8.5/12.5) seems to solve this problem for me. Also curl has been updated to 7.19.5 in FreeBSD and the problems remain. I'm going to keep looking into this, though; it would be nice to have both versions working properly.

  Changed 3 years ago by anonymous

Ignore my post, never mind, it's still broken.

in reply to: ↑ 17 ; follow-up: ↓ 23   Changed 3 years ago by anonymous

Replying to josef:

A workaround for now could be to simply increase the fixed table size a lot, i.e. change the 1024 in PollKQueue::create to, say, 16384 or make it at least twice the size of the maxOpenSockets argument or something. The couple of kilobytes that would be wasted by this don't really matter at all.

This solved my kqueue problems.

in reply to: ↑ 22 ; follow-up: ↓ 24   Changed 3 years ago by anonymous

Replying to anonymous:

Replying to josef:

A workaround for now could be to simply increase the fixed table size a lot, i.e. change the 1024 in PollKQueue::create to, say, 16384 or make it at least twice the size of the maxOpenSockets argument or something. The couple of kilobytes that would be wasted by this don't really matter at all.

This solved my kqueue problems.

You mean this kinda patch?

- return new PollKQueue(fd, 1024, maxOpenSockets);
+ return new PollKQueue(fd, 16384, maxOpenSockets);

or no?

in reply to: ↑ 23   Changed 3 years ago by astadtler@gmail.com

Replying to anonymous:

Replying to anonymous:

Replying to josef:

A workaround for now could be to simply increase the fixed table size a lot, i.e. change the 1024 in PollKQueue::create to, say, 16384 or make it at least twice the size of the maxOpenSockets argument or something. The couple of kilobytes that would be wasted by this don't really matter at all.

This solved my kqueue problems.

You mean this kinda patch?

- return new PollKQueue(fd, 1024, maxOpenSockets);
+ return new PollKQueue(fd, 16384, maxOpenSockets);

or no?

Yep exactly

  Changed 2 years ago by chasba

Can anyone confirm the previous posting?

- return new PollKQueue(fd, 1024, maxOpenSockets);
+ return new PollKQueue(fd, 16384, maxOpenSockets);

I do not have the possibility to try it out myself. Thank you.

  Changed 2 years ago by anonymous

perhaps if josef would post a patch, it'd be possible to test ?

  Changed 2 years ago by astadtler@gmail.com

 http://prozium.us/poll_kqueue_freebsd.patch

There is a patch for you guys to try. We can probably get FLZ to commit it.

  Changed 2 years ago by thomas

Thanks, this fixed it for me!

  Changed 2 years ago by bjf@bryanfullerton.com

This patch fixes the issue for me as well. (FreeBSD 7.2-RELEASE-p3)

follow-up: ↓ 31   Changed 2 years ago by jesper@monsted.dk

The patch fixed my problems (8.0-RC1 and libtorrent-0.12.4).

in reply to: ↑ 30 ; follow-up: ↓ 32   Changed 2 years ago by astadtler@gmail.com

I'm using it on 8.0-RC-1 with rtorrent 8.5/12.5. I submitted the patch to the port maintainer a few weeks ago but never heard anything back.

in reply to: ↑ 31   Changed 2 years ago by wonslung@gmail.com

how do you apply this patch? i'm having this bug

  Changed 2 years ago by wonslung@gmail.com

never mind, i found it...i was looking in the rtorrent source, not the libtorrent source

  Changed 2 years ago by lucasreddinger

thanks for the patch--it's working well so far! i look forward to getting this added to the port.

% uname -mrs
FreeBSD 7.1-PRERELEASE i386
%

  Changed 2 years ago by pabelanger@gmail.com

Thanks for the patch. I have requested this be merged into freebsd ports, but also hope it will get merged upstream too.

  Changed 2 years ago by chasba

Great stuff! Thank you for committing this, I will upgrade ASAP.

Again, thank you.

  Changed 2 years ago by rakshasa

  • status changed from new to closed
  • resolution set to fixed

  Changed 2 years ago by resipsa

The freebsd patch fixes the problem on OS X too.

  Changed 2 years ago by lucasreddinger

It's happening again! D:

( 7:46:18) Using 'kqueue' based polling.
rtorrent: PollKQueue::modify() error: Bad file descriptor

% uname -mrs
FreeBSD 8.0-RELEASE i386
% pkg_info|grep libtorrent
libtorrent-0.12.4 BitTorrent Library written in C++
% pkg_info|grep rtorrent
rtorrent-0.8.4_1 BitTorrent Client written in C++
%

This is on a fresh FreeBSD 8.0-release installation using the latest i386 package.

  Changed 2 years ago by ariykd@gmail.com

This occurred for me using the MacPorts rtorrent-devel/libtorrent-devel (0.8.6 and 0.12.6) on PowerPC OS X 10.5. I opened a Unix socket and attempted to connect from wtorrent which caused a crash. After, starting rtorrent would bring up the main screen, freeze for about 10 seconds, and dump me back to the shell with the following:

Caught Segmentation fault, dumping stack:
0   rtorrent                            0x000415f0 _Z8do_panici + 112
1   rtorrent                            0x00047f0c _ZN13SignalHandler6caughtEi + 540
2   libSystem.B.dylib                   0x91fdc9fc _sigtramp + 68
Abort trap

GDB backtrace:

rtorrent: PollKQueue::modify() error: Bad file descriptor

Program exited with code 0377.

I resolved it by deleting all my sessions.

  Changed 16 months ago by lucasreddinger

Is there a cleaner solution than continually increasing maxOpenSockets? Inevitably, someone will try to seed more and more torrents and keep running into this problem.
