Ticket #321 (reopened defect)

Opened 5 years ago

Last modified 16 months ago

Sequential download when there is only one peer

Reported by: anonymous Owned by: rakshasa
Priority: normal Component: libtorrent
Version: Severity: normal
Keywords: Cc:

Description

I noticed that rtorrent downloads chunks sequentially when there is only one seed and no peers.

In this scenario the strategy may seem optimal (a usable contiguous piece of the torrent is downloaded), but another peer may connect later. If that happened, and both peers used this strategy, they could not exchange chunks because they would both have the beginning of the torrent.

Change History

  Changed 5 years ago by rakshasa

  • status changed from new to closed
  • resolution set to invalid

Except they don't, rarest-first is still active. (Except for some caching of chunks)

  Changed 5 years ago by anonymous

  • status changed from closed to reopened
  • resolution invalid deleted

Well, there are no rare chunks when there is only one seed. Every chunk is available exactly once. And I did notice sequential download on a few torrents, so it does happen.

Of course, once other peers connect the rarest-first algorithm would select the rare chunks, and downloading from several sources would leave holes.

But when there is only one source the download is sequential.
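To illustrate the point, here is a minimal sketch of rarest-first selection. It is not libtorrent's actual picker: the names `pick_rarest` and `download_order` are made up for this example, and breaking ties by lowest chunk index is an assumption. With one seed every chunk has the same availability, so the tie-break alone decides the order.

```python
def pick_rarest(availability, have):
    """Return the index of the rarest missing chunk, or None when done.

    availability[i] counts the connected peers that have chunk i.
    Ties go to the lowest index, which is an assumption of this sketch.
    """
    candidates = [i for i, count in enumerate(availability)
                  if count > 0 and i not in have]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (availability[i], i))


def download_order(availability):
    """Download everything, recording the order in which chunks are picked."""
    have, order = set(), []
    while (chunk := pick_rarest(availability, have)) is not None:
        have.add(chunk)
        order.append(chunk)
    return order


# One seed and no other peers: every chunk is available exactly once.
print(download_order([1] * 8))       # [0, 1, 2, 3, 4, 5, 6, 7]
# With uneven availability the rarest chunks come first.
print(download_order([3, 1, 2, 1]))  # [1, 3, 2, 0]
```

Any deterministic tie-break would give a fixed order here. A random tie-break would avoid that, at the cost of the extra disk seeks discussed in this ticket.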

In a scenario like this:

1) there is a seed S (with no peers)

2) peer A connects to S using libtorrent, and downloads the start of the torrent (sequentially)

3) peer A disconnects

4) peer B connects to S, and downloads the start of the torrent (sequentially)

5) peer A reconnects to download the rest of the torrent

You get the worst case: one of the peers has a subset of the chunks the other peer has.
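The scenario can be sketched with plain sets. The chunk counts (100 chunks, 30 for A, 50 for B) and all function names are invented for the illustration. The sketch compares what A and B could trade after sequential downloads with what they could trade after random selection.

```python
import random


def sequential_prefix(num_chunks, count):
    """Chunks held after downloading `count` chunks in index order."""
    return set(range(min(count, num_chunks)))


def random_selection(num_chunks, count, rng):
    """Chunks held after downloading `count` chunks picked at random."""
    return set(rng.sample(range(num_chunks), min(count, num_chunks)))


def tradeable(a, b):
    """Return (chunks A can give B, chunks B can give A)."""
    return a - b, b - a


NUM_CHUNKS = 100

# Steps 2-4: A fetches 30 chunks and leaves, then B fetches 50 chunks.
a_seq = sequential_prefix(NUM_CHUNKS, 30)
b_seq = sequential_prefix(NUM_CHUNKS, 50)
a_gives, b_gives = tradeable(a_seq, b_seq)
print(len(a_gives), len(b_gives))  # 0 20

# Same amounts of data, but with randomly chosen chunks.
rng = random.Random(0)
a_rand = random_selection(NUM_CHUNKS, 30, rng)
b_rand = random_selection(NUM_CHUNKS, 50, rng)
a_gives, b_gives = tradeable(a_rand, b_rand)
print(len(a_gives), len(b_gives))
```

In the sequential case A has nothing B wants, so the exchange is one-way. With random selection both sides will typically hold chunks the other is missing.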

  Changed 5 years ago by rakshasa

The question then becomes: is it worth causing a disk seek for every single chunk downloaded, just to cover the (rather rare) cases where the above scenario might happen? BitTorrent is bad enough on hard drives as it is, and the above can be solved when the Fast Extension becomes widespread.

  Changed 5 years ago by anonymous

It causes a disk seek in any other case, so what's the problem if it does it in this 'rare' case as well?

  Changed 5 years ago by anonymous

Anyway, I do not see how Fast Extension is going to solve this.

There is some "allowed fast list" but I do not see how it would be used in the above scenario.

The option to suggest chunks does not seem applicable either.

  Changed 4 years ago by anonymous

Hmm, I got a torrent where one peer was sending data much faster than the rest of the traffic, and the behaviour is the same: rtorrent pretty much downloads sequentially from that single peer, and the rest of the traffic amounts to noise.

I am not sure why the rarest first algorithm does not work - perhaps everybody has all the chunks that can be obtained fast.

  Changed 3 years ago by chalsall@chalsall.com

Here's another issue...

A very slow seeder. Multiple libtorrent downloaders. Downloader A catches up to what another downloader (B) has completed, and quickly requests and receives blocks from B. From that point on, A and B both request the same block from the seeder, slowing down the entire process.

I've observed this empirically just now, downloading the Fedora 10 Snap-2 Source CD.

Would it not be reasonable to have a configuration option to randomize requests?

  Changed 3 years ago by anonymous

It would be good to at least randomize the chunk every once in a while... like for every 10th chunk or so.

follow-up: ↓ 10   Changed 3 years ago by ray.voelker@gmail.com

I was wondering if there might be a way to actually FORCE a sequential download? I'm working on a disk image type of project where I'm looking to use the downloaded file to image a drive. If I could get the image file to download, and continue to download in a sequential fashion, I could then open the file up on standard out, start decompressing it, and use that output to image the drive.

Coincidentally, sequential downloading would be an awesome thing for "streaming media", and watching video as you're downloading the rest in the background.

in reply to: ↑ 9 ; follow-up: ↓ 11   Changed 3 years ago by anonymous

A while ago I made a patch to fix the linear requesting. It jumps to a random chunk every 1-31 chunks.

 http://ovh.ttdpatch.net/~jdrexler/rt/fix-linear-requesting.diff
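The idea behind the patch can be sketched as follows. This is not the code from the diff above, which is not reproduced here. The class `RunPicker` and its parameters are hypothetical: it requests chunks in short sequential runs of 1 to 31 chunks and jumps to a random missing chunk between runs.

```python
import random


class RunPicker:
    """Pick chunks in sequential runs, jumping to a random chunk between runs.

    Hypothetical sketch, not the code from the patch.
    """

    def __init__(self, num_chunks, rng=None, max_run=31):
        self.missing = set(range(num_chunks))
        self.num_chunks = num_chunks
        self.rng = rng or random.Random()
        self.max_run = max_run
        self.run_left = 0
        self.cursor = 0

    def next_chunk(self):
        """Return the next chunk to request, or None when nothing is missing."""
        if not self.missing:
            return None
        if self.run_left == 0:
            # Start a new run of 1..max_run chunks at a random missing chunk.
            self.cursor = self.rng.choice(sorted(self.missing))
            self.run_left = self.rng.randint(1, self.max_run)
        # Skip chunks we already have, wrapping around at the end.
        while self.cursor not in self.missing:
            self.cursor = (self.cursor + 1) % self.num_chunks
        chunk = self.cursor
        self.missing.discard(chunk)
        self.run_left -= 1
        return chunk


picker = RunPicker(num_chunks=200, rng=random.Random(1))
order = []
while (chunk := picker.next_chunk()) is not None:
    order.append(chunk)
print(order[:40])
```

Short sequential runs keep most writes adjacent on disk, while the random jumps make it unlikely that two downloaders hold the same prefix.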

Replying to ray.voelker@gmail.com:

I was wondering if there might be a way to actually FORCE a sequential download?

Well the peers will normally have a random chunk selection and you can only download those. Downloading a torrent sequentially also greatly reduces the swarm efficiency and would get a client shunned very quickly. While there is nothing that stops you from hacking a torrent client to do this, it can obviously never be a standard behaviour. If everybody in the swarm did that, all peers would eventually be requesting the same piece from all seeds because they arrive at the same completion after sharing what pieces they have with each other.

in reply to: ↑ 10   Changed 3 years ago by ray.voelker@gmail.com

How would one go about hacking the torrent client to do this? : )

The reason I'm asking is because I'm actually using the rtorrent client to transfer computer disk images (large 15 gig compressed disk images) in a closed computer lab environment.

My process basically entails getting the compressed disk image file with rtorrent, saving it on a partition toward the end of the disk. Once it's done downloading, it begins to "image" the file to the start of the disk once I see the .finished symlink in the torrent watch directory. It's working great for me (I see virtually no slowdown during the download process when I'm doing this with 20+ PCs), but it does take some time away from the process to have to wait for the entire 13 gig file to transfer before I can use it to image the start of the drive. I had previously used wget from a single standalone server to "fetch" output and feed it to STDOUT, decompress the image file on the fly, and use that output to send the image to the start of the drive. Obviously the more clients I added to the equation, the slower things became. I was almost hoping to use rtorrent as a drop in replacement for wget in this case. Of course that's not in the cards since I do need a place to "store" the large image file while it's swarming.

Perhaps I should just leave this one well enough alone, as I'm sure disk I/O will crawl to a standstill (if not rip the drive to shreds) if I'm doing an active sequential torrent, decompressing it, and imaging to the front of the drive all at once. I suppose waiting for the entire file to complete and hash check gives me some peace of mind that I have all the correct data as well.

Thanks for responding to the first post. I definitely see why one would want to avoid sequential downloading in a "real world" environment.

follow-up: ↓ 13   Changed 3 years ago by anonymous

Chunks are hash checked as they're being downloaded, so once a chunk is marked as done the data integrity has been verified.
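As a rough illustration of per-chunk verification: in the original BitTorrent metainfo format, the `pieces` field is a concatenation of 20-byte SHA-1 digests, one per piece. The helper names below are made up for this sketch and are not rtorrent or libtorrent functions.

```python
import hashlib


def split_piece_hashes(pieces_field: bytes) -> list[bytes]:
    """Split the metainfo `pieces` string into 20-byte SHA-1 digests."""
    if len(pieces_field) % 20 != 0:
        raise ValueError("pieces field must be a multiple of 20 bytes")
    return [pieces_field[i:i + 20] for i in range(0, len(pieces_field), 20)]


def verify_piece(data: bytes, expected_sha1: bytes) -> bool:
    """Return True when the piece's SHA-1 digest matches the expected one."""
    return hashlib.sha1(data).digest() == expected_sha1


# Build a tiny fake torrent with two pieces to exercise the helpers.
piece_data = [b"first piece", b"second piece"]
pieces_field = b"".join(hashlib.sha1(p).digest() for p in piece_data)
hashes = split_piece_hashes(pieces_field)

print(verify_piece(piece_data[0], hashes[0]))   # True
print(verify_piece(b"corrupted", hashes[1]))    # False
```

Each piece is verified on its own, but that alone does not make a partly downloaded file readable from the start. That also requires the verified pieces to form a contiguous prefix.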

I don't quite understand your setup, but there's only one original seed and you control it, right? In that case maybe all you need to do is turn on initial seeding there and it will offer the pieces sequentially.

in reply to: ↑ 12   Changed 3 years ago by ray.voelker@gmail.com

Replying to anonymous:

Chunks are hash checked as they're being downloaded, so once a chunk is marked as done the data integrity has been verified. I don't quite understand your setup, but there's only one original seed and you control it, right? In that case maybe all you need to do is turn on initial seeding there and it will offer the pieces sequentially.

I had looked into initial seeding previously, but I was under the impression that initial seeding only happened until one of the clients had 100% of the torrent, and then quit. I wanted to keep it seeding indefinitely.

Yeah, basically I have a couple of rtorrent clients running on two separate servers seeding the disk image file. When I fire up my image CD (a live Linux CD with the rtorrent client and some bash scripts), it connects to the two clients that are seeding and grabs from there.

I guess what you're saying is that if I only had one and only one client seeding the file, then I would see sequential download?

Seems as though if I can't guarantee a sequential download, then I can't guarantee that I can open the file up for read until I have 100% completion.

I was just looking to see if I could streamline my process a bit. Gotta say though, that this method of imaging sure beats my previous multi client / 1 server model I was using. I see about 11 MB/s downloads on every client I bring online to image using rtorrent.

  Changed 3 years ago by anonymous

Yes, initial seeding stops after each chunk has been downloaded once. And if you have multiple downloaders, they would get alternating pieces, not all pieces from the beginning.
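A toy model of that behaviour, assuming the seed simply rotates through the connected downloaders and offers each chunk once. The real scheduling in rtorrent may differ, and `initial_seed_offers` is a made-up name.

```python
from itertools import cycle


def initial_seed_offers(num_chunks, downloaders):
    """Offer each chunk exactly once, rotating through the downloaders.

    Round-robin assignment is an assumption of this sketch.
    """
    offers = {name: [] for name in downloaders}
    for chunk, name in zip(range(num_chunks), cycle(downloaders)):
        offers[name].append(chunk)
    return offers


print(initial_seed_offers(8, ["A", "B"]))
# {'A': [0, 2, 4, 6], 'B': [1, 3, 5, 7]}
```

Under this model no downloader gets a contiguous run from the seed. Each one has to fetch the gaps from the other downloaders.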

  Changed 16 months ago by anonymous

Another scenario is when I download a torrent that (almost) everybody else already has (>100 peers, exactly 3 with <100% right now). Would there be a problem with sequential downloads in this situation?

Maybe there is some sensible ratio: if more than X% of all (connected) peers are seeding, do a sequential download.

(My understanding is that it tries to connect to other leechers and looks at what they have first, before bothering seeders.)
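The proposed heuristic might look like the sketch below. Nothing like this exists in rtorrent as far as this ticket shows. The function name `prefer_sequential` and the 0.9 default threshold are placeholders for the X% mentioned above.

```python
def prefer_sequential(peer_completion, threshold=0.9):
    """Return True when more than `threshold` of connected peers are seeds.

    peer_completion holds one completion fraction (0.0 to 1.0) per
    connected peer; a value of 1.0 marks a seed.
    """
    if not peer_completion:
        return False
    seeds = sum(1 for done in peer_completion if done >= 1.0)
    return seeds / len(peer_completion) > threshold


# The situation from the comment: about 100 seeds and exactly 3 incomplete peers.
peers = [1.0] * 100 + [0.4, 0.7, 0.9]
print(prefer_sequential(peers))             # True
# A swarm that is mostly leechers stays on the normal strategy.
print(prefer_sequential([1.0, 0.2, 0.5]))   # False
```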
