Urllib2 download very slow compared to browser

Oh, I'm sorry. I said that, but what I meant is that I downloaded on Chrome and later downloaded on Edge Dev separately, with only one browser running at a time. Besides that, I tested a download on another computer on the same network, and to my surprise Chrome, Edge and Edge Dev all had the same download speed there, which is OK. But why doesn't this happen on my main computer? This is my result when using Edge Beta.

Well, I downloaded Edge Canary and so far it's good, but the speed-limit problem remains the same. I also tried what you suggested earlier about resetting the Edge settings.

I've evaluated curl and my web browser, which both retrieve the response quickly.

Just to remove the influence of importing the modules, could you repeat the tests, measuring the time after the import statements?

HelioGuilherme66, see below. To see if the problem replicates on your system, you can check whether the following is slow or fast:
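A minimal sketch of the timing approach suggested above: all imports run first, and only the request itself sits between the two `time.perf_counter()` calls. The local `http.server` instance, the `OkHandler` class and the `time_request` helper are hypothetical stand-ins for illustration; in a real test you would pass the slow URL from the report to `time_request` instead.

```python
# All imports run before any timing starts, so their cost is excluded.
import http.server
import threading
import time
import urllib.request

# ProxyHandler({}) keeps this local demo independent of any proxy settings.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class OkHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in server that returns a tiny, well-formed response."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def time_request(url, timeout=10.0):
    """Return (body, seconds) for one GET, timing only the request itself."""
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as resp:
        body = resp.read()
    return body, time.perf_counter() - start


server = http.server.HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

body, seconds = time_request(url)
print(f"{len(body)} bytes in {seconds:.3f} s")

server.shutdown()
server.server_close()
```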

Here are the response headers:

The problem is the Content-Type header. You are not allowed to have a space between the header name and the colon: the specification forbids it. Python 3 hits a parsing problem on this, and so only sees the headers that come before that one. Because we only see those headers, we don't know what the Content-Length is, so we have to wait for the TCP FIN to work out where the body ends.
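The parsing failure described above can be checked directly with `http.client.parse_headers`, the helper Python 3's `http.client` uses to read response headers. The header block below is made up for illustration (only the `Content-Type : ...` line mirrors the reported problem), and defect handling could differ between Python versions, so treat this as a quick check rather than a specification.

```python
import io
from http.client import parse_headers

# Hypothetical raw response headers. The second line breaks the rule that no
# whitespace may appear between a header name and its colon.
RAW_HEADERS = (
    b"Server: example\r\n"
    b"Content-Type : text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
)

msg = parse_headers(io.BytesIO(RAW_HEADERS))

# Only the headers before the malformed line are recognised as headers...
print("parsed headers:", msg.keys())
print("Content-Length seen:", "Content-Length" in msg)
# ...everything from the malformed line on is treated as the start of a
# message body, and the underlying email parser records a defect for it.
print("defects:", msg.defects)
```

Since `Content-Length` is never seen, the response object has no length to stop at and has to read until the server closes the connection.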

That's why we're taking so long.

Also, I'm still wondering why Twitter would be so much slower than all the other websites I've previously scraped with urllib.

AlexisEggermont: Try using urllib2, which is an updated version of urllib. See Stack Overflow.

I don't agree with this. I've been comparing Python's urllib GET request to other languages: Java, VBA, C, Node.
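Before comparing urllib against other languages, it can help to confirm whether a delay comes from response framing rather than from urllib itself. This sketch reproduces the close-delimited case diagnosed in this thread against a local socket: the server sends a body with no `Content-Length` and waits before closing. `HOLD_OPEN` and `serve_once` are hypothetical names chosen for illustration.

```python
import socket
import threading
import time
import urllib.request

HOLD_OPEN = 1.0  # hypothetical: seconds the server waits before closing


def serve_once(listener):
    """Send a response with no Content-Length, then close only after HOLD_OPEN."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read the whole request head
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        # No Content-Length and no chunked encoding: the client can only
        # find the end of the body when the connection closes.
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
        time.sleep(HOLD_OPEN)
    # Leaving the with-block closes the socket, which sends the TCP FIN.


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

# Bypass proxy settings so the request really goes to the local socket.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = f"http://127.0.0.1:{listener.getsockname()[1]}/"

start = time.perf_counter()
with opener.open(url, timeout=10) as resp:
    body = resp.read()  # blocks until the server closes the connection
elapsed = time.perf_counter() - start
listener.close()

print(body, f"{elapsed:.2f} s")
```

The five-byte body arrives immediately, yet `read()` only returns after roughly `HOLD_OPEN` seconds, because the close is the only end-of-body signal available.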

Your initial report is very good and much appreciated; we've just been busy, and this doesn't jump out as urgent.

I think calling gethostbyaddr is intentional here and goes back to the Python 1. It first calls setipaddr, which calls getaddrinfo to get the IP address. For "docs.
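The reverse lookup discussed here can be compared with a plain forward lookup on the affected machine. This is a rough timing sketch: `HOST` and the `timed` helper are assumptions for illustration, and the numbers depend entirely on the local resolver configuration.

```python
import socket
import time


def timed(func, *args):
    """Call func(*args) and return (result_or_error, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        result = func(*args)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        result = exc
    return result, time.perf_counter() - start


HOST = "localhost"  # assumption: substitute the host you are investigating

forward, t_forward = timed(socket.getaddrinfo, HOST, 80)
# gethostbyaddr() first resolves its argument and then does a reverse lookup,
# so with a slow resolver it can cost noticeably more than the forward lookup.
reverse, t_reverse = timed(socket.gethostbyaddr, HOST)

print(f"getaddrinfo:   {t_forward * 1000:.1f} ms")
print(f"gethostbyaddr: {t_reverse * 1000:.1f} ms -> {reverse!r}")
```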

Thank you very much for your answer. I'll try to test this on Python 3. I suppose that IE doesn't do a reverse lookup for each request. IE doesn't even do a forward DNS lookup of the given hostname to check whether it should bypass the proxy, whereas urllib does.

There's no reason for any other software to take its settings into account. That said, it would be great if urllib could avoid adding long delays, at least more than once. I'm personally not sure how best to do that, though. Imagine that someone wants requests to "ovinnik. I don't know what for; it's just an assumption.

He adds a hostname "ovinnik. He sees that requests to "ovinnik. And it's OK. But suddenly he discovers that requests in urllib to "ubuntu. I think this behavior of urllib should at least be optional. I set the default value to False, because I think it's better for urllib's behavior to match IE's rather than that of previous versions of urllib. This will affect only NT systems.
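Independently of any patch, urllib already lets a caller opt out of proxy handling per opener. This sketch relies on the documented behaviour that passing an empty dictionary to `urllib.request.ProxyHandler` disables auto-detected proxies; the `fetch` helper is hypothetical. With such an opener no proxy handler is installed, so the per-request proxy-bypass check discussed above never runs (the normal forward lookup needed to connect still happens).

```python
import urllib.request

# What urllib would use by default: proxy environment variables and,
# on Windows, the Internet Settings in the registry.
print("auto-detected proxies:", urllib.request.getproxies())

# An empty mapping means "use no proxy", so this opener ends up with no
# ProxyHandler at all and never performs a proxy-bypass check.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# Optional: make plain urllib.request.urlopen() use this opener everywhere.
# urllib.request.install_opener(direct_opener)


def fetch(url, timeout=10.0):
    """Hypothetical helper: GET url through the proxy-free opener."""
    with direct_opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch("http://example.com/")` would perform one GET without consulting proxy settings (this needs network access). The trade-off is that machines which genuinely require a proxy will then fail to connect.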
