Fastest way to pop N items from a large dict2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values
Are relativity and doppler effect related?
New passport but visa is in old (lost) passport
How could a scammer know the apps on my phone / iTunes account?
How could an airship be repaired midflight?
Why does a Star of David appear at a rally with Francisco Franco?
Is there a hypothetical scenario that would make Earth uninhabitable for humans, but not for (the majority of) other animals?
How to pronounce "I ♥ Huckabees"?
Why is the President allowed to veto a cancellation of emergency powers?
Print a physical multiplication table
As a new Ubuntu desktop 18.04 LTS user, do I need to use ufw for a firewall or is iptables sufficient?
What is the significance behind "40 days" that often appears in the Bible?
How do I hide Chekhov's Gun?
What exactly is this small puffer fish doing and how did it manage to accomplish such a feat?
This word with a lot of past tenses
Employee lack of ownership
How to deal with taxi scam when on vacation?
What's the meaning of a knight fighting a snail in medieval book illustrations?
Why one should not leave fingerprints on bulbs and plugs?
Unable to evaluate Eigenvalues and Eigenvectors for a matrix (2)
Why did it take so long to abandon sail after steamships were demonstrated?
A diagram about partial derivatives of f(x,y)
Can I use USB data pins as power source
et qui - how do you really understand that kind of phraseology?
How to terminate ping <dest> &
Fastest way to pop N items from a large dict
2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = i: i ** 3 for i in range(1000000)
# Taking items 1 by 1 (~0.0059s)
dst =
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
add a comment |
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = i: i ** 3 for i in range(1000000)
# Taking items 1 by 1 (~0.0059s)
dst =
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
add a comment |
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = i: i ** 3 for i in range(1000000)
# Taking items 1 by 1 (~0.0059s)
dst =
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = i: i ** 3 for i in range(1000000)
# Taking items 1 by 1 (~0.0059s)
dst =
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
python python-3.x performance dictionary optimization
edited 8 hours ago
martineau
69.1k1092186
69.1k1092186
asked 8 hours ago
Ivailo KaramanolevIvailo Karamanolev
540615
540615
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = i: i ** 3 for i in range(1000000)
def method_1(d):
dst =
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre
8 hours ago
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
8 hours ago
2
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
7 hours ago
|
show 5 more comments
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = key:value for key,value in (src.popitem() for _ in range(20000))
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
This seems a bit faster still, though I'm not sure why:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = i: i ** 3 for i in range(1000000)
def method_1(d):
dst =
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre
8 hours ago
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
8 hours ago
2
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
7 hours ago
|
show 5 more comments
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = i: i ** 3 for i in range(1000000)
def method_1(d):
dst =
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre
8 hours ago
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
8 hours ago
2
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
7 hours ago
|
show 5 more comments
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = i: i ** 3 for i in range(1000000)
def method_1(d):
dst =
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = i: i ** 3 for i in range(1000000)
def method_1(d):
dst =
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
answered 8 hours ago
NetwaveNetwave
13.1k22246
13.1k22246
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre
8 hours ago
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
8 hours ago
2
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
7 hours ago
|
show 5 more comments
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre
8 hours ago
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
8 hours ago
2
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
7 hours ago
3
3
so much for dict comprehension. Good one using
dict
!!!– Jean-François Fabre
8 hours ago
so much for dict comprehension. Good one using
dict
!!!– Jean-François Fabre
8 hours ago
1
1
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
8 hours ago
1
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre
8 hours ago
3
3
You can shave a little more time off by saving the bound method first.
f = d.popitem; return dict(f() for _ in range(20000))
.– chepner
8 hours ago
You can shave a little more time off by saving the bound method first.
f = d.popitem; return dict(f() for _ in range(20000))
.– chepner
8 hours ago
2
2
Using
itertools.islice
and itertools.repeat
is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000))
.– chepner
7 hours ago
Using
itertools.islice
and itertools.repeat
is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000))
.– chepner
7 hours ago
|
show 5 more comments
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = key:value for key,value in (src.popitem() for _ in range(20000))
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = key:value for key,value in (src.popitem() for _ in range(20000))
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = key:value for key,value in (src.popitem() for _ in range(20000))
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = key:value for key,value in (src.popitem() for _ in range(20000))
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
edited 8 hours ago
answered 8 hours ago
Jean-François FabreJean-François Fabre
106k957115
106k957115
add a comment |
add a comment |
This seems a bit faster still, though I'm not sure why:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.
add a comment |
This seems a bit faster still, though I'm not sure why:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.
add a comment |
This seems a bit faster still, though I'm not sure why:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.
This seems a bit faster still, though I'm not sure why:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.
answered 4 hours ago
jpajpa
5,3831226
5,3831226
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown