Fastest way to pop N items from a large dict2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

Are relativity and doppler effect related?

New passport but visa is in old (lost) passport

How could a scammer know the apps on my phone / iTunes account?

How could an airship be repaired midflight?

Why does a Star of David appear at a rally with Francisco Franco?

Is there a hypothetical scenario that would make Earth uninhabitable for humans, but not for (the majority of) other animals?

How to pronounce "I ♥ Huckabees"?

Why is the President allowed to veto a cancellation of emergency powers?

Print a physical multiplication table

As a new Ubuntu desktop 18.04 LTS user, do I need to use ufw for a firewall or is iptables sufficient?

What is the significance behind "40 days" that often appears in the Bible?

How do I hide Chekhov's Gun?

What exactly is this small puffer fish doing and how did it manage to accomplish such a feat?

This word with a lot of past tenses

Employee lack of ownership

How to deal with taxi scam when on vacation?

What's the meaning of a knight fighting a snail in medieval book illustrations?

Why one should not leave fingerprints on bulbs and plugs?

Unable to evaluate Eigenvalues and Eigenvectors for a matrix (2)

Why did it take so long to abandon sail after steamships were demonstrated?

A diagram about partial derivatives of f(x,y)

Can I use USB data pins as power source

et qui - how do you really understand that kind of phraseology?

How to terminate ping <dest> &

Fastest way to pop N items from a large dict

2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.

Fastest approach I've found so far:

src = i: i ** 3 for i in range(1000000)

# Taking items 1 by 1 (~0.0059s)
dst = 
while len(dst) < 20000:
 item = src.popitem()
 dst[item[0]] = item[1]

Is there anything better? Even a marginal gain would be good.

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

add a comment |

Fastest approach I've found so far:

src = i: i ** 3 for i in range(1000000)

# Taking items 1 by 1 (~0.0059s)
dst = 
while len(dst) < 20000:
 item = src.popitem()
 dst[item[0]] = item[1]

Is there anything better? Even a marginal gain would be good.

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

add a comment |

Fastest approach I've found so far:

src = i: i ** 3 for i in range(1000000)

# Taking items 1 by 1 (~0.0059s)
dst = 
while len(dst) < 20000:
 item = src.popitem()
 dst[item[0]] = item[1]

Is there anything better? Even a marginal gain would be good.

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

Fastest approach I've found so far:

src = i: i ** 3 for i in range(1000000)

# Taking items 1 by 1 (~0.0059s)
dst = 
while len(dst) < 20000:
 item = src.popitem()
 dst[item[0]] = item[1]

Is there anything better? Even a marginal gain would be good.

python python-3.x performance dictionary optimization

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

edited 8 hours ago

martineau

69.1k1092186

edited 8 hours ago

martineau

69.1k1092186

edited 8 hours ago

martineau

69.1k1092186

asked 8 hours ago

Ivailo Karamanolev

540615

asked 8 hours ago

Ivailo Karamanolev

540615

asked 8 hours ago

Ivailo Karamanolev

540615

add a comment |

3 Answers
3

active

oldest

votes

A simple comprehension inside dict will do:

dict(src.popitem() for _ in range(20000))

Here you have the timing tests

setup = """
src = i: i ** 3 for i in range(1000000)

def method_1(d):
 dst = 
 while len(dst) < 20000:
 item = d.popitem()
 dst[item[0]] = item[1]
 return dst

def method_2(d):
 return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))

Results:

Method 1: 0.007701821999944514
Method 2: 0.004668198998842854

answered 8 hours ago

Netwave

13.1k22246

3

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

1

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

1

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

3

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

2

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

|
show 5 more comments

I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values

dst = key:value for key,value in (src.popitem() for _ in range(20000))

on my machine:

your code: 0.00899505615234375
my code: 0.007996797561645508

so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer

This approach can be useful if you want to transform the keys or values in the process.

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

add a comment |

This seems a bit faster still, though I'm not sure why:

from itertools import islice
def method_4(d):
 result = dict(islice(d.items(), 20000))
 for k in result: del d[k]
 return result

Compared to other versions, using Netwave's testcase:

Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer

It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.

answered 4 hours ago

jpa

5,3831226

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

A simple comprehension inside dict will do:

dict(src.popitem() for _ in range(20000))

Here you have the timing tests

setup = """
src = i: i ** 3 for i in range(1000000)

def method_1(d):
 dst = 
 while len(dst) < 20000:
 item = d.popitem()
 dst[item[0]] = item[1]
 return dst

def method_2(d):
 return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))

Results:

Method 1: 0.007701821999944514
Method 2: 0.004668198998842854

answered 8 hours ago

Netwave

13.1k22246

3

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

1

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

1

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

3

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

2

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

|
show 5 more comments

A simple comprehension inside dict will do:

dict(src.popitem() for _ in range(20000))

Here you have the timing tests

setup = """
src = i: i ** 3 for i in range(1000000)

def method_1(d):
 dst = 
 while len(dst) < 20000:
 item = d.popitem()
 dst[item[0]] = item[1]
 return dst

def method_2(d):
 return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))

Results:

Method 1: 0.007701821999944514
Method 2: 0.004668198998842854

answered 8 hours ago

Netwave

13.1k22246

3

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

1

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

1

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

3

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

2

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

|
show 5 more comments

A simple comprehension inside dict will do:

dict(src.popitem() for _ in range(20000))

Here you have the timing tests

setup = """
src = i: i ** 3 for i in range(1000000)

def method_1(d):
 dst = 
 while len(dst) < 20000:
 item = d.popitem()
 dst[item[0]] = item[1]
 return dst

def method_2(d):
 return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))

Results:

Method 1: 0.007701821999944514
Method 2: 0.004668198998842854

answered 8 hours ago

Netwave

13.1k22246

A simple comprehension inside dict will do:

dict(src.popitem() for _ in range(20000))

Here you have the timing tests

setup = """
src = i: i ** 3 for i in range(1000000)

def method_1(d):
 dst = 
 while len(dst) < 20000:
 item = d.popitem()
 dst[item[0]] = item[1]
 return dst

def method_2(d):
 return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))

Results:

Method 1: 0.007701821999944514
Method 2: 0.004668198998842854

answered 8 hours ago

Netwave

13.1k22246

answered 8 hours ago

Netwave

13.1k22246

answered 8 hours ago

Netwave

13.1k22246

answered 8 hours ago

Netwave

13.1k22246

3

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

1

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

1

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

3

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

2

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

|
show 5 more comments

3

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

1

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

1

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

3

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

2

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

so much for dict comprehension. Good one using dict!!!

– Jean-François Fabre
8 hours ago

Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

– Netwave
8 hours ago

for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

– Jean-François Fabre
8 hours ago

You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

– chepner
8 hours ago

Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

– chepner
7 hours ago

|
show 5 more comments

I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values

dst = key:value for key,value in (src.popitem() for _ in range(20000))

on my machine:

your code: 0.00899505615234375
my code: 0.007996797561645508

so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer

This approach can be useful if you want to transform the keys or values in the process.

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

add a comment |

I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values

dst = key:value for key,value in (src.popitem() for _ in range(20000))

on my machine:

your code: 0.00899505615234375
my code: 0.007996797561645508

so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer

This approach can be useful if you want to transform the keys or values in the process.

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

add a comment |

I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values

dst = key:value for key,value in (src.popitem() for _ in range(20000))

on my machine:

your code: 0.00899505615234375
my code: 0.007996797561645508

so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer

This approach can be useful if you want to transform the keys or values in the process.

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values

dst = key:value for key,value in (src.popitem() for _ in range(20000))

on my machine:

your code: 0.00899505615234375
my code: 0.007996797561645508

so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer

This approach can be useful if you want to transform the keys or values in the process.

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

edited 8 hours ago

answered 8 hours ago

Jean-François Fabre

106k957115

answered 8 hours ago

Jean-François Fabre

106k957115

answered 8 hours ago

Jean-François Fabre

106k957115

add a comment |

This seems a bit faster still, though I'm not sure why:

from itertools import islice
def method_4(d):
 result = dict(islice(d.items(), 20000))
 for k in result: del d[k]
 return result

Compared to other versions, using Netwave's testcase:

Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer

It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.

answered 4 hours ago

jpa

5,3831226

add a comment |

This seems a bit faster still, though I'm not sure why:

from itertools import islice
def method_4(d):
 result = dict(islice(d.items(), 20000))
 for k in result: del d[k]
 return result

Compared to other versions, using Netwave's testcase:

Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer

It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.

answered 4 hours ago

jpa

5,3831226

add a comment |

This seems a bit faster still, though I'm not sure why:

from itertools import islice
def method_4(d):
 result = dict(islice(d.items(), 20000))
 for k in result: del d[k]
 return result

Compared to other versions, using Netwave's testcase:

Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer

It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.

answered 4 hours ago

jpa

5,3831226

This seems a bit faster still, though I'm not sure why:

from itertools import islice
def method_4(d):
 result = dict(islice(d.items(), 20000))
 for k in result: del d[k]
 return result

Compared to other versions, using Netwave's testcase:

Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer

It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.

answered 4 hours ago

jpa

5,3831226

answered 4 hours ago

jpa

5,3831226

answered 4 hours ago

jpa

5,3831226

answered 4 hours ago

jpa

5,3831226

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Arsthbt

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

3 Answers
3

3 Answers
3

3 Answers
3