Fastest way to pop N items from a large dict2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

Are relativity and doppler effect related?

New passport but visa is in old (lost) passport

How could a scammer know the apps on my phone / iTunes account?

How could an airship be repaired midflight?

Why does a Star of David appear at a rally with Francisco Franco?

Is there a hypothetical scenario that would make Earth uninhabitable for humans, but not for (the majority of) other animals?

How to pronounce "I ♥ Huckabees"?

Why is the President allowed to veto a cancellation of emergency powers?

Print a physical multiplication table

As a new Ubuntu desktop 18.04 LTS user, do I need to use ufw for a firewall or is iptables sufficient?

What is the significance behind "40 days" that often appears in the Bible?

How do I hide Chekhov's Gun?

What exactly is this small puffer fish doing and how did it manage to accomplish such a feat?

This word with a lot of past tenses

Employee lack of ownership

How to deal with taxi scam when on vacation?

What's the meaning of a knight fighting a snail in medieval book illustrations?

Why one should not leave fingerprints on bulbs and plugs?

Unable to evaluate Eigenvalues and Eigenvectors for a matrix (2)

Why did it take so long to abandon sail after steamships were demonstrated?

A diagram about partial derivatives of f(x,y)

Can I use USB data pins as power source

et qui - how do you really understand that kind of phraseology?

How to terminate ping <dest> &



Fastest way to pop N items from a large dict



2019 Community Moderator ElectionWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values










10















I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



Fastest approach I've found so far:



src = i: i ** 3 for i in range(1000000)

# Taking items 1 by 1 (~0.0059s)
dst =
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]


Is there anything better? Even a marginal gain would be good.










share|improve this question




























    10















    I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



    Fastest approach I've found so far:



    src = i: i ** 3 for i in range(1000000)

    # Taking items 1 by 1 (~0.0059s)
    dst =
    while len(dst) < 20000:
    item = src.popitem()
    dst[item[0]] = item[1]


    Is there anything better? Even a marginal gain would be good.










    share|improve this question


























      10












      10








      10


      1






      I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



      Fastest approach I've found so far:



      src = i: i ** 3 for i in range(1000000)

      # Taking items 1 by 1 (~0.0059s)
      dst =
      while len(dst) < 20000:
      item = src.popitem()
      dst[item[0]] = item[1]


      Is there anything better? Even a marginal gain would be good.










      share|improve this question
















      I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



      Fastest approach I've found so far:



      src = i: i ** 3 for i in range(1000000)

      # Taking items 1 by 1 (~0.0059s)
      dst =
      while len(dst) < 20000:
      item = src.popitem()
      dst[item[0]] = item[1]


      Is there anything better? Even a marginal gain would be good.







      python python-3.x performance dictionary optimization






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 8 hours ago









      martineau

      69.1k1092186




      69.1k1092186










      asked 8 hours ago









      Ivailo KaramanolevIvailo Karamanolev

      540615




      540615






















          3 Answers
          3






          active

          oldest

          votes


















          12














          A simple comprehension inside dict will do:



          dict(src.popitem() for _ in range(20000))


          Here you have the timing tests



          setup = """
          src = i: i ** 3 for i in range(1000000)

          def method_1(d):
          dst =
          while len(dst) < 20000:
          item = d.popitem()
          dst[item[0]] = item[1]
          return dst

          def method_2(d):
          return dict(d.popitem() for _ in range(20000))
          """
          import timeit
          print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

          print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


          Results:



          Method 1: 0.007701821999944514
          Method 2: 0.004668198998842854





          share|improve this answer


















          • 3





            so much for dict comprehension. Good one using dict!!!

            – Jean-François Fabre
            8 hours ago






          • 1





            Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

            – Netwave
            8 hours ago






          • 1





            for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

            – Jean-François Fabre
            8 hours ago







          • 3





            You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

            – chepner
            8 hours ago






          • 2





            Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

            – chepner
            7 hours ago


















          2














          I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



          dst = key:value for key,value in (src.popitem() for _ in range(20000))


          on my machine:



          your code: 0.00899505615234375
          my code: 0.007996797561645508


          so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



          This approach can be useful if you want to transform the keys or values in the process.






          share|improve this answer
































            1














            This seems a bit faster still, though I'm not sure why:



            from itertools import islice
            def method_4(d):
            result = dict(islice(d.items(), 20000))
            for k in result: del d[k]
            return result


            Compared to other versions, using Netwave's testcase:



            Method 1: 0.004459443036466837 # original
            Method 2: 0.0034434819826856256 # Netwave
            Method 3: 0.002602717955596745 # chepner
            Method 4: 0.001974945073015988 # this answer


            It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.






            share|improve this answer






















              Your Answer






              StackExchange.ifUsing("editor", function ()
              StackExchange.using("externalEditor", function ()
              StackExchange.using("snippets", function ()
              StackExchange.snippets.init();
              );
              );
              , "code-snippets");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "1"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              12














              A simple comprehension inside dict will do:



              dict(src.popitem() for _ in range(20000))


              Here you have the timing tests



              setup = """
              src = i: i ** 3 for i in range(1000000)

              def method_1(d):
              dst =
              while len(dst) < 20000:
              item = d.popitem()
              dst[item[0]] = item[1]
              return dst

              def method_2(d):
              return dict(d.popitem() for _ in range(20000))
              """
              import timeit
              print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

              print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


              Results:



              Method 1: 0.007701821999944514
              Method 2: 0.004668198998842854





              share|improve this answer


















              • 3





                so much for dict comprehension. Good one using dict!!!

                – Jean-François Fabre
                8 hours ago






              • 1





                Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

                – Netwave
                8 hours ago






              • 1





                for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

                – Jean-François Fabre
                8 hours ago







              • 3





                You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

                – chepner
                8 hours ago






              • 2





                Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

                – chepner
                7 hours ago















              12














              A simple comprehension inside dict will do:



              dict(src.popitem() for _ in range(20000))


              Here you have the timing tests



              setup = """
              src = i: i ** 3 for i in range(1000000)

              def method_1(d):
              dst =
              while len(dst) < 20000:
              item = d.popitem()
              dst[item[0]] = item[1]
              return dst

              def method_2(d):
              return dict(d.popitem() for _ in range(20000))
              """
              import timeit
              print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

              print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


              Results:



              Method 1: 0.007701821999944514
              Method 2: 0.004668198998842854





              share|improve this answer


















              • 3





                so much for dict comprehension. Good one using dict!!!

                – Jean-François Fabre
                8 hours ago






              • 1





                Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

                – Netwave
                8 hours ago






              • 1





                for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

                – Jean-François Fabre
                8 hours ago







              • 3





                You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

                – chepner
                8 hours ago






              • 2





                Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

                – chepner
                7 hours ago













              12












              12








              12







              A simple comprehension inside dict will do:



              dict(src.popitem() for _ in range(20000))


              Here you have the timing tests



              setup = """
              src = i: i ** 3 for i in range(1000000)

              def method_1(d):
              dst =
              while len(dst) < 20000:
              item = d.popitem()
              dst[item[0]] = item[1]
              return dst

              def method_2(d):
              return dict(d.popitem() for _ in range(20000))
              """
              import timeit
              print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

              print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


              Results:



              Method 1: 0.007701821999944514
              Method 2: 0.004668198998842854





              share|improve this answer













              A simple comprehension inside dict will do:



              dict(src.popitem() for _ in range(20000))


              Here you have the timing tests



              setup = """
              src = i: i ** 3 for i in range(1000000)

              def method_1(d):
              dst =
              while len(dst) < 20000:
              item = d.popitem()
              dst[item[0]] = item[1]
              return dst

              def method_2(d):
              return dict(d.popitem() for _ in range(20000))
              """
              import timeit
              print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

              print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


              Results:



              Method 1: 0.007701821999944514
              Method 2: 0.004668198998842854






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered 8 hours ago









              NetwaveNetwave

              13.1k22246




              13.1k22246







              • 3





                so much for dict comprehension. Good one using dict!!!

                – Jean-François Fabre
                8 hours ago






              • 1





                Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

                – Netwave
                8 hours ago






              • 1





                for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

                – Jean-François Fabre
                8 hours ago







              • 3





                You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

                – chepner
                8 hours ago






              • 2





                Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

                – chepner
                7 hours ago












              • 3





                so much for dict comprehension. Good one using dict!!!

                – Jean-François Fabre
                8 hours ago






              • 1





                Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

                – Netwave
                8 hours ago






              • 1





                for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

                – Jean-François Fabre
                8 hours ago







              • 3





                You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

                – chepner
                8 hours ago






              • 2





                Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

                – chepner
                7 hours ago







              3




              3





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              8 hours ago





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              8 hours ago




              1




              1





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              8 hours ago





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              8 hours ago




              1




              1





              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              8 hours ago






              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              8 hours ago





              3




              3





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              8 hours ago





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              8 hours ago




              2




              2





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              7 hours ago





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              7 hours ago













              2














              I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



              dst = key:value for key,value in (src.popitem() for _ in range(20000))


              on my machine:



              your code: 0.00899505615234375
              my code: 0.007996797561645508


              so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



              This approach can be useful if you want to transform the keys or values in the process.






              share|improve this answer





























                2














                I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



                dst = key:value for key,value in (src.popitem() for _ in range(20000))


                on my machine:



                your code: 0.00899505615234375
                my code: 0.007996797561645508


                so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



                This approach can be useful if you want to transform the keys or values in the process.






                share|improve this answer



























                  2












                  2








                  2







                  I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



                  dst = key:value for key,value in (src.popitem() for _ in range(20000))


                  on my machine:



                  your code: 0.00899505615234375
                  my code: 0.007996797561645508


                  so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



                  This approach can be useful if you want to transform the keys or values in the process.






                  share|improve this answer















                  I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



                  dst = key:value for key,value in (src.popitem() for _ in range(20000))


                  on my machine:



                  your code: 0.00899505615234375
                  my code: 0.007996797561645508


                  so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



                  This approach can be useful if you want to transform the keys or values in the process.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 8 hours ago

























                  answered 8 hours ago









                  Jean-François FabreJean-François Fabre

                  106k957115




                  106k957115





















                      1














                      This seems a bit faster still, though I'm not sure why:



                      from itertools import islice
                      def method_4(d):
                      result = dict(islice(d.items(), 20000))
                      for k in result: del d[k]
                      return result


                      Compared to other versions, using Netwave's testcase:



                      Method 1: 0.004459443036466837 # original
                      Method 2: 0.0034434819826856256 # Netwave
                      Method 3: 0.002602717955596745 # chepner
                      Method 4: 0.001974945073015988 # this answer


                      It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.






                      share|improve this answer



























                        1














                        This seems a bit faster still, though I'm not sure why:



                        from itertools import islice
                        def method_4(d):
                        result = dict(islice(d.items(), 20000))
                        for k in result: del d[k]
                        return result


                        Compared to other versions, using Netwave's testcase:



                        Method 1: 0.004459443036466837 # original
                        Method 2: 0.0034434819826856256 # Netwave
                        Method 3: 0.002602717955596745 # chepner
                        Method 4: 0.001974945073015988 # this answer


                        It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.






                        share|improve this answer

























                          1












                          1








                          1







                          This seems a bit faster still, though I'm not sure why:



                          from itertools import islice
                          def method_4(d):
                          result = dict(islice(d.items(), 20000))
                          for k in result: del d[k]
                          return result


                          Compared to other versions, using Netwave's testcase:



                          Method 1: 0.004459443036466837 # original
                          Method 2: 0.0034434819826856256 # Netwave
                          Method 3: 0.002602717955596745 # chepner
                          Method 4: 0.001974945073015988 # this answer


                          It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.






                          share|improve this answer













                          This seems a bit faster still, though I'm not sure why:



                          from itertools import islice
                          def method_4(d):
                          result = dict(islice(d.items(), 20000))
                          for k in result: del d[k]
                          return result


                          Compared to other versions, using Netwave's testcase:



                          Method 1: 0.004459443036466837 # original
                          Method 2: 0.0034434819826856256 # Netwave
                          Method 3: 0.002602717955596745 # chepner
                          Method 4: 0.001974945073015988 # this answer


                          It's a bit baffling to me, as one could expect the extra iteration and lookup to be slower. Hopefully I haven't made some embarrassing mistake.







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered 4 hours ago









                          jpajpa

                          5,3831226




                          5,3831226



























                              draft saved

                              draft discarded
















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid


                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.

                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');

                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Oświęcim Innehåll Historia | Källor | Externa länkar | Navigeringsmeny50°2′18″N 19°13′17″Ö / 50.03833°N 19.22139°Ö / 50.03833; 19.2213950°2′18″N 19°13′17″Ö / 50.03833°N 19.22139°Ö / 50.03833; 19.221393089658Nordisk familjebok, AuschwitzInsidan tro och existensJewish Community i OświęcimAuschwitz Jewish Center: MuseumAuschwitz Jewish Center

                              Valle di Casies Indice Geografia fisica | Origini del nome | Storia | Società | Amministrazione | Sport | Note | Bibliografia | Voci correlate | Altri progetti | Collegamenti esterni | Menu di navigazione46°46′N 12°11′E / 46.766667°N 12.183333°E46.766667; 12.183333 (Valle di Casies)46°46′N 12°11′E / 46.766667°N 12.183333°E46.766667; 12.183333 (Valle di Casies)Sito istituzionaleAstat Censimento della popolazione 2011 - Determinazione della consistenza dei tre gruppi linguistici della Provincia Autonoma di Bolzano-Alto Adige - giugno 2012Numeri e fattiValle di CasiesDato IstatTabella dei gradi/giorno dei Comuni italiani raggruppati per Regione e Provincia26 agosto 1993, n. 412Heraldry of the World: GsiesStatistiche I.StatValCasies.comWikimedia CommonsWikimedia CommonsValle di CasiesSito ufficialeValle di CasiesMM14870458910042978-6

                              Typsetting diagram chases (with TikZ?) Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)How to define the default vertical distance between nodes?Draw edge on arcNumerical conditional within tikz keys?TikZ: Drawing an arc from an intersection to an intersectionDrawing rectilinear curves in Tikz, aka an Etch-a-Sketch drawingLine up nested tikz enviroments or how to get rid of themHow to place nodes in an absolute coordinate system in tikzCommutative diagram with curve connecting between nodesTikz with standalone: pinning tikz coordinates to page cmDrawing a Decision Diagram with Tikz and layout manager