How to count in linear time worst-case?



This question and this question got me thinking a little bit. To sort an array of length $n$ with $k$ unique elements in $O(n + k \log k)$ time, we need to be able to store counts of the values in the array. There are some suggestions, but I'm looking for a way to do this in worst-case linear time. More specifically:

Given a list $A$ of $n$ elements, $k$ of which are distinct, determine a list of tuples $U = \{(x_i, c_i)\}^k$ of all unique elements $x_i \in A$ such that $c_i$ is the count of element $x_i$ in $A$.

Here are some (failed) ideas I've had or that have been suggested:

  1. Balanced Binary Search Tree - Inserting into the tree or incrementing a count takes $O(\log k)$. After the inserts, a tree traversal takes $O(k)$. The total time comes out to $O(n \log k)$, which is too slow.

  2. Hash Map - With this we get $O(1)$ expected inserts and thus $O(n)$ expected time. However, this is still not $O(n)$ worst case.

  3. Empty Space Mapping - Find the minimum and maximum element in $A$. Allocate (but do not initialize) enough memory to cover this range. Use this memory basically as a hash map, and include a random hash so that we don't try to access corrupted memory. This strategy presents issues: (1) it's probabilistic, with a very low probability of failing, but still not guaranteed; (2) using memory like this limits us to floating-point or integer elements.

  4. Associative Arrays - There are many other associative arrays that can be used, similar to hash maps and BSTs, but I am not finding any that match these constraints.

Maybe there is some obvious method I am missing, but I also think it might not be possible. What are your thoughts?
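
For reference, idea 2 combined with the motivating sort can be sketched as follows. This is a minimal sketch, assuming the elements are hashable and mutually comparable; `count_elements` and `sort_via_counts` are hypothetical names. Because Python's `dict` is a hash table, this runs in $O(n + k \log k)$ expected time, not worst-case time, which is exactly the gap the question is about.

```python
def count_elements(A):
    """Return [(x, c)] for each distinct x in A, where c is the count of x in A."""
    counts = {}
    for x in A:
        counts[x] = counts.get(x, 0) + 1   # O(1) expected per update
    return list(counts.items())


def sort_via_counts(A):
    """Sort A by sorting only its k distinct values, then expanding the counts."""
    out = []
    for x, c in sorted(count_elements(A)):  # O(k log k) comparisons
        out.extend([x] * c)
    return out
```

Only the $k$ distinct values are ever compared, so the sorting step costs $O(k \log k)$ and the remaining cost is the counting pass.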







algorithms search-trees hash-tables

asked 4 hours ago
ryan

  • It cannot be done in the comparison model, since the element distinctness problem has an $\Omega(n \log n)$ lower bound on its decision-tree complexity.
    – Apass.Jack
    3 hours ago

  • @Apass.Jack, oh right, that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept.
    – ryan
    3 hours ago

2 Answers
This is a nice question.

In the comparison model or, more generally, the algebraic decision-tree model, the element distinctness problem has a lower bound of $\Omega(n \log n)$ on its worst-case time complexity, as stated in this Wikipedia article. Deciding whether the number of distinct elements equals $n$ already solves element distinctness, so in these models there is no algorithm that counts distinct elements in linear time in the worst case, even without reporting the multiplicities.

However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.
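
The reduction behind this bound can be made concrete: any counting routine decides element distinctness with only $O(1)$ extra work on top of the counting itself, so a worst-case linear-time counter in the comparison model would contradict the lower bound. A minimal sketch, in which `all_distinct` and `count_by_sorting` are hypothetical names and the sort-based counter is just one comparison-based stand-in:

```python
from itertools import groupby


def count_by_sorting(A):
    """Comparison-based counting in O(n log n): sort, then group equal runs."""
    return [(x, len(list(run))) for x, run in groupby(sorted(A))]


def all_distinct(A, count_elements=count_by_sorting):
    """Element distinctness via counting: A has no duplicates iff k == n."""
    return len(count_elements(A)) == len(A)
```

Any other counter with the same output format can be passed as `count_elements`; the reduction itself does not depend on how the counts are obtained.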







answered 3 hours ago
Apass.Jack

There exist randomized algorithms whose expected running time is $O(n)$, or for which the probability that the running time exceeds $cn$ is exponentially small in $c$.

In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times if you choose the output length of the 2-universal hash appropriately.

As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and whose probability of error is at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs longer than $cn$ steps for some appropriately chosen $c$.) In practice, that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.
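
Both constructions can be sketched together. This is a minimal sketch under stated assumptions, not a tuned implementation: the elements are assumed to be non-negative integers below the prime `P`, the hash is drawn from the common $((ax + b) \bmod p) \bmod m$ family, and the names `count_with_budget`, `P`, and the parameter `c` are illustrative. Collisions are resolved by chaining, and the routine gives up (returns `None`) once more than `c * n` chain probes have been made, so its running time stays $O(n)$ for fixed `c`.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; assumes every key x satisfies 0 <= x < P


def count_with_budget(A, c=8, rng=random):
    """Count occurrences using a random hash h(x) = ((a*x + b) % P) % m.

    Returns a list of (element, count) pairs, or None if more than
    c * len(A) chain probes were needed (the "terminate after cn steps" rule).
    """
    n = len(A)
    if n == 0:
        return []
    m = n                              # number of buckets (hash output range)
    a = rng.randrange(1, P)            # random multiplier, a != 0
    b = rng.randrange(0, P)            # random offset
    buckets = [[] for _ in range(m)]   # each bucket holds [element, count] cells
    budget = c * n
    probes = 0
    for x in A:
        bucket = buckets[((a * x + b) % P) % m]
        for cell in bucket:            # scan the chain looking for x
            probes += 1
            if probes > budget:
                return None            # give up so the running time stays O(n)
            if cell[0] == x:
                cell[1] += 1
                break
        else:
            bucket.append([x, 1])      # first occurrence of x
    return [(x, cnt) for bucket in buckets for x, cnt in bucket]
```

A `None` result plays the role of the low-probability failure described above; how small that probability is depends on the choice of `c` and of the hash family, which this sketch does not analyze.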







answered 2 hours ago
D.W.