"Generators" section misleads about the ability to iterate generators multiple times #8

Open
SyberiaK opened this issue Jul 9, 2024 · 1 comment

Comments

@SyberiaK
SyberiaK commented Jul 9, 2024

"Generators are a powerful tool to save memory and improve performance. In general, they yield one value at a time and can be iterated over multiple times."

That's not true. After the first pass the generator is exhausted and yields nothing more.

from sys import getsizeof


L = [n for n in range(42_000)]
print(getsizeof(L))  # 351064 bytes
print(sum(L))  # 881979000

G = (n for n in range(42_000))
print(getsizeof(G))  # 200 bytes
print(sum(G))  # 881979000
print(sum(G))  # 0  # oops!
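
If the values really do need to be consumed more than once, one option is to build a fresh generator for each pass. A minimal sketch of that idea, where `numbers` is a hypothetical helper name and not something taken from the project's README:

```python
def numbers(limit=42_000):
    """Return a brand-new generator on every call (hypothetical helper)."""
    return (n for n in range(limit))


G = numbers()
first = sum(G)           # consumes the generator
second = sum(G)          # G is exhausted, so there is nothing left to add
fresh = sum(numbers())   # a new generator starts from the beginning

print(first, second, fresh)  # 881979000 0 881979000
```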

Also... making this kind of generator from range is kinda a wacky idea. You can just use the range itself, which also acts like a generator and saves some additional memory.
And it can actually be iterated multiple times.

from sys import getsizeof


L = [n for n in range(42_000)]
print(getsizeof(L))  # 351064 bytes
print(sum(L))  # 881979000

G = range(42_000)
print(getsizeof(G))  # 48 bytes
print(sum(G))  # 881979000
print(sum(G))  # 881979000  # yay
@Steven-Willers

@SyberiaK In Python, range() is often compared to generators because of its low memory usage. It is lazy like a generator, but it is not a true generator: it is an immutable sequence type.

For instance, in the generator example (given below), when the for loop breaks at i == 3, the generator "remembers" that iteration and continues from 4 when you convert it to a list.

gen = (i for i in range(10))

for i in gen:
    print(i)
    if i == 3:
        break

print(list(gen))  

Output:

0
1
2
3
[4, 5, 6, 7, 8, 9]

However, range does not remember where you left off: every new iteration over it starts again from the beginning. That makes it a poor fit for tasks such as file reading or data collection, where one needs to keep track of the current position.

gen = range(10)

for i in gen:
    print(i)
    if i == 3:
        break

print(list(gen))

Output:

0
1
2
3
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
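
The difference comes from what the for loop actually iterates over. A generator is its own iterator, while a range hands out a new iterator every time iter() is called on it, which the for loop does implicitly. A small sketch that makes the iter() call explicit (the variable names are only illustrative):

```python
r = range(10)
gen = (i for i in r)

# A generator is its own iterator, so every loop shares one position.
print(iter(gen) is gen)  # True

# A range creates a separate iterator object for each iter() call.
it = iter(r)
print(it is r)  # False

for i in it:
    if i == 3:
        break

# The explicit iterator remembers where the loop stopped...
rest = list(it)
print(rest)  # [4, 5, 6, 7, 8, 9]

# ...while the range itself always starts over.
full = list(r)
print(full)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

So state-keeping is still available with range when it is wanted: hold on to the iterator returned by iter() instead of the range object.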

While range() is memory-efficient, it does not keep iteration state the way a true generator does. It behaves more like a lightweight, read-only list than a generator, because every iteration over it starts from the beginning. One of the best explanations of range is given in this answer on SO.
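
To make the list comparison concrete: range is a sequence type, so it supports len(), indexing, slicing and membership tests without storing its elements, and a generator offers none of these. A short sketch using the abstract base classes from the standard collections.abc module:

```python
from collections.abc import Iterator, Sequence

r = range(42_000)
g = (n for n in r)

print(isinstance(r, Sequence), isinstance(r, Iterator))  # True False
print(isinstance(g, Sequence), isinstance(g, Iterator))  # False True

print(len(r))        # 42000
print(r[-1])         # 41999
print(r[10:20:5])    # range(10, 20, 5)
print(41_999 in r)   # True

try:
    len(g)
except TypeError as exc:
    # Generators do not know their length ahead of time.
    print("generators have no length:", exc)
```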
