4 February 2016

Python How to empty a list

Say you have a list in your program and you want to reuse its name, or delete all of its items (empty the list). One option is to delete the name itself (the list object is freed only if nothing else refers to it):
>>> del li
Another option is to delete all items inside the list:  
>>> del li[:]
Many of you may think of another alternative:
>>> li = []
Although li = [] appears to empty li, it actually creates a new list object and binds the name li to it. The previous object stays in memory for as long as anything else still refers to it.

If you take some time to play in your Python interpreter, you will get a better idea of what happens:

>>> li = [1, 2, 3, 4, 5]
>>> li2 = li
>>> li2
[1, 2, 3, 4, 5]
>>> del li[:]
>>> li
[]
>>> li2
[]
>>> li1 = [1, 3, 5]
>>> li3 = li1
>>> li3
[1, 3, 5]
>>> li1
[1, 3, 5]
>>> li1 = []
>>> li1
[]
>>> li3
[1, 3, 5]
>>> li4 = [1, 2]
>>> li5 = li4
>>> li5
[1, 2]
>>> del li4
>>> li4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'li4' is not defined
>>> li5
[1, 2]
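
The same experiment can be put into a small script. This is only a sketch: the function names empty_in_place and rebind_name are made up for illustration, and print is called with a single argument so the script also runs under Python 3.

```python
def empty_in_place():
    """del li[:] empties the existing object, so every alias sees the change."""
    li = [1, 2, 3, 4, 5]
    alias = li
    del li[:]
    return li, alias


def rebind_name():
    """li = [] creates a new list; the alias still holds the old object."""
    li = [1, 3, 5]
    alias = li
    li = []
    return li, alias


print(empty_in_place())  # ([], [])
print(rebind_name())     # ([], [1, 3, 5])
```

In Python 3.3 and later, li.clear() has the same effect as del li[:].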

Python AttributeError: 'module' object has no attribute

Are you getting this type of error in your Python program?
AttributeError: 'module' object has no attribute 'whatever'
Well, then you might need to change the name of your Python file.

For example, save the following code in a file named json.py :
import json
print json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])

Then try to run it, and you will get this error :

$ python json.py 
Traceback (most recent call last):
  File "json.py", line 1, in <module>
    import json
  File "/home/work/practice/json.py", line 3, in <module>
    print json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
AttributeError: 'module' object has no attribute 'dumps'

But of course the json module has a 'dumps' attribute! What went wrong?

When Python runs import json, it searches the directory of the script first, so it finds the json.py that you just saved instead of the standard library module. Your file has no attribute (variable or function) named dumps, hence the error. All you need to do is rename your file. In Python 2, also delete any json.pyc left behind in that directory, because it can still be picked up by import json.

So the lesson is: don't give your Python program the same name as a module it imports.
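
If you suspect this kind of name clash, one quick check is to print where the imported module was loaded from. The sketch below uses a hypothetical helper called module_path. If the printed path points into your own project directory rather than into the Python installation, your file is shadowing the library module.

```python
import json
import os


def module_path(module):
    """Return the absolute path of the file a module was loaded from."""
    return os.path.abspath(module.__file__)


# For the real json module this points into the standard library.
print(module_path(json))
```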

Python pow() function returns float or integer?

Today someone posted this code and asked why x is a float:
>>> from math import pow
>>> x = pow(5, 2)
>>> x
25.0
It's a float because the pow function of the math module always returns a float. Python's built-in pow function, however, returns an integer in this case. Restart the Python interpreter before you run the following code, otherwise the name pow still refers to math.pow:
>>> a = 5
>>> x = pow(a, 2)
>>> x
25
>>> type(x)
<type 'int'>
So, don't get confused. 
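
Here is a small side-by-side sketch of the two functions. It uses import math instead of from math import pow, so the built-in pow is not replaced and no interpreter restart is needed. The variable names are chosen only for this example.

```python
import math

int_result = pow(5, 2)         # built-in pow with int arguments returns an int
float_result = math.pow(5, 2)  # math.pow always converts its result to float
mod_result = pow(5, 2, 7)      # three-argument form, built-in only: (5 ** 2) % 7

print("built-in pow: %r" % int_result)    # built-in pow: 25
print("math.pow:     %r" % float_result)  # math.pow:     25.0
print("pow(5, 2, 7): %r" % mod_result)    # pow(5, 2, 7): 4
```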

Finding maximum element of a list - divide and conquer approach

Here I share my Python implementation of the divide and conquer approach to finding the maximum element of a list:

def find_max(li, left, right):
    # Base case: a range with a single element.
    if left == right:
        return li[left]
    # Split the range in half; // keeps mid an integer.
    mid = (left + right) // 2
    max1 = find_max(li, left, mid)
    max2 = find_max(li, mid+1, right)
    return max1 if max1 > max2 else max2

def main():
    li = [1, 5, 2, 9, 3, 7, 5, 2, 10]
    print "Maximum element of the list is", find_max(li, 0, len(li)-1)

if __name__ == "__main__":
    main()
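
One way to gain some confidence in the function is to compare it with the built-in max() on random lists. The sketch below repeats find_max so that it runs on its own (also under Python 3). The seed, list sizes and value range are arbitrary choices for this example.

```python
import random


def find_max(li, left, right):
    """Return the largest element of li[left..right], both ends inclusive."""
    if left == right:
        return li[left]
    mid = (left + right) // 2
    max1 = find_max(li, left, mid)
    max2 = find_max(li, mid + 1, right)
    return max1 if max1 > max2 else max2


random.seed(0)  # fixed seed so the run is repeatable
for size in range(1, 50):
    data = [random.randint(-100, 100) for _ in range(size)]
    assert find_max(data, 0, len(data) - 1) == max(data)
print("find_max agrees with max() on all test lists")
```

Note that an empty list is not supported: with left = 0 and right = -1 the recursion never reaches its base case.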

Python crawler controller

I work on a project where I have written 20+ crawlers, and the crawlers run 24/7 (with a good amount of sleep). Sometimes I need to update or restart the server, and then I have to start all the crawlers again. So I have written a script that controls all the crawlers. It first checks whether a crawler is already running and, if not, starts it in the background. It also saves the pid of each crawler in a text file so that I can kill a particular crawler immediately when needed.

Here is my code :

import shlex
from subprocess import Popen, PIPE

site_dt = {'Site1 Name' : ['site1_crawler.py', 'site1_crawler.out'], 
'Site2 Name' : ['site2_crawler.py', 'site2_crawler.out']}

location = "/home/crawler/"

pidfp = open('pid.txt', 'w')

def is_running(pname):
    p1 = Popen(["ps", "ax"], stdout=PIPE)
    p2 = Popen(["grep", pname], stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
    output = p2.communicate()[0]
    if output.find('/home/crawler/'+pname) > -1:
        return True
    return False

def main():
    for item in site_dt.keys():
        print item
        script, logname = site_dt[item]
        if is_running(script):
            print script, "already running"
            continue  # do not start a second copy
        cmd = "python " + location + script + " -l info"
        outfile = "log/" + logname
        fp = open(outfile, 'w')
        pid = Popen(shlex.split(cmd), stdout=fp).pid
        print pid
        # pid is an int, so convert it before concatenating
        pidfp.write(item + ": " + str(pid) + "\n")
    pidfp.close()

if __name__ == "__main__":
    main()

If you feel that there is scope for improvement, please comment.
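
For the "kill a particular crawler" part, here is a minimal sketch of how the pid file could be read back. It assumes each line of pid.txt has the form "Site Name: pid", as written by the script above. The helper names parse_pid_lines and kill_crawler are hypothetical. The sketch does not check whether the pid still belongs to the crawler, which matters after a reboot, when pids get reused.

```python
import os
import signal


def parse_pid_lines(lines):
    """Turn lines like 'Site1 Name: 1234' into {'Site1 Name': 1234}."""
    pids = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last ": " so site names may contain colons.
        name, _, pid = line.rpartition(": ")
        pids[name] = int(pid)
    return pids


def kill_crawler(name, pid_file="pid.txt"):
    """Send SIGTERM to the crawler registered under name in pid_file."""
    with open(pid_file) as fp:
        pids = parse_pid_lines(fp)
    os.kill(pids[name], signal.SIGTERM)


# Example call (not run here): kill_crawler("Site1 Name")
print(parse_pid_lines(["Site1 Name: 1234\n", "Site2 Name: 5678\n"]))
```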

Prime Number Generator in Python using Sieve Method

I have implemented a prime number generator in Python using the Sieve of Eratosthenes. Here is my code:
import math

high = 10000
root = int(math.sqrt(high) + 1.0)

ara = [x % 2 for x in xrange(0, high)]

ara[1] = 0
ara[2] = 1

x = 3
while x <= root:        
    if ara[x]:
        z = x * x
        while z < high:             
            ara[z] = 0
            z += (x + x)        
    x += 2
prime = [x for x in xrange(2, len(ara)) if ara[x] == 1]
print prime
I am looking for ideas to make my code faster. I tested it by using it to solve a problem named Prime Generator on SPOJ. My solution was accepted, taking 3 seconds and 3.8 MB of memory. Below is another piece of code that I found. It is a similar implementation to mine, but more Pythonic and much faster (it assumes that math is imported and that n holds the upper limit):
nroot = int(math.sqrt(n))
sieve = [True] * (n+1)
sieve[0] = False
sieve[1] = False

for i in xrange(2, nroot+1):
    if sieve[i]:
        m = n/i - i
        sieve[i*i: n+1:i] = [False] * (m+1)

sieve = [i for i in xrange(n+1) if sieve[i]]
I updated the above code and now it's 20% faster! :)
nroot = int(math.sqrt(n))
sieve = [True] * (n+1)
sieve[0] = False
sieve[1] = False
sieve[4: n+1:2] = [False] * (n / 2 - 1)
for i in xrange(2, nroot+1):
    if sieve[i]:
        m = (n/i - i) / 2
        sieve[i*i: n+1:i+i] = [False] * (m+1)

sieve = [i for i in xrange(n+1) if sieve[i]]
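
The snippets above assume that math is imported and n is already defined. As a self-contained sketch of the same slice-assignment idea, here is a function version that also runs under Python 3. The name primes_up_to is made up for this example, and the slice length is computed with len(range(...)) instead of the n/i - i arithmetic, which is a simplification of mine rather than part of the original code.

```python
import math


def primes_up_to(n):
    """Return all primes <= n using a sieve with slice assignment."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(math.sqrt(n)) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... in one slice assignment.
            count = len(range(i * i, n + 1, i))
            sieve[i * i:n + 1:i] = [False] * count
    return [i for i, is_prime in enumerate(sieve) if is_prime]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```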

How to get image size from url

Today I needed to get the image size (width and height) from image URLs. I am sharing my Python code here.

import requests
from PIL import Image
from io import BytesIO

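def get_image_size(url):
    # Download the image; fail early on HTTP errors or a stalled connection.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Pillow reads the image from an in-memory buffer.
    im = Image.open(BytesIO(response.content))
    return im.size  # (width, height)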

if __name__ == "__main__":
    url = "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcw6adFki0Rnm_aA1YNYXwLWCTzPKjLbG9hI0QweVAoLlNWzK2NQHez4RMhvh9A6WS1eSccU5Swsk2bewxXa9j0rYO3Gl7Jw8ht45n72SONTlwdALyDYJqWtcsW0IY6HFTmFjDuV1yvsRF/s1600/sbn_pic_2.jpg"
    width, height = get_image_size(url)
    print width, height

I used Pillow, which is a fork of PIL.

Feel free to comment if you have any suggestions to improve my code.

Code School Python Courses

As Python is getting more and more popular and becoming the de facto standard for starting to learn programming, we can see an increasing amount of online content for learning Python. Last week I got an email from the Code School team asking me to check out a couple of their new Python courses. I spent around 30 minutes watching the videos and checking out the exercises. The exercises are good and can be done online. They also prepared the videos with care. But one thing bothers me: the lectures feel kind of robotic. I didn't feel anything special; in fact, I didn't feel the connection to the teacher that I used to feel in other courses on Coursera and Udacity. Anyway, Python beginners can check out these courses:
1. Try Python
2. Flying through Python