Lateo.net - Flux RSS en pagaille (pour en ajouter : @ moi)

🔒
❌ À propos de FreshRSS
Il y a de nouveaux articles disponibles, cliquez pour rafraîchir la page.
À partir d’avant-hierMarina Mele's site

3rd-grade & Karatsuba multiplication Algorithms

Par Marina Mele

In this post we’re going to study the third-grade algorithm to compute the product of two numbers, and we’re going to compare it with a much more efficient algorithm: The Karatsuba multiplication algorithm.

Did you know that these two algorithms are the ones used in the built in Python multiplication?

We will talk about the Order of both algorithms and give you Python implementations of both of them.

Maybe you’re thinking, why is she writing now about algorithms? Some time ago, I took the course Algorithms: Design and Analysis, Part 1&2, by Tim Roughgarden,  and it was a lot of fun.

Now I’m taking the course again, but I’m spending more time to review the algorithms and play with them. I really encourage you to take the course if you have some time. But first, read this post 🙂

This is the outline:

  • Third grade multiplication algorithm
  • Python code for the third grade product algorithm
  • The Karatsuba Algorithm
  • Python code for the Karatsuba algorithm

Let’s start! 🙂

Third grade multiplication algorithm

First, we’re going to review the third grade algorithm, which all of you already know 🙂

Let’s start with these two numbers: 5678 x 1234. In order to compute their product you start with 4*5678, represented as:

(2)(3)(3)
   5  6  7  8
x  1  2  3 |4|
-------------
2  2  7  1  2

Let’s count the number of operations performed in this step:

– If n is the number of digits of the first number, there are n products and at most n sums (carried numbers).

– So in total, you need 2n operations.

If we continue with the product, we have to repeat n times the same operation (where we assume that n is also the number of digits of the second number):

5  6  7  8
      x  1  2  3  4
      -------------
      2  2  7  1  2
   1  7  0  3  4
1  1  3  5  6
5  6  7  8

which gives a total of 2n2 operations.

Finally, you need to sum all these numbers,

5  6  7  8
         x  1  2  3  4
         -------------
         2  2  7  1  2
      1  7  0  3  4
   1  1  3  5  6
+  5  6  7  8
----------------------
   7  0  0  6  6  5  2

which takes around n2 operations more.

So in total, the number of operations is ~3n2, which means that it’s quadratic with the input (proportional to n2).

One point I would like to mention about this algorithm is that it’s complete: no matter what x and y you start with, if you perform correctly the algorithm, the algorithm terminates and finds the correct solution.

Python code for the third grade product algorithm

Here I give you an example of an implementation of the third grade algorithm, where I have included a Counter for the sum and product operations:

# third_grade_algorithm.py

def counted(fn):
    # Counter Decorator
    def wrapper(*args, **kwargs):
        if "" in args or " " in args:
            return "".join(map(lambda s: s.strip(), args))
        wrapper.called += 1
        return fn(*args, **kwargs)
    wrapper.called = 0
    wrapper.__name__ = fn.__name__
    return wrapper


@counted
def prod(x, y):
    # x, y are strings --> returns a string of x*y
    return str(eval("%s * %s" % (x, y)))


@counted
def suma(x, y):
    # x, y are strings --> returns a string of x+y
    return str(eval("%s + %s" % (x, y)))


def one_to_n_product(d, x):
    """d is a single digit, x is n-digit --> returns a string of d*x
    """
    result = ""
    carry = "0"
    for i, digit in enumerate(reversed(x)):
        r = suma(prod(d, digit), carry)
        carry, digit = r[:-1], r[-1]
        result = digit + result
    return carry + result


def sum_middle_products(middle_products):
    # middle_products is a list of strings --> returns a string
    max_length = max([len(md) for md in middle_products])
    for i, md in enumerate(middle_products):
        middle_products[i] = " " * (max_length - len(md)) + md
    carry = "0"
    result = ""
    for i in range(1, max_length + 1):
        row = [carry] + [md[-i] for md in middle_products]
        r = reduce(suma, row)
        carry, digit = r[:-1], r[-1]
        result = digit + result
    return carry + result


def algorithm(x, y):
    # x, y are integers --> returns an integer, x*y
    x, y = str(x), str(y)
    middle_products = []
    for i, digit in enumerate(reversed(y)):
        middle_products.append(one_to_n_product(digit, x) + " " * i)
    return int(sum_middle_products(middle_products))

Using that algorithm, if you run

$ python -i third_grade_algorithm.py

where (third_grade_algorithm.py is the name of the file), you will run the previous code and terminate with the Python console open. This way you can call your algorithm function and try:

>>> algorithm(6885, 1600)
1101600
>>> print("Suma was called %i times" % suma.called)
Suma was called 20 times 
>>> print("Prod was called %i times" % prod.called)
Prod was called 16 times

So it took 20 + 16 = 36 operations to compute the product of these two numbers.

Once we have this code, we can average the number of operations used in the product of n-digit numbers:

def random_prod(n):
    suma.called = 0
    prod.called = 0
    x = randint(pow(10, n - 1), pow(10, n))
    y = randint(pow(10, n - 1), pow(10, n))
    algorithm(str(x), str(y))
    return suma.called + prod.called


def average():
    ntimes = 200
    nmax = 10
    result = []
    for n in range(nmax):
        avg = sum([random_prod(n + 1) for i in range(ntimes)]) / float(ntimes)
        result.append([n + 1, avg])
    return result

In the following figure, we plot the result of the previous experiment when

ntimes = 200
 samples and
nmax = 10
 digits:

operations-3rd-grade-algorithm

We can see that these points fit the curve f(n) = 2.7 n2.

Note that 2.7 < 3, the proportionality factor we deduced before. This is because we were assuming the worst case scenario, whereas the average product takes into account random numbers.

To sum up, the 3rd grade algorithm is an Ο(n2) complete algorithm.

But can we do better? Is there an algorithm that performs the product of two numbers quicker?

The answer is yes, and in the following section we will study one of them: The Karatsuba Algorithm.

The Karatsuba Algorithm

Let’s first explain this algorithm through an example:

Imagine you want to compute again the product of these two numbers:

x = 5678
y = 1234

In order to do so, we will first decompose them in a singular way:

x = 5678
a = 56; b = 78
--> x = 100 * a + b

and the same for y:

y = 1234
c = 12; d = 34
--> y = 100 * c + d

If we want to know the product x * y:

xy = (100 * a + b)(100 * c + d) = 100^2 * ac + 100 * (ad + bc) + bd

Where we can calculate the following 3 parts separately:

A = ac 
B = ad + bc
C = bd

However, we need two compute two products for the B term. Let’s see how we can compute only one product to reduce calculations 🙂

If we expand the product:

D = (a+b) * (c+d) = ac + ad + bc + bd

we see than the right hand side of the equation contains all A,B and C terms.

In particular, if we isolate B = ad + bc, we get:

B = D - ac - bd = D - A - C

Great! Now we only need three smaller products in order to compute x * y:

xy = 100^2 A + 100(D - A - C) + C
A = ac
B = (a+b)(c+d)
C = bd

Let’s put some numbers into the previous expressions:

xy = 100^2 * A + 100 * (D - A - C) + C
A = ac = 56 * 12 = 672
C = bd = 78 * 34 = 2652
D = (a+b)(c+d) = (56 + 78)(12 + 34) = 134 * 46 = 6164

xy = 100^2 * 672 + 100 * (6164 - 672 - 2652) + 2652 
   = 6720000 + 284000 + 2652
   = 7006652

Yes! the result of 1234 * 5678 = 7006652! 🙂

In the next section, we’ll see the pseudo code for the Karatsuba Algorithm and we’re going to translate it into Python code.

Python code for the Karatsuba Algorithm

In general though, we don’t have 4-digit numbers but n-digit numbers. In that case, the decomposition to be made is:

x = n-digit number
m = n/2 if n is even
m = (n+1)/2 if n is odd

a = 10^m * x1 + x2
--> x1 = First (n-m) digits of x
--> x2 = Last m digits of x

which is slightly different if n is even or odd.

In pseudocode, the Karatsuba Algorithm is:

procedure karatsuba(x, y)
  /* Base Case */
  if (x < 10) or (y < 10)
    return x*y
  /* calculates the number of digits of the numbers */
  m = max(size_base10(x), size_base10(y))
  m2 = m/2
  /* split the digit sequences about the middle */
  a, b = split_at(x, m2)
  c, d = split_at(y, m2)
  /* 3 products of numbers with half the size */
  A = karatsuba(a, c)
  C = karatsuba(b, d)
  D = karatsuba(a+b, c+d)
  return A*10^(2*m2) + (D-A-C)*10^(m2) + C

And the order of this algorithm Ο(nlog23) ≈ Ο(n1.585) is (more info).

Let’s write now the Python code 🙂

def karatsuba(x, y):
    # Base case
    if x < 10 or y < 10:
        return x * y
    # Calculate the number of digits of the numbers
    sx, sy = str(x), str(y)
    m2 = max(len(sx), len(sy)) / 2
    # Split the digit sequences about the middle
    ix, iy = len(sx) - m2, len(sy) - m2
    a, b = int(sx[:ix]), int(sx[ix:])
    c, d = int(sy[:iy]), int(sy[iy:])
    # 3 products of numbers with half the size
    A = karatsuba(a, c)
    C = karatsuba(b, d)
    D = karatsuba(a + b, c + d)
    return A * 10**(2 * m2) + (D - A - C) * 10**m2 + C

assert(karatsuba(1234, 5678) == 7006652)

Which is much simpler than the third-grade algorithm right?

Moreover, if we can compare the number of calculations of this algorithm with respect to the third-grade one:

Karatsuba vs 3rd grade algorithms

where the red line is the Karatsuba algorithm, and the blue lines the 3rd grade one (see above).

 

Finally, I want to write here one of the quotes mentioned in the Algorithms course, which might encourage you to find a better solution 🙂

Perhaps the most important principle of the good algorithm designer is to refuse to be content.Aho, Hopcroft and Ullman, The Design and Analysis of computer algorithms, 1974

 

Google+TwitterLinkedInFacebookReddit

The post 3rd-grade & Karatsuba multiplication Algorithms appeared first on Marina Mele's site.

How to create a WordPress theme from scratch (I)

Par Jordi Cuenca

The intuitive WordPress has become the most popular CMS (Content Management System), both by individuals and by companies. The fact that this platform is used in 25% of the Internet websites certify its undeniable popularity, and at the same time helps to understand why developers not using WordPress are wasting a huge opportunity.

Although there are a lot of causes that explain the success and its acceptation grade, some of the main reasons are the simplicity, flexibility and power that it offers to the developers when they have to create themes or plugins. Because of this, some developers say that despite they work with a CMS, they have the feeling that nothing is impossible to do.

Before starting to develop our theme, I recommend to make a new naked WordPress installation in a remote or local server. This is because in order to test our theme, we need to have it active.

In this post I’ll assume that you know how to install and configure a WordPress site, that you know how to access your server via FTP, and that you know HTML, CSS and have some notions of PHP.

Create a new Theme

As you may know, WordPress installs all themes in the themes directory, located inside the wp-content folder:

directorio-pantillas-de-wordpress

To begin to develop our theme, we can create a new folder with any name we want (don’t use special characters or spaces). However, keep in mind that to add a new directory inside themes doesn’t make WordPress understand that there exists a new theme to use. To offset this problem, we need at least two new files.

  • index.php
    It’s the file that WordPress calls if it can’t find any homepage template. Also it’s a required file in the root of your theme directory.
  • styles.css
    Besides being the main file to define the theme styles, it’s also the file where we will specify the theme main parameters.

estructura-basica-plantilla-wordpress

At the moment, we have the minimum required files for WordPress to detect a new theme. If we open the Theme Manager in our WordPress installation we’ll see something like this:

gestor-plantillas-wordpress-1detalle-de-plantilla-wordpress-1

Notice that when we see something like this, it’s because WordPress has already detected our new theme.

Next we’ll improve the Theme Manager preview, to make it more attractive to potential users.

Customize the Theme Manager Preview

Now is the moment to open our style.css file and add, in comment format, some parameters at the top of the document. It’s very important to keep exactly the same structure that we show below, so that WordPress doesn’t have problems in understanding the parameters.

/*
Theme Name: My First Wordpress Theme
Template Theme URI: http://my-website.com/my-first-wordpress-theme
Author: Your name here
Description: A description about your theme here.
Version: 1.0
*/

After this action, in the WordPress Theme Manager we’ll see something like this:

detalle-de-plantilla-wordpress-2

In this example only some of the possible parameters are defined (you can find the full set of options here).

Our advice is that you don’t only copy and paste the code: try to write it, change the parameters, put a cool name to your template, test and change the parameters… and check what happens in every situation. Additionally, we encourage you to continue these suggestions in any step we take. Dare to experiment without fear!

In order to complete our theme preview in the Theme Manager, we need to add a screenshot of the theme final look. As our theme consists on a white page, we’ll add a default image to illustrate this step.

This image should be renamed as screenshot, of 800x660px and preferably has to be in .png format (although .jpg, .jpeg or .gif are also valid formats). Moreover, it has to be placed inside the root directory of our theme, at the same level of the index.php and style.css files.

estructura-basica-plantilla-wordpress-con-screenshot

Now, our theme has a better look in the Theme Manager, right?

directorio-pantillas-de-wordpress-2

detalle-de-plantilla-wordpress-3

Additional Theme files

The next step is to add the rest of the files that we’ll use throughout this tutorial. All these files have their name defined by WordPress:

  • header.php
    This file is used to define the header that will be common throughout the theme navigation. For example, it can contain the main navigation menu as well as the contents inside the <head> tag.
  • footer.php
    Like the header.php file, this file will be common for all our theme. It’s used to indicate the information that is always in the footer of the page (for example the copyright or the legal information links).
  • sidebar.php
    This file was created to show a sidebar, specially for blogs or to contain widgets as a search engine, a category list or the most viewed posts.
  • home.php
    This is the template that WordPress loads as a default home page. This file is intended to be a post listing, as a blog front page.
  • front-page.php
    This template is the one that WordPress shows when the option “Show Static Page” as a home page is active. This file is intended for cases when the home page is a static page, like a corporative homepage.
  • single.php
    It is the template that shows a complete post by default.
  • page.php
    This template shows a page by default, whenever there is no other template specified in the page editor.
  • category.php
    This template will be used to show a post listing of a specific category.
  • comments.php
    It’s the template that we’ll call inside of single.php to add support to comments to our posts.
  • search.php
    It is the template that WordPress shows when it carries out a search.
  • 404.php
    It the template that WordPress shows when links are broken or don’t work.
  • functions.php
    This file will allow us to create menu and widget zones, and customize some parameters that WordPress bring by default.

After creating these files, the directory of our theme must to be like this:

estructura-plantilla-wordpress-completa-1

Note that we have just created these files, and are empty. Therefore, they do nothing by themselves. They simply are the files that WordPress will look for depending on each situation. Some of these files, for example the header.php, the footer.php or the sidebar.php, have to be added manually inside other templates.

Great! we already have the needed structure to create a functional theme in WordPress. The last step that we need to do is leave the theme active. It will be a white page, but this way we can start to see all our progresses that we are doing.

You can see the original post in spanish here.

See you at the next post!

Jordi Cuenca

When I was fourteen I read a book named “Learning to travel around the Internet”, which had a little HTML introduction in the last two chapters. These twenty-four pages changed my life forever. From that day, I never stopped working and learning about my two vocations, web developing and graphic design. As a front-end developer, I have specialized in Wordpress developing, web architecture and the design at the service of usability.

Google+TwitterLinkedInFacebookReddit

The post How to create a WordPress theme from scratch (I) appeared first on Marina Mele's site.

Install and Configure the Atom Editor for Python

Par Marina Mele

In this post we’ll talk about the Atom editor which is, as they say, A hackable text editor for the 21st Century.

It’s a really nice, open source and modern editor, with a broad community that provides different and new packages and functionalities. Have you tried it yet?

Here, you’ll learn how to install it and how to configure it to write Python code. Let’s start! 🙂

First, download Atom from the official webpage.

Once installed, if you have a Mac or Windows, you’ll have two commands available:

atom
 and
apm
. The first one is for opening the Atom editor, and the second one for installing Atom packages. We’ll see an example of both in the following.

Edit a Python file and use Atom’s Autocomplete

Let’s start by creating a Python file with:

$ atom myfile.py

This will open the file in Atom, and you’ll see the containing folder with all its contents on the left sidebar.

In the new file, if you type

de
, you’ll see that it suggests if you want to create a new function. This is because Atom has detected that the file extension is a Python extension.

creaate-function

If you type the

Tab
 key, you’ll see a template for a new function:

create-function-2

Note that you have the

fname
 highlighted. This is because you can now type the name of your function and it will replace 
fname
. Let’s name our function 
product
.

Next, if you hit the 

Tab
 key again, the arguments of the function,
arg
, will be now selected. Just write
x, y
, as we need two different arguments for our function.

Finally, hit the 

Tab
 key again to select the body of our function,
pass
, and replace it for our code. The end function should be something like:
def product(x,y):
    return x * y

Also notice the blue circle next to the file name. This means that there are unsaved changes in your current file. You can save it just typing the usual cmd+c (or ctrl+c in windows).

Linter for Atom

Linter is an Atom package that provides a top level API so that there is a unification among all the linter atom plugins. This means that all the extra packages that you install, that highlight your code (for example to detect errors) will use a unified method.

To install it, just type:

$ apm install linter

Next, we’re going to install a Python Linter package, to help us detect errors in our Python code.

This package is called linter-flake8 and it’s an interface to flake8. To install it, you need to run:

$ pip install flake8
$ pip install flake8-docstrings
$ apm install linter-flake8

You must restart Atom to see the changes. For example, if we add the folling line of code in our file:

y = product(a, b)

without specifying what

a
 or
b
 are, we’ll see the following in our Atom screen:

atom-linter-flake8

If you open Atom and you find an error that says

The linter binary flake8 cannot be found

You will need to open the Atom init script (Atom –> Open your Init Script) and write the following:

process.env.PATH = ['/usr/local/bin/', process.env.PATH].join(':')

Restart Atom the apply these changes. It should work now 😉

Moreover, there are Linters for other languages like HTML, CSS or Javascript. You can find a list here.

Further customisation for Python to follow PEP8

Here I’ll show you how you can configure Atom to follow PEP8, the official Python styling guide.

First, open the Atom –> Preferences window.

1. Use spaces instead of tabs.

Scroll down the Settings panel until you see the Soft Tabs option. Make sure it’s checked. This setting will convert tabs into spaces automatically.

atom-settings

2. Set the tab length to 4 spaces

A little below the Soft Tab setting, you”ll see the Tab Length. Set it to 4 spaces.

3. Automatic PEP8 validation.

If you installed the linter-flake8 package discussed in the previous section, you already have automatic PEP8 validation 🙂

Keybindings customisation

In the same Preferences panel, you can see the Keybindings menu on the left. There, you’ll find a list of all the default keybindings active in your Atom editor.

However, by default, Atom confirms an autocomplete suggestion with both the Tab and Enter keys. But I only want to use the Tab key.

In order to disable Enter as an autocomplete confirm key, we need to go to the Keybindings menu where you’ll see a link that says your keymap file. Click on that link to open the keymap.cson file.

There, you need to write:

# Disable Enter key for confirming an autocomplete suggestion
'atom-text-editor:not(mini).autocomplete-active':
  'enter': 'editor:newline'

Save the file and you’ll see the changes immediately. No need to restart Atom 🙂

Other Useful Packages

Project manager: a package for saving your projects.

Atom Django: Django support for Atom

Minimap: Displays a small map of the current file on the right side of your document (like Sublime Text by default).

Do you use other Packages? Write a comment below about them! 🙂

Google+TwitterLinkedInFacebookReddit

The post Install and Configure the Atom Editor for Python appeared first on Marina Mele's site.

Selenium Tutorial: Web Scraping with Selenium and Python

Par Marina Mele

Imagine what would you do if you could automate all the repetitive and boring activities you perform using internet, like checking every day the first results of Google for a given keyword, or download a bunch of files from different websites.

In this post you’ll learn to use Selenium with Python, a Web Scraping tool that simulates a user surfing the Internet. For example, you can use it to automatically look for Google queries and read the results, log in to your social accounts, simulate a user to test your web application, and anything you find in your daily live that it’s repetitive. The possibilities are infinite! 🙂

*All the code in this post has been tested with Python 2.7 and Python 3.4.

Install and use Selenium

Selenium is a python package that can be installed via pip. I recommend that you install it in a virtual environment (using virtualenv and virtualenvwrapper).

No virtualenv or virtualenvwrapper?

Learn how to create one here, for Python 2.7 and for Python 3. It’s really useful, once you start using them you won’t stop! 🙂

Remember that to create the environment in Python 2.7, just type:

$ mkvirtualenv selenium_env

and in Python 3:

$ mkvirtualenv --python=/usr/local/bin/python3 selenium_env

where you should use your own Python 3 path.

Note: if you don’t want to use a virtual environment, you can still install the packages directly on you computer.

To install selenium, you just need to type:

$ pip install selenium

In this post we are going to initialize a Firefox driver — you can install it by visiting their website. However, if you want to work with Chrome or IE, you can find more information here.

Once you have Selenium and Firefox installed, create a python file, selenium_script.py. We are going to initialize a browser using Selenium:

import time
from selenium import webdriver

driver = webdriver.Firefox()
time.sleep(5)
driver.quit()

This just initializes a Firefox instance, waits for 5 seconds, and closes it.

Well, that was not very useful…

How about if we go to Google and search for something?

Web Scraping Google with Selenium

Let’s make a script that loads the main Google search page and makes a query to look for “Selenium”:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


def init_driver():
    driver = webdriver.Firefox()
    driver.wait = WebDriverWait(driver, 5)
    return driver


def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")


if __name__ == "__main__":
    driver = init_driver()
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()

In the previous code:

  • the function
    init_driver
      initializes a driver instance.
    • creates the driver instance
    • adds the 
      WebDriverWait
       function as an attribute to the driver, so it can be accessed more easily. This function is used to make the driver wait a certain amount of time (here 5 seconds) for an event to occur.
  • the function
    lookup
      takes two arguments: a driver instance and a query lookup (a string).
    • it loads the Google search page
    • it waits for the query box element to be located and for the button to be clickable. Note that we are using the 
      WebDriverWait
       function to wait for these elements to appear.
    • Both elements are located by name. Other options would be to locate them by
      ID, XPATH, TAG_NAME, CLASS_NAME, CSS_SELECTOR
       , etc (see table below). You can find more information here.
    • Next, it sends the query into the box element and clicks the search button.
    • If either the box or button are not located during the time established in the wait function (here, 5 seconds), the
      TimeoutException
        is raised.
  • the next statement is a conditional that is true only when the script is run directly. This prevents the next statements to run when this file is imported.
    • it initializes the driver and calls the lookup function to look for “Selenium”.
    • it waits for 5 seconds to see the results and quits the driver

Finally, run your code with:

$ python selenium_script.py

Did it work? If you got an

ElementNotVisibleException
 , keep reading!

How to catch an ElementNotVisibleExcpetion

Google search has recently changed so that initially, Google shows this page:

Google-Search

and when you start writing your query, the search button moves into the upper part of the screen.

Query-Google-Search

Well, actually it doesn’t move. The old button becomes invisible and the new one visible (and thus the exception when you click the old one: it’s not visible to click!).

We can update the lookup function in our code so that it catches this exception:

from selenium.common.exceptions import ElementNotVisibleException

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        try:
            button.click()
        except ElementNotVisibleException:
            button = driver.wait.until(EC.visibility_of_element_located(
                (By.NAME, "btnG")))
            button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")

  • the element that raised the exception,
    button.click()
     is inside a try statement.
  • if the exception is raised, we look for the second button, using 
    visibility_of_element_located
     to make sure the element is visible, and then click this button.
  • if at any time, some element is not found within the 5 second period, the
    TimeoutException
      is raised and caught by the two end lines of code.
  • Note that the initial button name is “btnK” and the new one is “btnG”.

Method list in Selenium

To sum up, I’ve created a table with the main methods used here.

Note: it’s not a python file — don’t try to run/import it 🙂

# INITIALIZE DRIVER
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.wait = WebDriverWait(driver, 5)


# WAIT FOR ELEMENTS
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

element = driver.wait.until(
    EC.presence_of_element_located(
    EC.element_to_be_clickable(
    EC.visibility_of_element_located(
        (By.NAME, "name")
        (By.ID, "id")
        (By.LINK_TEXT, "link text")
        (By.PARTIAL_LINK_TEXT, "partial link text")
        (By.TAG_NAME, "tag name")
        (By.CLASS_NAME, "class name")
        (By.CSS_SELECTOR, "css selector")
        (By.XPATH, "xpath")
    )
)


# CATCH EXCEPTIONS
from selenium.common.exceptions import
    TimeoutException
    ElementNotVisibleException

That’s all! Hope it was useful! 🙂

Don’t forget to share it with your friends!

Google+TwitterLinkedInFacebookReddit

The post Selenium Tutorial: Web Scraping with Selenium and Python appeared first on Marina Mele's site.

7 tips to Time Python scripts and control Memory & CPU usage

Par Marina Mele

When running a complex Python program that takes quite a long time to execute, you might want to improve its execution time. But how?

First of all, you need the tools to detect the bottlenecks of your code, i.e. which parts take longer to execute. This way, you can concentrate in speeding these parts first.

And also, you should also control the memory and CPU usage, as it can point you towards new portions of code that could be improved.

Therefore, in this post I’ll comment on 7 different Python tools that give you some insight about the execution time of your functions and the Memory and CPU usage.

1. Use a decorator to time your functions

The simpler way to time a function is to define a decorator that measures the elapsed time in running the function, and prints the result:

import time
from functools import wraps


def fn_timer(function):
    @wraps(function)
    def function_timer(*args, **kwargs):
        t0 = time.time()
        result = function(*args, **kwargs)
        t1 = time.time()
        print ("Total time running %s: %s seconds" %
               (function.func_name, str(t1-t0))
               )
        return result
    return function_timer

Then, you have to add this decorator before the function you want to measure, like

@fn_timer
def myfunction(...):
    ...

For example, let’s measure how long it takes to sort an array of 2000000 random numbers:

@fn_timer
def random_sort(n):
    return sorted([random.random() for i in range(n)])


if __name__ == "__main__":
 random_sort(2000000)

If you run your script, you should see something like

Total time running random_sort: 1.41124916077 seconds

2. Using the timeit module

Anther option is to use the timeit module, which gives you an average time measure.

To run it, execute the following command in your terminal:

$ python -m timeit -n 4 -r 5 -s "import timing_functions" "timing_functions.random_sort(2000000)"

where timing_functions is the name of your script.

At the end of the output, you should see something like:

4 loops, best of 5: 2.08 sec per loop

indicating that of 4 times running the test (-n 4), and averaging 5 repetitions on each test (-r 5), the best test result was of 2.08 seconds.

If you don’t specify the number of tests or repetitions, it defaults to 10 loops and 5 repetitions.

3. Using the time Unix command

However, both the decorator and the timeit module are based on Python. This is why the unix time utility may be useful, as it is an external Python measure.

To run the time utility type:

$ time -p python timing_functions.py

which gives the output:

Total time running random_sort: 1.3931210041 seconds
real         1.49
user         1.40
sys          0.08

The first line comes from the decorator we defined, and the other three:

  • Real indicates the total time spent executing the script.
  • User indicates the amount of time the CPU spent executing the script
  • Sys indicates the amount of time spent in kernel-level functions.

Note: as defined in wikipedia, the kernel is a computer program that manages input/output requests from software, and translates them into data processing instructions for the central processing unit (CPU) and other electronic components of a computer.

Therefore, the difference between the real time and the sum of user+sys may indicate the time spent waiting for input/output or that the system is busy running other external tasks.

4. Using the cProfile module

If you want to know how much time is spent on each function and method, and how many times each of them is called, you can use the cProfile module:

$ python -m cProfile -s cumulative timing_functions.py

Now you’ll see a detailed description of how many times each function in your code is called, and it will be sorted by the cumulative time spent on each one (thanks to the -s cumulative option).

cprofile_python

You’ll see that the total amount of time spent on running your script is higher than before. This is the penalty we pay for measuring the time each function takes to execute.

5. Using line_profiler module

The line_profiler module gives you information about the CPU time spent on each line in your code.

This module has to be installed first, with

$ pip install line_profiler

Next, you need to specify which functions you want to evaluate using the @profile decorator (you don’t need to import it in your file):

@profile
def random_sort2(n):
    l = [random.random() for i in range(n)]
    l.sort()
    return l

if __name__ == "__main__":
    random_sort2(2000000)

Finally, you can obtain a line by line description of the random_sort2 function by typing:

$ kernprof -l -v timing_functions.py

where the -l flag indicates line-by-line and the -v flag indicates verbose output. With this method, we see that the array construction takes about 44% of the computation time, whereas the sort() method takes the remaining 56%.

kernprof_line_profiler

You will also see that due to the time measurements, the script might take longer to execute.

6. Use the memory_profiler module

The memory_profiler module is used to measure memory usage in your code, on a line-by-line basis. However, it can make your code to run much more slower.

Install it with

$ pip install memory_profiler

Also, it is recommended to install the psutil package, so that the memory_profile runs faster:

$ pip install psutil

In a similar way as the line_profiler, use the @profile decorator to mark which functions to track. Next, type:

$ python -m memory_profiler timing_functions.py

yes, the previous script takes longer than the 1 or 2 seconds that took before. And if you didn’t install the psutil package, maybe you’re still waiting for the results!

memory_profiler

Looking at the output, note that the memory usage is expressed in terms of MiB, which stand for mebibyte (1MiB = 1.05MB).

7. Using the guppy package

Finally, with this package you’ll be able to track how many objects of each type (str, tuple, dict, etc) are created at each stage in your code.

Install it with

$ pip install guppy

Next, add it in your code as:

from guppy import hpy


def random_sort3(n):
    hp = hpy()
    print "Heap at the beginning of the function\n", hp.heap()
    l = [random.random() for i in range(n)]
    l.sort()
    print "Heap at the end of the function\n", hp.heap()
    return l


if __name__ == "__main__":
    random_sort3(2000000)

And run your code with:

$ python timing_functions.py

You’ll see something like the following output:

Guppy_Python

By placing the heap at different places in your code, you can study the object creation and deletion in the script flow.

If you want to learn more about speeding your Python code, I recommend you the book High Performance Python: Practical Performant Programming for Humans, september 2014.

Hope it was useful! 🙂

Don’t forget to share it with your friends!

Google+TwitterLinkedInFacebookReddit

The post 7 tips to Time Python scripts and control Memory & CPU usage appeared first on Marina Mele's site.

Use the Google Analytics API with Django

Par Marina Mele

Google Analytics is a powerful tool that gives you useful insight about the traffic that arrives in your website. But together with Django, you will be able to retrieve data of your Analytics account and display it to the user, on demand.

In this post you’ll learn how to put together Django, the oauth2 protocol and the Google Analytics API.

Let’s Start! 🙂

First of all, you might want to check this post if you’re not familiar with the Google Analytics API with Python.

Moreover, it assumes that you are familiar with Python and pip, and that you know Django and have a working application. If you navigate into the Python and Django sections of this website you’ll learn more about those 😉

Create a Google Application

I assume you have a working Django application, and that you want to extract data from your Google Analytics account using an existing Project. If you didn’t create a Project in the Google Developers Console, check this post and create one. Then, come back 😉

But if you have a live Django project, then you need to create a Web application instead. You just need to select Web application as the Application type when you are creating a Client Id in the Credentials section:

Google Web Application

 

In Authorized Javascript origins insert your website domain, and in authorized redirect uris, insert your domain plus /oauth2/oauth2callback. This endpoint will be used by django to catch the oatuh2 response.

After that, you should have the Oauth credentials in a json format.

From where should I get the JSON Credential file?

You can obtain it from the Google Developers Console. Go to your project and then select APIs & Oauth credentials –> Oauth and click on Download JSON.

Install oauth2 and Google Analytics

Next, we need to install the oaut2client python package. Activate your virtual environment (if you have one) and type:

$ pip install oauth2client

This will install oauth2client and the following packages: httplib2 (an http library), pyasn1 and pyasn1-modules (to use the ASN.1 protocol), and rsa to use the RSA cryptosystem).

No virtual environment?

Learn how to create one here, for Python 2.7 and for Python 3. It’s really useful, once you start using them you won’t stop! 🙂

Don’t forget to add these packages into the requirements file of your virtual environment.

And also, we will need the API Python Client library:

$ pip install python-gflags
$ pip install -U google-api-python-client

Create a Django App

Next, we will create a Django app to manage the Authorization process. Depending on where you want to create the app’s folder, you will run a slightly different command (this one will create the folder in the current directory):

$ python manage.py startapp oauth2_authentication

Take the client_secrets.json file and save it inside this app, and include this app into your settings.py file, in the INSTALLED_APPS.

Now that we have the app created, we are going to create two different models that will store the Flow and Credential objects.

Edit the models.py inside the oaut2_authentication app and write:

from django.db import models
from django.contrib.auth.models import User
from oauth2client.django_orm import FlowField, CredentialsField


class FlowModel(models.Model):
    id = models.ForeignKey(User, primary_key=True)
    flow = FlowField()


class CredentialsModel(models.Model):
    id = models.ForeignKey(User, primary_key=True)
    credential = CredentialsField()

Next, we’ll edit the urls of your app. First, open the main urls.py and add the following urlpattern:

url(r'^oauth2/', include('oauth2_authentication.urls', namespace="oauth2"))

And in the oauth2_authentication/urls.py write:

from django.conf.urls import patterns, url
from . import views

urlpatterns = patterns(
    '',
    url(r'^$', views.index, name='index'),
    url(r'oauth2callback', views.auth_return, name='return'),
)

This way, /oauth2 will start the authorization process, and /oauth2/oaut2callback will wait for the authorization response.

But we need to write our views.py first:

import os
import httplib2
from oauth2client import xsrfutil
from oauth2client.client import flow_from_clientsecrets
from oauth2client.django_orm import Storage

from apiclient.discovery import build

from django.contrib.auth.decorators import login_required
from django.http import HttpResponseBadRequest
from django.http import HttpResponseRedirect
from django.shortcuts import render
from django.conf import settings
from django.contrib.auth import get_user_model
from django.core.urlresolvers import reverse
from django.contrib.sites.models import get_current_site

from .models import CredentialsModel, FlowModel

CLIENT_SECRETS = os.path.join(
    os.path.dirname(__file__), 'client_secrets.json')


def get_accounts_ids(service):
    accounts = service.management().accounts().list().execute()
    ids = []
    if accounts.get('items'):
        for account in accounts['items']:
            ids.append(account['id'])
    return ids


@login_required
def index(request):
    # use the first REDIRECT_URI if you are developing your app
    # locally, and the second in production
    # REDIRECT_URI = 'http://localhost:8000/oauth2/oauth2callback'
    REDIRECT_URI = "https://%s%s" % (
        get_current_site(request).domain, reverse("oauth2:return"))
    FLOW = flow_from_clientsecrets(
        CLIENT_SECRETS,
        scope='https://www.googleapis.com/auth/analytics.readonly',
        redirect_uri=REDIRECT_URI
    )
    user = request.user
    storage = Storage(CredentialsModel, 'id', user, 'credential')
    credential = storage.get()
    if credential is None or credential.invalid is True:
        FLOW.params['state'] = xsrfutil.generate_token(
            settings.SECRET_KEY, user)
        authorize_url = FLOW.step1_get_authorize_url()
        f = FlowModel(id=user, flow=FLOW)
        f.save()
        return HttpResponseRedirect(authorize_url)
    else:
        http = httplib2.Http()
        http = credential.authorize(http)
        service = build('analytics', 'v3', http=http)
        ids = get_account_ids(service)
        return render(
            request, 'oauth2_authentication/main.html', {'ids':ids})


@login_required
def auth_return(request):
    user = request.user
    if not xsrfutil.validate_token(
            settings.SECRET_KEY, request.REQUEST['state'], user):
        return HttpResponseBadRequest()
    FLOW = FlowModel.objects.get(id=user).flow
    credential = FLOW.step2_exchange(request.REQUEST)
    storage = Storage(CredentialsModel, 'id', user, 'credential')
    storage.put(credential)
    return HttpResponseRedirect("/oauth2")

Note that you have to choose which REDIRECT_URI to use, depending on if you are developing your Django app locally or your app is live on a server.

Inside the FLOW object, we have used the readonly scope, but you could also choose:

https://www.googleapis.com/auth/analytics for writing permissions, and https://www.googleapis.com/auth/analytics.manage.users to view and manage permission of users in the Analytics accounts (more info).

And finally, write something in the oauth2_authentication/main.html template, like:

Your credentials are up to date! :-)

{% if ids %}
The List of Ids of your Google Analytics accounts are: {{ids}}
{% endif %}

Now you just need to start the server and visit /oauth2 to start the authentication process. If everything worked correctly, you should see the list of ids of your Google Analytics account.

Note that once you have the Google Analytics service object you can retrieve all the data you want from your accounts. To see more examples, you can check this post: Google Analytics API with Python.

Google+TwitterLinkedInFacebookReddit

The post Use the Google Analytics API with Django appeared first on Marina Mele's site.

From a Python script to a portable Mac application with py2app

Par Marina Mele

Do you want to share a Python script to a friend that doesn’t have Python installed? In this post you’ll learn how 🙂

With the py2app package you can compile a Python script and create a portable Mac application. If you’re using windows, you can check py2exe.

First of all, we need to install the py2app python package. If you have a virtual environment, activate it now 🙂

$ pip install -U py2app

Next, we need to create a setup.py file for our project. In your working directory (the one that contains your Python script), type:

$ py2applet --make-setup MyApplication.py

This will create the file setup.py, which is responsible to tell setuptools how to build your application.

If you edit this file, you should see something like:

"""
This is a setup.py script generated by py2applet

Usage:
    python setup.py py2app
"""

from setuptools import setup

APP = ['MyApplication.py']
DATA_FILES = []
OPTIONS = {'argv_emulation': True}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

You should put the name of your starting python script (the one your run to start) in the APP variable.

APP = ['application_name.py']

Also, if your application uses some data files, like a json or a txt file, you should include it in DATA_FILES. For example:

DATA_FILES = ['myfile.json', 'myfile.txt']

Now, we will compile our project in Alias mode. This instructs py2app to build an application that uses the source data files in-place. Note that this means that the application will not be portable to other machines (we’ll do that next!).

$ python setup.py py2app -A

You will have two new folders inside your working directory: build and dist. The first one is used for building your app (you don’t have to touch it) and the second one contains your application bundle.

From the working directory, you can run your app using:

$ ./dist/application_name.app/Contents/MacOS/application_name

or you can find your application in Finder and open it from there (you’ll find it in dist/application_name).

Once your application is running, we are ready to create the stand alone version. First remove the build and dist folders with:

$ rm -rf build dist

and next, build your application:

$ python setup.py py2app

During the building process, I encounter some problems with the ModuleGraph package (remember that this package is installed at the same time as py2app).

If you get one of the following errors:

AttributeError: 'ModuleGraph' object has no attribute 'scan_code'
AttributeError: 'ModuleGraph' object has no attribute 'load_module'

Edit the file where this error ocured (for me, it was inside the virtual environment folder, named myenv), at:

myenv/lib/python2.7/site-packages/py2app/recipes/virtualenv.py

Look for the functions scan_code or load_module and add it an underscore before them, _scan_code and _load_module. Build again your application, it should work now 🙂

Next, try to run your app, and see if it works.

If you get errors with external packages, like certificates with httplib2 or requests, missing files in selenium webdriver, or import errors that normally work, you’ll have to fully include these packages in your application. Edit again the setup.py file and add the package that raised the error:

OPTIONS = {
    'argv_emulation': True,
    'packages': ['httplib2', 'requests', 'selenium']
    }

After saving the file, remove again the build and dist directories, build your application and try to run it. Maybe another package failed? Try to include it in the setup file and start over.

However, not all the errors you get will be solved like this, sorry…!

Once you’re application runs correctly, and you’re happy with the results, you can share it by copying the application bundle dist/application_name.

That’s all! Hope it was useful!

And please, don’t forget to share it with your friends, they might find it useful too 😉

Google+TwitterLinkedInFacebookReddit

The post From a Python script to a portable Mac application with py2app appeared first on Marina Mele's site.

13 WordPress Plugins to improve your Website

Par Marina Mele

To manage this website, I changed from Blogger to WordPress because I wanted to convert my Blog into a Website. I liked the idea to have structured pages where people could navigate through, and at the same time, maintain the Blog in a subfolder.

In this post, I want to share some WordPress plugins I installed for the website, and that helped me obtain the look and feel that I wanted. For example, a plugin for security aspects, like attempts to enter to your admin site, or a plugin for Search Engine Optimization (SEO), which gives you some tips that improve how search engines see your content.

Keep your website Secure

One of the first things you should care about is security. Is anybody trying to access to your WP Admin interface? How can you block them? Is there someone making automated requests to your site?

To manage these issues, you can use a security plugin like iThemes Security, which among other things:ithemes_security

  • it obscures some WordPress security vulnerabilities by changing the default admin url, the default admin id (usually id 1), and allows for an away mode (blocks any admin login during some periods of time).
  • it protects your site from hacker attacks by banning troublesome user agents, or detecting and blocking attacks to your filesystem and database.

Create another environment to develop your site

When developing a site, it’s very useful to have a copy in a local environment, e.g. in your computer. This way, you can modify all the files and see the changes automatically, without having to upload them into the server.

Moreover, once you are satisfied with the changes you’ve made, you should upload them into a pre-production environment. This environment should be as similar as possible to the production environment (same server, same kind of database, etc). This way, you can catch some unexpected error before going to production.duplicator_plugin

One way to move all the WordPress site between your local, pre-production and production environments is to use a plugin like Duplicator. It allows you to duplicate, clone, backup, move and transfer an entire site from one location to another.

If you want to see a detailed description in how you can create a local environment and use Duplicator to move the local WordPress installation into your production environment, check this post.

Coming Soon and Maintenance Pages

At some point during the development process, and before the website goes live, you have to work in the production environment (or at least having it active). In these cases, you don’t want that users see the current content of your site, because it’s probably unfinished and with some style issues. However, at the same time you want to inform them that the page is still in working process, using a page with the message Coming Soon or Page under construction.

A similar argument applies when your site is running through some technical issues and you need to stop the user from visiting your site. In this case, you can show him a Maintenance Page to inform about the specific issue, and predict when the site will be active again.Coming Soon Plugin

To manage these cases, you can use the plugin Coming Soon Page & Maintenance Mode, which is very easy to implement. Basically, it prevents unregistered users to see the contents of your site by showing them a Coming Soon or a Maintenance page (depending on which you have defined).

At the same time, it allows registered users (like admin, writers, etc) to see both the contents and the admin interface. This way you can still work, develop, and add new content in your site.

Don’t forget about Search Engines

Have you heard about SEO? It stands for Search Engine Optimization, and basically it means that you should optimize your posts so that search engines like Google or Bing know what your post talks about. This way, when someone searches for the topic of your post, it will appear amongst the first entries.Wordpress SEO

This is very, very important. Studies have shown that people clicks on one of the first three results of Google around 60% of the time. And less than 10% of the users clic on a result in the second page of results! So placing your website among the first results can make a difference regarding the amount of traffic to your website!

The WordPress SEO plugin helps you improve the optimization of your posts. It have many functionalities, in particular:

  • It provides you with useful tips about how to write your posts focusing in one specific keyword (the one that your post will rank for on search engines).
  • It allows you to write the meta description of your posts (this is the text that will appear on search engines when listing your page).
  • It takes care of your website Sitemap and alows you to edit the robots.txt file (to prevent search engines to index specific pages, such the admin interface).

This plugin is a must-have, don’t wait to install it!

Customize the WordPress Editor

The default WordPress Editor lacks from several functionalities that we are used to have in our day-to-day text editor.

The TinyMCE Advanced plugin allows you to add buttons in your editor to control some extra functionalities, like the font family, font size or background color, and at the same time it provides you with buttons to use as a shortcuts of the usual functionalities, like the Formats menu or the horizontal line.

TinyMCE    Shortcodes Ultimate

Another useful plugin for your text editor is the Shortcodes Ultimate plugin, which adds a button in your editor to insert a variety of WordPress shortcodes, like spoiler lists, carousel images, audio, etc.Crayon Sintax Highlighter

And finally, for those of you who write technical posts like me, I recommend the Crayon Syntax Highlighter plugin.

It adds a small button in your text editor that opens a nice code editor. There, you can select the language of your code (Python, Java, etc) so it will highlight it accordingly. Moreover, there are many options to customize the style of the code snippets, colors, numbered lines, interactivity, etc.

Social Media Plugins

For every post and page of your site, you probably want to add buttons for sharing your content in social media. The Share Buttons plugin lets you do exactly that.

You can choose which social media buttons you want to include, like Google+, Facebook, Twitter or Reddit, and in some cases, even add a share counter.

Moreover, apart from the usual place of the sharing buttons, just after the post title, or at the end of the post, you can have a floating right menu with your buttons. The good thing about this method is that these buttons are visible anytime by the user. To you see them right now on the right of your screen? Sorry for the mobile users, they are hidden for you 🙂

Share buttons    Social Comments

Next on the list is the Social Comments plugin. Basically, it allows users to comment and share your posts on any of these social media: Google+, Facebook and Disqus.

For example, if you want to migrate a Blogger Blog into WordPress, I’m sure you would like to maintain your Google+ comments. Yeah, this website is an example for this case too 🙂

Plugins to enhance the Widget functionality

Finally, in this last section I will talk about the Widget functionality. Widgets are small pieces that customize the look and usability of your site.

Usually, you can place widgets in your sidebar or your footer, but depending on your specific Theme, you may have extra spaces.

WordPress comes with some default widgets to add, like a list of Categories of your posts, a Calendar, or a list of the recent comments. However, you may want to include much more things there! 🙂

One plugin I really like is Widget Logic, which as the name suggests, it allows to add some logic into your plugins. For example, you may want to show one widget or another depending on if you’re in a page or in post, or only show a specific widget for a particular post.

Widget Logic    Enhanced Text

Another important widget is the Enhanced Text Widget, which allows you to write HTML, CSS, JavaScript, Flash, Shortcodes or PHP content inside your widget. This might be useful when you want to insert some piece of code in your website, like for example, a MailChimp form.

Two useful Widget plugins

Next, you can use the Custom Meta Widget to customize the default WordPress Meta Widget. This is the one responsible for showing the login, admin, entry and comment feed. The problem is that in the default Meta Widget you can’t control which links to show, but with this widget you can 🙂

Custom Meta Widget    BE Subpages Widget

And finally, the last widget I will mention here is the BE Subpages Widget. Basically, it allows you to display another menu as a widget. For example, if your theme don’t implement a menu in your sidebar, you can create one and add it using this widget it your sidebar 🙂

That’s all!

Do you have other useful widgets that you want to share? Write your comments above! 😉

Google+TwitterLinkedInFacebookReddit

The post 13 WordPress Plugins to improve your Website appeared first on Marina Mele's site.

How to correctly remove a Model or an App in Django

Par Marina Mele

When you remove a Django Model or a full app in Django, you have to be careful in how you do it, and how the database will be affected.

Thus, in this short post I’ll explain what I do when deleting a model or an app.

Remove the Models from models.py

Weather you’re removing a single model or a full app, you should first remove the desired models from the models.py file.

Moreover, you have to make sure that no other file imports these models or uses them (admin.py, views.py, etc).

Next, run South or migrate your database to properly delete these models from your database.

If you are using South, you should run something like

$ python manage.py schemamigration --auto your-app
$ python manage.py migrate your-app
changing your-app by the name of the app to migrate (where your deleted models were).

Remove an App from your Django project

Next, if you are removing an entire app, you can now remove all the files in that app (models.py, views.py, etc). Again, make sure that the other apps don’t make an import from these files.

And finally, remove this app from your Installed Apps in the settings.py file.

Remove the remnant ContentTypes

But after doing all this, you will still have remnants of your models in the ContentType table, like permissions related to the deleted models.

One way to see this is when creating a Group in the Auth app using the admin. When you have to choose the permissions of the group, the box will show permissions that refer to the deleted models.

To clean these tables, we will create the file clean_model_app_remove.py and write the following:

from django.contrib.contenttypes.models import ContentType

# List of deleted apps
DEL_APPS = ["app-you-deleted", "second-app-deleted"]
# List of deleted models (that are not in the app deleted) In lowercase!
DEL_MODELS = ["model-you-deleted", "second-model-deleted"]

ct = ContentType.objects.all().order_by("app_label", "model")

for c in ct:
    if (c.app_label in DEL_APPS) or (c.model in DEL_MODELS):
        print "Deleting Content Type %s %s" % (c.app_label, c.model)
        c.delete()
where you should write in DEL_APPS the apps that you deleted, and in DEL_MODELS the models that you delete and that do not belong to any of the deleted apps. Moreover, note that the models have to be written in lowercase.

Finally, we just need to run our script with

$ python manage.py shell
>>>execfile("clean_model_app_remove.py")

Great! Now we cleanly removed our apps/models! 🙂

Google+TwitterLinkedInFacebookReddit

The post How to correctly remove a Model or an App in Django appeared first on Marina Mele's site.

Use the Google Analytics API with Python

Par Marina Mele

Python and Google AnalyticsIn this post you will learn to create a Python script that allows users to enter its Google Analytics account and get information from there.

In order to do that, we will create a Project in the Google Developers Console and authorize it to use the Analytics API.

Next, we will use the Oauth 2.0 protocol to allow users to connect to their Analytics account through our Project.

And finally, we will retrieve the number of sessions of our view, segmented by traffic source.

Let’s start!

Create a Project in Google Developers Console

Go to the Google Developers Console and login with your account.

Click on Create Project and write your Project name and choose (if you want) your project ID.

Next, on your new project menu, go to APIs & auth –> Credentials. Here, in the Oauth section, click on Create new Client ID.

In this case, as we are creating a script that will run on our computer, we will choose Installed application as the application type, and Other as the installed application type.

Create a Project for Google Analytics API

Finally, click on Create Client ID.

You will see, next to the OAuth section, the credentials for your project, which contain your Client ID, the Client Secret, and the redirect URIS. Click on Download JSON to download them, and save the file as client_secrets.json.

From here, go to APIs & auth –> Consent screen and personalize the message that your users will see when requesting access to their accounts.

Next, we need to activate the Goolge Analytics API in your Project. Go to APIs & auth –> APIs and look for the Analytics API. You just need to activate it by clicking at the OFF button on the right.

Ok! now that we have our Project created we can move on to our Python script!

The Google API Python client library

In order to use the Analytics API with Python, we will use the Google API Python Client library. You can install it in your working environment using pip (how? learn to install Python, virtualenv and virtualenvwrapper to work with virtual environments).

$ pip install python-gflags
$ pip install -U google-api-python-client

We also install the python-gflags library, which we will use latter in the code.

Next create the file analytics_service_object.py in your working directory (where client_secrets.json is located). This file will create an authorized Analytics Service object, used to interact with the user’s analytics accounts.

import httplib2
from apiclient.discovery import build
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client import tools
import argparse

CLIENT_SECRETS = 'client_secrets.json'

# The Flow object to be used if we need to authenticate.
FLOW = flow_from_clientsecrets(
    CLIENT_SECRETS,
    scope='https://www.googleapis.com/auth/analytics.readonly',
    message='%s is missing' % CLIENT_SECRETS
    )

# A file to store the access token
TOKEN_FILE_NAME = 'credentials.dat'


def prepare_credentials():
    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()
    # Retrieve existing credendials
    storage = Storage(TOKEN_FILE_NAME)
    credentials = storage.get()
    # If no credentials exist, we create new ones
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(FLOW, storage, flags)
    return credentials


def initialize_service():
    # Creates an http object and authorize it using
    # the function prepare_creadentials()
    http = httplib2.Http()
    credentials = prepare_credentials()
    http = credentials.authorize(http)
    # Build the Analytics Service Object with the authorized http object
    return build('analytics', 'v3', http=http)

if __name__ == '__main__':
    service = initialize_service()

In the previous script:

  • CLIENT_SECRETS loads your credentials from the client_secrets.json file.
  • TOKEN_FILE_NAME is the file where the user-specific credentials will be stored (this file also includes some project-specific credentials).
  • the prepare_credentials() function tries to load the credentials from the TOKEN_FILE_NAME and if they don’t exist it creates new ones using the run_flow function.
  • the initialize_service() function uses the credentials to build an authorized Analytics Service object, and returns this object.

Now, when you type

$ python analytics_service_object.py

you will see, in a browser window, the consent screen you customized before. This means that your Project is asking your permission to access your Analytics account through the API. After clicking yes, your new credentials will be stored in TOKEN_FILE_NAME so that you won’t have to enter them again (except when the access_token expires).

The Analytics Service object

Once we have an authorized Analytics service object, we can use it to retrieve all the data in the user’s analytics accounts.

For example, to get a list of all the existing accounts of the user, just type:

accounts = service.management().accounts().list().execute()

This will give you a dictionary containing the following keys:

  • username: the email address of the user
  • kind: analytics#accounts
  • items: a list of the user’s accounts.
  • totalResults
  • itemsPerPage
  • startIndex

As we will see, this is a common structure when getting data from analytics, even when we ask for properties or views instead of accounts (the returned object has the same keys).

Moreover, the items value is a list of accounts, each of which is in turn a dictionary with keys:

  • id: your account id
  • kind: analytics#account
  • childLink
  • created
  • permissions
  • selfLink
  • updated

Therefore, you can get a list of your users accounts with:

def get_accounts_ids(service):
    accounts = service.management().accounts().list().execute()
    ids = []
    if accounts.get('items'):
        for account in accounts['items']:
            ids.append(account['id'])
    return ids

You can also see the account ids in the Google Analytics web. You have to go to the Admin tab, and open the top-left drop down menu. There, your different accounts will be displayed, with their id on the right.

But as you may know, each Account can have multiple Properties, each of which has a different tracking code. To obtain a list of the Properties inside the Account with an id of account_id, you can use:

webproperties = service.management().webproperties().list(
    accountId=account_id).execute()

where webproperties is a dictionary with the same keys as accounts, but in which

  • kind: analytics#webproperties
  • items: list of web properties for this account

Again, each web property is a dictionary that contains the keys:

  • id: the web property id
  • kind: analytics#webproperty

and many more (you can print the webproperties object to see its keys).

You’ll see that the web property id is the tracking code of this property, which you can also obtain in the Google Analytics Admin tab.

But there is another level! Inside each Property there can be multiple views! You can obtain a list of views (or profiles) of each web property with:

profiles = service.management().profiles().list(
    accountId=firstAccountId,
    webPropertyId=firstWebpropertyId).execute()

The profiles dictionary contains the same keys as accounts and webproperties, but with

  • kind: analytics#profiles
  • items: list of profiles for this account and web property

and each profile has:

  • id: the profile id
  • kind: analytics#profile
  • name: the profile name

Get the number of Sessions of a Google Analytics View

Now that we know how to get information about our accounts, properties and views, let’s obtain the number of sessions of a view during a period of time.

Create the file get_sessions.py and write:

from analytics_service_object import initialize_service


def get_sessions(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics
        ).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    print get_sessions(service, profile_id, "2014-09-01", "2014-09-30")

Note: you have to add your view id in “your_profile_id”, and then, run this script with:

$ python get_sessions.py

Check all the functionalities of the service.data().ga().get() method, and retrieve all the data you want form your view!

Get the number of Sessions for each traffic source

Obtaining the number of sessions for each traffic source (i.e. organic, referral, social, direct, email and other) is a little bit trickier. You have to work with filters in order to segment your data.

Here’s a little script that does this, thanks to Michael for the update 🙂

from analytics_service_object import initialize_service

def get_source_group(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    dimensions = "ga:channelGrouping"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        dimensions=dimensions).execute()
    return dict(
        data["rows"] + [["total", data["totalsForAllResults"][metrics]]])


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    data = get_source_group(service, profile_id, start_date, end_date)
    for key, value in data.iteritems():
        print key, value

Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.

After running this script, you’ll see the desired information in your terminal.

Another solution to get the number of sessions by traffic source, less optimized but instructive, is to use filter instead of dimensions:

from analytics_service_object import initialize_service


not_source_filters = {
    "social": "ga:hasSocialSourceReferral==No",
    "organic": "ga:medium!=organic",
    "direct":  "ga:source!=(direct),ga:medium!=(none);ga:medium!=(not set)",
    "email": "ga:medium!=email",
    "referral": "ga:medium!=referral,ga:hasSocialSourceReferral!=No"
}

source_filters = {
    "social": "ga:hasSocialSourceReferral==Yes",
    "organic": "ga:medium==organic",
    "direct":  "ga:source==(direct);ga:medium==(none),ga:medium==(not set)",
    "email": "ga:medium==email",
    "referral": "ga:medium==referral;ga:hasSocialSourceReferral==No",
    "other": "%s;%s;%s;%s;%s" % (
        not_source_filters["social"], not_source_filters["organic"],
        not_source_filters["direct"], not_source_filters["email"],
        not_source_filters["referral"])
}


def get_source_sessions(service, profile_id, start_date, end_date, source):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    filters = source_filters[source]
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        filters=filters).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    for source in ["social", "organic", "direct", "email", "referral", "other"]:
        print source, get_source_sessions(
            service, profile_id, start_date, end_date, source)

Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.

After running this script, you’ll see the desired information in your terminal.

Some information you may find useful when working with filters:

  • , means OR
  • ; means AND
  • == means exact match
  • != means does not match
  • =@ means contains substring
  • !@ means does not contain substring
  • learn more in the Google Reference Guide

That’s all for today! 🙂

Please, +1 if was useful and share it with your friends! Thaaanks!

Google+TwitterLinkedInFacebookReddit

The post Use the Google Analytics API with Python appeared first on Marina Mele's site.

❌