Categories
Tips

11 Tips And Tricks To Write Better Python Code

Here are 11 tips and tricks that will help you write better Python code and become a better programmer:

1. Iterate with enumerate instead of range(len(x))

In Python, we generally use a for loop to iterate over an iterable object. A for loop in Python uses collection-based iteration, i.e. Python assigns the next item from the iterable to the loop variable on every iteration. The usual use case of a for loop is as follows:

values = ["a", "b", "c"]

for value in values:
  print(value)

# a
# b
# c

Now, if in addition to the value, you want to print the index as well, you can do it like this:

index = 0

for value in values:
  print(index, value)
  index += 1

# 0 a
# 1 b
# 2 c

or another common way to do this is by using range(len(x)):

for index in range(len(values)):
  value = values[index]
  print(index, value)

# 0 a
# 1 b
# 2 c

However, there is an easier and more Pythonic way to get both the index and the value: enumerate(). You use it in a for loop almost exactly as before, but instead of putting the iterable directly after in, or wrapping it in range(len(values)), you pass it to enumerate() as shown below:

for count, value in enumerate(values):
  print(count, value)

# 0 a
# 1 b
# 2 c

We can also pass a start argument to enumerate() as shown below:

for count, value in enumerate(values, start=1):
  print(count, value)

# 1 a
# 2 b
# 3 c

On each iteration, enumerate() yields a pair of values, which the for loop unpacks into two loop variables:

  • the count of the current iteration
  • the value of the item at the current iteration

Just like the loop variable in a plain for loop, these two variables can be named anything; for instance, we can call them index and value and they’ll still work. The benefit of enumerate() over the manual approaches is convenience and safety rather than raw speed: you no longer have to remember to look up the value inside the loop or to advance the counter yourself, because Python handles both automatically.

2. Use list comprehension instead of raw for loops

List comprehensions are an easy and elegant way to define and create lists based on existing iterables. A list comprehension is a single expression in square brackets, containing an expression that is evaluated once for each item. It condenses the loop-and-append pattern into one line, and in CPython it usually runs somewhat faster than the equivalent for loop with .append() calls.

The usual syntax of a list comprehension looks like this:

newList = [ expression(element) for element in oldList if condition ] 

Here’s an example of list comprehension in code:

# Using a list comprehension to iterate through a string
characters = [character for character in 'HackerNoon']
 
# Displaying the list
print(characters)

# Output
# ['H', 'a', 'c', 'k', 'e', 'r', 'N', 'o', 'o', 'n']
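The general syntax above also allows an optional condition, which the example does not use. As a minimal sketch (the variable names here are only illustrative), the following compares a plain loop with the equivalent comprehension that filters and transforms at the same time:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: build the list step by step
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# Comprehension version: same filter and same expression in one line
squares_comp = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_loop)  # [4, 16, 36]
print(squares_comp)  # [4, 16, 36]
```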

3. Sort complex iterables with sorted()

The Python sorted() function sorts the elements of an iterable in ascending or descending order and returns them as a new list. It works on sequences (string, tuple, list), collections (set, dictionary, frozenset), and any other iterable.

The syntax of the sorted() function is as follows:

sorted(iterable, key=None, reverse=False)

The sorted() function takes at most three parameters:

  • iterable: any iterable object
  • key: an optional function that is called on each element to produce the value used for comparison
  • reverse: an optional Boolean; if True, the result is sorted in descending order
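To make the key and reverse parameters concrete, here is a small sketch that sorts a list of dictionaries by one of their fields. The data and names are made up for illustration:

```python
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 35},
]

# key is called once per element; its return value is what gets compared
by_age = sorted(people, key=lambda person: person["age"])
names_by_age = [person["name"] for person in by_age]
print(names_by_age)  # ['Bob', 'Alice', 'Carol']

# reverse=True flips the order; the original list is left untouched
oldest_first = sorted(people, key=lambda person: person["age"], reverse=True)
names_oldest_first = [person["name"] for person in oldest_first]
print(names_oldest_first)  # ['Carol', 'Alice', 'Bob']
```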

4. Store unique values with Sets

A Python set keeps only one copy of each value, so duplicates are discarded automatically. Hence, it can be used to extract the unique values from a list. For example:

list_inp = [100, 75, 100, 20, 75, 12, 75, 25] 

set_res = set(list_inp) 
print("The unique elements of the input list using set():\n") 
list_res = (list(set_res))
 
for item in list_res: 
    print(item)

The output of the above program would look something like this (sets are unordered, so the items may be printed in a different order):

The unique elements of the input list using set():

25
75
100
20
12
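If you need the unique values in a predictable order, one option is to pass the set to sorted(). A minimal sketch using the same input list:

```python
list_inp = [100, 75, 100, 20, 75, 12, 75, 25]

# set() removes the duplicates, sorted() returns them as an ordered list
unique_sorted = sorted(set(list_inp))
print(unique_sorted)  # [12, 20, 25, 75, 100]

# Membership tests on a set work the same way as on a list
print(75 in set(list_inp))  # True
```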

5. Save memory with Generators

A generator evaluates its elements on demand, one at a time, instead of building them all up front. The syntax of a generator expression is very similar to that of a list comprehension, except that we use parentheses instead of square brackets.

Let’s consider an example where we want to print the square of all the even numbers in a list:

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("The given list is:", myList)
mygen = (element ** 2 for element in myList if element % 2 == 0)
print("Elements obtained from the generator are:")
for ele in mygen:
    print(ele)

The output of the above code would look like this:

The given list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements obtained from the generator are:
4
16
36
64
100

Since the syntax is so similar, you may be wondering how a generator expression differs from a list or set comprehension. Unlike those, a generator expression does not build the whole collection in memory; it produces one item at a time as you iterate over it. As a result, you can use a generator expression instead of a list or set comprehension to lower the program’s memory requirements, as long as you only need to iterate over the results once.
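The memory difference can be observed with sys.getsizeof(). This is only a rough sketch: getsizeof() reports the size of the container object itself (not of the integers it refers to), and the exact numbers vary between Python versions and platforms, so the example only compares them.

```python
import sys

n = 100_000
squares_list = [i ** 2 for i in range(n)]  # all items exist at once
squares_gen = (i ** 2 for i in range(n))   # items are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True

# A generator can be consumed like any other iterable...
total = sum(squares_gen)
print(total == sum(squares_list))  # True

# ...but only once: after that it is exhausted
leftover = list(squares_gen)
print(leftover)  # []
```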

6. Define default values in Dictionaries with .get() and .setdefault()

The .setdefault() method sets dict[key] = default if key is not already in the dictionary, and returns the value stored for key either way.

The syntax of .setdefault() looks like the following:

dict.setdefault(key, default=None)

Here’s an example code snippet to understand how to use .setdefault():

a_dictionary = {"a": 1, "b": 2, "d": 4}
a_dictionary.setdefault("c", 3)

print(a_dictionary)

The output of the above code would look like:

{'a': 1, 'b': 2, 'd': 4, 'c': 3}

A default value can also be supplied when reading a key with the .get() method. Unlike .setdefault(), .get() only returns the default and does not add the key to the dictionary, as you can see below:

a_dictionary = {"a": 1, "b": 2, "d": 4}
print(a_dictionary.get("c", 3))

print(a_dictionary)

The output of the above code would look like the following:

3
{'a': 1, 'b': 2, 'd': 4}
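Two common patterns that build on these methods are grouping values with .setdefault() and counting with .get(). The sketch below uses a made-up word list to show both:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by first letter: setdefault() creates the list on first use
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Count words per first letter: get() supplies 0 for keys not seen yet
counts = {}
for word in words:
    counts[word[0]] = counts.get(word[0], 0) + 1
print(counts)
# {'a': 2, 'b': 2, 'c': 1}
```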

7. Count hashable objects with collections.Counter

The collections module provides specialized, high-performance container datatypes as alternatives to the built-in types dict, list, set, and tuple.

A Counter is a container that keeps track of how many times equal values are added.

It can be used to implement algorithms for which other languages would use a bag or multiset data structure.

Importing the module makes everything in collections available under the collections namespace:

import collections

Since we are only going to use the Counter, we can simply do this:

from collections import Counter

It can be used as follows:

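from collections import Counter

# Count how often each character occurs in the string
c = Counter('abcdaab')

# Letters that were never counted (here 'e') simply report 0
for letter in 'abcde':
    print('%s : %d' % (letter, c[letter]))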

The output of the above code would look like this:

a : 3
b : 2
c : 1
d : 1
e : 0
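Counter works on any iterable of hashable items, not just strings, and it can report the most frequent entries directly. A short sketch with a made-up sentence:

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
word_counts = Counter(words)

# most_common(n) returns the n most frequent items as (item, count) pairs
top = word_counts.most_common(1)
print(top)  # [('the', 3)]

# Missing items have a count of 0 instead of raising KeyError
missing = word_counts["cat"]
print(missing)  # 0
```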

8. Format strings with f-Strings (Python 3.6+)

f-strings, also called “formatted string literals”, are a more Pythonic way to format strings, supported by Python 3.6+. They are generally faster, more readable, more concise, and less error-prone than older approaches such as % formatting and str.format().

As the name “f-string” suggests, they are string literals that have an f at the beginning and curly braces containing expressions that are replaced with their values at runtime and then formatted using the __format__ protocol.

f-strings can be used as follows:

name = "Eric"
age = 74
print(f"Hello, {name}. You are {age}.")

# Hello, Eric. You are 74.
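Because the values go through the __format__ protocol, a format specification can follow the expression after a colon. A brief sketch with illustrative values:

```python
price = 1234.5
ratio = 0.256
name = "Eric"

two_decimals = f"{price:.2f}"   # fixed number of decimal places
with_commas = f"{price:,.2f}"   # thousands separator
percent = f"{ratio:.1%}"        # multiply by 100 and append %
padded = f"{name:>10}"          # right-align in a field of width 10
computed = f"{2 + 3}"           # any expression is allowed inside the braces

print(two_decimals)  # 1234.50
print(with_commas)   # 1,234.50
print(percent)       # 25.6%
print(repr(padded))  # '      Eric'
print(computed)      # 5
```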

9. Concatenate strings with .join()

In Python, we can use the .join() method to concatenate a list of strings into a single string. The usual syntax for this method looks like this:

'String to insert'.join([List of strings])

It can be used in multiple ways: if you use the empty string '', the strings are simply concatenated, and if you use a comma, a comma-delimited string is created. When the newline character \n is used, a newline is inserted between consecutive strings (but not after the last one). See the example below:

l = ['aaa', 'bbb', 'ccc']

s = ''.join(l)
print(s)
# aaabbbccc

s = ','.join(l)
print(s)
# aaa,bbb,ccc

s = '-'.join(l)
print(s)
# aaa-bbb-ccc

s = '\n'.join(l)
print(s)
# aaa
# bbb
# ccc
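One thing to watch out for is that .join() only accepts strings. As a small sketch, the following shows the error raised for a list of integers and one way around it by converting each item first:

```python
numbers = [1, 2, 3]

# Joining non-string items raises a TypeError
try:
    ",".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Convert each item to a string first, here with a generator expression
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)  # 1,2,3
```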

10. Merge dictionaries with {**d1, **d2} (Python 3.5+)

The easiest way to merge dictionaries is by using the unpacking operator (**). The syntax for this method looks like this:

{**dict1, **dict2, **dict3}

Here’s an example to understand this method better:

d1 = {'k1': 1, 'k2': 2}
d2 = {'k3': 3, 'k4': 4}

print({**d1, **d2})
# {'k1': 1, 'k2': 2, 'k3': 3, 'k4': 4}
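When the dictionaries share a key, the value from the dictionary unpacked last wins, which makes this pattern handy for applying overrides on top of defaults. A minimal sketch with hypothetical settings:

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9000}

# Later dictionaries overwrite earlier ones for duplicate keys
merged = {**defaults, **overrides}
print(merged)  # {'host': 'localhost', 'port': 9000}

# The merge creates a new dictionary; the inputs are not modified
print(defaults)  # {'host': 'localhost', 'port': 8080}
```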

11. Simplify if-statements with if x in list

Assume we have a list with the primary colours red, green, and blue. Somewhere in our code, we have a new variable holding a colour, say c = "red", and we want to check whether it is one of our primary colours. Of course, we could compare it against each item individually as follows:

colors = ["red", "green", "blue"]

c = "red"

# cumbersome and error-prone
if c == "red" or c == "green" or c == "blue":
    print("is main color")

However, this quickly becomes cumbersome, and it is easy to make mistakes, such as a typo in "red". It is simpler and far preferable to just use the expression if x in list:

colors = ["red", "green", "blue"]

c = "red"

# better:
if c in colors:
    print("is main color")

Conclusion

Python is a widely used programming language and by using the above tips and tricks, you can become a better Python programmer.

I hope this article was helpful. Keep reading!

Git Internals Part 3: Understanding the staging area in Git

Software development is a messy and intensive process. In theory, it should be a linear, cumulative construction of functionality and improvements in code, but in practice it is far more complex. More often than not, it is a series of intertwined, non-linear threads of complex code, partly finished features, old legacy methods, collections of TODO comments, and other things common to any human-driven, largely hand-crafted process.

Git was built to make our lives easier when dealing with this messy and complex approach to software development. Git makes it possible to work on many features at once and decide exactly what you want to stage and commit to the repository. The staging area is where each commit is assembled, yet many developers know only a little about it.

In this article, we will discuss the staging area in Git, why it is a fundamental part of version control, and how it can be used effectively to make version control simpler.

What is the staging area?

To understand what the staging area is, let’s take a real-world example. Suppose that you are moving to another place and have to pack your stuff into boxes. You wouldn’t want to mix the items meant for the bathroom, kitchen, bedroom, and living room in the same box. So, you take a box and start putting stuff into it, and if an item doesn’t belong, you can take it out again before finally sealing the box and labeling it.

In this example, the open box serves as the staging area, where you do the work (crafting your commit); sealing and labeling the box corresponds to committing the code.

In technical terms, the staging area is the middle ground between what you have done to your files (the working directory) and what you last committed (the HEAD commit). As the name implies, the staging area gives you space to prepare (stage) the changes that will be reflected in the next commit. This adds some complexity to the process, but it also adds the flexibility to prepare commits selectively, since the staged changes can be modified several times before committing.

Assume you’re working on two files, but only one is ready to commit. You don’t want to be forced to commit both files, but only the one that is ready. This is where Git’s staging area comes in handy. We place files in a staging area before committing what has been staged. Even the deletion of a file must be recorded in Git’s history, therefore deleted files must be staged before being committed.
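The two-file scenario can be modelled in a few lines of Python. This is a toy model only, with made-up file names and plain dictionaries standing in for the working directory, the staging area, and the history; real Git stores hashed objects and works quite differently under the hood.

```python
# Toy model only: real Git stores hashed objects, not plain dictionaries.
working_dir = {"ready.txt": "finished work", "draft.txt": "half-done work"}
index = {}     # the staging area
history = []   # committed snapshots, oldest first


def add(path):
    """Stage the current content of one file (roughly `git add <file>`)."""
    index[path] = working_dir[path]


def reset(path):
    """Unstage a newly added file (roughly `git reset HEAD <file>`)."""
    index.pop(path, None)


def commit(message):
    """Record a snapshot of whatever is staged (roughly `git commit -m`)."""
    snapshot = {"message": message, "files": dict(index)}
    history.append(snapshot)
    return snapshot


add("ready.txt")
add("draft.txt")      # staged by accident
reset("draft.txt")    # take it out of the box again
first = commit("Add ready.txt")

print(sorted(first["files"]))      # ['ready.txt']
print("draft.txt" in working_dir)  # True: the file itself is untouched
```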

What are the Git commands for the staging area?

git add

The command used to stage any change in Git is git add. The git add command adds a modification to the staging area from the working directory. It informs Git that you wish to include changes to a specific file in the next commit. However, git add has little effect on the repository—changes are not truly recorded until you execute git commit.

The common options available along with this command are as follows:

You can specify a <file> from which all changes will be staged. The syntax would be as follows:

git add <file>

Similarly, you can specify a <directory> to stage all the changes inside it for the next commit:

git add <directory>

You can also use a . to add all the changes from the current directory, such as the following:

git add .

git status

The git status command is used to check the status of the files (untracked, modified, staged, or deleted) in the current branch. It can be used simply as follows:

git status

git reset

If you have accidentally staged a file or directory and want to unstage it, you can use the git reset command. It can be used as follows:

git reset HEAD example.html

git rm

If you remove files, they will appear as deleted in git status, and you must use git add to stage them. Another option is to use the git rm command, which deletes and stages files in a single command:

To remove a file (and stage it)

git rm example.html

To remove a folder (and stage it)

git rm -r myfolder 

git commit

The git commit command saves a snapshot of the currently staged changes in the project. Committed snapshots are “safe” versions of a project that Git will never alter unless you specifically ask it to.

Git may be considered a timeline management utility at a high level. Commits are the fundamental building blocks of a Git project timeline. Commits may be thought of as snapshots or milestones along a Git project’s history. Commits are produced with the git commit command to record the current status of a project.

Snapshots are always committed to the local repository, never directly to the remote repository. Just as the staging area serves as a buffer between the working directory and the project history, each developer’s local repository serves as a buffer between their contributions and the central repository.

The most common syntax followed to create a commit in git is as follows:

git commit -m "commit message"

The above commands and their functionalities can be summed up simply in the following image:

[Image: git commit -m "commit message"]

Conclusion

To summarize, git add is the first command in a series of commands that instructs Git to save a snapshot of the current project state into the commit history. When used alone, git add moves pending changes from the working directory to the staging area. The git status command examines the repository’s current state and can be used to confirm that git add did what you expected. To undo a git add, use the git reset command. The git commit command is then used to add a snapshot of the staging area to the commit history of the repository.

This is all for this article. We will discuss more Git Internals in the next article. Do let me know if you have any feedback or suggestions for this series.

If you want to read what we discussed in the earlier instalments of the series, you can find them below.

Git Internals Part 1- List of basic Concepts That Power your .git Directory here

Git Internals Part 2: How does Git store your data? here

Keep reading!

Git Internals Part 2: How does Git store your data?

In this article, we’ll be learning about the basics of the data storage mechanism for git. 

The most fundamental term we know regarding git and data storage is the repository. Let’s first understand what a git repository is and where it stands in terms of data storage in git.

Repositories

A git repository can be seen as a database containing all the information needed to retain and manage the revisions and history of a project. In git, repositories are used to retain a complete copy of the entire project throughout its lifetime. 

Git maintains a set of configuration values within each repository, such as the repository user’s name and email address. Unlike file data and other repository metadata, configuration settings are not propagated from one repository to another during a clone, fork, or any other duplication operation. Instead, git manages and stores configuration settings on a per-site, per-user, and per-repository basis.

Inside a git repository, there are two data structures – the object store and the index. All of this repository data is stored at the root of your working directory inside a hidden folder named .git. You can read more about what’s inside your .git folder here.

As part of the system that allows a fully distributed VCS, the object store is intended to be effectively replicated during a cloning process. The index is temporary data that is private to a repository and may be produced or edited as needed.

Let’s discuss the object store and the index in further depth in the next sections.

Git Object Types

The object store lies at the heart of git’s data storage mechanism. It contains your original data files, all the log messages, author information, and other information required to rebuild any version or branch of the project.

Git places the following 4 types of objects in its object store which form the foundation of git’s higher-level data structures:

  1. blobs
  2. trees
  3. commits
  4. tags

Let’s look a bit more closely at these object types:

Blobs

A blob represents each version of a file. “Blob” is an abbreviation for “binary large object,” a term used in computing to refer to a variable or file that may contain any data and whose underlying structure is disregarded by the application.

A blob is considered opaque: it contains the data of a file but no metadata, not even the file’s name.

Trees

A tree object represents a single level of directory data. It saves blob IDs, pathnames, and some metadata for all files in a directory. It may also recursively reference other (sub)tree objects, allowing it to construct a whole hierarchy of files and subdirectories.

Commits

Each change made to the repository is represented by a commit object, which contains metadata such as the author, commit date, and log message.

Each commit links to a tree object that records the state of the repository at the moment the commit was made, in a single full snapshot. The initial commit, also known as the root commit, has no parent; most of the commits that follow have a single parent, while merge commits have two or more.

Commits are arranged in a Directed Acyclic Graph. For those who missed it in Data Structures, it simply means that commits “flow” in one direction: each commit points back to its parent(s), and there are no cycles. This is usually just the trail of history for your repository, which might be very basic or rather complicated if you have branches.
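The parent links can be sketched as a small graph. The commit names below are hypothetical short labels (real commits are identified by SHA1 hashes), and the helper is only an illustration of walking the history backwards from a given commit:

```python
# Hypothetical history: each commit maps to the list of its parents.
commits = {
    "c1": [],            # root commit: no parents
    "c2": ["c1"],
    "c3": ["c2"],        # a commit on a feature branch
    "c4": ["c2"],        # a commit on the main branch
    "c5": ["c4", "c3"],  # merge commit: two parents
}


def ancestors(commit_id):
    """Return the commit and everything reachable through parent links."""
    seen = []
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(commits[current])
    return seen


feature_history = ancestors("c3")
merge_history = sorted(ancestors("c5"))
print(feature_history)  # ['c3', 'c2', 'c1']
print(merge_history)    # ['c1', 'c2', 'c3', 'c4', 'c5']
```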

Tags

A tag object gives a given object, generally a commit, an arbitrary but presumably human-readable name such as Ver-1.0-Alpha.

All of the information in the object store evolves and changes over time, tracking and modeling your project’s updates, additions, and deletions. To make better use of disk space and network traffic, Git compresses objects and stores them in pack files, which are also kept in the object store.

Index

The index is a transient and dynamic binary file that describes the whole repository’s directory structure. More specifically, the index captures a version of the general structure of the project at some point in time. The state of the project might be represented by a commit and a tree at any point in its history, or it could be a future state toward which you are actively building.

One of the primary characteristics of Git is the ability to change the contents of the index in logical, well-defined steps. The index separates incremental development steps from the committing of those changes.

How does git monitor object history?

The Git object store is organized and implemented as a content-addressable storage system. Specifically, each object in the object store has a unique name that is generated by applying SHA1 to the object’s contents, returning a SHA1 hash value.

Because the whole contents of an object contribute to the hash value, and because the hash value is thought to be functionally unique to that specific content, the SHA1 hash is a suitable index or identifier for that item in the object database. Any little modification to a file causes the SHA1 hash to change, resulting in the new version of the file being indexed separately.

For tracking history, Git keeps the full contents of each version of a file, not the differences between versions. The contents are then referenced by a 40-character SHA1 hash of the contents, which is almost certainly unique.

The fact that the SHA1 hash algorithm always computes the same ID for identical content, regardless of where that content resides, is a significant feature. In other words, the same file content in multiple folders or even on separate machines produces the same SHA1 hash ID. As a result, a file’s SHA1 hash ID is effectively a globally unique identifier.

Every object has an SHA, whether it’s a commit, tree, blob, or tag, so get to know them. Fortunately, the first seven characters are generally enough to identify an object unambiguously within a repository.

One fantastic benefit of saving only the content is that if you have two or more copies of the same file in your repository, Git will only save one internally.
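Content addressing can be sketched with hashlib. One detail not spelled out above: for a blob, Git hashes a short header of the form "blob <size>" followed by a NUL byte and then the file contents. The dictionary-based store below is a simplified stand-in for the real object store:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob with the given contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Simplified stand-in for the object store: ID -> contents
object_store = {}


def store(content: bytes) -> str:
    object_id = git_blob_id(content)
    object_store[object_id] = content  # identical content lands on the same key
    return object_id


id_a = store(b"hello\n")   # e.g. docs/a.txt
id_b = store(b"hello\n")   # a second file with the same content
id_c = store(b"hello!\n")  # a one-character change

print(id_a == id_b)        # True: same content, same ID
print(id_a == id_c)        # False: any change produces a new ID
print(len(object_store))   # 2: the duplicate is stored only once
print(git_blob_id(b""))    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The last line prints the well-known ID of the empty blob, which you can compare against the output of git hash-object on an empty file.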

Conclusion

In this article, we learned about the two primary data structures used by git to enable data storage, management, and history tracking. We also discussed the four object types and the different roles they play in git’s data storage mechanism.

This was all for this article, I hope you find it helpful. These are the fundamental components of Git as we know it today and use on a regular basis. We’ll be learning more about these Git internal concepts in the upcoming articles.
Keep reading. In case you want to connect with me, follow the links below:

LinkedIn | GitHub | Twitter | Dev

Git Internals Part 1- List of basic Concepts That Power your .git Directory

Git is the most popular and commonly used open-source version control system today. However, we rarely focus on the basic concepts that are the building blocks of this system.

In this article, we will learn about the basic concepts that power your .git directory.

The .git directory

Whenever we initialize a git repository, a .git directory gets created in the project’s root. This is the place where Git stores all its information. Digging a bit deeper you can see the directory structure as below:

$ ls -C .git
COMMIT_EDITMSG  MERGE_RR    config      hooks       info        objects     rr-cache
HEAD        ORIG_HEAD   description index       logs        refs

The detailed structure looks like the following:
.
|-- COMMIT_EDITMSG
|-- FETCH_HEAD
|-- HEAD
|-- ORIG_HEAD
|-- branches
|-- config
|-- description
|-- hooks
|   |-- applypatch-msg
|   |-- commit-msg
|   |-- post-commit
|   |-- post-receive
|   |-- post-update
|   |-- pre-applypatch
|   |-- pre-commit
|   |-- pre-rebase
|   |-- prepare-commit-msg
|   `-- update
|-- index
|-- info
|   `-- exclude
|-- logs
|   |-- HEAD
|   `-- refs
|-- objects
`-- refs
    |-- heads
    |-- remotes
    |-- stash
    `-- tags

Directories inside the .git directory

The .git directory consists of the following directories:

hooks:
This directory contains scripts that are executed at certain times when working with Git, such as after a commit or before a rebase.

info:
This directory contains the exclude file, which you can use to ignore files for this project. Unlike a .gitignore file, however, it is not versioned.

logs:
Contains the history of different branches. It is most commonly used with the git reflog command.

objects:
Git’s internal warehouse of objects (blobs, trees, commits, and tags), all indexed by SHAs. You can list them as follows:

$ ls -C .git/objects
09  24  28  45  59  6a  77  80  8c  97  af  c4  e7  info
11  27  43  56  69  6b  78  84  91  9c  b5  e4  fa  pack

These directory names are the first two characters of the SHA1 hashes of the objects stored in git.

You can dig a little further as follows:

$ ls -C .git/objects/09
6b74c56bfc6b40e754fc0725b8c70b2038b91e  9fb6f9d3a104feb32fcac22354c4d0e8a182c1

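These 38-character strings are the names of the files that contain the objects stored in git; together with the two-character directory name, they form the full 40-character SHA1 hash. The files are zlib-compressed (not encrypted), so you can’t read their contents by opening them directly, but the git cat-file -p command followed by the full hash will print an object for you.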
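The naming and compression scheme can be reproduced with the standard library. The sketch below builds a blob in memory rather than reading a real .git directory; the sample content is arbitrary, and the header layout ("blob", the size, then a NUL byte) is the one Git uses for loose blob objects:

```python
import hashlib
import zlib

content = b"test content\n"

# What Git hashes and stores: a small header followed by the raw content
raw = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
object_id = hashlib.sha1(raw).hexdigest()

# The first two characters name the directory, the other 38 name the file
relative_path = f"objects/{object_id[:2]}/{object_id[2:]}"

# On disk the object is zlib-compressed; decompressing recovers it
compressed = zlib.compress(raw)
header, _, body = zlib.decompress(compressed).partition(b"\0")

print(object_id)      # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(relative_path)  # objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
print(header)         # b'blob 13'
print(body)           # b'test content\n'
```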

rebase-apply: 

The workbench for git rebase. It contains all the information related to the changes that have to be rebased.

refs:

The master copy of all refs that live in your repository, be they for stashes, tags, remote-tracking branches, or local branches. 

You can see the existing refs in your .git directory as below:

$ ls .git/refs
heads
tags
$ ls .git/refs/heads
master
$ ls .git/refs/tags
v1
v1-beta
$ cat .git/refs/tags/v1
fa3c1411aa09441695a9e645d4371e8d749da1dc

Now, having discussed the directories inside the .git directory, let’s explore the files that reside inside the .git directory and their uses.

Files in the .git directory

1. COMMIT_EDITMSG:

This file contains the commit message of a commit in progress or of the last commit.

If git commit exits due to an error before creating a commit, any commit message that the user provided (e.g., in an editor session) is still available in this file, but it will be overwritten by the next invocation of git commit.

It’s there for your reference once you have made the commit and is not otherwise used by Git.

2. config:

This configuration file contains the settings for this repository. Project-specific configuration variables, including aliases, can be placed here.

$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
[user]
    name = Pragati Verma
    email = pragati.verma@gmail.com

This file is mostly used to define where the remote repository lives and some core settings, such as if your repository is bare or not.

3. description:

This description will appear when you see your repository or the list of all versioned repositories available while using Git web interfaces like gitweb or instaweb.

4. FETCH_HEAD:

FETCH_HEAD is a temporary ref that keeps track of what has recently been fetched from a remote repository. 

In most circumstances, git fetch is used first, which fetches a branch from the remote; FETCH_HEAD points to the branch’s tip (it stores the SHA1 of the commit, just as branches do). After that, git merge is used to merge FETCH_HEAD into the current branch.

5. HEAD:

HEAD is a symbolic reference pointing to wherever you are in your commit history. It’s the current ref that you’re looking at.

HEAD can point directly to a commit (a “detached HEAD”); typically, however, it points to a branch reference. HEAD is then attached to that branch, and when you do certain things (e.g., commit or reset), the attached branch moves along with HEAD. In most cases, it’s probably refs/heads/master. You can check it as follows:

$ cat .git/HEAD
ref: refs/heads/master
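The two forms the HEAD file can take are easy to tell apart programmatically. The helper below is a hypothetical illustration that works on the file’s text; it does not touch a real repository:

```python
def parse_head(head_text):
    """Classify the contents of a HEAD file.

    Returns ("branch", ref_name) when HEAD is attached to a branch,
    or ("detached", sha) when it holds a commit hash directly.
    """
    text = head_text.strip()
    prefix = "ref: "
    if text.startswith(prefix):
        return ("branch", text[len(prefix):])
    return ("detached", text)


attached = parse_head("ref: refs/heads/master\n")
detached = parse_head("fa3c1411aa09441695a9e645d4371e8d749da1dc\n")

print(attached)  # ('branch', 'refs/heads/master')
print(detached)  # ('detached', 'fa3c1411aa09441695a9e645d4371e8d749da1dc')
```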

6. ORIG_HEAD:

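When doing a merge, this is the SHA your current branch pointed to before the merge, i.e. the tip of the branch you’re merging into. Git also sets it before other operations that move HEAD drastically, such as a rebase or a reset, so that you can get back to the previous state.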

7. MERGE_HEAD:

When doing a merge, this is the SHA of the branch you’re merging from.

8. MERGE_MODE:

Used to communicate constraints that were originally given to git merge to git commit when a merge conflicts and a separate git commit is needed to conclude it.

9. MERGE_MSG:

Contains the commit message for the merge in progress, including a list of the conflicts that happened during your current merge.

10. index:

Git index refers to the “staging area” between the files you have on your filesystem and your commit history. It holds metadata such as timestamps, file names, and the SHAs of the files that are already tracked by Git.

When you execute git add, the files in your working directory are hashed and stored as objects in the object store, and the index is updated to point at them, making them “staged changes.”

11. packed-refs:

It addresses storage and performance issues by keeping many refs in a single file instead of one file per ref. When a ref is missing from the refs directory hierarchy, it is looked up in this file and used if it is found.
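The lookup order described above can be sketched as follows. The sample text, the placeholder hashes, and the function names are all hypothetical, and the parser assumes the usual layout of one "hash refname" pair per line, with lines starting with # or ^ carrying extra information that this sketch skips:

```python
# Hypothetical packed-refs contents with placeholder hashes.
PACKED_REFS_TEXT = """\
# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/heads/master
fa3c1411aa09441695a9e645d4371e8d749da1dc refs/tags/v1
"""


def parse_packed_refs(text):
    """Map ref names to hashes, skipping '#' and '^' lines."""
    refs = {}
    for line in text.splitlines():
        if not line or line.startswith(("#", "^")):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def resolve_ref(name, loose_refs, packed_refs):
    """Prefer a loose ref file; fall back to packed-refs if it is missing."""
    if name in loose_refs:
        return loose_refs[name]
    return packed_refs.get(name)


packed = parse_packed_refs(PACKED_REFS_TEXT)
# Stand-in for files under .git/refs: master has moved since packing
loose = {"refs/heads/master": "2" * 40}

master = resolve_ref("refs/heads/master", loose, packed)
tag_v1 = resolve_ref("refs/tags/v1", loose, packed)
unknown = resolve_ref("refs/tags/v2", loose, packed)

print(master)   # the loose value wins: 2222...2222
print(tag_v1)   # fa3c1411aa09441695a9e645d4371e8d749da1dc (from packed-refs)
print(unknown)  # None
```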

Conclusion

In this article, we covered a brief overview of the basic concepts that make up your git directory. These are the fundamental components of Git as we know it today and use on a regular basis. We’ll be learning more about these Git internal concepts in the upcoming articles.

Keep reading. In case you want to connect with me, follow the links below:

LinkedIn | GitHub | Twitter | Dev

Bio 

Pragati Verma is a software developer and open-source enthusiast. She has also been an active writer on various platforms and has written for many organizations as a freelance writer. As a Junior Editor at Hackernoon, Pragati helps numerous writers every day to publish their content on Hackernoon.

In her spare time, Pragati loves to read books or watch movies.