Kalanand's November 2016 Log

November 1st

Gradient descent vs stochastic gradient descent

Apparently, in (batch) gradient descent (GD) you need to run through ALL entries in the training data to make a single parameter update. In stochastic gradient descent (SGD), however, each update uses just ONE training entry. SGD therefore usually makes progress much faster than GD, but because every step follows a noisy estimate of the gradient, the error function is typically not minimized as precisely: the parameters tend to bounce around the minimum rather than settle on it.

See this Quora thread for some discussion on this topic.
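To make the difference concrete, here is a minimal sketch on a toy linear regression problem. Everything in it is an assumption for illustration: the made-up data (y = 3x + 2 plus noise), the learning rate, the number of passes, and the function names. It is not taken from the thread.

```python
import random


def make_data(n=200, seed=0):
    """Toy data for y = 3x + 2 plus a little noise (made up for illustration)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def batch_gd(xs, ys, lr=0.1, epochs=5):
    """GD: one update per pass, gradient averaged over ALL entries."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def sgd(xs, ys, lr=0.1, epochs=5, seed=1):
    """SGD: one update per entry, gradient taken from ONE example."""
    rng = random.Random(seed)
    w = b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the entries in a fresh random order
        for i in order:
            err = w * xs[i] + b - ys[i]
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
    return w, b


xs, ys = make_data()
w_gd, b_gd = batch_gd(xs, ys)
w_sgd, b_sgd = sgd(xs, ys)
print("GD :", w_gd, b_gd, mse(w_gd, b_gd, xs, ys))
print("SGD:", w_sgd, b_sgd, mse(w_sgd, b_sgd, xs, ys))
```

With the same number of passes over the data, GD has made only 5 updates while SGD has made 5 × 200, which is why SGD ends up much closer to the true parameters here. Given many more passes, GD would settle smoothly while SGD would keep jittering unless the learning rate is decreased.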

November 3rd

Adding line numbers to a file

Often I want to use the line number in a data file as an index. For this I first need to save the line number in each line as a separate field, namely the very first field in the line. Here is one way to do this (NR-1 makes the numbering start at 0 instead of 1).

awk '{printf "%s,%s\n",NR-1,$0}' myfile.csv > linenum_myfile.csv
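For when I am already in Python, here is a rough equivalent of the awk one-liner above. The function name add_line_numbers and its start parameter are my own invention for this sketch; the file names are the ones used above.

```python
def add_line_numbers(src_path, dst_path, start=0):
    """Copy src_path to dst_path, prefixing each line with 'N,'.

    Numbering starts at `start` (0 by default, matching NR-1 in awk).
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for n, line in enumerate(src, start=start):
            text = line.rstrip("\n")  # drop the newline, re-add it below
            dst.write(f"{n},{text}\n")


# Usage (assumes myfile.csv exists in the current directory):
# add_line_numbers("myfile.csv", "linenum_myfile.csv")
```

Like awk, this always ends the last output line with a newline, even if the input file did not.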

Saving unique values from a pandas dataframe column into a list, based on some other column being (non-)null

I have a CSV file containing the following fields: 'first_name', 'last_name', 'street_addr', 'zip_code', 'comment'. I want to save the list of all unique zip codes as separate lines in a text file, but only from rows where the comment field is empty. Here is what I ended up doing.

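>>> import pandas as pd
>>> # Read zip codes as strings so leading zeros survive and '\n'.join() works
>>> df = pd.read_csv('name_address.csv', dtype={'zip_code': str})
>>> df1 = df[df['comment'].isnull()]  # keep rows whose comment is empty (NaN)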
>>> df1.shape
(100, 5)
>>> # dropna() skips rows with no zip code; unique() removes duplicates
>>> sorted_uniq = sorted(df1['zip_code'].dropna().unique())
>>> len(sorted_uniq)
1391
>>> # Write one zip code per line; str() guards against numeric zip codes
>>> with open('zip_codes.text', 'w') as f:
...     print('\n'.join(str(z) for z in sorted_uniq), file=f)
...
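If pandas is not at hand, the same filter can be sketched with the standard csv module instead. This is a swapped-in alternative, not what I ran above; the function names are hypothetical, and it only treats empty or whitespace-only comment fields as "empty" (pandas also treats markers such as NA as null).

```python
import csv


def unique_zips_without_comment(csv_path):
    """Return sorted unique zip codes from rows whose 'comment' field is empty."""
    zips = set()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Missing fields come back as None, so fall back to ''
            comment = (row.get("comment") or "").strip()
            zip_code = (row.get("zip_code") or "").strip()
            if not comment and zip_code:
                zips.add(zip_code)
    return sorted(zips)


def write_lines(values, out_path):
    """Write each value on its own line."""
    with open(out_path, "w") as f:
        f.writelines(f"{v}\n" for v in values)


# Usage (assumes name_address.csv exists in the current directory):
# write_lines(unique_zips_without_comment("name_address.csv"), "zip_codes.text")
```

Zip codes stay strings here, so the sort is lexicographic and leading zeros are preserved.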

Last modified: Tue Nov 3 15:15:28 PDT 2016