Kalanand's November 2016 Log
November 1st
Gradient descent vs stochastic gradient descent
Apparently, in (batch) gradient descent (GD) you need to run through ALL entries in the
training data to do a single parameter update in a particular iteration.
In stochastic GD (SGD), however, each parameter update uses just ONE training entry.
SGD often converges much faster than GD, but its updates are noisy, so the error
function is typically not minimized as precisely: the parameters tend to fluctuate
around a minimum rather than settle exactly on it.
See this Quora thread for some discussion on this topic.
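A minimal sketch of the difference, assuming a toy one-parameter model y = w*x with squared error. The function names, learning rate, and data below are made up for illustration and are not from any particular library.

```python
import random

def gradient(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def batch_gd(data, lr=0.5, epochs=50):
    """Batch GD: each update averages the gradient over ALL entries."""
    w = 0.0
    for _ in range(epochs):
        g = sum(gradient(w, x, y) for x, y in data) / len(data)
        w -= lr * g  # one update per pass over the data
    return w

def sgd(data, lr=0.5, epochs=50, seed=0):
    """SGD: each update uses the gradient of ONE entry, visited in random order."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for x, y in shuffled:
            w -= lr * gradient(w, x, y)  # len(data) updates per pass
    return w

# Toy, noise-free data on the line y = 2x, so the best w is 2.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 11)]
w_gd = batch_gd(data)
w_sgd = sgd(data)
```

After a single pass over the data (epochs=1), sgd has already made ten updates while batch_gd has made one, which is where the speed advantage comes from. With noisy data, the sgd estimate would keep jittering around the minimum unless the learning rate is decreased over time.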
November 3rd
Adding line number to a file
Often I want to use the line number in a data file as an index.
For this I first need to save the line number in each line as a separate field,
specifically as the very first field. Here is one way to do this (NR-1 makes the numbering start at 0).
awk '{printf "%s,%s\n",NR-1,$0}' myfile.csv > linenum_myfile.csv
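The same idea can be sketched in Python, in case awk is not at hand. The helper name add_line_numbers and the sample rows are hypothetical. Note that if the file has a header row, the header gets index 0, just as with the awk one-liner.

```python
def add_line_numbers(lines, start=0):
    """Prefix each line with its index and a comma (zero-based, like NR-1 in awk)."""
    return ["%d,%s" % (i, line.rstrip("\n")) for i, line in enumerate(lines, start)]

# Made-up sample rows standing in for the contents of a CSV file.
sample = ["alice,smith\n", "bob,jones\n"]
numbered = add_line_numbers(sample)

# For a real file (the file names are only placeholders):
# with open("myfile.csv") as src, open("linenum_myfile.csv", "w") as dst:
#     dst.write("\n".join(add_line_numbers(src)) + "\n")
```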
Saving unique values from a pandas dataframe column into a list − based on some other column being (non-)null
I have a CSV file containing the following fields: 'first_name', 'last_name', 'street_addr', 'zip_code', 'comment'.
I want to save the list of all unique zip codes to a text file, one per line, but only for rows where the comment field is empty.
Here is what I ended up doing.
>>> import pandas as pd
>>> df = pd.read_csv('name_address.csv')
>>> df1 = df[pd.isnull(df.comment)]
>>> df1.shape
(100, 5)
>>> my_list = df1['zip_code'].tolist()
>>> sorted_uniq = sorted(list(set(my_list)))
>>> len(sorted_uniq)
1391
>>> # Write one zip code per line; str() guards against numeric zip codes
>>> with open('zip_codes.text', 'w') as f:
...     print('\n'.join(map(str, sorted_uniq)), file=f)
...
>>> ^D
Last modified: Tue Nov 3 15:15:28 PDT 2016