Kalanand's May 2014 Log

   April 2014   
May 2014
SuMoTuWeThFrSa
    123
45678910
11121314151617
18192021222324
25262728293031
   June 2014   

May 5th

Remove redundant parentheses

I recently encountered an interesting practical problem. I have several mathematical expressions like (a + (b*c)) * (d * ( f * j) )
in a large file and I want to remove the redundant parentheses. So, the desired output in this particular instance should be: (a + b * c) *d * f * j .
How can we do this in an automated way ?


The easiest way I can think of is the commandline sed utility:
echo "(a + (b*c)) * (d * ( f * j) )" | sed 's/[ ]//g' | sed 's/([a-z]\*[a-z])/{&}/g' | sed 's/{(//g; s/)}//g' | sed 's/([a-z]\*[a-z]\*[a-z])/{&}/g' | sed 's/{(//g; s/)}//g'
Output: (a+b*c)*d*f*j

I suspect there is a simpler way, but it took me five successive invocations of sed command to get the desired output.
The first invocation gets rid of all the blank spaces.
echo "(a + (b*c)) * (d * ( f * j) )" | sed 's/[ ]//g'

(a+(b*c))*(d*(f*j))
The second invocation puts curly brackets around expressions like (?*?).
echo "(a + (b*c)) * (d * ( f * j) )" | sed 's/[ ]//g' | sed 's/([a-z]\*[a-z])/{&}/g'

(a+{(b*c)})*(d*{(f*j)})
The third invocation removes the combination of parentheses and curly brackets around ?*?.
echo "(a + (b*c)) * (d * ( f * j) )" | sed 's/[ ]//g' | sed 's/([a-z]\*[a-z])/{&}/g' | sed 's/{(//g; s/)}//g'

(a+b*c)*(d*f*j)
We are still left with expressions like ?*?*?. So we need to repeat the previous two steps, and then we are done. Clearly, the above solution is not scalable if we also have curly and angular brackets in the mix. In that case we do need to use a regular programming language. A pair of parentheses (or brackets) is necessary if and only if they enclose an unparenthesized expression of the form X % X % ... % X, where X are either parenthesized expressions or atoms, and % are binary operators, and if at least one of the operators % has lower precedence than an operator attached directly to the parenthesized expression on either side of it.

So, we can check any pair of matching parentheses directly for redundancy by the following algorithm: Here is a python code to implement this:
#!/usr/bin/python

def compare(operator1, operator2): 
    list1 = ['*', '/', '%']
    list2 = ['+', '-', '(', ')', '{', '}', '[', ']']
    priority = 0
    if operator1 in list1 and operator2 in list1: priority = 0
    if operator1 in list1 and operator2 in list2: priority = 1
    if operator1 in list2 and operator2 in list1: priority = -1
    if operator1 in list2 and operator2 in list2: priority = 0
    return priority


def findLowestPriorityOperator(operators):
    lowest = operators[0]
    for operator in operators:
        if compare(operator, lowest) < 0:  lowest = operator
    return lowest


def getOperators(string, start, end):
    operators = []
    symbols = ['+', '-', '*', '/', '%', '(', ')', '{', '}', '[', ']']
    for symbol in symbols:
        if(symbol in string): operators.extend(symbol)
    return operators



def removeRedundantParentheses(str, symbol1, symbol2):

    str = str.replace(' ', '')  # get rid of white space 
    if not symbol1 in str or not symbol2 in str: 
        return str # do nothing

    size = len(str)
    open    = str.rfind(symbol1)
    close = str.find(symbol2, open, size)

    # First check the operators outside the parentheses
    outer1 = '', 
    if open != 0: outer1 = str[open-1]
    outer2 = ''
    if close != size-1: outer2 = str[close+1]

    # Now get the list of operators inside the parentheses
    ref = getOperators(str, open, close) 
    lowest = findLowestPriorityOperator(ref) # find lowest priority 

    # Check if redundant
    isRedundant = False
    if  outer1 == '' and outer2 == '': isRedundant = True
    elif compare(lowest, outer1)>=0 and compare(lowest, outer2)>=0: 
        isRedundant = True

    # If redundant, we need to check recursively 
    if not isRedundant: return str
    str = str[0:open] + str[open+1: close] + str[close+1: size]
    str = removeRedundantParentheses(str, symbol1, symbol2)
    return str


def removeRedundant(str):
    str = removeRedundantParentheses(str, '[', ']')
    str = removeRedundantParentheses(str, '{', '}')
    str = removeRedundantParentheses(str, '(', ')')
    return str


# Try with a few complicated expressions
print removeRedundant('a * ( { [( b+ c )] } )')
print removeRedundant('(a+(((b-c)))*d)')
print removeRedundant('((a+b))*c')
print removeRedundant('((a+b)/(c+d))')
print removeRedundant('(a+(((b-c)))*d)')
The output is
a*(b+c)
(a+(b-c)*d)
(a+b)*c
((a+b)/(c+d))
(a+(b-c)*d)

Go to April's log


Last modified: Wed May 7 19:00:01 CST 2014