Purfe

Learn, create and have fun with Python

Find longest substring in alphabetical order with Python

Let’s find the longest substring in alphabetical order using Python!

Problem.
Given s is a string of letters containing no whitespaces, numbers or special characters, we need to write a program that prints the longest substring of s in which letters occur in alphabetical order. In the case of ties, output should be the first substring. Ignore the case.
Example: if s = “abcujawdeFgghgk” the output should be “deFggh”
TL;DR Jump to code

Step 1. What do we know?
We have a string as an input, we need to iterate through it checking for some conditions and we need to produce a string as an output.
We are going to define a variable *result* which will hold our final longest string in alphabetical order and assign it to empty string: result = ""

We also can write the return statement: return result
Additionally we can check if an input string is empty. In that case we do not need to do anything just return the string:
if s == "":
<code style="margin-left: 40px">return s

If the string contains any characters we assign the first character of s to temporary_result variable to prepare for comparison:
else:
<code style="margin-left: 40px">temporary_result = s[0]

Step 2. String normalisation.
Almost always when working with textual data we need to normalise it (find and remove all unwanted characters and whitespaces, unify case and so on).
As we are explicitly have been told the input is a string of letters, we do not need to check it for other characters. But what we do not know is the case of characters in a string. Is it upper, lower or both? So at this point, we might need to transform characters in the string to be comparable with each other. This is done by calling .lower() method on string.

Step 3. Looping with condition.
What else?
We know that strings in Python are lists (arrays) of bytes representing Unicode characters.

For introduction into Unicode, I highly recommend Joel Spolsky’s timeless (I assume) and hilarious (of that I am sure) The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) and Ned Batchelder’s specific to Python (2 and 3) Pragmatic Unicode presentation as post, YouTube video or presentation. These posts in a fun and comprehensive way explain Unicode and string encoding.

We need to compare successively each character of the string with the next. We can do it using a comparison operator (==) within a loop.
For our purpose iterating with list index looks like a logical choice.
With range() in length of string we can easily compare each character of the string with the next. To ensure we do not exceed the string boundaries, we reduce the range by 1 => range(len(s)-1)) (As we will be comparing each char (s[i]) with the next (s[i+1]).

Step 3a. Handle result
If characters appear in alphabetical order concatenate them to our result variable.
The last thing we need is another variable to store value to compare with. Let’s call it temporary_result

By now the program is doing the following:
– It checks and handles margin case of empty string
– It assigns temporary_result to the first character
– For every character in a string it compares its lowercase version with the next, and
– If the next char is greater it adds this character to temporary_result variable
– Else it compares the length of *result* with temporary_result
– If temporary_result length is greater, it assigns this value to result
– If they are equal the program does nothing (so in case of ties the output will be the first substring)
– In the else block it updates temporary_result to s[i+1] (to start next iteration of comparison)

def longest_alpha_substring(s):
    result = ''
    temporary_result = ''
    if s == '':
        return s
    else:
        temporary_result = s[0]
    for i in range(len(s)-1):
        if s[i].lower()<=s[i+1].lower():
            temporary_result+=s[i+1]
        else:
            if len(temporary_result)>len(result):
                result = temporary_result
            temporary_result = s[i+1]        
    if len(result)==0 or len(temporary_result)>len(result):
        result=temporary_result
    return result

We are now ready to test our function.
Step 4. Write tests
First, we need to check for boundary cases that is:
– all characters are in alphabetical order (s= ‘abcdefghijklMnopqrStuvwxyz’)
– no characters are in alphabetical order (s= ‘zyxwvu’)
– empty string (s=”)
– ties (s = ‘abcnop’)

For each case we need to define the correct output.
We can use a dictionary to pair string and the correct answer and check the output of the function against it.
In code it looks like this:

def test_in_alpha_order():
    test_dict = {'abcdefghijklMnopqrStuvwxyz': 'abcdefghijklMnopqrStuvwxyz', 
                    'x': 'x',
                    'acbABcam': 'ABc',
                    'zyxjKklOppba': 'jKklOpp',
                    'abcaznop': 'abc',
                    'zyxwvu': 'z',
                    '':''}
    for key in test_dict:
        func_res = longest_alpha_substring(key)
        if func_res == test_dict[key]:
            print(True)
        else:
            print(False, "the result is {}, expecting {}".format(func_res, test_dict[key]))

That’s it. Hope you enjoyed it. Let me know what you think in comments below. Join my telegram channel to get you daily slice of [Python]

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top