commit fdb5797ee0b601826269d775511e22db786587cb Author: Juhani Krekelä Date: Thu May 30 22:36:17 2019 +0300 First commit diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..15c993e --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +__pycache__ +*.pyc +*.swp diff --git a/CC0 b/CC0 new file mode 100644 index 0000000..670154e --- /dev/null +++ b/CC0 @@ -0,0 +1,116 @@ +CC0 1.0 Universal + +Statement of Purpose + +The laws of most jurisdictions throughout the world automatically confer +exclusive Copyright and Related Rights (defined below) upon the creator and +subsequent owner(s) (each and all, an "owner") of an original work of +authorship and/or a database (each, a "Work"). + +Certain owners wish to permanently relinquish those rights to a Work for the +purpose of contributing to a commons of creative, cultural and scientific +works ("Commons") that the public can reliably and without fear of later +claims of infringement build upon, modify, incorporate in other works, reuse +and redistribute as freely as possible in any form whatsoever and for any +purposes, including without limitation commercial purposes. These owners may +contribute to the Commons to promote the ideal of a free culture and the +further production of creative, cultural and scientific works, or to gain +reputation or greater distribution for their Work in part through the use and +efforts of others. + +For these and/or other purposes and motivations, and without any expectation +of additional consideration or compensation, the person associating CC0 with a +Work (the "Affirmer"), to the extent that he or she is an owner of Copyright +and Related Rights in the Work, voluntarily elects to apply CC0 to the Work +and publicly distribute the Work under its terms, with knowledge of his or her +Copyright and Related Rights in the Work and the meaning and intended legal +effect of CC0 on those rights. + +1. Copyright and Related Rights. 
A Work made available under CC0 may be +protected by copyright and related or neighboring rights ("Copyright and +Related Rights"). Copyright and Related Rights include, but are not limited +to, the following: + + i. the right to reproduce, adapt, distribute, perform, display, communicate, + and translate a Work; + + ii. moral rights retained by the original author(s) and/or performer(s); + + iii. publicity and privacy rights pertaining to a person's image or likeness + depicted in a Work; + + iv. rights protecting against unfair competition in regards to a Work, + subject to the limitations in paragraph 4(a), below; + + v. rights protecting the extraction, dissemination, use and reuse of data in + a Work; + + vi. database rights (such as those arising under Directive 96/9/EC of the + European Parliament and of the Council of 11 March 1996 on the legal + protection of databases, and under any national implementation thereof, + including any amended or successor version of such directive); and + + vii. other similar, equivalent or corresponding rights throughout the world + based on applicable law or treaty, and any national implementations thereof. + +2. Waiver. To the greatest extent permitted by, but not in contravention of, +applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and +unconditionally waives, abandons, and surrenders all of Affirmer's Copyright +and Related Rights and associated claims and causes of action, whether now +known or unknown (including existing as well as future claims and causes of +action), in the Work (i) in all territories worldwide, (ii) for the maximum +duration provided by applicable law or treaty (including future time +extensions), (iii) in any current or future medium and for any number of +copies, and (iv) for any purpose whatsoever, including without limitation +commercial, advertising or promotional purposes (the "Waiver"). 
Affirmer makes +the Waiver for the benefit of each member of the public at large and to the +detriment of Affirmer's heirs and successors, fully intending that such Waiver +shall not be subject to revocation, rescission, cancellation, termination, or +any other legal or equitable action to disrupt the quiet enjoyment of the Work +by the public as contemplated by Affirmer's express Statement of Purpose. + +3. Public License Fallback. Should any part of the Waiver for any reason be +judged legally invalid or ineffective under applicable law, then the Waiver +shall be preserved to the maximum extent permitted taking into account +Affirmer's express Statement of Purpose. In addition, to the extent the Waiver +is so judged Affirmer hereby grants to each affected person a royalty-free, +non transferable, non sublicensable, non exclusive, irrevocable and +unconditional license to exercise Affirmer's Copyright and Related Rights in +the Work (i) in all territories worldwide, (ii) for the maximum duration +provided by applicable law or treaty (including future time extensions), (iii) +in any current or future medium and for any number of copies, and (iv) for any +purpose whatsoever, including without limitation commercial, advertising or +promotional purposes (the "License"). The License shall be deemed effective as +of the date CC0 was applied by Affirmer to the Work. Should any part of the +License for any reason be judged legally invalid or ineffective under +applicable law, such partial invalidity or ineffectiveness shall not +invalidate the remainder of the License, and in such case Affirmer hereby +affirms that he or she will not (i) exercise any of his or her remaining +Copyright and Related Rights in the Work or (ii) assert any associated claims +and causes of action with respect to the Work, in either case contrary to +Affirmer's express Statement of Purpose. + +4. Limitations and Disclaimers. + + a. 
No trademark or patent rights held by Affirmer are waived, abandoned, + surrendered, licensed or otherwise affected by this document. + + b. Affirmer offers the Work as-is and makes no representations or warranties + of any kind concerning the Work, express, implied, statutory or otherwise, + including without limitation warranties of title, merchantability, fitness + for a particular purpose, non infringement, or the absence of latent or + other defects, accuracy, or the present or absence of errors, whether or not + discoverable, all to the greatest extent permissible under applicable law. + + c. Affirmer disclaims responsibility for clearing rights of other persons + that may apply to the Work or any use thereof, including without limitation + any person's Copyright and Related Rights in the Work. Further, Affirmer + disclaims responsibility for obtaining any necessary consents, permissions + or other rights required for any use of the Work. + + d. Affirmer understands and acknowledges that Creative Commons is not a + party to this document and has no duty or obligation with respect to this + CC0 or use of the Work. 
+ +For more information, please see + diff --git a/nfa_to_regex.py b/nfa_to_regex.py new file mode 100644 index 0000000..3f89576 --- /dev/null +++ b/nfa_to_regex.py @@ -0,0 +1,165 @@ +import enum +from collections import namedtuple + +from regex import lit, concat, bar, star + +NFA = namedtuple('NFA', ['start', 'accept', 'transitions']) + +def copy_nfa(nfa): + transitions_copy = {} + for from_state in nfa.transitions: + transitions_copy[from_state] = nfa.transitions[from_state].copy() + + return NFA(nfa.start, nfa.accept, transitions_copy) + +def remove_states(nfa): + start, accept, transitions = nfa + states = transitions.keys() + + states_to_remove = [i for i in states if i != start and i not in accept] + + while len(states_to_remove) > 0: + # Select a state to remove this round + removed_state = states_to_remove.pop() + print('\nRemoving state:', removed_state)#debg + + # Remove loops from this state back into itself + if removed_state in transitions[removed_state]: + loop_condition = transitions[removed_state][removed_state] + del transitions[removed_state][removed_state] + + # Prepend (condition)* to all transitions leading out + # of this state + for to_state in transitions[removed_state]: + condition = transitions[removed_state][to_state] + transitions[removed_state][to_state] = concat(star(loop_condition), condition) + + print(); prettyprint(nfa)#debg + + # Rewrite all transitions A→this→B as A→B transitions + # + # If the condition A→this is foo and this→B is bar, the + # condition for A→B becomes simply foobar + # + # Since we've removed all loops back into this state, this + # results in there being no transitions into this state + for from_state in transitions: + if removed_state in transitions[from_state]: + # Create a list of new transitions to add to the + # transition table for from_state + new_transitions = {} + condition_to_here = transitions[from_state][removed_state] + for to_state in transitions[removed_state]: + condition_from_here = 
transitions[removed_state][to_state] + new_transitions[to_state] = concat(condition_to_here, condition_from_here) + + # Remove the transition to the state being deleted + del transitions[from_state][removed_state] + + # Add the new transitions + # Since they may lead to the same place as + # already-existing transitions, we may need to + # combine the conditions with pre-existing ones + for to_state in new_transitions: + if to_state in transitions[from_state]: + # Already a transition leading + # to the same state + # If its condition is foo and + # ours is bar, then the new + # condition will be foo|bar + other_condition = transitions[from_state][to_state] + our_condition = new_transitions[to_state] + transitions[from_state][to_state] = bar(other_condition, our_condition) + + else: + # No pre-existing transition + transitions[from_state][to_state] = new_transitions[to_state] + + # Finally, remove the state we no longer need + del transitions[removed_state] + + print(); prettyprint(nfa)#debg + + return NFA(start, accept, transitions) + +def to_regex(nfa): + # Rewrite the NFA so that there are no transitions leading in to the + # start state or any leading out of an accept state. The easy way to + # do this is by creating a new start state that leads to the old one + # with empty condition (i.e. it consumes no input), and creating a new + # accept state that has similar empty condition transitions from all + # the old ones. 
Since we have an NFA and not a DFA, that operation is + # safe + # + # As a bonus, this rewrite gives us two useful properties: + # a) There is exactly one start state and one accept state + # b) After running remove_states() there will be only one transition, + # that of start to accept + # + # S + class _(enum.Enum): start, end = range(2) + + start, accept, transitions = copy_nfa(nfa) + + # Add new start state + transitions[_.start] = {start: lit('')} + + # Add new accept state and transitions to it + transitions[_.end] = {} + for state in accept: + transitions[state][_.end] = lit('') + + # Package everything into a new NFA + nfa = NFA(_.start, [_.end], transitions) + + print();prettyprint(nfa)#debg + + processed = remove_states(nfa) + + return processed.transitions[_.start][_.end] + +def prettyprint(nfa): + def process_state(state): + nonlocal start, accept + + t = '' + if state == start: + # Bold + t += '\x1b[1m' + if state in accept: + # Green + t += '\x1b[32m' + + if t != '': + return t + str(state) + '\x1b[0m' + else: + return str(state) + + start, accept, transitions = nfa + states = transitions.keys() + + print('\t' + '\t'.join(map(process_state, states))) + for from_state in states: + t = [] + for to_state in states: + if to_state in transitions[from_state]: + t.append(str(transitions[from_state][to_state])) + else: + t.append('\x1b[90m-\x1b[0m') + + print(process_state(from_state) + '\t' + '\t'.join(t)) + +def main(): + nfa = NFA('start', ['end'], { + 'start': {'0': lit('s')}, + '0': {'0': lit('0'), '1': lit('1'), 'end': lit('e'), 'start': lit('r')}, + '1': {'0': lit('1'), '1': lit('0'), 'start': lit('r')}, + 'end': {'end': lit('e'), 'start': lit('n')} + }) + + prettyprint(nfa) + + print(to_regex(nfa)) + +if __name__ == '__main__': + main() diff --git a/regex.py b/regex.py new file mode 100644 index 0000000..0de89ec --- /dev/null +++ b/regex.py @@ -0,0 +1,118 @@ +class Literal: + def __init__(self, text): + self.text = text + self.single_char = len(text)
== 1 + + def __repr__(self): + return 'Literal(%s)' % repr(self.text) + + def __str__(self): + # ERE-style quotation rules + # A-Za-z0-9 and space are safe, as are non-ASCII + # Everything else is safe to quote with backslash + return ''.join( + char if ord('A') <= ord(char) <= ord('Z') else + char if ord('a') <= ord(char) <= ord('z') else + char if ord('0') <= ord(char) <= ord('9') else + char if char == ' ' else + char if ord(char) >= 128 else # Non-ASCII + '\\' + char # Quote + + for char in self.text + ) + +class Concatenation: + def __init__(self, *elements): + self.elements = elements + + def __repr__(self): + return 'Concatenation(%s)' % ', '.join(map(repr, self.elements)) + + def __str__(self): + return ''.join( + # Only alternation binds looser than concatenation, + # so we can pass everything else through with + # no parenthesizing + str(element) if type(element) != Alternation else + '(' + str(element) + ')' + + for element in self.elements + ) + +class Alternation: + def __init__(self, *elements): + self.elements = elements + + def __repr__(self): + return 'Alternation(%s)' % ', '.join(map(repr, self.elements)) + + def __str__(self): + if all(type(i) == Literal and i.single_char for i in self.elements): + # Special case: [abc] + return '[%s]' % ''.join(map(str, self.elements)) + else: + # Nothing binds looser than alternation, so just + # pass everything through as-is + return '|'.join(map(str, self.elements)) + +class Star: + def __init__(self, element): + self.element = element + + def __repr__(self): + return 'Star(%s)' % repr(self.element) + + def __str__(self): + # * applies to the previous character or a parenthesized + # group.
Therefore, we parenthesize unless we have a Literal + # and it is one-char long + if type(self.element) == Literal and self.element.single_char: + return str(self.element) + '*' + else: + return '(%s)*' % str(self.element) + +def lit(text): + return Literal(text) + +def concat(*elements): + flattened = [] + for element in elements: + if type(element) == Concatenation: + flattened.extend(element.elements) + else: + flattened.append(element) + + combined = [] + for element in flattened: + if len(combined) > 0 and type(combined[-1]) == Literal and type(element) == Literal: + # Combine two literals next to each other + # into one literal + previous = combined.pop() + combined.append(Literal(previous.text + element.text)) + + else: + combined.append(element) + + if len(combined) == 1: + element, = combined + return element + else: + return Concatenation(*combined) + +def bar(*elements): + # TODO: rewrite (foo|foo) → foo + flattened = [] + for element in elements: + if type(element) == Alternation: + flattened.extend(element.elements) + else: + flattened.append(element) + + if len(flattened) == 1: + element, = flattened + return element + else: + return Alternation(*flattened) + +def star(element): + return Star(element)
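The comments in remove_states() and to_regex() describe the state elimination procedure: add a fresh start and accept state joined by empty conditions, then remove the remaining states one at a time, turning a self-loop into (loop)*, each A→removed→B path into a concatenation, and parallel edges into an alternation. The following is a minimal standalone sketch of that idea and is not the code in this commit. It assumes plain strings in Python `re` syntax in place of the regex.py node classes, and the names `group`, `eliminate_states`, `NEW_START`, `NEW_END` and `EVEN_A` are hypothetical, introduced only for illustration.

```python
import itertools
import re

def group(cond):
    """Wrap a condition in a non-capturing group; '' (consume nothing) stays ''."""
    return '(?:%s)' % cond if cond else ''

def eliminate_states(start, accept, transitions):
    """Return a regex string for the NFA, or None if accept is unreachable."""
    # Copy the table so the caller's NFA is left untouched.
    table = {state: dict(edges) for state, edges in transitions.items()}

    # Hypothetical fresh states: nothing leads into NEW_START and
    # nothing leads out of NEW_END.
    NEW_START, NEW_END = object(), object()
    table[NEW_START] = {start: ''}
    table[NEW_END] = {}
    for state in accept:
        table.setdefault(state, {})[NEW_END] = ''

    for removed in [s for s in table if s is not NEW_START and s is not NEW_END]:
        outgoing = table.pop(removed)
        loop = outgoing.pop(removed, None)
        # A self-loop becomes (loop)* in front of every outgoing condition.
        prefix = group(loop) + '*' if loop else ''

        for edges in table.values():
            if removed not in edges:
                continue
            incoming = edges.pop(removed)
            for target, condition in outgoing.items():
                bypass = group(incoming) + prefix + group(condition)
                if target in edges:
                    # Parallel edges are merged with alternation.
                    edges[target] = '%s|%s' % (edges[target], bypass)
                else:
                    edges[target] = bypass

    return table[NEW_START].get(NEW_END)

# Example automaton: words over {a, b} containing an even number of 'a'.
EVEN_A = eliminate_states('even', ['even'], {
    'even': {'even': 'b', 'odd': 'a'},
    'odd': {'odd': 'b', 'even': 'a'},
})

# Compare the generated regex with a direct count on every word up to length 5.
for n in range(6):
    for chars in itertools.product('ab', repeat=n):
        word = ''.join(chars)
        assert bool(re.fullmatch(EVEN_A, word)) == (word.count('a') % 2 == 0)
```

The sketch groups every condition defensively, so its output is much noisier than what the Literal, Concatenation, Alternation and Star classes print. It also removes states in insertion order, whereas remove_states() pops from the end of its list. The order changes the shape of the resulting expression but not the language it matches.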