First commit

Juhani Krekelä 2019-05-30 22:36:17 +03:00
commit fdb5797ee0
4 changed files with 402 additions and 0 deletions

3
.gitignore vendored Normal file

@@ -0,0 +1,3 @@
__pycache__
*.pyc
*.swp

116
CC0 Normal file

@@ -0,0 +1,116 @@
CC0 1.0 Universal
Statement of Purpose
The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator and
subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").
Certain owners wish to permanently relinquish those rights to a Work for the
purpose of contributing to a commons of creative, cultural and scientific
works ("Commons") that the public can reliably and without fear of later
claims of infringement build upon, modify, incorporate in other works, reuse
and redistribute as freely as possible in any form whatsoever and for any
purposes, including without limitation commercial purposes. These owners may
contribute to the Commons to promote the ideal of a free culture and the
further production of creative, cultural and scientific works, or to gain
reputation or greater distribution for their Work in part through the use and
efforts of others.
For these and/or other purposes and motivations, and without any expectation
of additional consideration or compensation, the person associating CC0 with a
Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
and publicly distribute the Work under its terms, with knowledge of his or her
Copyright and Related Rights in the Work and the meaning and intended legal
effect of CC0 on those rights.
1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not limited
to, the following:
i. the right to reproduce, adapt, distribute, perform, display, communicate,
and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or likeness
depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data in
a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation thereof,
including any amended or successor version of such directive); and
vii. other similar, equivalent or corresponding rights throughout the world
based on applicable law or treaty, and any national implementations thereof.
2. Waiver. To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
the Waiver for the benefit of each member of the public at large and to the
detriment of Affirmer's heirs and successors, fully intending that such Waiver
shall not be subject to revocation, rescission, cancellation, termination, or
any other legal or equitable action to disrupt the quiet enjoyment of the Work
by the public as contemplated by Affirmer's express Statement of Purpose.
3. Public License Fallback. Should any part of the Waiver for any reason be
judged legally invalid or ineffective under applicable law, then the Waiver
shall be preserved to the maximum extent permitted taking into account
Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
is so judged Affirmer hereby grants to each affected person a royalty-free,
non transferable, non sublicensable, non exclusive, irrevocable and
unconditional license to exercise Affirmer's Copyright and Related Rights in
the Work (i) in all territories worldwide, (ii) for the maximum duration
provided by applicable law or treaty (including future time extensions), (iii)
in any current or future medium and for any number of copies, and (iv) for any
purpose whatsoever, including without limitation commercial, advertising or
promotional purposes (the "License"). The License shall be deemed effective as
of the date CC0 was applied by Affirmer to the Work. Should any part of the
License for any reason be judged legally invalid or ineffective under
applicable law, such partial invalidity or ineffectiveness shall not
invalidate the remainder of the License, and in such case Affirmer hereby
affirms that he or she will not (i) exercise any of his or her remaining
Copyright and Related Rights in the Work or (ii) assert any associated claims
and causes of action with respect to the Work, in either case contrary to
Affirmer's express Statement of Purpose.
4. Limitations and Disclaimers.
a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or warranties
of any kind concerning the Work, express, implied, statutory or otherwise,
including without limitation warranties of title, merchantability, fitness
for a particular purpose, non infringement, or the absence of latent or
other defects, accuracy, or the present or absence of errors, whether or not
discoverable, all to the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without limitation
any person's Copyright and Related Rights in the Work. Further, Affirmer
disclaims responsibility for obtaining any necessary consents, permissions
or other rights required for any use of the Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to this
CC0 or use of the Work.
For more information, please see
<http://creativecommons.org/publicdomain/zero/1.0/>

165
nfa_to_regex.py Normal file

@@ -0,0 +1,165 @@
import enum
from collections import namedtuple

from regex import lit, concat, bar, star

NFA = namedtuple('NFA', ['start', 'accept', 'transitions'])

def copy_nfa(nfa):
    transitions_copy = {}
    for from_state in nfa.transitions:
        transitions_copy[from_state] = nfa.transitions[from_state].copy()
    return NFA(nfa.start, nfa.accept, transitions_copy)

def remove_states(nfa):
    start, accept, transitions = nfa

    states = transitions.keys()
    states_to_remove = [i for i in states if i != start and i not in accept]

    while len(states_to_remove) > 0:
        # Select a state to remove this round
        removed_state = states_to_remove.pop()
        print('\nRemoving state:', removed_state)  # debug

        # Remove loops from this state back into itself
        if removed_state in transitions[removed_state]:
            loop_condition = transitions[removed_state][removed_state]
            del transitions[removed_state][removed_state]

            # Prepend (condition)* to all transitions leading out
            # of this state
            for to_state in transitions[removed_state]:
                condition = transitions[removed_state][to_state]
                transitions[removed_state][to_state] = concat(star(loop_condition), condition)

            print(); prettyprint(nfa)  # debug

        # Rewrite all transitions A→this→B as A→B transitions
        #
        # If the condition A→this is foo and this→B is bar, the
        # condition for A→B becomes simply foobar
        #
        # Since we've removed all loops back into this state, this
        # results in there being no transitions into this state
        for from_state in transitions:
            if removed_state in transitions[from_state]:
                # Collect the new transitions to add to the
                # transition table for from_state
                new_transitions = {}
                condition_to_here = transitions[from_state][removed_state]
                for to_state in transitions[removed_state]:
                    condition_from_here = transitions[removed_state][to_state]
                    new_transitions[to_state] = concat(condition_to_here, condition_from_here)

                # Remove the transition to the state being deleted
                del transitions[from_state][removed_state]

                # Add the new transitions
                # Since they may lead to the same place as
                # already-existing transitions, we may need to
                # combine the conditions with pre-existing ones
                for to_state in new_transitions:
                    if to_state in transitions[from_state]:
                        # Already a transition leading
                        # to the same state
                        # If its condition is foo and
                        # ours is bar, then the new
                        # condition will be foo|bar
                        other_condition = transitions[from_state][to_state]
                        our_condition = new_transitions[to_state]
                        transitions[from_state][to_state] = bar(other_condition, our_condition)
                    else:
                        # No pre-existing transition
                        transitions[from_state][to_state] = new_transitions[to_state]

        # Finally, remove the state we no longer need
        del transitions[removed_state]

        print(); prettyprint(nfa)  # debug

    return NFA(start, accept, transitions)

def to_regex(nfa):
    # Rewrite the NFA so that there are no transitions leading into the
    # start state and none leading out of an accept state. The easy way
    # to do this is by creating a new start state that leads to the old
    # one with an empty condition (i.e. it consumes no input), and
    # creating a new accept state that has similar empty-condition
    # transitions from all the old ones. Since we have an NFA and not a
    # DFA, that operation is safe
    #
    # As a bonus, this rewrite gives us two useful properties:
    # a) There is exactly one start state and one accept state
    # b) After running remove_states() there will be only one transition,
    #    that of start to accept
    #
    # S
    class _(enum.Enum): start, end = range(2)

    start, accept, transitions = copy_nfa(nfa)

    # Add new start state
    transitions[_.start] = {start: lit('')}

    # Add new accept state and transitions to it
    transitions[_.end] = {}
    for state in accept:
        transitions[state][_.end] = lit('')

    # Package everything into a new NFA
    nfa = NFA(_.start, [_.end], transitions)

    print(); prettyprint(nfa)  # debug

    processed = remove_states(nfa)

    return processed.transitions[_.start][_.end]

def prettyprint(nfa):
    def process_state(state):
        nonlocal start, accept
        t = ''
        if state == start:
            # Bold
            t += '\x1b[1m'
        if state in accept:
            # Green
            t += '\x1b[32m'
        if t != '':
            return t + str(state) + '\x1b[0m'
        else:
            return str(state)

    start, accept, transitions = nfa
    states = transitions.keys()

    print('\t' + '\t'.join(map(process_state, states)))
    for from_state in states:
        t = []
        for to_state in states:
            if to_state in transitions[from_state]:
                t.append(str(transitions[from_state][to_state]))
            else:
                t.append('\x1b[90m-\x1b[0m')
        print(process_state(from_state) + '\t' + '\t'.join(t))

def main():
    nfa = NFA('start', ['end'], {
        'start': {'0': lit('s')},
        '0': {'0': lit('0'), '1': lit('1'), 'end': lit('e'), 'start': lit('r')},
        '1': {'0': lit('1'), '1': lit('0'), 'start': lit('r')},
        'end': {'end': lit('e'), 'start': lit('n')}
    })
    prettyprint(nfa)
    print(to_regex(nfa))

if __name__ == '__main__':
    main()
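
The state-elimination steps described in the comments of nfa_to_regex.py (star the self-loop, bypass the removed state, merge parallel transitions with `|`) can be sketched in a self-contained form. This is a minimal illustration, not the code from this commit: it assumes conditions are plain Python `re` pattern strings rather than the `regex.py` objects, and the names `eliminate_states` and `group` are hypothetical.

```python
import re

def group(pattern):
    """Wrap a non-empty pattern in a non-capturing group (assumed helper)."""
    return '(?:%s)' % pattern if pattern else ''

def eliminate_states(start, accept, transitions):
    """Reduce a dict-of-dicts NFA with regex-string edges to one regex."""
    # Work on a copy so the caller's table is left untouched
    table = {state: dict(edges) for state, edges in transitions.items()}

    # Fresh start and accept states, linked by empty-condition edges
    new_start, new_accept = object(), object()
    table[new_start] = {start: ''}
    table[new_accept] = {}
    for state in accept:
        table[state][new_accept] = ''

    # Remove every original state, one at a time
    for removed in list(transitions):
        outgoing = table.pop(removed)
        # A self-loop becomes a starred prefix on every outgoing edge
        loop = outgoing.pop(removed, '')
        prefix = group(loop) + '*' if loop else ''

        for edges in table.values():
            if removed not in edges:
                continue
            incoming = edges.pop(removed)
            for target, condition in outgoing.items():
                bypass = group(incoming) + prefix + group(condition)
                if target in edges:
                    # Parallel edges merge into an alternation
                    edges[target] = '%s|%s' % (edges[target], bypass)
                else:
                    edges[target] = bypass

    # None if the accept state is unreachable
    return table[new_start].get(new_accept)

# Hypothetical example NFA accepting a*bc*
pattern = eliminate_states('q0', ['q1'], {
    'q0': {'q0': 'a', 'q1': 'b'},
    'q1': {'q1': 'c'},
})
print(pattern)
print(bool(re.fullmatch(pattern, 'aabcc')))
```

Working with strings means every sub-pattern is grouped defensively, so the result is more heavily parenthesized than what the `concat`, `bar` and `star` builders would render. Like the code in this commit, the sketch assumes every state appears as a key in the transition table.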

118
regex.py Normal file

@@ -0,0 +1,118 @@
class Literal:
    def __init__(self, text):
        self.text = text
        self.single_char = len(text) == 1

    def __repr__(self):
        return 'Literal(%s)' % repr(self.text)

    def __str__(self):
        # ERE-style quotation rules
        # A-Za-z0-9 and space are safe, as are non-ASCII
        # Everything else is safe to quote with backslash
        return ''.join(
            char if ord('A') <= ord(char) <= ord('Z') else
            char if ord('a') <= ord(char) <= ord('z') else
            char if ord('0') <= ord(char) <= ord('9') else
            char if char == ' ' else
            char if ord(char) >= 128 else # Non-ASCII
            '\\' + char # Quote
            for char in self.text
        )

class Concatenation:
    def __init__(self, *elements):
        self.elements = elements

    def __repr__(self):
        return 'Concatenation(%s)' % ', '.join(map(repr, self.elements))

    def __str__(self):
        return ''.join(
            # Only alternation binds looser than concatenation,
            # so we can pass everything else through without
            # parenthesizing
            str(element) if type(element) != Alternation else
            '(' + str(element) + ')'
            for element in self.elements
        )

class Alternation:
    def __init__(self, *elements):
        self.elements = elements

    def __repr__(self):
        return 'Alternation(%s)' % ', '.join(map(repr, self.elements))

    def __str__(self):
        if all(type(i) == Literal and i.single_char for i in self.elements):
            # Special case: [abc]
            return '[%s]' % ''.join(map(str, self.elements))
        else:
            # Nothing binds looser than alternation, so just
            # pass everything through as-is
            return '|'.join(map(str, self.elements))

class Star:
    def __init__(self, element):
        self.element = element

    def __repr__(self):
        return 'Star(%s)' % repr(self.element)

    def __str__(self):
        # * applies to the previous character or a parenthesized
        # group. Therefore, we parenthesize unless we have a Literal
        # and it is one char long
        if type(self.element) == Literal and self.element.single_char:
            return str(self.element) + '*'
        else:
            return '(%s)*' % str(self.element)

def lit(text):
    return Literal(text)

def concat(*elements):
    flattened = []
    for element in elements:
        if type(element) == Concatenation:
            flattened.extend(element.elements)
        else:
            flattened.append(element)

    combined = []
    for element in flattened:
        if len(combined) > 0 and type(combined[-1]) == Literal and type(element) == Literal:
            # Combine two literals next to each other
            # into one literal
            previous = combined.pop()
            combined.append(Literal(previous.text + element.text))
        else:
            combined.append(element)

    if len(combined) == 1:
        element, = combined
        return element
    else:
        return Concatenation(*combined)

def bar(*elements):
    # TODO: rewrite (foo|foo) → foo
    flattened = []
    for element in elements:
        if type(element) == Alternation:
            flattened.extend(element.elements)
        else:
            flattened.append(element)

    if len(flattened) == 1:
        element, = flattened
        return element
    else:
        return Alternation(*flattened)

def star(element):
    return Star(element)
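
The quoting rule stated in the comments of `Literal.__str__` (A-Za-z0-9, space and non-ASCII pass through, everything else gets a backslash) can be tried out on its own. The sketch below restates that rule as a standalone function; the name `quote_ere` is hypothetical and does not appear in this commit.

```python
def quote_ere(text):
    """Backslash-quote every character outside the safe set."""
    def is_safe(char):
        return (
            'A' <= char <= 'Z'
            or 'a' <= char <= 'z'
            or '0' <= char <= '9'
            or char == ' '
            or ord(char) >= 128  # Non-ASCII
        )
    return ''.join(char if is_safe(char) else '\\' + char for char in text)

print(quote_ere('a.b*'))  # prints a\.b\*
```

Quoting everything outside a small allowlist keeps the rule easy to check, at the cost of escaping some characters that would have been harmless.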