Splitting a string by multiple delimiters in Python 3 - python

I am trying to split a Java for statement by its delimiters in Python 3. For example, for (int i = 0;) is split by the ( ) and the ;. Then I would like to have another loop that goes through the pieces, checks whether one contains the word "for", and if so says "You have the beginning of the for loop". Basically I will do this for each part of the for loop. I plan on running this against a Java program, which it will break down into pseudocode. I will post what I have done so far below. I am new to Python, so I am having trouble with this.
code
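import re
var = ("""for ( int i =0; i < 5; i++) { \
System.out.println("Hello World") \
""")
# ( ) { } are regex metacharacters; inside a character class they are literal
var2 = re.split(r'[;(){}]', var)
for myvars in var2:
    print(myvars)
# Check each split piece; iterating over var itself would yield single characters
for v in var2:
    if 'for' in v:
        print("For loop starts here")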
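A minimal sketch of one way to split on several delimiters while keeping them, assuming a single-line Java for header as input. The names java_line, pieces and describe are hypothetical and only serve the illustration. Wrapping the character class in a capturing group makes re.split return the delimiters as separate items.

```python
import re

# Hypothetical input: one line of Java source
java_line = 'for ( int i =0; i < 5; i++) {'

# The capturing group keeps each delimiter in the result list
raw_pieces = re.split(r'([;(){}])', java_line)

# Strip whitespace and drop the empty strings produced between adjacent delimiters
pieces = [p.strip() for p in raw_pieces if p.strip()]


def describe(piece):
    """Return a message for pieces we recognise, or None otherwise."""
    if piece == 'for':
        return 'You have the beginning of the for loop'
    return None


for piece in pieces:
    message = describe(piece)
    if message is not None:
        print(message)
```

Comparing whole pieces (piece == 'for') instead of using 'for' in piece avoids false matches on identifiers such as format.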

Related

How to read csv with multiple quoted delimiters in single field?

I'd like to be able to split a string which contains the delimiter quoted multiple times. Is there an argument to handle this type of string with the csv module? Or is there another way to process it?
import csv
from io import StringIO
text = '"a,b"-"c,d","a,b"-"c,d"'
next(csv.reader(StringIO(text), delimiter=",", quotechar='"', quoting=csv.QUOTE_NONE))
Expected output: ['"a,b"-"c,d"', '"a,b"-"c,d"']
Actual output: ['"a', 'b"-"c', 'd"', '"a', 'b"-"c', 'd"']
EDIT:
The example above is simplified, but apparently too simplified as some comments provided solutions for the simplified version but not for the full version. Below is the actual data I want to process.
import csv
from io import StringIO
text = '"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-CD-0,"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-LS-0'
next(csv.reader(StringIO(text), delimiter=",", quotechar='"', quoting=csv.QUOTE_NONE))
Expected output
[
'"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-CD-0',
'"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-LS-0'
]
Actual output
[
'"3-Amino-1',
'2',
'4-triazole"-text-0-"3-Amino-1',
'2',
'4-triazole"-CD-0','"3-Amino-1',
'2', '4-triazole"-text-0-"3-Amino-1',
'2',
'4-triazole"-LS-0'
]
I'll only answer the first part of your question: there is no way to do this with the built-in csv module.
Looking at the CPython source code, quotechar option is only processed at the start of a field:
case START_FIELD:
    /* expecting field */
    ...
    else if (c == dialect->quotechar &&
             dialect->quoting != QUOTE_NONE) {
        /* start quoted field */
        self->state = IN_QUOTED_FIELD;
    }
    ...
    break;
Inside a field, there is no such check:
case IN_FIELD:
    /* in unquoted field */
    if (c == '\n' || c == '\r' || c == '\0') {
        /* end of line - return [fields] */
        if (parse_save_field(self) < 0)
            return -1;
        self->state = (c == '\0' ? START_RECORD : EAT_CRNL);
    }
    else if (c == dialect->escapechar) {
        /* possible escaped character */
        self->state = ESCAPED_CHAR;
    }
    else if (c == dialect->delimiter) {
        /* save field - wait for new field */
        if (parse_save_field(self) < 0)
            return -1;
        self->state = START_FIELD;
    }
    else {
        /* normal character - save in field */
        if (parse_add_char(self, module_state, c) < 0)
            return -1;
    }
    break;
There is a check for quotechar while the parser is in the IN_QUOTED_FIELD state; however, upon encountering a quote, it goes back to the IN_FIELD state indicating we're inside an unquoted field. So this is possible:
>>> import csv
>>> import io
>>> print(next(csv.reader(io.StringIO('"a,b"cd,e'))))
['a,bcd', 'e']
But once the parser has reached the end of the initial quoted section, it will consider any subsequent quotes as part of the data. I don't know if this behaviour is to conform with any (written or unwritten) CSV specification, or if it's just a bug.
The data is in a non-standard format and so any solution would need to be tested on the full dataset. A possible workaround could be to first replace ," characters with ;" and then simply split it on the ;. This could be done without using CSV or RE:
tests = [
'"a,b"-"c,d","a,b"-"c,d"',
'"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-CD-0,"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-LS-0',
]
for test in tests:
    row = test.replace(',"' , ';"').split(';')
    print(len(row), row)
Giving:
2 ['"a,b"-"c,d"', '"a,b"-"c,d"']
2 ['"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-CD-0', '"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-LS-0']
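Another possible workaround is a small hand-written splitter that tracks whether it is currently inside quotes and only splits on delimiters outside them. This is a sketch under the assumption that quotes are always balanced and never escaped. The function name split_outside_quotes is hypothetical and not part of the csv module.

```python
def split_outside_quotes(text, delimiter=",", quotechar='"'):
    """Split text on delimiter, ignoring delimiters between quote characters.

    Quote characters are kept in the output. Assumes balanced, unescaped quotes.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quotechar:
            # Toggle the quoted state and keep the quote in the field
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            # Delimiter outside quotes: finish the current field
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


simple = split_outside_quotes('"a,b"-"c,d","a,b"-"c,d"')
full = split_outside_quotes(
    '"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-CD-0,'
    '"3-Amino-1,2,4-triazole"-text-0-"3-Amino-1,2,4-triazole"-LS-0'
)
print(simple)
print(full)
```

Unlike the replace-and-split approach, this does not depend on a spare character such as ; being absent from the data.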
If the structure is always the same with the comma sandwiched between an integer and the '"', you can use a regular expression:
import re
re.split('(?<=[0-9]),(?=")', text)

Splitting C code to statements using python

Is there any way to split a string (a complete C file) to C statements using python?
#include <stdio.h>
#include <math.h>
int main (void)
{
if(final==(final_t))
{
foo(final);
/*comment*/
printf("equal\n");
}
return(0);
}
If this is read to a string is there any way to split it into a list of strings like this:
list=['#include <stdio.h>', '#include<math.h>', 'int main(void){','if(final==(final_t)){', 'foo(final);', '/*comment*/', 'printf("equal\n");', '}', 'return(0);', '}']
Without being extremely complex, a C language program is composed of lexical tokens that form declarations and statements according to a syntax. And your splitting needs some more explanation: according to the C language standard, if (cond) statement1 [else statement2] is itself a statement. Both statement1 and statement2 can be blocks, so statements can be nested. In your requirements, you seem to concatenate the opening brace of a possible block to the conditional, and leave the closing brace alone. And you say nothing about declarations or the preprocessor language.
So IMHO, your specifications are still incomplete...
Anyway, it is already far too complex for a simple lexical analyzer. So you should first write the complete grammar that you want to process, ideally in Backus-Naur Form, and declare the terminal tokens. Once you have that, it is easy to use a lex + yacc implementation such as PLY to build a parser from that grammar.
It is probably not the expected answer, but C language parsers are far from trivial, unless you only want to accept a small subset of the language.
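To make the point about lexical tokens concrete, here is a minimal sketch of a regex-based tokenizer for a tiny C-like subset, using only the standard re module. The token names and patterns are assumptions chosen for illustration. This is not a complete C lexer and it does no parsing at all.

```python
import re

# Hypothetical token set for a small C-like subset; order matters
# (comments must be tried before the '/' operator, '==' before '=').
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("PREPROC", r"#[^\n]*"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|[{}()\[\];,=<>+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Return (kind, text) pairs; characters matching no pattern are silently skipped."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("foo(final); /*comment*/")
print(tokens)
```

Turning such a token stream into statements still requires a grammar, which is where a parser generator comes in.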
You should perform the following steps to reach the result:
Get your code as separate lines.
Cut leading and trailing spaces.
Skip empty lines.
If your code is given as a string you can use:
lines = content.split('\n')
If as a file:
with open('file.c') as f:
    lines = f.readlines()
To cut extra spaces:
lines = list(map(str.strip, lines))
To skip empty lines:
lines = list(filter(lambda x: x, lines))
So the full code may look like this:
content = r"""
#include <stdio.h>
#include <math.h>
int main (void)
{
if(final==(final_t))
{
foo(final);
printf("equal\n");
}
return(0);
}
"""
lines = content.split('\n')
lines = list(map(str.strip, lines))
lines = list(filter(lambda x: x, lines))
print(lines)
code_list = []
with open("<your-code-file>", 'r') as code_file:
    for line in code_file:
        if "{" in line:
            code_list[-1] = code_list[-1] + line.strip()
        else:
            code_list.append(line.strip())
print(code_list)
output:
['#include <stdio.h>', '#include <math.h>', '', 'int main (void){\n', 'if(final==(final_t)) {\n', 'foo(final);', 'printf("equal\\n");', '}', 'return(0);', '}']

Convert Perl syntax to Python [duplicate]

This question already has an answer here:
How to create a dict equivalent in Python from Perl hash?
(1 answer)
Closed 5 years ago.
I have a file with lines which are separated by white spaces.
I have written the program below in Perl and it works.
Now I must rewrite it in Python which is not my language, but I have solved it more or less.
I currently struggle with this Perl expression, which I can't convert to Python.
$hash{$prefix}++;
I have found some solutions but I'm not sufficiently experienced with Python to solve this. All the solutions look complicated to me compared to the Perl one.
These Stack Overflow questions seem to be relevant.
Python variables as keys to dict
Python: How to pass key to a dictionary from the variable of a function?
Perl
#!perl -w
use strict;
use warnings FATAL => 'all';
our $line = "";
our @line = "";
our $prefix = "";
our %hash;
our $key;
while ( $line = <STDIN> ) {
    # NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
    next if $line =~ /^NAMESPACE/;
    #aleks-event-test redis-1-m06k0 1/1 Running 0 1d 172.26.0.138 The_Server_name
    @line = split ' ', $line;
    $line[1] =~ /(.*)-\d+-\w+$/ ;
    $prefix = $1;
    #print "$prefix $line[7]\n";
    print "$prefix $line[7]\n";
    $hash{$prefix}++;
}
foreach $key ( keys %hash ) {
    if ( $hash{$key} / 2 ){
        print "$key : $hash{$key} mod 2 \n"
    }
    else {
        print "$key : $hash{$key} not mod 2 \n"
    }
}
Python
#!python
import sys
import re
myhash = {}
for line in sys.stdin:
    """
    These projects are ignored
    """
    if re.match('^NAMESPACE|logging|default',line):
        continue
    linesplited = line.split()
    prefix = re.split('(.*)(-\d+)?-\w+$',linesplited[1])
    #print linesplited[1]
    print prefix[1]
    myhash[prefix[1]] += 1
Your problem is using this line:
myhash = {}
# ... code ...
myhash[prefix[1]] += 1
You likely are getting a KeyError. This is because you start off with an empty dictionary (or hash), and if you attempt to reference a key that doesn't exist yet, Python will raise an exception.
A simple solution that will let your script work is to use a defaultdict, which will auto-initialize any key-value pair you attempt to access.
#!python
import sys
import re
from collections import defaultdict
# Since you're keeping counts, we'll initialize this so that the values
# of the dictionary are `int` and will default to 0
myhash = defaultdict(int)
for line in sys.stdin:
    """
    These projects are ignored
    """
    if re.match(r'^NAMESPACE|logging|default', line):
        continue
    linesplited = line.split()
    prefix = re.split(r'(.*)(-\d+)?-\w+$', linesplited[1])
    #print linesplited[1]
    print(prefix[1])
    myhash[prefix[1]] += 1
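For comparison, here is a small sketch of two other standard ways to mimic Perl's $hash{$prefix}++ in Python: dict.get with a default of 0, and collections.Counter. The pod names in names are made up for illustration, and the prefix is taken with rsplit rather than the regular expression from the question.

```python
from collections import Counter

# Hypothetical NAME column values, shaped like the sample line in the question
names = ["redis-1-m06k0", "redis-1-x9z2b", "web-3-abcde"]


def prefix_of(name):
    """Drop the last two dash-separated parts, e.g. 'redis-1-m06k0' -> 'redis'."""
    return name.rsplit("-", 2)[0]


# Option A: plain dict, reading missing keys as 0 via dict.get
counts = {}
for name in names:
    prefix = prefix_of(name)
    counts[prefix] = counts.get(prefix, 0) + 1

# Option B: Counter does the counting directly
counter = Counter(prefix_of(name) for name in names)

print(counts)
print(counter)
```

Counter is a dict subclass, so the result can be iterated with .items() much like the keys %hash loop in the Perl version.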

Get Python script to run binary each time

I am using a python script to run a command in the terminal. I have a program in C, numgen, that is simply supposed to randomly generate numbers. After compiling numgen I then simulate running numgen on different architectures.
The text to be input into the command line:
cmd= "~/simdir/build/ALPHA/gem5.opt ~/simdir/configs/example /se.py ~/First/NumGen/numgen -o "
I then use the following to run the command in a subshell
for x in range(5):  # number of simulations
    os.system(run)  # run is cmd plus a string I created using a for loop because se.py requires an argument
I encounter a problem though. After running everything I output relevant data about numgen to a file using result_file.write(). This would be fine except all the output in this new file is identical. The numgen program is in C and is meant to generate random numbers. It does that fine, and when I run the binary, ./numgen, in the terminal I get lists of random numbers.
Within the script, it seems os.system is not running the binary each time in the subshell. os.system runs the binary once in the beginning and prints out that same data for the rest of the simulations. Does anyone know how I can get the data to be truly random? I would like to have the binary be run before every simulation so the generated data can be different. Is it possible to do this within a script? I do have compilation flags and other text in the script, but I do not think those pertain to the problem like the above code does.
Essentially, os.system runs binary once for all the simulations. I would like to know if it is possible to run the binary for each different simulation. Any help is appreciated. Thanks
EDIT 1- Full Script:
from random import *
from array import *
import sys, string, os, subprocess, shlex, binascii, collections, operator
import re
import subprocess
plaintext = array('B', [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]);
loop = long(0);
loop_max = long(1000);
number = long(0);
byte_index = 0;
cmd = "~/simdir/build/ALPHA/gem5.opt ~/simdir/configs/example /se.py --num-cpus=1 --caches --l2cache --cpu-type=timing --cacheline_size=64 --l1d_size=32kB --l1d_assoc=2 --l1i_size=32kB --l1i_assoc=4 --l2_size=1MB --l2_assoc=2 --max-checkpoints=200 -c ~/First/NumGen/numgen -o "
string=""
Somekeywords = ['sim_freq','sim_ticks','host_mem_usage', 'system.cpu.numCycles', 'host_seconds', 'sim_insts' ]
table = [[0 for first in range(26)] for second in range(loop_max)]; #16 for plaintext bytes, 1 for ticks, 9 for L1D/L1I/L2 miss/hit/access
result =[[0 for first in range(26)] for second in range(loop_max)];
outfile = stats_text = open('Data.txt', 'w')
stats_text=open('RelStat', 'w')
numb=10
for x in range(numb):
    number=number+1
    outfile.write('%d\n' % number)
    run = ""
    string= ""
    final_str = ""
    for x in range(16):
        plaintext[i] = randint(0, 255)
    tmp = ""
    #Dec -> HEX -> Str
    for x in range(16):
        tmp = hex(plaintext[i])[2:]
        if len(tmp)==1:
            tmp = "0"+tmp
        string = string + tmp
    run = cmd + string
    subprocess.Popen(run, shell=True).wait() #Just changed this previously used os.system(run)
    stats_text = open('newdir/analysis.txt', 'r')
    count = 0;
    for line in stats_text:
        for index in range(3):
            if Somekeywords[index] in line:
                count = count + 1; #
                line=re.sub('[^0-9]',' ', line)
                outfile.write(line)
                outfile.write("\n")
                if count>10 and count<21:
                    result_tmp = re.findall(r'\b\d+\b',line);
                    print(loop);
                    result[loop][16+index] = int(result_tmp[0]);
    for i in range(16):
        result[loop][i] = plaintext[i];
    loop = loop + 1;
    if loop == loop_max:
        loop = 0;
        for i in range(loop_max):
            for j in range(26):
                final_str = final_str + str(result[i][j]) + ' ';
            outfile.write("\n")
outfile.close
EDIT 2 - Numgen
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
    FILE *of1;
    of1=fopen("Data.txt", "w");
    srand(time(NULL));
    for (int z=0; z<16; z++)
    {
        int numgen=rand()%100;
        fprintf(of1, "%d ", numgen);
    }
    fclose(of1);
}
Just a quick program to generate random numbers. Also, the program is in C, not C++ as I previously stated. I must have been compiling the .cpp with gcc and not noticed.

How to replace letters with numbers and re-convert at anytime (Caesar cipher)?

I've been coding this for almost 2 days now but can't get it. I've written two different bits of code trying to figure it out.
Code #1
So this one will list the letters but won't change them to numbers (a->1, b->2, etc.)
import re
text = input('Write Something- ')
word = '{}'.format(text)
for letter in word:
    print(letter)
    #lists down
Outcome-
Write something- test
t
e
s
t
Then I have this code that changes the letters into numbers, but I haven't been able to convert it back into letters.
Code #2
u = input('Write Something')
a = ord(u[-1])
print(a)
#converts to number and prints ^^
print('')
print(????)
#need to convert from numbers back to letters.
Outcome:
Write Something- test
116
How can I send a text through (test) and make it convert it to either set numbers (a->1, b->2) or random numbers, save it to a .txt file and be able to go back and read it at any time?
What you're trying to achieve here is called "Caesar encryption".
For example, say you normally have: A=1, a=2, B=3, b=4, etc...
Then you would have a "key" which "shifts" the letters. Let's say the key is "3", so you would shift all letters 3 numbers up and end up with: A=4, a=5, B=6, b=7, etc...
This is of course only ONE way of doing a caesar encryption. This is the most basic example. You could also say your key is "G", which would give you:
A=G, a=g, B=H, b=h, etc.. or
A=G, a=H, B=I, b=J, etc...
Hope you understand what I'm talking about. Again, this is only one very simple example.
Now, for your program/script you need to define this key. And if the key should be variable, you need to save it somewhere (write it down). Put your words in a string, and check and convert each letter and write it into a new string.
You then could say (pseudo code!):
var key = READKEYFROMFILE;
string old = READKEYFROMFILE_OR_JUST_A_NORMAL_STRING_:)
string new = "";
for (int i=0, i<old.length, i++){
get the string at i;
compare with your "key";
shift it;
write it in new;
}
Hope i could help you.
edit:
You could also use a dictionary (like the other answer says), but this is a very static (but easy) way.
Also, maybe watch some guides/tutorials on programming. You don't seem to be that experienced. And also, google "Caesar encryption" to understand this topic better (it's very interesting).
edit2:
Ok, so basically:
You have a variable, called "key" in this variable, you store your key (you understood what i wrote above with the key and stuff?)
You then have a string variable, called "old". And another one called "new".
In old, you write your string that you want to convert.
New will be empty for now.
You then do a "for loop", which goes as long as the ".length" of your "old" string. (that means if your sentence has 15 letters, the loop will go through itself 15 times and always count the little "i" variable (from the for loop) up).
You then need to get the current letter from "old" (and save it in another variable, for example char temp = "" ).
After this, you need to compare your current letter with the key and decide how to shift it.
Once that's done, just add your converted letter to the "new" string.
Here is some more precise pseudo code (it's not Python code, I don't know Python well), btw char stands for "character" (letter):
var key = g;
string old = "teststring";
string new = "";
char oldchar = "";
char newchar = "";
for (int i=0; i<old.length; i++){
oldchar = old.charAt[i];
newchar = oldchar //shift here!!!
new.addChar(newchar);
}
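In Python, the loop described by the pseudocode could be sketched roughly as follows. This assumes the common alphabet-wraparound form of the Caesar cipher (a numeric key, letters a-z and A-Z only, everything else left untouched), which is one interpretation of the scheme and not the only one. The function name caesar_shift is hypothetical.

```python
import string


def caesar_shift(text, key):
    """Shift ASCII letters by `key` positions, wrapping around the alphabet.

    Characters that are not ASCII letters are copied unchanged.
    """
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


# A key of 6 corresponds to "A=G" in the explanation above
encrypted = caesar_shift("teststring", 6)
decrypted = caesar_shift(encrypted, -6)
print(encrypted, decrypted)
```

Decryption is just the same shift with the negated key, so only the key needs to be stored to read the text back later.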
Hope i could help you ;)
edit3:
maybe also take a look at this:
https://inventwithpython.com/chapter14.html
Caesar Cipher Function in Python
https://www.youtube.com/watch?v=WXIHuQU6Vrs
Just use dictionary:
letters = {'a': 1, 'b': 2, ... }
And in the loop:
for letter in word:
    print(letters[letter])
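Rather than typing the dictionary out by hand, it can be built from string.ascii_lowercase, together with the reverse mapping needed to get back to letters. This is a sketch that assumes lowercase a-z input only. The names letter_to_number and number_to_letter are made up for the example.

```python
import string

# a -> 1, b -> 2, ..., z -> 26
letter_to_number = {
    letter: index for index, letter in enumerate(string.ascii_lowercase, start=1)
}
# Reverse mapping: 1 -> a, 2 -> b, ...
number_to_letter = {number: letter for letter, number in letter_to_number.items()}

encoded = [letter_to_number[ch] for ch in "test"]
decoded = "".join(number_to_letter[n] for n in encoded)
print(encoded, decoded)
```

A character outside a-z raises KeyError here, so real input would need either a larger alphabet or a fallback.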
To convert to symbol codes and back to characters:
text = input('Write Something')
for t in text:
    d = ord(t)
    n = chr(d)
    print(t,d,n)
To write into file:
f = open("a.txt", "w")
f.write("someline\n")
f.close()
To read lines from file:
f = open("a.txt", "r")
lines = f.readlines()
for line in lines:
    print(line, end='') # all lines have newline character at the end
f.close()
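Putting the pieces together, a round trip through a text file might look like the sketch below. It stores the ord codes separated by spaces so they can be split apart again on reading. The helper names save_encoded and load_decoded are hypothetical, and a temporary directory is used only so the example does not leave a file behind.

```python
import os
import tempfile


def save_encoded(text, path):
    """Write the character codes of text to path, separated by spaces."""
    with open(path, "w") as f:
        f.write(" ".join(str(ord(ch)) for ch in text))


def load_decoded(path):
    """Read space-separated character codes from path and rebuild the text."""
    with open(path) as f:
        return "".join(chr(int(code)) for code in f.read().split())


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.txt")
    save_encoded("test", path)
    with open(path) as f:
        stored = f.read()
    restored = load_decoded(path)

print(stored)
print(restored)
```

Using with open(...) closes the file automatically, which replaces the explicit f.close() calls.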
Please see documentation for Python 3: https://docs.python.org/3/
Here are a couple of examples. My method involves mapping the character to the string representation of an integer padded with zeros so it's 3 characters long using str.zfill.
Eg 0 -> '000', 42 -> '042', 125 -> '125'
This makes it much easier to convert a string of numbers back to characters since it will be in lots of 3
Examples
from string import printable
#'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
from random import sample
# Option 1
char_to_num_dict = {key : str(val).zfill(3) for key, val in zip(printable, sample(range(1000), len(printable))) }
# Option 2
char_to_num_dict = {key : str(val).zfill(3) for key, val in zip(printable, range(len(printable))) }
# Reverse mapping - applies to both options
num_to_char_dict = {char_to_num_dict[key] : key for key in char_to_num_dict }
Here are two sets of dictionaries to map a character to a number. The first option uses random numbers, e.g. 'a' = '042', 'b' = '756', 'c' = '000'. The problem with this is that you can use it once, close the program, and the next time the mapping will almost certainly not match. If you want to use random values then you will need to save the dictionary to a file so you can load it again to get the key.
The second option creates a dictionary mapping a character to a number and maintains order. So it will follow the same sequence, e.g. 'a' = '010', 'b' = '011', 'c' = '012', every time.
Now that I've explained the mapping choices, here are the functions to convert between them:
def text_to_num(s):
    return ''.join( char_to_num_dict.get(char, '') for char in s )
def num_to_text(s):
    slices = [ s[ i : i + 3 ] for i in range(0, len(s), 3) ]
    return ''.join( num_to_char_dict.get(char, '') for char in slices )
Example of use ( with option 2 dictionary )
>>> text_to_num('Hello World!')
'043014021021024094058024027021013062'
>>> num_to_text('043014021021024094058024027021013062')
'Hello World!'
And finally if you don't want to use a dictionary then you can use ord and chr still keeping with padding out the number with zeros method
def text_to_num2(s):
    return ''.join( str(ord(char)).zfill(3) for char in s )
def num_to_text2(s):
    slices = [ s[ i : i + 3] for i in range(0, len(s), 3) ]
    return ''.join( chr(int(val)) for val in slices )
Example of use
>>> text_to_num2('Hello World!')
'072101108108111032087111114108100033'
>>> num_to_text2('072101108108111032087111114108100033')
'Hello World!'
