Python replace plus sign from Excel

The data I pull from DB comes in the following format:
+jacket
online trading account
+neptune
When I write this data to a CSV and open it in Excel, I end up with a #NAME? error. I tried adding a single quote (') to the front of the values when I pull the data, but this does not fix the issue. I need to write the values exactly as they come, with the plus sign at the front.

You simply need to format the desired output column as a text column in Excel. This will result in:
+jacket
online trading account
+neptune
being written to the file exactly as is. No more #NAME? errors.
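For what it's worth, the CSV file itself is fine either way; a minimal sketch (using the standard csv module, with "out.csv" as an assumed output path) shows the values land in the file verbatim:

```python
import csv

# Sample values as pulled from the DB; "out.csv" is an assumed filename.
values = ["+jacket", "online trading account", "+neptune"]

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for value in values:
        writer.writerow([value])

# The file now contains the plus signs untouched; the #NAME? error only
# appears when Excel parses "+jacket" as a formula on open.
```

Opening the same file in a plain text editor confirms the leading + is intact, so the fix belongs in Excel's column formatting, not in the writing code.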

Related

Convert bytestring containing line breaks inside quotes to CSV file

My target is to create a CSV file from an API call.
The problem is: the API returns a bytestring as content, and I don't know how to convert it to a CSV file properly.
The content part of the response looks like this:
b'column_title_1, column_title_2, column_title_3, value1_1, value1_2, value1_3\nvalue2_1, value2_2(="Perfect!\nThank you"), value2_3\n, value3_1, value3_2, value3_3\n....'
How can I manage to get a clean CSV file from this? I tried pandas, the csv module, and NumPy. Unfortunately, I was not able to handle the newline escapes that sometimes appear within a string value (it is a column for comments) - see value2_2.
The result should look like this:
column_title_1, column_title_2, column_title_3
value1_1, value1_2, value1_3
value2_1, value2_2(="Perfect!\nThank you"), value2_3
value3_1, value3_2, value3_3
The closest of my results was this:
column_title_1, column_title_2, column_title_3
value1_1, value1_2, value1_3
value2_1, value2_2(="Perfect!
Thank you"), value2_3
value3_1, value3_2, value3_3
Even though I got close, I was not able to get rid of the \n within the values of some columns.
I did not figure out how to exclude the \n that appear within double quotes.
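For the record, the standard csv module does handle newlines inside quoted fields; a minimal sketch, assuming the bytes are UTF-8 and that multi-line fields are double-quoted as the CSV convention requires (the sample data here is a shortened stand-in for the API content):

```python
import csv
import io

# Shortened stand-in for the API content; the embedded \n sits inside
# a double-quoted field, as the CSV convention requires.
raw = b'title_1,title_2,title_3\nvalue2_1,"Perfect!\nThank you",value2_3\n'

reader = csv.reader(io.StringIO(raw.decode("utf-8")))
rows = list(reader)
# rows[1][1] is a single field that still contains the newline
```

Writing the rows back out with csv.writer re-quotes any field containing a newline, so the round trip preserves the comment column.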

How to replace specific text dynamically inside a DOC file using Python

I'm wondering how I could replace specific text marked by parentheses, (wordExample), double quotes, "wordExample", or any other marker.
Here's an example:
I would have an Example.doc file with the following text:
"Hello, this is just a dummy text written to get Ethereum's price: {cripto:ETH} dollars"
My Python script would find the DOC file locally and get all the variables that need to be replaced. In this example:
{cripto:ETH}
Another script I have would find the variables asked for. Note that it could have asked for any set of crypto prices, like "cripto:BTC". Now I have ETH's price and the corresponding date.
Now it should generate another DOC file, replacing {cripto:ETH} with ETH's price and a screenshot of its graph.
That's it! I just want to know how I can build that system of getting the DOC file and replacing some of its elements dynamically.
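The substitution core can be sketched with the standard re module; for the actual files, the python-docx library reads and writes .docx documents (the legacy binary .doc format would first need conversion). The {cripto:...} pattern and the prices dict below are illustrative assumptions, not a fixed API:

```python
import re

# Matches placeholders like {cripto:ETH}; the marker format is an assumption.
PLACEHOLDER = re.compile(r"\{cripto:([A-Z]+)\}")

def fill_placeholders(text, prices):
    # Replace each {cripto:XXX} with its price; unknown tickers are left intact.
    return PLACEHOLDER.sub(
        lambda m: str(prices.get(m.group(1), m.group(0))), text)

filled = fill_placeholders(
    "Hello, this is just a dummy text written to get Ethereum's price: "
    "{cripto:ETH} dollars",
    {"ETH": 1850.25},
)
```

With python-docx, the same function would be applied to each paragraph's text; inserting a chart screenshot would use its add_picture call.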

Parsing contact info from a .pst (outlook) file

I have a .pst (Outlook) file containing old emails and email contacts (around 3980 of them), which I'd like to export to a machine-readable format.
Outlook 2016 already has an option to export the contacts to a .csv file, but after the export is performed, one can see that the file is not structured properly. The "Notes" field may contain a message with multiple newline characters. This, in turn, breaks the .csv format, since every entry should start with the value of the first contact field (but in these cases, the lines contain the successive content of the mentioned "Notes" field). When the "Notes" field ends, the next line usually contains the rest of the values of the entry.
Example csv output:
"Title","First Name",... <- header field values of the exported .csv
"","John","","Travolta","","ValueX","","","ValueY",,,"ValueZ",... <- start of the contact entry
www.link1.com <- start of the "Notes" field (same contact)
.................. <- "Notes" field continued (same contact)
www.link2.com <- "Notes" field continued (same contact)
................... <- "Notes" field continued (same contact)
"asd","asdas","asdasd","asdasd" <- rest of the contact fields (same contact)
"","Nicolas","Cage","","","ValueX","","","ValueY",,,"ValueZ",... <- 2nd contact (in one line)
I'd like to fix the formatting of the exported file, so the "Notes" field would not stretch across multiple lines and each contact would be represented in the file as a single line.
I think I have two options here:
write a script (Python) that goes over the lines and fixes the formatting (I'd like to avoid this, since the script might overlook something);
find an API for parsing .pst files and try to serialize the contacts in a suitable format (by specifying how to serialize the "Notes" field manually).
Does anybody know if I'm overlooking something and whether this could be solved in an easier way?
Kind regards.
EDIT: I'm talking about this issue.
The file exported from Outlook is not broken, although it may appear to be. In effect, a newline character inside quotes is considered part of the cell. So if cells have newlines, a single "row" will be loaded from many lines in the file.
For example, say a CSV has four cells in one row: a, b, c and d. This would look like:
a,b,c,d
Now change c to be c1\nc2, i.e. it has a newline in it:
a,b,"c1
c2",d
The cell is now quoted and appears on multiple lines. The standard Python CSV library will be able to correctly parse this, including a standard Outlook exported CSV contact file.
The following displays a name and home address from each contact given a standard contacts CSV file exported from Outlook:
import csv

with open('contacts.csv', 'r', newline='') as f_contacts:
    csv_contacts = csv.DictReader(f_contacts)
    for contact in csv_contacts:
        print(contact['First Name'], contact['Last Name'])
        # Concatenate the street lines, collapsing the doubled newlines
        # that empty fields leave behind
        print("{}{}{}".format(contact['Home Street'],
                              contact['Home Street 2'],
                              contact['Home Street 3']).replace('\n\n', '\n'))
        print()
This assumes you are using Python 3.x and was tested using a CSV file exported directly from Outlook.

How to change the position of data columns using regex in a CSV file with comma as separator?

I am giving up on this; it's almost the due date. I enrolled in a regex class this summer (biggest mistake of my life), and we have this topic (where we choose an old piece of software and make updates to it). I'm almost done with everything except this: I have a .txt database of monster attributes.
Anyway, the logic is that each variable represents a column/key, and the columns are separated by commas. We need to delete/add/reposition the columns using any available tool (regex is the only thing I know that can help me - do you know of anything else?)
Here is the OLD form:
ID,Name,JName,LV,HP,SP,EXP,JEXP,Range1,ATK1,ATK2,DEF,MDEF,STR,AGI,VIT,INT,DEX,LUK,Range2,Range3,Scale,Race,Element,Mode,Speed,ADelay,aMotion,dMotion,Drop1id,Drop1per,Drop2id,Drop2per,Drop3id,Drop3per,Drop4id,Drop4per,Drop5id,Drop5per,Drop6id,Drop6per,Drop7id,Drop7per,Drop8id,Drop8per,MEXP,ExpPer,MVP1id,MVP1per,MVP2id,MVP2per,MVP3id,MVP3per
First, delete the 7th column from the end (deleting all ExpPer entries):
Results to:
ID,Name,JName,LV,HP,SP,EXP,JEXP,Range1,ATK1,ATK2,DEF,MDEF,STR,AGI,VIT,INT,DEX,LUK,Range2,Range3,Scale,Race,Element,Mode,Speed,ADelay,aMotion,dMotion,Drop1id,Drop1per,Drop2id,Drop2per,Drop3id,Drop3per,Drop4id,Drop4per,Drop5id,Drop5per,Drop6id,Drop6per,Drop7id,Drop7per,Drop8id,Drop8per,MEXP,MVP1id,MVP1per,MVP2id,MVP2per,MVP3id,MVP3per
Second, duplicate the JName column into the next column:
Results to:
ID,Name,JName,Jname,LV,HP,SP,EXP,JEXP,Range1,ATK1,ATK2,DEF,MDEF,STR,AGI,VIT,INT,DEX,LUK,Range2,Range3,Scale,Race,Element,Mode,Speed,ADelay,aMotion,dMotion,Drop1id,Drop1per,Drop2id,Drop2per,Drop3id,Drop3per,Drop4id,Drop4per,Drop5id,Drop5per,Drop6id,Drop6per,Drop7id,Drop7per,Drop8id,Drop8per,MEXP,MVP1id,MVP1per,MVP2id,MVP2per,MVP3id,MVP3per
Third, pull the last 7 columns and insert them starting at the 31st column, i.e. from ...,dMotion,Drop1id,Drop1per,... to ...,dMotion,MEXP,...,MVP3per,Drop1id,...
Results to:
ID,Name,JName,Jname,LV,HP,SP,EXP,JEXP,Range1,ATK1,ATK2,DEF,MDEF,STR,AGI,VIT,INT,DEX,LUK,Range2,Range3,Scale,Race,Element,Mode,Speed,ADelay,aMotion,dMotion,MEXP,MVP1id,MVP1per,MVP2id,MVP2per,MVP3id,MVP3per,Drop1id,Drop1per,Drop2id,Drop2per,Drop3id,Drop3per,Drop4id,Drop4per,Drop5id,Drop5per,Drop6id,Drop6per,Drop7id,Drop7per,Drop8id,Drop8per
Fourth, finally, append these columns at the end: ,0,0,DONE,1:
Results to:
ID,Name,JName,Jname,LV,HP,SP,EXP,JEXP,Range1,ATK1,ATK2,DEF,MDEF,STR,AGI,VIT,INT,DEX,LUK,Range2,Range3,Scale,Race,Element,Mode,Speed,ADelay,aMotion,dMotion,MEXP,MVP1id,MVP1per,MVP2id,MVP2per,MVP3id,MVP3per,Drop1id,Drop1per,Drop2id,Drop2per,Drop3id,Drop3per,Drop4id,Drop4per,Drop5id,Drop5per,Drop6id,Drop6per,Drop7id,Drop7per,Drop8id,Drop8per,0,0,DONE,1
Hence, after running the regex search/replace steps,
the original:
1052,ROCKER,Rocker,9,198,0,20,16,1,24,29,5,10,1,9,18,10,14,15,10,12,1,4,22,129,200,1864,864,540,940,5000,909,5500,2298,4,1402,80,520,10,752,5,703,3,4021,10,0,0,0,0,0,0,0,0
would result to:
1052,ROCKER,Rocker,Rocker,9,198,0,20,16,1,24,29,5,10,1,9,18,10,14,15,10,12,1,4,22,129,200,1864,864,540,0,0,0,0,0,0,0,940,5000,909,5500,2298,4,1402,80,520,10,752,5,703,3,4021,10,0,0,DONE,1
Hope somebody can help me; there are 500+ monsters in this old database .txt file.
Thanks!
Microsoft Excel has a Text Import Wizard to import data in any CSV format from any text file into an empty Excel worksheet. For small CSV files, this wizard can be used to load the data, then delete/move/copy data columns, and finally export/save the modified data in CSV format again.
But the question was about reformatting the CSV file using a text editor with regular expressions.
I used UltraEdit v21.20 with the Perl regular expression engine selected, but the expressions below should work with any text editor supporting Perl regular expressions. The search and replace strings largely work with Python as well (Python's replacement syntax differs slightly for a backreference followed by a digit).
Important:
The regular expressions below work only if the CSV file does not contain commas inside double-quoted values.
First, delete the 7th column from the end (deleting all ExpPer entries):
Search:   ,[^,\r\n]*?(,(?:[^,\r\n]*?,){5}[^,\r\n]*)$
Replace: \1
Second, duplicate the JName column into the next column:
Search:   ^((?:[^,\r\n]*?,){2})([^,\r\n]*?,)
Replace: \1\2\2
Third, pull the last 7 columns and insert them starting at the 31st column:
Search:   ^((?:[^,\r\n]*?,){30})((?:[^,\r\n]*?,){15}[^,]*?),((?:[^,\r\n]*?,){6}[^,\r\n]*)$
Replace: \1\3,\2
Fourth, finally, append ,0,0,DONE,1 at the end:
Search:   (.)$
Replace: \1,0,0,DONE,1
But those 4 replaces can also be done with a single regular expression replace:
Search:   ^((?:[^,\r\n]*?,){2})([^,\r\n]*?,)((?:[^,\r\n]*?,){26})((?:[^,\r\n]*?,){16})([^,\r\n]*?,)[^,\r\n]*?,((?:[^,\r\n]*?,){5}[^,\r\n]*)$
Replace: \1\2\2\3\5\6,\40,0,DONE,1
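In Python, the one-pass replace can be sketched with re.sub; the only change needed is the backreference syntax, since Python would read \40 as a reference to group 40, so it must be written \g&lt;4&gt;0 to keep the 0 literal:

```python
import re

# One-pass version of the four column edits above; assumes no commas
# inside double-quoted values, as noted earlier.
PATTERN = re.compile(
    r'^((?:[^,\r\n]*?,){2})([^,\r\n]*?,)((?:[^,\r\n]*?,){26})'
    r'((?:[^,\r\n]*?,){16})([^,\r\n]*?,)[^,\r\n]*?,'
    r'((?:[^,\r\n]*?,){5}[^,\r\n]*)$'
)
# Perl's \40 becomes \g<4>0 in Python so the 0 stays a literal character.
REPLACEMENT = r'\1\2\2\3\5\6,\g<4>0,0,DONE,1'

def convert(line):
    return PATTERN.sub(REPLACEMENT, line)
```

Run convert over each line of the .txt file (or compile the pattern with re.M and apply a single re.sub to the whole file contents).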

Python number getting rounded in csv file

I am gathering tweet data and writing it to a CSV file. Everything works perfectly when I print status ID #'s in IDLE:
import csv

with open('C:/location/filename.csv', 'wb') as acsv:
    w = csv.writer(acsv)
    w.writerow(('ID',))    # note the trailing comma: ('ID') is just a string
    for statusObj in results:
        statid = statusObj.id
        w.writerow((statid,))
This prints a status ID as expected (e.g. 238669617898323968). But when I open the CSV file in Excel to check it, the last 3 digits are rounded off to 238669617898323000. What is going on here? Thanks!
And the answer is...don't trust Excel to display your data exactly as entered.
See this for the reason why, but it boils down to Excel only handling 15-16 significant digits. I'm making an assumption here, but if you're pulling tweets, I'm assuming you're using the Twitter API? If so, there is an id_str field that returns the ID as a string; you could store that in your CSV and handle converting it at another point in your program (see here for more information).
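A minimal sketch of that approach, with a hypothetical results list standing in for the API response (the real status objects expose id_str alongside the numeric id):

```python
import csv

# Hypothetical stand-in for the API results; the real status objects
# carry an `id_str` field, the ID already serialized as a string.
results = [{"id_str": "238669617898323968"}]

with open("filename.csv", "w", newline="") as acsv:
    writer = csv.writer(acsv)
    writer.writerow(("ID",))
    for status in results:
        # Writing the string form keeps every digit in the file itself;
        # Excel's display is a separate issue (import the column as text).
        writer.writerow((status["id_str"],))
```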
