I am trying to write a function in Python that splits the original CSV file into multiple CSV files and saves each one under a different name. The code that I have written so far looks like this:
def split_data(a, b, name_of_the_file):
    part = df.loc[a:b-1]
    filename = str(name_of_the_file)
    part.to_csv(r'C:\path\to\file\filename.csv', index=False)
The main intention of the code is to give each file a different name, taken from the input (name_of_the_file). The code seems to work, but it only ever saves the file as filename.csv.
Your function is saving the file(s) with the name filename.csv because you only specify that name in the following line:
part.to_csv(r'C:\path\to\file\filename.csv', index=False)
To change the name you need to change the string to take the filename variable:
part.to_csv(rf'C:\path\to\file\{filename}.csv', index=False)
Notice how the string now has rf at the beginning of it -- the f makes it an f-string, which allows you to add Python variables directly into the string by using curly brackets ({filename}); the r keeps the string raw, so the backslashes in the Windows path are not interpreted as escape sequences (with a plain f-string, \t and \f would become tab and form-feed characters).
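Putting it together, a minimal sketch of the corrected function (the DataFrame and the output path are placeholders standing in for the ones in the question):

import pandas as pd

# toy DataFrame standing in for the question's df (an assumption for this demo)
df = pd.DataFrame({'col': range(10)})

def split_data(a, b, name_of_the_file):
    part = df.loc[a:b-1]
    filename = str(name_of_the_file)
    # rf prefix: raw string (backslashes stay literal) + f-string interpolation
    part.to_csv(rf'C:\path\to\file\{filename}.csv', index=False)

split_data(0, 5, 'first_part')    # writes first_part.csv
split_data(5, 10, 'second_part')  # writes second_part.csv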
Welcome to Stack Overflow.
I think what you need to do is to use string interpolation.
https://www.programiz.com/python-programming/string-interpolation
Example from the linked page:
name = 'World'
program = 'Python'
print(f'Hello {name}! This is {program}')
In your case, something like:
def split_data(a, b, name_of_the_file):
    part = df.loc[a:b-1]
    filename = str(name_of_the_file)
    part.to_csv(rf'C:\path\to\file\{filename}.csv', index=False)
I have added the f prefix and the curly braces to your code, as in my initial example.
I hope that helps.
Another approach to saving multiple csv files
# create a list of dataframes (example below assumes a dictionary of dataframes)
lst_of_dfs = [df for df in dfs.values()]

# path to save files
path = 'c:/location_to_save_files/'

# create a list of filenames you would like to use
fnames = ['A', 'B', 'C', 'D', 'E']

# output the data
# this names the files using the index, i, + fnames
for i, j in enumerate(fnames):
    lst_of_dfs[i].to_csv(path + str(i) + str(j) + '.csv')
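If you prefer pairing names and dataframes explicitly, the same files can be written with zip (a sketch using the variables above):

# zip pairs each dataframe with its filename; enumerate still supplies the numeric prefix
for i, (fname, frame) in enumerate(zip(fnames, lst_of_dfs)):
    frame.to_csv(path + str(i) + fname + '.csv')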
Good evening. I want to create a list while reading a text file (historique.txt) which contains the list of files associated with each task id. Consider the following example: my text file contains these lines:
4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4
(To explain the content of the text file further: 4 is an idtask and file1 is a file used by idtask=4, so basically task 4 used (file1, file2).)
I want to obtain the list Transactions = [[file1, file2], [file1, file3, file4], [file3, file4]].
Any help is appreciated, thank you.
This will not work if the input file is not ordered.
Exactly the same idea as #mad_'s answer, just showing the benefit of turning file_data_list into a list of lists instead of a list of strings. We only need to .split each line once, which is more readable and probably a bit faster as well.
Note that this can also be done while reading the file instead of after the fact, as shown below.
from itertools import groupby

file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']

file_data_list = [line.split(',') for line in file_data_list]

for k, v in groupby(file_data_list, key=lambda x: x[0]):
    print([x[1] for x in v])  # also no need to convert v to a list
After reading from the file, e.g. with f.read().splitlines() (plain f.readlines() would keep the trailing newlines), you will get a list similar to the one below:
file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']
Apply groupby
from itertools import groupby

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print([i.split(",")[1] for i in list(v)])
Output
['file1', 'file2']
['file1', 'file3', 'file4']
['file3', 'file4']
You can also create a mapping dict:
for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print({k: [i.split(",")[1] for i in list(v)]})
Output
{'4': ['file1', 'file2']}
{'5': ['file1', 'file3', 'file4']}
{'6': ['file3', 'file4']}
As pointed out by #DeepSpace, the above solution only works if the ids are ordered (all lines with the same id appear consecutively). Modifying it for the case where they are not ordered:
from collections import defaultdict

d = defaultdict(list)

file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4',
                  '4,file3']

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    for i in list(v):
        d[k].append(i.split(",")[1])

print(d)
Output
defaultdict(list,
            {'4': ['file1', 'file2', 'file3'],
             '5': ['file1', 'file3', 'file4'],
             '6': ['file3', 'file4']})
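Another option for unordered input (a sketch): since groupby only groups consecutive items, sorting by the key first works as well:

keyfunc = lambda x: x.split(",")[0]

for k, v in groupby(sorted(file_data_list, key=keyfunc), key=keyfunc):
    print({k: [i.split(",")[1] for i in v]})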
We can use the csv module to process the lines into lists of values.
csv reads from a file-like object, which we can fake using StringIO for an example:
>>> from io import StringIO
>>> contents = StringIO('''4,file1
... 4,file2
... 5,file1
... 5,file3
... 5,file4
... 6,file3
... 6,file4''')
Just to note: depending upon the version of Python you are using you might need to import StringIO differently. The above code works for Python 3. For Python 2, replace the import with from StringIO import StringIO.
csv.reader returns an iterable object. We can consume the whole thing into a list, just to see how it works. Later we will instead iterate over the reader object one line at a time.
We can use pprint to see the results nicely formatted:
>>> import csv
>>> lines = list(csv.reader(contents))
>>> from pprint import pprint
>>> pprint(lines)
[['4', 'file1'],
['4', 'file2'],
['5', 'file1'],
['5', 'file3'],
['5', 'file4'],
['6', 'file3'],
['6', 'file4']]
These lists can then be unpacked into a task and filename:
>>> task, filename = ['4', 'file1']
>>> task
'4'
>>> filename
'file1'
We want to build lists of filenames having the same task as key.
To efficiently organise this we can use a dictionary. The efficiency comes from how the dictionary looks up the list of values for a given key: the keys are stored in a hash table, and a hash lookup is much quicker than a linear search.
The first time we look to add a value to the dictionary for a particular key, we would need to check to see whether it already exists.
If not we would add an empty list and append the new value to it. Otherwise we would just add the value to the existing list for the given key.
This pattern is so common that Python's builtin dictionary has a method dict.setdefault to help us achieve this.
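For example, the check-then-append pattern becomes a one-liner:
>>> d = {}
>>> d.setdefault('4', []).append('file1')
>>> d.setdefault('4', []).append('file2')
>>> d
{'4': ['file1', 'file2']}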
However, I don't like the name, or the non-uniform syntax. You can read the linked documentation if you like, but I'd rather use
Python's defaultdict instead. This automatically creates a default value for a key if it doesn't already exist when you query it.
We create a defaultdict with a list as default:
>>> from collections import defaultdict
>>> d = defaultdict(list)
Then for any new key it will create an empty list for us:
>>> d['5']
[]
We can append to the list:
>>> d['5'].append('file1')
>>> d['7'].append('file2')
>>> d['7'].append('file3')
I'll convert the defaultdict to a dict just to make it pprint more nicely:
>>> pprint(dict(d), width=30)
{'5': ['file1'],
'7': ['file2', 'file3']}
So, putting all this together:
import csv
from collections import defaultdict
from io import StringIO
from pprint import pprint
contents = StringIO('''4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4''')
task_transactions = defaultdict(list)

for row in csv.reader(contents):
    task, filename = row
    task_transactions[task].append(filename)

pprint(dict(task_transactions))
Output:
{'4': ['file1', 'file2'],
'5': ['file1', 'file3', 'file4'],
'6': ['file3', 'file4']}
Some final notes: In the example we've used StringIO to fake the file contents. You'll probably want to replace that in your actual code with something like:
with open('historique.txt') as contents:
    for row in csv.reader(contents):
        ...  # etc
Also, where we take each row from the csv reader, and then unpack it into a task and filename, we could do that all in one go:
for task, filename in csv.reader(contents):
So your whole code (without printing) would be quite simple:
import csv
from collections import defaultdict

task_transactions = defaultdict(list)

with open('historique.txt') as contents:
    for task, filename in csv.reader(contents):
        task_transactions[task].append(filename)
If you want a list of transactions (as you asked in the question!):
transactions = list(task_transactions.values())
However, on Python versions before 3.7 this may not be in the same order of tasks as the original file (modern dicts preserve insertion order, so the tasks come out in order of first appearance). If the order matters to you, clarify the question, and comment so I can help.
An alternate solution, without using itertools.groupby
(This solution does exactly what #mad_'s does; however, it may be more readable, especially for a beginner):
As #mad_ said, the read list will be as follows:
data = ['4,file1',
        '4,file2',
        '5,file1',
        '5,file3',
        '5,file4',
        '6,file3',
        '6,file4']
You could loop over the data and create a dict:
from collections import defaultdict

transactions = defaultdict(list)
for element in data:  # each element is an 'idtask,file' string
    id, file = element.split(',')
    transactions[id].append(file)
transactions will now contain the dictionary:
{'4': ['file1', 'file2'],
 '5': ['file1', 'file3', 'file4'],
 '6': ['file3', 'file4']}
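To get the list of transactions the question asks for, take the dictionary's values (on Python 3.7+ dicts preserve insertion order, so the groups come out in order of first appearance):

Transactions = list(transactions.values())
# [['file1', 'file2'], ['file1', 'file3', 'file4'], ['file3', 'file4']]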
Say that I have a JSON file whose structure is either unknown or may change over time - I want to replace all values of "REPLACE_ME" with a string of my choice in Python.
Everything I have found assumes I know the structure. For example, I can read the JSON in with json.load and walk through the dictionary to do replacements then write it back. This assumes I know Key names, structure, etc.
How can I replace ALL of a given string value in a JSON file with something else?
This function recursively replaces all strings which equal the value original with the value new.
This function works on the Python structure, but of course you can use it on a JSON file by using json.load.
It doesn't replace keys in the dictionary, just the values.
def nested_replace(structure, original, new):
    if type(structure) == list:
        return [nested_replace(item, original, new) for item in structure]

    if type(structure) == dict:
        return {key: nested_replace(value, original, new)
                for key, value in structure.items()}

    if structure == original:
        return new
    else:
        return structure
d = ['replace', {'key1': 'replace', 'key2': ['replace', "don't replace"]}]
new_d = nested_replace(d, 'replace', 'now replaced')
print(new_d)
['now replaced', {'key1': 'now replaced', 'key2': ['now replaced', "don't replace"]}]
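To apply it to a file on disk, a minimal sketch (data.json is a hypothetical filename):

import json

# load, replace, and write back (assumes the file contains valid JSON)
with open('data.json') as f:
    structure = json.load(f)

structure = nested_replace(structure, 'REPLACE_ME', 'my new value')

with open('data.json', 'w') as f:
    json.dump(structure, f, indent=2)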
I think there's no big risk in replacing a value together with its enclosing quotes, since in JSON a quote character is escaped unless it is a string delimiter.
I would dump the structure to a string, perform a str.replace (including the double quotes), and parse it again:
import json

d = {'foo': {'bar': 'hello'}}
d = json.loads(json.dumps(d).replace('"hello"', '"hi"'))
print(d)
result:
{'foo': {'bar': 'hi'}}
I wouldn't risk replacing parts of strings, or strings without their quotes, because that could change other parts of the file. Replacing a complete string together with its double quotes is safer: I can't think of an example where that could change something else.
There are "clean" solutions, like adapting the answer from Replace value in JSON file for key which can be nested by n levels, but is it worth the effort? That depends on your requirements.
Why not modify the file directly, instead of treating it as JSON?
with open('filepath') as f:
    lines = f.readlines()

# write the modified lines to a new file
with open('filepath_new', 'w') as f:
    for line in lines:
        f.write(line.replace('REPLACE_ME', 'whatever'))
You could load the JSON file into a dictionary and recurse through that to find the proper values but that's unnecessary muscle flexing.
The best way is to simply treat the file as a string and do the replacements that way.
import json

json_file = 'my_file.json'

with open(json_file) as f:
    file_data = f.read()

file_data = file_data.replace('REPLACE_ME', 'new string')
<...>
with open(json_file, 'w') as f:
    f.write(file_data)

json_data = json.loads(file_data)
From here the file can be re-written and you can continue to use json_data as a dict.
Well, that depends. If you want to replace all the string values equal to "REPLACE_ME" with the same string, you can use the code below. The for loop iterates over all the keys in the dictionary, and each key is used to select its value; if the value equals the search string, it is replaced with the replacement string.
search_string = "REPLACE_ME"
replacement = "SOME STRING"
test = {"test1": "REPLACE_ME", "test2": "REPLACE_ME", "test3": "REPLACE_ME",
        "test4": "REPLACE_ME", "test5": {"test6": "REPLACE_ME"}}

def replace_nested(test):
    for key, value in test.items():
        if type(value) is dict:
            replace_nested(value)
        else:
            if value == search_string:
                test[key] = replacement

replace_nested(test)
print(test)
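Since the question says the structure is unknown, note that the function above only recurses into nested dictionaries. A sketch of a variant that also walks lists (replace_nested2 is a hypothetical name, reusing the search_string and replacement variables above):

def replace_nested2(node):
    # walk dicts and lists; replace matching leaf strings in place
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, value in items:
        if isinstance(value, (dict, list)):
            replace_nested2(value)
        elif value == search_string:
            node[key] = replacement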
To solve this problem in a dynamic way, I chose to use the same JSON file to declare the variables that we want to replace.
JSON file:
{
    "properties": {
        "property_1": "value1",
        "property_2": "value2"
    },
    "json_file_content": {
        "key_to_find": "{{property_1}} is my value",
        "dict1": {
            "key_to_find": "{{property_2}} is my other value"
        }
    }
}
Python code (references Replace value in JSON file for key which can be nested by n levels):
import json
from os import path

def fixup(a_dict: dict, k: str, subst_dict: dict) -> None:
    """
    Function inspired by the answer linked above; replaces the
    {{placeholders}} in the values of key k, in place.
    """
    for key in a_dict.keys():
        if key == k:
            for s_k, s_v in subst_dict.items():
                a_dict[key] = a_dict[key].replace("{{" + s_k + "}}", s_v)
        elif type(a_dict[key]) is dict:
            fixup(a_dict[key], k, subst_dict)

# ...
file_path = "my/file/path"
if path.exists(file_path):
    with open(file_path, 'rt') as f:
        json_dict = json.load(f)
    fixup(json_dict["json_file_content"], "key_to_find", json_dict["properties"])
    print(json_dict)  # json with variables resolved
else:
    print("file not found")
Hope it helps
I have a JSON File which looks like this:
{"one":"Some data", "two":"Some data",...}
and so on...
I want to split all the IDs into separate files named after the ID. For example:
one.json
{"one":"Some data"}
two.json
{"two":"Some data"}
and so on.
I got a reference from this. But my problem is slightly different. What can I modify to achieve the separate text files?
I won't teach you how to do file I/O and assume you can do that yourself.
Once you have loaded the original file as a dict with the json module, do
>>> org = {"one":"Some data", "two":"Some data"}
>>> dicts = [{k:v} for k,v in org.items()]
>>> dicts
[{'two': 'Some data'}, {'one': 'Some data'}]
which will give you a list of dictionaries that you can dump to a file (or separate files named after the keys), if you wish.
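For instance, a short sketch of writing each pair to its own file with json.dump (which also escapes special characters for you):

import json

org = {"one": "Some data", "two": "Some data"}

for k, v in org.items():
    # one file per key, named after the key
    with open(k + '.json', 'w') as f:
        json.dump({k: v}, f)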
After loading the JSON file you can treat it as a dictionary in Python, and then save the contents to files by looping through it as you would a normal Python dictionary.
Here is an example related to what you want to achieve:
Data = {"one": "Some data", "two": "Some data"}

for item in Data:
    name = item + '.json'
    file = open(name, 'w')
    file.write('{"%s":"%s"}' % (item, Data[item]))
    file.close()
After getting the JSON data into a variable, do:
a = {"one": "Some data", "two": "Some data"}

for k, v in a.items():
    with open(k + ".json", "w") as f:
        f.write('{"%s" : "%s"}' % (k, v))
and the output is:
one.json => {"one":"Some data"}
and
two.json => {"two":"Some data"}
I am extracting data from the Google Adwords Reporting API via Python. I can successfully pull the data and then hold it in a variable data.
data = get_report_data_from_google()
type(data)
str
Here is a sample:
data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
I need to process this data more, and ultimately output a processed flat file (Google Adwords API can return a CSV, but I need to pre-process the data before loading it into a database.).
If I try to turn data into a csv object, and try to print each line, I get one character per line like:
c = csv.reader(data, delimiter=',')
for i in c:
    print(i)
['I']
['D']
['', '']
['L']
['a']
['b']
['e']
['l']
['s']
['', '']
['D']
['a']
['t']
['e']
So, my idea was to process each column of each line into a list, then add that to a csv object. Trying that:
for line in data.splitlines():
    print(line)
3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016
What I actually find is that inside of the str, there is a list: "[""SKWS"",""Exact""]"
This value is a "label" (see the documentation).
This list is formatted a bit oddly - it has doubled quote characters in the value, so trying to use a quote char, like ", will return something like this: [ SKWS Exact ]. If I could get to [""SKWS"",""Exact""], that would be acceptable.
Is there a good way to extract a list object within a str? Is there a better way to process and output this data to a csv?
You need to split the string first. csv.reader expects something that provides a single line on each iteration, like a standard file object does. If you have a string with newlines in it, split it on the newline character with splitlines():
>>> import csv
>>> data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
>>> c = csv.reader(data.splitlines(), delimiter=',')
>>> for line in c:
... print(line)
...
['ID', 'Labels', 'Date', 'Year']
['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
This has to do with how csv.reader works.
According to the documentation:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
The issue here is that if you pass a string, it supports the iterator protocol, and returns a single character for each call to next. The csv reader will then consider each character as a line.
You need to provide a list of lines, one for each line of your csv. For example:
c = csv.reader(data.split(), delimiter=',')
for i in c:
    print(i)
# ['ID', 'Labels', 'Date', 'Year']
# ['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
# ['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
# ['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
Now, your list looks like a JSON list. You can use the json module to read it.
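For example, a sketch combining both steps on the sample data from the question:

import csv
import json

data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n'

reader = csv.reader(data.splitlines(), delimiter=',')
next(reader)  # skip the header row
for row in reader:
    labels = json.loads(row[1])  # '["SKWS","Exact"]' -> a real Python list
    print(labels)                # ['SKWS', 'Exact']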