while True:
    if bbs_number > lately_number():
        sys.stdout = open('date.txt', 'a')
        bbs_lists = range(highest_number() + 1, bbs_number + 1)
        for item in bbs_lists:
            url_number = "url" + str(item)
            try:
                result = requests.get(url_number)
                bs_number = BeautifulSoup(result.content, "lxml")
                float_box = bs_number.find("div", {"class": "float_box"})
                parameter_script = float_box
                print("bs_obj()")
            except AttributeError as e:
                print("error")
        with open('lately_number.txt', 'w') as f_last:
            f_last.write(str(bbs_number))
The while loop above does not raise an error, but duplicate data is written to date.txt.
I want to fix this at the early stage, when the range value is set, rather than removing duplicates later, after they have already been written to date.txt.
One possibility is that the existing lately_number() yields a range that overlaps with what is already in date.txt, because the value is sometimes not written correctly to lately_number.txt.
I would be grateful for help with a better function to add or replace.
The simplest way would be to read date.txt into a set. Then you can check the set to see whether the date is already there, and if it isn't, write the date to the date.txt file.
E.g.:
uniqueDates = set()

# read the file contents into a set
with open("date.txt", "r") as f:
    for line in f:
        uniqueDates.add(line.strip())  # strip off the line ending \n

# ensure what we're writing to the date file isn't a duplicate
with open("date.txt", "a") as f:
    if "bs_obj()" not in uniqueDates:
        f.write("bs_obj()")
You'll probably need to adjust the logic a bit to fit your needs, but I believe this is what you're trying to accomplish.
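As a sketch, the same dedupe idea can be wrapped in a reusable helper (the file name and values here are placeholders; append_if_new is not a function from your code, just an illustration):

```python
import os
import tempfile

def append_if_new(path, value):
    """Append value to the file only if no existing line equals it."""
    try:
        with open(path) as f:
            existing = {line.strip() for line in f}
    except FileNotFoundError:
        existing = set()          # first run: no file yet
    if value not in existing:
        with open(path, 'a') as f:
            f.write(value + '\n')
        return True               # written
    return False                  # duplicate, skipped

# usage sketch against a throwaway path
path = os.path.join(tempfile.mkdtemp(), 'date.txt')
print(append_if_new(path, '2021-04-13'))  # True: written
print(append_if_new(path, '2021-04-13'))  # False: duplicate skipped
```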
I am following an article about an image-caption transformer model in TensorFlow (Python). When I run the following code, the head function does not show any data.
file = open(dir_Flickr_text, 'r')
text = file.read()
file.close()

datatxt = []
for line in text.split('\n'):
    col = line.split('\t')
    if len(col) == 1:
        continue
    w = col[0].split("#")
    datatxt.append(w + [col[1].lower()])

data = pd.DataFrame(datatxt, columns=["filename", "index", "caption"])
data = data.reindex(columns=['index', 'filename', 'caption'])
data = data[data.filename != '2258277193_586949ec62.jpg.1']
uni_filenames = np.unique(data.filename.values)
data.head()
After running this I see three columns (index, filename, caption) with no data at all, while the real file contains plenty of data and the article displays the data too.
It doesn't show any data because the DataFrame is empty, probably because datatxt is empty. Try adding a print() statement just before data = pd.DataFrame(... to see what is going on.
It is hard for us to debug without the dataset.
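One common cause (an assumption here, since the dataset isn't attached) is that the caption file is separated by spaces rather than tabs, so every line hits the len(col) == 1 branch and datatxt stays empty. A small self-contained way to check is to run the same parsing logic on a known sample:

```python
def parse_captions(text):
    """Parse 'filename#index<TAB>caption' lines into [filename, index, caption] rows."""
    rows = []
    for line in text.split('\n'):
        col = line.split('\t')
        if len(col) == 1:      # no tab: blank line, or the separator isn't a tab
            continue
        w = col[0].split('#')
        rows.append(w + [col[1].lower()])
    return rows

tab_text = "img1.jpg#0\tA dog runs .\nimg1.jpg#1\tA dog jumps ."
space_text = "img1.jpg#0 A dog runs ."   # space-separated: every line is skipped

print(len(parse_captions(tab_text)))    # 2
print(len(parse_captions(space_text)))  # 0 -> empty DataFrame, empty head()
```

If the second case matches what you see, switch the split to the separator your file actually uses.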
I am merging different data sets and appending them into one single dataset. The problem is that sometimes the dataset df_fours is empty. To deal with this I have used try and pass statements.
Now, when df_fours_unique is bypassed and I try to export the results to CSV, it gives the error:
df_append3 is not defined
What I want is some conditional statement (or something else) which will export df_append3 if it doesn't have any error, and otherwise just append df_append2. What I currently have is:
df_unique = pd.merge(df7, df6_1, on='DEL_KEY1', how='left')
df_twos = pd.merge(df9, df8_1, on='DEL_KEY1', how='left')
df_twos_unique = df_twos[df_twos.index % 2 == 0]
df_threes = pd.merge(df11, df10_1, on='DEL_KEY1', how='left')
df_threes_unique = df_threes[df_threes.index % 3 == 0]
try:
    df_fours = pd.merge(df13, df12_1, on='DEL_KEY1', how='left')
    df_fours_unique = df_fours[df_fours.index % 4 == 0]
except:
    pass
df_append1 = df_unique.append(df_twos_unique)
df_append2 = df_append1.append(df_threes_unique)
try:
    df_append3 = df_append2.append(df_fours_unique)
except:
    pass
df_append3.to_csv('export.csv')
Couldn't attach the datasets due to confidentiality.
What I want is to have some conditional statement (or if there is something else) which will export df_append3 if it doesn't have any error. Otherwise it will just append df_append2.
There is, and you're already using it! It's called try/except. If there was no error inside the try, export df_append3; otherwise (except), export df_append2:
try:
    df_append3 = df_append2.append(df_fours_unique)
    df_append3.to_csv('export.csv')
except:
    df_append2.to_csv('export.csv')
See the pandas documentation for DataFrame.empty: https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.empty.html

if not df_fours.empty:
    # the dataframe is not empty, we can write it

and by the same token

if not df_append3.empty:
    # can write this too...

A better approach than try/except, I'd think.
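As a runnable sketch of the empty-check approach (the frame names mirror the question, but the data here is made up; pd.concat is used because DataFrame.append was removed in pandas 2.0):

```python
import os
import tempfile
import pandas as pd

df_append2 = pd.DataFrame({'DEL_KEY1': [1, 2], 'val': ['a', 'b']})
df_fours_unique = pd.DataFrame(columns=['DEL_KEY1', 'val'])  # sometimes empty

# Only concatenate the fours when there is actually data in them
if not df_fours_unique.empty:
    df_final = pd.concat([df_append2, df_fours_unique], ignore_index=True)
else:
    df_final = df_append2

# write to a throwaway path for the demo
out_path = os.path.join(tempfile.mkdtemp(), 'export.csv')
df_final.to_csv(out_path, index=False)
print(len(df_final))  # 2 here, since df_fours_unique is empty
```

This way df_final is always defined, and a bare except (which can hide unrelated bugs) is avoided entirely.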
'NBN3W27800D1.NS'
3326.5
3515.6
3326.5
'2021-04-13'
3515.6
3904.0
3970.3
27800
'2021-04-15'
'NBN3W27800P1.NS'
6.55
13.0
4.1
'2021-04-13'
9.25
5.6
6.55
27800
'2021-04-15'
My text file contains these data. I want to store them as two rows in a MySQL database.
Here is what I assume you want (get each line of text and store them in separate rows):
with open('your_file.txt') as file:
    data = file.readlines()

for line in data:
    print(line)
Then replace the print with whatever function you use to add to the database, and use a counter or enumerate() to set the row, like so:
for row, line in enumerate(data):
    print(row, line)
Again, replace that print with the insertion into the database.
You may also want to add .strip() to remove the newlines:

for line in data:
    line = line.strip()
    print(line)
and again add the rest of the database code (which I don't know, so I am giving you print functions).
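Since the sample above shows two blocks of ten lines each, one record per block, a sketch of turning the file into two database rows is to chunk the lines first (the chunk size and column layout are assumptions based on the sample; the actual INSERT is left to whichever MySQL driver you use):

```python
def chunk_records(lines, size=10):
    """Group stripped, non-empty lines into fixed-size records (one DB row each)."""
    values = [ln.strip() for ln in lines if ln.strip()]
    return [tuple(values[i:i + size]) for i in range(0, len(values), size)]

sample = [
    "'NBN3W27800D1.NS'\n", "3326.5\n", "3515.6\n", "3326.5\n", "'2021-04-13'\n",
    "3515.6\n", "3904.0\n", "3970.3\n", "27800\n", "'2021-04-15'\n",
    "'NBN3W27800P1.NS'\n", "6.55\n", "13.0\n", "4.1\n", "'2021-04-13'\n",
    "9.25\n", "5.6\n", "6.55\n", "27800\n", "'2021-04-15'\n",
]

rows = chunk_records(sample)
print(len(rows))  # 2 rows, matching the two blocks in the file
# Each tuple can then be passed to cursor.executemany() with a
# ten-column INSERT statement (table and column names are up to you).
```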
Without using any modules (pandas, csv, etc.) I need to filter this CSV file (https://data.world/prasert/rotten-tomatoes-top-movies-by-genre/workspace/file?filename=rotten_tomatoes_top_movies_2019-01-15.csv), keeping ONLY the movies in the animation genre and dropping the others.
I have used open, split and a for loop to read the data, but I am struggling to filter the movies by genre.
I created a list called genres and appended to it with genres.append(line.split(",")[4]), but this only gives a list of values from the genre column rather than the full row for each movie in a particular genre.
I know it is crazy to attempt this without the modules (this is for school), but is it even possible to do this without them?
Thanks in advance.
Try this:

f = open("file_name", "r", encoding="utf-8")
new_list = []
header = 0
for line in f.readlines():
    # keep the header row if the file has one
    if header == 0:
        new_list.append(line)
        header = 1
        continue
    # add the genre name to filter on (fifth column)
    if line.split(',')[4] == 'genre_name':
        new_list.append(line)
f.close()

# write the filtered list to an output file
out_file = open('output.txt', 'w', encoding="utf-8")
for element in new_list:
    out_file.write(element)  # lines from readlines() already end with \n
out_file.close()
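The same idea as a small, testable function (the column layout below is an assumption based on the question's split(",")[4]; note that naive comma-splitting breaks on quoted CSV fields that contain commas, which the real Rotten Tomatoes file may have):

```python
def filter_by_genre(lines, genre, genre_col=4):
    """Keep the header plus every row whose genre column matches exactly."""
    header, *rows = lines
    kept = [header]
    for row in rows:
        cols = row.split(',')
        if len(cols) > genre_col and cols[genre_col].strip() == genre:
            kept.append(row)
    return kept

sample = [
    "rank,rating,title,reviews,genre",
    "1,97%,Toy Story,100,Animation",
    "2,94%,Mad Max,80,Action",
    "3,96%,Up,95,Animation",
]
result = filter_by_genre(sample, "Animation")
print(result)  # header plus the two Animation rows
```

So yes, it is possible without modules for simple files, but the csv module exists precisely to handle the quoting edge cases this sketch ignores.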
I'm writing a Python program that I would like to track statistics from given inputs. I'd like it to have two documents set up and be able to refer to each. Each would contain a value, let's say x. The program generates a number, and I'd like to update the number in a given document by adding the generated number to it. Right now, my code is as follows:
f1 = open("player1records.txt", "a+")
f1.write(str(int(P1wins) + int(f1.read)))
This, however, raises the following:
TypeError: int() argument must be a string, a bytes-like object or a number,
not 'builtin_function_or_method'
How can I take that x and add another number to it, then update the document?
don't forget to add the () to the end of a function to call it:
f1.write(str(int(P1wins) + int(f1.read()))) # not f1.read
this sort of thing is difficult to do safely; one tends to end up with code that does:
from os import replace

def updateRecords(value, filename="player1records.txt"):
    try:
        with open(filename) as fd:
            prev = int(fd.read())
    except (FileNotFoundError, ValueError):
        prev = 0
    changed = prev + value
    # write to a temp file in case of failure while doing this
    tmpname = filename + '~'
    with open(tmpname, 'w') as fd:
        fd.write(str(changed))
    # perform an atomic rename so we don't end up with a half-written file
    replace(tmpname, filename)
all of this fiddling is why people tend to hide this complexity behind a relational database. Python includes a relatively nice SQLite interface. If everything were set up, you'd be able to do:
with dbcon:
    dbcon.execute(
        "UPDATE player SET wins = wins + ? WHERE player_id = ?",
        (P1wins, 1))
and have the SQLite library take care of platform specific fiddly bits…
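For completeness, a minimal self-contained sketch of the SQLite route using the standard-library sqlite3 module (the player table and its columns are assumptions, chosen to mirror the snippet above):

```python
import sqlite3

dbcon = sqlite3.connect(':memory:')  # use a file path for a persistent database
dbcon.execute("CREATE TABLE player (player_id INTEGER PRIMARY KEY, wins INTEGER)")
dbcon.execute("INSERT INTO player (player_id, wins) VALUES (1, 0)")

P1wins = 3
with dbcon:  # the connection as context manager commits (or rolls back) the transaction
    dbcon.execute(
        "UPDATE player SET wins = wins + ? WHERE player_id = ?",
        (P1wins, 1))

wins, = dbcon.execute(
    "SELECT wins FROM player WHERE player_id = 1").fetchone()
print(wins)  # 3
```

The read-modify-write happens inside the database, so there is no temp-file-and-rename dance to get wrong.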