CSV file field re-formating
CSV file field re-formating
I have the following data in a CSV file. It has 8 columns. Here is the data:
Device,Interface,Description,Average Inbound,Max Inbound,Average Outbound,Max Outbound
Router,Cable0/0/0,,34.13M bps,88.57M bps,26.12M bps,80.39M bps
Router,Cable0/0/0-upstream0,Node1,10.60M bps,16.07M bps,0 bps,0 bps
Router,Cable0/0/0-upstream1,Node1,4.74M bps,13.71M bps,0 bps,0 bps
Router,Cable0/0/0-upstream2,Node1,2.14M bps,10.65M bps,0 bps,0 bps
Router,Cable0/0/0-upstream3,Node1,1.35M bps,8.33M bps,0 bps,0 bps
Router,Cable0/0/1-upstream0,Node2,6.67M bps,7.80M bps,0 bps,0 bps
Router,Cable0/0/1-upstream1,Node2,6.40M bps,8.22M bps,0 bps,0 bps
Router,Cable0/0/1-upstream2,Node2,5.21M bps,8.06M bps,0 bps,0 bps
Router,Cable0/0/1-upstream3,Node2,4.84M bps,7.80M bps,0 bps,0 bps
I don't need to do anything with Device
column or the Max Inbound
or Max Outbound
column. However, I need to be able to sum the Avg. Inbound
and Avg. Outbound
for each Node (Description column).
Device
Max Inbound
Max Outbound
Avg. Inbound
Avg. Outbound
So for example, my Python script would need to iterate through each row that has a specific Node (Node1, Node2, etc.) and then sum those Inbound and Outbound averages.The problem is the CSV file has garbage characters such the "M" for Million and "bps" for bits per second attached to the end of the values. I'd need to remove those characters and turn those into integers before I could perform math on them. Speaking of math, I'd also need to convert those values to Megabit's per second. Any ideas on best approaches would be greatly appreciated.
Thanks for the info. The code I've started with is this:
import csv
import string
nodes =
averages =
with open('csvfile') as csvDataFile:
csvReader = csv.reader(csvDataFile)
for row in csvReader:
averages.append(row[4])
print averages
It will give the following result:
['Max Inbound', '88.57M bps', '16.07M bps', '13.71M bps', '10.65M bps',
'8.33M bps', '7.80M bps', '8.22M bps', '8.06M bps', '7.80M bps']
How can I first search for and match on all the Interfaces that have the same value and then I can do the math on those values. For example, in the Interface column, I want to search for ALL the Cable0/0/0 interfaces (4 of them) and then I can do the math on the values for Inbound/Outbound max.
Can this be done using just the CSV module and re, OR do I need to use pandas? Can anyone help me figure out this step? In the example data above, I'd need to get ONLY values that matched Cable0/0/0 and add up the inbound/outbound. The next step would be get values for Cable0/0/1, etc.
Thanks so much in advance for any help.
Lots of sub-questions in there, but to start off with it might be useful to look at pandas dataframes which are great for working with mixed datatypes like you have. You can make use of
.rstrip
to remove trailing characters, and then multiply as desired on your individual values.– Sagasaki
Jun 28 at 21:05
.rstrip
Thanks for the input. I've updated the post with my starting code and output.
– RBlair57
Jun 28 at 23:18
1 Answer
1
Since you know that your header will always be included, you could simply slice your list with averages[1:]
. However, this isn't very Pythonic. I'd suggest using the following code:
averages[1:]
with open('csvfile') as csvDataFile:
csvReader = csv.reader(csvDataFile)
next(csvReader) # Skips the header row!
# Write the rest of your code here
The next() function is a built-in feature of Python that can be applied very well to csv.reader
objects. If you have several headers, then you can apply the next()
function several times consecutively to skip all those headers.
csv.reader
next()
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Where's your Python script? We can't figure out where you're going wrong without actually seeing what you did. Also please see How to Ask.
– rayryeng
Jun 28 at 20:56