Saturday, June 6, 2009

AWK equivalent in Windows Powershell

Powershell has lot of features and abilities for text parsing. AWK is one of very powerful commands available for text parsing in Unix/Linux. We do not have a Awk like cmdlet in Powershell. But we can do everything in Powershell that can be done with Awk.

Powershell combined with .Net Classes provide very powerful regular expressions for text parsing.

Now, lets play with Bash Awk and Powershell. I'm going to create a test.csv file for our testing and use that through out this topic.

Content of test.csv for our testing

1,Tony Passaquale,7920,20090222 21:59:00,800,4.78,3824,Follow-up
2,Nigel Shan Shanford,30316,20090405 16:34:00,400,9.99,3996,New-Opportunity
3,Selma Cooper,97455,20090405 16:31:00,1000,9.99,9990,Pre-Approach
4,Allen James,95140,20090405 16:31:00,1000,9.99,9990,New-Opportunity


So, Lets jump in

Display second field in test.csv


[jagadish.g@localhost Scripts]$ cat test.csv | awk -F, '{print $2}'
Tony Passaquale
Nigel Shan Shanford
Selma Cooper
Allen James


PS C:\Scripts> Get-Content .\test.csv | %{ $_.Split(',')[1]; }
Tony Passaquale
Nigel Shan Shanford
Selma Cooper
Allen James

Now, lets try getting the total value of third field in all the records in test.csv


[jagadish.g@localhost Scripts]$ cat test.csv | awk -F, '{total+=$3} END {print "Total: "total}'
Total: 230831


PS C:\Scripts> Get-Content .\test.csv | %{ [int]$total+=$_.Split(',')[2]; } ; Write-Host "Total: $total"
Total: 230831

Get no of fields in each record


[jagadish.g@localhost Scripts]$ cat test.csv | awk -F, '{print "No of fields in record "$1" = "NF }'
No of fields in record 1 = 8
No of fields in record 2 = 8
No of fields in record 3 = 8
No of fields in record 4 = 8


PS C:\Scripts> Get-Content .\test.csv | %{ $a=$_.Split(','); Write-Host "No of fields in record"$a[0]"="$a.length; }
No of fields in record 1 = 8
No of fields in record 2 = 8
No of fields in record 3 = 8
No of fields in record 4 = 8

Regular Expression matching in Awk and Powershell. Print a record if the last field contains any of these lowercase characters (a, b or c)


[jagadish.g@localhost Scripts]$ cat test.csv | awk -F, '{if ($NF ~ "[a-c]") print}'
3,Selma Cooper,97455,20090405 16:31:00,1000,9.99,9990,Pre-Approach


PS C:\Scripts> Get-Content .\test.csv | %{ if ($_.Split(',')[-1] -match "[a-c]") { $_; } }
3,Selma Cooper,97455,20090405 16:31:00,1000,9.99,9990,Pre-Approach

28 comments:

  1. Excellent, thanks for sharing

    ReplyDelete
  2. If you would like to use tab delimiters, you can use the following form:
    gc .\test.csv | %{ [regex]::split($_, '\t')[3]; }

    ReplyDelete
    Replies
    1. You could user {$_.split("`t")[3]}

      Delete
  3. Hello,
    If I manually run powershell "Get-Content .\ImagesList.txt | %{ $_.Split('')[3]; }" > BackupIdList.txt on the cmd it was completed very well, but if I insert this line on a CMD file, it return the error bellow:
    Expressions are only permitted as the first element of a pipeline.
    At line:1 char:51
    + Get-Content .\ImagesList.txt | { $_.Split('')[3]; } <<<<
    Anyone can help me? Thanks in advance.

    ReplyDelete
  4. Try enclosing the statement with braces as shown below.

    (Get-Content .\ImagesList.txt | %{ $_.Split('')[3]; }) > BackupIdList.txt

    If that didn't work, try this link. http://technet.microsoft.com/en-us/library/ee176927.aspx

    ReplyDelete
  5. Hello,

    I am new to programming and now presently using Windows Powershell. I have PS 1.0 installed and have .NET framework and .NET SDK also installed but when I type the method "Split" which I am desperately looking for is not getting recognized. Please help how can I import this method??

    Thanks,
    dek

    ReplyDelete
  6. Hi, thsi is really cool. How do I print multiple columns

    e.g
    cat file | awk -F: '{print $1,$3}'
    I've tried
    cat c:\numbers.txt | foreach {($_ -split ':')[0,2]}

    But that prints all of column 0, follwed by all of column 2 rather than side be side.

    Many Thanks

    ReplyDelete
  7. Try this,

    PS D:\> cat c:\numbers.txt | foreach {Write-Host ($_ -split ',')[0,2]}

    ReplyDelete
  8. Arrgh, no need to use cat when using awk:
    awk -F, '{print $2}' test.csv

    ReplyDelete
  9. I agree with Jagdish that AWK is one of very powerful commands available for text parsing in Unix/Linux. We do not have a Awk like cmdlet in Powershell. But we can do everything in Powershell that can be done with Awk.
    but I dont know why
    D:\> cat c:\numbers.txt | foreach {Write-Host ($_ -split ',')[0,2]}
    this doesn't worked for me

    ReplyDelete
  10. Hello Jagdish,

    Your article was very helpful and descriptive.
    I am stuck on a point in script I am writing.
    I have to search for Ips which are conencted from port 80 and in Established state after running this cmd" netstat -aonp".
    I tried above method but when saving output into csv format I am getting unnnessary details which is stopping me to go ahead."Export-Csv -Path D:\netstat.txt -Encoding UTF8 -NoTypeInformation
    Please help

    ReplyDelete
  11. While it is good to see Microsoft FINALLY saw a need for a scripting tool that will be useful.

    As a 20yr Unix veteran, I have to say make it Posix compliant, and make the damned commands the same, so those of us who love the simplicity of scripting in Unix can port scripts without re-writes. I don't want to go through the process of re-tooling my scripts.

    I have a boat load of scripts that move files from one location to another, to push files via SCP/SFTP and clearly without the features of a full blown Linux/Unix system..

    Awk needs to be awk, sed needs to be sed, cat needs to be cat..

    i see this notion of an attempt to mimic Awk, and I cringe at the bastardize of the simplicity, extending the length of a simple
    awk '{print $1, $4}' text.txt > text.csv

    into a
    "Export-Csv -Path D:\filename.txt -Encoding UTF8 -NoTypeInformation"

    seriously!? KISS Dennis Richie was correct:
    "UNIX is very simple, it just needs a genius to understand its simplicity." Dennis Ritchie.

    ReplyDelete
  12. Saw this and had to send it to the previous poster. I couldn't agree more.
    http://www.infoworld.com/t/unix/nine-traits-the-veteran-unix-admin-276

    We Unix guys see have witnessed the invariable need for Microsoft's reboots, and laugh heartily and the example of sloppy programming of an OS. While I've had Sparc's with serious memory leaks Microsoft takes the trophy for reboots because of it. While I can stumble my way through a product with a GUI, give me a command line vi, and few mature Unix utilities over an after thought like Powershell any day of the week.

    ReplyDelete
  13. I love awk, sed and all the other unix commands that helped me save many a days work... it is great that it is coming in ANY form to the Windows platform.

    Also, Export-Csv builds a csv to MS standard, i.e. with the BOM codes in place, so that a slight improvement for the windows excel that uses those.

    ReplyDelete
  14. That helped me a lot! Thanks!

    ReplyDelete
  15. Hi based on your example

    test.csv can you tell me how to search and get the string if one string line meets 2 conditions

    for the unix area i would do this
    cat test.txt | awk 'substr(%0,49,2)==99 && substr(%0,64,2)=="un" {print $0}'

    return

    4,Allen James,95140,20090405 16:31:00,1000,9.99,9990,New-Opportunity

    ReplyDelete
    Replies
    1. the powershell version above im looking for

      Delete
  16. awk '{ FS="|" } { print $2}' | sed s/" *"//




    anyone help me with the equivalent command in power shell

    ReplyDelete
  17. type fileName | foreach { ($_ -split "|")[1] -replace " *" "" }

    I have not tested this. Use at your own peril.

    ReplyDelete
  18. The point that UN*X people are missing about Powershell is that PS deals with objects. Strings are one type of object. Other examples of objects are :process, user, file etc., In fact It is used to orgainse VM's, databases and what-have-you.

    ReplyDelete
  19. How can PS split rows that have a column of concatenated (delimited) values into multiple rows for each value? e.g. if 'Tony Passaquale' had multiple values in a field, creating multiple rows for Tony Passaquale, one for each value?

    ReplyDelete
  20. **** Just a word for the *NIX Administrators of the world ****

    Guys, even though as per Dennis Ritchie, UNIX is simple and only geniuses can understand it's simplicity.. IT and computing are all about generosity and getting as close to a person as you can. The world is made up of billions or lame people like me who aren't geniuses and what's so great about UNIX if people don't understand it too easily. The words 'Open Source' (or free to everyone and available to everyone) do not make sense with the *NIX OSes as does the concept. So, get better and get close to people.

    Just an FYI, Microsoft does it the right way using Windows.

    Stop fighting!!

    ReplyDelete
  21. I pull this request from: http://stackoverflow.com/questions/9841036/awk-count-number-of-fields-and-add-accordingly
    but would like the Get-Content equivalent to AWK command below.

    In my database schema I have 33 fields, however some of the lines in my csv file has less than 33 fields therefore when I import the file, it complains about miss match.
    using awk how can I go about adding NULL fields spreader by in order to full up 33 rows


    This the unix AWK command to add empty fields at the end of line:
    awk -F'|' -v OFS='|' '{for(i=NF+1;i<=33;i++)$i=""}1' file.csv

    Can you give me the Get-Content equivalent please?

    your help is highly appreciated

    ReplyDelete
    Replies
    1. This works in PS 3.0 at least:

      PS>Import-Csv file.csv | Export-Csv file2.csv

      Note: This will change the CSV format to Microsoft's standard, "Quote Everything", style...

      My test file:

      ID,Name,Address,City,State,ZIP,Phone
      1,"Tom Cat","123 Some St","ACity","ST","12345","(888)555-1212"
      2,"Tina Smith","345 Your St"
      3,"Chris Johnson","5678 My St","TheCity","ST"

      The Import-Csv cmdlet reads this with no problem, inserting null fields to fill out the rows. Export-Csv then writes a new file with the correct number of fields in each row.

      Delete
  22. @ecnerwal72

    This works in PS 3.0 at least:

    PS>Import-Csv file.csv | Export-Csv file2.csv

    Note: This will change the CSV format to Microsoft's standard, "Quote Everything", style...

    ReplyDelete
    Replies
    1. This works in PS 3.0 at least:

      PS>Import-Csv file.csv | Export-Csv file2.csv

      Note: This will change the CSV format to Microsoft's standard, "Quote Everything", style...

      My test file:

      ID,Name,Address,City,State,ZIP,Phone
      1,"Tom Cat","123 Some St","ACity","ST","12345","(888)555-1212"
      2,"Tina Smith","345 Your St"
      3,"Chris Johnson","5678 My St","TheCity","ST"

      The Import-Csv cmdlet reads this with no problem, inserting null fields to fill out the rows. Export-Csv then writes a new file with the correct number of fields in each row.

      Delete
  23. Sorry I left off some information...

    My test file:

    ID,Name,Address,City,State,ZIP,Phone
    1,"Tom Cat","123 Some St","ACity","ST","12345","(888)555-1212"
    2,"Tina Smith","345 Your St"
    3,"Chris Johnson","5678 My St","TheCity","ST"

    The Import-Csv cmdlet reads this with no problem, inserting null fields to fill out the rows. Export-Csv then writes a new file with the correct number of fields in each row.

    ReplyDelete