We have a scenario where we need to load 5 million records from a CSV file in C# in under 2 seconds, then process them and return certain records based on given criteria. That sounds like it should take much longer, but it only does if we go about it the wrong way.

This is what we will solve in the code below.
 
Let's dive in and do some processing ourselves. First, download the file from the URL below; it is a sample Sales Records CSV file with 5 million records.
http://eforexcel.com/wp/wp-content/uploads/2020/09/5m-Sales-Records.7z
 
Now what we will do is load this CSV in our program and get the top ten sales records with the highest revenue, in order.
    using System;  
    using System.Collections.Generic;  
    using System.Diagnostics;  
    using System.Globalization;  
    using System.IO;  
    using System.Linq;  
  
    Stopwatch stopwatch = new Stopwatch();  
    stopwatch.Start();  
    //LOAD  
    //Temporary list to hold the columns we need from each record (index 11 is the Total Revenue column)  
    List<Tuple<string, string, string>> listA = new List<Tuple<string, string, string>>();  
    using (var reader = new StreamReader(@"C:\Users\Lenovo\Desktop\5m Sales Records.csv")) {  
        reader.ReadLine(); //skip the header row  
        while (!reader.EndOfStream) {  
            var line = reader.ReadLine();  
            var values = line.Split(',');  
            listA.Add(new Tuple<string, string, string>(values[0], values[1], values[11]));  
        }  
    }  
    //PROCESS  
    //Order by revenue descending, then keep only the first ten records  
    var top10HighestRevenueSalesRecords =  
        (from salesrec in listA  
         orderby double.Parse(salesrec.Item3, CultureInfo.InvariantCulture) descending  
         select salesrec).Take(10);  
    //PRINT  
    foreach (var item in top10HighestRevenueSalesRecords) {  
        Console.WriteLine($"{item.Item1} - {item.Item2} - {item.Item3}");  
    }  
    stopwatch.Stop();  
    Console.WriteLine($"Time elapsed: {stopwatch.Elapsed.TotalSeconds:F2} s");  
    Console.ReadLine();  

Now all three main steps in the process (Load, Process, and Print) complete in under 2 seconds.
 
Adding Parallel.For or Parallel.ForEach does not help much in this scenario either; in fact, it slows things down slightly, although the difference is negligible.
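To make that concrete, here is a minimal sketch of what such a parallel attempt might look like, using File.ReadAllLines plus Parallel.ForEach into a ConcurrentBag (the same file path and column indexes as above are assumed). The per-line work is just a trivial string split, so the synchronization overhead tends to cancel out any gain from extra cores.
    using System;  
    using System.Collections.Concurrent;  
    using System.IO;  
    using System.Linq;  
    using System.Threading.Tasks;  
  
    //Parallel attempt: spread the line parsing across cores.  
    //A ConcurrentBag is used because List<T>.Add is not thread-safe.  
    var bag = new ConcurrentBag<Tuple<string, string, string>>();  
    var lines = File.ReadAllLines(@"C:\Users\Lenovo\Desktop\5m Sales Records.csv");  
  
    Parallel.ForEach(lines.Skip(1), line => { //Skip(1) skips the header row  
        var values = line.Split(',');  
        bag.Add(new Tuple<string, string, string>(values[0], values[1], values[11]));  
    });  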
 
We can improve this further, down to around one second, by using a third-party NuGet package that cuts the time spent loading large CSV files. Here we use the LumenWorks CSV reader (published on NuGet as LumenWorksCsvReader).
    using LumenWorks.Framework.IO.Csv;  
  
    //Drop-in replacement for the LOAD step above; "true" tells the reader the file has a header row  
    using (CsvReader csv = new CsvReader(new StreamReader(@"C:\Users\Lenovo\Desktop\5m Sales Records.csv"), true)) {  
        while (csv.ReadNextRecord()) {  
            listA.Add(new Tuple<string, string, string>(csv[0], csv[1], csv[11]));  
        }  
    }  
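For completeness, here is roughly how the full timed program looks once the LumenWorks reader replaces the LOAD step (same path, columns, and top-ten query as before); this is the variant the one-second figure refers to, though the exact time will of course depend on your disk and CPU.
    using System;  
    using System.Collections.Generic;  
    using System.Diagnostics;  
    using System.Globalization;  
    using System.IO;  
    using System.Linq;  
    using LumenWorks.Framework.IO.Csv;  
  
    var stopwatch = Stopwatch.StartNew();  
  
    //LOAD with the LumenWorks reader  
    var records = new List<Tuple<string, string, string>>();  
    using (var csv = new CsvReader(new StreamReader(@"C:\Users\Lenovo\Desktop\5m Sales Records.csv"), true)) {  
        while (csv.ReadNextRecord()) {  
            records.Add(new Tuple<string, string, string>(csv[0], csv[1], csv[11]));  
        }  
    }  
  
    //PROCESS: top ten records by revenue  
    var top10 = records  
        .OrderByDescending(r => double.Parse(r.Item3, CultureInfo.InvariantCulture))  
        .Take(10);  
  
    //PRINT  
    foreach (var item in top10) {  
        Console.WriteLine($"{item.Item1} - {item.Item2} - {item.Item3}");  
    }  
  
    stopwatch.Stop();  
    Console.WriteLine($"Time elapsed: {stopwatch.Elapsed.TotalSeconds:F2} s");  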


Happy coding, fellows!