
Extracting a single field from a very long JSON file

A trivial task, but it might still save somebody some time, so I am gladly sharing.
A friend of mine had a huge JSON file, and she had to extract all unique values of a field called "title". The file was too big to be opened in Notepad or Excel.

With these commands, I was able to obtain a clean, unique, and sorted list of all the values.


grep -o -E '"title":"[^"]+",' tmp.json | sort |uniq > output.txt

sed -i 's/"title":"//g' output.txt

sed -i 's/",//g' output.txt
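The three steps above can probably also be chained into one pipeline, so that output.txt is written only once and never edited in place. What follows is only a sketch under a few assumptions: the sample file and the tmpdir variable are made up for illustration, the pattern drops the trailing comma so that a "title" appearing as the last field of an object is matched too, and titles containing escaped quotes are not handled.

```shell
# Hypothetical sample input standing in for the real tmp.json
tmpdir=$(mktemp -d)
cat > "$tmpdir/tmp.json" <<'EOF'
[{"id":1,"title":"Beta","year":2001},{"id":2,"title":"Alpha","year":1999},
{"id":3,"title":"Beta","year":2005}]
EOF

# 1. grep -o prints each "title":"..." match on its own line
# 2. sed strips the leading "title":" and the closing quote
# 3. sort -u sorts the values and drops duplicates in one step
grep -o -E '"title":"[^"]+"' "$tmpdir/tmp.json" \
  | sed -E 's/^"title":"//; s/"$//' \
  | sort -u > "$tmpdir/output.txt"

cat "$tmpdir/output.txt"
```

Because grep -o emits one match per line, this should work even when the whole JSON document sits on a single line.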


Comments

Unknown said…
You can do the same with just one command:
sed -n "s/^.*\"title\":\"\([^\"]*\)\",.*/\1/p" tmp.json | sort | uniq > output.txt
